whisper.cpp/ggml
Reese Levine bb5d7e2c31
ggml WebGPU: add support for quantization types (llama/15440)
* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Work on templating for different types in shaders

* Work on shader type generation

* Working q4_0 mul_mat and some templating for different types

* Add q4_0_f16 matmul and fix device init

* Add matmul support for basic quantization types

* Add q2_k and q3_k quantization

* Add rest of k-quants

* Get firt i-quant working

* Closer to supporting all i-quants

* Support rest of i-quants

* Cleanup code

* Fix python formatting

* debug

* Bugfix for memset

* Add padding to end of buffers on creation

* Simplify bit-shifting

* Update usage of StringView
2025-09-20 13:42:39 +03:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) 2025-08-18 20:30:45 +03:00
include ggml: add `conv3d` op (llama/15182) 2025-09-20 13:42:39 +03:00
src ggml WebGPU: add support for quantization types (llama/15440) 2025-09-20 13:42:39 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433) 2025-09-20 13:42:38 +03:00