whisper.cpp/ggml/src/ggml-webgpu
Reese Levine fddedc5cbc ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173)
* K quant speedup (llama/20)

* Basic JIT compilation for mul_mat, get_rows, and scale (llama/17)

* scale jit working

* preliminary working jit for getrows and mulmat, needs refining

* simplified mul_mat preprocessing switch statement

* get_rows fixes, mul_mat refinement

* formatted + last edits

* removed some extraneous prints

* fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish

* small fix

* some changes, working

* get_rows and mul_mat jit fixed and working

* Update formatting

* formatting

* Add header

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Start work on all-encompassing shader library

* refactor argmax, set_rows

* Refactor all but flashattention, mat mul

* no gibberish, all k quants added, merged

* vec memory fix

* q6_k matching metal on my machine, tests passing

* Set tile size for q6_k separately

* Separate out fast shaders

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

* Move towards writeBuffer for params

* Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups

* Remove extra file

* Formatting

2026-03-16 13:10:15 +02:00
| Name | Last commit | Date |
| --- | --- | --- |
| wgsl-shaders | ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173) | 2026-03-16 13:10:15 +02:00 |
| CMakeLists.txt | ggml webgpu: add support for emscripten builds (llama/17184) | 2025-12-12 17:53:16 +02:00 |
| ggml-webgpu-shader-lib.hpp | ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173) | 2026-03-16 13:10:15 +02:00 |
| ggml-webgpu.cpp | ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173) | 2026-03-16 13:10:15 +02:00 |
| pre_wgsl.hpp | ggml webgpu: initial flashattention implementation (llama/18610) | 2026-01-14 09:11:59 +02:00 |