whisper.cpp

Reese Levine fddedc5cbc ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173)	2026-03-16 13:10:15 +02:00

* K quant speedup (llama/20)
* Basic JIT compilation for mul_mat, get_rows, and scale (llama/17)
* scale jit working
* preliminary working jit for getrows and mulmat, needs refining
* simplified mul_mat preprocessing switch statement
* get_rows fixes, mul_mat refinement
* formatted + last edits
* removed some extraneous prints
* fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish
* small fix
* some changes, working
* get_rows and mul_mat jit fixed and working
* Update formatting
* formatting
* Add header

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Start work on all-encompassing shader library
* refactor argmax, set_rows
* Refactor all but flashattention, mat mul
* no gibberish, all k quants added, merged
* vec memory fix
* q6_k matching metal on my machine, tests passing
* Set tile size for q6_k separately
* Separate out fast shaders

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

* Move towards writeBuffer for params
* Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups
* Remove extra file
* Formatting

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

wgsl-shaders	ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173)	2026-03-16 13:10:15 +02:00
CMakeLists.txt	ggml webgpu: add support for emscripten builds (llama/17184)	2025-12-12 17:53:16 +02:00
ggml-webgpu-shader-lib.hpp	ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173)	2026-03-16 13:10:15 +02:00
ggml-webgpu.cpp	ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (llama/20173)	2026-03-16 13:10:15 +02:00
pre_wgsl.hpp	ggml webgpu: initial flashattention implementation (llama/18610)	2026-01-14 09:11:59 +02:00