whisper.cpp


Mason Milburn	4ecede8c8b	sycl : port multi-column MMVQ from CUDA backend (llama/21845)	2026-06-08 14:36:36 +03:00

mmvq: Port the ncols_dst optimization from ggml-cuda/mmvq.cu to SYCL. Read weights once per dispatch instead of once per column. Covers all standard quant types + reorder paths for Q4_0, Q8_0, Q3_K, Q4_K, Q5_K, Q6_K. IQ types (except IQ4_XS) excluded due to incompatible vec_dot signatures.

ggml-sycl: The weight reorder was only bootstrapped on single-token mat-vec (ne[1] == 1). Speculative / MTP verify issues only multi-column mat-vec, so it never triggered the reorder and ran on the slower non-reorder kernel. Bootstrap it on small multi-column batches (ne[1] <= 8) too.
cmake	ggml : Parallelize quant LUT init (llama/23595)	2026-05-25 12:26:07 +03:00
include	TP: quantized KV cache support (llama/23792)	2026-06-08 14:36:36 +03:00
src	sycl : port multi-column MMVQ from CUDA backend (llama/21845)	2026-06-08 14:36:36 +03:00
.gitignore	…
CMakeLists.txt	ggml : bump version to 0.13.1 (ggml/1523)	2026-05-29 09:47:30 +03:00
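The latest commit (llama/21845) says the SYCL MMVQ path now reads weights "once per dispatch instead of once per column". It also says the weight reorder is now bootstrapped for `ne[1] <= 8` rather than only `ne[1] == 1`. The sketch below illustrates both ideas in plain C++. It makes several assumptions. The weights are unquantized `float`, so there is no `vec_dot` or block dequantization. The data layouts are chosen for simplicity. The names `matvec_per_column`, `matvec_multi_column`, `should_bootstrap_reorder` and `kMaxReorderBootstrapCols` are hypothetical and do not come from ggml-sycl. This is not the actual SYCL kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layouts for this sketch (not ggml's):
//   W: nrows x ncols_in, row-major.
//   X: input columns stored back to back, X[c * ncols_in + k].
//   Y: output columns stored back to back, Y[c * nrows + r].

// Baseline: one mat-vec per output column, so every weight is read ncols_dst times.
std::vector<float> matvec_per_column(const std::vector<float>& W,
                                     const std::vector<float>& X,
                                     std::size_t nrows, std::size_t ncols_in,
                                     std::size_t ncols_dst) {
    std::vector<float> Y(nrows * ncols_dst, 0.0f);
    for (std::size_t c = 0; c < ncols_dst; ++c) {
        for (std::size_t r = 0; r < nrows; ++r) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < ncols_in; ++k) {
                acc += W[r * ncols_in + k] * X[c * ncols_in + k];
            }
            Y[c * nrows + r] = acc;
        }
    }
    return Y;
}

// Multi-column variant: NCOLS_DST is a compile-time constant (mirroring a
// templated ncols_dst), so each weight is loaded once and reused for every column.
template <std::size_t NCOLS_DST>
std::vector<float> matvec_multi_column(const std::vector<float>& W,
                                       const std::vector<float>& X,
                                       std::size_t nrows, std::size_t ncols_in) {
    std::vector<float> Y(nrows * NCOLS_DST, 0.0f);
    for (std::size_t r = 0; r < nrows; ++r) {
        std::array<float, NCOLS_DST> acc{};  // one zero-initialized accumulator per column
        for (std::size_t k = 0; k < ncols_in; ++k) {
            const float w = W[r * ncols_in + k];  // single load of this weight
            for (std::size_t c = 0; c < NCOLS_DST; ++c) {
                acc[c] += w * X[c * ncols_in + k];
            }
        }
        for (std::size_t c = 0; c < NCOLS_DST; ++c) {
            Y[c * nrows + r] = acc[c];
        }
    }
    return Y;
}

// Hypothetical dispatch predicate: the reorder used to be bootstrapped only for
// ne1 == 1; per the commit message it now also covers small batches up to 8 columns.
constexpr std::int64_t kMaxReorderBootstrapCols = 8;  // hypothetical name

bool should_bootstrap_reorder(std::int64_t ne1) {
    return ne1 >= 1 && ne1 <= kMaxReorderBootstrapCols;
}
```

Both functions accumulate each column in the same order over `k`. The only difference is how many times `W` is traversed: once per column in the first, once in total in the second. That is why the win grows with the number of columns in a speculative or MTP verify batch.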