whisper.cpp

PMZFX 1ebf3cafa0 Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (llama/21527)		2026-04-30 11:29:02 +03:00

Extend the existing reorder optimization to Q8_0. The reorder separates scale factors from weight data so that memory accesses coalesce. It was already implemented for Q4_0, Q4_K, and Q6_K, but Q8_0 was missing.

On Arc Pro B70 (Xe2), Q8_0 token generation (tg) goes from 4.88 to 15.24 t/s (3.1x) on Qwen3.5-27B. Memory bandwidth utilization rises from 21% to 66%.

The key fix beyond the kernels: Q8_0 was missing from the type check in ggml_backend_sycl_buffer_init_tensor() that allocates the extra struct carrying the reorder flag, so the optimization was silently skipped.

AI (Claude) was used to assist with root cause investigation and writing the kernel code. All code was human-reviewed and tested on real hardware.

Fixes: #21517
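
To illustrate what "separating scale factors from weight data" means, here is a minimal host-side sketch in C++. It is not the SYCL backend's code: the struct name BlockQ8_0, the helper reorder_q8_0, and the exact buffer layout (all weights first, then all scales) are assumptions made for illustration, and the fp16 scale is kept as raw 16-bit data to avoid depending on a half-precision type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int QK8_0 = 32;  // weights per Q8_0 block

// Interleaved layout: each block stores its scale next to its 32 weights.
struct BlockQ8_0 {
    uint16_t d;          // fp16 scale, held as raw bits in this sketch
    int8_t   qs[QK8_0];  // quantized 8-bit weights
};

// Hypothetical helper: copy the blocks into a reordered buffer laid out as
//   [qs of block 0 | qs of block 1 | ... | d of block 0 | d of block 1 | ...]
// so that neighbouring work-items reading weights touch contiguous bytes.
std::vector<uint8_t> reorder_q8_0(const std::vector<BlockQ8_0>& blocks) {
    const std::size_t n = blocks.size();
    std::vector<uint8_t> out(n * QK8_0 + n * sizeof(uint16_t));
    uint8_t* qs_dst = out.data();              // start of the weight region
    uint8_t* d_dst  = out.data() + n * QK8_0;  // start of the scale region
    for (std::size_t i = 0; i < n; ++i) {
        std::memcpy(qs_dst + i * QK8_0, blocks[i].qs, QK8_0);
        std::memcpy(d_dst + i * sizeof(uint16_t), &blocks[i].d, sizeof(uint16_t));
    }
    return out;
}
```

The reordered buffer has the same total size as the interleaved one (34 bytes per block); only the placement changes. A kernel reading this layout would need to know the tensor was reordered, which is the role the commit message assigns to the reorder flag in the extra struct.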
cmake	cmake : remove unused file (ggml/1419)	2026-02-08 09:29:10 +02:00
include	ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273)	2026-04-30 11:29:01 +03:00
src	Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (llama/21527)	2026-04-30 11:29:02 +03:00
.gitignore	…
CMakeLists.txt	ggml : bump version to 0.9.11 (ggml/1456)	2026-04-30 11:29:00 +03:00