whisper.cpp/ggml
PMZFX 1ebf3cafa0
Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (llama/21527)
Extend the existing reorder optimization to Q8_0. The reorder
separates scale factors from weight data so that memory accesses
coalesce. It was already implemented for Q4_0/Q4_K/Q6_K, but Q8_0
was missing.
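
To illustrate the idea of separating scales from weights, here is a minimal host-side sketch. It assumes a 34-byte Q8_0 block (one fp16 scale plus 32 int8 weights) and a reordered buffer that holds all weights first and all scales after. The helper name `reorder_q8_0` is hypothetical, and the exact layout used by the SYCL kernels may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int QK8_0 = 32;  // weights per Q8_0 block

// Interleaved (array-of-structs) layout: every block carries its own scale.
// The scale is kept as raw fp16 bits so the sketch needs no half-float type.
struct block_q8_0 {
    uint16_t d;          // fp16 scale, raw bits
    int8_t   qs[QK8_0];  // quantized weights
};
static_assert(sizeof(block_q8_0) == 34, "expected a 34-byte block");

// Hypothetical helper (not the SYCL backend's function): copy all weights
// into one contiguous region, followed by all scales in a second region.
std::vector<uint8_t> reorder_q8_0(const std::vector<block_q8_0>& blocks) {
    const size_t n = blocks.size();
    std::vector<uint8_t> out(n * sizeof(block_q8_0));
    uint8_t* qs_out = out.data();              // weights region
    uint8_t* d_out  = out.data() + n * QK8_0;  // scales region
    for (size_t i = 0; i < n; ++i) {
        std::memcpy(qs_out + i * QK8_0, blocks[i].qs, QK8_0);
        std::memcpy(d_out + i * sizeof(uint16_t), &blocks[i].d, sizeof(uint16_t));
    }
    return out;
}
```

With this layout, neighbouring work-items that read consecutive weights touch consecutive bytes instead of striding over an interleaved 2-byte scale every 32 weights.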

On Arc Pro B70 (Xe2), Q8_0 token generation (tg) goes from 4.88 to
15.24 t/s (3.1x) on Qwen3.5-27B. Memory bandwidth utilization:
21% -> 66%.

The key fix beyond the kernels: Q8_0 was missing from the type
check in ggml_backend_sycl_buffer_init_tensor() that allocates
the extra struct carrying the reorder flag -- so the optimization
was silently skipped.
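
The failure mode described above can be sketched as a type gate with one case missing. The enum and the function `wants_reorder_extra` are hypothetical stand-ins for illustration, not the actual code in ggml_backend_sycl_buffer_init_tensor().

```cpp
// Hypothetical stand-in for the ggml type enum, limited to the types
// named in this commit plus one unquantized type for contrast.
enum class quant_type { F16, Q4_0, Q4_K, Q6_K, Q8_0 };

// Hypothetical gate: true when a tensor of this type should get the
// extra struct that carries the reorder flag.
constexpr bool wants_reorder_extra(quant_type t) {
    switch (t) {
        case quant_type::Q4_0:
        case quant_type::Q4_K:
        case quant_type::Q6_K:
        case quant_type::Q8_0:  // without this case the reorder path is never taken
            return true;
        default:
            return false;
    }
}
```

Because a type missing from such a gate falls through to the default branch without raising an error, the tensor simply keeps the unreordered path, which matches the "silently skipped" behaviour.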

AI (Claude) was used to assist with root cause investigation and
writing the kernel code. All code was human-reviewed and tested
on real hardware.

Fixes: #21517
2026-04-30 11:29:02 +03:00
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273) 2026-04-30 11:29:01 +03:00
src Add Q8_0 reorder optimization (~3x tg speedup on Intel Arc) (llama/21527) 2026-04-30 11:29:02 +03:00
.gitignore
CMakeLists.txt ggml : bump version to 0.9.11 (ggml/1456) 2026-04-30 11:29:00 +03:00