whisper.cpp

Ruben Ortlam 0f99a47177 vulkan: Flash Attention DP4A shader for quantized KV cache (llama/20797)		2026-04-30 11:29:07 +03:00
* use integer dot product for quantized KV flash attention
* small improvements
* fix SHMEM_STAGING indexing
* add missing KV type quants
* fixes
* add supported quants to FA tests
* readd fast paths for <8bit quants
* fix mmq gate and shmem checks
cmake	cmake : remove unused file (ggml/1419)	2026-02-08 09:29:10 +02:00
include	ggml: backend-agnostic tensor parallelism (experimental) (llama/19378)	2026-04-30 11:29:05 +03:00
src	vulkan: Flash Attention DP4A shader for quantized KV cache (llama/20797)	2026-04-30 11:29:07 +03:00
.gitignore	…
CMakeLists.txt	ggml: backend-agnostic tensor parallelism (experimental) (llama/19378)	2026-04-30 11:29:05 +03:00