whisper.cpp/ggml
Ruben Ortlam 0f99a47177
vulkan: Flash Attention DP4A shader for quantized KV cache (llama/20797)
* use integer dot product for quantized KV flash attention

* small improvements

* fix SHMEM_STAGING indexing

* add missing KV type quants

* fixes

* add supported quants to FA tests

* re-add fast paths for <8-bit quants

* fix mmq gate and shmem checks
2026-04-30 11:29:07 +03:00
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include ggml: backend-agnostic tensor parallelism (experimental) (llama/19378) 2026-04-30 11:29:05 +03:00
src vulkan: Flash Attention DP4A shader for quantized KV cache (llama/20797) 2026-04-30 11:29:07 +03:00
.gitignore
CMakeLists.txt ggml: backend-agnostic tensor parallelism (experimental) (llama/19378) 2026-04-30 11:29:05 +03:00