whisper.cpp/ggml
Jeff Bolz d792d2a2dc vulkan: Use unclamped loads for flash attention mask (llama/12720)
nem1 must be a multiple of GGML_KQ_MASK_PAD, and GGML_KQ_MASK_PAD is a multiple
of the number of rows in the matrix. The KV dim is a multiple of the number of
columns for the aligned shader.
2025-04-24 20:39:16 +03:00
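The divisibility argument in the commit message above can be sketched in plain C. This is only an illustration, not the shader code: `MASK_PAD_EXAMPLE`, `pad_up`, and `tiles_in_bounds` are hypothetical names, and the pad value of 32 is an assumed stand-in for `GGML_KQ_MASK_PAD`. The point is that when an extent is a multiple of the tile size, every full tile load ends inside the extent, so no clamping is needed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed stand-in for GGML_KQ_MASK_PAD; the real value is defined by ggml. */
#define MASK_PAD_EXAMPLE 32

/* Round n up to the next multiple of pad (pad must be non-zero). */
static size_t pad_up(size_t n, size_t pad) {
    return (n + pad - 1) / pad * pad;
}

/* Walk an extent in steps of `tile` and report whether every full tile
 * [start, start + tile) ends at or before `extent`. This holds exactly
 * when `extent` is a multiple of `tile`. */
static bool tiles_in_bounds(size_t extent, size_t tile) {
    for (size_t start = 0; start < extent; start += tile) {
        if (start + tile > extent) {
            return false; /* this tile would read past the end */
        }
    }
    return true;
}
```

For example, with 33 mask rows padded to `pad_up(33, MASK_PAD_EXAMPLE)` and a tile of 16 rows (16 divides 32), `tiles_in_bounds` returns true; on the unpadded extent of 33 it returns false, which is the case that would require clamped loads.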
..
cmake ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) 2025-03-27 11:06:03 +02:00
include metal : improve FA + improve MoE (llama/12612) 2025-03-28 21:47:42 +02:00
src vulkan: Use unclamped loads for flash attention mask (llama/12720) 2025-04-24 20:39:16 +03:00
.gitignore
CMakeLists.txt ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) 2025-03-27 11:06:03 +02:00