whisper.cpp

Jeff Bolz b243416918 vulkan: Implement split_k for coopmat2 flash attention. (llama/12627) When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.		2025-04-24 20:39:16 +03:00
cmake	ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)	2025-03-27 11:06:03 +02:00
include	metal : improve FA + improve MoE (llama/12612)	2025-03-28 21:47:42 +02:00
src	vulkan: Implement split_k for coopmat2 flash attention. (llama/12627)	2025-04-24 20:39:16 +03:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)	2025-03-27 11:06:03 +02:00
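
The split_k idea described in the latest commit message (llama/12627) can be illustrated with a small CPU-side sketch. This is not the Vulkan shader from the commit; it is a plain Python model of the general technique, assuming a single query vector and omitting the usual 1/sqrt(d) scaling. The function names (`attention_full`, `attention_partial`, `attention_split_k`) are hypothetical and chosen only for this example. Each split processes its own slice of the KV cache independently (which is what would let more workgroups run in parallel), and a final reduction rescales the partial results by their local softmax maxima before combining them.

```python
import math

def attention_full(q, keys, values):
    """Reference: softmax(q . k) weighted sum of values for one query."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def attention_partial(q, keys, values):
    """One split: local max, local sum of exponentials, unnormalized output."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, sum(weights), acc

def attention_split_k(q, keys, values, split_k):
    """Split the KV range into up to split_k chunks, then reduce the partials."""
    n = len(keys)
    chunk = (n + split_k - 1) // split_k  # ceiling division
    partials = [attention_partial(q, keys[i:i + chunk], values[i:i + chunk])
                for i in range(0, n, chunk)]

    # Reduction step: rescale every partial to a common (global) maximum
    # so the exponentials are comparable, then normalize once at the end.
    m_global = max(m for m, _, _ in partials)
    dim = len(values[0])
    total = 0.0
    out = [0.0] * dim
    for m, l, acc in partials:
        scale = math.exp(m - m_global)
        total += l * scale
        for d in range(dim):
            out[d] += acc[d] * scale
    return [x / total for x in out]
```

In this model the split result matches the unsplit result up to floating-point rounding, so the split count only changes how the work is distributed, not what is computed.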