whisper.cpp

Jeff Bolz	fadb3233b6	2025-07-12 19:23:56 +03:00

vulkan: optimize flash attention split_k_reduce (llama/14554)

* vulkan: allow FA split_k with smaller KV values
* vulkan: spread split_k_reduce work across more threads

k_num can get rather large. Use the whole workgroup to reduce the M/L values. Launch a thread for each element in the HSV dimension of the output. Helps a lot for large HSV (like deepseek).

cmake	ggml-cpu : rework weak alias on apple targets (llama/14146)	2025-06-18 12:40:34 +03:00
include	CUDA: add bilinear interpolation for upscale (llama/14563)	2025-07-12 19:23:56 +03:00
src	vulkan: optimize flash attention split_k_reduce (llama/14554)	2025-07-12 19:23:56 +03:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (llama/14435)	2025-07-12 19:23:56 +03:00