whisper.cpp/ggml
Jeff Bolz fadb3233b6 vulkan: optimize flash attention split_k_reduce (llama/14554)
* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num can get rather large. Use the whole workgroup to reduce the M/L values.

Launch a thread for each element in the HSV dimension of the output. Helps a
lot for large HSV (like deepseek).
2025-07-12 19:23:56 +03:00
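
The reduction described in the commit message can be sketched on the CPU as follows. This is a minimal illustration, not the actual Vulkan shader. It assumes each of the k_num splits stores an unnormalized partial output of length HSV together with its row maximum M and its sum of exponentials L, which is the usual flash attention convention. The names `SplitPartial` and `split_k_reduce` are hypothetical and do not appear in the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-split result (assumed layout, not taken from the shader):
//   o[d] = sum_j exp(score_j - m) * v_j[d]   (unnormalized, length HSV)
//   m    = max_j score_j over this split
//   l    = sum_j exp(score_j - m) over this split
struct SplitPartial {
    std::vector<float> o;
    float m;
    float l;
};

// Combine the partial results of all splits into one normalized output row.
std::vector<float> split_k_reduce(const std::vector<SplitPartial>& parts,
                                  std::size_t hsv) {
    assert(!parts.empty());

    // Step 1: global maximum over all splits. On the GPU this is the part
    // that the whole workgroup can share when k_num is large.
    float m_max = -std::numeric_limits<float>::infinity();
    for (const SplitPartial& p : parts) {
        assert(p.o.size() == hsv);
        m_max = std::max(m_max, p.m);
    }

    // Step 2: rescale each split's L to the global maximum and sum them.
    float l_total = 0.0f;
    for (const SplitPartial& p : parts) {
        l_total += std::exp(p.m - m_max) * p.l;
    }

    // Step 3: one independent accumulation per output element d. This loop
    // corresponds to launching one thread per element of the HSV dimension.
    std::vector<float> out(hsv, 0.0f);
    for (std::size_t d = 0; d < hsv; ++d) {
        float acc = 0.0f;
        for (const SplitPartial& p : parts) {
            acc += std::exp(p.m - m_max) * p.o[d];
        }
        out[d] = acc / l_total;
    }
    return out;
}
```

Because each output element in step 3 depends only on the shared M/L values and its own column of partial outputs, the work parallelizes across HSV, which is consistent with the benefit the commit reports for large HSV.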
cmake           ggml-cpu : rework weak alias on apple targets (llama/14146)                  2025-06-18 12:40:34 +03:00
include         CUDA: add bilinear interpolation for upscale (llama/14563)                   2025-07-12 19:23:56 +03:00
src             vulkan: optimize flash attention split_k_reduce (llama/14554)                2025-07-12 19:23:56 +03:00
.gitignore      whisper : reorganize source code + improve CMake (#2256)                     2024-06-26 19:34:09 +03:00
CMakeLists.txt  ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (llama/14435)    2025-07-12 19:23:56 +03:00