whisper.cpp/ggml
Gaurav Garg 7b19b94c5d
Write an optimized flash_attn_stream_k_fixup kernel (llama/21159)
* Write an optimized flash_attn_stream_k_fixup kernel

Write a specialized, more optimized kernel for cases where nblocks_stream_k is a multiple of ntiles_dst.
Make nblocks_stream_k a multiple of ntiles_dst if nblocks_stream_k > 2 * ntiles_dst.

* Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst to make sure we have enough concurrency on GPUs

* Address review comments

* Revert variable names to original
2026-04-30 11:29:01 +03:00
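The block-count heuristic described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names `make_multiple_of_ntiles` and `use_specialized_fixup` are hypothetical, rounding down (rather than up) is an assumption, and how the 2x and 4x thresholds interact in the actual source is not stated in the message.

```cpp
#include <cassert>

// Hypothetical helper (not part of ggml): once the raw stream-k block count
// exceeds 2 * ntiles_dst, make it a multiple of ntiles_dst.
// ASSUMPTION: we round down so the result never exceeds the raw count.
int make_multiple_of_ntiles(int nblocks_stream_k_raw, int ntiles_dst) {
    assert(ntiles_dst > 0);
    if (nblocks_stream_k_raw > 2 * ntiles_dst) {
        return (nblocks_stream_k_raw / ntiles_dst) * ntiles_dst;
    }
    return nblocks_stream_k_raw; // too few blocks: leave the count unchanged
}

// Hypothetical predicate (not part of ggml): select the specialized fixup
// kernel only when the raw block count is more than 4 * ntiles_dst, which the
// commit message motivates as keeping enough concurrency on the GPU.
bool use_specialized_fixup(int nblocks_stream_k_raw, int ntiles_dst) {
    assert(ntiles_dst > 0);
    return nblocks_stream_k_raw > 4 * ntiles_dst;
}
```

With `ntiles_dst = 8`, for example, a raw count of 100 blocks would become 96 under this sketch, while a raw count of 15 would be left as is.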
..
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include llama: fix llama-model-saver (llama/20503) 2026-03-29 15:04:36 +03:00
src Write an optimized flash_attn_stream_k_fixup kernel (llama/21159) 2026-04-30 11:29:01 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : bump version to 0.9.11 (ggml/1456) 2026-04-30 11:29:00 +03:00