whisper.cpp


Latest commit 7b19b94c5d by Gaurav Garg, 2026-04-30 11:29:01 +03:00
Write an optimized flash_attn_stream_k_fixup kernel (llama/21159)
* Write an optimized flash_attn_stream_k_fixup kernel: a specialized, more optimized kernel for cases where nblocks_stream_k is a multiple of ntiles_dst. Make nblocks_stream_k a multiple of ntiles_dst if nblocks_stream_k > 2 * ntiles_dst.
* Use the new kernel only for nblocks_stream_k_raw > 4 * ntiles_dst, to ensure enough concurrency on GPUs.
* Address review comments
* Address review comments
* Revert variable names to original
Name            Last commit                                                        Date
cmake           cmake : remove unused file (ggml/1419)                             2026-02-08 09:29:10 +02:00
include         llama: fix llama-model-saver (llama/20503)                         2026-03-29 15:04:36 +03:00
src             Write an optimized flash_attn_stream_k_fixup kernel (llama/21159)  2026-04-30 11:29:01 +03:00
.gitignore      whisper : reorganize source code + improve CMake (#2256)           2024-06-26 19:34:09 +03:00
CMakeLists.txt  ggml : bump version to 0.9.11 (ggml/1456)                          2026-04-30 11:29:00 +03:00