whisper.cpp/ggml
Jeff Bolz 162bbe8220 vulkan: KHR_coopmat flash attention (llama/13506)
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons, so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.
2025-05-19 14:58:39 +03:00
cmake ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) 2025-03-27 11:06:03 +02:00
include mnist: fix segmentation fault (ggml/1227) 2025-05-19 14:58:39 +03:00
src vulkan: KHR_coopmat flash attention (llama/13506) 2025-05-19 14:58:39 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt whisper: remove MSVC warnings pragmas (#3090) 2025-05-05 13:09:35 +02:00