whisper.cpp

Max Krasnyansky e6476d4c12 hexagon: further optimizations and refactoring for flash attention (llama/19583)		2026-02-15 21:44:37 +02:00

* ggml-hexagon: fa improvements
  ggml-hexagon: optimize flash attention calculations with improved variable handling
  ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32
  ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements
  ggml-hexagon: optimize flash attention by changing slope vector type to F16
* hexfa: fixed test-backend-ops failures due to leftover element handling
* hexagon: refactor and optimize fa to use local context struct
* ggml-hexagon: optimize flash-attention using hvx_vec_expf
  Use HVX for online softmax.

Co-authored-by: chraac <chraac@gmail.com>

cmake	cmake : remove unused file (ggml/1419)	2026-02-08 09:29:10 +02:00
include	ggml-virtgpu: make the code thread safe (llama/19204)	2026-02-08 09:29:10 +02:00
src	hexagon: further optimizations and refactoring for flash attention (llama/19583)	2026-02-15 21:44:37 +02:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	Bump cmake max version (needed for Windows on Snapdragon builds) (llama/19188)	2026-02-08 09:29:10 +02:00