whisper.cpp/ggml
Max Krasnyansky e6476d4c12 hexagon: further optimizations and refactoring for flash attention (llama/19583)
* ggml-hexagon: fa improvements

ggml-hexagon: optimize flash attention calculations with improved variable handling

ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32

ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements

ggml-hexagon: optimize flash attention by changing slope vector type to F16

* hexfa: fixed test-backend-ops failures due to leftover element handling

* hexagon: refactor and optimize fa to use local context struct

* ggml-hexagon: optimize flash-attention using hvx_vec_expf

Use HVX for online softmax.

---------

Co-authored-by: chraac <chraac@gmail.com>
2026-02-15 21:44:37 +02:00
..
cmake | cmake : remove unused file (ggml/1419) | 2026-02-08 09:29:10 +02:00
include | ggml-virtgpu: make the code thread safe (llama/19204) | 2026-02-08 09:29:10 +02:00
src | hexagon: further optimizations and refactoring for flash attention (llama/19583) | 2026-02-15 21:44:37 +02:00
.gitignore | whisper : reorganize source code + improve CMake (#2256) | 2024-06-26 19:34:09 +03:00
CMakeLists.txt | Bump cmake max version (needed for Windows on Snapdragon builds) (llama/19188) | 2026-02-08 09:29:10 +02:00