whisper.cpp/ggml
Patrick Buckley 69f0d907ee ggml-cuda: native bf16 flash attention for vec kernel (llama/20525)
* ggml-cuda: native bf16 flash attention for vec and tile kernels

mma kernel still converts bf16 to fp16 before launch; native mma bf16 is still a todo

* ggml-cuda: address code owner review feedback

reverted tile kernel changes to avoid larger refactor

* fix ci failures on turing and hip

* fix bf16 vec kernel compile on hip v_dot2 platforms

* add comments

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-29 15:04:36 +03:00
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441) 2026-03-18 15:18:24 +02:00
src ggml-cuda: native bf16 flash attention for vec kernel (llama/20525) 2026-03-29 15:04:36 +03:00
.gitignore
CMakeLists.txt ggml : bump version to 0.9.8 (ggml/1442) 2026-03-18 15:18:24 +02:00