whisper.cpp/ggml
Akarshan Biswas 4598eb080b
sycl : add flash-attn support for head size 512 (llama/21654)
* sycl : add flash-attn support for head size 512

This patch extends the SYCL Flash Attention implementation to support head sizes (DKQ/DV) of 512.

Changes:
- Added DKQ/DV 512 cases to both tile and vector Flash Attention kernels.
- Updated kernel selection logic to allow vector kernels for head sizes up to 512 (previously 256).
- Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
- Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
- Added necessary template instances for the new 512 head size across various quantization types.

* remove defunct mxfp4 reorder from setting buffer type
2026-04-30 11:29:04 +03:00
..
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include ggml : deprecate GGML_OP_ADD1 (llama/21363) 2026-04-30 11:29:02 +03:00
src sycl : add flash-attn support for head size 512 (llama/21654) 2026-04-30 11:29:04 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : bump version to 0.9.11 (ggml/1456) 2026-04-30 11:29:00 +03:00