whisper.cpp

Akarshan Biswas 4598eb080b sycl : add flash-attn support for head size 512 (llama/21654)	2026-04-30 11:29:04 +03:00

* sycl : add flash-attn support for head size 512

  This patch extends the SYCL Flash Attention implementation to support head sizes (DKQ/DV) of 512.

  Changes:
  - Added DKQ/DV 512 cases to both tile and vector Flash Attention kernels.
  - Updated kernel selection logic to allow vector kernels for head sizes up to 512 (previously 256).
  - Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
  - Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
  - Added necessary template instances for the new 512 head size across various quantization types.

* remove defunct mxfp4 reorder from setting buffer type
cmake	cmake : remove unused file (ggml/1419)	2026-02-08 09:29:10 +02:00
include	ggml : deprecate GGML_OP_ADD1 (llama/21363)	2026-04-30 11:29:02 +03:00
src	sycl : add flash-attn support for head size 512 (llama/21654)	2026-04-30 11:29:04 +03:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	ggml : bump version to 0.9.11 (ggml/1456)	2026-04-30 11:29:00 +03:00