whisper.cpp

Johannes Gäßler e62d5893f4 HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (llama/22880) Adds RDNA3 support to the CUDA mma FA kernel. To make the RDNA3 tensor cores work with FP16 accumulation for VKQ, the tiles need to be 32 logical units long in the direction of the attention head; for head sizes 80 and 112, which are not exactly divisible by 32, the regular length of 16 with FP32 accumulation is used instead. The longer tiles also enable more efficient transposition for a warp size of 32, which is why they are also used for RDNA4. However, this scrambles the data layout of the accumulators along the attention head dimension. To prevent accidental misuse I added another entry to ggml_cuda_mma::data_layout. I also tuned the kernel parameters for RDNA3, RDNA4, and CDNA1 in general, during which I discovered that the kernel can be made to work for head sizes up to 256 for CDNA. For RDNA3/4 I was not able to get better performance than the tile kernel for head sizes > 128.		2026-05-25 12:26:07 +03:00
..
ggml-blas	vulkan: add get/set tensor 2d functions (llama/22514)	2026-05-01 13:07:35 +03:00
ggml-cann	vulkan: add get/set tensor 2d functions (llama/22514)	2026-05-01 13:07:35 +03:00
ggml-cpu	ggml-cpu: Add IME2 Instruction Support for the SpacemiT Backend (llama/22863)	2026-05-25 12:26:07 +03:00
ggml-cuda	HIP: RDNA3 mma FA, faster AMD transpose, tune AMD (llama/22880)	2026-05-25 12:26:07 +03:00
ggml-hexagon	hexagon: add unary tanh op (llama/22999)	2026-05-14 21:26:48 +03:00
ggml-hip	ggml: backend-agnostic tensor parallelism (experimental) (llama/19378)	2026-04-30 11:29:05 +03:00
ggml-metal	logs : reduce (llama/23021)	2026-05-25 12:26:07 +03:00
ggml-musa	ggml-cuda: native bf16 flash attention for vec kernel (llama/20525)	2026-03-29 15:04:36 +03:00
ggml-opencl	opencl: add q5_0 and q5_1 MoE for Adreno (llama/22985)	2026-05-14 21:26:48 +03:00
ggml-openvino	openvino: driver setup, CI split, thread safety, and NPU optimizations (llama/21944)	2026-04-30 11:29:15 +03:00
ggml-rpc	rpc : use graph uid instead of graph cache (llama/22701)	2026-05-14 21:26:48 +03:00
ggml-sycl	SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations (llama/21597)	2026-05-25 12:26:07 +03:00
ggml-virtgpu	ggml-virtgpu : include missing mutex header (llama/22810)	2026-05-14 21:26:48 +03:00
ggml-vulkan	vulkan: fix matmul integer pipeline selection (llama/23005)	2026-05-25 12:26:07 +03:00
ggml-webgpu	ggml-webgpu: makes the flash attn vec path subgroup-aware (llama/23040)	2026-05-25 12:26:07 +03:00
ggml-zdnn	vulkan: add get/set tensor 2d functions (llama/22514)	2026-05-01 13:07:35 +03:00
ggml-zendnn	ggml-zendnn : adaptive fallback to CPU backend for small batch sizes (llama/22681)	2026-05-14 21:26:48 +03:00
CMakeLists.txt	ggml : revert to -lm linking instead of find_library (llama/22355)	2026-04-30 11:29:21 +03:00
ggml-alloc.c	ggml : remove ggml-ext.h (llama/21869)	2026-04-30 11:29:09 +03:00
ggml-backend-dl.cpp	hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama/19150)	2026-01-30 15:56:40 +02:00
ggml-backend-dl.h	hexagon: enable offloading to Hexagon on Windows on Snapdragon (llama/19150)	2026-01-30 15:56:40 +02:00
ggml-backend-impl.h	ggml: backend-agnostic tensor parallelism (experimental) (llama/19378)	2026-04-30 11:29:05 +03:00
ggml-backend-meta.cpp	vulkan: add get/set tensor 2d functions (llama/22514)	2026-05-01 13:07:35 +03:00
ggml-backend-reg.cpp	ggml : skip already registered backends and devices (llama/22296)	2026-04-30 11:29:21 +03:00
ggml-backend.cpp	ggml: update SCHED_DEBUG output to use ggml_op_desc() (llama/22825)	2026-05-14 21:26:48 +03:00
ggml-common.h	ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273)	2026-04-30 11:29:01 +03:00
ggml-impl.h	ggml: add graph_reused (llama/21764)	2026-04-30 11:29:11 +03:00
ggml-opt.cpp	fix: free ctx_copy in ggml_opt_free to plug per-training-session leak (llama/21592)	2026-04-30 11:29:03 +03:00
ggml-quants.c	ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273)	2026-04-30 11:29:01 +03:00
ggml-quants.h	ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273)	2026-04-30 11:29:01 +03:00
ggml-threading.cpp	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-threading.h	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (llama/10797)	2024-12-18 12:52:16 +02:00
ggml.c	ggml : implement fast walsh-hadamard transform for kv rotation (#21352) (llama/22631)	2026-05-14 21:26:48 +03:00
ggml.cpp	ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)	2025-05-29 09:56:26 +03:00
gguf.cpp	llama: fix llama-model-saver (llama/20503)	2026-03-29 15:04:36 +03:00