whisper.cpp

Gaurav Garg 1a1900f90c Remove padding and multiple D2D copies for MTP (llama/24086)	2026-06-15 10:33:53 +03:00
* Make ggml_gated_delta_net take only the initial recurrent state (D, 1, n_seqs) and pass the snapshot count K as an op parameter instead of inferring it from state->ne[1]. Remove the padding hack and copy all emitted snapshots into the recurrent cache with a single strided ggml_cpy.
* Make GDN changes in all backends. Address review comments.
* Fix CI build errors.
wgsl-shaders	Remove padding and multiple D2D copies for MTP (llama/24086)	2026-06-15 10:33:53 +03:00
CMakeLists.txt	ggml-webgpu: FlashAttention refactor + standardize quantization support (llama/23834)	2026-06-08 14:36:36 +03:00
ggml-webgpu-shader-lib.hpp	ggml-webgpu: Add clang-format job (llama/24308)	2026-06-15 10:33:53 +03:00
ggml-webgpu.cpp	Remove padding and multiple D2D copies for MTP (llama/24086)	2026-06-15 10:33:53 +03:00
pre_wgsl.hpp	ggml-webgpu: FlashAttention refactor + standardize quantization support (llama/23834)	2026-06-08 14:36:36 +03:00