whisper.cpp/ggml/src/ggml-webgpu
Gaurav Garg 1a1900f90c Remove padding and multiple D2D copies for MTP (llama/24086)
* Make ggml_gated_delta_net take only the initial recurrent state (D, 1, n_seqs) and pass the snapshot count K as an op parameter instead of inferring it from state->ne[1].

Remove the padding hack and copy all emitted snapshots into the recurrent cache with a single strided ggml_cpy.

* Make GDN changes in all backends. Address review comments.

* Fix CI build errors
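
For illustration only: the idea of replacing several per-snapshot copies with one strided copy can be sketched in plain C++ as below. This is not the ggml implementation and does not use the ggml API. `strided_copy`, its parameters, and the memory layout (K rows of D floats, with a different row pitch on the source and destination sides) are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ggml.
// Copies n_rows rows of row_len floats in one pass. Consecutive rows are
// src_stride elements apart in the source and dst_stride elements apart in
// the destination, so a tightly packed source can be written into a
// destination whose rows have a larger pitch.
void strided_copy(const float * src, std::size_t src_stride,
                  float * dst, std::size_t dst_stride,
                  std::size_t n_rows, std::size_t row_len) {
    for (std::size_t r = 0; r < n_rows; ++r) {
        for (std::size_t i = 0; i < row_len; ++i) {
            dst[r * dst_stride + i] = src[r * src_stride + i];
        }
    }
}
```

Under these assumptions, a caller would pass the snapshot count as `n_rows` and the state size as `row_len`, issuing one call rather than one copy per snapshot.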
2026-06-15 10:33:53 +03:00
..
wgsl-shaders Remove padding and multiple D2D copies for MTP (llama/24086) 2026-06-15 10:33:53 +03:00
CMakeLists.txt ggml-webgpu: FlashAttention refactor + standardize quantization support (llama/23834) 2026-06-08 14:36:36 +03:00
ggml-webgpu-shader-lib.hpp ggml-webgpu: Add clang-format job (llama/24308) 2026-06-15 10:33:53 +03:00
ggml-webgpu.cpp Remove padding and multiple D2D copies for MTP (llama/24086) 2026-06-15 10:33:53 +03:00
pre_wgsl.hpp ggml-webgpu: FlashAttention refactor + standardize quantization support (llama/23834) 2026-06-08 14:36:36 +03:00