whisper.cpp/ggml
Justin Bradford ab7d305b75 kleidiai : fix MUL_MAT support for batched (3D) inputs (llama/20620)
* kleidiai : fix MUL_MAT support for batched (3D) inputs

The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.

This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during
loading (tested with 2D inputs) but the runtime used 3D inputs.

Also relax the buffer check to allow supports_op() to be called during
weight loading when src[0]->buffer is NULL.

Fixes #20608

* Kleidiai support_ops should only return true for 3D inputs, not also 4D
2026-03-29 15:04:36 +03:00
..
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include ggml : restore ggml_type_sizef() to aboid major version bump (ggml/1441) 2026-03-18 15:18:24 +02:00
src kleidiai : fix MUL_MAT support for batched (3D) inputs (llama/20620) 2026-03-29 15:04:36 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : bump version to 0.9.8 (ggml/1442) 2026-03-18 15:18:24 +02:00