whisper.cpp

History

Georgi Gerganov 158d93c836 metal : optimize concat kernel and fix set kernel threads (llama/23411) * metal : fix GGML_OP_SET kernel threads * tests : extend test_cpy to support different src/dst shapes Extend test_cpy to support different source and destination tensor shapes for CPY operations (reshaping), where the total number of elements must match. - Renamed ne -> ne_src, added ne_dst parameter (default: use src shape) - Added 50 new reshaping test cases covering 1D<->2D<->3D<->4D conversions - Tests exercise 1024 boundary, small shapes, and large dimensionality changes - Fixed dangling reference bug (storing & to temporary std::array) - Updated all existing test calls with permute/transpose args for compatibility Assisted-by: llama.cpp:local pi * metal : optimize concat kernel with row batching for small widths When ne0 < 256, batch multiple rows into a single threadgroup to improve occupancy. This avoids underutilizing the GPU when processing narrow tensors. - Dispatch nth = min(256, ne0) threads per group - Calculate nrptg (rows per threadgroup) to fill up to 256 threads - Update kernel index calculation to handle the row batching - Add boundary check for i1 >= ne1 Assisted-by: llama.cpp:local pi * tests : clean-up * tests : refactor CPY shape tests to use dimension permutations Replace 75 hardcoded test cases with a loop over permutations of {3, 5, 7, 32} (total elements: 3360). Each src permutation is tested against canonical sorted and reverse dst, skipping identical shapes. Covers F32, F16, and Q4_0 (when both src and dst ne0 == 32). Assisted-by: llama.cpp:local pi		2026-05-25 12:26:07 +03:00
..
cmake	cmake : add FindNCCL.cmake (ggml/0)	2026-05-02 15:02:42 +03:00
include	ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500)	2026-05-25 12:26:07 +03:00
src	metal : optimize concat kernel and fix set kernel threads (llama/23411)	2026-05-25 12:26:07 +03:00
.gitignore	…
CMakeLists.txt	ggml : bump version to 0.12.0 (ggml/1494)	2026-05-25 12:26:07 +03:00