whisper.cpp/ggml
shalinib-ibm 587f602ea1 ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (llama/24753)
* ggml-cpu: support K tails in Power10 MMA Q8/Q4 matmul

This patch removes the requirement that K be divisible by kc in the tinyBlas_Q0_PPC tiled matmul path. The final K panel is processed using its actual depth, and the reduced panel size is passed through packing and kernel execution. This allows more workloads to use the MMA kernel and reduces fallback to mnpack.
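
For illustration only, a minimal scalar sketch of the K-tail idea, not the actual tinyBlas_Q0_PPC code: the K loop advances in steps of kc, and the last panel uses min(kc, K - k0) as its depth instead of requiring K % kc == 0. The names panel_kernel and matmul_k_tiled, the plain int element type, and the row-major layout are assumptions made for this example; the real path packs quantized Q8/Q4 blocks and uses Power10 MMA instructions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the packed MMA kernel: accumulates one K panel
// starting at k0 with the given depth into C.
// A is M x K, B is K x N, C is M x N, all row-major.
static void panel_kernel(const std::vector<int>& A, const std::vector<int>& B,
                         std::vector<int>& C, int M, int N, int K,
                         int k0, int depth) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int acc = 0;
            for (int k = k0; k < k0 + depth; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] += acc;
        }
    }
}

// K-tiled matmul that tolerates a K tail: the final panel is simply
// shallower than kc, and its actual depth is passed to the kernel.
std::vector<int> matmul_k_tiled(const std::vector<int>& A,
                                const std::vector<int>& B,
                                int M, int N, int K, int kc) {
    std::vector<int> C(static_cast<size_t>(M) * N, 0);
    for (int k0 = 0; k0 < K; k0 += kc) {
        const int depth = std::min(kc, K - k0);  // full panel or tail
        panel_kernel(A, B, C, M, N, K, k0, depth);
    }
    return C;
}
```

With K = 5 and kc = 2, the loop runs panels of depth 2, 2 and 1, so no separate fallback path is needed for the remainder.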

* Apply suggestion from @taronaeo

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-06-19 12:53:43 +03:00
cmake ggml : Parallelize quant LUT init (llama/23595) 2026-05-25 12:26:07 +03:00
include Remove padding and multiple D2D copies for MTP (llama/24086) 2026-06-15 10:33:53 +03:00
src ggml-cpu: support K tails in power10 Q8/Q4 MMA matmul (llama/24753) 2026-06-19 12:53:43 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt rename GGML_SYCL_SUPPORT_LEVEL_ZERO (llama/24719) 2026-06-19 12:53:43 +03:00