whisper.cpp/ggml/src/ggml-cpu
shalinib-ibm 42938398f9 ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148)
This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.
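
For orientation, here is a minimal scalar sketch of what a BF16 matrix multiplication computes. It is not the MMA kernel from this patch and uses no POWER10 builtins; the function names (`bf16_to_f32`, `f32_to_bf16_trunc`, `ref_gemm_bf16`) and the row-major layout with B stored as `n` rows of length `k` are assumptions made for illustration only. It relies on the fact that a BF16 value is the upper 16 bits of an IEEE-754 binary32 value, and it accumulates in FP32.

```cpp
#include <cstdint>
#include <cstring>

// Widen BF16 to FP32: BF16 is the upper 16 bits of a binary32 value.
static inline float bf16_to_f32(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrow FP32 to BF16 by truncation (no rounding); illustration only.
static inline uint16_t f32_to_bf16_trunc(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical reference GEMM: C[i*n + j] = dot(A row i, B row j).
// A is m x k, B is n x k (both row-major BF16), C is m x n (FP32).
static void ref_gemm_bf16(int m, int n, int k,
                          const uint16_t *A, const uint16_t *B, float *C) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;  // accumulate in FP32
            for (int l = 0; l < k; ++l) {
                sum += bf16_to_f32(A[i * k + l]) * bf16_to_f32(B[j * k + l]);
            }
            C[i * n + j] = sum;
        }
    }
}
```

An optimized kernel would tile these loops and replace the inner product with hardware outer-product instructions; a scalar reference like this one is mainly useful for checking such a kernel's output.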

This change yields 9x to 40x gains in total speed S t/s (i.e., all tokens / total time) across the batch sizes tested with the llama-batched-bench benchmark.

The patch was tested with Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>
2025-05-07 15:39:32 +03:00
..
amx ggml : upgrade init_tensor API to return a ggml_status (llama/11854) 2025-03-08 15:13:01 +02:00
cmake ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
kleidiai ggml-cpu : update KleidiAI to v1.5.0 (llama/12568) 2025-03-27 11:06:03 +02:00
llamafile ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148) 2025-05-07 15:39:32 +03:00
CMakeLists.txt feat(ggml-cpu): enable z17 compile (llama/13182) 2025-05-01 13:29:02 +03:00
binary-ops.cpp cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
binary-ops.h cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
common.h cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
cpu-feats-x86.cpp ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (llama/12871) 2025-04-24 20:39:16 +03:00
ggml-cpu-aarch64.cpp whisper: remove MSVC warnings pragmas (#3090) 2025-05-05 13:09:35 +02:00
ggml-cpu-aarch64.h ggml : refactor online repacking (llama/10446) 2024-12-18 12:52:16 +02:00
ggml-cpu-hbm.cpp ggml : refactor online repacking (llama/10446) 2024-12-18 12:52:16 +02:00
ggml-cpu-hbm.h ggml : refactor online repacking (llama/10446) 2024-12-18 12:52:16 +02:00
ggml-cpu-impl.h ggml-cpu-impl.h: do not redefine bool on POWER9 (llama/12856) 2025-04-24 20:39:16 +03:00
ggml-cpu-quants.c whisper: remove MSVC warnings pragmas (#3090) 2025-05-05 13:09:35 +02:00
ggml-cpu-quants.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-cpu-traits.cpp ggml : refactor online repacking (llama/10446) 2024-12-18 12:52:16 +02:00
ggml-cpu-traits.h ggml : refactor online repacking (llama/10446) 2024-12-18 12:52:16 +02:00
ggml-cpu.c whisper: remove MSVC warnings pragmas (#3090) 2025-05-05 13:09:35 +02:00
ggml-cpu.cpp cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190) 2025-04-24 20:39:16 +03:00
ops.cpp whisper: remove MSVC warnings pragmas (#3090) 2025-05-05 13:09:35 +02:00
ops.h ggml : Depthwise 2D convolution (ggml/1152) 2025-04-24 20:39:16 +03:00
simd-mappings.h ggml : fix ppc64le build (llama/13176) 2025-05-01 13:29:02 +03:00
unary-ops.cpp cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
unary-ops.h cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
vec.cpp whisper: remove MSVC warnings pragmas (#3090) 2025-05-05 13:09:35 +02:00
vec.h cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) 2025-04-03 10:30:16 +03:00