whisper.cpp

shalinib-ibm 42938398f9 ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148) This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le using MMA builtins for the BF16 data type. This change results in 9x to 40x gains in total speed S t/s (i.e., all tokens / total time) across various batch sizes, tested using the llama-batched-bench benchmark. The patch is tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated using llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine. Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>		2025-05-07 15:39:32 +03:00
amx	ggml : upgrade init_tensor API to return a ggml_status (llama/11854)	2025-03-08 15:13:01 +02:00
cmake	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
kleidiai	ggml-cpu : update KleidiAI to v1.5.0 (llama/12568)	2025-03-27 11:06:03 +02:00
llamafile	ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148)	2025-05-07 15:39:32 +03:00
CMakeLists.txt	feat(ggml-cpu): enable z17 compile (llama/13182)	2025-05-01 13:29:02 +03:00
binary-ops.cpp	cpu: de-duplicate some of the operators and refactor (ggml/1144)	2025-03-31 14:56:53 +03:00
binary-ops.h	cpu: de-duplicate some of the operators and refactor (ggml/1144)	2025-03-31 14:56:53 +03:00
common.h	cpu: de-duplicate some of the operators and refactor (ggml/1144)	2025-03-31 14:56:53 +03:00
cpu-feats-x86.cpp	ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (llama/12871)	2025-04-24 20:39:16 +03:00
ggml-cpu-aarch64.cpp	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
ggml-cpu-aarch64.h	ggml : refactor online repacking (llama/10446)	2024-12-18 12:52:16 +02:00
ggml-cpu-hbm.cpp	ggml : refactor online repacking (llama/10446)	2024-12-18 12:52:16 +02:00
ggml-cpu-hbm.h	ggml : refactor online repacking (llama/10446)	2024-12-18 12:52:16 +02:00
ggml-cpu-impl.h	ggml-cpu-impl.h: do not redefine bool on POWER9 (llama/12856)	2025-04-24 20:39:16 +03:00
ggml-cpu-quants.c	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
ggml-cpu-quants.h	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-cpu-traits.cpp	ggml : refactor online repacking (llama/10446)	2024-12-18 12:52:16 +02:00
ggml-cpu-traits.h	ggml : refactor online repacking (llama/10446)	2024-12-18 12:52:16 +02:00
ggml-cpu.c	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
ggml-cpu.cpp	cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190)	2025-04-24 20:39:16 +03:00
ops.cpp	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
ops.h	ggml : Depthwise 2D convolution (ggml/1152)	2025-04-24 20:39:16 +03:00
simd-mappings.h	ggml : fix ppc64le build (llama/13176)	2025-05-01 13:29:02 +03:00
unary-ops.cpp	cpu: de-duplicate some of the operators and refactor (ggml/1144)	2025-03-31 14:56:53 +03:00
unary-ops.h	cpu: de-duplicate some of the operators and refactor (ggml/1144)	2025-03-31 14:56:53 +03:00
vec.cpp	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
vec.h	cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167)	2025-04-03 10:30:16 +03:00