whisper.cpp

shalinib-ibm 0630539c8a ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148)	2025-05-07 13:17:41 +03:00

This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type. The change yields 9x to 40x gains in total speed S t/s (i.e., all tokens / total time) across the batch sizes tested with the llama-batched-bench benchmark.

The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>

cmake	ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0)	2025-03-27 11:06:03 +02:00
include	CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (llama/13137)	2025-05-01 13:29:02 +03:00
src	ggml : Enable MMA for BF16 in llamafile_sgemm (llama/13148)	2025-05-07 13:17:41 +03:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00