whisper.cpp


amritahs-ibm fc6d343e76 llamafile : ppc64le MMA implementation for Q4_0. (llama/12489) This change upstreams llamafile's CPU matrix multiplication kernels for the ppc64le ISA using MMA builtins. This patch handles matrix multiplication between the quantised datatypes block_q4_0 and block_q8_0. This change results in a 5% to 50% improvement in total speed (i.e. all tokens / total time) across various batch sizes. The patch is tested with the Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf models on an IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>		2025-03-27 11:06:03 +02:00
cmake	cmake: Comment out GGML_BIN_DIR for now (ggml/1139)	2025-03-27 11:06:03 +02:00
include	llama: Add support for RWKV v7 architecture (llama/12412)	2025-03-27 11:06:03 +02:00
src	llamafile : ppc64le MMA implementation for Q4_0. (llama/12489)	2025-03-27 11:06:03 +02:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	SYCL: using graphs is configurable by environment variable and compile option (llama/12371)	2025-03-27 11:06:03 +02:00