whisper.cpp

History

iacopPBK d91d1e8e6c ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (llama/21168) * ds_read_b128 for q4_0 and q4_1 mmq kernels Current for loop generates ds_read_b32 instructions with hip compiler, the new solution generates ds_read_b128 instructions for the same operation, saving some LDS bandwidth. Tested on MI50 and RX6800XT, its faster on both. * Vectorized lds load update: used ggml_cuda_get_max_cpy_bytes and ggml_cuda_memcpy_1 functions for generic implementation * Explicit for loop in mmq, renamed vec into tmp * Fixed max_cpy usage in the loading loop * Fixed typo in q4_1 kernel * Update ggml/src/ggml-cuda/mmq.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/mmq.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/mmq.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Renoved trailing white line 500 * Update mmq.cuh removed other whitelines * Remove trailing whitespaces --------- Co-authored-by: iacopPBK <iacopPBK@users.noreply.github.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: iacopPBK <iacop@deneb.com>		2026-04-30 11:29:03 +03:00
..
cmake	cmake : remove unused file (ggml/1419)	2026-02-08 09:29:10 +02:00
include	ggml : deprecate GGML_OP_ADD1 (llama/21363)	2026-04-30 11:29:02 +03:00
src	ggml-cuda: ds_read_b128 for q4_0 and q4_1 mmq kernels (llama/21168)	2026-04-30 11:29:03 +03:00
.gitignore	whisper : reorganize source code + improve CMake (#2256 )	2024-06-26 19:34:09 +03:00
CMakeLists.txt	ggml : bump version to 0.9.11 (ggml/1456)	2026-04-30 11:29:00 +03:00