whisper.cpp

History

David Friehs 02a9f660b8 cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (llama/19624) * cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization - load all 8 int8 for a grid position in one load - calculate signs via popcnt instead of fetching from ksigns table - broadcast signs to drop individual shift/mask * cuda: iq2xxs: simplify sum scaling express `(sum * scale + sum / 2) / 4` as `(sum * (scale * 2 + 1)) / 8` express `((aux32 >> 28) * 2 + 1)` as `(aux32 >> 27 \| 1)` saves 3 registers for mul_mat_vec_q (152 -> 149) according to nsight AFAICT no overflow can occur here as iq2xxs values are far too small * uint -> uint32_t error: identifier "uint" is undefined		2026-02-27 20:57:58 +02:00
..
cmake	cmake : remove unused file (ggml/1419)	2026-02-08 09:29:10 +02:00
include	ggml-virtgpu: make the code thread safe (llama/19204)	2026-02-08 09:29:10 +02:00
src	cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (llama/19624)	2026-02-27 20:57:58 +02:00
.gitignore	whisper : reorganize source code + improve CMake (#2256 )	2024-06-26 19:34:09 +03:00
CMakeLists.txt	ggml : bump version to 0.9.7 (ggml/1425)	2026-02-27 20:57:58 +02:00