whisper.cpp/ggml
David Friehs 02a9f660b8 cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (llama/19624)
* cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization

- load all 8 int8 for a grid position in one load
- calculate signs via popcnt instead of fetching from ksigns table
- broadcast signs to drop individual shift/mask
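
The popcnt-based sign derivation listed above can be sketched on the host side. This is a minimal illustration, not the kernel code from this commit. It assumes each `ksigns` table entry equals its 7-bit index with bit 7 set when the index has an odd number of set bits, so that every 8-bit sign mask has even parity. The names `popcount32`, `signs_from_index` and `check_even_parity` are hypothetical, and a CUDA kernel would presumably use a hardware popcount such as the `__popc` intrinsic instead of the portable loop.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count; stands in for a hardware popcount instruction.
static inline uint32_t popcount32(uint32_t v) {
    uint32_t n = 0;
    while (v) { v &= v - 1; ++n; }  // clear the lowest set bit each round
    return n;
}

// Hypothetical helper: rebuild the 8-bit sign mask from its 7-bit index
// instead of reading it from a lookup table.
// Assumption: bit 7 holds the parity of the low 7 bits.
static inline uint8_t signs_from_index(uint32_t idx) {
    idx &= 0x7Fu;
    const uint32_t parity = popcount32(idx) & 1u;
    return static_cast<uint8_t>(idx | (parity << 7));
}

// Sanity check: under the assumption above, every mask has an even
// number of set bits.
static inline void check_even_parity() {
    for (uint32_t idx = 0; idx < 128; ++idx) {
        assert((popcount32(signs_from_index(idx)) & 1u) == 0u);
    }
}
```

Computing the mask this way trades a table fetch for a few arithmetic instructions.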

* cuda: iq2xxs: simplify sum scaling

express `(sum * scale + sum / 2) / 4` as `(sum * (scale * 2 + 1)) / 8`
express `((aux32 >> 28) * 2 + 1)` as `(aux32 >> 27 | 1)`

saves 3 registers for mul_mat_vec_q (152 -> 149) according to nsight
AFAICT no overflow can occur here as iq2xxs values are far too small
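
The two rewrites above can be checked by brute force. The following is a minimal host-side sketch, assuming `sum` and `scale` are plain `int`, `aux32` is a `uint32_t`, and `scale` is the 4-bit value `aux32 >> 28`. The function names and the tested range of `sum` are illustrative and not taken from the kernel.

```cpp
#include <cassert>
#include <cstdint>

// Original and rewritten sum scaling, as quoted in the commit message.
static inline int scale_sum_old(int sum, int scale) { return (sum * scale + sum / 2) / 4; }
static inline int scale_sum_new(int sum, int scale) { return (sum * (scale * 2 + 1)) / 8; }

// Original and rewritten computation of the factor (scale * 2 + 1).
static inline uint32_t factor_old(uint32_t aux32) { return (aux32 >> 28) * 2 + 1; }
static inline uint32_t factor_new(uint32_t aux32) { return aux32 >> 27 | 1; }

// Returns true if both rewrites match the originals for every 4-bit scale
// and every sum in [-max_abs_sum, max_abs_sum] (hypothetical test range).
static inline bool rewrites_agree(int max_abs_sum) {
    for (int scale = 0; scale < 16; ++scale) {
        for (int sum = -max_abs_sum; sum <= max_abs_sum; ++sum) {
            if (scale_sum_old(sum, scale) != scale_sum_new(sum, scale)) return false;
        }
    }
    // Only the top 5 bits of aux32 affect the factor; try the lower
    // 27 bits both all-clear and all-set.
    const uint32_t lows[2] = {0u, 0x07FFFFFFu};
    for (uint32_t top = 0; top < 32; ++top) {
        for (uint32_t low : lows) {
            const uint32_t aux32 = (top << 27) | low;
            if (factor_old(aux32) != factor_new(aux32)) return false;
        }
    }
    return true;
}

static inline void check_rewrites() { assert(rewrites_agree(1 << 12)); }
```

The first pair agrees under truncating integer division as long as `sum * (scale * 2 + 1)` does not overflow. The second agrees because multiplying by 2 leaves bit 0 clear, so adding 1 and OR-ing with 1 give the same result.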

* uint -> uint32_t

error: identifier "uint" is undefined
2026-02-27 20:57:58 +02:00
| Name | Last commit | Date |
|---|---|---|
| cmake | cmake : remove unused file (ggml/1419) | 2026-02-08 09:29:10 +02:00 |
| include | ggml-virtgpu: make the code thread safe (llama/19204) | 2026-02-08 09:29:10 +02:00 |
| src | cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (llama/19624) | 2026-02-27 20:57:58 +02:00 |
| .gitignore | whisper : reorganize source code + improve CMake (#2256) | 2024-06-26 19:34:09 +03:00 |
| CMakeLists.txt | ggml : bump version to 0.9.7 (ggml/1425) | 2026-02-27 20:57:58 +02:00 |