whisper.cpp

Kartik Sirohi 991b5a8b4a

ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (llama/22209)

* ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128

Optimize the inner loop of ggml_vec_dot_q4_1_q8_1_generic using WASM SIMD128 intrinsics, gated behind #ifdef __wasm_simd128__ so non-wasm builds are completely unaffected.

Approach:
- single wasm_v128_load covers all 32 packed 4-bit weights
- nibbles unpacked via AND/SHR into two u8x16 registers
- widened to i16 before multiply (WASM SIMD has no i8i8 instruction)
- 4x wasm_i32x4_dot_i16x8 calls accumulate all 32 element pairs
- horizontal reduce via 4x wasm_i32x4_extract_lane

Benchmark (node v25, emcc -O3 -msimd128, 64 blocks x QK8_1=32, 200k iterations):

| impl   | ns/call | speedup |
|--------|---------|---------|
| scalar | 880.7   | 1.00x   |
| simd   | 257.8   | 3.42x   |

Correctness verified against scalar reference across 10 random seeds with exact output match.

ggml: move q4_1_q8_1 WASM SIMD implementation to wasm backend

Relocate the SIMD128 implementation of ggml_vec_dot_q4_1_q8_1 to ggml/src/ggml-cpu/arch/wasm/quants.c to follow architecture-specific layout. Restore the generic implementation in ggml/src/ggml-cpu/quants.c. Move for loop in the else block.

* ggml: use generic q4_1_q8_1 fallback in wasm backend

2026-06-08 14:36:36 +03:00
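
The approach listed in the commit message above can be modelled in portable scalar C, which may help when reading the SIMD version. This is only a sketch under stated assumptions, not the code from the commit: the struct and function names (block_q4_1_sketch, block_q8_1_sketch, dot_i16x8_acc, vec_dot_q4_1_q8_1_sketch) are hypothetical, the scale fields are plain float where the real ggml blocks are assumed to use fp16, and the WASM intrinsic wasm_i32x4_dot_i16x8 is emulated with a loop.

```c
#include <assert.h>
#include <stdint.h>

#define QK 32 /* elements per block, matching QK8_1=32 in the commit message */

/* Hypothetical simplified blocks (assumption: real ggml blocks store d, m, s as fp16). */
typedef struct { float d; float m; uint8_t qs[QK / 2]; } block_q4_1_sketch; /* 32 packed nibbles */
typedef struct { float d; float s; int8_t  qs[QK];     } block_q8_1_sketch; /* s = d * sum(qs)   */

/* Scalar model of wasm_i32x4_dot_i16x8: multiply 8 i16 pairs and add
 * adjacent products, accumulating into 4 i32 lanes. */
static void dot_i16x8_acc(const int16_t a[8], const int16_t b[8], int32_t acc[4]) {
    for (int lane = 0; lane < 4; ++lane) {
        acc[lane] += (int32_t)a[2 * lane]     * b[2 * lane]
                   + (int32_t)a[2 * lane + 1] * b[2 * lane + 1];
    }
}

static float vec_dot_q4_1_q8_1_sketch(int n, const block_q4_1_sketch *x,
                                      const block_q8_1_sketch *y) {
    assert(n % QK == 0);
    float sumf = 0.0f;
    for (int ib = 0; ib < n / QK; ++ib) {
        int16_t lo[16], hi[16], yq[QK];
        /* Unpack nibbles via AND/SHR and widen to i16 in one step.
         * Low nibbles hold elements 0..15, high nibbles elements 16..31. */
        for (int j = 0; j < 16; ++j) {
            lo[j] = (int16_t)(x[ib].qs[j] & 0x0F);
            hi[j] = (int16_t)(x[ib].qs[j] >> 4);
        }
        for (int j = 0; j < QK; ++j) {
            yq[j] = (int16_t)y[ib].qs[j]; /* widen the i8 activations to i16 */
        }
        /* Four dot calls cover all 32 element pairs. */
        int32_t acc[4] = {0, 0, 0, 0};
        dot_i16x8_acc(lo,     yq,      acc);
        dot_i16x8_acc(lo + 8, yq + 8,  acc);
        dot_i16x8_acc(hi,     yq + 16, acc);
        dot_i16x8_acc(hi + 8, yq + 24, acc);
        /* Horizontal reduce over the four lanes. */
        int32_t sumi = acc[0] + acc[1] + acc[2] + acc[3];
        /* q4_1 dequantizes as d*q + m, so the offset term contributes m_x * s_y. */
        sumf += (x[ib].d * y[ib].d) * (float)sumi + x[ib].m * y[ib].s;
    }
    return sumf;
}
```

In a real SIMD build each helper call would map to one intrinsic over a 128-bit register; the lane layout above is kept only to make the four-call accumulation and the final reduce easy to follow.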
cmake	ggml : Parallelize quant LUT init (llama/23595)	2026-05-25 12:26:07 +03:00
include	TP: quantized KV cache support (llama/23792)	2026-06-08 14:36:36 +03:00
src	ggml: vectorize ggml_vec_dot_q4_1_q8_1 with WASM SIMD128 (llama/22209)	2026-06-08 14:36:36 +03:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	ggml : bump version to 0.13.1 (ggml/1523)	2026-05-29 09:47:30 +03:00