whisper.cpp/ggml/src/ggml-cpu
Adrien Gallouët f5c3ce17d5
ggml : use 64 bytes aligned tile buffers (llama/21058)
| Model                            | Test   |   t/s OLD |   t/s NEW |   Speedup |
|:---------------------------------|:-------|----------:|----------:|----------:|
| qwen35 0.8B BF16                 | pp512  |    584.59 |    595.41 |      1.02 |
| qwen35 0.8B BF16                 | tg128  |     52.23 |     52.82 |      1.01 |
| qwen35 0.8B IQ2_M - 2.7 bpw      | pp512  |    260.64 |    261.70 |      1.00 |
| qwen35 0.8B IQ2_M - 2.7 bpw      | tg128  |     81.17 |     80.89 |      1.00 |
| qwen35 0.8B IQ2_XXS - 2.0625 bpw | pp512  |    302.36 |    302.56 |      1.00 |
| qwen35 0.8B IQ2_XXS - 2.0625 bpw | tg128  |     84.93 |     85.12 |      1.00 |
| qwen35 0.8B IQ3_XXS - 3.0625 bpw | pp512  |    263.22 |    260.01 |      0.99 |
| qwen35 0.8B IQ3_XXS - 3.0625 bpw | tg128  |     80.29 |     78.94 |      0.98 |
| qwen35 0.8B IQ4_NL - 4.5 bpw     | pp512  |    728.65 |    742.09 |      1.02 |
| qwen35 0.8B IQ4_NL - 4.5 bpw     | tg128  |     82.39 |     84.46 |      1.03 |
| qwen35 0.8B IQ4_XS - 4.25 bpw    | pp512  |    681.33 |    677.06 |      0.99 |
| qwen35 0.8B IQ4_XS - 4.25 bpw    | tg128  |     80.18 |     79.28 |      0.99 |
| qwen35 0.8B Q2_K_M               | pp512  |    413.28 |    415.94 |      1.01 |
| qwen35 0.8B Q2_K_M               | tg128  |     81.90 |     82.78 |      1.01 |
| qwen35 0.8B Q3_K_M               | pp512  |    493.17 |    495.08 |      1.00 |
| qwen35 0.8B Q3_K_M               | tg128  |     82.75 |     83.23 |      1.01 |
| qwen35 0.8B Q3_K_S               | pp512  |    429.35 |    427.64 |      1.00 |
| qwen35 0.8B Q3_K_S               | tg128  |     86.69 |     87.02 |      1.00 |
| qwen35 0.8B Q4_0                 | pp512  |    783.46 |    782.32 |      1.00 |
| qwen35 0.8B Q4_0                 | tg128  |     88.23 |     87.90 |      1.00 |
| qwen35 0.8B Q4_1                 | pp512  |    741.71 |    729.76 |      0.98 |
| qwen35 0.8B Q4_1                 | tg128  |     85.44 |     86.01 |      1.01 |
| qwen35 0.8B Q4_K_M               | pp512  |    676.24 |    681.31 |      1.01 |
| qwen35 0.8B Q4_K_M               | tg128  |     76.59 |     77.06 |      1.01 |
| qwen35 0.8B Q4_K_S               | pp512  |    683.12 |    688.81 |      1.01 |
| qwen35 0.8B Q4_K_S               | tg128  |     80.50 |     81.19 |      1.01 |
| qwen35 0.8B Q5_K_M               | pp512  |    635.33 |    642.11 |      1.01 |
| qwen35 0.8B Q5_K_M               | tg128  |     72.07 |     72.49 |      1.01 |
| qwen35 0.8B Q5_K_S               | pp512  |    660.95 |    658.18 |      1.00 |
| qwen35 0.8B Q5_K_S               | tg128  |     72.19 |     72.95 |      1.01 |
| qwen35 0.8B Q6_K                 | pp512  |    647.97 |    638.84 |      0.99 |
| qwen35 0.8B Q6_K                 | tg128  |     72.83 |     72.49 |      1.00 |
| qwen35 0.8B Q8_0                 | pp512  |    805.01 |    785.49 |      0.98 |
| qwen35 0.8B Q8_0                 | tg128  |     70.10 |     70.13 |      1.00 |

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-04-30 11:29:20 +03:00

| Name | Last commit | Date |
|:-----|:------------|:-----|
| amx | ggml : use 64 bytes aligned tile buffers (llama/21058) | 2026-04-30 11:29:20 +03:00 |
| arch | ggml-cpu: optimize avx2 q6_k (llama/22345) | 2026-04-30 11:29:20 +03:00 |
| cmake | ggml : build backends as libraries (llama/10256) | 2024-11-20 21:00:08 +02:00 |
| kleidiai | kleidiai : fix MUL_MAT support for batched (3D) inputs (llama/20620) | 2026-03-29 15:04:36 +03:00 |
| llamafile | ggml-cpu: fix fallback for RVV kernels without zvfh (llama/21157) | 2026-04-30 11:28:58 +03:00 |
| spacemit | ggml : fix SpaceMit IME array out-of-bounds in task assignment (llama/16629) | 2025-10-22 12:58:11 +03:00 |
| CMakeLists.txt | fix(ggml): correct RISC-V ISA string canonical ordering for RVV in CMake (llama/20888) | 2026-03-29 15:04:36 +03:00 |
| arch-fallback.h | ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (llama/21636) | 2026-04-30 11:29:14 +03:00 |
| binary-ops.cpp | ggml : extend bin bcast for permuted src1 (llama/19484) | 2026-02-15 21:44:37 +02:00 |
| binary-ops.h | cpu: de-duplicate some of the operators and refactor (ggml/1144) | 2025-03-31 14:56:53 +03:00 |
| common.h | ggml-cpu: FA add GEMM microkernel (llama/19422) | 2026-02-27 20:57:58 +02:00 |
| ggml-cpu-impl.h | ggml : fix ARM NEON nvfp4 dot product on non-dotprod targets (llama/21559) | 2026-04-30 11:29:08 +03:00 |
| ggml-cpu.c | ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273) | 2026-04-30 11:29:01 +03:00 |
| ggml-cpu.cpp | ggml: backend-agnostic tensor parallelism (experimental) (llama/19378) | 2026-04-30 11:29:05 +03:00 |
| hbm.cpp | ggml-cpu : split arch-specific implementations (llama/13892) | 2025-06-10 12:40:33 +03:00 |
| hbm.h | ggml-cpu : split arch-specific implementations (llama/13892) | 2025-06-10 12:40:33 +03:00 |
| ops.cpp | ggml : fix a few instances of missing GGML_TYPE_Q1_0 cases (llama/21716) | 2026-04-30 11:29:06 +03:00 |
| ops.h | ggml: add GATED_DELTA_NET op (llama/19504) | 2026-03-16 13:10:15 +02:00 |
| quants.c | ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (llama/21636) | 2026-04-30 11:29:14 +03:00 |
| quants.h | ggml: add Q1_0 1-bit quantization support (CPU) (llama/21273) | 2026-04-30 11:29:01 +03:00 |
| repack.cpp | ggml-cpu: fix RVV checks in quants and repacking (llama/20682) | 2026-03-29 15:04:36 +03:00 |
| repack.h | ggml-cpu: add RVV repack GEMM and GEMV for quantization types (llama/19121) | 2026-03-16 13:10:15 +02:00 |
| simd-gemm.h | ggml : implemented simd_gemm kernel for riscv vector extension (llama/20627) | 2026-04-30 11:29:11 +03:00 |
| simd-mappings.h | ggml : add native AVX512-FP16 support for F16 operations (llama/20529) | 2026-03-16 13:10:15 +02:00 |
| traits.cpp | ggml : fix fallback to CPU for ununsupported ops (llama/15118) | 2025-08-18 20:30:45 +03:00 |
| traits.h | ggml : fix fallback to CPU for ununsupported ops (llama/15118) | 2025-08-18 20:30:45 +03:00 |
| unary-ops.cpp | ggml : unary ops support non-cont src0 + metal F16 unary ops (llama/19511) | 2026-02-15 21:44:37 +02:00 |
| unary-ops.h | ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (llama/17063) | 2025-11-17 21:05:46 +02:00 |
| vec.cpp | ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (llama/19399) | 2026-02-27 20:57:58 +02:00 |
| vec.h | ggml-cpu : re-enable fast gelu_quick_f16 (llama/22339) | 2026-04-30 11:29:20 +03:00 |