Commit Graph

3689 Commits

ixgbe 91fa5b5cac ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (llama/17227)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-17 21:05:46 +02:00
bagheera 265d326fa8 metal: accelerated conv2d (llama/17175)
* metal: accelerated conv2d

* cont : cleanup

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 21:05:46 +02:00
Georgi Gerganov 6a1d830dfd Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" (llama/17233)
This reverts commit 1c398dc9eca9c366ce98deb0e6f3538e444ebc8a.
2025-11-17 21:05:46 +02:00
Diego Devesa 6a91780c3b ggml-cpu : use template for argsort (llama/17222) 2025-11-17 21:05:46 +02:00
TecJesh 726912d1cb CANN: Add cross_entropy_loss op support (llama/16886)
* update L2_NORM op support

* update L2_NORM op support

* remove extra whitespace

* cann: update cross_entropy_loss op support

* remove trailing whitespaces

* rebase onto the latest code in the main repository and remove the l2_norm operator that already exists in another pull request

* undo the l2_norm operator deletion
2025-11-17 21:05:46 +02:00
Aman Gupta 84275fc493 CUDA: fuse rope + set_rows (llama/16884)
* CUDA: add fused rope

* move k forward_expand up

* create helper function instead of re-using params

* make assert statement more in line with comment

* rope_norm: coalesced writes to global mem
2025-11-17 21:05:46 +02:00
Johannes Gäßler 566c4c4469 CUDA: static assert to prevent misuse of memcpy_1 (llama/17198) 2025-11-17 21:05:46 +02:00
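The entry above adds a static assert to catch misuse of a copy helper at compile time. A minimal sketch of that pattern, with hypothetical names (not the actual ggml CUDA helper): a templated single-element copy where the assert rejects a size/type mismatch during compilation instead of corrupting memory at run time.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical sketch, not the real ggml code: copy exactly one element of
// nbytes bytes. static_assert turns a wrong template argument or a
// non-trivially-copyable type into a compile error.
template <int nbytes, typename T>
static void memcpy_1(T * dst, const T * src) {
    static_assert(nbytes == sizeof(T), "memcpy_1: size mismatch, wrong template argument?");
    static_assert(std::is_trivially_copyable<T>::value, "memcpy_1: non-trivial type");
    std::memcpy(dst, src, nbytes);
}
```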
Georgi Gerganov 3810a6180b ggml : use std::sort in ggml_argsort CPU implementation (llama/17211)
* ggml : use std::sort in ggml_argsort CPU implementation

* cont : add missing header
2025-11-17 21:05:46 +02:00
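The argsort entry above swaps a hand-written sort for std::sort. A minimal sketch of the idea (illustrative only, not the actual ggml implementation): build a row of indices and let std::sort order them by the values they point at.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: ascending argsort of n floats. The comparator reads through the
// index into the data array, so the data itself is never moved.
static std::vector<int32_t> argsort_asc(const float * data, int n) {
    std::vector<int32_t> idx(n);
    for (int i = 0; i < n; ++i) {
        idx[i] = i;
    }
    std::sort(idx.begin(), idx.end(), [data](int32_t a, int32_t b) {
        return data[a] < data[b];
    });
    return idx;
}
```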
Alberto Cabrera Pérez 7df8515824 ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)
* ggml-cpu: handle 3d tensors in repack mul_mat

* Removed unnecessary branch, removed need for <algorithm>

* Fixed dst_ptr pointer in chunk + clang_format

* GGML_ASSERT to check wdata within bounds

* Accidental ggml.h inclusion

* Improved GGML_ASSERT on wdata boundaries
2025-11-17 21:05:46 +02:00
TecJesh e8b66d9f94 CANN: Add L2_NORM op support (llama/16856)
* update L2_NORM op support

* update L2_NORM op support

* remove extra whitespace
2025-11-17 21:05:46 +02:00
Neo Zhang Jianyu 8388350c66 fix ci crash about SSM_CONV (llama/17169)
* fix ci crash

* Update ggml-sycl.cpp

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-17 21:05:46 +02:00
Max Krasnyansky 6748d27f55 hexagon: various Op fixes (llama/17135)
* hexagon: explicitly check for ops with zero nrows

llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.

* hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL

Co-authored-by: chraac <chraac@gmail.com>

* hexagon: use fastdiv in ADD_ID

* hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs

---------

Co-authored-by: chraac <chraac@gmail.com>
2025-11-17 21:05:46 +02:00
Eve 559091005a disable rms norm mul rope for chips with no fp16 rte (llama/17134) 2025-11-17 21:05:46 +02:00
ixgbe cd8f64d1b5 ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (llama/17161)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-17 21:05:46 +02:00
duduta 1cefb03571 ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (llama/16805)
* extract rotate_pairs logic from ggml_compute_forward_rope_f32

* templateify ggml_compute_forward_rope_f32 and _f16

* abort when rope type not supported, remove GLM from test-rope

* add imrope branch to switch

* add rope tests for perf

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 21:05:46 +02:00
Charles Xu 3920ecce3a kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993) 2025-11-17 21:05:46 +02:00
Mike Abbott c01bf73dd1 cmake : add version to all shared object files (llama/17091)
When compiling llama.cpp in Yocto, it fails QA checks because the generated .so files aren't versioned. This applies a version to all generated .so files, allowing the package to build without errors.
2025-11-17 21:05:46 +02:00
lhez 46615d74d3 opencl: add fastdiv and use it in set_rows, ported from cuda (llama/17090)
* opencl: add fastdiv for mm q8_0

* opencl: use uint4 for fastdiv vals

* opencl: use fastdiv for set_rows

* opencl: do not use fastdiv for q8_0 mm
2025-11-17 21:05:46 +02:00
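Several entries here port "fastdiv" between backends. A sketch of the underlying trick (Granlund–Montgomery style, as commonly used in GPU kernels; not the exact OpenCL/CUDA code): division by a fixed divisor becomes one multiply-high, one add, and one shift, with the magic pair precomputed once on the host.

```cpp
#include <cstdint>
#include <utility>

// Host side: precompute (mp, L) for divisor d, assumed 1 <= d < 2^31.
static std::pair<uint32_t, uint32_t> init_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint32_t(1) << L) < d) {
        ++L; // L = ceil(log2(d))
    }
    const uint32_t mp = uint32_t(((uint64_t(1) << 32) * ((uint64_t(1) << L) - d)) / d + 1);
    return std::make_pair(mp, L);
}

// Device side (sketched on the host): q = n / d without a hardware divide.
// The 64-bit sum avoids overflow of hi + n.
static uint32_t fastdiv(uint32_t n, uint32_t mp, uint32_t L) {
    const uint32_t hi = uint32_t((uint64_t(n) * mp) >> 32); // multiply-high
    return uint32_t((uint64_t(hi) + n) >> L);
}
```

On GPUs the multiply-high is a single instruction (e.g. a mul-hi builtin), which is why replacing per-thread integer division with this pays off in hot kernels like set_rows.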
Max Krasnyansky ccf525baf0 cpu: skip NOPs to avoid barriers (llama/17133)
* cpu: skip NOPs to avoid barriers

* cpu: use ggml_op_is_empty
2025-11-17 21:05:46 +02:00
Georgi Gerganov 40aebfe8bf metal : cap threadgroups size of set_rows (llama/17146) 2025-11-17 21:05:46 +02:00
Adrien Gallouët 86be60093e ggml-cpu : inspect -march and -mcpu to find the CPU (llama/16333)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-17 21:05:46 +02:00
Ruben Ortlam ef71d83b76 vulkan: check glslc executable string (llama/17144) 2025-11-17 21:05:46 +02:00
Ruben Ortlam 43f2c1ff54 vulkan: fix validation issue introduced by #16868 (llama/17145) 2025-11-17 21:05:46 +02:00
Georgi Gerganov bb92c79f56 metal : enable tensor API for A19 (llama/17087) 2025-11-17 21:05:46 +02:00
fj-y-saito 4fea91f06e arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277)
* add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K

* Surround SVE function with compiler directive

* fix compile switch

* fix coding style

* ggml : fix indent

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 21:05:46 +02:00
Acly 58a97d988f cuda/vulkan : bicubic interpolation (llama/17022)
* vulkan : implement upscale with bicubic interpolation

* cuda : implement upscale with bicubic interpolation

* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests

* adapt OpenCL backend to not support the OP in that case so tests don't fail

* print scale mode & flags in test-backend-ops
2025-11-17 21:05:46 +02:00
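The entry above implements upscale with bicubic interpolation on CUDA and Vulkan. A sketch of the math, not the actual kernels: bicubic applies a 1D cubic filter along each axis; this is the Catmull-Rom form (cubic convolution with a = -0.5), where a 2D sample runs the filter over four rows and then once over the four row results.

```cpp
// Cubic interpolation of four consecutive samples p0..p3 at fractional
// position t in [0, 1] between p1 and p2 (Catmull-Rom weights). At t = 0 it
// returns p1, at t = 1 it returns p2, and it reproduces linear ramps exactly.
static float cubic1d(float p0, float p1, float p2, float p3, float t) {
    return 0.5f * ((2.0f * p1) +
                   (-p0 + p2) * t +
                   (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t * t +
                   (-p0 + 3.0f * p1 - 3.0f * p2 + p3) * t * t * t);
}
```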
Ruben Ortlam 2e04e7a906 vulkan: fix memory allocations (llama/17122) 2025-11-17 21:05:46 +02:00
KITAITI Makoto 27f485a14c
vad : Silero VAD v6.2.0 (#3524)
* Add ggml-silero-v6.2.0 to download candidates

* Make default VAD model ggml-silero-v6.2.0

* Make VAD model in documentations ggml-silero-v6.2.0
2025-11-17 22:26:17 +09:00
KITAITI Makoto d9b7613b34
ruby : VAD separately from ASR (#3518)
* Add Whisper::VAD::Context

* Add test for Whisper::VAD::Context

* Add Whisper::VAD::Segment

* Add Whisper::VAD::Segments

* Add Whisper::VAD::Context#detect

* Define Whisper::VAD::Segments#each

* Define Whisper::VAD::Segment#start_time and #end_time

* Define Whisper::VAD::Segment#deconstruct_keys

* Add tests for Whisper::VAD family

* Add signatures for VAD family

* Add document on VAD in README

* Define Whisper::VAD::Segments#length

* Add test for Whisper::VAD::Segments#length

* Add signature of Segments#length

* Make vad_segments responsible for initializing VAD::Segments

* Remove meaningless argument check

* Check NULL of segments member

* Add tests for Whisper::VAD::Segments

* Initialize Whisper::VAD::Segment on .allocate

* Add tests for Whisper::VAD::Segment

* Check NULL of context member

* Add test for Whisper::VAD::Context.allocate
2025-11-13 10:15:26 +09:00
Georgi Gerganov a1867e0dad sync : llama.cpp 2025-11-09 23:38:03 +02:00
Georgi Gerganov e67dfbc51b sync : ggml 2025-11-09 23:38:03 +02:00
Ruben Ortlam 1993e397bb vulkan: iGPU memory reporting fix (llama/17110)
* vulkan: use all device-local heaps for memory availability reporting

Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>

* use all available heaps for iGPU memory reporting

* Allow multiple memory types per buffer request for devices with split heaps

---------

Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-09 23:38:03 +02:00
Ruben Ortlam ee8349cf10 vulkan: fix mmq out of bounds reads (llama/17108)
* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code

* fix mul_mat_id quantization call

* Fix compiler warnings
2025-11-09 23:38:03 +02:00
Jeff Bolz db98e8c5b4 vulkan: fuse mul_mat_id + mul (llama/17095)
* vulkan: fuse mul_mat_id + mul

This comes up in qwen3 moe.

* split mul_mat_id fusion tests into a separate class
2025-11-09 23:38:03 +02:00
Georgi Gerganov a4339e2ea7 metal : retain src and dst buffers during async ops (llama/17101) 2025-11-09 23:38:03 +02:00
Jeff Bolz 6de3404773 vulkan: Use spec constants for conv2d s/d/p and kernel W/H (llama/16978)
* vulkan: Use spec constants for conv2d s/d/p and kernel W/H

Also add some additional unroll hints, which seems to help.

* lock around map lookup
2025-11-09 23:38:03 +02:00
Aman Gupta 8967c9ad9b Revert "CUDA: add expert reduce kernel (ggml/16857)" (llama/17100) 2025-11-09 23:38:03 +02:00
Aman Gupta 522b9bce33 CUDA: skip fusion for repeating adds in bias (llama/17080) 2025-11-09 23:38:03 +02:00
SavicStefan 0caa32c772 vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (llama/16636)
Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>
2025-11-09 23:38:03 +02:00
Aleksei Nikiforov 3c975ad523 ggml: disable vxe for cross-compilation by default (llama/16966)
Otherwise compilation will fail due to enabling -mvx -mzvector
and not setting corresponding -march options.
2025-11-09 23:38:03 +02:00
Jeff Bolz 257ce2f5c0 vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (llama/16977)
This change combines the rms_norm+mul and rope+view+set_rows fusions to
allow fusing the whole sequence together. This comes up in Qwen3, Bailing,
and some other models.
2025-11-09 23:38:03 +02:00
Jeff Bolz 4eef518167 vulkan: Fix test-thread-safety crashes (llama/17024)
The std::map pipeline_flash_attn_f32_f16 could be searched and inserted into at the
same time, which requires holding the lock. To be safe, hold the lock for all of
ggml_vk_load_shaders.
2025-11-09 23:38:03 +02:00
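The race described above is the classic lookup-or-insert pattern on a shared std::map. A minimal sketch with illustrative names (not the real ggml-vulkan types): both the find and the insert must happen under one lock, otherwise a concurrent insert can invalidate the iterator or corrupt the tree.

```cpp
#include <map>
#include <mutex>
#include <string>

// Toy pipeline cache: int stands in for a real pipeline handle.
static std::map<std::string, int> pipeline_cache;
static std::mutex cache_mutex;

static int get_or_create_pipeline(const std::string & key) {
    // Serialize the whole find + insert; releasing the lock between the two
    // would reintroduce the race the commit above fixes.
    std::lock_guard<std::mutex> guard(cache_mutex);
    auto it = pipeline_cache.find(key);
    if (it != pipeline_cache.end()) {
        return it->second;
    }
    const int pipeline = (int) pipeline_cache.size(); // stand-in for creation
    pipeline_cache[key] = pipeline;
    return pipeline;
}
```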
Johannes Gäßler 358f77aca7 CUDA: fix MMQ stream-k fixup ne1 indices (llama/17089) 2025-11-09 23:38:03 +02:00
Reese Levine 78ea6c5b67 ggml webgpu: faster matrix multiplication/matrix-vector multiplication (llama/17031)
* Faster tensors (llama/8)

Add fast matrix and matrix/vector multiplication.

* Use map for shader replacements instead of pair of strings
2025-11-09 23:38:03 +02:00
bssrdf 547724b0a5 CUDA: properly handle nb00=nb02 case for cpy (llama/17081) 2025-11-09 23:38:03 +02:00
Acly 11543bf446 vulkan : refactor buffer handling in vk_op_f32 (llama/16840)
* vulkan : refactor/simplify buffer handling in vk_op_* functions

* Combine UMA handling into ggml_vk_tensor_subbuffer
2025-11-09 23:38:03 +02:00
Johannes Gäßler af8a88792f CUDA: fix should_use_mmvf for ne11 == 1 (llama/17085)
* CUDA: fix should_use_mmvf for ne11 == 1

* Apply suggestion from @am17an

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2025-11-09 23:38:03 +02:00
Adrien Gallouët a1746097bc Revert "ggml-cpu: detect correct cpu flags for arm64 (llama/16229) (#16239)" (llama/17084)
This reverts commit 7c23f3f0d4b9f5d6ea140756eb694b562d5acebb.
2025-11-09 23:38:03 +02:00
iron 512592513c ggml-cpu: detect correct cpu flags for arm64 (ggml/16229) (llama/16239)
When using GCC 9 and GCC 12 on the arm64 platform of Ubuntu 20.04,
the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags,
which results in compilation failures for certain extended instructions,
but the correct CPU flags can be obtained by using gcc -march.

Signed-off-by: lizhenneng <lizhenneng@kylinos.cn>
Co-authored-by: lizhenneng <lizhenneng@kylinos.cn>
2025-11-09 23:38:03 +02:00
xctan 5bce732795 ggml-cpu : optimize RVV q2_k and q3_k kernels (llama/16887) 2025-11-09 23:38:03 +02:00