Commit Graph

3689 Commits

ixgbe 91fa5b5cac ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (llama/17227)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-17 21:05:46 +02:00
bagheera 265d326fa8 metal: accelerated conv2d (llama/17175)
* metal: accelerated conv2d

* cont : cleanup

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 21:05:46 +02:00
Georgi Gerganov 6a1d830dfd Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" (llama/17233)
This reverts commit 1c398dc9eca9c366ce98deb0e6f3538e444ebc8a.
2025-11-17 21:05:46 +02:00
Diego Devesa 6a91780c3b ggml-cpu : use template for argsort (llama/17222) 2025-11-17 21:05:46 +02:00
TecJesh 726912d1cb CANN: Add cross_entropy_loss op support (llama/16886)
* update L2_NORM op support

* update L2_NORM op support

* remove extra whitespace

* cann: update cross_entropy_loss op support

* remove trailing whitespaces

* rebase onto the latest code in the main repository and remove the l2_norm operator that already exists in another pull request

* undo the l2_norm operator deletion
2025-11-17 21:05:46 +02:00
Aman Gupta 84275fc493 CUDA: fuse rope + set_rows (llama/16884)
* CUDA: add fused rope

* move k forward_expand up

* create helper function instead of re-using params

* make assert statement more in line with comment

* rope_norm: coalesced writes to global mem
2025-11-17 21:05:46 +02:00
Johannes Gäßler 566c4c4469 CUDA: static assert to prevent misuse of memcpy_1 (llama/17198) 2025-11-17 21:05:46 +02:00
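The entry above adds a static assert to catch misuse of a copy helper at compile time. A minimal sketch of that pattern, with hypothetical names (not the actual ggml CUDA helper): a templated single-element copy where the assert rejects a size/type mismatch during compilation instead of corrupting memory at run time.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical sketch, not the real ggml code: copy exactly one element of
// nbytes bytes. static_assert turns a wrong template argument or a
// non-trivially-copyable type into a compile error.
template <int nbytes, typename T>
static void memcpy_1(T * dst, const T * src) {
    static_assert(nbytes == sizeof(T), "memcpy_1: size mismatch, wrong template argument?");
    static_assert(std::is_trivially_copyable<T>::value, "memcpy_1: non-trivial type");
    std::memcpy(dst, src, nbytes);
}
```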
Georgi Gerganov 3810a6180b ggml : use std::sort in ggml_argsort CPU implementation (llama/17211)
* ggml : use std::sort in ggml_argsort CPU implementation

* cont : add missing header
2025-11-17 21:05:46 +02:00
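The argsort entry above swaps a hand-written sort for std::sort. A minimal sketch of the idea (illustrative only, not the actual ggml implementation): build a row of indices and let std::sort order them by the values they point at.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: ascending argsort of n floats. The comparator reads through the
// index into the data array, so the data itself is never moved.
static std::vector<int32_t> argsort_asc(const float * data, int n) {
    std::vector<int32_t> idx(n);
    for (int i = 0; i < n; ++i) {
        idx[i] = i;
    }
    std::sort(idx.begin(), idx.end(), [data](int32_t a, int32_t b) {
        return data[a] < data[b];
    });
    return idx;
}
```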
Alberto Cabrera Pérez 7df8515824 ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)
* ggml-cpu: handle 3d tensors in repack mul_mat

* Removed unnecessary branch, removed need for <algorithm>

* Fixed dst_ptr pointer in chunk + clang_format

* GGML_ASSERT to check wdata within bounds

* Accidental ggml.h inclusion

* Improved GGML_ASSERT on wdata boundaries
2025-11-17 21:05:46 +02:00
TecJesh e8b66d9f94 CANN: Add L2_NORM op support (llama/16856)
* update L2_NORM op support

* update L2_NORM op support

* remove extra whitespace
2025-11-17 21:05:46 +02:00
Neo Zhang Jianyu 8388350c66 fix ci crash about SSM_CONV (llama/17169)
* fix ci crash

* Update ggml-sycl.cpp

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-17 21:05:46 +02:00
Max Krasnyansky 6748d27f55 hexagon: various Op fixes (llama/17135)
* hexagon: explicitly check for ops with zero nrows

llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows.
Somehow other backends seem to handle this without obvious explicit checks.
In the hexagon case we need to check explicitly and skip them.

* hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL

Co-authored-by: chraac <chraac@gmail.com>

* hexagon: use fastdiv in ADD_ID

* hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs

---------

Co-authored-by: chraac <chraac@gmail.com>
2025-11-17 21:05:46 +02:00
Eve 559091005a disable rms norm mul rope for chips with no fp16 rte (llama/17134) 2025-11-17 21:05:46 +02:00
ixgbe cd8f64d1b5 ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (llama/17161)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-17 21:05:46 +02:00
duduta 1cefb03571 ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (llama/16805)
* extract rotate_pairs logic from ggml_compute_forward_rope_f32

* templateify ggml_compute_forward_rope_f32 and _f16

* abort when rope type not supported, remove GLM from test-rope

* add imrope branch to switch

* add rope tests for perf

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 21:05:46 +02:00
Charles Xu 3920ecce3a kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993) 2025-11-17 21:05:46 +02:00
Mike Abbott c01bf73dd1 cmake : add version to all shared object files (llama/17091)
When compiling llama.cpp in Yocto, it fails QA checks because the generated .so files aren't versioned. This applies a version to all generated .so files, allowing the package to build without errors.
2025-11-17 21:05:46 +02:00
lhez 46615d74d3 opencl: add fastdiv and use it in set_rows, ported from cuda (llama/17090)
* opencl: add fastdiv for mm q8_0

* opencl: use uint4 for fastdiv vals

* opencl: use fastdiv for set_rows

* opencl: do not use fastdiv for q8_0 mm
2025-11-17 21:05:46 +02:00
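Several entries here port "fastdiv" between backends. A sketch of the underlying trick (Granlund–Montgomery style, as commonly used in GPU kernels; not the exact OpenCL/CUDA code): division by a fixed divisor becomes one multiply-high, one add, and one shift, with the magic pair precomputed once on the host.

```cpp
#include <cstdint>
#include <utility>

// Host side: precompute (mp, L) for divisor d, assumed 1 <= d < 2^31.
static std::pair<uint32_t, uint32_t> init_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint32_t(1) << L) < d) {
        ++L; // L = ceil(log2(d))
    }
    const uint32_t mp = uint32_t(((uint64_t(1) << 32) * ((uint64_t(1) << L) - d)) / d + 1);
    return std::make_pair(mp, L);
}

// Device side (sketched on the host): q = n / d without a hardware divide.
// The 64-bit sum avoids overflow of hi + n.
static uint32_t fastdiv(uint32_t n, uint32_t mp, uint32_t L) {
    const uint32_t hi = uint32_t((uint64_t(n) * mp) >> 32); // multiply-high
    return uint32_t((uint64_t(hi) + n) >> L);
}
```

On GPUs the multiply-high is a single instruction (e.g. a mul-hi builtin), which is why replacing per-thread integer division with this pays off in hot kernels like set_rows.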
Max Krasnyansky ccf525baf0 cpu: skip NOPs to avoid barriers (llama/17133)
* cpu: skip NOPs to avoid barriers

* cpu: use ggml_op_is_empty
2025-11-17 21:05:46 +02:00
Georgi Gerganov 40aebfe8bf metal : cap threadgroups size of set_rows (llama/17146) 2025-11-17 21:05:46 +02:00
Adrien Gallouët 86be60093e ggml-cpu : inspect -march and -mcpu to find the CPU (llama/16333)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-17 21:05:46 +02:00
Ruben Ortlam ef71d83b76 vulkan: check glslc executable string (llama/17144) 2025-11-17 21:05:46 +02:00
Ruben Ortlam 43f2c1ff54 vulkan: fix validation issue introduced by #16868 (llama/17145) 2025-11-17 21:05:46 +02:00
Georgi Gerganov bb92c79f56 metal : enable tensor API for A19 (llama/17087) 2025-11-17 21:05:46 +02:00
fj-y-saito 4fea91f06e arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277)
* add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K

* Surround SVE function with compiler directive

* fix compile switch

* fix coding style

* ggml : fix indent

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 21:05:46 +02:00
Acly 58a97d988f cuda/vulkan : bicubic interpolation (llama/17022)
* vulkan : implement upscale with bicubic interpolation

* cuda : implement upscale with bicubic interpolation

* tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests

* adapt OpenCL backend to not support the OP in that case so tests don't fail

* print scale mode & flags in test-backend-ops
2025-11-17 21:05:46 +02:00
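The entry above implements upscale with bicubic interpolation on CUDA and Vulkan. A sketch of the math, not the actual kernels: bicubic applies a 1D cubic filter along each axis; this is the Catmull-Rom form (cubic convolution with a = -0.5), where a 2D sample runs the filter over four rows and then once over the four row results.

```cpp
// Cubic interpolation of four consecutive samples p0..p3 at fractional
// position t in [0, 1] between p1 and p2 (Catmull-Rom weights). At t = 0 it
// returns p1, at t = 1 it returns p2, and it reproduces linear ramps exactly.
static float cubic1d(float p0, float p1, float p2, float p3, float t) {
    return 0.5f * ((2.0f * p1) +
                   (-p0 + p2) * t +
                   (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t * t +
                   (-p0 + 3.0f * p1 - 3.0f * p2 + p3) * t * t * t);
}
```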
Ruben Ortlam 2e04e7a906 vulkan: fix memory allocations (llama/17122) 2025-11-17 21:05:46 +02:00
KITAITI Makoto 27f485a14c
vad : Silero VAD v6.2.0 (#3524)
* Add ggml-silero-v6.2.0 to download candidates

* Make default VAD model ggml-silero-v6.2.0

* Make VAD model in documentations ggml-silero-v6.2.0
2025-11-17 22:26:17 +09:00
KITAITI Makoto d9b7613b34
ruby : VAD separately from ASR (#3518)
* Add Whisper::VAD::Context

* Add test for Whisper::VAD::Context

* Add Whisper::VAD::Segment

* Add Whisper::VAD::Segments

* Add Whisper::VAD::Context#detect

* Define Whisper::VAD::Segments#each

* Define Whisper::VAD::Segment#start_time and #end_time

* Define Whisper::VAD::Segment#deconstruct_keys

* Add tests for Whisper::VAD family

* Add signatures for VAD family

* Add document on VAD in README

* Define Whisper::VAD::Segments#length

* Add test for Whisper::VAD::Segments#length

* Add signature of Segments#length

* Make vad_segments responsible for initializing VAD::Segments

* Remove meaningless argument check

* Check NULL of segments member

* Add tests for Whisper::VAD::Segments

* Initialize Whisper::VAD::Segment on .allocate

* Add tests for Whisper::VAD::Segment

* Check NULL of context member

* Add test for Whisper::VAD::Context.allocate
2025-11-13 10:15:26 +09:00
Georgi Gerganov a1867e0dad sync : llama.cpp 2025-11-09 23:38:03 +02:00
Georgi Gerganov e67dfbc51b sync : ggml 2025-11-09 23:38:03 +02:00
Ruben Ortlam 1993e397bb vulkan: iGPU memory reporting fix (llama/17110)
* vulkan: use all device-local heaps for memory availability reporting

Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>

* use all available heaps for iGPU memory reporting

* Allow multiple memory types per buffer request for devices with split heaps

---------

Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-09 23:38:03 +02:00
Ruben Ortlam ee8349cf10 vulkan: fix mmq out of bounds reads (llama/17108)
* vulkan: fix mmq out of bounds reads, streamline outdated matmul host code

* fix mul_mat_id quantization call

* Fix compiler warnings
2025-11-09 23:38:03 +02:00
Jeff Bolz db98e8c5b4 vulkan: fuse mul_mat_id + mul (llama/17095)
* vulkan: fuse mul_mat_id + mul

This comes up in qwen3 moe.

* split mul_mat_id fusion tests into a separate class
2025-11-09 23:38:03 +02:00
Georgi Gerganov a4339e2ea7 metal : retain src and dst buffers during async ops (llama/17101) 2025-11-09 23:38:03 +02:00
Jeff Bolz 6de3404773 vulkan: Use spec constants for conv2d s/d/p and kernel W/H (llama/16978)
* vulkan: Use spec constants for conv2d s/d/p and kernel W/H

Also add some additional unroll hints, which seems to help.

* lock around map lookup
2025-11-09 23:38:03 +02:00
Aman Gupta 8967c9ad9b Revert "CUDA: add expert reduce kernel (ggml/16857)" (llama/17100) 2025-11-09 23:38:03 +02:00
Aman Gupta 522b9bce33 CUDA: skip fusion for repeating adds in bias (llama/17080) 2025-11-09 23:38:03 +02:00
SavicStefan 0caa32c772 vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (llama/16636)
Signed-off-by: Stefan Savic <stefan.savic@huawei.com>
Co-authored-by: Stefan Savic <stefan.savic@huawei.com>
2025-11-09 23:38:03 +02:00
Aleksei Nikiforov 3c975ad523 ggml: disable vxe for cross-compilation by default (llama/16966)
Otherwise compilation will fail due to enabling -mvx -mzvector
and not setting corresponding -march options.
2025-11-09 23:38:03 +02:00
Jeff Bolz 257ce2f5c0 vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (llama/16977)
This change combines the rms_norm+mul and rope+view+set_rows fusions to
allow fusing the whole sequence together. This comes up in Qwen3, Bailing,
and some other models.
2025-11-09 23:38:03 +02:00
Jeff Bolz 4eef518167 vulkan: Fix test-thread-safety crashes (llama/17024)
The std::map pipeline_flash_attn_f32_f16 could be searched and inserted into at the
same time, which requires holding the lock. To be safe, hold the lock for all of
ggml_vk_load_shaders.
2025-11-09 23:38:03 +02:00
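The race described above is the classic lookup-or-insert pattern on a shared std::map. A minimal sketch with illustrative names (not the real ggml-vulkan types): both the find and the insert must happen under one lock, otherwise a concurrent insert can invalidate the iterator or corrupt the tree.

```cpp
#include <map>
#include <mutex>
#include <string>

// Toy pipeline cache: int stands in for a real pipeline handle.
static std::map<std::string, int> pipeline_cache;
static std::mutex cache_mutex;

static int get_or_create_pipeline(const std::string & key) {
    // Serialize the whole find + insert; releasing the lock between the two
    // would reintroduce the race the commit above fixes.
    std::lock_guard<std::mutex> guard(cache_mutex);
    auto it = pipeline_cache.find(key);
    if (it != pipeline_cache.end()) {
        return it->second;
    }
    const int pipeline = (int) pipeline_cache.size(); // stand-in for creation
    pipeline_cache[key] = pipeline;
    return pipeline;
}
```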
Johannes Gäßler 358f77aca7 CUDA: fix MMQ stream-k fixup ne1 indices (llama/17089) 2025-11-09 23:38:03 +02:00
Reese Levine 78ea6c5b67 ggml webgpu: faster matrix multiplication/matrix-vector multiplication (llama/17031)
* Faster tensors (llama/8)

Add fast matrix and matrix/vector multiplication.

* Use map for shader replacements instead of pair of strings
2025-11-09 23:38:03 +02:00
bssrdf 547724b0a5 CUDA: properly handle nb00=nb02 case for cpy (llama/17081) 2025-11-09 23:38:03 +02:00
Acly 11543bf446 vulkan : refactor buffer handling in vk_op_f32 (llama/16840)
* vulkan : refactor/simplify buffer handling in vk_op_* functions

* Combine UMA handling into ggml_vk_tensor_subbuffer
2025-11-09 23:38:03 +02:00
Johannes Gäßler af8a88792f CUDA: fix should_use_mmvf for ne11 == 1 (llama/17085)
* CUDA: fix should_use_mmvf for ne11 == 1

* Apply suggestion from @am17an

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2025-11-09 23:38:03 +02:00
Adrien Gallouët a1746097bc Revert "ggml-cpu: detect correct cpu flags for arm64 (llama/16229) (#16239)" (llama/17084)
This reverts commit 7c23f3f0d4b9f5d6ea140756eb694b562d5acebb.
2025-11-09 23:38:03 +02:00
iron 512592513c ggml-cpu: detect correct cpu flags for arm64 (ggml/16229) (llama/16239)
When using GCC 9 and GCC 12 on the arm64 platform of Ubuntu 20.04,
the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags,
which results in compilation failures for certain extended instructions,
but the correct CPU flags can be obtained by using gcc -march.

Signed-off-by: lizhenneng <lizhenneng@kylinos.cn>
Co-authored-by: lizhenneng <lizhenneng@kylinos.cn>
2025-11-09 23:38:03 +02:00
xctan 5bce732795 ggml-cpu : optimize RVV q2_k and q3_k kernels (llama/16887) 2025-11-09 23:38:03 +02:00