whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Ruben Ortlam	f571655e8e	vulkan: fix MMQ quantize_y condition (llama/17301)	2025-11-17 21:05:46 +02:00
Georgi Gerganov	9549cc1051	metal : remove obosolete asserts (llama/17295)	2025-11-17 21:05:46 +02:00
lhez	a75525cad0	opencl: fix rms_norm_mul (llama/17250) * opencl: use subgrroup reduce for reduction in rms_norm_mul * opencl: add comment about workgroup size	2025-11-17 21:05:46 +02:00
shaofeiqi	c78845bfa9	opencl: add kernel to handle mat mul in attention to improve encoding speed (llama/17181) * Add mul_mm_f16_f32_kq_kqv kernel * Add ggml_cl_mul_mat_kq_kqv_adreno func * fix whitespace * remove unused variable * remove redundant * refactor and clean up * remove trailing whitespace	2025-11-17 21:05:46 +02:00
shani-f	1fd63da9f2	sycl : unify unary kernels with a generic implementation and enable wide operator support (llama/17213) * SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access * SYCL: update documentation and sycl.csv to reflect new unary op support * update ops.md after syncing SYCL.csv changes * Fix SYCL.csv merge conflict * Update ops.md after fixing SYCL.csv conflicts * Fix SYCL.csv tail after merge conflict and regenerate ops.md * Fix line endings and final newline in SYCL.csv * Remove TOPK_MOE entries from SYCL.csv as requested * Update ops.md after removing TOPK_MOE from SYCL.csv * Regenerated SYCL.csv and synced ops.md with upstream * Update ops.md using create_ops_docs.py	2025-11-17 21:05:46 +02:00
Jeff Bolz	ea3ebd8b0d	vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (llama/17287) These both show up in gpt-oss. Also, cleanup the mul_mat_vec fusion code a bit.	2025-11-17 21:05:46 +02:00
Ruben Ortlam	7caea54450	vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (llama/17285)	2025-11-17 21:05:46 +02:00
Giuseppe Scrivano	4c4e663da0	vulkan: implement ABS and NEG (llama/17245) * docs: update Vulkan ops * vulkan: add NEG op * vulkan: add ABS op --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-11-17 21:05:46 +02:00
Jeff Bolz	e1846fc599	vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (llama/17244) * vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths * set allow_misalign	2025-11-17 21:05:46 +02:00
Jeff Bolz	9614a56314	vulkan: skip all-negative-inf blocks in FA (llama/17186)	2025-11-17 21:05:46 +02:00
Jeff Bolz	37d4bba152	vulkan: change graph_compute to be async and enable get_tensor_async (llama/17158) * vulkan: change graph_compute to be async and enable get_tensor_async This allows some additional CPU/GPU overlap for large pp workloads. Also seems to help a bit for token gen, maybe getting rid of a small bubble between graph_compute and get_tensor. Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them. The async commands need to be ordered against each other, so put them all on the compute queue. The non-async commands still use the transfer queue. The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize. * fix thread safety errors * teardown context cleanly * Handle async read to non-pinned dst	2025-11-17 21:05:46 +02:00
Georgi Gerganov	523a6c27ea	metal : support argsort for ne00 > 1024 (llama/17247) * metal : refactor argsort * cont : sort chunks * cont : merge sorted buckets * cont : cleanup	2025-11-17 21:05:46 +02:00
Georgi Gerganov	b4d7df3ba2	metal : make the FA extra sizes consistent (llama/17143)	2025-11-17 21:05:46 +02:00
Alberto Cabrera Pérez	a81fbfc78e	ggml-cpu: handle 3d tensors in repack mat_mul (llama/17241) * ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries * Address performance regression in Qwen and llama.cpp due to chunking	2025-11-17 21:05:46 +02:00
Piotr Wilkin (ilintar)	3e684f26c1	ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (llama/17063) * Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Code review * Whitespace * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * This is actually sigmoid, duh. * Add CONST, remove TRI_KEEP, other changes from review * Update tests/test-backend-ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cuda/unary.cu Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Remove extra script * Update ggml/src/ggml.c Co-authored-by: Diego Devesa <slarengh@gmail.com> * Update tests/test-backend-ops.cpp Co-authored-by: Diego Devesa <slarengh@gmail.com> * moving changes from laptop [no ci] * pre-rebase * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Refactor tests * ggml : cleanup * cont : fix ggml_fill srcs * tests : add note * ggml : add ggml_fill_inplace * ggml : add asserts * ggml : fix ggml_fill constant cast * cont : ggml_tri minor * Use TENSOR_LOCALS * Fix regression from #14596, regenerate * Don't make commits at night... --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-17 21:05:46 +02:00
Ruben Ortlam	e8e0004fe5	vulkan: remove shell call from vulkan-shaders-gen tool, revert file check (llama/17219) * vulkan: remove shell call from vulkan-shaders-gen tool * use string vector for command execution * Fix condition * use string, remove const_cast * Fix dependency file quotation on Windows --------- Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-11-17 21:05:46 +02:00
Diego Devesa	210f0f860b	sched : fix reserve ignoring user tensor assignments (llama/17232)	2025-11-17 21:05:46 +02:00
ixgbe	91fa5b5cac	ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (llama/17227) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-17 21:05:46 +02:00
bagheera	265d326fa8	metal: accelerated conv2d (llama/17175) * metal: accelerated conv2d * cont : cleanup --------- Co-authored-by: bghira <bghira@users.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-17 21:05:46 +02:00
Georgi Gerganov	6a1d830dfd	Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" (llama/17233) This reverts commit 1c398dc9eca9c366ce98deb0e6f3538e444ebc8a.	2025-11-17 21:05:46 +02:00
Diego Devesa	6a91780c3b	ggml-cpu : use template for argsort (llama/17222)	2025-11-17 21:05:46 +02:00
TecJesh	726912d1cb	CANN: Add cross_entropy_loss op support (llama/16886) * update L2_NORM op support * update L2_NORM op support * remove extra whitespace * cann: update cross_entropy_loss op support * remove trailing whitespaces * rebase the latest code in the main repository and remove the l2_norm operator that already exists in another pull request. * undo the l2_norm operator deletion	2025-11-17 21:05:46 +02:00
Aman Gupta	84275fc493	CUDA: fuse rope + set_rows (llama/16884) * CUDA: add fused rope * move k forward_expand up * create helper function instead of re-using params * make assert statement more in line with comment * rope_norm: coalesced writes to global mem	2025-11-17 21:05:46 +02:00
Johannes Gäßler	566c4c4469	CUDA: static assert to prevent misuse of memcpy_1 (llama/17198)	2025-11-17 21:05:46 +02:00
Georgi Gerganov	3810a6180b	ggml : use std::sort in ggml_argsort CPU implementation (llama/17211) * ggml : use std::sort in ggml_argsort CPU implementation * cont : add missing header	2025-11-17 21:05:46 +02:00
Alberto Cabrera Pérez	7df8515824	ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030) * ggml-cpu: handle 3d tensors in repack mul_mat * Removed unnecessary branch, removed need for <algorithm> * Fixed dst_ptr pointer in chunk + clang_format * GGML_ASSERT to check wdata within bounds * Accidental ggml.h inclusion * Improved GGML_ASSERT on wdata boundaries	2025-11-17 21:05:46 +02:00
TecJesh	e8b66d9f94	CANN: Add L2_NORM op support (llama/16856) * update L2_NORM op support * update L2_NORM op support * remove extra whitespace	2025-11-17 21:05:46 +02:00
Neo Zhang Jianyu	8388350c66	fix ci crash about SSM_CONV (llama/17169) * fix ci crash * Update ggml-sycl.cpp * Update ggml/src/ggml-sycl/ggml-sycl.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-17 21:05:46 +02:00
Max Krasnyansky	6748d27f55	hexagon: various Op fixes (llama/17135) * hexagon: explicitly check for ops with zero nrows llm_graph_context::build_inp_out_ids() can generate tensors with zero nrows. Somehow other backends seems to handle this without obvious explicit checks. In the hexagon case we need to check explicitly and skip them. * hexagon: introduce fastdiv, fix test-backend-ops for ADD/SUB/MUL Co-authored-by: chraac <chraac@gmail.com> * hexagon: use fastdiv in ADD_ID * hexagon: use ggml_op_is_empty and ggml_is_empty to check for NOPs --------- Co-authored-by: chraac <chraac@gmail.com>	2025-11-17 21:05:46 +02:00
Eve	559091005a	disable rms norm mul rope for chips with no fp16 rte (llama/17134)	2025-11-17 21:05:46 +02:00
ixgbe	cd8f64d1b5	ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (llama/17161) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-17 21:05:46 +02:00
duduta	1cefb03571	ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (llama/16805) * extract rotate_pairs logic from ggml_compute_forward_rope_f32 * templateify ggml_compute_forward_rope_f32 and _f16 * abort when rope type not supported, remove GLM from test-rope * add imrope branch to switch * add rope tests for perf * Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-17 21:05:46 +02:00
Charles Xu	3920ecce3a	kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993)	2025-11-17 21:05:46 +02:00
Mike Abbott	c01bf73dd1	cmake : add version to all shared object files (llama/17091) When compiling llama.cpp in Yocto, it fails QA checks because the generated so files aren't versioned. This applies a version to all generated so files, allowing the package to build without errors.	2025-11-17 21:05:46 +02:00
lhez	46615d74d3	opencl: add fastdiv and use it in set_rows, ported from cuda (llama/17090) * opencl: add fastdiv for mm q8_0 * opencl: use uint4 for fastdiv vals * opencl: use fastdiv for set_rows * opencl: do not use fastdiv for q8_0 mm	2025-11-17 21:05:46 +02:00
Max Krasnyansky	ccf525baf0	cpu: skip NOPs to avoid barriers (llama/17133) * cpu: skip NOPs to avoid barriers * cpu: use ggml_op_is_empty	2025-11-17 21:05:46 +02:00
Georgi Gerganov	40aebfe8bf	metal : cap threadgroups size of set_rows (llama/17146)	2025-11-17 21:05:46 +02:00
Adrien Gallouët	86be60093e	ggml-cpu : inspect -march and -mcpu to found the CPU (llama/16333) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-17 21:05:46 +02:00
Ruben Ortlam	ef71d83b76	vulkan: check glslc executable string (llama/17144)	2025-11-17 21:05:46 +02:00
Ruben Ortlam	43f2c1ff54	vulkan: fix validation issue introduced by #16868 (llama/17145)	2025-11-17 21:05:46 +02:00
Georgi Gerganov	bb92c79f56	metal : enable tensor API for A19 (llama/17087)	2025-11-17 21:05:46 +02:00
fj-y-saito	4fea91f06e	arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277 ) * add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K * Surround SVE function with compiler directive * fix compile switch * fix coding style * ggml : fix indent --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-17 21:05:46 +02:00
Acly	58a97d988f	cuda/vulkan : bicubic interpolation (llama/17022) * vulkan : implement upscale with bicubic interpolation * cuda : implement upscale with bicubic interpolation * tests : add ggml_interpolate with GGML_SCALE_MODE_BICUBIC to backend tests * adapt OpenCL backend to not support the OP in that case so tests don't fail * print scale mode & flags in test-backend-ops	2025-11-17 21:05:46 +02:00
Ruben Ortlam	2e04e7a906	vulkan: fix memory allocations (llama/17122)	2025-11-17 21:05:46 +02:00
KITAITI Makoto	27f485a14c	vad : Silero VAD v6.2.0 (#3524 ) * Add ggml-silero-v6.2.0 to download candidates * Make default VAD model ggml-silero-v6.2.0 * Make VAD model in documentations ggml-silero-v6.2.0	2025-11-17 22:26:17 +09:00
KITAITI Makoto	d9b7613b34	ruby : VAD separately from ASR (#3518 ) * Add Whisper::VAD::Context * Add test for Whisper::VAD::Context * Add Whisper::VAD::Segment * Add Whisper::VAD::Segments * Add Whisper::VAD::Context#detect * Define Whisper::VAD::Segments#each * Define Whisper::VAD::Segment#start_time and #end_time * Define Whisper::VAD::Segment#deconstruct_keys * Add tests for Whisper::VAD family * Add signatures for VAD family * Add document on VAD in README * Define Whisper::VAD::Segments#length * Add test for Whisper::VAD::Segments#length * Add signature of Segments#length * Make vad_segments responsible to initialize VAD::Segments * Remove meaningless argument check * Check NULL of segments member * Add tests for Whisper::VAD::Segments * Initialize Whisper::VAD::Segment on .allocate * Add tests for Whisper::VAD::Segment * Check NULL of context member * Add test for Whisper::VAD::Context.allocate	2025-11-13 10:15:26 +09:00
Georgi Gerganov	a1867e0dad	sync : llama.cpp	2025-11-09 23:38:03 +02:00
Georgi Gerganov	e67dfbc51b	sync : ggml	2025-11-09 23:38:03 +02:00
Ruben Ortlam	1993e397bb	vulkan: iGPU memory reporting fix (llama/17110) * vulkan: use all device-local heaps for memory availability reporting Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com> * use all available heaps for iGPU memory reporting * Allow multiple memory types per buffer request for devices with split heaps --------- Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-11-09 23:38:03 +02:00
Ruben Ortlam	ee8349cf10	vulkan: fix mmq out of bounds reads (llama/17108) * vulkan: fix mmq out of bounds reads, streamline outdated matmul host code * fix mul_mat_id quantization call * Fix compiler warnings	2025-11-09 23:38:03 +02:00

3856 Commits
