Commit Graph

3689 Commits

Author SHA1 Message Date
nullname cb3ee1b098
ggml-hexagon: fix swiglu failure at `test-backend-ops` (llama/17344)
* refactor: use hvx_vec_exp_fp32_guard_inf for overflow handling in hvx_exp_f32

* feat: add fast sigmoid function with overflow guard for fp32

* refactor: replace hvx_vec_inverse_fp32 with hvx_vec_inverse_fp32_guard_inf for improved overflow handling

* feat: enhance hvx_add_scalar_f32 with overflow handling using infinity guard

* wip

* add HVX_Vector_Alias

wip

* wip

* fix: improve handling of src1 tensor in glu_swiglu_fp32_per_thread function

* fix nc

* wip

* wip

* handle nan at inverse

* wip

* fix neg

* wip

* rename

* fix hvx_vec_inverse_fp32_guard_inf to handle infinity and NaN cases correctly

* wip

* fix hvx_vec_inverse_fp32_guard_inf to handle NaN cases correctly

* wip

* wip

* wip

* fix output sign
2025-12-12 17:53:05 +02:00
Piotr Wilkin (ilintar) 46f893c2fa
ggml : Fix transposed SOLVE_TRI result (llama/17323)
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-12 17:53:05 +02:00
Scott Fudally 510805e6c1
DGX Spark: UMA support (llama/17368)
* DGX Spark: UMA support

* Updates from PR feedback

* More PR feedback cleanup

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Remove trailing whitespace

* Update ggml/src/ggml-cuda/ggml-cuda.cu

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-12 17:53:05 +02:00
Adrien Gallouët 2f20938b58
ggml : remove useless and error-prone variadic macros (llama/17399)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-12 17:53:04 +02:00
sudhiarm 51f5438089
kleidiai: fix zero-size array declaration (llama/17240) 2025-12-12 17:53:04 +02:00
ixgbe 1d3a525001
ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (llama/17314)
* ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* fix comment

* fix comment 2

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-12-12 17:53:04 +02:00
Giuseppe Scrivano 24b14cad87
vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (llama/17319)
* vulkan: initialize array

* vulkan: implement ADD1

* vulkan: implement ARANGE

* vulkan: implement FILL

* vulkan: implement SOFTPLUS

* vulkan: implement STEP

* vulkan: implement ROUND

* vulkan: implement CEIL

* vulkan: implement FLOOR

* vulkan: implement TRUNC

* docs: update Vulkan ops

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-12-12 17:53:04 +02:00
Jeff Bolz 95d0b0b0cf
vulkan: support larger argsort (llama/17313)
* vulkan: support larger argsort

This is an extension of the original bitonic sorting shader: it puts the
temporary values in global memory, and when more than 1024 threads are needed
it runs multiple workgroups and synchronizes through a pipeline barrier.

To improve the memory access pattern, a copy of the float value is kept with
the index value. I've applied this same change to the original shared memory
version of the shader, which is still used when ncols <= 1024.

* Reduce the number of shader variants. Use smaller workgroups when doing a single pass, for a modest perf boost

* reduce loop overhead

* run multiple cols per invocation, to reduce barrier overhead
2025-12-12 17:53:04 +02:00
Jeff Bolz ae8865c6e6
vulkan: Add copy_transpose shader (llama/17371) 2025-12-12 17:53:04 +02:00
Aman Gupta 73d396826b
cuda: fix rope fusion for gemma3 (llama/17378) 2025-12-12 17:53:03 +02:00
Piotr Wilkin (ilintar) 746cbed20a
Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (llama/17332)
* Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition

* Argh.

* Making CISC happy ;)

* Integrate CONT tests

* Use loopy loop

* Skip new tests for (B)F16 for now.
2025-12-12 17:53:03 +02:00
Ruben Ortlam 2097a9c1bd
vulkan: force full subgroups for flash attention to fix intel subgroup crash (llama/17356) 2025-12-12 17:53:03 +02:00
Jeremy Rand 27c69271c5
ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (llama/17308) 2025-12-12 17:53:03 +02:00
Chenguang Li c137d11b81
CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (llama/17347)
* cann: fix acl_tensor_ptr usage in ASCEND_310P ROPE implementation

Fix compilation errors in the ASCEND_310P-specific ROPE operation code
by adding .get() calls when passing acl_tensor_ptr smart pointers to
functions expecting raw aclTensor* pointers.

This fixes the code that was missed in the previous refactoring commit
(8981848) which changed ggml_cann_create_tensor() return type from
aclTensor* to acl_tensor_ptr.

* cann: format code
2025-12-12 17:53:03 +02:00
Jeff Bolz 24b981eff7
vulkan: support noncontig i32 copy (llama/17328) 2025-12-12 17:53:03 +02:00
Ruben Ortlam b7dfced37f
vulkan: add log RTE support to fix Nvidia CI (llama/17320)
* vulkan: add log RTE support to fix Nvidia CI

* actually use the rte shader
2025-12-12 17:53:02 +02:00
Adrien Gallouët 9e429c47e1
cmake : fix ARM feature verification (llama/17170)
* cmake : fix ARM feature verification

Use check_cxx_source_compiles to prevent conflicts with
the existing GGML_NATIVE detection code.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : unset __ARM_FEATURE when feature is disabled

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : fix scope, this is really a macro

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* arm_neon.h is useless

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-12 17:53:02 +02:00
Adrien Gallouët bb88c2545f
ggml : add missing AVX512 feature checks (llama/17270)
_mm512_cvtepu8_epi16        requires  __AVX512BW__
_mm512_srli_epi16           requires  __AVX512BW__
__builtin_ia32_inserti32x8  requires  __AVX512DQ__

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-12 17:53:02 +02:00
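The pattern behind this fix: an intrinsic must be gated on the exact feature macro it needs (`__AVX512BW__`, `__AVX512DQ__`), not on `__AVX512F__` alone. A hedged sketch of that guard style with a scalar fallback (`widen_u8_to_u16` is an invented example function, not ggml code):

```cpp
#include <cstdint>
#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

// Widen unsigned bytes to 16-bit. _mm512_cvtepu8_epi16 is an AVX512BW
// instruction, so it is guarded by __AVX512BW__ specifically; building
// for plain AVX-512F falls back to the scalar loop below.
static void widen_u8_to_u16(const uint8_t * src, uint16_t * dst, int n) {
#if defined(__AVX512BW__)
    int i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v8  = _mm256_loadu_si256((const __m256i *) (src + i));
        __m512i v16 = _mm512_cvtepu8_epi16(v8); // requires __AVX512BW__
        _mm512_storeu_si512((__m512i *) (dst + i), v16);
    }
    for (; i < n; ++i) dst[i] = src[i]; // scalar tail
#else
    for (int i = 0; i < n; ++i) dst[i] = src[i]; // scalar fallback
#endif
}
```

Either path produces identical results; the guard only decides which instructions the compiler is allowed to emit.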
Daniel Bevenius 418314941e
ggml : remove dirty flag from version string (ggml/1391)
This commit removes the "-dirty" suffix from the GGML version string.

The motivation for this change is to ensure that the version string
works with different ways of checking out ggml and using it in projects.
By removing the dirty flag from the version string, we avoid potential
artifacts like shared libraries getting a -dirty suffix in their names.

Instead, if the project is built from a dirty git state, the dirty flag
will be appended to the commit hash in the GGML_BUILD_COMMIT variable.
This will enable users to still identify that the build was made from
a modified/dirty state even though the version might match a "real"
version.

For example, the commit can be printed as follows:
```c++
    printf("commit: %s\n", ggml_commit());
```
Which would print the following for a dirty build:
```console
commit: 781baf2a-dirty
```

Refs: https://github.com/ggml-org/ggml/pull/1363#issuecomment-3569691546
2025-12-12 17:53:00 +02:00
Josh Montoya 9f5ed26e43
go : Enable VAD for Go bindings (#3563)
* reset context.n so that NextSegment can be called for multiple Process calls

* enable VAD params
2025-12-10 13:31:36 +02:00
Josh Montoya a8f45ab11d
go : reset context.n in Process() (#3503) 2025-12-08 18:33:07 +02:00
Joseph Sellers a88b93f85f
vad : fix buffer overflow in sample reduction loop (#3558)
The buffer size calculation loop (line ~6661) uses `n_samples - 1` as
the upper bound for segment_end_samples, but the copy loop (line 6696)
uses `n_samples`. This inconsistency allows the copy loop to compute
segment_length values up to 1 sample larger per segment than what was
allocated, causing heap corruption.

Symptom: `malloc(): corrupted top size` or `malloc(): invalid size
(unsorted)` crashes after VAD completes sample reduction.

Fix: Use consistent bounds (`n_samples - 1`) in both loops.

Fixes #3403
2025-12-06 12:28:32 +01:00
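The bug class described above is an allocation pass and a copy pass disagreeing on a clamp bound. A hypothetical illustration (not the whisper.cpp code — `Segment`, `clamp_end`, and `reduce_samples` are invented, and an exclusive-end clamp stands in for the `n_samples - 1` bound the fix uses): route both passes through one shared clamp so the copy can never outgrow the allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Segment { size_t start; size_t end; }; // end exclusive, may overshoot

// One clamp, used by BOTH passes — the essence of the #3558 fix was making
// the sizing loop and the copy loop agree on this bound.
static size_t clamp_end(size_t end, size_t n_samples) {
    return end < n_samples ? end : n_samples;
}

static std::vector<float> reduce_samples(const std::vector<float> & samples,
                                         const std::vector<Segment> & segs) {
    // pass 1: size the output with the clamped bound
    size_t total = 0;
    for (const auto & s : segs) {
        total += clamp_end(s.end, samples.size()) - s.start;
    }

    std::vector<float> out;
    out.reserve(total);

    // pass 2: copy with the SAME clamped bound — no per-segment overshoot
    for (const auto & s : segs) {
        const size_t end = clamp_end(s.end, samples.size());
        for (size_t i = s.start; i < end; ++i) out.push_back(samples[i]);
    }
    assert(out.size() == total); // the invariant the original code violated
    return out;
}
```

Had pass 2 used `s.end` directly, the last segment would copy up to one extra element per segment past the reservation — exactly the heap-corruption symptom in the report.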
Daniel Bevenius d566358a1d
tests : update VAD tests to use Silero V6.2.0 (#3534)
* tests : update VAD tests to use Silero V6.2.0

This commit updates the VAD tests to use Silero V6.2.0 instead of
V5.1.2. I was not sure if we needed to keep testing both versions,
but opted to just update to the latest version for simplicity.

* wasm : use C++17 for emscripten builds

This commit updates the CMakeLists.txt file to explicitly set the C++
standard to C++17 when building with Emscripten.

The motivation for this change is that building with Emscripten
will currently fail locally and on CI with the following error:
```console
[ 75%] Building CXX object examples/CMakeFiles/common.dir/common-ggml.cpp.o
In file included from /home/danbev/work/ai/whisper.cpp/examples/stream.wasm/emscripten.cpp:5:
/home/danbev/work/utils/emsdk/upstream/emscripten/cache/sysroot/include/emscripten/bind.h:11:2: error:
      "embind requires -std=c++17 or newer"
   11 | #error "embind requires -std=c++17 or newer"
      |  ^
In file included from /home/danbev/work/ai/whisper.cpp/examples/whisper.wasm/emscripten.cpp:4:
/home/danbev/work/utils/emsdk/upstream/emscripten/cache/sysroot/include/emscripten/bind.h:11:2: error:
      "embind requires -std=c++17 or newer"
   11 | #error "embind requires -std=c++17 or newer"
      |  ^
```
2025-12-06 10:58:58 +01:00
Daniel Bevenius 19ceec8eac
examples : fix typo in vad-speech-segments command [no ci] (#3535)
This commit corrects a typo in the command-line argument for specifying the
VAD model in the vad-speech-segments example.
2025-11-20 13:35:11 +01:00
gzq 40e788a5d1
readme : minor (#3516) 2025-11-20 13:57:55 +02:00
YangLe 961aec7384
metal : fix compile on macos 11 (#3533) 2025-11-20 13:54:54 +02:00
Georgi Gerganov b12abefa9b sync : llama.cpp 2025-11-17 21:05:46 +02:00
Georgi Gerganov 0e5deca8e2 sync : ggml 2025-11-17 21:05:46 +02:00
Georgi Gerganov 661567357c metal : support I32 -> I32 copy (llama/17317) 2025-11-17 21:05:46 +02:00
Georgi Gerganov 74bb8a8b23 metal : faster argsort (llama/17315)
* metal : faster argsort

* cont : keep data in registers
2025-11-17 21:05:46 +02:00
Georgi Gerganov 57c0e6f8b6 metal : add cumsum (llama/17305) 2025-11-17 21:05:46 +02:00
hipudding d3f5487464 CANN: Use smart pointers to manage ACL objects (llama/17238)
* CANN: Use smart pointers to manage ACL objects

Previously, ACL objects were managed via manual destruction, which
led to multiple memory-leak issues during runtime. This patch replaces
manual memory management with smart pointers so that ACL objects
are properly released and ownership is clearly defined.

Note that the ownership of an ACL object belongs to the function
that creates it. Other internal functions should operate on these ACL
objects using raw pointers to avoid unintended ownership transfers.

Additionally, since aclTensorList automatically frees its contained
aclTensor objects, any aclTensor added to a tensor list must release
ownership to avoid double free operations.

This PR also removes the asynchronous task submission mechanism.
Due to changes in recent CANN versions, tiling time has significantly
decreased. Even with a dual-thread submission model, the dispatch
overhead still falls on the critical path, making async submission
less beneficial. Moreover, aclGraph support provides a much better
path to reducing operator dispatch latency.

* CANN: resolve review comments
2025-11-17 21:05:46 +02:00
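The ownership rules spelled out in this commit — creator owns the handle via a smart pointer, internal helpers take raw pointers, and `release()` is called when an owning container will free the handle itself — map onto the standard `std::unique_ptr`-with-custom-deleter pattern for C APIs. A self-contained sketch with an invented stand-in API (`fake_tensor`, `fake_create`, `fake_destroy` are not ACL names):

```cpp
#include <memory>

// Invented stand-in for a C API handle like aclTensor:
struct fake_tensor { int id; };
static int g_live = 0; // tracks live handles so leaks/double frees are visible
static fake_tensor * fake_create(int id)      { ++g_live; return new fake_tensor{ id }; }
static void          fake_destroy(fake_tensor * t) { --g_live; delete t; }

// The creator owns the handle: unique_ptr with the C destroy function as
// deleter (analogous to acl_tensor_ptr in the commit).
struct fake_deleter { void operator()(fake_tensor * t) const { fake_destroy(t); } };
using tensor_ptr = std::unique_ptr<fake_tensor, fake_deleter>;

static tensor_ptr make_tensor(int id) { return tensor_ptr(fake_create(id)); }

// Internal helpers take raw pointers — no ownership transfer, mirroring the
// `.get()` calls used when passing to functions expecting aclTensor*.
static int tensor_id(const fake_tensor * t) { return t->id; }

// When an owning container (like aclTensorList) will free the handle itself,
// the creator must give up ownership or the handle is freed twice:
static fake_tensor * hand_over(tensor_ptr t) { return t.release(); }
```

The `release()` step is the subtle one: forgetting it reproduces exactly the double-free hazard the commit message warns about for tensors added to an `aclTensorList`.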
Pavels Zaicenkovs 9d95d9a1ee vulkan: add LOG operation support for F32 and F16 (llama/17183)
* vulkan: add LOG operation support for F32 and F16

Part of #14909.

* vulkan: Fix LOG operation types

* docs: Update operation support documentation for Vulkan LOG operation

* vulkan: fix log_f16 shader

* docs: restore missing LOG test cases and regenerate ops.md
2025-11-17 21:05:46 +02:00
Ruben Ortlam f571655e8e vulkan: fix MMQ quantize_y condition (llama/17301) 2025-11-17 21:05:46 +02:00
Georgi Gerganov 9549cc1051 metal : remove obsolete asserts (llama/17295) 2025-11-17 21:05:46 +02:00
lhez a75525cad0 opencl: fix rms_norm_mul (llama/17250)
* opencl: use subgroup reduce for reduction in rms_norm_mul

* opencl: add comment about workgroup size
2025-11-17 21:05:46 +02:00
shaofeiqi c78845bfa9 opencl: add kernel to handle mat mul in attention to improve encoding speed (llama/17181)
* Add mul_mm_f16_f32_kq_kqv kernel

* Add ggml_cl_mul_mat_kq_kqv_adreno func

* fix whitespace

* remove unused variable

* remove redundant

* refactor and clean up

* remove trailing whitespace
2025-11-17 21:05:46 +02:00
shani-f 1fd63da9f2 sycl : unify unary kernels with a generic implementation and enable wide operator support (llama/17213)
* SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access

* SYCL: update documentation and sycl.csv to reflect new unary op support

* update ops.md after syncing SYCL.csv changes

* Fix SYCL.csv merge conflict

* Update ops.md after fixing SYCL.csv conflicts

* Fix SYCL.csv tail after merge conflict and regenerate ops.md

* Fix line endings and final newline in SYCL.csv

* Remove TOPK_MOE entries from SYCL.csv as requested

* Update ops.md after removing TOPK_MOE from SYCL.csv

* Regenerated SYCL.csv and synced ops.md with upstream

* Update ops.md using create_ops_docs.py
2025-11-17 21:05:46 +02:00
Jeff Bolz ea3ebd8b0d vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (llama/17287)
These both show up in gpt-oss. Also, clean up the mul_mat_vec fusion code a bit.
2025-11-17 21:05:46 +02:00
Ruben Ortlam 7caea54450 vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (llama/17285) 2025-11-17 21:05:46 +02:00
Giuseppe Scrivano 4c4e663da0 vulkan: implement ABS and NEG (llama/17245)
* docs: update Vulkan ops

* vulkan: add NEG op

* vulkan: add ABS op

---------

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-11-17 21:05:46 +02:00
Jeff Bolz e1846fc599 vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (llama/17244)
* vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths

* set allow_misalign
2025-11-17 21:05:46 +02:00
Jeff Bolz 9614a56314 vulkan: skip all-negative-inf blocks in FA (llama/17186) 2025-11-17 21:05:46 +02:00
Jeff Bolz 37d4bba152 vulkan: change graph_compute to be async and enable get_tensor_async (llama/17158)
* vulkan: change graph_compute to be async and enable get_tensor_async

This allows some additional CPU/GPU overlap for large pp workloads. Also seems
to help a bit for token gen, maybe getting rid of a small bubble between
graph_compute and get_tensor.

Async set and copy functions seem to be very rarely used, so I didn't enable
them because I didn't have a good way to test them.

The async commands need to be ordered against each other, so put them all on
the compute queue. The non-async commands still use the transfer queue.

The fence for graph_compute/get_tensor_async is submitted and waited on in
ggml_vk_synchronize.

* fix thread safety errors

* teardown context cleanly

* Handle async read to non-pinned dst
2025-11-17 21:05:46 +02:00
Georgi Gerganov 523a6c27ea metal : support argsort for ne00 > 1024 (llama/17247)
* metal : refactor argsort

* cont : sort chunks

* cont : merge sorted buckets

* cont : cleanup
2025-11-17 21:05:46 +02:00
Georgi Gerganov b4d7df3ba2 metal : make the FA extra sizes consistent (llama/17143) 2025-11-17 21:05:46 +02:00
Alberto Cabrera Pérez a81fbfc78e ggml-cpu: handle 3d tensors in repack mat_mul (llama/17241)
* ggml-cpu: handle 3d tensors in repack mul_mat

* Removed unnecessary branch, removed need for <algorithm>

* Fixed dst_ptr pointer in chunk + clang_format

* GGML_ASSERT to check wdata within bounds

* Accidental ggml.h inclusion

* Improved GGML_ASSERT on wdata boundaries

* Address performance regression in Qwen and llama.cpp due to chunking
2025-11-17 21:05:46 +02:00
Piotr Wilkin (ilintar) 3e684f26c1 ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (llama/17063)
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM

* Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Code review

* Whitespace

* Update tests/test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* This is actually sigmoid, duh.

* Add CONST, remove TRI_KEEP, other changes from review

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Remove extra script

* Update ggml/src/ggml.c

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* moving changes from laptop [no ci]

* pre-rebase

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Refactor tests

* ggml : cleanup

* cont : fix ggml_fill srcs

* tests : add note

* ggml : add ggml_fill_inplace

* ggml : add asserts

* ggml : fix ggml_fill constant cast

* cont : ggml_tri minor

* Use TENSOR_LOCALS

* Fix regression from #14596, regenerate

* Don't make commits at night...

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-17 21:05:46 +02:00
Ruben Ortlam e8e0004fe5 vulkan: remove shell call from vulkan-shaders-gen tool, revert file check (llama/17219)
* vulkan: remove shell call from vulkan-shaders-gen tool

* use string vector for command execution

* Fix condition

* use string, remove const_cast

* Fix dependency file quotation on Windows

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-11-17 21:05:46 +02:00
Diego Devesa 210f0f860b sched : fix reserve ignoring user tensor assignments (llama/17232) 2025-11-17 21:05:46 +02:00