Commit Graph

4210 Commits

Sigbjørn Skjæret 3bb52acb46
metal : remove contiguous assertion for src0 in IM2COL (llama/15577)
* remove contiguous assertion for src0 in IM2COL

* add contiguous check in supports_op
2025-09-20 13:42:42 +03:00
Yoshi_likes_e4 9828caafb5
Add a warning for special devices (llama/15563)
* Add warning

* Print the devices names

* Add newlines

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Fix vector names

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-09-20 13:42:42 +03:00
Jeff Bolz 79e2bd5ea8
vulkan: Remove splitting for mul_mat_id (llama/15568)
row_ids only needs to hold the BN rows for the current tile.
2025-09-20 13:42:42 +03:00
Qeeweew 2468074e91
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (llama/15451)
* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation

---------

Co-authored-by: xix <xiapc@outlook.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-09-20 13:42:41 +03:00
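The `__byte_perm` intrinsic used above selects arbitrary bytes from a pair of 32-bit words, which is what makes it useful for small table lookups. As a hedged illustration (a software model, not the CUDA implementation from the commit), the index mode of the instruction can be sketched like this; the MSB-replication mode of the real hardware instruction is deliberately not modeled:

```python
def byte_perm(x: int, y: int, s: int) -> int:
    """Software model of CUDA __byte_perm(x, y, s), index mode only.

    The 8 source bytes are bytes 0-3 of x followed by bytes 0-3 of y.
    Each of the four 4-bit selector nibbles in s picks one source byte
    (index 0-7) for the corresponding result byte.
    """
    src = [(x >> (8 * i)) & 0xFF for i in range(4)] + \
          [(y >> (8 * i)) & 0xFF for i in range(4)]
    out = 0
    for i in range(4):
        sel = (s >> (4 * i)) & 0x7   # MSB of each nibble (replicate mode) ignored
        out |= src[sel] << (8 * i)
    return out
```

For example, the selector `0x3210` reproduces `x` unchanged, while `0x7654` reproduces `y`; mixed selectors gather one table byte per result byte in a single instruction.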
lhez 582ef379ab
opencl: fix support ops condition for `rms_norm` (llama/15560) 2025-09-20 13:42:41 +03:00
Ruben Ortlam 335d2a5405
vulkan: fix min subgroup 16 condition for mmid subgroup optimization (llama/15565) 2025-09-20 13:42:41 +03:00
Ihar Hrachyshka 8851ef5463
metal: fix regression when no metal devices are present (llama/15531) 2025-09-20 13:42:41 +03:00
Johannes Gäßler 1e856b2919
CUDA: MoE helper in device code, better tile sizes (llama/15525)
* CUDA: MoE helper in device code, better tile sizes

* reduce superfluous CUDA blocks
2025-09-20 13:42:41 +03:00
Georgi Gerganov 54be54f4ce
metal : add FA kernels for HS=40 (llama/15559)
ggml-ci
2025-09-20 13:42:41 +03:00
Chenguang Li 86331f74e0
CANN: ROPE cache sin/cos repeat (llama/15501)
Signed-off-by: noemotiovon <757486878@qq.com>
2025-09-20 13:42:41 +03:00
Ruben Ortlam ee11ed42a9
vulkan: apply MUL_MAT_ID subgroup optimization to non-coopmat devices (llama/15524)
* vulkan: use subgroup function for mul_mat_id shader even without coopmat

* vulkan: fix compile warnings

* vulkan: properly check for subgroup size control and require full subgroups for subgroup mul_mat_id

* vulkan: disable subgroup mul_mat_id on devices with subgroups < 16
2025-09-20 13:42:41 +03:00
Jeff Bolz 85d4d2c875
vulkan: Support FA with any multiple of 8 head sizes (llama/15537)
The scalar FA shader already handled multiples of 8. The coopmat1 FA
shader assumed 16x16x16 and the shared memory allocations need the HSK
dimensions padded to a multiple of 16. NVIDIA's coopmat2 implementation
requires multiples of 16 for N and K, and needs the matrix dimensions
padded and loads clamped.

Store the FA pipelines in a map, indexed by the pipeline state.
2025-09-20 13:42:40 +03:00
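The padding requirement described above (HSK dimensions rounded up to a multiple of 16 for the coopmat shaders) boils down to a round-up-to-multiple helper. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def pad_to_multiple(n: int, m: int) -> int:
    """Round n up to the next multiple of m (m > 0)."""
    return ((n + m - 1) // m) * m
```

So a head size of 40 would be padded to 48 for a 16-wide coopmat tile, while sizes already on the boundary are left unchanged.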
Ruben Ortlam 8c7872d6ed
vulkan: enable Conv2D for Apple after MoltenVK fixed the bug (llama/15526) 2025-09-20 13:42:40 +03:00
Jeff Bolz 27817867cc
vulkan: workaround MoltenVK compile failure in multi_add (llama/15506)
* vulkan: workaround MoltenVK compile failure in multi_add

* Update ggml/src/ggml-vulkan/vulkan-shaders/multi_add.comp

Co-authored-by: 0cc4m <picard12@live.de>
2025-09-20 13:42:40 +03:00
Johannes Gäßler b0d15e1eb6
CUDA: fix half2 -> half conversion for HIP (llama/15529) 2025-09-20 13:42:40 +03:00
Jeff Bolz 2f6288c33c
vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (llama/15281)
* vulkan: optimize rms_norm, and allow the work to spread across multiple SMs

There are really two parts to this change:
(1) Some optimizations similar to what we have in soft_max, to unroll with
different numbers of iterations.
(2) A fusion optimization where we detect add followed by rms_norm, and make
the add shader atomically accumulate the values^2 into memory. Then the
rms_norm shader can just load that sum. This allows the rms_norm to be
parallelized across multiple workgroups, it just becomes a simple per-element
multiply.

The fusion optimization is currently only applied when the rms_norm is on a
single vector. This previously always ran on a single SM. It could apply more
broadly, but when there are other dimensions the work can already spread across
SMs, and there would be some complexity to tracking multiple atomic sums.

* Change add+rms_norm optimization to write out an array of partial sums
rather than using atomic add, to make it deterministic. The rms_norm
shader fetches a subgroup's worth in parallel and uses subgroupAdd to
add them up.

* complete rebase against fused adds - multi_add shader can also compute partial sums

* fix validation errors

* disable add_rms_fusion for Intel due to possible driver bug

* resolve against #15489, sync after clearing partial sums
2025-09-20 13:42:40 +03:00
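The deterministic variant of the add+rms_norm fusion described above can be sketched in scalar form: each workgroup writes its own partial sum of squares to an array, and the rms_norm pass reduces those partials instead of atomically accumulating. This is a simplified model with invented names, not the shader code:

```python
def partial_sums_of_squares(x, workgroup_size):
    """Each 'workgroup' emits one partial sum of squares over its slice,
    written to a distinct slot so the result is deterministic."""
    return [sum(v * v for v in x[i:i + workgroup_size])
            for i in range(0, len(x), workgroup_size)]

def rms_scale(partials, n, eps=1e-6):
    """The rms_norm pass just reduces the partials and derives the
    per-element scale, so it parallelizes trivially."""
    mean_sq = sum(partials) / n
    return 1.0 / (mean_sq + eps) ** 0.5
```

Because each partial lands in its own slot and is reduced in a fixed order, the result no longer depends on the scheduling of atomic adds.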
Jeff Bolz d8eb9f7d67
vulkan: Rewrite synchronization to allow some overlap between nodes (llama/15489)
Track a list of nodes that need synchronization, and only sync if the new node
depends on them (or overwrites them). This allows some overlap which can
improve performance, and centralizes a big chunk of the synchronization logic.

The remaining synchronization logic involves writes to memory other than the
nodes, e.g. for dequantization or split_k. Each of these allocations has a bool
indicating whether they were in use and need to be synced. This should be
checked before they are written to, and set to true after they are done being
consumed.
2025-09-20 13:42:40 +03:00
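The synchronization scheme above (track unsynchronized producers, barrier only when a new node touches one of them) can be sketched with a small tracker. This is an illustrative model under assumed names, not the Vulkan backend code:

```python
class SyncTracker:
    """Keep the set of outputs written since the last barrier; insert a
    barrier only when a new node reads from, or overwrites, one of them."""
    def __init__(self):
        self.pending = set()   # tensors written but not yet synced
        self.barriers = 0

    def submit(self, node, reads, writes):
        if self.pending & (set(reads) | set(writes)):
            self.barriers += 1          # sync before this node runs
            self.pending.clear()
        self.pending.update(writes)
```

Independent nodes then overlap freely: two writers to distinct tensors need no barrier, and the first consumer of either triggers exactly one.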
Acly 5094171c37
vulkan : support ggml_mean (llama/15393)
* vulkan : support ggml_mean

* vulkan : support sum, sum_rows and mean with non-contiguous tensors

* vulkan : fix subbuffer size not accounting for misalign offset

* tests : add backend-op tests for non-contiguous sum_rows

* cuda : require contiguous src for SUM_ROWS, MEAN support
* sycl : require contiguous src for SUM, SUM_ROWS, ARGSORT support

* require ggml_contiguous_rows in supports_op and expect nb00=1 in the shader
2025-09-20 13:42:40 +03:00
Jeff Bolz 485c5c3b3b
vulkan: optimize mul_mat_id loading row ids into shared memory (llama/15427)
- Spread the work across the whole workgroup. Using more threads seems to
far outweigh the synchronization overhead.
- Specialize the code for when the division is by a power of two.
2025-09-20 13:42:40 +03:00
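The power-of-two specialization mentioned above rests on a standard identity: when the divisor is a power of two, a divide/modulo pair can be replaced by a shift and a mask. A minimal sketch (hypothetical function name; the shader's actual indexing is more involved):

```python
def div_mod(idx: int, ne: int):
    """Return (idx // ne, idx % ne), using shift/mask when ne is a
    power of two -- the cheap path a shader can be specialized for."""
    if ne & (ne - 1) == 0:               # ne is a power of two
        shift = ne.bit_length() - 1
        return idx >> shift, idx & (ne - 1)
    return idx // ne, idx % ne
```

On GPUs integer division is comparatively expensive, so compiling a shader variant that knows the divisor is a power of two avoids it entirely.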
Reese Levine bb5d7e2c31
ggml WebGPU: add support for quantization types (llama/15440)
* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Work on templating for different types in shaders

* Work on shader type generation

* Working q4_0 mul_mat and some templating for different types

* Add q4_0_f16 matmul and fix device init

* Add matmul support for basic quantization types

* Add q2_k and q3_k quantization

* Add rest of k-quants

* Get first i-quant working

* Closer to supporting all i-quants

* Support rest of i-quants

* Cleanup code

* Fix python formatting

* debug

* Bugfix for memset

* Add padding to end of buffers on creation

* Simplify bit-shifting

* Update usage of StringView
2025-09-20 13:42:39 +03:00
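To make the quantized-matmul work above concrete, here is a scalar sketch of dequantizing one q4_0 block, which I believe matches ggml's layout (32 values per block: one scale plus 16 bytes of packed 4-bit quants with an implicit offset of 8) — treat the exact layout as an assumption:

```python
def dequant_q4_0(d: float, qs: bytes):
    """Dequantize one assumed q4_0 block of 32 values: low nibbles are
    elements 0..15, high nibbles elements 16..31, offset by 8, scaled by d."""
    assert len(qs) == 16
    lo = [((b & 0x0F) - 8) * d for b in qs]
    hi = [((b >> 4) - 8) * d for b in qs]
    return lo + hi
```

A WebGPU (or any) matmul kernel for this type does the same unpacking per block before multiply-accumulate, which is why each quantization type needs its own shader template.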
rmatif d7b7498e76
ggml: add `conv3d` op (llama/15182)
* add conv3d

* bump GGML_OP_COUNT
2025-09-20 13:42:39 +03:00
Yavor Ivanov 18ca4e8f63
cuda : add Pad Reflect 1D support (llama/14659)
* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-09-20 13:42:39 +03:00
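For readers unfamiliar with the operation, reflect padding mirrors the signal around its endpoints without repeating the edge element. A minimal reference sketch (not the CUDA kernel):

```python
def pad_reflect_1d(x, pad_left, pad_right):
    """Reflect-pad a 1D sequence without repeating the edge element,
    e.g. [1, 2, 3, 4] with pad 2 each side -> [3, 2, 1, 2, 3, 4, 3, 2].
    Requires pad sizes smaller than len(x)."""
    assert pad_left < len(x) and pad_right < len(x)
    left = [x[i] for i in range(pad_left, 0, -1)]
    right = [x[-2 - i] for i in range(pad_right)]
    return left + list(x) + right
```

The CUDA version maps one thread per output element and computes the mirrored source index directly instead of building lists.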
Aaron Teo 380d3db216
ggml-cpu: Support Q5_0 and Q5_1 on s390x (llama/15486)
* ggml-cpu: initial q5_0 impl for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: updated q5_0 code for better performance

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: use optimised hsum for better performance

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: introduce q5_1 simd + refactor q5_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: fix incorrect return type vec_hsum

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: refactor q5_1

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: q5_1 update loop unroll to 4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: update q5_0 unroll to 4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: update build-s390x docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: update unused variables q5_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: update the last update date

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-20 13:42:39 +03:00
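The q5_0 format that the s390x SIMD code above vectorizes adds a fifth bit per element on top of the q4_0-style nibbles. A scalar sketch of dequantizing one block, with the layout stated as an assumption (scale `d`, a 32-bit field `qh` of high bits, 16 bytes `qs` of nibbles, implicit offset 16):

```python
def dequant_q5_0(d: float, qh: int, qs: bytes):
    """Dequantize one assumed q5_0 block of 32 values: a 4-bit quant from
    qs plus a fifth bit taken from the qh bitfield, offset by 16."""
    assert len(qs) == 16
    out = []
    for i in range(16):        # elements 0..15: low nibbles
        q = (qs[i] & 0x0F) | (((qh >> i) & 1) << 4)
        out.append((q - 16) * d)
    for i in range(16):        # elements 16..31: high nibbles
        q = (qs[i] >> 4) | (((qh >> (i + 16)) & 1) << 4)
        out.append((q - 16) * d)
    return out
```

The per-lane bit insertion is what the commit's `table_b2b_0` activation and vector `hsum` work make cheap on the s390x vector unit.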
Chenguang Li be841c3f6e
CANN: Optimize RMS_NORM using cache (llama/15419)
* [CANN] Optimize RMS_NORM using cache

Signed-off-by: noemotiovon <757486878@qq.com>

* fix typo

Signed-off-by: noemotiovon <757486878@qq.com>

* fix review comment

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
2025-09-20 13:42:39 +03:00
Diego Devesa 554f96f385
sched : fix possible use of wrong ids tensor when offloading moe prompt processing (llama/15488) 2025-09-20 13:42:39 +03:00
Acly 9dd5039968
vulkan : support conv_2d_dw with f16 weights (llama/15392) 2025-09-20 13:42:39 +03:00
Dong Won Kim 7eebd498ff
vulkan: add exp operation (llama/15456)
Co-authored-by: aeseulgi <kim2h7903@gmail.com>
2025-09-20 13:42:39 +03:00
Jeff Bolz 04d0f9a066
vulkan: Reuse conversion results in prealloc_y (llama/15410)
* vulkan: Reuse conversion results in prealloc_y

Cache the pipeline and tensor that were most recently used to fill prealloc_y,
and skip the conversion if the current pipeline/tensor match.

* don't use shared pointer for prealloc_y_last_pipeline_used
2025-09-20 13:42:38 +03:00
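The caching idea above (remember which pipeline/tensor pair last filled `prealloc_y` and skip redundant conversions) can be sketched as a one-entry memo. Names here are invented for illustration:

```python
class ConversionCache:
    """One-entry cache: rerun the conversion only when the requesting
    (pipeline, tensor) pair differs from the one that last filled the
    staging buffer."""
    def __init__(self):
        self.last = None
        self.conversions = 0

    def convert(self, pipeline, tensor):
        key = (pipeline, id(tensor))
        if key != self.last:
            self.conversions += 1     # would dispatch the conversion shader
            self.last = key
```

A single cached entry suffices because consecutive matmuls against the same weights are the common case the commit targets.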
Xuan-Son Nguyen c5874bcf42
ggml : fix condition of im2col on Metal backend (llama/15460) 2025-09-20 13:42:38 +03:00
R0CKSTAR 7c077845fd
musa: add GGML_UNUSED_VARS (llama/15446)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-20 13:42:38 +03:00
Diego Devesa 622dec5bf6
sched : copy only the used experts when offloading prompt processing (llama/15346) 2025-09-20 13:42:38 +03:00
Johannes Gäßler 8f0579a33d
CUDA: refactor FA support/selection code (llama/15454) 2025-09-20 13:42:38 +03:00
Johannes Gäßler 316ed78d68
CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433) 2025-09-20 13:42:38 +03:00
Jeff Bolz 5907ab3e4a
vulkan: shorten pipeline name strings (llama/15431)
These detailed strings were causing increased build time on gcc.
2025-09-20 13:42:38 +03:00
R0CKSTAR 0eb2d653bd
musa: fix build warnings (llama/15258)
* musa: fix build warnings

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-20 13:42:38 +03:00
lhez db1d2380a0
opencl: mark `argsort` unsupported if cols exceed workgroup limit (llama/15375) 2025-09-20 13:42:37 +03:00
SHUAI YANG 2572322bac
CANN: optimize rope operator (llama/15335)
* optimize rope ops

* amendment

* delete trailing whitespace

* change the variable name
2025-09-20 13:42:37 +03:00
R0CKSTAR 02b49af98d
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (llama/15413)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-20 13:42:37 +03:00
Marvin Gießing 2ce5860a62
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (llama/15385)
* Added VSX intrinsics for Power9+ systems

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Manual unrolling for minor perf improvement

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Update ggml/src/ggml-cpu/arch/powerpc/quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-09-20 13:42:37 +03:00
Georgi Gerganov 80447f7412
cuda : remove obsolete sources (ggml/1332)
ggml-ci
2025-09-20 13:42:37 +03:00
Carlos Zoido 44fa2f647c
ggml : Fix MKL detection by quoting BLAS_INCLUDE_DIRS (#3426)
While working on the [whisper-cpp](https://conan.io/center/recipes/whisper-cpp) Conan package for ConanCenter, I noticed that enabling the `with_blas` option causes the build to fail due to an issue in the _MKL_ detection logic.

The problem is that the CMake condition currently expands `BLAS_INCLUDE_DIRS` without quotes:

```cmake
if (${BLAS_INCLUDE_DIRS} MATCHES "mkl" AND (${GGML_BLAS_VENDOR} MATCHES "Generic" OR ${GGML_BLAS_VENDOR} MATCHES "Intel"))
```
When `BLAS_INCLUDE_DIRS` is a list (as Conan provides it), the `if()` command receives multiple arguments and produces a CMake error:

```bash
...
-- BLAS found, Includes: /root/.conan2/p/b/openb034c5a6ca927b/p/include;/root/.conan2/p/b/openb034c5a6ca927b/p/include/openblas
CMake Error at ggml/src/ggml-blas/CMakeLists.txt:77 (if):
  if given arguments:

    "/root/.conan2/p/b/openb034c5a6ca927b/p/include" "/root/.conan2/p/b/openb034c5a6ca927b/p/include/openblas" "MATCHES" "mkl" "AND" "(" "OpenBLAS" "MATCHES" "Generic" "OR" "OpenBLAS" "MATCHES" "Intel" ")"

  Unknown arguments specified
...
```
This PR fixes the issue by quoting the variable:

```cmake
if ("${BLAS_INCLUDE_DIRS}" MATCHES "mkl" AND (${GGML_BLAS_VENDOR} MATCHES "Generic" OR ${GGML_BLAS_VENDOR} MATCHES "Intel"))
```

With this change, the whole list is treated as a single string and the regex still works correctly.
2025-09-19 05:33:53 +02:00
Siva Mahadevan edea8a9c3c
whisper : prefer curl over wget in download scripts (#3409)
On busybox-based systems like Alpine Linux, wget does not have
certain CLI flags such as '--no-config'. Thus, search for the
existence of 'curl' first in the PATH before wget. wget2 is
still the preferred download tool.
2025-09-08 06:32:19 +02:00
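The selection order described above (wget2 preferred, then curl, then plain wget as a last resort on busybox systems) is a simple first-match scan over the PATH. A sketch of the logic, parameterized over the available tools so it is testable; in a script the availability set would come from something like `shutil.which`:

```python
def pick_downloader(available):
    """Return the first available tool in preference order:
    wget2, then curl, then busybox-limited wget."""
    for tool in ("wget2", "curl", "wget"):
        if tool in available:
            return tool
    return None
```

This mirrors the shell-script pattern of probing `command -v` for each candidate in turn.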
Daniel Bevenius bb0e1fc60f
ci : remove brew installation of cmake for macos-latest (#3408)
This commit removes the brew installation of cmake for macos-latest,
as cmake now seems to be pre-installed on the runner.

The motivation for this is that this job is failing with the following
error:
```console
Error: cmake was installed from the local/pinned tap
but you are trying to install it from the homebrew/core tap.
Formulae with the same name from different taps cannot be installed at the same time.
```
2025-09-05 15:20:32 +02:00
Daniel Bevenius 9bfc535130
tests : use CMake definitions for model/sample paths (#3406)
This commit modifies the test-vad and test-vad-full tests to use CMake
definitions for the model and sample paths.

The motivation for this is that currently the tests use relative paths
which might not always be correct depending on the working directory.
With the changes in this commit the tests can be run using ctest:
```console
$ ctest -R ^test-vad$ --test-dir build
```
Or directly (which is not currently possible without this fix):
```
./build/bin/test-vad
```

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3404
2025-09-04 15:08:30 +02:00
Treboko 7745fcf328
Handle negative value in padding (#3389)
This might happen depending on how `$stderr.winsize` is defined: if the expression `$stderr.winsize[1] - line.size` on line 114 becomes negative, the padding calculation raises a "negative argument" exception.
2025-08-25 01:34:23 +09:00
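The fix amounts to clamping the padding width at zero before constructing the pad string, since repeating a string a negative number of times is what raises the exception. A language-neutral sketch of the corrected calculation:

```python
def padding_for(width: int, line: str) -> str:
    """Right-padding clamped to zero, so a terminal narrower than the
    line yields an empty pad instead of a negative repeat count."""
    return " " * max(width - len(line), 0)
```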
Thea Mukhi c09b0e0c4c
models : update `./models/download-ggml-model.cmd` to allow for tdrz download (#3381)
* added patch to cmd to allow for tdrz download

* remove @signs

* Update models/download-ggml-model.cmd

Add missing closing double quote.

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-08-24 11:52:33 +02:00
Georgi Gerganov fc45bb8625 talk-llama : sync llama.cpp
ggml-ci
2025-08-18 20:30:45 +03:00
Georgi Gerganov 33c3c2fe2e sync : ggml 2025-08-18 20:30:45 +03:00
Reese Levine 5ed45b2518 ggml: Add initial WebGPU backend (llama/14521)
ggml-ci
2025-08-18 20:30:45 +03:00
Aaron Teo 03d6607691 ggml : initial zDNN backend (llama/14975) 2025-08-18 20:30:45 +03:00