whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Jeff Bolz	485c5c3b3b	vulkan: optimize mul_mat_id loading row ids into shared memory (llama/15427) - Spread the work across the whole workgroup. Using more threads seems to far outweigh the synchronization overhead. - Specialize the code for when the division is by a power of two.	2025-09-20 13:42:40 +03:00
Reese Levine	bb5d7e2c31	ggml WebGPU: add support for quantization types (llama/15440) * Begin work on set_rows * Work on set rows * Add error buffers for reporting unsupported SET_ROWS indices * Remove extra comments * Work on templating for different types in shaders * Work on shader type generation * Working q4_0 mul_mat and some templating for different types * Add q4_0_f16 matmul and fix device init * Add matmul support for basic quantization types * Add q2_k and q3_k quantization * Add rest of k-quants * Get firt i-quant working * Closer to supporting all i-quants * Support rest of i-quants * Cleanup code * Fix python formatting * debug * Bugfix for memset * Add padding to end of buffers on creation * Simplify bit-shifting * Update usage of StringView	2025-09-20 13:42:39 +03:00
rmatif	d7b7498e76	ggml: add `conv3d` op (llama/15182) * add conv3d * bump GGML_OP_COUNT	2025-09-20 13:42:39 +03:00
Yavor Ivanov	18ca4e8f63	cuda : add Pad Reflect 1D support (llama/14659) * Add Pad Reflect 1D CUDA support * Update ggml/src/ggml-cuda/pad_reflect_1d.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-09-20 13:42:39 +03:00
Aaron Teo	380d3db216	ggml-cpu: Support Q5_0 and Q5_1 on s390x (llama/15486) * ggml-cpu: initial q5_0 impl for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: updated q5_0 code for better performance Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: use optimised hsum for better performance Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: introduce q5_1 simd + refactor q5_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix incorrect return type vec_hsum Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: refactor q5_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: q5_1 update loop unroll to 4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update q5_0 unroll to 4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update build-s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update unused variables q5_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update the last update date Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-09-20 13:42:39 +03:00
Chenguang Li	be841c3f6e	CANN: Optimize RMS_NORM using cache (llama/15419) * [CANN] Optimize RMS_NORM using cache Signed-off-by: noemotiovon <757486878@qq.com> * fix typo Signed-off-by: noemotiovon <757486878@qq.com> * fix review comment Signed-off-by: noemotiovon <757486878@qq.com> * codestyle adjustment Signed-off-by: noemotiovon <757486878@qq.com> --------- Signed-off-by: noemotiovon <757486878@qq.com>	2025-09-20 13:42:39 +03:00
Diego Devesa	554f96f385	sched : fix possible use of wrong ids tensor when offloading moe prompt processing (llama/15488)	2025-09-20 13:42:39 +03:00
Acly	9dd5039968	vulkan : support conv_2d_dw with f16 weights (llama/15392)	2025-09-20 13:42:39 +03:00
Dong Won Kim	7eebd498ff	vulkan: add exp operation (llama/15456) Co-authored-by: aeseulgi <kim2h7903@gmail.com>	2025-09-20 13:42:39 +03:00
Jeff Bolz	04d0f9a066	vulkan: Reuse conversion results in prealloc_y (llama/15410) * vulkan: Reuse conversion results in prealloc_y Cache the pipeline and tensor that were most recently used to fill prealloc_y, and skip the conversion if the current pipeline/tensor match. * don't use shared pointer for prealloc_y_last_pipeline_used	2025-09-20 13:42:38 +03:00
Xuan-Son Nguyen	c5874bcf42	ggml : fix condition of im2col on Metal backend (llama/15460)	2025-09-20 13:42:38 +03:00
R0CKSTAR	7c077845fd	musa: add GGML_UNUSED_VARS (llama/15446) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-09-20 13:42:38 +03:00
Diego Devesa	622dec5bf6	sched : copy only the used experts when offloading prompt processing (llama/15346)	2025-09-20 13:42:38 +03:00
Johannes Gäßler	8f0579a33d	CUDA: refactor FA support/selection code (llama/15454)	2025-09-20 13:42:38 +03:00
Johannes Gäßler	316ed78d68	CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433)	2025-09-20 13:42:38 +03:00
Jeff Bolz	5907ab3e4a	vulkan: shorten pipeline name strings (llama/15431) These detailed strings were causing increased build time on gcc.	2025-09-20 13:42:38 +03:00
R0CKSTAR	0eb2d653bd	musa: fix build warnings (llama/15258) * musa: fix build warnings Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare] Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-09-20 13:42:38 +03:00
lhez	db1d2380a0	opencl: mark `argsort` unsupported if cols exceed workgroup limit (llama/15375)	2025-09-20 13:42:37 +03:00
SHUAI YANG	2572322bac	CANN: optimize rope operator (llama/15335) * optimize rope ops * amendment * delete trailing whitespace * change the variable name	2025-09-20 13:42:37 +03:00
R0CKSTAR	02b49af98d	musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (llama/15413) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-09-20 13:42:37 +03:00
Marvin Gießing	2ce5860a62	ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (llama/15385) * Added VSX intrinsics for Power9+ systems Signed-off-by: mgiessing <marvin.giessing@gmail.com> * Manual unrolling for minor perf improvement Signed-off-by: mgiessing <marvin.giessing@gmail.com> * Update ggml/src/ggml-cpu/arch/powerpc/quants.c Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: mgiessing <marvin.giessing@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-09-20 13:42:37 +03:00
Georgi Gerganov	80447f7412	cuda : remove obsolete sources (ggml/1332) ggml-ci	2025-09-20 13:42:37 +03:00
Carlos Zoido	44fa2f647c	ggml : Fix MKL detection by quoting BLAS_INCLUDE_DIRS (#3426 ) While working on the [whisper-cpp](https://conan.io/center/recipes/whisper-cpp) Conan package for ConanCenter, I noticed that enabling the `with_blas` option fails to build due to an issue in the _MKL_ detection logic. The problem is that the CMake condition currently expands `BLAS_INCLUDE_DIRS` without quotes: ```cmake if (${BLAS_INCLUDE_DIRS} MATCHES "mkl" AND (${GGML_BLAS_VENDOR} MATCHES "Generic" OR ${GGML_BLAS_VENDOR} MATCHES "Intel")) ``` When `BLAS_INCLUDE_DIRS` is a list (as Conan provides it), the `if()` command receives multiple arguments and produces a CMake error: ```bash ... -- BLAS found, Includes: /root/.conan2/p/b/openb034c5a6ca927b/p/include;/root/.conan2/p/b/openb034c5a6ca927b/p/include/openblas CMake Error at ggml/src/ggml-blas/CMakeLists.txt:77 (if): if given arguments: "/root/.conan2/p/b/openb034c5a6ca927b/p/include" "/root/.conan2/p/b/openb034c5a6ca927b/p/include/openblas" "MATCHES" "mkl" "AND" "(" "OpenBLAS" "MATCHES" "Generic" "OR" "OpenBLAS" "MATCHES" "Intel" ")" Unknown arguments specified ... ``` This PR fixes the issue by quoting the variable: ```cmake if ("${BLAS_INCLUDE_DIRS}" MATCHES "mkl" AND (${GGML_BLAS_VENDOR} MATCHES "Generic" OR ${GGML_BLAS_VENDOR} MATCHES "Intel")) ``` With this change, the whole list is treated as a single string and the regex still works correctly.	2025-09-19 05:33:53 +02:00
Siva Mahadevan	edea8a9c3c	whisper : prefer curl over wget in download scripts (#3409 ) On busybox-based systems like Alpine Linux, wget does not have certain CLI flags such as '--no-config'. Thus, search for the existence of 'curl' first in the PATH before wget. wget2 is still the preferred download tool.	2025-09-08 06:32:19 +02:00
Daniel Bevenius	bb0e1fc60f	ci : remove brew installation of cmake for macos-latest (#3408 ) This commit remove the brew install of cmake for macos-latest as this now seems to be pre-installed on the runner. The motivation for this is that this job is failing with the following error: ```console Error: cmake was installed from the local/pinned tap but you are trying to install it from the homebrew/core tap. Formulae with the same name from different taps cannot be installed at the same time. ```	2025-09-05 15:20:32 +02:00
Daniel Bevenius	9bfc535130	tests : use CMake definitions for model/sample paths (#3406 ) This commit modifies the test-vad and test-vad-full tests to use CMake definitions for the model and sample paths. The motivation for this is that currently the tests use relative paths which might not always be correct depending on the working directory. With the changes in this commit the tests can be run usins ctest: ```console $ ctest -R ^test-vad$ --test-dir build ``` Or directly (which is not currently possible without this fix): ``` ./build/bin/test-vad ``` Resolves: https://github.com/ggml-org/whisper.cpp/issues/3404	2025-09-04 15:08:30 +02:00
Treboko	7745fcf328	Handle negative value in padding (#3389 ) this might happen depending on the way the $stderr.winsize is defined. If the expression "$stderr.winsize[1] - line.size" in Line 114 gets negative, we will get a "negative argument" exception in the padding calculation	2025-08-25 01:34:23 +09:00
Thea Mukhi	c09b0e0c4c	models : update`./models/download-ggml-model.cmd` to allow for tdrz download (#3381 ) * added patch to cmd to allow for tdrz download * remove @signs * Update models/download-ggml-model.cmd Add missing closing double quote. --------- Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2025-08-24 11:52:33 +02:00
Georgi Gerganov	fc45bb8625	talk-llama : sync llama.cpp ggml-ci	2025-08-18 20:30:45 +03:00
Georgi Gerganov	33c3c2fe2e	sync : ggml	2025-08-18 20:30:45 +03:00
Reese Levine	5ed45b2518	ggml: Add initial WebGPU backend (llama/14521) ggml-ci	2025-08-18 20:30:45 +03:00
Aaron Teo	03d6607691	ggml : initial zDNN backend (llama/14975)	2025-08-18 20:30:45 +03:00
Georgi Gerganov	7fd2fbde45	common : handle mxfp4 enum ggml-ci	2025-08-18 20:30:45 +03:00
compilade	0fd4a250df	ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (llama/15379) * ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors * ggml-quants : avoid division by zero in make_q3_quants	2025-08-18 20:30:45 +03:00
Jeff Bolz	fcd694ec1a	vulkan: disable spirv-opt for bfloat16 shaders (llama/15352)	2025-08-18 20:30:45 +03:00
Jeff Bolz	6835e0cf77	vulkan: Use larger workgroups for mul_mat_vec when M is small (llama/15355) * vulkan: Use larger workgroups for mul_mat_vec when M is small Also use subgroup instructions for (part of) the reduction when supported. Without this, the more expensive reductions would eat into the benefits of the larger workgroups. * update heuristic for amd/intel Co-authored-by: 0cc4m <picard12@live.de> --------- Co-authored-by: 0cc4m <picard12@live.de>	2025-08-18 20:30:45 +03:00
Dong Won Kim	c225f25907	vulkan: support sqrt (llama/15370)	2025-08-18 20:30:45 +03:00
Jeff Bolz	0a8285186a	vulkan: Optimize argsort (llama/15354) - Launch an appropriate number of invocations (next larger power of two). 32 invocations is common and the barrier is much cheaper there. - Specialize for "needs bounds checking" vs not. - Make the code less branchy and [[unroll]] the loops. In the final code, I see no branches inside the main loop (only predicated stores) when needs_bounds_check is false. - Always sort ascending, then apply the ascending vs descending option when doing the final stores to memory. - Copy the values into shared memory, makes them slightly cheaper to access.	2025-08-18 20:30:45 +03:00
Jeff Bolz	c44d449635	vulkan: fuse adds (llama/15252) * vulkan: fuse adds Fuse adds that have the same shape, which are common in MoE models. It will currently fuse up to 6 adds, because we assume no more than 8 descriptors per dispatch. But this could be changed. * check runtimeDescriptorArray feature * disable multi_add for Intel due to likely driver bug	2025-08-18 20:30:45 +03:00
Jeff Bolz	d14e626e6a	vulkan: Support mul_mat_id with f32 accumulators (llama/15337) * vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id * vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up - There's no explicit way to request f32 precision for mul_mat_id, but there probably should be, and this gets the code in place for that. - A couple fixes to check_results. - Remove casts to fp16 in coopmat1 FA shader (found by inspection).	2025-08-18 20:30:45 +03:00
Jeff Bolz	5b62995350	vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (llama/15334)	2025-08-18 20:30:45 +03:00
rmatif	e27f4f205d	OpenCL: add initial FA support (llama/14987) * add F16/F16 fa support * fix kernel init * use mad instead of fma * use inline function * mark FA with sinks as unsupported for now * add pragma unroll to loops	2025-08-18 20:30:45 +03:00
lhez	77771b2711	opencl: add initial mxfp4 support via mv (llama/15270) * opencl: add reference `mul_mv_mxfp4_f32` * opencl: add reference `mul_mv_id` for mxfp4 * Q4_0 tranpose fix for Adreno --------- Co-authored-by: shawngu-quic <shawngu@qti.qualcomm.com>	2025-08-18 20:30:45 +03:00
Georgi Gerganov	1e8d692365	vulkan : fix out-of-bounds access in argmax kernel (llama/15342) ggml-ci	2025-08-18 20:30:45 +03:00
Georgi Gerganov	1a92fde1b6	vulkan : fix compile warnings on macos (llama/15340) ggml-ci	2025-08-18 20:30:45 +03:00
Aaron Teo	f797a6f9c8	ggml: initial IBM zDNN backend (llama/14975) * ggml-zdnn: inital backend impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: temp change z17 to arch15 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: fix build bugs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: tensor->extra logging check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add layout name mapping, ztensor information Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: separate logging into its own line Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add shape comparison Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add ggml_tensor shape log Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: fix incorrect shape logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add output buffer check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: run compute and store into tensor->extra Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more loggers Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update set_tensor logging to check only for matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: last working matmul version Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add comments to prevent accidentally deleting lines Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: support op out_prod Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update op out_prod to use tensor->extra Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rewrite the backend implementation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bugfix new impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix compiler warnings and bugfixes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: test ztensor finding in init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: implement at least 1 op to test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: assign tensor->extra to buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add check for view tensors to prevent init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rework init_tensor to create new buffers Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to std vector instead of array Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch buffers back and set to arbitrary number Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: impl init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update supports_op matmul matrix Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix incorrect ztensor shape, reduce memory padding Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: impl matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix compiler error missing type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing data transform call Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: tighten memory usage, change string allocation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias ztensor and data free Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias data transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more debug info for extra buffer transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add logger to check if mat mul ops go through set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: activate bias transform in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move weights transform into mulmat Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more safeguards in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix sequencing of transforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bugfix transform ztensor vs origtensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: figure out why sigtrap is happening Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move everything back to local declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move bias data to local also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bring back working matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rewrite into mre Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing vector import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing vector import in header Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt to fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix invalid ztensor buffer release Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add logging to debug free buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: remove free_buffer debug info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add parmblkformat detections Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add nnpa installed detection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add zdnn_init call for static libs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at fixing invalid buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to using deque to fix pointer deref problem Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add weights logging to check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt to use unique ptr Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add tensor to pre_tfm_desc logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add inputs logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable op_none initialisation for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing return from init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: load ztensors in cgraph exec Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: work on moving output ztensor as well Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable logging and breakpoints for full test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at manually changing the layout Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at using default nwhc format instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable global load ztensor for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix errorenous output load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add guards to prevent loading ztensor if transformed Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bring load ztensor back to init routine Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix ztensor deallocation abort stabilise ggml <-> zdnn api Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: clean up matmul selection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: clean up project structure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update documentation, prepare for upstream Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * chore: add codeowners Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable batched matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at fixing tensor views during matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: deny all view tensors directly Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix pr comments Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update ops docs for zdnn Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: redo test-backend-ops for ops.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix typo in build-s390x.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * codeowners: remove taronaeo for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "codeowners: remove taronaeo for now" This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f. * ggml-zdnn: remove unused ggml_zdnn macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-08-18 20:30:45 +03:00
Johannes Gäßler	ba32f5df0a	CUDA: fix negative KV_max values in FA (llama/15321)	2025-08-18 20:30:45 +03:00
uvos	0e15332255	HIP: Cleanup hipification header (llama/15285) add expicit conversion operator to support older versions of rocm Switch over to hip_bf16 from legacy hip_bfloat16 Simplify RDNA3 define Reduce swap over of new hipblas api to rocm 6.5 as this version is used for rocm 7.0 previews --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-08-18 20:30:45 +03:00
Jeff Bolz	1d8b21caa0	vulkan: perf_logger improvements (llama/15246) * vulkan: perf_logger improvements - Account for batch dimension in flops calculation. - Fix how "_VEC" is detected for mat_mul_id. - Fix "n" dimension for mat_mul_id (in case of broadcasting). - Include a->type in name. * use <=mul_mat_vec_max_cols rather than ==1	2025-08-18 20:30:45 +03:00
Jason Ni	4a6cf896ad	ggml: fix ggml_conv_1d_dw bug (ggml/1323) * ggml: fix ggml_conv_1d_dw bug * Fixed conv1d_dw weight tensor dimension.	2025-08-18 20:30:45 +03:00

1 2 3 4 5 ...

3192 Commits All Branches Search

3192 Commits

All Branches