Commit Graph

3192 Commits

Author SHA1 Message Date
Jeff Bolz 485c5c3b3b
vulkan: optimize mul_mat_id loading row ids into shared memory (llama/15427)
- Spread the work across the whole workgroup. Using more threads seems to
far outweigh the synchronization overhead.
- Specialize the code for when the division is by a power of two.
2025-09-20 13:42:40 +03:00
Reese Levine bb5d7e2c31
ggml WebGPU: add support for quantization types (llama/15440)
* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Work on templating for different types in shaders

* Work on shader type generation

* Working q4_0 mul_mat and some templating for different types

* Add q4_0_f16 matmul and fix device init

* Add matmul support for basic quantization types

* Add q2_k and q3_k quantization

* Add rest of k-quants

* Get firt i-quant working

* Closer to supporting all i-quants

* Support rest of i-quants

* Cleanup code

* Fix python formatting

* debug

* Bugfix for memset

* Add padding to end of buffers on creation

* Simplify bit-shifting

* Update usage of StringView
2025-09-20 13:42:39 +03:00
rmatif d7b7498e76
ggml: add `conv3d` op (llama/15182)
* add conv3d

* bump GGML_OP_COUNT
2025-09-20 13:42:39 +03:00
Yavor Ivanov 18ca4e8f63
cuda : add Pad Reflect 1D support (llama/14659)
* Add Pad Reflect 1D CUDA support

* Update ggml/src/ggml-cuda/pad_reflect_1d.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-09-20 13:42:39 +03:00
Aaron Teo 380d3db216
ggml-cpu: Support Q5_0 and Q5_1 on s390x (llama/15486)
* ggml-cpu: initial q5_0 impl for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: updated q5_0 code for better performance

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: use optimised hsum for better performance

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: introduce q5_1 simd + refactor q5_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: fix incorrect return type vec_hsum

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: refactor q5_1

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: q5_1 update loop unroll to 4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: update q5_0 unroll to 4

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: update build-s390x docs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-cpu: update unused variables q5_0

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: update the last update date

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-20 13:42:39 +03:00
Chenguang Li be841c3f6e
CANN: Optimize RMS_NORM using cache (llama/15419)
* [CANN] Optimize RMS_NORM using cache

Signed-off-by: noemotiovon <757486878@qq.com>

* fix typo

Signed-off-by: noemotiovon <757486878@qq.com>

* fix review comment

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
2025-09-20 13:42:39 +03:00
Diego Devesa 554f96f385
sched : fix possible use of wrong ids tensor when offloading moe prompt processing (llama/15488) 2025-09-20 13:42:39 +03:00
Acly 9dd5039968
vulkan : support conv_2d_dw with f16 weights (llama/15392) 2025-09-20 13:42:39 +03:00
Dong Won Kim 7eebd498ff
vulkan: add exp operation (llama/15456)
Co-authored-by: aeseulgi <kim2h7903@gmail.com>
2025-09-20 13:42:39 +03:00
Jeff Bolz 04d0f9a066
vulkan: Reuse conversion results in prealloc_y (llama/15410)
* vulkan: Reuse conversion results in prealloc_y

Cache the pipeline and tensor that were most recently used to fill prealloc_y,
and skip the conversion if the current pipeline/tensor match.

* don't use shared pointer for prealloc_y_last_pipeline_used
2025-09-20 13:42:38 +03:00
Xuan-Son Nguyen c5874bcf42
ggml : fix condition of im2col on Metal backend (llama/15460) 2025-09-20 13:42:38 +03:00
R0CKSTAR 7c077845fd
musa: add GGML_UNUSED_VARS (llama/15446)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-20 13:42:38 +03:00
Diego Devesa 622dec5bf6
sched : copy only the used experts when offloading prompt processing (llama/15346) 2025-09-20 13:42:38 +03:00
Johannes Gäßler 8f0579a33d
CUDA: refactor FA support/selection code (llama/15454) 2025-09-20 13:42:38 +03:00
Johannes Gäßler 316ed78d68
CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433) 2025-09-20 13:42:38 +03:00
Jeff Bolz 5907ab3e4a
vulkan: shorten pipeline name strings (llama/15431)
These detailed strings were causing increased build time on gcc.
2025-09-20 13:42:38 +03:00
R0CKSTAR 0eb2d653bd
musa: fix build warnings (llama/15258)
* musa: fix build warnings

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-20 13:42:38 +03:00
lhez db1d2380a0
opencl: mark `argsort` unsupported if cols exceed workgroup limit (llama/15375) 2025-09-20 13:42:37 +03:00
SHUAI YANG 2572322bac
CANN: optimize rope operator (llama/15335)
* optimize rope ops

* amendment

* delete trailing whitespace

* change the variable name
2025-09-20 13:42:37 +03:00
R0CKSTAR 02b49af98d
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (llama/15413)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-20 13:42:37 +03:00
Marvin Gießing 2ce5860a62
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (llama/15385)
* Added VSX intrinsics for Power9+ systems

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Manual unrolling for minor perf improvement

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Update ggml/src/ggml-cpu/arch/powerpc/quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-09-20 13:42:37 +03:00
Georgi Gerganov 80447f7412
cuda : remove obsolete sources (ggml/1332)
ggml-ci
2025-09-20 13:42:37 +03:00
Carlos Zoido 44fa2f647c
ggml : Fix MKL detection by quoting BLAS_INCLUDE_DIRS (#3426)
While working on the [whisper-cpp](https://conan.io/center/recipes/whisper-cpp) Conan package for ConanCenter, I noticed that enabling the `with_blas` option fails to build due to an issue in the _MKL_ detection logic.  

The problem is that the CMake condition currently expands `BLAS_INCLUDE_DIRS` without quotes:

```cmake
if (${BLAS_INCLUDE_DIRS} MATCHES "mkl" AND (${GGML_BLAS_VENDOR} MATCHES "Generic" OR ${GGML_BLAS_VENDOR} MATCHES "Intel"))
```
When `BLAS_INCLUDE_DIRS` is a list (as Conan provides it), the `if()` command receives multiple arguments and produces a CMake error:

```bash
...
-- BLAS found, Includes: /root/.conan2/p/b/openb034c5a6ca927b/p/include;/root/.conan2/p/b/openb034c5a6ca927b/p/include/openblas
CMake Error at ggml/src/ggml-blas/CMakeLists.txt:77 (if):
  if given arguments:

    "/root/.conan2/p/b/openb034c5a6ca927b/p/include" "/root/.conan2/p/b/openb034c5a6ca927b/p/include/openblas" "MATCHES" "mkl" "AND" "(" "OpenBLAS" "MATCHES" "Generic" "OR" "OpenBLAS" "MATCHES" "Intel" ")"

  Unknown arguments specified
...
```
This PR fixes the issue by quoting the variable:

```cmake
if ("${BLAS_INCLUDE_DIRS}" MATCHES "mkl" AND (${GGML_BLAS_VENDOR} MATCHES "Generic" OR ${GGML_BLAS_VENDOR} MATCHES "Intel"))
```

With this change, the whole list is treated as a single string and the regex still works correctly.
2025-09-19 05:33:53 +02:00
Siva Mahadevan edea8a9c3c
whisper : prefer curl over wget in download scripts (#3409)
On busybox-based systems like Alpine Linux, wget does not have
certain CLI flags such as '--no-config'. Thus, search for the
existence of 'curl' first in the PATH before wget. wget2 is
still the preferred download tool.
2025-09-08 06:32:19 +02:00
Daniel Bevenius bb0e1fc60f
ci : remove brew installation of cmake for macos-latest (#3408)
This commit remove the brew install of cmake for macos-latest
as this now seems to be pre-installed on the runner.

The motivation for this is that this job is failing with the following
error:
```console
Error: cmake was installed from the local/pinned tap
but you are trying to install it from the homebrew/core tap.
Formulae with the same name from different taps cannot be installed at the same time.
```
2025-09-05 15:20:32 +02:00
Daniel Bevenius 9bfc535130
tests : use CMake definitions for model/sample paths (#3406)
This commit modifies the test-vad and test-vad-full tests to use CMake
definitions for the model and sample paths.

The motivation for this is that currently the tests use relative paths
which might not always be correct depending on the working directory.
With the changes in this commit the tests can be run usins ctest:
```console
$ ctest -R ^test-vad$ --test-dir build
```
Or directly (which is not currently possible without this fix):
```
./build/bin/test-vad
```

Resolves: https://github.com/ggml-org/whisper.cpp/issues/3404
2025-09-04 15:08:30 +02:00
Treboko 7745fcf328
Handle negative value in padding (#3389)
this might happen depending on the way the $stderr.winsize is defined. If the expression "$stderr.winsize[1] - line.size" in Line 114 gets negative, we will get a "negative argument" exception in the padding calculation
2025-08-25 01:34:23 +09:00
Thea Mukhi c09b0e0c4c
models : update`./models/download-ggml-model.cmd` to allow for tdrz download (#3381)
* added patch to cmd to allow for tdrz download

* remove @signs

* Update models/download-ggml-model.cmd

Add missing closing double quote.

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-08-24 11:52:33 +02:00
Georgi Gerganov fc45bb8625 talk-llama : sync llama.cpp
ggml-ci
2025-08-18 20:30:45 +03:00
Georgi Gerganov 33c3c2fe2e sync : ggml 2025-08-18 20:30:45 +03:00
Reese Levine 5ed45b2518 ggml: Add initial WebGPU backend (llama/14521)
ggml-ci
2025-08-18 20:30:45 +03:00
Aaron Teo 03d6607691 ggml : initial zDNN backend (llama/14975) 2025-08-18 20:30:45 +03:00
Georgi Gerganov 7fd2fbde45 common : handle mxfp4 enum
ggml-ci
2025-08-18 20:30:45 +03:00
compilade 0fd4a250df ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (llama/15379)
* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors

* ggml-quants : avoid division by zero in make_q3_quants
2025-08-18 20:30:45 +03:00
Jeff Bolz fcd694ec1a vulkan: disable spirv-opt for bfloat16 shaders (llama/15352) 2025-08-18 20:30:45 +03:00
Jeff Bolz 6835e0cf77 vulkan: Use larger workgroups for mul_mat_vec when M is small (llama/15355)
* vulkan: Use larger workgroups for mul_mat_vec when M is small

Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.

* update heuristic for amd/intel

Co-authored-by: 0cc4m <picard12@live.de>

---------

Co-authored-by: 0cc4m <picard12@live.de>
2025-08-18 20:30:45 +03:00
Dong Won Kim c225f25907 vulkan: support sqrt (llama/15370) 2025-08-18 20:30:45 +03:00
Jeff Bolz 0a8285186a vulkan: Optimize argsort (llama/15354)
- Launch an appropriate number of invocations (next larger power of two).
32 invocations is common and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs descending option when
doing the final stores to memory.
- Copy the values into shared memory, makes them slightly cheaper to access.
2025-08-18 20:30:45 +03:00
Jeff Bolz c44d449635 vulkan: fuse adds (llama/15252)
* vulkan: fuse adds

Fuse adds that have the same shape, which are common in MoE models.
It will currently fuse up to 6 adds, because we assume no more than
8 descriptors per dispatch. But this could be changed.

* check runtimeDescriptorArray feature

* disable multi_add for Intel due to likely driver bug
2025-08-18 20:30:45 +03:00
Jeff Bolz d14e626e6a vulkan: Support mul_mat_id with f32 accumulators (llama/15337)
* vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id

* vulkan: Support mul_mat_id with f32 accumulators, but they are not hooked up

- There's no explicit way to request f32 precision for mul_mat_id, but there
probably should be, and this gets the code in place for that.
- A couple fixes to check_results.
- Remove casts to fp16 in coopmat1 FA shader (found by inspection).
2025-08-18 20:30:45 +03:00
Jeff Bolz 5b62995350 vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (llama/15334) 2025-08-18 20:30:45 +03:00
rmatif e27f4f205d OpenCL: add initial FA support (llama/14987)
* add F16/F16 fa support

* fix kernel init

* use mad instead of fma

* use inline function

* mark FA with sinks as unsupported for now

* add pragma unroll to loops
2025-08-18 20:30:45 +03:00
lhez 77771b2711 opencl: add initial mxfp4 support via mv (llama/15270)
* opencl: add reference `mul_mv_mxfp4_f32`

* opencl: add reference `mul_mv_id` for mxfp4

* Q4_0 tranpose fix for Adreno

---------

Co-authored-by: shawngu-quic <shawngu@qti.qualcomm.com>
2025-08-18 20:30:45 +03:00
Georgi Gerganov 1e8d692365 vulkan : fix out-of-bounds access in argmax kernel (llama/15342)
ggml-ci
2025-08-18 20:30:45 +03:00
Georgi Gerganov 1a92fde1b6 vulkan : fix compile warnings on macos (llama/15340)
ggml-ci
2025-08-18 20:30:45 +03:00
Aaron Teo f797a6f9c8 ggml: initial IBM zDNN backend (llama/14975)
* ggml-zdnn: inital backend impl

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: temp change z17 to arch15

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: fix build bugs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: tensor->extra logging check

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: add layout name mapping, ztensor information

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: separate logging into its own line

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: add shape comparison

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: add ggml_tensor shape log

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: fix incorrect shape logging

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add output buffer check

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: run compute and store into tensor->extra

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add set_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add more loggers

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update set_tensor logging to check only for matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: last working matmul version

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add comments to prevent accidentally deleting lines

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: support op out_prod

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update op out_prod to use tensor->extra

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rewrite the backend implementation

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: bugfix new impl

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix compiler warnings and bugfixes

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: test ztensor finding in init_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: implement at least 1 op to test

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: assign tensor->extra to buffer

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add check for view tensors to prevent init_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rework init_tensor to create new buffers

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch to std vector instead of array

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch buffers back and set to arbitrary number

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: impl init_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update supports_op matmul matrix

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code clean up

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: impl matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix compiler error missing type

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix missing data transform call

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add bias init_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: tighten memory usage, change string allocation

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add bias ztensor and data free

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add bias data transform

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add more debug info for extra buffer transform

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add logger to check if mat mul ops go through set_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: activate bias transform in matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move weights transform into mulmat

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add more safeguards in matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix sequencing of transforms

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: bugfix transform ztensor vs origtensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: figure out why sigtrap is happening

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix sigsegv

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move everything back to local declaration

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: move bias data to local also

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: bring back working matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: rewrite into mre

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix missing vector import

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix missing vector import in header

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: attempt to fix sigsegv

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix missing load tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix invalid ztensor buffer release

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add logging to debug free buffer

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: remove free_buffer debug info

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add parmblkformat detections

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add nnpa installed detection

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add zdnn_init call for static libs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add init_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: attempt at fixing invalid buffer

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: switch to using deque to fix pointer deref problem

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add weights logging to check

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: attempt to use unique ptr

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add tensor to pre_tfm_desc logging

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add inputs logging

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: disable op_none initialisation for testing

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix missing return from init_tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: load ztensors in cgraph exec

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: work on moving output ztensor as well

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: disable logging and breakpoints for full test

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: attempt at manually changing the layout

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: attempt at using default nwhc format instead

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: disable global load ztensor for now

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix errorenous output load tensor

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: add guards to prevent loading ztensor if transformed

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code cleanup

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: bring load ztensor back to init routine

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: code clean up

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix ztensor deallocation abort

stabilise ggml <-> zdnn api

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: clean up matmul selection

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: clean up project structure

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: update documentation, prepare for upstream

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* chore: add codeowners

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: disable batched matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: attempt at fixing tensor views during matmul

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: deny all view tensors directly

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix pr comments

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* docs: update ops docs for zdnn

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: redo test-backend-ops for ops.md

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ggml-zdnn: fix typo in build-s390x.md

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* codeowners: remove taronaeo for now

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Revert "codeowners: remove taronaeo for now"

This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f.

* ggml-zdnn: remove unused ggml_zdnn macro

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-18 20:30:45 +03:00
Johannes Gäßler ba32f5df0a CUDA: fix negative KV_max values in FA (llama/15321) 2025-08-18 20:30:45 +03:00
uvos 0e15332255 HIP: Cleanup hipification header (llama/15285)
add expicit conversion operator to support older versions of rocm
Switch over to hip_bf16 from legacy hip_bfloat16
Simplify RDNA3 define
Reduce swap over of new hipblas api to rocm 6.5 as this version is used for rocm 7.0 previews

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-08-18 20:30:45 +03:00
Jeff Bolz 1d8b21caa0 vulkan: perf_logger improvements (llama/15246)
* vulkan: perf_logger improvements

- Account for batch dimension in flops calculation.
- Fix how "_VEC" is detected for mat_mul_id.
- Fix "n" dimension for mat_mul_id (in case of broadcasting).
- Include a->type in name.

* use <=mul_mat_vec_max_cols rather than ==1
2025-08-18 20:30:45 +03:00
Jason Ni 4a6cf896ad ggml: fix ggml_conv_1d_dw bug (ggml/1323)
* ggml: fix ggml_conv_1d_dw bug

* Fixed conv1d_dw weight tensor dimension.
2025-08-18 20:30:45 +03:00