whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Dmytro Minochkin	88dd9e0d45	vulkan: throw system error instead of SIGABRT during init on older devices (llama/16156) * Throw system error on old Vulkan driver rather than SIGABRT * Optionally handle any potential error in vulkan init	2025-09-29 15:18:11 +03:00
Jeff Bolz	97bd65f90f	vulkan: support GET_ROWS for k-quants (llama/16235) The dequantize functions are copy/pasted from mul_mm_funcs.comp with very few changes - add a_offset and divide iqs by 2. It's probably possible to call these functions from mul_mm_funcs and avoid the duplication, but I didn't go that far in this change.	2025-09-29 15:18:11 +03:00
Aaron Teo	23b3598952	devops: add s390x & ppc64le CI (llama/15925) * devops: move s390x and ppc64le ci build we have access to ubuntu-24.04-s390x and ppc64le images now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: disable ppc64le for now since they have compiler errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: stop warnings as errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: switch to non-macro flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: going the llama macro route Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add big-endian gguf test models Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: disable ppc64le to test s390x, check test build Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: dup .gguf.inp files for big-endian tests Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: dup .gguf.out files for big-endian too Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add python setup and endian byteswap Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: pooring thing does not have s390x python3 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add missing rust compiler for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: try rust actions runner Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "devops: try rust actions runner" This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: try a different path for rust Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: dump home directory and user info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: install gguf-py only Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: missed relative path Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: remove big-endian files since local swapping is working Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: revert test-tokenizer-0 cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix unicode flags conversion from and to uint16_t Bitfields are allocated in different order on s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Simplify byteswap command Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix endianness detection in vocab loader Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Disable test-thread-safety on s390x In this test a model is downloaded, then immediately loaded to check if more downloads are needed, and then used for test. There is no clean way to separate all those steps to add byteswapping between them, so just skip this test. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix q8_0 test in test-quantize-fns vec_signed uses unexpected rounding mode. Explicitly use different rounding function. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add big-endian stories260K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add s390x test-eval-callback Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: fix test does not exist Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: fix model not found llama-eval-callback Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix q3_K dot product error in test-quantize-fns on s390x Array q8bytes had only 4 elements allocated, but 8 elements accessed. 
This lead to write out of bounds and later read of overwritten values out of bounds and incorrect result. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: re-enable ppc64le for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: activate test-thread-safety for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: disable ppc64le tests for some reason it keeps failing test-thread-safety tests and I do not have a machine that is able to replicate the tests. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: LLAMA_FATAL_WARNINGS=ON Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Correct repository URL for s390x for test-thread-safety model Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix fs_get_cache_directory Ensure it works even if both XDG_CACHE_HOME and HOME are unset. This might happen in containers. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Re-enable CI for ppc64le Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fortify ggml_rope_impl Only memcpy data from sections argument if it's non-NULL. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way * Update URL for big-endian model * Update .github/workflows/build.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update remaining mentions of BE models to ggml-org/models repo --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com> Co-authored-by: Aleksei Nikiforov <103434461+AlekseiNikiforovIBM@users.noreply.github.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-09-29 15:18:11 +03:00
Georgi Gerganov	670d54ef5d	metal : report OOM errors (llama/16274)	2025-09-29 15:18:11 +03:00
Adrien Gallouët	9823c5cc51	common : use cpp-httplib as a cURL alternative for downloads (llama/16185) * vendor : update httplib Signed-off-by: Adrien Gallouët <angt@huggingface.co> * common : use cpp-httplib as a cURL alternative for downloads The existing cURL implementation is intentionally left untouched to prevent any regressions and to allow for safe, side-by-side testing by toggling the `LLAMA_CURL` CMake option. Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ggml : Bump to Windows 10 Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-09-29 15:18:11 +03:00
Aaron Teo	89a7b4d22c	ggml-cpu: implement MXFP4 SIMD for s390x (llama/16193) * ggml-cpu: impl mxfp4 s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: missing s = sumf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix incorrect kval_mxfp4 type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rework mxfp4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: missing delta calc Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix typo Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix typo for vec_splats Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: expand to 2 blocks per loop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add unroll to boost perf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: back to 1 block per loop to test perf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: back to 1 block per loop to test perf" This reverts commit 1fe55724e2dc295701101bf838bdd4a512237492. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rm unroll from single block Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-09-29 15:18:11 +03:00
R0CKSTAR	98ac209ae1	musa: fix build warnings (llama/15611) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-09-29 15:18:10 +03:00
Aman Gupta	d9bf63cfb8	CUDA: add a fused top-K MoE kernel (llama/16130) * CUDA: add a fused top-K MoE kernel This kernel does the following: 1. softmax over the logits per token [n_experts, n_tokens] 2. argmax reduce over the top-k (n_experts_used) logits 3. write weights + ids to global memory It is intended as fusion of softmax->top-k->get_rows pipeline for MoE models * Refactor into ggml_cuda_should_use_topk_moe * Review: Use better coalescing pattern, use WARP_SIZE, store logits into registers before * Review: format + micro-optimizations * Fix bug: fix tie breakers * Add optional norm + clean-up code * Use smem for final write * Add bounds check * Use better memory pattern for writeback	2025-09-29 15:18:10 +03:00
junchao-zhao	24ea5476de	ggml : fix loongarch lsx compilation error (llama/15864)	2025-09-29 15:18:10 +03:00
Daniel Bevenius	611ff19f20	ggml : remove -dev suffix from release version (ggml/1355) This commit removes the `-dev` suffix from the version string in CMakeLists.txt and the release script. The version will now be just be formatted as `MAJOR.MINOR.PATCH`.	2025-09-29 15:18:10 +03:00
Daniel Bevenius	06d7b3d124	ggml : bump version to 0.9.3 (ggml/1353)	2025-09-29 15:18:10 +03:00
Georgi Gerganov	ac678efb35	metal : fuse NORM + MUL + ADD, support non-multiples of 4 (llama/16220) * metal : fuse NORM + MUL + ADD * metal : support norms of non-multiple of 4 * cont : fix comment [no ci]	2025-09-29 15:18:10 +03:00
Georgi Gerganov	268f1c961b	metal : relax reorder conditions (llama/16216)	2025-09-29 15:18:10 +03:00
Georgi Gerganov	0a5b811f2e	metal : restore im2col perf (llama/16219)	2025-09-29 15:18:10 +03:00
Radoslav Gerganov	0946619662	rpc : use ggml logging facilities Use RPC_DEBUG environment variable to enable debug messages. Add helper macro LOG_DBG() which does an early check of the env var before calling GGML_LOG_DEBUG(). Make sure we log a debug message for every server function.	2025-09-29 15:18:10 +03:00
Johannes Gäßler	cd431223e0	llama: print memory breakdown on exit (llama/15860) * llama: print memory breakdown on exit	2025-09-29 15:18:10 +03:00
Acly	5069c08034	ggml : split graph allocations according to backend max buffer size (llama/15815) * ggml : make gallocr respect the backend's max buffer size * if the graph requires more memory than can fit into a single allocation, split it into multiple backend buffers * vulkan: report the actual max allocation size in buffer type interface * fix missing newline, apple-clang warning * track size of individual chunks in ggml_dyn_tallocr and raise max chunks. revert to use suballocation_block_size as max chunk size for vulkan. * track (chunk, offset) pairs instead of "global" offsets through gallocr. * simpler, don't need loops to map between local/global offsets * touches more code * fix dyn_tallocr_max_size and initialization * fix memory leak when buffers are reused due to same buffer type appearing multiple times * make vbuffer allocation follow the same logic as backend_buffer did before * continue to use leftover unallocated space of previous chunks after a new one has been created * treat free blocks of each chunk as separate list * they're still allocated together, but start/end of each chunk is tracked, and allocate/free iterate over sub-ranges * exhaust freed blocks of all chunks before considering their last blocks with unallocated space * start with 0 chunks/blocks and create chunks as needed * allow the last chunk to grow beyond max size * refactor: move adding new free block and new chunk into separate functions * allocate chunks individually with a separate free-blocks list for each one * needs a bit more memory/allocations/indirections, but code is simpler * fix warnings (missing static) & debug checks	2025-09-29 15:18:09 +03:00
Xiangyan Sun	41245891c1	ggml-cpu: Respect cpumask settings (llama/16164)	2025-09-29 15:18:09 +03:00
Sigbjørn Skjæret	73e8f3acb8	ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (llama/15928) * fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl * change initialization to true	2025-09-29 15:18:09 +03:00
Aaron Teo	c706a50746	zdnn: refactor codebase + add docs (llama/16178) * zdnn: initial matmul refactor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rm static from funcs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update ggml-zdnn.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: change header files to hpp Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to common.hpp Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move mulmat forward around Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rm inline from utils Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: add zDNN docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-09-29 15:18:09 +03:00
Daniel Bevenius	d8d31e3638	ggml-cpu : fix typo in gemm comments [no ci] (llama/16189)	2025-09-29 15:18:09 +03:00
Sigbjørn Skjæret	4e32ee733b	ggml : implement set_rows with i32 index (llama/16159) * implement set_rows with i32 index * template fix * test quantized path warnings-- * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * forgotten name change * deduplicate cuda/sycl and test-fix * indent++ * vulkan: support set_rows with i32 index type (llama/16162) * disable i32 index for webgpu for now --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-09-29 15:18:09 +03:00
Georgi Gerganov	df672c6372	ggml : extend ggml_can_fuse to work with non-sequential nodes (llama/16123) * ggml : extend ggml_can_fuse to work with non-sequential nodes in the graph * cont : fix wrong bounds check condition * cont : remove unnecessary overload	2025-09-29 15:18:09 +03:00
Georgi Gerganov	973054a8cd	ggml : add ggml_op_is_empty (llama/16122) * ggml : add ggml_op_is_empty * ggml : move to ggml-impl.h	2025-09-29 15:18:09 +03:00
Shin-myoung-serp	9f673df08d	Vulkan: add conv_transpose_2d operation (llama/16022) * Vulkan: add conv_transpose_2d operation * Vulkan: fix typo in conv_transpose_2d shader(s0mp, s0L, s1mp, s1L) * Vulkan: fix incorrect indentation in conv_transpose_2d shader * Vulkan: add checking the push constants size limit and reuse conv2d_mm.comp for conv_transpose_2d operation * Vulkan: revert the order of the index calculation and bound check in conv_2d shader * Vulkan: explicity check push constants limit in supports_op() for conv_transpose_2d operation. * Vulkan: remove unnecessary lower bound checks for H/W_idx in the conv_2d shader.	2025-09-29 15:18:09 +03:00
Jeff Bolz	14723f25a1	vulkan: add RTE variants of exp shader (llama/16165) This fixes some failures on Turing where "round to zero" rounds to the max f16 value but the CPU reference value is infinite.	2025-09-29 15:18:08 +03:00
Ruben Ortlam	95b29fab78	vulkan: vec dot matrix multiplication fix (llama/16151) * vulkan: fix matrix multiplication index calculation for odd m/n and odd k in combination with batching * add odd m/n + odd k test with batching	2025-09-29 15:18:08 +03:00
lhez	4b7f09ac0b	opencl: fix concat crash on win arm64 with Adreno (llama/15944)	2025-09-29 15:18:08 +03:00
lhez	0a7096f4f3	opencl: initial `q8_0` mv support (llama/15732)	2025-09-29 15:18:08 +03:00
Giuseppe Scrivano	eae2be0ca2	vulkan: optimize UMA buffer operations and fix driver hangs (llama/16059) * vulkan: optimize UMA buffer operations and fix driver hangs The previous implementation was blocking the GPU for extended periods, causing the i915 driver to reset the context due to the hangcheck protection. [32628.443070] i915 0000:00:02.0: [drm] GPU HANG: ecode 12:1:85dffffb, in llama-server [194114] [32628.443091] i915 0000:00:02.0: [drm] llama-server[194114] context reset due to GPU hang * vulkan: implement deferred_memset on UMA --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-09-29 15:18:08 +03:00
Jeff Bolz	9a6c2036a9	vulkan: fix validation error about VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR (llama/16086)	2025-09-29 15:18:08 +03:00
Georgi Gerganov	8d10ded025	ggml : prepare for development of 0.9.2-dev	2025-09-29 15:18:08 +03:00
Georgi Gerganov	d89164a08d	ggml : bump version to 0.9.1	2025-09-29 15:18:05 +03:00
Georgi Gerganov	36778bd8b8	talk-llama : sync llama.cpp	2025-09-20 13:58:28 +03:00
Georgi Gerganov	66ad624d5b	sync : ggml	2025-09-20 13:46:41 +03:00
Ruben Ortlam	76d0934287	vulkan: use vec dot for matrix matrix multiplications (llama/16056) * vulkan: Change the mul_mm shared memory and register caching system to use vec2 instead of scalars, to enable using dot2 instructions * use fma instead of dot to fix Nvidia and Apple performance issues	2025-09-20 13:46:39 +03:00
Xuan-Son Nguyen	2ad00d5586	ggml : refactor forward_dup for cpu backend (llama/16062) * ggml : refactor forward_dup for cpu backend * clean up a bit * add quant/dequant perf test	2025-09-20 13:46:39 +03:00
Adrien Gallouët	4d8cd07825	ggml-amx : fix ggml_amx_init() on generic Linux (llama/16049) Generalize Linux check to `__linux__` to support non-glibc systems (like musl). Also, return `false` on unknown/untested OS. Without this commit, the code compiles (with warnings) but fails: register_backend: registered backend CPU (1 devices) register_device: registered device CPU (Intel(R) Xeon(R) Platinum 8488C) build: 6487 (51c4cac6) with x86_64-linux-musl-gcc (GCC) 15.1.0 for x86_64-linux-musl (debug) system info: n_threads = 8, n_threads_batch = 8, total_threads = 16 .... print_info: n_ctx_orig_yarn = 262144 print_info: rope_finetuned = unknown print_info: model type = 4B Illegal instruction (core dumped) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-09-20 13:46:39 +03:00
Adrien Gallouët	4575f96873	cmake : fix static linking for OpenMP on Unix-like systems (llama/16031) When compiling with GGML_STATIC=ON, the build process would produce a binary that was still dynamically linked to OpenMP. This defeats the purpose of a static build: $ cmake -B build \ -DBUILD_SHARED_LIBS=OFF \ -DLLAMA_CURL=OFF \ -DGGML_CCACHE=OFF \ -DGGML_NATIVE=OFF \ -DGGML_STATIC=ON $ ldd llama-server linux-vdso.so.1 (0x0000e1a434e3b000) libgomp.so.1 => /lib/aarch64-linux-gnu/libgomp.so.1 (0x0000e1a4345a0000) libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000e1a434300000) libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000e1a434240000) libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000e1a434200000) libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000e1a434030000) /lib/ld-linux-aarch64.so.1 (0x0000e1a434df0000) This commit resolves the issue by modifying `CMAKE_FIND_LIBRARY_SUFFIXES` to prioritize `.a` files, forcing CMake to link the static version of the library. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-09-20 13:46:39 +03:00
Shawn Gu	f4a225cea6	opencl: optimize mxfp4 kernels (llama/16037) - flatten mxfp4 and packed fp4->fp16 bit-wise convert function (replace lut) - MoE kernel optimizations --------- Co-authored-by: Li He <lih@qti.qualcomm.com>	2025-09-20 13:46:39 +03:00
Jeff Bolz	7fcb7e83ec	rename optimize_graph to graph_optimize (llama/16082)	2025-09-20 13:46:39 +03:00
Bowen Han	fce6354e0f	CUDA: Optimize PAD_REFLECT_1D (llama/15957) * CUDA: Optimize PAD_REFLECT_1D feat: add more test cases for PAD_REFLECT_1D * use fast_div to improve performance * Apply suggestion from JohannesGaessler Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Apply suggestion from JohannesGaessler Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * optimize * use a concise expression to further speedup the cuda kernel --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-09-20 13:46:38 +03:00
Johannes Gäßler	05bdfd4380	CUDA: fix compilation on CC 6.0 (llama/16091)	2025-09-20 13:46:38 +03:00
Georgi Gerganov	960aaa9904	metal : use function constants for mul_mv_ext kernels (llama/16074) * metal : use function constants for mul_mv_ext kernels ggml-ci * metal : remove NW template argument ggml-ci * metal : adjust constants ggml-ci	2025-09-20 13:46:38 +03:00
Sigbjørn Skjæret	225d7c1d5a	cuda : add missing F32<->I32 entries in ggml_cuda_cpy_fn (llama/16060)	2025-09-20 13:46:38 +03:00
Georgi Gerganov	d37f590a77	metal : improve F32, F16 and BF16 mat-vec multiplication (llama/16057) * metal : improve F32, F16 and BF16 mat-vec multiplication ggml-ci * metal : make the NSG a function constant in mul_mv kernels ggml-ci	2025-09-20 13:46:38 +03:00
Jhen-Jie Hong	32b6d9c134	metal : avoid call free for non-owned buffer (llama/16067)	2025-09-20 13:46:38 +03:00
Georgi Gerganov	1f24b1df4d	metal : handle nil cv during pipeline creation (llama/16065) ggml-ci	2025-09-20 13:46:38 +03:00
Chenguang Li	c46adc0817	CANN: Remove print (llama/16044) Signed-off-by: noemotiovon <757486878@qq.com>	2025-09-20 13:46:38 +03:00
Reese Levine	1361f679cc	GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (llama/16018) * Add paramater buffer pool, batching of submissions, refactor command building/submission * Add header for linux builds * Free staged parameter buffers at once * Format with clang-format * Fix thread-safe implementation * Use device implicit synchronization * Update workflow to use custom release * Remove testing branch workflow * some f32 tests passing * Disable set_rows until it's implemented * f32 add all tests passing * Begin work on set_rows * Work on set rows * Add error buffers for reporting unsupported SET_ROWS indices * Remove extra comments * Add templated addition, clean up code * Get addition and multiplication working * Implement rms_norm * Add get_rows implementation * Add new get_rows files * Refactor use of wg size entry * Fix compilation * Try manually unrolled q4_0 quant * Revert "Try manually unrolled q4_0 quant" This reverts commit 77f8b96515f7e640ae4b0e44f066321fbc4a6166. * Move to constant max wg size * Check for tensor size in supports_op * Vectorize f32 and change default workgroup size * Move f32 get_rows from < 4 to % 4 != 0 * fix linter errors * Add in-place tests --------- Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>	2025-09-20 13:46:37 +03:00
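
Note on d9bf63cfb8 (CUDA: add a fused top-K MoE kernel): the commit message lists three steps per token: softmax over the expert logits, an argmax reduction over the top-k logits, and a write of weights and ids. A minimal CPU reference of those steps, for illustration only, might look like the sketch below. It is not the kernel's code. The function name `topk_moe_reference`, the tie-breaking rule (lowest expert index wins), and the meaning of the optional norm (rescaling the selected weights to sum to 1) are assumptions rather than details taken from the commit.

```python
import math


def topk_moe_reference(logits, k, normalize=False):
    """Hypothetical single-token reference: softmax -> top-k -> (weights, ids)."""
    # Step 1: numerically stable softmax over all expert logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Step 2: repeated argmax to pick the k largest probabilities.
    # Assumption: ties are broken in favour of the lowest expert index.
    remaining = list(probs)
    ids = []
    for _ in range(k):
        best = max(range(len(remaining)), key=lambda i: (remaining[i], -i))
        ids.append(best)
        remaining[best] = -1.0  # probabilities are >= 0, so this masks the pick

    # Step 3: gather the selected weights; optionally rescale them
    # (assumed meaning of the commit's "optional norm").
    weights = [probs[i] for i in ids]
    if normalize:
        scale = sum(weights)
        weights = [w / scale for w in weights]
    return weights, ids
```

Calling `topk_moe_reference([1.0, 3.0, 2.0, 3.0], k=2)` selects experts 1 and 3 with equal weights; a reference like this is mainly useful for checking a fused implementation against the unfused softmax, top-k, get_rows sequence.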



4079 Commits
