whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Neo Zhang	6494251197	ehance UPSCALE to support all UT cases (llama/20637) * [SYCL] ehance UPSCALE to support more cases * rm test case result of SYCL1	2026-03-29 15:04:36 +03:00
Neo Zhang	72c7a2532d	fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_glu, unary (llama/20283)	2026-03-16 13:10:15 +02:00
Nechama Krashinski	f2f7320817	sycl: add F16 support for GGML_OP_CEIL (llama/19306) * Fix SYCL CEIL operator * sycl: implement GGML_OP_CEIL	2026-02-08 09:29:10 +02:00
s8322	2a16e7a67f	sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114) * sycl: add softplus unary op implementation * sycl: add softplus unary op implementation * docs(ops): mark SYCL SOFTPLUS as supported * docs: update SYCL status for SOFTPLUS	2026-01-30 15:56:40 +02:00
Neo Zhang Jianyu	b5e352a52f	Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (llama/17826) * support gpt-oss GPU by OP add-id, mul_mat for mxfp4, swiglu_oai, fix warning * fix fault ut case, update ops.md * rebase, fix format issue	2025-12-18 08:20:56 +02:00
shani-f	1fd63da9f2	sycl : unify unary kernels with a generic implementation and enable wide operator support (llama/17213) * SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access * SYCL: update documentation and sycl.csv to reflect new unary op support * update ops.md after syncing SYCL.csv changes * Fix SYCL.csv merge conflict * Update ops.md after fixing SYCL.csv conflicts * Fix SYCL.csv tail after merge conflict and regenerate ops.md * Fix line endings and final newline in SYCL.csv * Remove TOPK_MOE entries from SYCL.csv as requested * Update ops.md after removing TOPK_MOE from SYCL.csv * Regenerated SYCL.csv and synced ops.md with upstream * Update ops.md using create_ops_docs.py	2025-11-17 21:05:46 +02:00
safranowith	bb76672081	SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators (llama/16613) * SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators Clean up unrelated changes from previous commit * Chore: remove empty lines and fix indentation * Clean up: remove leftover blank lines and fix spacing * chore: fix trailing whitespace and ensure final newline * Cleanup: remove redundant declarations already defined in header * Sync docs/ops.md with updated backend operation support * docs: update ops.md after rebase * docs: update ops.md - Vulkan supports SSM_CONV and SSM_SCAN	2025-10-22 12:58:11 +03:00
GittyBurstein	7bb53032b3	sycl : add ARANGE operator (llama/16362) * SYCL: update element-wise ops and presets * clean arange * Re-trigger CI --------- Co-authored-by: Gitty Burstein <gitty@example.com>	2025-10-22 12:58:11 +03:00
Neo Zhang Jianyu	be778c992f	fix UT fault cases: count-equal, argsort, pad OPs (llama/16521) * fix/refactor OP argsort, pad * fix count-equal op * update SYCL OP list * fix format issue --------- Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>	2025-10-15 09:29:17 +03:00
Neo Zhang Jianyu	cd764eaf2b	Revert "sycl: add usage of enqueue_functions extension (llama/14244)" (llama/15910) * Revert "sycl: add usage of enqueue_functions extension (#14244)" This reverts commit 8308f98c7fb778e54bf75538f5234d8bd20915e9. * fix missed revert code, format the code	2025-09-20 13:45:28 +03:00
Sigbjørn Skjæret	e8138c51d2	ggml : implement GEGLU_ERF and GEGLU_QUICK ops (llama/14445)	2025-07-12 19:23:56 +03:00
Sigbjørn Skjæret	8bc638ee56	ggml : implement REGLU/GEGLU/SWIGLU ops (llama/14158) * implement unary REGLU/GEGLU/SWIGLU cpu ops * relax constraints * duplicate shape of source * fix ggml_vec_geglu_f16 * special case gated ops * implement unary REGLU/GEGLU/SWIGLU cuda ops * tighten constraints again * refactor into GGML_GLU_OP * metal : add glu kernels ggml-ci * add CUDA_GLU_BLOCK_SIZE [no ci] * more constraints and use 64bit ints ggml-ci * 64bit multiplication [no ci] * implement swapped variants (cpu/cuda) * update comment [no ci] ggml-ci * Vulkan: Add GLU ops and shaders * SYCL: Implement fused kernel GEGLU, SWIGLU and REGLU for single up+gate * ggml : implement GLU for split up/gate (llama/14181) * implement GLU for split up/gate * add tests for ggml_glu_split * Vulkan: Implement glu_split logic and shader support * add split to logging [no ci] * SYCL: refactor element_size ops and add split up and gate support to gated kernels * SYCL: switch GEGLU to use tanh approximation --------- Co-authored-by: 0cc4m <picard12@live.de> Co-authored-by: Akarshan <akarshan@menlo.ai> * GGML: increase OP count in assertion * Refactor: Optimize SYCL element-wise operations with unary function inlining This commit refactors the SYCL element-wise operations to improve performance by: - Inlining unary operations (sgn, abs, elu, gelu, silu, etc.) to reduce kernel launch overhead. - Introducing helper functions `op_xxx` for each unary operation to encapsulate the logic. - Replacing direct kernel calls with calls to these inlined functions. - Using `__dpct_inline__` to encourage compiler inlining. - Minor code cleanup and consistency improvements. The changes aim to reduce kernel launch overhead and improve the overall efficiency of element-wise operations on SYCL devices. * vulkan: Increase workgroup size for GLU, for performance (llama/14345) * vulkan: Increase workgroup size for GLU, for performance * vulkan: change GLU shaders to do one element per invocation rather than one row per workgroup * merge fix * metal : add support for split and swap ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: 0cc4m <picard12@live.de> Co-authored-by: Akarshan <akarshan@menlo.ai> Co-authored-by: Jeff Bolz <jbolz@nvidia.com>	2025-07-01 17:54:53 +03:00
Nicolò Scipione	a455dcb04c	sycl: add usage of enqueue_functions extension (llama/14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-21 07:34:17 +03:00
Akarshan Biswas	3d5c7ca4bc	SYCL: add gelu_erf kernel (llama/13749) * SYCL: add gelu_erf kernel * refactor code Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com> * Use scope_op_debug_print --------- Co-authored-by: Atharva Dubey <atharva.dubey@codeplay.com>	2025-06-01 15:14:44 +03:00
Romain Biessy	25e27904ca	sycl: Add more debug prints (llama/13640)	2025-05-27 18:03:00 +03:00
Łukasz Ślusarczyk	a8e17a244d	sycl : fixed compilation warnings (llama/13582)	2025-05-19 14:58:39 +03:00
Akarshan Biswas	1a76e97c28	SYCL: Add all missing unary kernels (llama/13074) * SYCL: Add all missing unary kernels ggml-ci * decouple kernel launch range from data size using strided loop * use ciel_div helper for num_blocks ggml-ci * clean auto imported header files	2025-05-01 13:29:02 +03:00
Akarshan Biswas	0287a5c51b	SYCL: Refactor and enable FP16 in binary broadcast OPs (llama/12975) * SYCL: refactor move to a separate file * Fix binbcast * Remove duplicates * fix include formatting * fix typo	2025-04-24 20:39:16 +03:00
Akarshan Biswas	e9ce285135	SYCL: Add fp16 type support to unary op kernels (llama/12788) * SYCL: Add fp16 support to some elementwise OP kernels * remove comment ggml-ci * Use static_cast directly * remove not needed cast from tanh * Use static cast and remove unneeded castings * Adjust device_support_op for unary OPs * Use cast_data and typed_data struct to deduplicate casting code	2025-04-24 20:39:16 +03:00
Akarshan Biswas	2e2f0f954b	SYCL: Remove misleading ggml_sycl_op_flatten function (llama/12387) * SYCL: Remove misleading ggml_sycl_op_flatten function * remove trailing whitespace * Fix L2 norm from rebase * remove try catch block from element_wise.cpp * remove comment from common.hp * ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward * norm.cpp: remove try catch exception block	2025-03-31 14:56:53 +03:00
Łukasz Ślusarczyk	6c15539c54	fixed compilation warnings in ggml-sycl (llama/12424)	2025-03-27 11:06:03 +02:00
Akarshan Biswas	c3235bd81e	SYCL: Refactor ggml_sycl_compute_forward (llama/11121) * SYCL: refactor ggml_sycl_compute_forward * SYCL: add back GGML_USED(dst) to ggml_sycl_cpy * SYCL: add function name to noop debug * SYCL: Some device info print refactoring and add details of XMX availability	2025-01-14 10:38:01 +02:00
Akarshan Biswas	26c9fd0cdc	SYCL: Reduce most of the compiler warnings (llama/10748) * Try to reduce some unused and typecast warnings * Reduce compiler warnings step 2 * add a newline at the end of the file * Initialize nreduce as size_t * [SYCL] Remove pragma directives from mmq.cpp * SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0 * SYCL softmax: Initialize nreduce as size_t * ggml-sycl.cpp: fix some trailing whitespaces * SYCL: remove the unused variables instead of commenting it out * SYCL poo2d kernel: set NAN for invalid pooling op * SYCL gemm.hpp: remove pragma directives * SYCL gemm.hpp: use const cast to properly support dnnl::memory * SYCL: wkv6 remove a comment * SYCL: clean comments step 2 * SYCL: clean comments and variables step 3 * SYCL: Use GGML_UNUSED for unused variables * SYCL: remove extra empty lines and a comment * Remove TODO * cleanup spaces * add a stdout for unsupported op * use sycl printf over fprintf * remove prints for CI * SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-12-18 12:52:16 +02:00
Zhiyuan Li	42398f13b0	Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (llama/10133) * rwkv6: rename to wkv6 * rwkv6: support avx2 avx512 armv8 armv9 * rwkv6: update cuda file name * rwkv6: rename params * wkv on sycl * sycl: add some ops * sycl: Enhance OP support judgment * wkv6: drop armv9 and tranfer to GGML style ggml-ci * sync : ggml * update the function to use appropriate types * fix define error * Update ggml/src/ggml-cpu.c * add appropriate asserts * move element-wise functions outside * put the declaration outside the loop * rewrite to be more inline with the common pattern for distributing threads * use recommended way GGML_TENSOR_LOCALS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-11-15 15:21:04 +02:00

24 Commits