whisper.cpp/ggml/src/ggml-cpu
Piotr Wilkin (ilintar) 33ca8355c4 model : Apertus model implementation (llama/15852)
* First attempt

* No permute during convert (fixes qk tensors), proper norm application.

* RoPE = NeoX

* Coherence!

* Migrate xielu params from tensors to hyperparameters

* Simple CUDA kernel

* Revert stupid LLM refactorings

* Chat template support

* configchecker / flake8 errors

* Reorder unary.cu

* I do conclude that LLMs are, in fact, stupid.

* Fix after merge

* Final newline

* Make xIELU an UNARY_OP

* Final newline

* Correctly account for parameter shift

* Argh.

* Update ggml/src/ggml-cpu/unary-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Refactor: remove unused methods, inline and factorize softplus, add const modifiers

* Revert CUDA changes, implement xIELU as a separate OP

* Pesky newline

* Add float2half / half2float for F16 inputs/outputs

* CUDA variants, attempt 2

* Actually, attempt 3

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Missing convert header

* Proper formula and reference for xIELU in the comments.

* Modify unary-ops.cpp to add the functor-based logic besides the template system to retain optimizations

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add tensor mappings for Apertus to global list instead

* Fix lazy on scalars

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Add comment about the constraints on positive/negative alpha

* Change `softplus` to `ggml_softplus`

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-12 11:16:23 +03:00
amx ggml-amx : fix ggml_amx_init() on generic Linux (llama/16049) 2025-09-20 13:46:39 +03:00
arch devops: add s390x & ppc64le CI (llama/15925) 2025-09-29 15:18:11 +03:00
cmake ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
kleidiai kleidiai : fix work size and threads sync for fp16 (llama/16246) 2025-09-30 12:31:04 +03:00
llamafile llamafile: PowerPC Sgemm Optimization (llama/15558) 2025-09-20 13:42:42 +03:00
spacemit ggml: riscv: add riscv spacemit backend (llama/15288) 2025-09-30 12:31:03 +03:00
CMakeLists.txt kleidiai : fix work size and threads sync for fp16 (llama/16246) 2025-09-30 12:31:04 +03:00
arch-fallback.h ggml-cpu: implement MXFP4 SIMD for s390x (llama/16193) 2025-09-29 15:18:11 +03:00
binary-ops.cpp cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
binary-ops.h cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-31 14:56:53 +03:00
common.h ggml : refactor forward_dup for cpu backend (llama/16062) 2025-09-20 13:46:39 +03:00
ggml-cpu-impl.h ggml-cpu: clean up s390x SIMD (llama/15855) 2025-09-20 13:42:51 +03:00
ggml-cpu.c model : Apertus model implementation (llama/15852) 2025-10-12 11:16:23 +03:00
ggml-cpu.cpp ggml: riscv: add riscv spacemit backend (llama/15288) 2025-09-30 12:31:03 +03:00
hbm.cpp ggml-cpu : split arch-specific implementations (llama/13892) 2025-06-10 12:40:33 +03:00
hbm.h ggml-cpu : split arch-specific implementations (llama/13892) 2025-06-10 12:40:33 +03:00
ops.cpp model : Apertus model implementation (llama/15852) 2025-10-12 11:16:23 +03:00
ops.h ggml: add ops for WAN video model (cuda && cpu) (llama/15669) 2025-09-20 13:42:49 +03:00
quants.c llama : add gpt-oss (llama/15091) 2025-08-18 20:30:45 +03:00
quants.h llama : add gpt-oss (llama/15091) 2025-08-18 20:30:45 +03:00
repack.cpp ggml : repack block_iq4_nlx8 (llama/14904) 2025-08-18 20:30:45 +03:00
repack.h ggml : repack block_iq4_nlx8 (llama/14904) 2025-08-18 20:30:45 +03:00
simd-mappings.h ggml : fix loongarch lsx compilation error (llama/15864) 2025-09-29 15:18:10 +03:00
traits.cpp ggml : fix fallback to CPU for ununsupported ops (llama/15118) 2025-08-18 20:30:45 +03:00
traits.h ggml : fix fallback to CPU for ununsupported ops (llama/15118) 2025-08-18 20:30:45 +03:00
unary-ops.cpp model : Apertus model implementation (llama/15852) 2025-10-12 11:16:23 +03:00
unary-ops.h model : Apertus model implementation (llama/15852) 2025-10-12 11:16:23 +03:00
vec.cpp ggml-cpu : optimize RVV kernels (llama/15720) 2025-09-20 13:42:48 +03:00
vec.h ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (llama/16307) 2025-09-29 15:18:12 +03:00