whisper.cpp/ggml/include
Piotr Wilkin (ilintar) 33ca8355c4 model : Apertus model implementation (llama/15852)
* First attempt

* No permute during convert (fixes qk tensors), proper norm application.

* RoPE = NeoX

* Coherence!

* Migrate xielu params from tensors to hyperparameters

* Simple CUDA kernel

* Revert stupid LLM refactorings

* Chat template support

* configchecker / flake8 errors

* Reorder unary.cu

* I do conclude that LLMs are, in fact, stupid.

* Fix after merge

* Final newline

* Make xIELU an UNARY_OP

* Final newline

* Correctly account for parameter shift

* Argh.

* Update ggml/src/ggml-cpu/unary-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Refactor: remove unused methods, inline and factorize softplus, add const modifiers

* Revert CUDA changes, implement xIELU as a separate OP

* Pesky newline

* Add float2half / half2float for F16 inputs/outputs

* CUDA variants, attempt 2

* Actually, attempt 3

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Missing convert header

* Proper formula and reference for xIELU in the comments.

* Modify unary-ops.cpp to add the functor-based logic besides the template system to retain optimizations

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add tensor mappings for Apertus to global list instead

* Fix lazy on scalars

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Add comment about the constraints on positive/negative alpha

* Change `softplus` to `ggml_softplus`

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-12 11:16:23 +03:00
ggml-alloc.h ggml : upgrade init_tensor API to return a ggml_status (llama/11854) 2025-03-08 15:13:01 +02:00
ggml-backend.h llama: print memory breakdown on exit (llama/15860) 2025-09-29 15:18:10 +03:00
ggml-blas.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-cann.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-cpp.h ggml : fix ggml_gallocr_ptr type (ggml/1205) 2025-05-01 13:29:02 +03:00
ggml-cpu.h ggml: allow casting between f32 and i32 (llama/15783) 2025-09-20 13:42:51 +03:00
ggml-cuda.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-metal.h metal : refactor + optimize v2 (llama/15995) 2025-09-20 13:46:10 +03:00
ggml-opencl.h Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693) 2024-12-18 12:52:16 +02:00
ggml-opt.h finetune: SGD optimizer, more CLI args (llama/13873) 2025-08-18 20:30:45 +03:00
ggml-rpc.h rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943) 2025-05-01 13:29:02 +03:00
ggml-sycl.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-vulkan.h vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494) 2025-02-27 08:55:36 +02:00
ggml-webgpu.h ggml: Add initial WebGPU backend (llama/14521) 2025-07-20 00:23:50 +03:00
ggml-zdnn.h zdnn: refactor codebase + add docs (llama/16178) 2025-09-29 15:18:09 +03:00
ggml.h model : Apertus model implementation (llama/15852) 2025-10-12 11:16:23 +03:00
gguf.h GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) 2025-01-14 10:38:01 +02:00