whisper.cpp/ggml/include
Molly Sophia 06209f6683 llama: add support for QRWKV6 model architecture (llama/11001)
llama: add support for QRWKV6 model architecture (llama/11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-14 10:38:01 +02:00
..
ggml-alloc.h ggml : fix typo in example usage ggml_gallocr_new (ggml/984) 2024-10-05 15:23:51 +03:00
ggml-backend.h ggml: load all backends from a user-provided search path (llama/10699) 2024-12-18 12:52:16 +02:00
ggml-blas.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-cann.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-cpp.h GGUF: C++ refactor, backend support, misc fixes (llama/11030) 2025-01-14 10:38:01 +02:00
ggml-cpu.h ggml : refactor online repacking (llama/10446) 2024-12-18 12:52:16 +02:00
ggml-cuda.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-kompute.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-metal.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-opencl.h Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693) 2024-12-18 12:52:16 +02:00
ggml-opt.h ggml: new optimization interface (ggml/988) 2024-11-20 21:00:08 +02:00
ggml-rpc.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-sycl.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml-vulkan.h ggml : build backends as libraries (llama/10256) 2024-11-20 21:00:08 +02:00
ggml.h llama: add support for QRWKV6 model architecture (llama/11001) 2025-01-14 10:38:01 +02:00