whisper.cpp/ggml
fairydreaming c50e951afd model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (llama/23346)
* llama : support DeepSeek V3.2 model family (with DSA lightning indexer)

* convert : handle DeepseekV32ForCausalLM architecture

* ggml : support for f16 GGML_OP_FILL

* memory : separate hparams argument in llama_kv_cache constructor

* memory : add llama_kv_cache_dsa memory (KV cache + lightning indexer cache)

* llama : support for LLM_ARCH_DEEPSEEK32

* model : llama_model_deepseek32 implementation

* model : merge two scale operations into one in DSA lightning indexer implementation

* chore : remove unused code

* model : support NVFP4 in DeepSeek V3.2

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* memory : refactoring TODO

Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2026-06-08 14:36:36 +03:00
cmake ggml : Parallelize quant LUT init (llama/23595) 2026-05-25 12:26:07 +03:00
include ggml: `gguf_init_from_callback` and `gguf_init_from_buffer` (llama/22341) 2026-05-25 12:44:04 +03:00
src model : support for DeepseekV32ForCausalLM with generic DeepSeek Sparse Attention (DSA) implementation (llama/23346) 2026-06-08 14:36:36 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : bump version to 0.13.1 (ggml/1523) 2026-05-29 09:47:30 +03:00