whisper.cpp/ggml
Aman Gupta d5a49ebec8 cuda: reserve space for quantize kv-cache at startup (llama/23907)
* cuda: reserve space for quantize kv-cache at startup

* address review comments

* remove forward decl

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* remove assert in ggml-cuda.cu

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-06-08 14:36:36 +03:00
cmake ggml : Parallelize quant LUT init (llama/23595) 2026-05-25 12:26:07 +03:00
include TP: quantized KV cache support (llama/23792) 2026-06-08 14:36:36 +03:00
src cuda: reserve space for quantize kv-cache at startup (llama/23907) 2026-06-08 14:36:36 +03:00
.gitignore
CMakeLists.txt ggml : bump version to 0.13.1 (ggml/1523) 2026-05-29 09:47:30 +03:00