whisper.cpp/ggml
Georgi Gerganov ee2cbeeb74 llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)
* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
2025-04-24 20:39:16 +03:00
cmake ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) 2025-03-27 11:06:03 +02:00
include ggml : add bilinear upscale support (ggml/1185) 2025-04-24 20:39:16 +03:00
src llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825) 2025-04-24 20:39:16 +03:00
.gitignore
CMakeLists.txt ggml : sync/merge cmake,riscv,powerpc, add common.cmake (ggml/0) 2025-03-27 11:06:03 +02:00