whisper.cpp/ggml
Chenguang Li 9b773acac0
CANN: implement LRU cache for ACL graphs (llama/15814)
* CANN: implement LRU cache for ACL graphs in CANN backend

- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.

* fix typo

* The LRU cache capacity can be configured via an env variable

Signed-off-by: noemotiovon <757486878@qq.com>

* refactory acl graph

* refactory && fix review comments

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
2025-09-20 13:42:53 +03:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) 2025-08-18 20:30:45 +03:00
include cuda : fix supports_op condition for get_rows when number of blocks is too large (llama/15868) 2025-09-20 13:42:52 +03:00
src CANN: implement LRU cache for ACL graphs (llama/15814) 2025-09-20 13:42:53 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml-cpu: drop support for nnpa intrinsics (llama/15821) 2025-09-20 13:42:50 +03:00