whisper.cpp/ggml
Francois Dugast dd1a6ca897 sycl: Add optional USM system allocations (llama/22526)
This introduces an optional feature that allocates large GPU buffers
(≥ 1GB) with USM system allocations when the device supports them.
Buffers then come from the system allocator, and the system manages
memory migrations between host and device as necessary.

This feature is disabled by default and is enabled by setting the
GGML_SYCL_USM_SYSTEM environment variable. If USM system allocations are
not supported by the device or the system, we fall back to regular
allocations.
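
The selection logic described above can be sketched as follows. This is
an illustrative model only, not the code from the patch: AllocKind,
DeviceCaps, usm_system_enabled and choose_alloc are hypothetical names,
the threshold is assumed to be 1 GiB, and the rule that any non-empty
value other than "0" enables the feature is an assumption. In SYCL 2020 a
device can report aspect::usm_system_allocations; whether the patch uses
that query is also an assumption, so the capability is modelled here as a
plain flag.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <cstddef>
#include <cstdlib>

// Hypothetical model of the allocation choice; not the ggml-sycl code.
enum class AllocKind { Regular, UsmSystem };

// Stand-in for a device capability query (assumed, see lead-in).
struct DeviceCaps {
    bool usm_system_allocations;
};

// Assumed threshold: the commit message says ">= 1GB"; 1 GiB is used here.
constexpr std::size_t kLargeBufferThreshold = std::size_t{1} << 30;

// Assumption: any non-empty value other than "0" turns the feature on.
inline bool usm_system_enabled(const char * value) {
    if (value == nullptr || value[0] == '\0') {
        return false;
    }
    return !(value[0] == '0' && value[1] == '\0');
}

// USM system allocation is chosen only when all three conditions hold:
// the feature is enabled, the device supports it, and the buffer is large.
// Every other case falls back to a regular allocation.
inline AllocKind choose_alloc(std::size_t size, const DeviceCaps & caps, bool enabled) {
    if (enabled && caps.usm_system_allocations && size >= kLargeBufferThreshold) {
        return AllocKind::UsmSystem;
    }
    return AllocKind::Regular;
}

// Convenience wrapper that reads the environment variable named in the
// commit message.
inline AllocKind choose_alloc_from_env(std::size_t size, const DeviceCaps & caps) {
    return choose_alloc(size, caps, usm_system_enabled(std::getenv("GGML_SYCL_USM_SYSTEM")));
}
```

With this sketch, choose_alloc(kLargeBufferThreshold, DeviceCaps{true},
true) yields AllocKind::UsmSystem, while a smaller buffer, an unsupported
device, or a disabled feature each yield AllocKind::Regular. Keeping the
decision separate from the environment lookup makes the fallback path easy
to check without touching a device.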

This feature can allow VRAM overcommit. For example, the test below fails
on a B580 because there is not enough memory for the allocation, but it
passes when USM system allocations are enabled:

  ./examples/sycl/test.sh -m Qwen3.5-27B-Q3_K_M.gguf -lv 4

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
2026-06-19 12:53:43 +03:00
..
cmake ggml : Parallelize quant LUT init (llama/23595) 2026-05-25 12:26:07 +03:00
include Remove padding and multiple D2D copies for MTP (llama/24086) 2026-06-15 10:33:53 +03:00
src sycl: Add optional USM system allocations (llama/22526) 2026-06-19 12:53:43 +03:00
.gitignore
CMakeLists.txt ggml : bump version to 0.15.1 (ggml/1541) 2026-06-15 10:33:53 +03:00