This introduces an optional feature to allocate large GPU buffers (≥ 1GB) using USM system allocations if supported by the device. It allows using buffers from the system allocator then letting the system manage memory migrations between host and device as necessary. This feature is disabled by default and requires the GGML_SYCL_USM_SYSTEM environment variable to enable. If USM system allocations are not supported by the device or the system, we fallback to regular allocations. This feature can allow VRAM overcommit. For example, the test below fails on B580 due to lack of memory for allocation, but it passes when enabling USM system allocations: ./examples/sycl/test.sh -m Qwen3.5-27B-Q3_K_M.gguf -lv 4 Signed-off-by: Francois Dugast <francois.dugast@intel.com> |
||
|---|---|---|
| .. | ||
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||