whisper.cpp/ggml
Zheyuan Chen 13133ab299 ggml-webgpu: makes the flash attn vec path subgroup-aware (llama/23040)
* ggml-webgpu: makes the flash attn vec path compile and size its split/reduce work from the device’s reported subgroup range instead of assuming a subgroup size of 32.

* ggml-webgpu: remove the extra max_wg_size >= max_subgroup_size guard. Remove the hardcoded 32 when determining the values of reduce_wg_size and vec_nwg_cap.
2026-05-25 12:26:07 +03:00
..
cmake cmake : add FindNCCL.cmake (ggml/0) 2026-05-02 15:02:42 +03:00
include CUDA: lower-case PCI bus id, standardize for ggml (llama/22820) 2026-05-14 21:26:48 +03:00
src ggml-webgpu: makes the flash attn vec path subgroup-aware (llama/23040) 2026-05-25 12:26:07 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt SYCL: fix multi-GPU system RAM exhaustion by using Level Zero allocations (llama/21597) 2026-05-25 12:26:07 +03:00