whisper.cpp/ggml
Jeff Bolz c66c71e9f4
vulkan: Use one row per workgroup for f32 mmv (llama/17711)
The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before
the topk_moe selection. With so few rows, running multiple rows per workgroup
launches too few workgroups to utilize the SMs well. I think even for larger m,
f32 is so bandwidth-limited that running multiple rows per workgroup doesn't help.
2025-12-12 17:53:20 +02:00
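
The occupancy argument in the commit message above can be checked with a little arithmetic. The following is a minimal sketch, not code from ggml: the helper names `workgroups_for` and `sm_coverage` are hypothetical, and it assumes the simplest possible scheduling model, in which every workgroup lands on a distinct SM until all SMs are busy.

```cpp
#include <cstdint>

// Hypothetical helper (not part of ggml): number of workgroups dispatched for
// an m-row matrix-vector product when each workgroup computes `rows_per_wg`
// output rows. This is a ceiling division.
constexpr uint32_t workgroups_for(uint32_t m, uint32_t rows_per_wg) {
    return (m + rows_per_wg - 1) / rows_per_wg;
}

// Hypothetical helper: fraction of SMs that receive at least one workgroup,
// capped at 1.0. Assumes each workgroup is placed on a distinct SM until every
// SM has work, which is a simplification of real GPU scheduling.
constexpr double sm_coverage(uint32_t m, uint32_t rows_per_wg, uint32_t num_sms) {
    const uint32_t wgs = workgroups_for(m, rows_per_wg);
    return wgs >= num_sms ? 1.0 : static_cast<double>(wgs) / num_sms;
}
```

For illustration only, take a GPU with 64 SMs and a shader that would otherwise process 4 rows per workgroup (both numbers are assumptions, not values taken from the commit). With m = 32, `workgroups_for(32, 4)` gives 8 workgroups, so `sm_coverage(32, 4, 64)` is 0.125; with one row per workgroup the same call yields 32 workgroups and a coverage of 0.5. The model says nothing about the bandwidth-bound case for larger m, which the commit message argues separately.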
cmake           ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094)  2025-08-18 20:30:45 +03:00
include         rpc : fix alloc size logic (llama/17116)                                       2025-12-12 17:53:18 +02:00
src             vulkan: Use one row per workgroup for f32 mmv (llama/17711)                    2025-12-12 17:53:20 +02:00
.gitignore      whisper : reorganize source code + improve CMake (#2256)                       2024-06-26 19:34:09 +03:00
CMakeLists.txt  build : move _WIN32_WINNT definition to headers (llama/17736)                  2025-12-12 17:53:16 +02:00