whisper.cpp/ggml
Jeff Bolz 0a8285186a vulkan: Optimize argsort (llama/15354)
- Launch an appropriate number of invocations (the next larger power of two).
32 invocations is a common case, and the barrier is much cheaper there.
- Specialize for "needs bounds checking" vs not.
- Make the code less branchy and [[unroll]] the loops. In the final code,
I see no branches inside the main loop (only predicated stores) when
needs_bounds_check is false.
- Always sort ascending, then apply the ascending vs descending option when
doing the final stores to memory.
- Copy the values into shared memory, which makes them slightly cheaper to access.
2025-08-18 20:30:45 +03:00
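The commit above changes a Vulkan compute shader. This is a CPU-side Python sketch, under stated assumptions, of three of the listed ideas. It assumes a bitonic sorting network, which the commit message does not name. The first idea is padding the element count up to the next power of two. The second is treating padded slots as larger than every real element, and only when padding actually exists (the `needs_bounds_check` case). The third is always sorting ascending and applying the requested direction only when the final indices are written out. The names `next_pow2`, `argsort`, and `greater` are hypothetical. The sequential loop over `i` only stands in for parallel shader invocations, so this is not the ggml shader code.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two >= n (n >= 1), i.e. the padded launch size."""
    p = 1
    while p < n:
        p <<= 1
    return p


def argsort(vals, descending=False):
    """Bitonic argsort sketch (assumed algorithm): sort ascending, apply order at store."""
    n = len(vals)
    if n == 0:
        return []
    p = next_pow2(n)
    # Padding only exists when n is not already a power of two.
    needs_bounds_check = p != n
    # Local copy of the keys, standing in for the shared-memory copy.
    shared = list(vals)

    def greater(a, b):
        # Padded slots (index >= n) compare greater than every real element,
        # so they sink to the end of an ascending sort.
        if needs_bounds_check:
            if a >= n:
                return b < n
            if b >= n:
                return False
        return shared[a] > shared[b]

    idx = list(range(p))
    k = 2
    while k <= p:
        j = k >> 1
        while j > 0:
            # Each i plays the role of one invocation. The pairs (i, i ^ j) are
            # disjoint within a step, so a sequential pass matches the parallel one.
            for i in range(p):
                partner = i ^ j
                if partner > i:
                    if (i & k) == 0:
                        swap = greater(idx[i], idx[partner])
                    else:
                        swap = greater(idx[partner], idx[i])
                    if swap:
                        idx[i], idx[partner] = idx[partner], idx[i]
            j >>= 1
        k <<= 1
    # The network always sorts ascending; direction is applied only at the store.
    return [idx[n - 1 - i] if descending else idx[i] for i in range(n)]
```

For example, `argsort([3.0, 1.0, 4.0, 1.5, 9.0])` returns `[1, 3, 0, 2, 4]`, and the same call with `descending=True` returns `[4, 2, 0, 3, 1]`. Keeping the network fixed to one direction means every compare-and-swap step is identical for both options. A descending sort then costs only an index reversal at store time. The sketch assumes NaN-free input.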
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) 2025-08-18 20:30:45 +03:00
include ggml: initial IBM zDNN backend (llama/14975) 2025-08-18 20:30:45 +03:00
src vulkan: Optimize argsort (llama/15354) 2025-08-18 20:30:45 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml: initial IBM zDNN backend (llama/14975) 2025-08-18 20:30:45 +03:00