whisper.cpp/ggml
Jeff Bolz 46e9e5b9a7 vulkan: optimizations for direct convolution (llama/14933)
* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.
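
For readers unfamiliar with the "fastdiv" item above: the general idea is to replace an integer division by a divisor that is fixed for the whole dispatch with a multiply-high, an add and a shift, using constants precomputed once on the host. Below is a minimal host-side sketch of that technique. The names `FastDiv`, `fastdiv_init` and `fastdiv` are hypothetical and chosen for illustration; this is not the code from this commit, and the shader may use different integer widths.

```cpp
#include <cstdint>

// Hypothetical helper types/names for illustration only (not from the commit).
struct FastDiv {
    uint32_t mp; // precomputed "magic" multiplier
    uint32_t L;  // shift amount, ceil(log2(d))
};

// Precompute the constants once per divisor d (requires d >= 1).
inline FastDiv fastdiv_init(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1; the product fits in 64 bits
    // because 2^L - d < d <= 2^32 - 1.
    uint32_t mp = static_cast<uint32_t>(
        ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1);
    return {mp, L};
}

// Compute n / d without a divide instruction.
// The add is done in 64 bits here so the sum cannot overflow.
inline uint32_t fastdiv(uint32_t n, FastDiv f) {
    uint64_t hi = (static_cast<uint64_t>(n) * f.mp) >> 32; // multiply-high
    return static_cast<uint32_t>((hi + n) >> f.L);
}
```

The remainder, when needed, can be recovered as `n - fastdiv(n, f) * d`, so index decompositions that use both `/` and `%` need only one such division.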

* Three tile sizes for CONV_2D, and a heuristic to choose

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <picard12@live.de>
2025-08-18 20:30:45 +03:00
cmake cmake : Fix BLAS link interface (ggml/1316) 2025-08-18 20:30:45 +03:00
include ggml : remove old kompute, cann (skip) (#3349) 2025-07-30 16:08:57 +03:00
src vulkan: optimizations for direct convolution (llama/14933) 2025-08-18 20:30:45 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930) 2025-08-18 20:30:45 +03:00