whisper.cpp

Gaurav Garg ae6a9bb9a5 CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)		2025-03-27 11:06:03 +02:00

- Find out active blocks per SM using cudaOccupancyMaxActiveBlocksPerMultiprocessor API. Use this value to determine the optimal parallel_blocks value.
- Prefer vector flash attention kernels over MMA kernel for BS=1

Fixes Issue: #12182

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

cmake	cmake: Comment out GGML_BIN_DIR for now (ggml/1139)	2025-03-27 11:06:03 +02:00
include	llama: Add support for RWKV v7 architecture (llama/12412)	2025-03-27 11:06:03 +02:00
src	CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (llama/12183)	2025-03-27 11:06:03 +02:00
.gitignore	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
CMakeLists.txt	SYCL: using graphs is configurable by environment variable and compile option (llama/12371)	2025-03-27 11:06:03 +02:00