whisper.cpp/ggml
Oliver Simons ef85b26d9f CUDA: Fix ssm_scan_f32 data-races (llama/24360)
* Add missing __syncthreads before reusing cub_temp_storage

A __syncthreads() is required before the TempStorage smem may be
reused:
https://nvidia.github.io/cccl/unstable/cub/api/classcub_1_1BlockLoad.html#_CPPv4I0EN3cub9BlockLoad4LoadEv20RandomAccessIteratorRA14ItemsPerThread_1Ti

* Add one more missing __syncthreads

Double-buffering would also work, but the simpler alternative is to
ensure all threads have read smem* before it is written again in the
next loop iteration

* Remove unused smem from ssm_scan_f32
2026-06-15 10:33:53 +03:00
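
The two barrier rules named in the commit message can be sketched in a small standalone kernel. This is not the ssm_scan_f32 kernel from the commit. It is a hypothetical chunked prefix sum, and the names `chunked_prefix_sum`, `count_scan_mismatches`, `kThreads`, `kItems`, `kTile` and `carry` are all assumptions made for illustration. It only shows where a `__syncthreads()` is needed when one cub TempStorage allocation is shared by several collectives, and when a shared value is rewritten on every loop iteration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical sizes, chosen only for this sketch.
constexpr int kThreads = 128;              // threads per block
constexpr int kItems   = 4;                // items per thread
constexpr int kTile    = kThreads * kItems;

// Inclusive prefix sum over n_tiles tiles, processed by a single block.
__global__ void chunked_prefix_sum(const float* in, float* out, int n_tiles) {
    using Load  = cub::BlockLoad<float, kThreads, kItems, cub::BLOCK_LOAD_WARP_TRANSPOSE>;
    using Scan  = cub::BlockScan<float, kThreads>;
    using Store = cub::BlockStore<float, kThreads, kItems, cub::BLOCK_STORE_WARP_TRANSPOSE>;

    // One shared allocation reused by all three collectives.
    __shared__ union {
        typename Load::TempStorage  load;
        typename Scan::TempStorage  scan;
        typename Store::TempStorage store;
    } temp;
    __shared__ float carry;                // running total of previous tiles

    if (threadIdx.x == 0) carry = 0.0f;
    __syncthreads();

    for (int t = 0; t < n_tiles; ++t) {
        float items[kItems];
        Load(temp.load).Load(in + t * kTile, items);
        __syncthreads();                   // temp is about to be reused by Scan

        float tile_sum;
        Scan(temp.scan).InclusiveSum(items, items, tile_sum);
        const float base = carry;          // every thread reads the shared value
        for (int i = 0; i < kItems; ++i) items[i] += base;
        __syncthreads();                   // Scan is done with temp, and all
                                           // threads have read carry
        if (threadIdx.x == 0) carry = base + tile_sum;

        Store(temp.store).Store(out + t * kTile, items);
        __syncthreads();                   // temp is reused by Load in the next
                                           // iteration; also publishes carry
    }
}

// Host helper: runs the kernel on all-ones input and counts wrong outputs.
int count_scan_mismatches(int n_tiles = 4) {
    assert(n_tiles > 0);
    const int n = n_tiles * kTile;
    std::vector<float> h_in(n, 1.0f), h_out(n, 0.0f);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    chunked_prefix_sum<<<1, kThreads>>>(d_in, d_out, n_tiles);
    const cudaError_t err =
        cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;
    int bad = 0;
    for (int i = 0; i < n; ++i) bad += (h_out[i] != static_cast<float>(i + 1));
    return bad;
}
```

Calling `count_scan_mismatches()` from a host `main` should return 0 on a working CUDA device. Removing any of the three barriers inside the loop creates a data race, although a given run may still happen to produce correct output. Double-buffering `carry` would be the other way to drop the barrier that guards it, at the cost of extra shared memory.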
..
cmake ggml : Parallelize quant LUT init (llama/23595) 2026-05-25 12:26:07 +03:00
include ggml : add GGML_OP_COL2IM_1D (llama/24206) 2026-06-15 10:33:53 +03:00
src CUDA: Fix ssm_scan_f32 data-races (llama/24360) 2026-06-15 10:33:53 +03:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : bump version to 0.14.0 (ggml/1533) 2026-06-08 14:36:36 +03:00