whisper.cpp

Chenguang Li 02e8b23137 CANN: refactor mask handling and improve performance in FA (llama/15561)	2025-09-20 13:42:43 +03:00

* CANN(flash-attn): refactor mask handling and improve performance

  1. Refactored the mask computation in Flash Attention, unified the logic without separating prefill and decode.
  2. Optimized performance in non-alibi scenarios by reducing one repeat operation.
  3. Updated operator management to explicitly mark unsupported cases on 310P devices and when dim is not divisible by 16.

  Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN]: fix review

  Signed-off-by: noemotiovon <757486878@qq.com>

* [CANN]: Optimization FA BNSD to BSND

  Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>

cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094)	2025-08-18 20:30:45 +03:00
include	ggml: add `conv3d` op (llama/15182)	2025-09-20 13:42:39 +03:00
src	CANN: refactor mask handling and improve performance in FA (llama/15561)	2025-09-20 13:42:43 +03:00
.gitignore	…
CMakeLists.txt	CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433)	2025-09-20 13:42:38 +03:00