whisper.cpp

Pascal a0c421f7ab cuda: fuse snake activation (mul, sin, sqr, mul, add) (llama/22667)	2026-05-10 17:26:32 +03:00

* cuda: fuse snake activation (mul, sin, sqr, mul, add)

  Add ggml_cuda_op_snake_fused with F32 / F16 / BF16 templates. The matcher recognizes the naive 5 op decomposition emitted by audio decoders (BigVGAN, Vocos) for snake activation y = x + sin(a*x)^2 * inv_b and rewrites it to a single elementwise kernel.

  Add test_snake_fuse comparing CPU naive vs CUDA fused across F32 / F16 / BF16.

* cuda: address review feedback from @am17an

  Use ggml_cuda_cast for F32/F16/BF16 conversions and rename kernel_snake to snake_kernel to match upstream conventions.

* cuda: snake fusion fastdiv on T_len, Suggested-by: @am17an

* Update tests/test-backend-ops.cpp

  Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* cuda: snake fusion check add->type matches x->type

  Address review feedback from @am17an

* cuda: snake fusion check add->type matches x->type

  Moved for readability (equivalent)

  Address review feedback from @am17an

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
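
To make the fusion described in the commit message concrete, here is a minimal CPU-side sketch in plain C++, not the CUDA kernel from the commit. It contrasts the five-pass decomposition (mul, sin, sqr, mul, add) with a single elementwise pass computing y = x + sin(a*x)^2 * inv_b. The function names `snake_naive` and `snake_fused` are hypothetical, and treating `a` and `inv_b` as scalars is an assumption made for brevity; the actual kernel may take them as tensors.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: the naive decomposition as five separate
// elementwise passes, each reading and writing a full buffer.
std::vector<float> snake_naive(const std::vector<float> & x, float a, float inv_b) {
    const std::size_t n = x.size();
    std::vector<float> t(n), y(n);
    for (std::size_t i = 0; i < n; ++i) t[i] = a * x[i];        // mul
    for (std::size_t i = 0; i < n; ++i) t[i] = std::sin(t[i]);  // sin
    for (std::size_t i = 0; i < n; ++i) t[i] = t[i] * t[i];     // sqr
    for (std::size_t i = 0; i < n; ++i) t[i] = t[i] * inv_b;    // mul
    for (std::size_t i = 0; i < n; ++i) y[i] = x[i] + t[i];     // add
    return y;
}

// Hypothetical fused version: one pass, no intermediate buffer.
// Scalar a / inv_b are an assumption of this sketch.
std::vector<float> snake_fused(const std::vector<float> & x, float a, float inv_b) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float s = std::sin(a * x[i]);
        y[i] = x[i] + s * s * inv_b;
    }
    return y;
}
```

The two functions should agree up to floating-point rounding, which mirrors the kind of naive-vs-fused comparison the commit message attributes to test_snake_fuse. The benefit of fusing is fewer passes over memory and no temporary buffers, not a different result.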
cmake	cmake : add FindNCCL.cmake (ggml/0)	2026-05-02 15:02:42 +03:00
include	CUDA: lower-case PCI bus id, standardize for ggml (llama/22820)	2026-05-10 17:26:31 +03:00
src	cuda: fuse snake activation (mul, sin, sqr, mul, add) (llama/22667)	2026-05-10 17:26:32 +03:00
.gitignore	…
CMakeLists.txt	ggml : bump version to 0.11.0 (ggml/1478)	2026-05-10 17:26:30 +03:00