whisper.cpp

Jeff Bolz 00b36237ba vulkan: Add fusion support for RMS_NORM+MUL (llama/14366)	2025-07-01 17:54:53 +03:00

* vulkan: Add fusion support for RMS_NORM+MUL
  - Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
  - Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
  - Add detection logic and basic fusion logic in ggml-vulkan.
  - Add some testing support for fusion. Rather than computing one node at a time, allow for computing the whole graph and just testing one node's results. Add rms_norm_mul tests and enable a llama test.
* extract some common fusion logic
* fix -Winconsistent-missing-override
* move ggml_can_fuse to a common function
* build fix
* C and C++ versions of can_fuse
* move use count to the graph to avoid data races and double increments when used in multiple threads
* use hash table lookup to find node index
* change use_counts to be indexed by hash table slot
* minimize hash lookups
  style fixes
* last node doesn't need single use. fix type. handle mul operands being swapped.
* remove redundant parameter

---------

Co-authored-by: slaren <slarengh@gmail.com>
ggml-alloc.h	ggml : upgrade init_tensor API to return a ggml_status (llama/11854)	2025-03-08 15:13:01 +02:00
ggml-backend.h	vulkan: Add fusion support for RMS_NORM+MUL (llama/14366)	2025-07-01 17:54:53 +03:00
ggml-blas.h	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-cann.h	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-cpp.h	ggml : fix ggml_gallocr_ptr type (ggml/1205)	2025-05-01 13:29:02 +03:00
ggml-cpu.h	ggml : add ggml_set_rows (llama/14274)	2025-07-01 17:54:53 +03:00
ggml-cuda.h	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-kompute.h	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-metal.h	repo : update links to new url (llama/11886)	2025-02-27 08:55:36 +02:00
ggml-opencl.h	Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693)	2024-12-18 12:52:16 +02:00
ggml-opt.h	mnist: fix segmentation fault (ggml/1227)	2025-05-19 14:58:39 +03:00
ggml-rpc.h	rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943)	2025-05-01 13:29:02 +03:00
ggml-sycl.h	ggml : build backends as libraries (llama/10256)	2024-11-20 21:00:08 +02:00
ggml-vulkan.h	vulkan: Make Vulkan optional at runtime (ggml/11493). (llama/11494)	2025-02-27 08:55:36 +02:00
ggml.h	ggml : add ggml_set_rows (llama/14274)	2025-07-01 17:54:53 +03:00
gguf.h	GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030)	2025-01-14 10:38:01 +02:00