whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Pascal	28c7ed3db7	ggml : add GGML_OP_COL2IM_1D (llama/24206) * cpu: add GGML_OP_COL2IM_1D Add the overlap-add (scatter-add) step of a 1D transposed convolution. A ConvTranspose1d factorizes as a GEMM followed by col2im: a weight pre-permuted to [IC, K*OC] is contracted against the [IC, T_in] input with mul_mat to produce a column matrix [K*OC, T_in], and col2im_1d scatters those columns back into the [T_out, OC] signal, with T_out = (T_in - 1)*s0 + K - 2*p0. Keeping the contraction as a plain mul_mat leaves the heavy work on the optimized (and quantizable) matmul kernels, so col2im_1d only does the cheap overlap-add. CPU uses a gather formulation parallelized over output channels, supporting F32, F16 and BF16 with an F32 accumulator. * tests: add backend coverage for GGML_OP_COL2IM_1D Add test_col2im_1d next to the conv_transpose_1d cases, covering F32, F16 and BF16 across eight geometries: the canonical kernel = 2*stride DAC upsampling shape, overlap, no overlap, cropping (p0 = 1 and p0 = stride/2), kernel < stride with zeroed gaps, kernel not a multiple of stride, and a single column unfold. Perf mode gets three real vocoder stage shapes reporting memory bandwidth. max_nmse_err relaxes to 5e-4 for F16 and BF16. * cpu: harden GGML_OP_COL2IM_1D ggml_col2im_1d validates s0, oc, p0 and input contiguity at graph build time, before the oc division, protecting every backend at once. The kernel asserts the contiguity its flat indexing assumes and its doc states the full output length including the crop term. The kernel parallelizes over the time axis: the split stays balanced down to OC = 1, where the previous channel split was single threaded. Values are bit identical on the three real vocoder chains, two out of three improve. * tests: extend the GGML_OP_COL2IM_1D grid The eval grid grows to eleven geometries: OC = 1 (mono output stage), K = 1 with stride > 1 (sparse scatter, every gap position zeroed) and a crop down to T_out = 2 where all the gather bounds act at once. * tests: add col2im_1d equivalence test tests/test-col2im-1d.cpp proves mul_mat + col2im_1d matches the native ggml_conv_transpose_1d on the CPU backend, F32 bit exact, F16 and BF16 through casts of the column matrix. test-backend-ops cannot cover this for a CPU only op since the CPU backend is its own reference there. * rpc: bump protocol patch version for GGML_OP_COL2IM_1D GGML_OP_COUNT goes from 96 to 97 with the new op, which trips the static_assert in ggml-rpc.h. Bump RPC_PROTO_PATCH_VERSION since the op is appended and no existing op code shifts.	2026-06-15 10:33:53 +03:00
Valeriy Dubov	182db04cb2	rpc : add native RDMA transport for RPC backend (RoCEv2) (llama/20590)	2026-04-30 11:29:09 +03:00
Georgi Gerganov	1e05b10d67	ggml : bump RPC version (llama/20330)	2026-03-16 13:10:15 +02:00
Georgi Gerganov	aefcd75f4f	rpc : fix alloc size logic (llama/17116) * rpc : fix alloc size logic * rpc : bump version	2025-12-12 17:53:18 +02:00
Radoslav Gerganov	d26d1c8b85	rpc : cache and reuse compute graphs (llama/15405) Store the last computed graph and reuse it when possible. Also do not return response from GRAPH_COMPUTE and assume it always completes successfully. If this is not the case, the server closes the connection. This saves us a network round trip to the server.	2025-12-12 17:53:11 +02:00
Radoslav Gerganov	6aa18cccd8	rpc : report actual free memory (llama/16616) * rpc : report actual free memory Start reporting the free memory on every device instead of using fixed values. Now llama-cli users can get a nice memory breakdown when using RPC devices. * drop --mem in rpc-server	2025-10-22 12:58:11 +03:00
Radoslav Gerganov	af51bbab88	rpc : add support for multiple devices (llama/16276) * rpc : add support for multiple devices Allow rpc-server to expose multiple devices from a single endpoint. Change RPC protocol to include device identifier where needed. closes: #15210 * fixes * use ggml_backend_reg_t * address review comments * fix llama-bench backend report * address review comments, change device naming * fix cmd order	2025-10-12 11:16:23 +03:00
Radoslav Gerganov	fe21ddf0dc	rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943) RPC_CMD_SET_TENSOR always returns an empty response and we send this 4 times per token. We can improve TG speed if we don't wait for this empty response. The performance impact of this change depends on the network latency.	2025-05-01 13:29:02 +03:00
Radoslav Gerganov	24d29c55df	rpc : add RPC_CMD_HELLO (llama/12955) Add RPC_CMD_HELLO for getting the version of the protocol implemented by the server. Follow the semantic versioning rules at https://semver.org. Hopefully this brings a better user experience when we make breaking changes at the protocol level and avoids issues like #12465	2025-04-24 20:39:16 +03:00
Radoslav Gerganov	5bad2e5099	rpc : send hash when tensor data is above some fixed threshold (llama/12496) * rpc : send hash when tensor data is above some fixed threshold ref #10095 * rpc : put cache under $HOME/.cache/llama.cpp * try to fix win32 build * another try to fix win32 build * remove llama as dependency	2025-03-28 21:47:42 +02:00
Diego Devesa	746bf2596f	ggml : build backends as libraries (llama/10256) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-20 21:00:08 +02:00
Diego Devesa	c313723860	rpc : add backend registry / device interfaces (llama/9812) * rpc : add backend registry / device interfaces * llama : add llama_supports_rpc API * ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server	2024-11-01 10:19:05 +02:00
Diego Devesa	1acfadb721	ggml-backend : add device and backend reg interfaces (llama/9707) Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-10-05 15:23:51 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00

14 Commits
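
Illustration for commit 28c7ed3db7: the message describes a transposed 1D convolution as a GEMM followed by a gather-style overlap-add. The sketch below is a plain Python rendering of that idea, not the ggml kernel. The function names, the nested-list layouts (`x[ic][t_in]`, `w[ic][oc][k]`, `col[t_in][oc * K + k]`, `out[oc][t_out]`) and the ordering of `k` inside the `K*OC` axis are assumptions made for the example; only the output length formula and the gather idea come from the commit message.

```python
def col2im_1d(col, K, OC, s0, p0):
    """Gather-style overlap-add (illustrative, not the ggml implementation).

    col[t_in][oc * K + k] is the GEMM output; this layout is an assumption.
    Returns out[oc][t_out] with T_out = (T_in - 1) * s0 + K - 2 * p0.
    """
    T_in = len(col)
    T_out = (T_in - 1) * s0 + K - 2 * p0
    out = [[0.0] * T_out for _ in range(OC)]
    for oc in range(OC):
        for t_out in range(T_out):
            acc = 0.0
            for k in range(K):
                # Column (t_in, k) lands at t_in * s0 + k - p0 in the cropped
                # output, so invert that mapping and keep only valid t_in.
                num = t_out + p0 - k
                if num >= 0 and num % s0 == 0 and num // s0 < T_in:
                    acc += col[num // s0][oc * K + k]
            out[oc][t_out] = acc
    return out


def conv_transpose_1d_via_col2im(x, w, s0, p0):
    """GEMM into a column matrix, then col2im_1d."""
    IC, T_in = len(x), len(x[0])
    OC, K = len(w[0]), len(w[0][0])
    # col[t][oc * K + k] = sum over ic of x[ic][t] * w[ic][oc][k]
    col = [[sum(x[ic][t] * w[ic][oc][k] for ic in range(IC))
            for oc in range(OC) for k in range(K)]
           for t in range(T_in)]
    return col2im_1d(col, K, OC, s0, p0)


def conv_transpose_1d_ref(x, w, s0, p0):
    """Direct scatter-add reference, cropped by p0 on both ends."""
    IC, T_in = len(x), len(x[0])
    OC, K = len(w[0]), len(w[0][0])
    T_full = (T_in - 1) * s0 + K
    full = [[0.0] * T_full for _ in range(OC)]
    for ic in range(IC):
        for t in range(T_in):
            for oc in range(OC):
                for k in range(K):
                    full[oc][t * s0 + k] += x[ic][t] * w[ic][oc][k]
    return [row[p0:T_full - p0] for row in full]
```

With small integer-valued inputs both paths should agree exactly, which mirrors in spirit the equivalence check the commit attributes to tests/test-col2im-1d.cpp. The gather form writes each output sample once, which is what makes a split over output positions or channels straightforward.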