whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Kitaiti Makoto	2cfefa926b	Add Parakeet::Segment	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	46a3a2cb93	Add TestParakeetContext	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	fdaf031858	Implement Parakeet::Context#initialize	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	8615ac87ec	Free parakeet_full_params	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	3bae1e2f1b	Reduce if	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	f55f3f347c	Check callback container in GetParakeetParams	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	a3515ac9fc	Fix typo	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	d051ab6261	Add hook methods to Parakeet::Params	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	105f7a86b9	Define Parakeet	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	c5894984b3	Simplify params registration	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	17bd819585	Remove unnecessary macros	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	1e7c734a1d	Fix memsize	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	09eff4d1ba	Use ITERATE_CALLBACK_PARAMS instead of ITERATE_USER_DATA_PARAMS	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	d051c08841	Use ITERATE_CALLBACK_PARAMS	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	cd0e91175a	Remove unused variable	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	b1dbf7452d	Define GetParakeetParams	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	f412e289ea	Undefine local macros	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	f39b100bb0	Group callback and user_data params	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	555569481c	Add callbacks to Parakeet::Params	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	703fe18e60	Remove unused variable	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	30abb35db8	Add tests for Parakeet::Params	2026-06-17 10:30:56 +09:00
Kitaiti Makoto	f3b2ed68e5	Add Whisper::Parakeet::Params	2026-06-17 10:30:56 +09:00
Daniel Bevenius	9efddafb91	parakeet : add support for NVIDIA Parakeet (#3735) * parakeet : add support for NVIDIA Parakeet Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-06-16 20:44:10 +02:00
Daniel Bevenius	3805e602d3	ci : only trigger release jobs for tags (#3883) * ci : only trigger release jobs for tags This commit removes the building of the release jobs on pushes to master. The motivation for this is that it can be confusing at the moment when releasing that the push to master also triggers the release jobs but the actual release will be skipped. With this change the release job is only run when a tag is pushed, which should result in a single Release GitHub Actions job and make it easier to follow. * ci : add GGML_NATIVE=OFF for ubuntu-22-gcc	2026-06-16 14:33:42 +02:00
Daniel Bevenius	48f628a848	release : v1.8.7 (#3881)	2026-06-16 12:28:23 +02:00
Rum Nguyen	db5a84bd79	cli : add --version flag (#3878) Adds a `--version` option to whisper-cli that prints the library version via `whisper_version()` and exits, plus a corresponding entry in the help output. Mirrors the existing `-h`/`--help` handling. Closes #608	2026-06-16 08:58:09 +02:00
Georgi Gerganov	0ec0845110	talk-llama : sync llama.cpp	2026-06-15 10:33:53 +03:00
Georgi Gerganov	0a3fa9ca17	sync : ggml	2026-06-15 10:33:53 +03:00
Georgi Gerganov	f35f47b5d2	ggml : bump version to 0.15.1 (ggml/1541)	2026-06-15 10:33:53 +03:00
ZihaoMu	882736f886	ggml: support concat for scalar types at cuda backend (llama/24011) * cuda: support concat for scalar types * Update concat.cu * fix metal ci issue	2026-06-15 10:33:53 +03:00
shaofeiqi	2dcfd49d59	opencl: add q5_0/q5_1 gemm and gemv kernels for Adreno (llama/24319) * opencl: add q5_0 adreno support * opencl: add q5_1 adreno support * opencl: cosmetic fix --------- Co-authored-by: Li He <lih@qti.qualcomm.com>	2026-06-15 10:33:53 +03:00
Jeff Bolz	afd559279c	vulkan: ifdef eMesaHoneykrisp (build fix) (llama/24479) Fixes build/CI after #24306.	2026-06-15 10:33:53 +03:00
Georgi Gerganov	b04008fcec	ggml : bump version to 0.15.0 (ggml/1539)	2026-06-15 10:33:53 +03:00
Winston Ma	6870cfd616	vulkan: add fast path for contiguous buffer transfers (llama/23973)	2026-06-15 10:33:53 +03:00
Kevin Liu	a512e4c5c3	vulkan: use medium matmul tile on Asahi Linux (llama/24306) * vulkan: use medium matmul tile on Asahi Linux * vulkan: switch Apple detection to Honeykrisp driver id	2026-06-15 10:33:53 +03:00
Gaurav Garg	1a1900f90c	Remove padding and multiple D2D copies for MTP (llama/24086) * Make ggml_gated_delta_net take only the initial recurrent state (D, 1, n_seqs) and passes the snapshot count K as an op parameter instead of inferring it from state->ne[1]. Remove the padding hack and copy all emitted snapshots into the recurrent cache with a single strided ggml_cpy * Make GDN changes in all backends. Address review comments. * Fix CI build errors	2026-06-15 10:33:53 +03:00
Oliver Simons	ef85b26d9f	CUDA: Fix ssm_scan_f32 data-races (llama/24360) * Add missing syncthreads before reusing cub_temp_storage __syncthreads() is required before being allowed to reuse TempStorage smem: https://nvidia.github.io/cccl/unstable/cub/api/classcub_1_1BlockLoad.html#_CPPv4I0EN3cub9BlockLoad4LoadEv20RandomAccessIteratorRA14ItemsPerThread_1Ti * Add one more missing __syncthreads Could also double-buffer, but alternative is to simply ensure all threads have read smem* before writing to it again in the next loop iteration * Remove unused smem from ssm_scan_f32	2026-06-15 10:33:53 +03:00
Jeff Bolz	dc794303d8	vulkan: reduce iq1 shared memory usage for mul_mm (llama/24287)	2026-06-15 10:33:53 +03:00
Ruben Ortlam	686bc802d1	vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (llama/24123) * vulkan: add support for valve fp16 dot2 extension * use macro for dot2 path choice * properly check for the feature * add dot_product abstraction to reduce preprocessor branching	2026-06-15 10:33:53 +03:00
Pascal	28c7ed3db7	ggml : add GGML_OP_COL2IM_1D (llama/24206) * cpu: add GGML_OP_COL2IM_1D Add the overlap-add (scatter-add) step of a 1D transposed convolution. A ConvTranspose1d factorizes as a GEMM followed by col2im: a weight pre-permuted to [IC, K*OC] is contracted against the [IC, T_in] input with mul_mat to produce a column matrix [K*OC, T_in], and col2im_1d scatters those columns back into the [T_out, OC] signal, with T_out = (T_in - 1)*s0 + K - 2*p0. Keeping the contraction as a plain mul_mat leaves the heavy work on the optimized (and quantizable) matmul kernels, so col2im_1d only does the cheap overlap-add. CPU uses a gather formulation parallelized over output channels, supporting F32, F16 and BF16 with an F32 accumulator. * tests: add backend coverage for GGML_OP_COL2IM_1D Add test_col2im_1d next to the conv_transpose_1d cases, covering F32, F16 and BF16 across eight geometries: the canonical kernel = 2*stride DAC upsampling shape, overlap, no overlap, cropping (p0 = 1 and p0 = stride/2), kernel < stride with zeroed gaps, kernel not a multiple of stride, and a single column unfold. Perf mode gets three real vocoder stage shapes reporting memory bandwidth. max_nmse_err relaxes to 5e-4 for F16 and BF16. * cpu: harden GGML_OP_COL2IM_1D ggml_col2im_1d validates s0, oc, p0 and input contiguity at graph build time, before the oc division, protecting every backend at once. The kernel asserts the contiguity its flat indexing assumes and its doc states the full output length including the crop term. The kernel parallelizes over the time axis: the split stays balanced down to OC = 1, where the previous channel split was single threaded. Values are bit identical on the three real vocoder chains, two out of three improve. * tests: extend the GGML_OP_COL2IM_1D grid The eval grid grows to eleven geometries: OC = 1 (mono output stage), K = 1 with stride > 1 (sparse scatter, every gap position zeroed) and a crop down to T_out = 2 where all the gather bounds act at once. * tests: add col2im_1d equivalence test tests/test-col2im-1d.cpp proves mul_mat + col2im_1d matches the native ggml_conv_transpose_1d on the CPU backend, F32 bit exact, F16 and BF16 through casts of the column matrix. test-backend-ops cannot cover this for a CPU only op since the CPU backend is its own reference there. * rpc: bump protocol patch version for GGML_OP_COL2IM_1D GGML_OP_COUNT goes from 96 to 97 with the new op, which trips the static_assert in ggml-rpc.h. Bump RPC_PROTO_PATCH_VERSION since the op is appended and no existing op code shifts.	2026-06-15 10:33:53 +03:00
Yash Raj Pandey	2d68a3066f	ggml-cpu : fix rms_norm_back wrong output under in-place aliasing (llama/24305) * ggml-cpu : fix rms_norm_back wrong output under in-place aliasing * cont : clean-up comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-06-15 10:33:53 +03:00
ravel7524	72894aa250	Remove case for GGML_TYPE_Q4_K in mvvq.cu (llama/23528)	2026-06-15 10:33:53 +03:00
Reese Levine	e69e5138fe	ggml-webgpu: Add clang-format job (llama/24308) * Add clang-format job * try local formatting	2026-06-15 10:33:53 +03:00
Masashi Yoshimura	aa42b48312	ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants (llama/24225) * ggml-webgpu: Improve prefill speeds + refactor matmul for quants * Fixes for editorconfig checker	2026-06-15 10:33:53 +03:00
Nikhil Jain	15e5d401d1	Handle buffer overlap / buffer aliasing for concat operator (llama/24000) * Only run webgpu CI on my fork * Add webgpu only workflow * handle buffer overlap case for concat operator * restore build-webgpu.yml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Run clang-format * Update ggml/src/ggml-webgpu/wgsl-shaders/concat.wgsl --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Reese Levine <reeselevine1@gmail.com>	2026-06-15 10:33:53 +03:00
Nikhil Jain	490e50056c	Implement 2D workgroups for scale, binary, and unary ops (llama/24044) * Only run webgpu CI on my fork * Add webgpu only workflow * Implement 2d workgroups for more operations * fix * Fix type * Move back to global_invocation_id	2026-06-15 10:33:53 +03:00
Jeff Bolz	fbf720dc9f	vulkan: Use cm2 decode_vector for mul_mat_id B matrix loads (llama/23991) This allows vec4 loads of the B elements. Also increase BK to 64 when this is enabled. Neither of these alone is consistently faster, but together these give a nice speedup. In ggml-vulkan.cpp, we need to make sure the B matrix alignment and stride are multiples of 4.	2026-06-15 10:33:53 +03:00
Ruben Ortlam	782f1226c8	cuda: reset cuda context after reading memory size (llama/23935) * cuda: reset device in get_memory function if no backend is active * also count device and host buffers * exclude hip and musa from counting and device reset * use device mutex instead of atomic * undo backend_free function move	2026-06-15 10:33:53 +03:00
Daniel Bevenius	df7638d822	ci : pin github actions to commit sha's (#3865)	2026-06-09 12:51:00 +02:00
Christopher Albert	ba573929cd	coreml : fix --quantize crash for mlprogram format; fix --optimize-ane label (#3868) commit `8b92060` switched ct.convert() to mlprogram, but did not update the --quantize path. quantize_weights() from neural_network.quantization_utils only works with the legacy neuralnetwork format. Running with --quantize crashed with: Exception: MLModel of type mlProgram cannot be loaded just from the model spec object. It also needs the path to the weights file. Fix: pass compute_precision=ct.precision.FLOAT16 into ct.convert() when --quantize is set. This matches the original intent of nbits=16 (F16 storage) without changing the quantization scheme or model accuracy. Also fix the three boolean CLI flags (--encoder-only, --quantize, --optimize-ane) to use a _str_to_bool helper so that both --flag True and --flag False parse correctly. The type=bool form accepted "False" as True because bool("False") == True. Remove the "currently broken" label from --optimize-ane: the ANE path (WhisperANE with Conv2d attention and LayerNormANE) converts and loads correctly with both PyTorch 2.x and coremltools 9.x.	2026-06-09 08:34:31 +02:00
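
Note on ba573929cd: the boolean-flag pitfall described in that commit message can be reproduced with the standard library alone. The sketch below is an illustration, not the code from the commit: the helper name `str_to_bool`, the set of accepted spellings, and the `build_parser` function are assumptions made for this example. Only the `--quantize` flag name and the `bool("False") == True` behaviour come from the commit message.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: map common textual spellings to a real bool."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser(flag_type) -> argparse.ArgumentParser:
    """Build a parser with one boolean-like flag converted by flag_type."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--quantize", type=flag_type, default=False)
    return parser


# type=bool calls bool("False"); any non-empty string is truthy.
naive = build_parser(bool).parse_args(["--quantize", "False"])

# The explicit converter inspects the text instead of its truthiness.
fixed = build_parser(str_to_bool).parse_args(["--quantize", "False"])

print("type=bool        ->", naive.quantize)
print("type=str_to_bool ->", fixed.quantize)
```

With `type=bool` the flag ends up `True` even though the user wrote `False`, while the explicit converter yields `False`. Raising `argparse.ArgumentTypeError` for unrecognized text lets argparse report a normal usage error instead of silently accepting the value.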


4699 Commits
