whisper.cpp


texasich 27101c01dc cli : merge tokens split across UTF-8 boundaries in JSON output (#3751)	2026-05-26 06:23:41 +02:00

* cli : merge tokens split across UTF-8 boundaries in JSON output

When a multi-byte UTF-8 codepoint (most commonly a CJK character, 3 bytes) is split across multiple whisper tokens, the -ojf/--output-json-full writer emitted each token's partial bytes as its own JSON string, producing invalid UTF-8 that chokes downstream parsers.

Merge adjacent tokens in output_json whenever the accumulated text still ends on an incomplete UTF-8 sequence. The merged entry keeps the first token's id/p/t_dtw and extends t1 to the last absorbed token, which matches how segment text is assembled elsewhere.

Refs #1798

* fix: address review - add braces for consistency, use full issue URL

- Add braces to if/else chain for codebase consistency
- Use full URL for issue #1798 reference

Review: @danbev

---------

Co-authored-by: texasich <texasich@users.noreply.github.com>
Co-authored-by: texasich <texasich@gmail.com>
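The merge rule described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `token_entry`, `ends_with_incomplete_utf8` and `merge_split_tokens` are hypothetical names, and the struct only mirrors the fields the message mentions (`id`, `p`, `t_dtw`, `t1`).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record for illustration only; not the whisper.cpp type.
struct token_entry {
    std::string text;
    int         id;
    float       p;
    long long   t0;
    long long   t1;
    long long   t_dtw;
};

// True if s ends in the middle of a multi-byte UTF-8 sequence.
// Scans back at most 4 bytes to find the lead byte of the last codepoint.
bool ends_with_incomplete_utf8(const std::string & s) {
    const size_t n = s.size();
    for (size_t back = 1; back <= 4 && back <= n; ++back) {
        const unsigned char c = static_cast<unsigned char>(s[n - back]);
        if ((c & 0xC0) == 0x80) {
            continue; // continuation byte, keep looking for the lead byte
        }
        size_t need = 0;
        if ((c & 0x80) == 0x00) {
            need = 1; // ASCII
        } else if ((c & 0xE0) == 0xC0) {
            need = 2;
        } else if ((c & 0xF0) == 0xE0) {
            need = 3; // e.g. most CJK characters
        } else if ((c & 0xF8) == 0xF0) {
            need = 4;
        } else {
            return false; // invalid lead byte: nothing to wait for
        }
        return need > back;
    }
    return false; // empty string, or no lead byte within the last 4 bytes
}

// Merge adjacent tokens while the accumulated text is still incomplete.
// The merged entry keeps the first token's id/p/t_dtw and takes t1 from
// the last absorbed token.
std::vector<token_entry> merge_split_tokens(const std::vector<token_entry> & in) {
    std::vector<token_entry> out;
    bool pending = false;
    for (const auto & tok : in) {
        if (pending) {
            out.back().text += tok.text;
            out.back().t1    = tok.t1;
        } else {
            out.push_back(tok);
        }
        pending = ends_with_incomplete_utf8(out.back().text);
    }
    return out;
}
```

For example, if the three bytes of a CJK character arrive as a two-byte token followed by a one-byte token, the sketch yields a single entry that carries the first token's `id` and the second token's `t1`. A trailing token that never completes is emitted unchanged in this sketch; how the actual writer handles that case is not stated in the commit message.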
addon.node	vad : Silero VAD v6.2.0 (#3524)	2025-11-17 22:26:17 +09:00
bench	bench : sync submit-results URL to ggml-org (#3769)	2026-04-20 07:12:57 +02:00
bench.wasm	bench : sync submit-results URL to ggml-org (#3769)	2026-04-20 07:12:57 +02:00
cli	cli : merge tokens split across UTF-8 boundaries in JSON output (#3751)	2026-05-26 06:23:41 +02:00
command	whisper : enable flash attention by default (#3441)	2025-09-30 15:47:20 +03:00
command.wasm	examples : add wchess.wasm to wasm examples build (#3443)	2025-09-30 16:23:01 +02:00
deprecation-warning	examples : add WHISPER_SDL2 check to deprecation executables (#2911)	2025-03-20 18:36:02 +01:00
lsp	examples : fix executable example targets (#3600)	2026-01-13 08:08:18 +01:00
python	readme : remove invalid flag from Python example (#2396)	2024-08-30 14:00:38 +03:00
quantize	examples : fix executable example targets (#3600)	2026-01-13 08:08:18 +01:00
server	common : fix server /inference fails to decode in-memory audio (regression) (#3818)	2026-05-22 08:27:35 +02:00
stream	whisper : enable flash attention by default (#3441)	2025-09-30 15:47:20 +03:00
stream.wasm	examples : add wchess.wasm to wasm examples build (#3443)	2025-09-30 16:23:01 +02:00
sycl	sycl: fix example build (#2570)	2024-11-18 14:57:23 +02:00
talk-llama	talk-llama : sync llama.cpp	2026-05-25 12:26:07 +03:00
vad-speech-segments	examples : fix executable example targets (#3600)	2026-01-13 08:08:18 +01:00
wchess	wchess : fix link [no ci]	2025-09-30 21:28:03 +03:00
whisper.android	whisper : add version function (#3289)	2025-06-26 18:09:42 +02:00
whisper.android.java	whisper : add version function (#3289)	2025-06-26 18:09:42 +02:00
whisper.nvim	rename : ggerganov -> ggml-org (#3005)	2025-04-04 16:11:52 +03:00
whisper.objc	docs : update README.md for whisper.objc app (#2569)	2025-05-13 06:03:50 +02:00
whisper.swiftui	examples : clarify Core ML encoder model usage [no ci] (#2987)	2025-04-02 08:32:14 +02:00
whisper.wasm	wasm : fix Hebrew ID (#3487)	2025-10-27 08:49:32 +02:00
CMakeLists.txt	examples : add wchess.wasm to wasm examples build (#3443)	2025-09-30 16:23:01 +02:00
coi-serviceworker.js	ci : add github pages workflow for wasm examples (#2969)	2025-03-31 11:34:40 +02:00
common-ggml.cpp	examples : update to Q1_0	2026-05-01 13:07:33 +03:00
common-ggml.h	…
common-sdl.cpp	common : more general m_audio_len update logic (#2855)	2025-03-07 10:10:03 +02:00
common-sdl.h	sdl : fix audio callback (#1523)	2023-11-20 13:16:38 +02:00
common-whisper.cpp	common : fix server /inference fails to decode in-memory audio (regression) (#3818)	2026-05-22 08:27:35 +02:00
common-whisper.h	common : fix server /inference fails to decode in-memory audio (regression) (#3818)	2026-05-22 08:27:35 +02:00
common.cpp	whisper: remove MSVC warnings pragmas (#3090)	2025-05-05 13:09:35 +02:00
common.h	examples : add --print-confidence option to cli (#3150)	2025-05-14 19:21:48 +02:00
ffmpeg-transcode.cpp	examples : fix deprecated FFmpeg functions (#3073)	2025-04-28 06:16:50 +02:00
generate-karaoke.sh	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)	2025-02-27 09:06:54 +02:00
grammar-parser.cpp	whisper : reorganize source code + improve CMake (#2256)	2024-06-26 19:34:09 +03:00
grammar-parser.h	whisper : add grammar-based sampling (#1229)	2023-11-13 10:51:34 +02:00
helpers.js	js : remove un-needed request header from fetchRemote (#2119)	2024-05-13 15:13:19 +03:00
json.hpp	examples : clean up common code (#1871)	2024-02-19 10:50:15 +02:00
livestream.sh	rename : ggerganov -> ggml-org (#3005)	2025-04-04 16:11:52 +03:00
miniaudio.h	examples : update miniaudio library to 0.11.24 (#3672)	2026-02-27 11:15:15 +01:00
server.py	examples : add wchess.wasm to wasm examples build (#3443)	2025-09-30 16:23:01 +02:00
stb_vorbis.c	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)	2025-02-27 09:06:54 +02:00
twitch.sh	rename : ggerganov -> ggml-org (#3005)	2025-04-04 16:11:52 +03:00
yt-wsp.sh	examples : update usage/help in yt-wsp.sh (#3251)	2025-06-16 12:21:16 +02:00