* cli : merge tokens split across UTF-8 boundaries in JSON output
When a multi-byte UTF-8 codepoint (most commonly a CJK character, 3 bytes)
is split across multiple whisper tokens, the -ojf/--output-json-full
writer emitted each token's partial bytes as its own JSON string, producing
invalid UTF-8 that chokes downstream parsers.
Merge adjacent tokens in output_json whenever the accumulated text still
ends on an incomplete UTF-8 sequence. The merged entry keeps the first
token's id/p/t_dtw and extends t1 to the last absorbed token, which
matches how segment text is assembled elsewhere.
Refs #1798
* fix: address review — add braces for consistency, use full issue URL
- Add braces to if/else chain for codebase consistency
- Use full URL for issue #1798 reference
Review: @danbev
---------
Co-authored-by: texasich <texasich@users.noreply.github.com>
Co-authored-by: texasich <texasich@gmail.com>