* cli : merge tokens split across UTF-8 boundaries in JSON output
When a multi-byte UTF-8 codepoint (most commonly a CJK character, 3 bytes)
is split across multiple whisper tokens, the -ojf/--output-json-full
writer emitted each token's partial bytes as its own JSON string, producing
invalid UTF-8 that chokes downstream parsers.
Merge adjacent tokens in output_json whenever the accumulated text still
ends on an incomplete UTF-8 sequence. The merged entry keeps the first
token's id/p/t_dtw and extends t1 to the last absorbed token, which
matches how segment text is assembled elsewhere.
Refs #1798
* fix: address review — add braces for consistency, use full issue URL
- Add braces to if/else chain for codebase consistency
- Use full URL for issue #1798 reference
Review: @danbev
---------
Co-authored-by: texasich <texasich@users.noreply.github.com>
Co-authored-by: texasich <texasich@gmail.com>
* common: add memory buffer overload of read_audio_data
whisper-server /inference without --convert passed the uploaded file
bytes to read_audio_data as a filename, so ma_decoder_init_file tried
to open a path starting with "RIFF" and failed. every request returned
HTTP 400 "Invalid request" on builds without WHISPER_FFMPEG, which is
the default.
factor the PCM extraction into a shared helper and add an overload that
decodes straight from a memory buffer via ma_decoder_init_memory, which
the function already used for the stdin path. server now calls it with
the upload content. the filename overload behavior is unchanged.
This commit addresses a memory leak in the `read_audio_data` function
where it is currently possible that a call to `ma_decoder_init_file`
succeeds and the function returns early without calling
`ma_decoder_uninit`. A similar situation can occur with
`ma_decoder_init_memory`.
Refs: https://bugs.debian.org/1124796
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
The project moved from ggerganov/ to ggml-org/ and the README already
references the new URL in both places it mentions issue #89 (README.md
and examples/bench/README.md). Syncing the two remaining hardcoded URLs
in examples/bench/bench.cpp and examples/bench.wasm/emscripten.cpp.
The old URL still redirects, so this is cosmetic.
Several error paths in the /inference and /load endpoints returned
HTTP 200 with a JSON error body, making it impossible for clients
to distinguish errors from successful responses by status code.
Set 400 for client errors (missing file field, unreadable audio,
missing/invalid model) and 500 for server errors (ffmpeg conversion
failure). The two existing status-code sites (499 for client
disconnect, 500 for processing failure) are unchanged.
* cmake:
- added `whisper-` prefix to unprefixed targets: `quantize`, `lsp`,
`vad-speech-segments`
- added `install(TARGETS ${TARGET} RUNTIME)` where it was missing
Signed-off-by: Peter A. <ink.splatters@pm.me>
* .github/workflows/build.yml: quantize -> whisper-quantize
Signed-off-by: Peter A. <ink.splatters@pm.me>
---------
Signed-off-by: Peter A. <ink.splatters@pm.me>
* Add support for --carry-initial-prompt
* PR fixes for ruby and go
* Refactoring for readability
* WIP 1
* WIP 2
* PR fixes
* More PR fixes
* PR fix
* Further simplification
* d'oh
* One more logic fix
* Update src/whisper.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Truncate prompt_past0 upon initialization
* Slight simplification
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* cli: Fix assignment for vad_min_silence_duration_ms
Found and fixed this simple copy/paste error
* server : fix vad_min_silence_duration_ms assignment
---------
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>