whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	c64f3e8ada	common : separate whisper sources (#2846 ) * common : separate whisper sources * examples : add chrono * examples : add more headers	2025-02-27 12:50:32 +02:00
Georgi Gerganov	9f83f67221	common : fix build min/max (#2845 ) * common : try to fix build * cont : try another fix	2025-02-27 10:39:13 +02:00
Dmitry Atamanov	7d3da68f79	examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759 )	2025-02-27 09:06:54 +02:00
petterreinholdtsen	b5d21359c1	stream : stop on ^C when no audio is received (#2822 ) Add check for ctrl-c in potentially endless loop while calling audio.get() to receive sound. Co-authored-by: Petter Reinholdtsen <pere@debian.org>	2025-02-27 08:59:51 +02:00
masahji	dfc6ca62f3	stream : add beam size parameter(#2836 ) * feat: Add beam size parameter to stream.cpp for beam search configuration * feat: Add beam size parameter to whisper full params in stream example * fix: Remove duplicate beam search size assignment in server.cpp	2025-02-25 11:39:33 +02:00
Judd	d682e15090	Fixes for Windows (#2790 ) Fixes for Windows: * MSVC default to utf-8 without BOM. * Console output code page changed to utf-8. --------- Co-authored-by: Judd <foldl@boxvest.com>	2025-02-06 15:37:21 +08:00
billyct	cadfc50eab	node : add max_len params in node addon (#2760 )	2025-02-03 22:49:06 +02:00
Georgi Gerganov	3f91832352	talk-llama : sync llama.cpp	2025-02-03 22:42:26 +02:00
Corey Earwood	7a423f1c00	whisper.objc : fix build and CI	2025-01-18 12:06:06 +02:00
Georgi Gerganov	99b011a9f5	talk-llama : sync llama.cpp	2025-01-14 10:38:01 +02:00
Georgi Gerganov	e940fbf283	server : fix build (#2718 )	2025-01-13 08:57:33 +02:00
Georgi Gerganov	35d0e02c72	talk-llama : sync llama.cpp (#2709 )	2025-01-13 08:55:48 +02:00
NETZkultur GmbH	45d3faf961	server : generate unique tmp filenames (#2718 ) Summary: This Merge Request adds a mechanism to generate unique filenames for FFmpeg conversions in whisper_server.cpp. Previously, a single fixed filename was used (e.g., whisper-server-tmp.wav), which could result in unexpected file overwrites under certain circumstances. By generating a unique filename per request, any risk of overwriting temporary files is eliminated. Background / Motivation: • Problem: Relying on a static filename for temporary audio files may lead to overwrites if multiple operations occur simultaneously or if the same file name is reused. • Goal: Dynamically generate unique filenames, ensuring each request or operation uses an isolated temporary file.	2025-01-13 08:55:21 +02:00
Yusuf Redžić	ece3ff88f6	cli : fix segfault on missing argument (#2700 )	2025-01-04 10:47:41 +02:00
Alter	c81b8b910b	objc : rename ggml-cpu-aarch64.c to .cpp (#2687 )	2025-01-02 12:05:09 +02:00
Georgi Gerganov	5136fd92c2	examples : handle "main.exe" deprecation	2024-12-30 13:00:18 +02:00
Andreas Lubbe	7d55637f0b	cli : add --suppress_nst support (#2664 )	2024-12-24 09:30:07 +02:00
Andreas Lubbe	0994506054	cli : add no_speech_thold (#2663 )	2024-12-24 09:29:19 +02:00
Georgi Gerganov	ed09075ca0	server : fix help print	2024-12-22 15:32:05 +02:00
Sacha Arbonel	4183517076	server : add no-speech threshold parameter and functionality (#2654 )	2024-12-21 17:00:08 +02:00
Georgi Gerganov	f4668169a0	whisper : rename suppress_non_speech_tokens to suppress_nst (#2653 )	2024-12-21 12:54:35 +02:00
Sacha Arbonel	944ce49439	server : add option to suppress non-speech tokens (#2649 ) * The parameter will suppress non-speech tokens like [LAUGH], [SIGH], etc. from the output when enabled. * add to whisper_params_parse * add missing param	2024-12-21 12:05:05 +02:00
Georgi Gerganov	2e59dced12	whisper : rename binaries + fix install (#2648 ) * whisper : rename binaries + fix install * cont : try to fix ci * cont : fix emscripten builds	2024-12-21 09:43:49 +02:00
Georgi Gerganov	ba6c2a8fd9	android : try to fix build	2024-12-18 12:52:16 +02:00
Georgi Gerganov	6576af00d7	files : remove old sources	2024-12-18 12:52:16 +02:00
Georgi Gerganov	61edb117a0	talk-llama : sync llama.cpp	2024-12-18 12:52:16 +02:00
Georgi Gerganov	60dc6d003f	common : remove old types ggml-ci	2024-12-18 12:52:16 +02:00
crummyh	d34445e960	stream : improve consistency in README (#2642 )	2024-12-18 08:43:48 +02:00
Georgi Gerganov	199579652e	common : add cstdio header	2024-12-16 08:57:04 +02:00
Georgi Gerganov	d17e7139d8	stream : update build instructions	2024-12-15 21:55:36 +02:00
Thamster	6a52eaea74	android : fix build and ci (#2624 ) * Adding missing CMakeLists.txt include for ggm-cpu needed by whisper.android * attempt to re-enable CI for JNI android --------- Co-authored-by: Your Name <you@example.com>	2024-12-14 17:25:53 +02:00
Georgi Gerganov	472464453d	ci : disable CUDA and Android builds	2024-12-08 20:14:35 +02:00
Georgi Gerganov	11dddfbc9e	ci : disable Obj-C build + fixes	2024-12-08 20:14:35 +02:00
Georgi Gerganov	f2c680f893	talk-llama : sync llama.cpp	2024-12-08 20:14:35 +02:00
Georgi Gerganov	02c6fcbc2c	common : fix compile warning ggml-ci	2024-12-08 20:14:35 +02:00
Georgi Gerganov	7fd8d9c220	whisper : adapt to new ggml (wip)	2024-11-20 21:00:08 +02:00
Georgi Gerganov	06e059b8f8	talk-llama : sync llama.cpp	2024-11-20 21:00:08 +02:00
Stefan Sydow	d24f981fb2	sycl: fix example build (#2570 )	2024-11-18 14:57:23 +02:00
Jhen-Jie Hong	c4e95fb74d	whisper.swiftui : switch Mac dest to Mac (Designed for iPad) (#2562 )	2024-11-15 15:21:53 +02:00
Georgi Gerganov	6477b84eb6	build : fixes	2024-11-15 15:21:04 +02:00
Georgi Gerganov	24d706774d	talk-llama : sync llama.cpp	2024-11-15 15:21:04 +02:00
Jhen-Jie Hong	5f8a086e22	whisper.swiftui : add model download list & bench methods (#2546 ) * swift : fix resources & exclude build * whisper : impl whisper_timings struct & api * whisper.swiftui : model list & bench methods * whisper : return ptr for whisper_get_timings * revert unnecessary change * whisper : avoid designated initializer * whisper.swiftui: code style changes * whisper.swiftui : get device name / os from UIDevice * whisper.swiftui : fix UIDevice usage * whisper.swiftui : add memcpy and ggml_mul_mat (commented)	2024-11-13 21:51:34 +02:00
Stefan Sydow	300c07b94d	examples : fix ffmpeg v5 build (#2543 ) remove call to 'av_register_all()' which does not exist in ffmpeg v5 anymore.	2024-11-13 21:41:52 +02:00
Georgi Gerganov	c65d0fd3c8	talk-llama : sync llama.cpp	2024-11-01 10:19:05 +02:00
Rotem Dan	b6049060dd	whisper : add dtw preset for large-v3-turbo (#2481 )	2024-10-15 21:00:21 +03:00
Georgi Gerganov	6e40108a59	objc : fix build	2024-10-05 15:23:51 +03:00
Georgi Gerganov	941912467d	whisper : adapt to latest ggml (skip) (#0 )	2024-10-05 15:23:51 +03:00
Rahul Vadhyar	2944cb72d9	examples : update dr_wav.h to newer version (#2449 )	2024-10-04 11:04:51 +03:00
Georgi Gerganov	ccc2547210	talk-llama : sync llama.cpp	2024-10-03 12:22:17 +03:00
gilbertgong	ede1718f6d	server : ffmpeg overwrite leftover temp file (#2431 ) * Remove possible leftover ffmpeg temp file from a previous failed conversion * Revert "Remove possible leftover ffmpeg temp file from a previous failed conversion" This reverts commit `00797403bd`. * Flag to force ffmpeg to overwrite output file if it exists	2024-10-02 15:06:40 +03:00
Georgi Gerganov	2ef717b293	whisper : add large-v3-turbo (#2440 )	2024-10-01 15:57:06 +03:00
Georgi Gerganov	451e9ee92c	make : remove "talk" target until updated	2024-09-24 19:45:08 +03:00
Georgi Gerganov	fe18c29ab8	talk-llama : sync llama.cpp	2024-09-24 19:45:08 +03:00
Georgi Gerganov	54e5095765	examples : adapt to ggml.h changes (ggml/0) ggml-ci	2024-09-24 19:45:08 +03:00
Toliver	5b1ce40fa8	server : use OS-generated temp file name for converted files (#2419 )	2024-09-17 15:56:32 +03:00
UsernamesLame	9600fc3eb1	readme : remove invalid flag from Python example (#2396 ) * Update README.md Fix broken C-style API link * Update whisper_processor.py Update examples/python/whisper_processor.py to remove nonexistent flag "-np" from subprocess.Popen call. * Add pywhispercpp to the Pybind11 Python wrapper list abdeladim-s/pywhispercpp wasn't added to the list / was removed at some point (?) It was referenced in issue #9, so I feel like it's worthy of being added as it's the first if not one of the first Python wrappers for whisper.cpp	2024-08-30 14:00:38 +03:00
Georgi Gerganov	da9809f243	talk-llama : sync llama.cpp	2024-08-28 13:22:20 +03:00
Justine Tunney	7f78675008	examples : use colorblind friendly TTY color scheme (#2360 ) This change updates the -pc flag, so that a new xterm256 color scheme is used. This color scheme is believed to be better for three reasons: 1. It should be friendlier to the colorblind. The scheme was designed by Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it since 2017, so it's already popular in the machine learning community 2. It should appear to be the same colors as before to people who aren't i.e. it's still a red-green spectrum like before but lightly modified 3. It is readable in both white and black background terminals. The neon colors before were probably a bit too intense for white backgrounds.	2024-08-20 10:49:10 +03:00
Georgi Gerganov	58323bf8ed	build : fix aarch64 (#0 )	2024-08-08 22:48:46 +03:00
Georgi Gerganov	22058f2dbc	talk-llama : sync llama.cpp	2024-08-08 22:48:46 +03:00
Georgi Gerganov	c7ea4fd235	common : handle new quant types (ggml/0)	2024-08-08 22:48:46 +03:00
Georgi Gerganov	dbf9c15e30	talk-llama : sync llama.cpp	2024-07-08 14:53:55 +03:00
Georgi Gerganov	d3f6c34976	examples : fix compile warnings [no ci] (#0 )	2024-07-08 14:53:55 +03:00
Emmanuel Schmidbauer	bec9836849	server : add inference path to make OAI API compatible (#2270 )	2024-07-08 14:24:58 +03:00
Georgi Gerganov	4a62efbb95	cmake : minor fixes	2024-06-26 21:42:39 +03:00
Georgi Gerganov	dc8cc2dd6f	whisper : disable CUDA mel + fix FFMPEG	2024-06-26 20:11:38 +03:00
Georgi Gerganov	e30c679928	whisper : reorganize source code + improve CMake (#2256 ) * scripts : update sync [no ci] * files : reorganize [no ci] * sync : llama.cpp * cmake : link math library * cmake : build normal ggml library * files : move headers to include * objc : fix path to ggml-metal.h * ci : fix WHISPER_CUDA -> GGML_CUDA * scripts : sync LICENSE [no ci]	2024-06-26 19:34:09 +03:00
Georgi Gerganov	e293f17d34	talk-llama : sync llama.cpp	2024-06-18 09:45:37 +03:00
slaren	de29b193f6	move BLAS to a separate backend (cont) (llama/6210) ggml-ci	2024-06-18 09:39:40 +03:00
Georgi Gerganov	3b1ac03828	ggml : remove OpenCL (#0 )	2024-06-16 18:19:48 +03:00
Georgi Gerganov	061eeb9f61	talk-llama : sync llama.cpp	2024-06-16 18:19:48 +03:00
Borislav Stanimirov	af5833e298	whisper : remove `speed_up` and `phase_vocoder` functions (#2198 ) whisper : fix cast warning * whisper : remove phase_vocoder functions, ref #2195 * whisper : remove speed_up from whisper_full_params, closes #2195	2024-05-31 11:37:29 +03:00
Daniel Valdivia	a7dc2aab16	server : fix typo (#2181 ) A simple comment typo, PR can be dismissed	2024-05-25 10:46:22 +03:00
William Tambellini	1b51fdf170	examples : add support for decoding input with ffmpeg (Linux) (#2133 ) - search for ffmpeg libs/headers at cmake time - added ffmpeg-transcode.cpp into libcommon if ffmpeg on - hooked ffmpeg trancoding in common read_wav(...) - passed test: ./main -m ggml-base.en.bin -f samples/jfk.mp3	2024-05-21 18:31:41 +03:00
Pedro Probst	adee3f9c1f	node : add flash_attn param (#2170 )	2024-05-20 09:08:48 +03:00
Georgi Gerganov	7094ea5e75	whisper : use flash attention (#2152 ) * whisper : use flash attention in the encoder * whisper : add kv_pad * whisper : remove extra backend instance (huh?) * whisper : use FA for cross-attention * whisper : use FA for self-attention * whisper : simplify encoder FA * whisper : add flash_attn runtime parameter * scripts : add bench log * scripts : add M1 Pro bench log	2024-05-15 09:38:19 +03:00
petterreinholdtsen	9d5771ae43	talk-llama : reject runs without required arguments (#2153 ) * Extended talk-llama example to reject runs without required arguments. Print warning and exit if models are not specified on the command line. * Update examples/talk-llama/talk-llama.cpp * Update examples/talk-llama/talk-llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-14 21:32:41 +03:00
Georgi Gerganov	4ef8d9f44e	server : return utf-8 (#2138 )	2024-05-13 15:33:46 +03:00
Pedro Probst	3928dbd206	node : add audio_ctx and audio buffer params (#2123 ) * node : add audio_ctx param * node : support passing audio buffer directly * node : parse audio_ctx in index.js --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-13 15:22:23 +03:00
valVk	30f73109b8	node : add additional params (#2000 ) * Add additional params to addon.node * Add comma_in_time as parameter * Fix tests	2024-05-13 15:15:43 +03:00
Mark Karpelès	17fa62d3d3	js : remove un-needed request header from fetchRemote (#2119 )	2024-05-13 15:13:19 +03:00
Daniel Ziegenberg	0bb05b113d	main : dont print timings with --no-prints (#2108 ) Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>	2024-05-13 15:00:19 +03:00
Daniel Ziegenberg	f141b2b938	main : add options for temperature control (#2088 ) Add two options: ``` -tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1 -tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1 ``` The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit. Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>	2024-05-13 14:59:44 +03:00
zhangjixiong	e93081f83f	whisper.android : update example, add field to print timestamp (#2072 )	2024-05-13 14:30:03 +03:00
Xingchen Song(宋星辰)	b6bbce4ae9	cmake : fix json INTERFACE library (#2069 )	2024-05-13 14:29:39 +03:00
mashizora	7705dc52da	main : fix double quote escaping in csv output (#2090 )	2024-05-13 11:55:32 +03:00
Georgi Gerganov	3fa7d29876	talk-llama : sync llama.cpp	2024-05-13 11:02:26 +03:00
Georgi Gerganov	accada542a	ggml : resolve merge (ggml/0) ggml-ci	2024-05-13 11:02:26 +03:00
Pedro Probst	58210d6a76	examples : fix node compilation (#2115 ) * node : fix compilation and update examples * node : fix readme * Update addon.node test	2024-05-02 22:52:55 +01:00
Georgi Gerganov	b0c3cbf2e8	main : pass nullptr when regex is empty (#2070 )	2024-04-17 12:23:47 +03:00
Emmanuel Schmidbauer	9fab28135c	server : add dtw (#2044 ) * server.cpp: add dtw * Update examples/server/server.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-15 22:16:58 +03:00
Pedro Probst	1b5439a6c2	node : support no timestamps (#2048 ) * fix: node: do not compute timestamps if you do not need them * feat: add no_timestamps parameter to node addon	2024-04-15 20:03:34 +03:00
Kendrick Taylor	5c554c04ff	whisper.nvim : fix missing reference to "model" variable (#2049 )	2024-04-15 19:41:28 +03:00
Ikko Eltociear Ashimine	c383f091a1	whisper : update grammar-parser.cpp (#2058 ) preceeding -> preceding	2024-04-15 19:40:27 +03:00
ulatekh	c15b4cda7d	common : fix file-handle leak in read_wav() (#2026 ) Now it cleans up in case of error.	2024-04-09 18:34:34 +03:00
Rotem Dan	d3cfb6ca2b	main : set stdin to binary mode on Windows (#2025 )	2024-04-09 18:33:32 +03:00
ulatekh	671b4bde6c	main : allow a response-file as the sole parameter (#2019 ) * The "main" example now allows a response-file as the sole parameter. A response-file is a text file with command-line parameters, one per line. Prefix the name of the response-file with "@" to identify it as such. It's used under MS Windows to work around command-line length limits. It may be useful under other platforms to simplify character-escaping. * minor : style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:31:16 +03:00
ulatekh	c8eeb93a6a	whisper : suppress tokens with a regex (#1997 ) * Allow a regular expression to describe tokens to suppress. Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens. Technique inspired by https://github.com/openai/whisper/discussions/1041 Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Blind change to fix Java test. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-09 18:27:28 +03:00
ulatekh	319fe5146e	cmake : create solution folders (#2004 ) * Create solution folders in the CMake build. * Fixed non-SDL2 build. * Fixed emscripten build.	2024-04-09 18:23:33 +03:00
Georgi Gerganov	81a3c41aa0	talk-llama : sync llama.cpp	2024-04-07 16:21:08 +03:00
ulatekh	fc366b807a	main : add command-style grammar (#1998 ) * Implemented command-style grammar in the main example. Mostly just copied the relevant parts from the command example. * main : code style --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-28 12:02:10 +02:00
Georgi Gerganov	9fb308d90f	make : add grammar parser to common objects	2024-03-28 11:59:48 +02:00
Georgi Gerganov	2948c740a2	sync : ggml (#2001 ) * sync : update scripts * sync : ggml * talk-llama : sync llama.cpp * make : WHISPER_CUBLAS -> WHISPER_CUDA * ci : try to fix sycl build * talk-llama : fix make build	2024-03-27 18:55:10 +02:00
Georgi Gerganov	1558ec5a16	whisper : improve handling of prompts (#1981 ) * whisper : improve handling of prompts * whisper : add whisper_token_count helper	2024-03-25 14:48:19 +02:00
Mohammadreza Hendiani	04e48094e4	readme : add Fedora dependencies (#1970 ) * README.md fix documentaion and added fedora liunx dependencies for stream build * fix documentaion and added fedora liunx dependencies for command build * fix documentaion and added fedora liunx dependencies for talk build * fix documentaion and added fedora liunx dependencies for talk-llama build * reverted back mistakenly removed MacOS documentaion	2024-03-20 18:42:11 +02:00
denersc	741abb162c	whisper : token-level timestamps with DTW (#1485 ) * whisper.cpp: impl dtw algo * WIP: producing and placing DTW timestamps on tokens * Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false. * Fix mistake causing incorrect alignment of dtw timestamps * implement N_TOP_MOST and CUSTOM alignment heads setting * whisper: fix typo on alignment heads enum * Fix issues related to changes in whisper.cpp * Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function * decoder: save cross QKs only if requested * Calling median filter with ggml_map_custom1 * Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads * Copying cross QKs from decoder backend correctly * dtw: cleanup * Fix incorrect n_frames passed to dtw when near end of audio * Fix aheads_masks_init for backend != CPU * whisper : minor style * main : add dtw (wip) * whisper: fix invalid memory access in aheads_masks_init * main : add dtw (cont) * whisper : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 18:25:26 +02:00
Jo Liss	e7794a868f	examples : rename --audio-context to --audio-ctx per help text (#1953 )	2024-03-18 17:53:33 +02:00
Georgi Gerganov	de4d067f1e	talk-llama : sync llama.cpp	2024-03-15 14:21:59 +02:00
slaren	f60ccfd83b	update examples and tests	2024-03-15 14:01:14 +02:00
Georgi Gerganov	2f5a5a66dd	talk-llama : use llama_decode instead of llama_eval	2024-03-08 12:04:43 +02:00
Georgi Gerganov	8e409d1113	talk-llama : sync llama.cpp	2024-03-08 11:55:50 +02:00
Georgi Gerganov	05d1b61af4	talk-llama : sync llama.cpp	2024-03-08 11:52:47 +02:00
F1L1P	2e2626b167	examples : Auto lowercase language parameter in main.cpp (#1928 ) * Auto lowercase language parameter * Update examples/main/main.cpp Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com> --------- Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2024-03-06 22:25:10 +00:00
zhouwg	c0c0ae2dea	examples : fix typo in bench.cpp (#1933 )	2024-03-06 22:21:44 +00:00
zhouwg	f22d27a385	whisper.android.java : fix returns in JNI (#1929 )	2024-03-05 15:59:26 +02:00
Georgi Gerganov	25d313b38b	talk-llama : sync llama.cpp	2024-02-28 13:04:05 +02:00
Georgi Gerganov	1711bb3881	sync : llama.cpp (ggml/0)	2024-02-28 13:00:30 +02:00
Andrew S	0d8fd8483a	stream.wasm : fix invalid memory access when no segments (#1902 ) No segments may be returned when a smaller sample buffer (EG 2048 samples) is sent to the worker.	2024-02-26 10:12:35 +02:00
Georgi Gerganov	3170841ed9	talk-llama : sync llama.cpp	2024-02-25 20:00:10 +02:00
Georgi Gerganov	578e47e70c	sync : llama.cpp (ggml/0)	2024-02-25 19:58:46 +02:00
Tamotsu Takahashi	f18738f247	talk, talk-llama : pass text_to_speak as a file (#1865 ) * talk-llama: pass file instead of arg it is too hard to quote text in a portable way * talk-llama: pass heard_ok as a file * talk-llama: let eleven-labs.py accept options Options: -v voice, -s savefile, -p (--play) * talk-llama: check installed commands in "speak" Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed * talk-llama: pass voice_id again in order to sync talk with talk-llama * talk: sync with talk-llama Passing text_to_speak as a file is safer and more portable cf. https://stackoverflow.com/a/59036879/45375 * talk and talk-llama: get all installed voices in speak.ps1 * talk and talk-llama: get voices from api * talk and talk-llama: add more options to eleven-labs.py and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/) ``` usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE] options: -q, --quick skip checking the required library action: TEXTFILE read the text file (default: stdin) -l, --list show the list of voices and exit -h, --help show this help and exit voice selection: -n NAME, --name NAME get a voice object by name (default: Arnold) -v NUMBER, --voice NUMBER get a voice object by number (see --list) -f KEY=VAL, --filter KEY=VAL filter voices by labels (default: "use case=narration") this option can be used multiple times filtering will be disabled if the first -f has no "=" (e.g. -f "any") output: -s FILE, --save FILE save the TTS to a file (default: audio.mp3) -p, --play play the TTS with ffplay ``` * examples: add speak_with_file() as suggested in the review * talk and talk-llama: ignore to_speak.txt	2024-02-24 09:24:47 +02:00
Abhilash Majumder	a0ddd8392c	whisper : add SYCL support (#1863 ) * add changes from llama upstream * add sycl abstraction * add sycl build * update cmake * add sycl build config * fix bug * fix bug * refactor build * fix bug * update build * call build * use sycl header * add examples * add target * fix typecast in quant.c * readd fp16 and readme * fix quant typecast * add sample * add readme * remove cxx file check	2024-02-23 09:22:24 +02:00
Georgi Gerganov	a2506909b1	talk-llama : sync llama.cpp	2024-02-22 23:30:53 +02:00
Georgi Gerganov	5fdb27ff80	ggml : 32-bit arm compat (#1891 ) * ggml : 32-bit arm compat * ggml : add ggml_vqtbl1q_s8 impl * ggml : cont	2024-02-22 18:31:40 +02:00
Georgi Gerganov	ce411498f6	sync : llama.cpp (ggml/0) ggml-ci	2024-02-22 15:12:36 +02:00
Davidson Francis	c56344b509	main : fix file existence check in main.cpp (#1889 ) In commit `dda4b0e` of PR #1872, I've introduced a check for the existence of files before loading the model. However, I haven't considered the case where whisper.cpp might read from stdin as well, and in such cases, the checks should ignore the "-" argument as it does not represent a regular file. Additionally, this commit removes the usage of 'stat()' in favor of the recently introduced function 'is_file_exist()' in common.cpp from PR #1871. Apologies for the bug introduced in the previous PR and any inconvenience it may have caused.	2024-02-22 15:01:08 +02:00
Georgi Gerganov	59119f4f20	talk-llama : sync llama.cpp	2024-02-20 12:09:57 +02:00
Georgi Gerganov	83afebe872	common : add IQ1_S (ggml/0) ggml-ci	2024-02-19 15:53:25 +02:00
Davidson Francis	dda4b0ed06	main : check if input files exist before proceeding (#1872 ) Until the most recent commit (`3d42463`), the main.cpp sample file does not check whether the input files exist or not. Consequently, the model is loaded first before reporting whether there was a failure or not when processing a file. In environments with HDD, this can take about 50 seconds or more, depending on the loaded model. This commit addresses this issue by checking in advance whether the input files exist or not.	2024-02-19 10:51:26 +02:00
Felix	07d04280be	examples : clean up common code (#1871 ) move some utility functions into common.h	2024-02-19 10:50:15 +02:00
Georgi Gerganov	551529290d	talk-llama : sync llama.cpp	2024-02-12 10:39:58 +02:00
dscripka	a6fb6ab597	examples : added audio_ctx argument to main and server (#1857 ) * added audio_ctx argument to main and server examples * Better default value Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * better default value (again) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-12 09:19:07 +02:00
Georgi Gerganov	f273e66dc6	examples : initialize context params properly (#1852 )	2024-02-11 16:39:12 +02:00
Georgi Gerganov	02b4c52c12	talk-llama : sync llama.cpp	2024-02-10 10:10:59 +02:00
Valentin Gosu	80e8a2ea39	server : allow CORS request with authorization headers (#1850 ) Whisper plugin in Obsidian requires an API key which is then sent as an authorization header. However, the presence of an authorization header requires a CORS Preflight, so both the OPTIONS method and the Access-Control-Allow-Headers: authorization must be handled.	2024-02-09 17:42:41 +02:00
Neuman Vong	19f8048139	whisper.android : how to build with CLBlast (#1809 ) * FetchContent * OpenCL * Documentation and make optional * Specify GGML build options in build.gradle * Use gradle properties * @ggerganov Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * @gpokat --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-09 17:39:05 +02:00
Georgi Gerganov	434b8f3b96	talk-llama : stream response (#1121 )	2024-02-06 19:56:12 +02:00
Georgi Gerganov	7a74e929c8	sync : ggml (#0 )	2024-01-30 21:30:26 +02:00
JacobLinCool	ae5c4f7340	common : fix wav buffer detection (#1819 )	2024-01-30 19:35:08 +02:00
JacobLinCool	baa30bacdb	server : add fields to `verbose_json` response (#1802 ) * server: include additional fields in the verbose_json response as OpenAI does * server: show request examples on home page * server: todo note for compression_ratio and no_speech_prob * server: add simple demo form to the homepage	2024-01-30 14:15:55 +02:00
Georgi Gerganov	e72e4158de	talk-llama : sync llama.cpp	2024-01-28 19:44:10 +02:00
Georgi Gerganov	52cce82493	common : fix input buffer check (#1812 )	2024-01-27 17:33:09 +02:00
Georgi Gerganov	ef3c9ed9eb	talk-llama : sync llama.cpp	2024-01-27 17:24:53 +02:00
Michael Rienstra	4bbb60efce	docs : make model options / model install methods clearer (#1806 ) * Make models more "discoverable" * Clean up code block language identifiers * make 3 options clearer * undo Prettier formatter change * docs: `$` shell prompt, consistently * docs: minor changes	2024-01-26 17:39:54 +02:00
Neuman Vong	d6b9be21d7	whisper.android : return output from benchmarks (#1785 ) Benchmarks are failing because JNI expects a jstring and the benchmarks are missing a return statement (i.e., returning null). The functions actually build a jstring but don't return it, so this seems to have been an oversight. This patch returns the jstring and now the benchmarks run successfully. Fixes #1783.	2024-01-19 16:17:38 +02:00
Ryan Hitchman	c0329acde8	server : implement "verbose_json" format with token details (#1781 ) * examples/server: implement "verbose_json" format with token details. This is intended to mirror the format of openai's Python whisper.transcribe() return values. * server: don't write WAV to a temporary file if not converting * server: use std::lock_guard instead of manual lock/unlock	2024-01-18 22:58:42 +02:00
Georgi Gerganov	1f50a7d29f	sync : llama.cpp	2024-01-17 21:23:33 +02:00
Benjamin Heiniger	f6614155e4	talk-llama : optional wake-up command and audio confirmation (#1765 ) * talk-llama: add optional wake-word detection from command * talk-llama: add optional audio confirmation before generating answer * talk-llama: fix small formatting issue in output * talk-llama.cpp: fix Windows build	2024-01-16 15:52:01 +02:00
Przemysław Pawełczyk	f5f159c320	server : fix building and simplify lib deps on Windows (#1772 ) * make : fix server example building on MSYS2 environments (Windows) It was not working since commit `eff3570f78` when server was introduced. * cmake : simplify server example lib deps on Windows server uses httplib::Server, not httplib::SSLServer, so there is no need to mention cryptographic libraries in target_link_libraries. Winsock (ws2_32) suffices here. Also use plain library names like we use in other places.	2024-01-15 15:48:13 +02:00
Georgi Gerganov	6ebba525f1	talk-llama : sync llama.cpp	2024-01-14 18:08:20 +02:00
Georgi Gerganov	2a5874441d	talk-llama : llama.cpp	2024-01-14 11:06:28 +02:00
Georgi Gerganov	d08445c9ad	sync : ggml	2024-01-14 10:55:18 +02:00
Georgi Gerganov	f001a3b7b6	talk-llama : sync llama.cpp	2024-01-14 00:13:17 +02:00
RhinoDevel	db078a9ba8	talk-llama : add optional CLI arg to set the bot name (#1764 )	2024-01-13 20:51:35 +02:00
james wolf	a13a7da5ad	examples : add python example for transcription (#1744 ) * rebase and add simple python interface * moved python files to examples/python	2024-01-13 19:37:18 +02:00
Georgi Gerganov	40ae0962f4	talk-llama : sync llama.cpp	2024-01-12 22:04:51 +02:00
George Hindle	fbcb52d3cd	server : add more parameters to server api (#1754 ) * feat(server): add more parameters to server api * fix(server): reset params to original parsed values for each request	2024-01-12 13:42:52 +02:00
George Hindle	f7908f9bb8	params : don't compute timestamps when not printing them (#1755 )	2024-01-12 13:24:38 +02:00
Georgi Gerganov	00b7a4be02	talk-llama : sync llama.cpp	2024-01-11 22:10:10 +02:00
Georgi Gerganov	32e71a1861	sync : ggml	2024-01-11 21:54:17 +02:00
Georgi Gerganov	9c857cf280	sync : llama.cpp	2024-01-11 21:50:01 +02:00
RhinoDevel	bcc1658cd0	talk-llama : add optional Piper TTS support (#1749 ) Add commented-out command to optionally use Piper (https://github.com/rhasspy/piper) as text-to-speech solution for the talk-llama example. Piper voices sound almost like real people which is a big improvement (e.g.) from something like espeak.	2024-01-10 16:15:28 +02:00
Emmanuel Schmidbauer	c46886f599	server : add request path option(#1741 )	2024-01-08 22:39:51 +00:00
Georgi Gerganov	29f78392c1	main : add cli option to disable system prints (#1740 )	2024-01-08 16:41:28 +02:00
Georgi Gerganov	022756a872	server : fix server temperature + add temperature_inc (#1729 ) * server : fix server temperature + add temperature_inc * server : change dashes to underscores in parameter names	2024-01-07 13:35:14 +02:00
Georgi Gerganov	3b8c2dff57	talk-llama : sync latest llama.cpp	2024-01-06 17:22:57 +02:00
Georgi Gerganov	ab0a8593c5	whisper.swiftui : add .gitignore	2024-01-04 15:00:27 +02:00
Tamotsu Takahashi	d87de61ae6	ci : build with CLBlast + ggml-opencl use GGML_API (#1576 ) * Build with CLBlast * Declare GGML_API After rebasing, examples/talk-llama failed: "D:\a\whisper.cpp\whisper.cpp\build\ALL_BUILD.vcxproj" (build target) (1) -> "D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj" (default target) (14) -> (Link target) -> llama.obj : error LNK2019: unresolved external symbol ggml_cl_free_data referenced in function "public: __cdecl llama_model::~llama_model(void)" (??1llama_model@@QEAA@XZ) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj] llama.obj : error LNK2019: unresolved external symbol ggml_cl_transform_tensor referenced in function "public: void __cdecl llama_model_loader::load_all_data(struct ggml_context *,void (__cdecl*)(float,void *),void *,struct llama_mlock *)" (?load_all_data@llama_model_loader@@QEAAXPEAUggml_context@@P6AXMPEAX@Z1PEAUllama_mlock@@@Z) [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj] D:\a\whisper.cpp\whisper.cpp\build\bin\Release\talk-llama.exe : fatal error LNK1120: 2 unresolved externals [D:\a\whisper.cpp\whisper.cpp\build\examples\talk-llama\talk-llama.vcxproj]	2023-12-29 12:23:27 +02:00
Georgi Gerganov	3a5302108d	sync : ggml (ggml_scale, ggml_row_size, etc.) (#1677 ) * sync : ggml * sync : llama.cpp * talk-llama : fix obsolete param * ggml-alloc : fix ggml_tallocr_is_own * talk.wasm : update to new ggml * ggml : fix type punning in ggml_scale * ggml : cuda jetson + arm quants warnings	2023-12-22 17:53:39 +02:00
bobqianic	d2419030b0	examples : Revert CMakeLists.txt for talk-llama (#1669 )	2023-12-21 22:48:52 +02:00
Georgi Gerganov	940de9dbe9	wchess : update README.md	2023-12-14 22:00:47 +02:00
Georgi Gerganov	375585c07c	wchess : update readme	2023-12-14 17:51:14 +02:00
fraxy-v	fd99ece8e3	wchess : whisper assisted chess (#1595 ) * wchess: whisper assisted chess * wchess: fix allowed moves in check * wchess: touchstart, touchend events * wchess: css, disabled button * wchess : html touches * wchess : minor fixes and code style * wchess : bump encoder context to 1280 * wchess : index.html * wchess : fix CI warnings * wchess : add array header * wchess : build static library * wchess : display grammar * wchess : update UX * wchess : add comment * wchess : add README --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-14 15:58:26 +02:00
Kreijstal	ec03661b20	cmake : target windows 8 or above for prefetchVirtualMemory in llama-talk (#1617 ) Since we use prefetchVirtualMemory we specify we target win 8 or above, otherwise other compilers will refuse to use the prefetchVirtualMemory api, (I understand you are loading it dynamically but the header definition has this limitation)	2023-12-12 11:35:00 +00:00
Kreijstal	6335933a5b	cmake : Fix bug in httplib.h for mingw (#1615 ) Fix bug in httlib.h for mingw, please see https://github.com/yhirose/cpp-httplib/issues/1669	2023-12-10 17:47:52 +00:00
Georgi Gerganov	9521ba6801	whisper.objc : disable timestamps for real-time transcription	2023-12-08 13:43:37 +02:00
Oleg Sidorov	3163090d89	server : pass max-len argument to the server (#1574 ) This commit fixes the missing parameter binding for max-len between the input arguments and wparams.	2023-12-05 23:01:45 +02:00
Aleksander Andrzejewski	a0ec3fac54	Server : Add support for .vtt format to Whisper server (#1578 ) - The code comes from examples/main - The output mimetype is set to text/vtt Example usage: ```shell curl 127.0.0.1:8080/inference \ -H "Content-Type: multipart/form-data" \ -F file="@samples/jfk.wav" \ -F temperature="0.2" \ -F response-format="vtt" ```	2023-11-30 23:44:26 +00:00
Oleg Sidorov	6559b538e5	server : backport .srt output format (#1565 ) This commit adds a support of .srt format to Whisper server. The code is effectively backported from examples/main. The output mimetype is set to application/x-subrip as per https://en.wikipedia.org/wiki/SubRip. Example usage: curl 127.0.0.1:8080/inference \ -H "Content-Type: multipart/form-data" \ -F file="@<file-path>" \ -F temperature="0.2" \ -F response-format="srt"	2023-11-28 15:42:58 +02:00
Kasumi	6b094b6dfe	server : set default CORS headers to allow all (#1567 )	2023-11-28 11:55:20 +02:00
Hang	641f2f4282	readme : update help (#1560 )	2023-11-27 12:04:08 +02:00
Ismatulla Mansurov	23c21e92eb	server : automatically convert audio on the server (#1539 ) * server : automatically convert audio on the server * server : remove rebundant comments * server : automatic conversion refactor * server : update server readme * server : remove unnecessary comments and tabs * server : put back remove calling * server : apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : check ffmpeg before the server lunch * server : fix indentation * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : fix function typo calling * server : fix function typo calling * server : add warning in readme --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-27 11:28:34 +02:00
ecneladis	a5881d619c	server : add --print-realtime param (#1541 ) * server : add --print-realtime param * Fix duplicate realtime output	2023-11-24 09:35:02 +02:00
Okabintaro	8328d1900f	fix(server): typo in temperature parameter (#1545 ) Also fixed another typo in comments.	2023-11-23 20:59:36 +02:00
Felix	5c7be85fdc	Change temp file name for server application (#1535 ) Avoid issue of removing file if it exists in the current working directory	2023-11-22 09:23:36 +01:00
Felix	9ac88f2b57	Close file after writing in server application (#1533 ) Fix of mistake leaving file open while reading it again as wav	2023-11-21 20:36:10 +01:00
Georgi Gerganov	46f5b6cb08	server : add video to readme	2023-11-21 17:30:43 +02:00
Felix	eff3570f78	server : add a REST Whisper server example with OAI-like API (#1380 ) * Add first draft of server * Added json support and base funcs for server.cpp * Add more user input via api-request also some clean up * Add reqest params and load post function Also some general clean up * Remove unused function * Add readme * Add exception handlers * Update examples/server/server.cpp * make : add server target * Add magic curl syntax Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-20 21:40:24 +02:00
Georgi Gerganov	a01b2e0971	sdl : fix audio callback (#1523 )	2023-11-20 13:16:38 +02:00
Georgi Gerganov	bebf0da983	quantize : add support for K-quant types	2023-11-16 16:18:24 +02:00
Sam Pullara	7883d1cae4	talk-llama : improve quote and backtick handling (#1364 ) * ISSUE-1329: replace " with ' so it doesn't try to execute code in backticks. * Typo * Update to keep possessives in the output Closes the ' then puts a ' in quotes then reopens the ' to escape the ' characters.	2023-11-16 10:34:05 +02:00
Georgi Gerganov	ccc85b4ff8	talk-llama : enable GPU by default	2023-11-15 21:33:00 +02:00
Georgi Gerganov	bfbaa4dce5	whisper : make large version explicit + fix data size units (#1493 )	2023-11-15 19:42:25 +02:00
Georgi Gerganov	b6c5f49b78	whisper : add batched decoding (#1486 ) * whisper : add whisper_batch * whisper : move kv_self to whisper_state * whisper : full batched decoding support * whisper : fix memory leak in whisper_batch * whisper : fix mem leak again + remove oboslete function * whisper : clear kv cache when using whisper_decode API * whisper : speed-up sampling * whisper : fix decoders initializer * bench : add batch size 5 bench * whisper : add comment about the KV cache size * whisper : add check for max number of decoders * whisper : avoid starting sampling threads with bs=1 * whisper : enable beam-search by default * cuda : sync llama.cpp fixes	2023-11-15 16:12:52 +02:00
Evan Jones	3e5c7feeff	whisper : add grammar-based sampling (#1229 ) * whisper : add grammar-based sampling * build : fix after master merge * command : fix exception when recognizing the command * whisper : fine-tuning grammar functionality * command : grammar-related improvements - option to read grammar from file - add sample grammars for colors and chess moves - fine-tune the performance further * grammars : add assistant + update comments * command : enable beam-search, add "no_timestamps", add "context", add p * whisper : remove comment --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-13 10:51:34 +02:00
rlapray	c23598e4ca	talk-llama : add n_gpu_layers parameter (#1475 )	2023-11-13 10:04:16 +02:00
Tong Li	54a08bde29	examples : add whisper.android.java for compatibility with older Android versions using Java (#1382 ) * save the recorded audio to a file * Alignment -help * Save the correct audio * chage to a consistent coding style * Correct typo * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Correct variable misuse * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * Update examples/stream/stream.cpp * add .bin .cxx/ .gradle/ cmake-build-debug/ to gitignore add whisper.android.java * Added support for older versions of Android of Java * add examples for android java * add README.md for android java * add fullTranscribeWithTime * add toString() method and tests * change return type to void * update to v1.4.1 * add WhisperService * chage to whisper_full_get_segment_t1 * add method transcribeDataWithTime * modified toString ``` return "[" + start + " --> " + end + "]:" + sentence; ``` * Optimize code logic * update text view on handle * set max lines * change Chinese to English * Update bindings/java/build.gradle * Update .gitignore * add android.java to github action * chage android.java to android_java in build.yml * remove gradle * chage jdk to temurin in android_java of CI * chage jdk to temurin 11 in android_java of CI * add x to gradlew * set api-level for android_java of CI * Update examples/whisper.android.java/app/src/main/jni/whisper/CMakeLists.txt * add ndk version in build.gradle * remove local.properties * add testFullTranscribeWithTime --------- Co-authored-by: litongmacos <litongjava@qq.com> Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>	2023-11-12 18:31:58 +02:00
Georgi Gerganov	b0502836b8	whisper : add full CUDA and Metal offloading (#1472 ) * whisper : migrate to ggml-backend * whisper : fix logit reading * whisper : fix tensor allocation during load * whisper : fix beam-search with CUDA * whisper : free backends + fix compile warning * whisper : print when CUDA is enabled * whisper : fix CoreML * make : clean-up * talk : fix compile warning * whisper : support ggml_conv with CUDA and Metal (#1473) * ggml : add CUDA support for ggml_conv * whisper : remove ggml_repeat for conv bias + single backend * cuda : fix im2col kernel * metal : add im2col support + mul mat-vec f16 x f16 * bench-all : add q4 models * whisper : clean-up * quantize-all : fix * ggml : im2col opts * whisper : avoid whisper_model_data wrapper * whisper : add note that ggml_mul_mat_pad does not work with CUDA * whisper : factor out graph compute in common function * whisper : fixes * whisper : fix UB with measure buffers * whisper : try to fix the parallel whisper_state functionality (#1479) * whisper : try to fix the parallel whisper_state functionality * whisper : fix multi-state Metal * whisper : free backend instances in whisper_state	2023-11-12 15:31:08 +02:00
Jakub Ráček	37947203e6	talk-llama : add language auto detect (#1467 ) * Add '-l auto' to talk-llama example * Update examples/talk-llama/talk-llama.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-09 19:21:44 +02:00
Sindre Sorhus	d03c60dd7f	ios : add support for Swift Package Manager (#1370 ) * Add support for Swift * Make it build in Xcode * Use the SPM package in the SwiftUI example app	2023-11-07 23:53:31 +02:00

602 Commits