Commit Graph

459 Commits

Author SHA1 Message Date
Georgi Gerganov c65d0fd3c8 talk-llama : sync llama.cpp 2024-11-01 10:19:05 +02:00
Rotem Dan b6049060dd
whisper : add dtw preset for large-v3-turbo (#2481) 2024-10-15 21:00:21 +03:00
Georgi Gerganov 6e40108a59 objc : fix build 2024-10-05 15:23:51 +03:00
Georgi Gerganov 941912467d whisper : adapt to latest ggml (skip) (#0) 2024-10-05 15:23:51 +03:00
Rahul Vadhyar 2944cb72d9
examples : update dr_wav.h to newer version (#2449) 2024-10-04 11:04:51 +03:00
Georgi Gerganov ccc2547210 talk-llama : sync llama.cpp 2024-10-03 12:22:17 +03:00
gilbertgong ede1718f6d
server : ffmpeg overwrite leftover temp file (#2431)
* Remove possible leftover ffmpeg temp file from a previous failed conversion

* Revert "Remove possible leftover ffmpeg temp file from a previous failed conversion"

This reverts commit 00797403bd.

* Flag to force ffmpeg to overwrite output file if it exists
2024-10-02 15:06:40 +03:00
Georgi Gerganov 2ef717b293
whisper : add large-v3-turbo (#2440) 2024-10-01 15:57:06 +03:00
Georgi Gerganov 451e9ee92c make : remove "talk" target until updated 2024-09-24 19:45:08 +03:00
Georgi Gerganov fe18c29ab8 talk-llama : sync llama.cpp 2024-09-24 19:45:08 +03:00
Georgi Gerganov 54e5095765 examples : adapt to ggml.h changes (ggml/0)
ggml-ci
2024-09-24 19:45:08 +03:00
Toliver 5b1ce40fa8
server : use OS-generated temp file name for converted files (#2419) 2024-09-17 15:56:32 +03:00
UsernamesLame 9600fc3eb1
readme : remove invalid flag from Python example (#2396)
* Update README.md

Fix broken C-style API link

* Update whisper_processor.py

Update examples/python/whisper_processor.py to remove nonexistent flag "-np" from subprocess.Popen call.

* Add pywhispercpp to the Pybind11 Python wrapper list

abdeladim-s/pywhispercpp wasn't added to the list / was removed at some point (?)

It was referenced in issue #9, so I feel like it's worthy of being added as it's the first if not one of the first Python wrappers for whisper.cpp
2024-08-30 14:00:38 +03:00
Georgi Gerganov da9809f243 talk-llama : sync llama.cpp 2024-08-28 13:22:20 +03:00
Justine Tunney 7f78675008
examples : use colorblind friendly TTY color scheme (#2360)
This change updates the -pc flag, so that a new xterm256 color scheme is
used. This color scheme is believed to be better for three reasons:

1. It should be friendlier to the colorblind. The scheme was designed by
   Paul Tol (see: https://personal.sron.nl/~pault/). TensorBoard uses it
   since 2017, so it's already popular in the machine learning community

2. It should appear to be the same colors as before to people who aren't
   i.e. it's still a red-green spectrum like before but lightly modified

3. It is readable in both white and black background terminals. The neon
   colors before were probably a bit too intense for white backgrounds.
2024-08-20 10:49:10 +03:00
Georgi Gerganov 58323bf8ed build : fix aarch64 (#0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov 22058f2dbc talk-llama : sync llama.cpp 2024-08-08 22:48:46 +03:00
Georgi Gerganov c7ea4fd235 common : handle new quant types (ggml/0) 2024-08-08 22:48:46 +03:00
Georgi Gerganov dbf9c15e30 talk-llama : sync llama.cpp 2024-07-08 14:53:55 +03:00
Georgi Gerganov d3f6c34976 examples : fix compile warnings [no ci] (#0) 2024-07-08 14:53:55 +03:00
Emmanuel Schmidbauer bec9836849
server : add inference path to make OAI API compatible (#2270) 2024-07-08 14:24:58 +03:00
Georgi Gerganov 4a62efbb95
cmake : minor fixes 2024-06-26 21:42:39 +03:00
Georgi Gerganov dc8cc2dd6f
whisper : disable CUDA mel + fix FFMPEG 2024-06-26 20:11:38 +03:00
Georgi Gerganov e30c679928
whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]

* files : reorganize [no ci]

* sync : llama.cpp

* cmake : link math library

* cmake : build normal ggml library

* files : move headers to include

* objc : fix path to ggml-metal.h

* ci : fix WHISPER_CUDA -> GGML_CUDA

* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00
Georgi Gerganov e293f17d34
talk-llama : sync llama.cpp 2024-06-18 09:45:37 +03:00
slaren de29b193f6 move BLAS to a separate backend (cont) (llama/6210)
ggml-ci
2024-06-18 09:39:40 +03:00
Georgi Gerganov 3b1ac03828 ggml : remove OpenCL (#0) 2024-06-16 18:19:48 +03:00
Georgi Gerganov 061eeb9f61 talk-llama : sync llama.cpp 2024-06-16 18:19:48 +03:00
Borislav Stanimirov af5833e298
whisper : remove `speed_up` and `phase_vocoder*` functions (#2198)
* whisper : fix cast warning

* whisper : remove phase_vocoder functions, ref #2195

* whisper : remove speed_up from whisper_full_params, closes #2195
2024-05-31 11:37:29 +03:00
Daniel Valdivia a7dc2aab16
server : fix typo (#2181)
A simple comment typo, PR can be dismissed
2024-05-25 10:46:22 +03:00
William Tambellini 1b51fdf170
examples : add support for decoding input with ffmpeg (Linux) (#2133)
- search for ffmpeg libs/headers at cmake time
- added ffmpeg-transcode.cpp into libcommon if ffmpeg on
- hooked ffmpeg trancoding in common read_wav(...)
- passed test:
./main -m ggml-base.en.bin -f samples/jfk.mp3
2024-05-21 18:31:41 +03:00
Pedro Probst adee3f9c1f
node : add flash_attn param (#2170) 2024-05-20 09:08:48 +03:00
Georgi Gerganov 7094ea5e75
whisper : use flash attention (#2152)
* whisper : use flash attention in the encoder

* whisper : add kv_pad

* whisper : remove extra backend instance (huh?)

* whisper : use FA for cross-attention

* whisper : use FA for self-attention

* whisper : simplify encoder FA

* whisper : add flash_attn runtime parameter

* scripts : add bench log

* scripts : add M1 Pro bench log
2024-05-15 09:38:19 +03:00
petterreinholdtsen 9d5771ae43
talk-llama : reject runs without required arguments (#2153)
* Extended talk-llama example to reject runs without required arguments.

Print warning and exit if models are not specified on the command line.

* Update examples/talk-llama/talk-llama.cpp

* Update examples/talk-llama/talk-llama.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-14 21:32:41 +03:00
Georgi Gerganov 4ef8d9f44e
server : return utf-8 (#2138) 2024-05-13 15:33:46 +03:00
Pedro Probst 3928dbd206
node : add audio_ctx and audio buffer params (#2123)
* node : add audio_ctx param

* node : support passing audio buffer directly

* node : parse audio_ctx in index.js

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-13 15:22:23 +03:00
valVk 30f73109b8
node : add additional params (#2000)
* Add additional params to addon.node

* Add comma_in_time as parameter

* Fix tests
2024-05-13 15:15:43 +03:00
Mark Karpelès 17fa62d3d3
js : remove un-needed request header from fetchRemote (#2119) 2024-05-13 15:13:19 +03:00
Daniel Ziegenberg 0bb05b113d
main : dont print timings with --no-prints (#2108)
Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
2024-05-13 15:00:19 +03:00
Daniel Ziegenberg f141b2b938
main : add options for temperature control (#2088)
Add two options:

```
-tp,       --temperature N     [0.00   ] The sampling temperature, between 0 and 1
-tpi,      --temperature-inc N [0.20   ] The increment of temperature, between 0 and 1
```

The sampling temperature, between 0 and 1. Higher values like 0.8 will
make the output more random, while lower values like 0.2 will make it
more focused and deterministic. If set to 0, the model will use log
probability to automatically increase the temperature until certain
thresholds are hit.

Signed-off-by: Daniel Ziegenberg <daniel@ziegenberg.at>
2024-05-13 14:59:44 +03:00
zhangjixiong e93081f83f
whisper.android : update example, add field to print timestamp (#2072) 2024-05-13 14:30:03 +03:00
Xingchen Song(宋星辰) b6bbce4ae9
cmake : fix json INTERFACE library (#2069) 2024-05-13 14:29:39 +03:00
mashizora 7705dc52da
main : fix double quote escaping in csv output (#2090) 2024-05-13 11:55:32 +03:00
Georgi Gerganov 3fa7d29876 talk-llama : sync llama.cpp 2024-05-13 11:02:26 +03:00
Georgi Gerganov accada542a ggml : resolve merge (ggml/0)
ggml-ci
2024-05-13 11:02:26 +03:00
Pedro Probst 58210d6a76
examples : fix node compilation (#2115)
* node : fix compilation and update examples

* node : fix readme

* Update addon.node test
2024-05-02 22:52:55 +01:00
Georgi Gerganov b0c3cbf2e8
main : pass nullptr when regex is empty (#2070) 2024-04-17 12:23:47 +03:00
Emmanuel Schmidbauer 9fab28135c
server : add dtw (#2044)
* server.cpp: add dtw

* Update examples/server/server.cpp

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-15 22:16:58 +03:00
Pedro Probst 1b5439a6c2
node : support no timestamps (#2048)
* fix: node: do not compute timestamps if you do not need them

* feat: add no_timestamps parameter to node addon
2024-04-15 20:03:34 +03:00
Kendrick Taylor 5c554c04ff
whisper.nvim : fix missing reference to "model" variable (#2049) 2024-04-15 19:41:28 +03:00
Ikko Eltociear Ashimine c383f091a1
whisper : update grammar-parser.cpp (#2058)
preceeding -> preceding
2024-04-15 19:40:27 +03:00
ulatekh c15b4cda7d
common : fix file-handle leak in read_wav() (#2026)
Now it cleans up in case of error.
2024-04-09 18:34:34 +03:00
Rotem Dan d3cfb6ca2b
main : set stdin to binary mode on Windows (#2025) 2024-04-09 18:33:32 +03:00
ulatekh 671b4bde6c
main : allow a response-file as the sole parameter (#2019)
* The "main" example now allows a response-file as the sole parameter.

A response-file is a text file with command-line parameters, one per line.
Prefix the name of the response-file with "@" to identify it as such.
It's used under MS Windows to work around command-line length limits.
It may be useful under other platforms to simplify character-escaping.

* minor : style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 18:31:16 +03:00
ulatekh c8eeb93a6a
whisper : suppress tokens with a regex (#1997)
* Allow a regular expression to describe tokens to suppress.

Example: --suppress-tokens-re "[,\.]|[ ]?[0-9]+" will suppress commas, periods, and numeric tokens.

Technique inspired by https://github.com/openai/whisper/discussions/1041

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Blind change to fix Java test.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-09 18:27:28 +03:00
ulatekh 319fe5146e
cmake : create solution folders (#2004)
* Create solution folders in the CMake build.

* Fixed non-SDL2 build.

* Fixed emscripten build.
2024-04-09 18:23:33 +03:00
Georgi Gerganov 81a3c41aa0
talk-llama : sync llama.cpp 2024-04-07 16:21:08 +03:00
ulatekh fc366b807a
main : add command-style grammar (#1998)
* Implemented command-style grammar in the main example.

Mostly just copied the relevant parts from the command example.

* main : code style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-28 12:02:10 +02:00
Georgi Gerganov 9fb308d90f
make : add grammar parser to common objects 2024-03-28 11:59:48 +02:00
Georgi Gerganov 2948c740a2
sync : ggml (#2001)
* sync : update scripts

* sync : ggml

* talk-llama : sync llama.cpp

* make : WHISPER_CUBLAS -> WHISPER_CUDA

* ci : try to fix sycl build

* talk-llama : fix make build
2024-03-27 18:55:10 +02:00
Georgi Gerganov 1558ec5a16
whisper : improve handling of prompts (#1981)
* whisper : improve handling of prompts

* whisper : add whisper_token_count helper
2024-03-25 14:48:19 +02:00
Mohammadreza Hendiani 04e48094e4
readme : add Fedora dependencies (#1970)
* README.md

fix documentaion and added fedora liunx dependencies for stream build

* fix documentaion and added fedora liunx dependencies for command build

* fix documentaion and added fedora liunx dependencies for talk build

* fix documentaion and added fedora liunx dependencies for talk-llama build

* reverted back mistakenly removed MacOS documentaion
2024-03-20 18:42:11 +02:00
denersc 741abb162c
whisper : token-level timestamps with DTW (#1485)
* whisper.cpp: impl dtw algo

* WIP: producing and placing DTW timestamps on tokens

* Fix compile and assertion errors. Attempt to DTW timestamp with single_segment=false.

* Fix mistake causing incorrect alignment of dtw timestamps

* implement N_TOP_MOST and CUSTOM alignment heads setting

* whisper: fix typo on alignment heads enum

* Fix issues related to changes in whisper.cpp

* Fixed excessive memory use when using DTW timestamps. Other minor fixes to DTW timestamping function

* decoder: save cross QKs only if requested

* Calling median filter with ggml_map_custom1

* Reimpl aheads n_top_most and custom. Sanity checks on chosen aheads

* Copying cross QKs from decoder backend correctly

* dtw: cleanup

* Fix incorrect n_frames passed to dtw when near end of audio

* Fix aheads_masks_init for backend != CPU

* whisper : minor style

* main : add dtw (wip)

* whisper: fix invalid memory access in aheads_masks_init

* main : add dtw (cont)

* whisper : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-20 18:25:26 +02:00
Jo Liss e7794a868f
examples : rename --audio-context to --audio-ctx per help text (#1953) 2024-03-18 17:53:33 +02:00
Georgi Gerganov de4d067f1e
talk-llama : sync llama.cpp 2024-03-15 14:21:59 +02:00
slaren f60ccfd83b
update examples and tests 2024-03-15 14:01:14 +02:00
Georgi Gerganov 2f5a5a66dd
talk-llama : use llama_decode instead of llama_eval 2024-03-08 12:04:43 +02:00
Georgi Gerganov 8e409d1113
talk-llama : sync llama.cpp 2024-03-08 11:55:50 +02:00
Georgi Gerganov 05d1b61af4
talk-llama : sync llama.cpp 2024-03-08 11:52:47 +02:00
F1L1P 2e2626b167
examples : Auto lowercase language parameter in main.cpp (#1928)
* Auto lowercase language parameter

* Update examples/main/main.cpp

Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>

---------

Co-authored-by: bobqianic <129547291+bobqianic@users.noreply.github.com>
2024-03-06 22:25:10 +00:00
zhouwg c0c0ae2dea
examples : fix typo in bench.cpp (#1933) 2024-03-06 22:21:44 +00:00
zhouwg f22d27a385
whisper.android.java : fix returns in JNI (#1929) 2024-03-05 15:59:26 +02:00
Georgi Gerganov 25d313b38b
talk-llama : sync llama.cpp 2024-02-28 13:04:05 +02:00
Georgi Gerganov 1711bb3881
sync : llama.cpp (ggml/0) 2024-02-28 13:00:30 +02:00
Andrew S 0d8fd8483a
stream.wasm : fix invalid memory access when no segments (#1902)
No segments may be returned when a smaller sample buffer (EG 2048 samples) is sent to the worker.
2024-02-26 10:12:35 +02:00
Georgi Gerganov 3170841ed9
talk-llama : sync llama.cpp 2024-02-25 20:00:10 +02:00
Georgi Gerganov 578e47e70c
sync : llama.cpp (ggml/0) 2024-02-25 19:58:46 +02:00
Tamotsu Takahashi f18738f247
talk, talk-llama : pass text_to_speak as a file (#1865)
* talk-llama: pass file instead of arg

it is too hard to quote text in a portable way

* talk-llama: pass heard_ok as a file

* talk-llama: let eleven-labs.py accept options

Options: -v voice, -s savefile, -p (--play)

* talk-llama: check installed commands in "speak"

Pass "-q" to eleven-labs.py to skip checking whether elevenlabs is installed

* talk-llama: pass voice_id again

in order to sync talk with talk-llama

* talk: sync with talk-llama

Passing text_to_speak as a file is safer and more portable
cf. https://stackoverflow.com/a/59036879/45375

* talk and talk-llama: get all installed voices in speak.ps1

* talk and talk-llama: get voices from api

* talk and talk-llama: add more options to eleven-labs.py

and remove DEFAULT_VOICE because it is deprecated (https://www.reddit.com/r/ElevenLabs/comments/1830abt/what_happened_to_bella/)

```
usage: eleven-labs.py [-q] [-l] [-h] [-n NAME | -v NUMBER] [-f KEY=VAL] [-s FILE | -p] [TEXTFILE]

options:
  -q, --quick           skip checking the required library

action:
  TEXTFILE              read the text file (default: stdin)
  -l, --list            show the list of voices and exit
  -h, --help            show this help and exit

voice selection:
  -n NAME, --name NAME  get a voice object by name (default: Arnold)
  -v NUMBER, --voice NUMBER
                        get a voice object by number (see --list)
  -f KEY=VAL, --filter KEY=VAL
                        filter voices by labels (default: "use case=narration")
                        this option can be used multiple times
                        filtering will be disabled if the first -f has no "=" (e.g. -f "any")

output:
  -s FILE, --save FILE  save the TTS to a file (default: audio.mp3)
  -p, --play            play the TTS with ffplay
```

* examples: add speak_with_file()

as suggested in the review

* talk and talk-llama: ignore to_speak.txt
2024-02-24 09:24:47 +02:00
Abhilash Majumder a0ddd8392c
whisper : add SYCL support (#1863)
* add changes from llama upstream

* add sycl abstraction

* add sycl build

* update cmake

* add sycl build config

* fix bug

* fix bug

* refactor build

* fix bug

* update build

* call build

* use sycl header

* add examples

* add target

* fix typecast in quant.c

* readd fp16 and readme

* fix quant typecast

* add sample

* add readme

* remove cxx file check
2024-02-23 09:22:24 +02:00
Georgi Gerganov a2506909b1
talk-llama : sync llama.cpp 2024-02-22 23:30:53 +02:00
Georgi Gerganov 5fdb27ff80
ggml : 32-bit arm compat (#1891)
* ggml : 32-bit arm compat

* ggml : add ggml_vqtbl1q_s8 impl

* ggml : cont
2024-02-22 18:31:40 +02:00
Georgi Gerganov ce411498f6
sync : llama.cpp (ggml/0)
ggml-ci
2024-02-22 15:12:36 +02:00
Davidson Francis c56344b509
main : fix file existence check in main.cpp (#1889)
In commit dda4b0e of PR #1872, I've introduced a check for the
existence of files before loading the model. However, I haven't
considered the case where whisper.cpp might read from stdin as well,
and in such cases, the checks should ignore the "-" argument as it
does not represent a regular file.

Additionally, this commit removes the usage of 'stat()' in favor of
the recently introduced function 'is_file_exist()' in common.cpp from
PR #1871.

Apologies for the bug introduced in the previous PR and any
inconvenience it may have caused.
2024-02-22 15:01:08 +02:00
Georgi Gerganov 59119f4f20
talk-llama : sync llama.cpp 2024-02-20 12:09:57 +02:00
Georgi Gerganov 83afebe872
common : add IQ1_S (ggml/0)
ggml-ci
2024-02-19 15:53:25 +02:00
Davidson Francis dda4b0ed06
main : check if input files exist before proceeding (#1872)
Until the most recent commit (3d42463), the main.cpp sample file does
not check whether the input files exist or not. Consequently, the
model is loaded first before reporting whether there was a failure or
not when processing a file. In environments with HDD, this can take
about 50 seconds or more, depending on the loaded model.

This commit addresses this issue by checking in advance whether the
input files exist or not.
2024-02-19 10:51:26 +02:00
Felix 07d04280be
examples : clean up common code (#1871)
move some utility functions into common.h
2024-02-19 10:50:15 +02:00
Georgi Gerganov 551529290d
talk-llama : sync llama.cpp 2024-02-12 10:39:58 +02:00
dscripka a6fb6ab597
examples : added audio_ctx argument to main and server (#1857)
* added audio_ctx argument to main and server examples

* Better default value

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* better default value (again)

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-12 09:19:07 +02:00
Georgi Gerganov f273e66dc6
examples : initialize context params properly (#1852) 2024-02-11 16:39:12 +02:00
Georgi Gerganov 02b4c52c12
talk-llama : sync llama.cpp 2024-02-10 10:10:59 +02:00
Valentin Gosu 80e8a2ea39
server : allow CORS request with authorization headers (#1850)
Whisper plugin in Obsidian requires an API key which is
then sent as an authorization header.
However, the presence of an authorization header requires
a CORS Preflight, so both the OPTIONS method and
the Access-Control-Allow-Headers: authorization must be
handled.
2024-02-09 17:42:41 +02:00
Neuman Vong 19f8048139
whisper.android : how to build with CLBlast (#1809)
* FetchContent

* OpenCL

* Documentation and make optional

* Specify GGML build options in build.gradle

* Use gradle properties

* @ggerganov

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* @gpokat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-09 17:39:05 +02:00
Georgi Gerganov 434b8f3b96
talk-llama : stream response (#1121) 2024-02-06 19:56:12 +02:00
Georgi Gerganov 7a74e929c8
sync : ggml (#0) 2024-01-30 21:30:26 +02:00
JacobLinCool ae5c4f7340
common : fix wav buffer detection (#1819) 2024-01-30 19:35:08 +02:00
JacobLinCool baa30bacdb
server : add fields to `verbose_json` response (#1802)
* server: include additional fields in the verbose_json response as OpenAI does

* server: show request examples on home page

* server: todo note for compression_ratio and no_speech_prob

* server: add simple demo form to the homepage
2024-01-30 14:15:55 +02:00
Georgi Gerganov e72e4158de
talk-llama : sync llama.cpp 2024-01-28 19:44:10 +02:00
Georgi Gerganov 52cce82493
common : fix input buffer check (#1812) 2024-01-27 17:33:09 +02:00
Georgi Gerganov ef3c9ed9eb
talk-llama : sync llama.cpp 2024-01-27 17:24:53 +02:00