Commit Graph

748 Commits

Author SHA1 Message Date
Georgi Gerganov f408c64564
bench : fix timings by running a pre-heat 2023-09-13 23:03:25 +03:00
Georgi Gerganov d863f725a1
coreml : add code to toggle Core ML config (CPU, ANE, GPU) 2023-09-13 22:51:10 +03:00
Georgi Gerganov d37f56e7a9
ios : update submodule 2023-09-13 21:31:29 +03:00
Georgi Gerganov 23277d21ce
readme : add Metal info 2023-09-13 20:54:03 +03:00
Georgi Gerganov ecb23fb1eb
metal : sync latest llama.cpp kernels 2023-09-13 20:44:05 +03:00
Georgi Gerganov 8e8daa8451
metal : speed-up KQ multiplication 2023-09-13 19:59:16 +03:00
Georgi Gerganov 16db4da3f1
swiftui : fix build 2023-09-13 19:49:11 +03:00
Georgi Gerganov 257d7942af
ios : add Metal support 2023-09-13 19:45:12 +03:00
Georgi Gerganov 181bb8cb28
objc : fix build (no Metal yet) 2023-09-13 18:54:41 +03:00
Georgi Gerganov 796f84cd95
whisper : add <functional> header 2023-09-13 13:35:42 +03:00
Georgi Gerganov 77f4bf49c8
cmake : update to support Metal build 2023-09-13 13:34:51 +03:00
Georgi Gerganov b6f09669a2
whisper : factor out alloc init in a function 2023-09-13 12:51:52 +03:00
Georgi Gerganov 254b687239
whisper : add whisper_allocr to wrap ggml_allocr 2023-09-13 11:58:19 +03:00
Georgi Gerganov b19888cfb4
ggml-alloc : try to make CI happy by reducing vram to 128GB 2023-09-13 11:57:46 +03:00
Georgi Gerganov 905c944143
ggml : use simpler ggml_bytes() implementation 2023-09-13 11:39:09 +03:00
Georgi Gerganov 3074a7ff14
whisper : offload the Encoder to Metal 2023-09-13 00:09:44 +03:00
Georgi Gerganov ec9a7db74c
whisper : remove ggml_repeat in the encoder 2023-09-12 20:34:32 +03:00
Georgi Gerganov cd476375b4
metal : run "cross" step on the GPU 2023-09-12 20:11:13 +03:00
Georgi Gerganov 9fdd415367
ggml : fix ggml_nbytes (probably temp solution) 2023-09-12 20:10:53 +03:00
Georgi Gerganov 79a88057bd
metal : add multi-decoder support 2023-09-12 19:33:29 +03:00
Georgi Gerganov fbc9ddc582
metal : decoder works on GPU! 2023-09-12 19:23:30 +03:00
Georgi Gerganov 3b9979a373
ci : try to debug vmem issue 2023-09-12 14:08:48 +03:00
Georgi Gerganov de94c783ee
Merge branch 'master' into metal-and-alloc 2023-09-12 14:02:43 +03:00
Georgi Gerganov 3fec2119e6
whisper : fix bench regression + fix performance when using CPU BLAS (#1275)
* whisper : fix bench regression

* ggml : use sched_yield when using BLAS + add comment
2023-09-12 13:54:04 +03:00
Georgi Gerganov d3b2dd4955
whisper : initial Metal version 2023-09-11 16:23:31 +03:00
Georgi Gerganov 4845b9ed09
whisper.android : try to fix build 2023-09-11 15:19:21 +03:00
Georgi Gerganov 2770d46ef5
whisper : refactor ggml-alloc init 2023-09-11 15:04:33 +03:00
Georgi Gerganov 4d9acc60c3
ci : see if this is causing the crash 2023-09-11 14:42:25 +03:00
Georgi Gerganov 06d1d2836b
extra : update sync-ggml.sh script to also sync ggml-alloc 2023-09-10 22:45:38 +03:00
Georgi Gerganov 9a78b72246
ios : update submodule 2023-09-10 22:36:50 +03:00
Georgi Gerganov 794e8fe0ea
build : fix ggml-alloc 2023-09-10 22:19:39 +03:00
Georgi Gerganov fa672b46e6
whisper : CoreML support ggml-alloc 2023-09-10 21:57:04 +03:00
Georgi Gerganov af6f67b251
whisper : ggml-alloc is now supported 2023-09-10 20:09:17 +03:00
Georgi Gerganov bed5ad69dd
whisper : allocate encoder and decoder using ggml-alloc 2023-09-10 19:50:34 +03:00
Georgi Gerganov 949ab6328d
whisper : factor out graph builds 2023-09-10 19:23:06 +03:00
Georgi Gerganov fbc3f8033e
metal : init 2023-09-10 18:38:34 +03:00
bobqianic 9b14418863
whisper : faster beam_search sampling via reduced KV cache copies (#1243)
* Faster `beam_search` sampling

Refine the KV cache update logic for more intelligent and efficient updating.

* Faster `whisper_sample_token_topk`

* Update whisper.cpp

* Update whisper.cpp

* Update whisper.cpp

* Reduce `memory allocation`

* Add `pointer swapping`

* Fixed some bugs

* Update whisper.cpp

* Apply suggestions from code review

* Updated the logic for determining `two-copy`

* Updated the logic for determining `two-copy` v2

* whisper : add debug logs + coding style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-10 16:04:27 +03:00
Nicholas Albion 6ddc727fac
java : fixed signing of java artifact using gradle (#1267)
* --stacktrace signMavenJavaPublication

* added temporary step "Debug gradle signing"

* cd bindings/java

* use GPG_PRIVATE_KEY and GPG_PASSPHRASE

* use secrets.GPG_PRIVATE_KEY and GPG_PASSPHRASE
2023-09-09 18:55:51 +03:00
Georgi Gerganov acb5278cc8
ci : try to fix gradle action (#1265) 2023-09-08 20:50:15 +03:00
Georgi Gerganov 0839209cab
gitignore : update 2023-09-08 19:45:28 +03:00
Georgi Gerganov b39809668a
sync : ggml (HBM + Metal + style) (#1264) 2023-09-08 17:58:31 +03:00
Georgi Gerganov 3e9edc6845
ci : upgrade gradle to 2.4.2 (#1263)
* ci : upgrade gradle to 2.4.2

* cmake : add comment (#1129)
2023-09-08 17:58:14 +03:00
Georgi Gerganov bfc73f1fa2
sync : ggml (CUDA faster rope) 2023-09-08 15:01:26 +03:00
Georgi Gerganov f00c9bba33
cmake : noramlize case (#1129) 2023-09-08 14:50:03 +03:00
Przemysław Pawełczyk b55b505690
build : do not use _GNU_SOURCE gratuitously (#1129)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build whisper.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions,
plus some stuff from BSD that is not specified in POSIX.1.

Well, that was true until NUMA support was added recently in ggml,
so enable GNU libc extensions for Linux builds to cover that.

There is no need to penalize musl libc which simply follows standards.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
minimal FTM (_XOPEN_SOURCE=600) or other FTM depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* examples : include SDL headers before other headers

Avoid macOS build error when _DARWIN_C_SOURCE is not defined, brought by
SDL2 relying on Darwin extension memset_pattern4/8/16 (from string.h).

* make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK

* make : use BSD-specific FTMs to enable alloca on BSDs

* make : fix OpenBSD build by exposing newer POSIX definitions

* cmake : follow recent FTM improvements from Makefile
2023-09-07 12:36:14 +03:00
Georgi Gerganov 2818de21ff
examples : fix build + compile warnings (close #1256) 2023-09-07 12:33:12 +03:00
Neil Chudleigh aed5d40607
models : add quantum models to download-ggml-model.sh (#1235)
* Add quantized models to download-ggml-model.sh

* Update names in download-ggml-model script to normalized
2023-09-07 12:16:58 +03:00
Digipom afa5477d1c
whisper.android : bump gradle plugin and dependencies + a lint pass (#1255) 2023-09-07 12:15:59 +03:00
Nicholas Albion 01fcd42431
sign jar for Maven Central repo 2023-09-07 11:45:44 +10:00
Digipom f990610776
whisper.android : address ARM's big.LITTLE arch by checking cpu info (#1254)
Addresses https://github.com/ggerganov/whisper.cpp/issues/1248
2023-09-06 18:32:30 +03:00