whisper.cpp/ggml
Chen Yuan 3fc738a8c2
ggml-webgpu: address quantization precision and backend lifecycle managment (llama/21521)
* ggml(webgpu): fix the busy-polls in Emscripten  in the waitAny after #20618, and remove the busy webgpu log

* Merge with upstream

* Fix GET_ROWS packed integer NaN when using f16 as memory buffer in shader quants

* Update Unary wgsl EXP and EXPM1 for f16 stability

* Fix GET_ROWS IQ4_XS strcut for NaN f16 canonicalization

* Fix numerical percision for unary sqrt when working with f16

* Fix NaN canonicalization for packed integers using f16

* Update err threshold for binary div ops when using f16

* backend: Keep one Dawn/WebGPU instance alive for the lifetime of the static backend

* clean: uncomment existing code logs

* clean: clean the unncessary debug info

* Refactor and generalize dequant helpers

* Remove deprecated quant structs

* Refactor shader defines to reduce repetition

* Remove error override for F16 type

* fix: fix the accidential removal of the proper initialization of ctx

* clean: clean legacy and format code

* fix: did not modify tests ops

---------

Co-authored-by: Jeremy J. Hartmann <jeremy@mtion.tv>
2026-04-30 11:29:05 +03:00
..
cmake cmake : remove unused file (ggml/1419) 2026-02-08 09:29:10 +02:00
include ggml: backend-agnostic tensor parallelism (experimental) (llama/19378) 2026-04-30 11:29:05 +03:00
src ggml-webgpu: address quantization precision and backend lifecycle managment (llama/21521) 2026-04-30 11:29:05 +03:00
.gitignore
CMakeLists.txt ggml: backend-agnostic tensor parallelism (experimental) (llama/19378) 2026-04-30 11:29:05 +03:00