ggml-webgpu: compute pass batching and removing profiling overhead (llama/21873)

* Update register tiling matmul to use f32 accumulation

* fix profiling code

* Fix register tiling matmul for Chrome (I'm blaming Dawn)

* Update batch tuning value for iOS

* compile fix

* Fix use of new load function

* Move to a single query set for GPU profiling

* Move to batching compute passes when not profiling

* Refactor build_multi

* remove iOS throttling now that we're batching compute passes
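Two items above are "Move to batching compute passes when not profiling" and "Move to a single query set for GPU profiling". The sketch below shows one plausible way those two modes could fit together. It is not the code from this commit.

It assumes per-op timing relies on core WebGPU timestamp writes, which are recorded at compute-pass boundaries. Under that assumption, profiling needs one op per pass, and otherwise several dispatches can share one pass. `PassPlan`, `plan_passes`, `op_durations`, and `batch_size` are all hypothetical names. `batch_size` only stands in for a tunable batching value whose real meaning and value are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plan of how a graph's dispatches could be grouped into
// compute passes. Names and fields are illustrative, not ggml-webgpu's API.
struct PassPlan {
    std::vector<std::vector<int>> passes;  // op indices encoded in each compute pass
    std::vector<uint32_t> query_begin;     // per-op begin slot in the shared query set
    std::vector<uint32_t> query_end;       // per-op end slot in the shared query set
    uint32_t query_count = 0;              // size of the single query set (0 when not profiling)
};

PassPlan plan_passes(int n_ops, bool profiling, int batch_size) {
    assert(n_ops >= 0 && batch_size >= 1);
    PassPlan plan;
    if (profiling) {
        // One op per pass, so the pass-boundary timestamps bracket exactly one op.
        // Every op writes into the same query set, using slots 2*i and 2*i + 1.
        for (int i = 0; i < n_ops; ++i) {
            const uint32_t base = 2u * static_cast<uint32_t>(i);
            plan.passes.push_back({i});
            plan.query_begin.push_back(base);
            plan.query_end.push_back(base + 1);
        }
        plan.query_count = 2u * static_cast<uint32_t>(n_ops);
        return plan;
    }
    // Not profiling: pack up to batch_size consecutive ops into each pass.
    // No timestamp slots are allocated in this mode.
    for (int start = 0; start < n_ops; start += batch_size) {
        std::vector<int> pass;
        for (int j = start; j < n_ops && j < start + batch_size; ++j) {
            pass.push_back(j);
        }
        plan.passes.push_back(pass);
    }
    return plan;
}

// Turn resolved timestamps (one value per query slot) into per-op durations.
std::vector<uint64_t> op_durations(const PassPlan& plan,
                                   const std::vector<uint64_t>& timestamps) {
    std::vector<uint64_t> durations;
    for (std::size_t i = 0; i < plan.query_begin.size(); ++i) {
        durations.push_back(timestamps[plan.query_end[i]] -
                            timestamps[plan.query_begin[i]]);
    }
    return durations;
}
```

With 10 ops and a batch size of 4, the non-profiling path encodes 3 passes instead of 10. The profiling path keeps one pass per op, but all timestamps land in one query set of `2 * n_ops` slots. That set can be resolved in a single step rather than once per op.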
Reese Levine 2026-04-16 01:12:19 -07:00 committed by Georgi Gerganov
parent f62bb13320
commit 092330b474
GPG Key ID: 449E073F9DC10735
1 changed file with 349 additions and 452 deletions
