whisper.cpp/ggml/include
Jeff Bolz b1f65a4a7e vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (llama/18295)
* vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron

Also handle GGML_OP_SCALE at the end (nemotron, deepseek2).

Fewer pipeline variants and spec constants, just use push constants.

In test_topk_moe, change exp_probs_b to be 1D, matching real networks.

Update test-backend-ops and ggml-backend to allow verifying multiple outputs
in a fusion test (topk_moe has two outputs). Previously only the final node
was verified.

* change test_topk_moe to allow results in arbitrary order

* disable sigmoid fusion for moltenvk
2026-01-14 09:11:59 +02:00
ggml-alloc.h llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (llama/16653) 2025-12-18 08:20:56 +02:00
ggml-backend.h vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (llama/18295) 2026-01-14 09:11:59 +02:00
ggml-blas.h
ggml-cann.h
ggml-cpp.h
ggml-cpu.h ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (llama/17951) 2025-12-18 08:20:56 +02:00
ggml-cuda.h
ggml-hexagon.h Add experimental ggml-hexagon backend for the Hexagon NPU (llama/16547) 2025-11-09 23:38:03 +02:00
ggml-metal.h
ggml-opencl.h
ggml-opt.h
ggml-rpc.h rpc : fix alloc size logic (llama/17116) 2025-12-12 17:53:18 +02:00
ggml-sycl.h
ggml-vulkan.h
ggml-webgpu.h
ggml-zdnn.h zdnn: refactor codebase + add docs (llama/16178) 2025-09-29 15:18:09 +03:00
ggml-zendnn.h ggml-zendnn : add ZenDNN backend for AMD CPUs (llama/17690) 2025-12-12 17:53:21 +02:00
ggml.h llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (llama/16653) 2025-12-18 08:20:56 +02:00
gguf.h