whisper.cpp/ggml
Jeff Bolz aedf332ec5 vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (llama/18678)
This fixes incoherent output in Llama-4-Maverick-17B-128E-PAB-Q8_0, which
has a mul_mat_id with an A matrix that's Q8_0 8192 x 5120 x 128.

This should work when the number of blocks in the A matrix is less than 2^32
(for mul_mat_vec or mul_mm_cm2), or for mul_mm I think the limit is like
2^32*LOAD_VEC_A elements.

- Divide batch_stride by QUANT_K earlier, so the block index calculation works in 32b.
- Each vk_pipeline_struct has a linked list of pipelines that will allow it to handle
variants. So far this change just adds a single use case for this, compiling with the
e64BitIndexingEXT flag.
- Use the 64b indexing variant when the A matrix is larger than maxStorageBufferRange.

64-bit indexing has some cost - around 3-5% in MoE models, so it's worth the effort
to avoid enabling it unconditionally.
2026-01-14 09:11:59 +02:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) 2025-08-18 20:30:45 +03:00
include ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (llama/18628) 2026-01-14 09:11:59 +02:00
src vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (llama/18678) 2026-01-14 09:11:59 +02:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : bump version to 0.9.5 (ggml/1410) 2025-12-31 18:27:20 +02:00