whisper.cpp/ggml
Aman Gupta 41e578ec8a CUDA: experimental native mxfp4 support for blackwell (llama/17906)
* CUDA: experimental native mxfp4 support for blackwell

* optimize load_tiles

* optimize quantize_mxfp4

* cleanup

* first pass review: formatting

* use interleaved layout for mma

* mmq: add assert for size

* use __nv_fp4x4_e2m1

* use iter_k as 512, cleanup

* Use 1200 as blackwell instead of 1000

* address review comments

* mmq: fix stride

* quantize.cu: use reference impl of e8m0 scale

* address review comments

* add 120f-virtual + minor fixes

---------

Co-authored-by: Aman Gupta <aman>
2025-12-31 17:52:09 +02:00
cmake           ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094)  2025-08-18 20:30:45 +03:00
include         llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (llama/16653)  2025-12-18 08:20:56 +02:00
src             CUDA: experimental native mxfp4 support for blackwell (llama/17906)  2025-12-31 17:52:09 +02:00
.gitignore      whisper : reorganize source code + improve CMake (#2256)  2024-06-26 19:34:09 +03:00
CMakeLists.txt  ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (llama/17977)  2025-12-31 17:52:09 +02:00