whisper.cpp/ggml/src/ggml-cuda

Latest commit 9828caafb5 by Yoshi_likes_e4, 2025-09-20 13:42:42 +03:00:

Add a warning for special devices (llama/15563)

* Add warning
* Print the device names
* Add newlines
* Apply suggestions from code review
* Fix vector names

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
| Name | Last commit | Date |
| --- | --- | --- |
| template-instances | llama : add gpt-oss (llama/15091) | 2025-08-18 20:30:45 +03:00 |
| vendors | CUDA: MoE helper in device code, better tile sizes (llama/15525) | 2025-09-20 13:42:41 +03:00 |
| CMakeLists.txt | CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433) | 2025-09-20 13:42:38 +03:00 |
| acc.cu | llama/ggml: add LLM training support (llama/10544) | 2025-05-13 13:59:21 +03:00 |
| acc.cuh | | |
| add-id.cu | musa: fix build warnings (llama/15258) | 2025-09-20 13:42:38 +03:00 |
| add-id.cuh | llama : add gpt-oss (llama/15091) | 2025-08-18 20:30:45 +03:00 |
| arange.cu | | |
| arange.cuh | | |
| argmax.cu | | |
| argmax.cuh | | |
| argsort.cu | | |
| argsort.cuh | | |
| binbcast.cu | | |
| binbcast.cuh | | |
| clamp.cu | | |
| clamp.cuh | | |
| common.cuh | CUDA: MoE helper in device code, better tile sizes (llama/15525) | 2025-09-20 13:42:41 +03:00 |
| concat.cu | musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611) | 2025-03-31 14:56:53 +03:00 |
| concat.cuh | | |
| conv-transpose-1d.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| conv-transpose-1d.cuh | | |
| conv2d-dw.cu | CUDA: add conv_2d_dw (llama/14265) | 2025-06-21 07:34:17 +03:00 |
| conv2d-dw.cuh | CUDA: add conv_2d_dw (llama/14265) | 2025-06-21 07:34:17 +03:00 |
| conv2d-transpose.cu | CUDA: add conv_2d_transpose (llama/14287) | 2025-06-21 07:34:17 +03:00 |
| conv2d-transpose.cuh | CUDA: add conv_2d_transpose (llama/14287) | 2025-06-21 07:34:17 +03:00 |
| convert.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| convert.cuh | HIP: Cleanup hipification header (llama/15285) | 2025-08-18 20:30:45 +03:00 |
| count-equal.cu | | |
| count-equal.cuh | | |
| cp-async.cuh | CUDA: FA support for Deepseek (Ampere or newer) (llama/13306) | 2025-05-13 13:59:21 +03:00 |
| cpy-utils.cuh | HIP: Cleanup hipification header (llama/15285) | 2025-08-18 20:30:45 +03:00 |
| cpy.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| cpy.cuh | ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (llama/12970) | 2025-04-24 20:39:16 +03:00 |
| cross-entropy-loss.cu | CUDA: add dynamic shared mem to softmax, refactor general usage (llama/14497) | 2025-07-12 19:23:56 +03:00 |
| cross-entropy-loss.cuh | | |
| dequantize.cuh | CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433) | 2025-09-20 13:42:38 +03:00 |
| diagmask.cu | | |
| diagmask.cuh | | |
| fattn-common.cuh | CUDA: refactor FA support/selection code (llama/15454) | 2025-09-20 13:42:38 +03:00 |
| fattn-mma-f16.cuh | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| fattn-tile-f16.cu | CUDA: fix half2 -> half conversion for HIP (llama/15529) | 2025-09-20 13:42:40 +03:00 |
| fattn-tile-f16.cuh | | |
| fattn-tile-f32.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| fattn-tile-f32.cuh | | |
| fattn-vec-f16.cuh | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| fattn-vec-f32.cuh | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| fattn-wmma-f16.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| fattn-wmma-f16.cuh | | |
| fattn.cu | CUDA: refactor FA support/selection code (llama/15454) | 2025-09-20 13:42:38 +03:00 |
| fattn.cuh | CUDA: refactor FA support/selection code (llama/15454) | 2025-09-20 13:42:38 +03:00 |
| getrows.cu | CUDA: replace GGML_CUDA_F16 with CUDA arch checks (llama/15433) | 2025-09-20 13:42:38 +03:00 |
| getrows.cuh | CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199) | 2025-05-01 13:29:02 +03:00 |
| ggml-cuda.cu | Add a warning for special devices (llama/15563) | 2025-09-20 13:42:42 +03:00 |
| gla.cu | | |
| gla.cuh | | |
| im2col.cu | llama : add gpt-oss (llama/15091) | 2025-08-18 20:30:45 +03:00 |
| im2col.cuh | | |
| mean.cu | cuda : fix GGML_CUDA_GRAPHS=OFF (llama/15300) | 2025-08-18 20:30:45 +03:00 |
| mean.cuh | CUDA: add mean operation (llama/14313) | 2025-07-01 17:54:53 +03:00 |
| mma.cuh | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| mmf.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| mmf.cuh | CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131) | 2025-08-18 20:30:45 +03:00 |
| mmq.cu | CUDA: MoE helper in device code, better tile sizes (llama/15525) | 2025-09-20 13:42:41 +03:00 |
| mmq.cuh | CUDA: MoE helper in device code, better tile sizes (llama/15525) | 2025-09-20 13:42:41 +03:00 |
| mmvf.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| mmvf.cuh | CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131) | 2025-08-18 20:30:45 +03:00 |
| mmvq.cu | musa: add GGML_UNUSED_VARS (llama/15446) | 2025-09-20 13:42:38 +03:00 |
| mmvq.cuh | CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014) | 2025-04-24 20:39:16 +03:00 |
| norm.cu | CUDA: add fused rms norm (llama/14800) | 2025-07-28 13:02:32 +03:00 |
| norm.cuh | CUDA: add fused rms norm (llama/14800) | 2025-07-28 13:02:32 +03:00 |
| opt-step-adamw.cu | | |
| opt-step-adamw.cuh | | |
| opt-step-sgd.cu | finetune: SGD optimizer, more CLI args (llama/13873) | 2025-08-18 20:30:45 +03:00 |
| opt-step-sgd.cuh | finetune: SGD optimizer, more CLI args (llama/13873) | 2025-08-18 20:30:45 +03:00 |
| out-prod.cu | | |
| out-prod.cuh | | |
| pad.cu | musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (llama/12611) | 2025-03-31 14:56:53 +03:00 |
| pad.cuh | | |
| pad_reflect_1d.cu | cuda : add Pad Reflect 1D support (llama/14659) | 2025-09-20 13:42:39 +03:00 |
| pad_reflect_1d.cuh | cuda : add Pad Reflect 1D support (llama/14659) | 2025-09-20 13:42:39 +03:00 |
| pool2d.cu | | |
| pool2d.cuh | | |
| quantize.cu | CUDA: fix crash on large batch size for quant. MoE (llama/13537) | 2025-05-19 14:58:39 +03:00 |
| quantize.cuh | CUDA: batched+noncont MMQ, refactor bs>1 MoE code (llama/13199) | 2025-05-01 13:29:02 +03:00 |
| reduce_rows.cuh | musa: fix build warnings (llama/15258) | 2025-09-20 13:42:38 +03:00 |
| roll.cu | CUDA: add roll (llama/14919) | 2025-08-18 20:30:45 +03:00 |
| roll.cuh | CUDA: add roll (llama/14919) | 2025-08-18 20:30:45 +03:00 |
| rope.cu | cuda : fix rope with partial rotation and non-cont src (llama/14580) | 2025-07-12 19:23:56 +03:00 |
| rope.cuh | | |
| scale.cu | ggml : add ggml_scale_bias (llama/14417) | 2025-07-12 19:23:56 +03:00 |
| scale.cuh | | |
| set-rows.cu | HIP: Cleanup hipification header (llama/15285) | 2025-08-18 20:30:45 +03:00 |
| set-rows.cuh | CUDA: add set rows for f32 and f16 (llama/14551) | 2025-07-20 00:23:50 +03:00 |
| softcap.cu | cuda : add softcap fusion (llama/14907) | 2025-08-18 20:30:45 +03:00 |
| softcap.cuh | cuda : add softcap fusion (llama/14907) | 2025-08-18 20:30:45 +03:00 |
| softmax.cu | llama : add gpt-oss (llama/15091) | 2025-08-18 20:30:45 +03:00 |
| softmax.cuh | | |
| ssm-conv.cu | model : support LiquidAI LFM2 hybrid family (llama/14620) | 2025-07-12 19:23:56 +03:00 |
| ssm-conv.cuh | ggml : faster ssm scan (llama/10558) | 2025-04-02 15:51:57 +03:00 |
| ssm-scan.cu | cuda: refactored ssm_scan and use CUB (llama/13291) | 2025-08-18 20:30:45 +03:00 |
| ssm-scan.cuh | ggml : faster ssm scan (llama/10558) | 2025-04-02 15:51:57 +03:00 |
| sum.cu | CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132) | 2025-08-18 20:30:45 +03:00 |
| sum.cuh | | |
| sumrows.cu | CUDA: Optimize `reduce_rows_f32` kernel, leading up to 25x perf improvement on kernel-level and 10% perf increase for Gemma3n (llama/15132) | 2025-08-18 20:30:45 +03:00 |
| sumrows.cuh | CUDA: add mean operation (llama/14313) | 2025-07-01 17:54:53 +03:00 |
| tsembd.cu | | |
| tsembd.cuh | | |
| unary.cu | llama : add gpt-oss (llama/15091) | 2025-08-18 20:30:45 +03:00 |
| unary.cuh | llama : add gpt-oss (llama/15091) | 2025-08-18 20:30:45 +03:00 |
| upscale.cu | CUDA: add bilinear interpolation for upscale (llama/14563) | 2025-07-12 19:23:56 +03:00 |
| upscale.cuh | | |
| vecdotq.cuh | CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (llama/15451) | 2025-09-20 13:42:41 +03:00 |
| wkv.cu | llama: Add support for RWKV v7 architecture (llama/12412) | 2025-03-27 11:06:03 +02:00 |
| wkv.cuh | llama: Add support for RWKV v7 architecture (llama/12412) | 2025-03-27 11:06:03 +02:00 |