whisper.cpp

Commit Graph

Author	SHA1	Message	Date
Jeff Bolz	a682fdce0c	vulkan: fix compile warnings (llama/10731)	2024-12-18 12:52:16 +02:00
stduhpf	9ffbd3d969	Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (llama/10723) * Vulkan: fix NaN in tanh.comp * Faster NaN-free tanh	2024-12-18 12:52:16 +02:00
Jeff Bolz	6585a890b4	vulkan: compile a test shader in cmake to check for coopmat2 support (llama/10713)	2024-12-18 12:52:16 +02:00
Georgi Gerganov	d0a050b51f	ggml : disable iq4_nl interleave size 8 (llama/10709) ggml-ci	2024-12-18 12:52:16 +02:00
Djip007	e990d1b791	ggml : refactor online repacking (llama/10446) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-18 12:52:16 +02:00
0cc4m	4a6d52efe6	Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (llama/10597) * Vulkan: Implement VK_KHR_cooperative_matrix support in the matrix matrix multiplication shader * Improve performance with better q4_k and q5_k dequant and store unrolling * Add Vulkan MUL_MAT and MUL_MAT_ID accumulator precision selection * Rework mulmat shader selection and compilation logic, avoid compiling shaders that won't get used by device * Vulkan: Implement accumulator switch for specific mul mat mat shaders * Vulkan: Unroll more loops for more mul mat mat performance * Vulkan: Add VK_AMD_shader_core_properties2 support to read Compute Unit count for split_k logic * Disable coopmat support on AMD proprietary driver * Remove redundant checks * Add environment variable GGML_VK_DISABLE_COOPMAT to disable VK_KHR_cooperative_matrix support * Fix rebase typo * Fix coopmat2 MUL_MAT_ID pipeline selection	2024-12-18 12:52:16 +02:00
Robert Ormandi	8b841d430a	metal : Extend how Llama.cpp locates metal resources (llama/10676) * metal : Extend how Llama.cpp locates metal resources (llama/10675) * It searches the resource file in the directory where the current binary is located as well. * Resolves symbolic links. Rationale: When we plug this dependency into a Bazel build and run it in the context of Bazel (e.g. testing): * the execution directory is often very different from where the files are located and no direct control over this (Bazel sandboxing), * the Bazel sandbox often use symbolic links to make files available. With this patch, we can have the resource file added to the target, can build and run tests in the context of Bazel. * Update ggml/src/ggml-metal/ggml-metal.m Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-metal/ggml-metal.m Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-18 12:52:16 +02:00
Jeff Bolz	b74b68212a	vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (llama/10206)	2024-12-18 12:52:16 +02:00
Georgi Gerganov	94e7da1ff2	cmake : fix "amd64" processor string (#2638)	2024-12-17 18:34:32 +02:00
gn64	c4aed6831e	vulkan : fix soft_max.comp division by zero (#2633) This change prevents a division by zero error when p.KY is 0.	2024-12-16 12:34:38 +02:00
Georgi Gerganov	7d134e3737	ggml : remove old files (skip) (#0)	2024-12-08 23:04:26 +02:00
Georgi Gerganov	9df53b357e	ggml : sync remnants (skip) (#0)	2024-12-08 22:48:25 +02:00
Diego Devesa	a815940e0e	ggml : add predefined list of CPU backend variants to build (llama/10626) * ggml : add predefined list of CPU backend variants to build * update CPU dockerfiles	2024-12-08 20:14:35 +02:00
Diego Devesa	904e307bce	ggml-cpu : fix HWCAP2_I8MM value (llama/10646)	2024-12-08 20:14:35 +02:00
Jeff Bolz	491ec076b4	vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (llama/10642)	2024-12-08 20:14:35 +02:00
Nicolò Scipione	966433fdf2	SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (llama/10584) * [SYCL] Move to Compile Time backend selection on oneMKL Interface for NVIDIA backend Move to compile time selection to backend to avoid latency at run time. Add it to all mkl gemm calls and only for NVIDIA backend. Signed-off-by: nscipione <nicolo.scipione@codeplay.com> * Formatting * Address PR comments to increase readability --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2024-12-08 20:14:35 +02:00
Frankie Robertson	6f1ba9d82d	Avoid using __fp16 on ARM with old nvcc (llama/10616)	2024-12-08 20:14:35 +02:00
Jeff Bolz	015ecd0001	vulkan: optimize and reenable split_k (llama/10637) Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.	2024-12-08 20:14:35 +02:00
PAB	b7c64a4352	ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) * implemented cpu kernel * add i32 test cases in test-backend-ops * typedef `ggml_metal_kargs_set` * implemented `kernel_set` * memcpy	2024-12-08 20:14:35 +02:00
PAB	7895d39508	ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) * ggml_pad_reflect_1d defined in header * implemented on CPU * called the forward pass * impl Metal kernel * added Metal kernel * added OP_PAD_REFLECT_1D in test-backend-ops.cpp * add test-pad-reflect-1d test case * test case support multiple backend	2024-12-08 20:14:35 +02:00
Georgi Gerganov	22616f00f9	files : remove make artifacts	2024-12-08 20:14:35 +02:00
Diego Devesa	3daeacad24	ggml : move AMX to the CPU backend (llama/10570) ggml : automatic selection of best CPU backend (llama/10606)	2024-12-08 20:14:35 +02:00
Georgi Gerganov	4d73962da4	metal : small-batch mat-mul kernels (llama/10581) * metal : small-batch mat-mul kernels ggml-ci * metal : add rest of types ggml-ci * metal : final adjustments ggml-ci * metal : add comments ggml-ci	2024-12-08 20:14:35 +02:00
Akarshan Biswas	068812650e	SYCL: Fix and switch to GGML_LOG system instead of fprintf (llama/10579) * Switched to GGML_LOG * Fix missing semicolon	2024-12-08 20:14:35 +02:00
Adrien Gallouët	4b7e059e15	ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (llama/10567) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2024-12-08 20:14:35 +02:00
Eve	30e35d7271	vulkan: Dynamic subgroup size support for Q6_K mat_vec (llama/10536) * subgroup 64 version with subgroup add. 15% faster scalable version tested for subgroup sizes 16-128 * check for subgroup multiple of 16 and greater than 16 * subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45) * force 16 sequential threads per block * make 16 subgroup size a constant	2024-12-08 20:14:35 +02:00
Georgi Gerganov	3623bd58f2	ggml : fix I8MM Q4_1 scaling factor conversion (llama/10562) ggml-ci	2024-12-08 20:14:35 +02:00
Shupei Fan	cb847c20a7	ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (llama/10580)	2024-12-08 20:14:35 +02:00
Alberto Cabrera Pérez	964b154a2a	sycl : offload of get_rows set to 0 (llama/10432)	2024-12-08 20:14:35 +02:00
Alberto Cabrera Pérez	d7c2a04bce	sycl : Reroute permuted mul_mats through oneMKL (llama/10408) This PR fixes the failing MUL_MAT tests for the sycl backend.	2024-12-08 20:14:35 +02:00
Chenguang Li	2bb4ca9cba	CANN: RoPE operator optimization (llama/10563) * [cann] RoPE operator optimization * [CANN]Code Formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-12-08 20:14:35 +02:00
Jeff Bolz	a753a82462	vulkan: get the first command buffer submitted sooner (llama/10499) This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space. With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.	2024-12-08 20:14:35 +02:00
Georgi Gerganov	276b08d8f0	ggml : remove redundant copyright notice + update authors	2024-12-08 20:14:35 +02:00
Georgi Gerganov	4ca1e72fe0	ggml : fix row condition for i8mm kernels (llama/10561) ggml-ci	2024-12-08 20:14:35 +02:00
Georgi Gerganov	16a66f103f	cmake : fix ARM feature detection (llama/10543) ggml-ci	2024-12-08 20:14:35 +02:00
Shupei Fan	330273901f	ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) * ggml-cpu: support IQ4_NL_4_4 by runtime repack * ggml-cpu: add __ARM_FEATURE_DOTPROD guard	2024-12-08 20:14:35 +02:00
Sergio López	42099a9342	kompute : improve backend to pass test_backend_ops (llama/10542) * kompute: op_unary: reject unsupported parameters Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: softmax: implement ALiBi support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: rope: implement neox and phi3 support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_q4_k permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_[q4_0|q4_1|q8_0] permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_f16 permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> * kompute: op_mul_mat_q6_k permuted support Signed-off-by: Sergio Lopez <slp@redhat.com> --------- Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-12-08 20:14:35 +02:00
leo-pony	90dd5fca9c	CANN: Fix SOC_TYPE compile bug (llama/10519) * CANN: Fix the bug build fail on Ascend310P under two cases: 1) Manual specify SOC_TYPE 2) Under some unusual compile environment * Update the cann backend News content: Support F16 and F32 data type model for Ascend 310P NPU. * fix CANN compile fail bug: the assert in ascend kernel function isn't supported on some CANN versions	2024-12-08 20:14:35 +02:00
Chenguang Li	2490f2a7f8	CANN: ROPE operator optimization (llama/10540) * [cann] ROPE operator optimization Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-12-08 20:14:35 +02:00
uvos	230e985633	Add some minimal optimizations for CDNA (llama/10498) * Add some minimal optimizations for CDNA * ggml_cuda: set launch bounds also for GCN as it helps there too	2024-12-08 20:14:35 +02:00
Georgi Gerganov	ae24083f23	metal : fix group_norm support condition (llama/0)	2024-12-08 20:14:35 +02:00
Jeff Bolz	6463e36369	vulkan: define all quant data structures in types.comp (llama/10440)	2024-12-08 20:14:35 +02:00
Jeff Bolz	b3301f7d82	vulkan: Handle GPUs with less shared memory (llama/10468) There have been reports of failure to compile on systems with <= 32KB of shared memory (e.g. #10037). This change makes the large tile size fall back to a smaller size if necessary, and makes mul_mat_id fall back to CPU if there's only 16KB of shared memory.	2024-12-08 20:14:35 +02:00
Jeff Bolz	ab5d4d93ec	vulkan: further optimize q5_k mul_mat_vec (llama/10479)	2024-12-08 20:14:35 +02:00
Jeff Bolz	2d6e9dd723	vulkan: skip integer div/mod in get_offsets for batch_idx==0 (llama/10506)	2024-12-08 20:14:35 +02:00
Jeff Bolz	2f16e51553	vulkan: optimize Q2_K and Q3_K mul_mat_vec (llama/10459)	2024-12-08 20:14:35 +02:00
R0CKSTAR	0f0994902f	mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (llama/10516) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-12-08 20:14:35 +02:00
Jeff Bolz	5e1fcc1780	vulkan: fix group_norm (llama/10496) Fix bad calculation of the end of the range. Add a backend test that covers the bad case (taken from stable diffusion). Fixes https://github.com/leejet/stable-diffusion.cpp/issues/439.	2024-12-08 20:14:35 +02:00
Georgi Gerganov	48f421de23	cmake : enable warnings in llama (llama/10474) * cmake : enable warnings in llama ggml-ci * cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS * cmake : get_flags -> ggml_get_flags * speculative-simple : fix warnings * cmake : reuse ggml_get_flags ggml-ci * speculative-simple : fix compile warning ggml-ci	2024-12-08 20:14:35 +02:00
Charles Xu	e7afb2b991	ggml-cpu: cmake add arm64 cpu feature check for macos (llama/10487) * ggml-cpu: cmake add arm64 cpu feature check for macos * use vmmlaq_s32 for compile option i8mm check	2024-12-08 20:14:35 +02:00
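
Commit 491ec076b4 above mentions a "fast divide" (mul+shift). As a rough illustration of that general idea only, the sketch below replaces division by a fixed 32-bit divisor with a multiply by a precomputed constant followed by a shift. It is not the shader code from that commit: the function names `fastdiv_constants` and `fastdiv` are hypothetical, and Python's unbounded integers are assumed, so the intermediate sum cannot overflow the way a 32-bit register could.

```python
def fastdiv_constants(d: int) -> tuple[int, int]:
    """Precompute (magic, shift) for dividing 32-bit unsigned values by d.

    Hypothetical helper for illustration; assumes 1 <= d < 2**32.
    """
    if not 1 <= d < 2**32:
        raise ValueError("divisor must fit in an unsigned 32-bit integer")
    shift = (d - 1).bit_length()  # ceil(log2(d)); 0 when d == 1
    magic = (2**32 * (2**shift - d)) // d + 1
    return magic, shift


def fastdiv(n: int, magic: int, shift: int) -> int:
    """Return n // d using one multiply, one add and one shift.

    Assumes 0 <= n < 2**32 and that (magic, shift) came from
    fastdiv_constants(d).
    """
    high = (n * magic) >> 32  # upper 32 bits of the 64-bit product
    return (high + n) >> shift


if __name__ == "__main__":
    magic, shift = fastdiv_constants(7)
    print(fastdiv(100, magic, shift))  # same value as 100 // 7
```

The constants depend only on the divisor, so they can be computed once on the host and reused for every element. On fixed-width hardware the `high + n` sum may need extra care to avoid wrapping; that detail is outside this sketch.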

395 Commits