ArrayFire v3.10.0 Release Notes
Release Date: 2025-09-05
v3.10.0
👌 Improvements
- ➕ Added signed int8 support #3661 #3508 #3507 #3503
- 👍 Increased support for half (fp16) #3680 #3258 #3561 #3627 #3559
- ⚡️ Updated oneAPI to use Intel oneAPI (R) 2025.1 #3643 #3573
- ⚡️ Updated cl2hpp dependency #3651 #3562
- ➕ Add support for CUDA 12.3, 12.4, 12.5, 12.6, 12.8, and 12.9 #3657 #3645 #3641 #3636 #3588 #3552 #3586 #3541
- ➕ Added minimum driver version check for CUDA GPUs #3648
- ➕ Add more examples #3530 #3455 #3375 #3612 #3584 #3577
- 📚 Updated documentation #3496 #3613
- 👌 Improved performance of matrix multiplication of sparse matrices on the OpenCL backend #3608
- 👌 Improved cmake configure #3581 #3569 #3567 #3564 #3554
- Loosen indexing assertions for assignments #3514
🛠 Fixes
- 🛠 Fix jit tree when doing operations containing moddims and original array #3671
- 🛠 Fix incorrect behavior of sub-arrays with multiple functions #3679 #3668 #3666 #3665 #3664 #3663 #3658 #3659 #3650 #3611 #3633 #3602
- 🛠 Fix half precision operations in multiple backends #3676 #3662
- 🛠 Fix for join not always respecting the order of parameters #3667 #3513
- 🛠 Fix for cmake building as an external project (needed by arrayfire python wheels) #3669
- 🛠 Fix for cmake build in Windows (including with vcpkg) #3655 #3646 #3644 #3512 #3626 #3566 #3557 #3591 #3592
- 🛠 Fix race condition in OpenCL flood fill #3535
- 🛠 Fix indexing of arrays using sequences (af_seq) that have non-unit steps #3587
- 🛠 Fix padding issue in convolve2GradientNN #3519
- 🛠 Fix incorrect axis values for histogram #3590
- 🛠 Fix unified exceptions errors #3617
- 🛠 Fix OpenCL memory migration on devices with different contexts #3510
- 🛠 Fix conversion of COO Sparse to Dense matrix #3589 #3579
- Fix AF_JIT_KERNEL_TRACE on Windows #3517
- 🛠 Fix cmake build with CUDNN #3521
- Fix cmake build with AF_DISABLE_CPU_ASYNC #3551
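To make the non-unit-step indexing fix (#3587) concrete, the sketch below lists which positions a (begin, end, step) sequence is expected to select. It is a standalone illustration and not ArrayFire's implementation: the helper name seq_indices is hypothetical, and it assumes an inclusive end and non-negative positions (ArrayFire's handling of negative, end-relative positions is not modelled).

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of ArrayFire: returns the positions that a
// (begin, end, step) sequence refers to, treating `end` as inclusive.
std::vector<long long> seq_indices(double begin, double end, double step) {
    if (step == 0.0) {
        throw std::invalid_argument("step must be non-zero");
    }
    if (begin < 0 || end < 0) {
        // Assumption of this sketch: end-relative (negative) positions are out of scope.
        throw std::invalid_argument("negative positions are not modelled here");
    }
    std::vector<long long> out;
    // How many whole steps fit between begin and end; a negative value means
    // the step points away from `end`, so the sequence is empty.
    const double span = (end - begin) / step;
    if (span < 0) {
        return out;
    }
    const long long count = static_cast<long long>(std::floor(span)) + 1;
    for (long long i = 0; i < count; ++i) {
        out.push_back(static_cast<long long>(begin + i * step));
    }
    return out;
}
```

For example, seq_indices(0, 9, 2) yields 0, 2, 4, 6, 8, and a negative step such as seq_indices(9, 0, -3) walks backwards through 9, 6, 3, 0.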
Contributions
Special thanks to our contributors:
Willy Born
verstatx
Filip Matzner
Fraser Cormack
errata-c
Tyler Hilbert
Previous changes from v3.9.0
v3.9.0
👌 Improvements
- ➕ Add oneAPI backend #3296
- ➕ Add support to directly access arrays on other devices #3447
- ➕ Add asynchronous reduce all functions that return an af_array #3199
- ➕ Add broadcast support #2871
- 👌 Improve OpenCL CPU JIT performance #3257 #3392
- ⚡️ Optimize thread/block calculations of several kernels #3144
- ➕ Add support for fast math compilation when building ArrayFire #3334 #3337
- 🐎 Optimize performance of fftconvolve when using floats #3338
- ➕ Add support for CUDA 12.1 and 12.2
- 👍 Better handling of empty arrays #3398
- 👍 Better handling of memory in linear algebra functions in OpenCL #3423
- 👍 Better logging with JIT kernels #3468
- ⚡️ Optimize memory manager/JIT interactions for small number of buffers #3468
- 📚 Documentation improvements #3485
- ⚡️ Optimize reorder function #3488
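As a rough illustration of what broadcast support (#2871) usually means for element-wise operations, the sketch below computes the result shape of two 4-dimensional shapes. It assumes the common rule that sizes along a dimension must either match or one of them must be 1. That rule, the Dims alias, and the broadcast_dims helper are assumptions made for illustration; none of this is taken from ArrayFire's source.

```cpp
#include <array>
#include <cstddef>

// Shapes are modelled as four sizes, with unused trailing dimensions set to 1.
typedef std::array<long long, 4> Dims;

// Hypothetical helper: writes the broadcast result shape to `out` and returns
// true, or returns false when the two shapes cannot be broadcast together.
bool broadcast_dims(const Dims& a, const Dims& b, Dims& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        if (a[i] == b[i]) {
            out[i] = a[i];   // sizes already agree
        } else if (a[i] == 1) {
            out[i] = b[i];   // `a` is repeated along this dimension
        } else if (b[i] == 1) {
            out[i] = a[i];   // `b` is repeated along this dimension
        } else {
            return false;    // mismatched sizes, neither is 1
        }
    }
    return true;
}
```

Under this rule a 3x1 column combined with a 1x4 row gives a 3x4 result, while a 3x1 and a 2x1 operand are rejected.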
🛠 Fixes
- 👌 Improve errors when creating OpenCL contexts from devices #3257
- 👌 Improvements to vcpkg builds #3376 #3476
- 🛠 Fix reduce by key when NaNs are present #3261
- 🛠 Fix error in convolve where the ndims parameter was forced to be equal to 2 #3277
- 👉 Make constructors that accept dim_t explicit to avoid invalid conversions #3259
- 🛠 Fix error in randu when compiling against clang 14 #3333
- 🛠 Fix bug in OpenCL linear algebra functions #3398
- 🛠 Fix bug with thread local variables when device was changed #3420 #3421
- 🛠 Fix bug in qr related to uninitialized memory #3422
- 🛠 Fix bug in shift where the array had an empty middle dimension #3488
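The dim_t constructor change (#3259) relies on the standard C++ explicit keyword. The sketch below shows the general effect with two hypothetical structs, ImplicitSized and ExplicitSized. They are stand-ins for illustration, not ArrayFire classes, and the dim_t typedef here is a local stand-in for the library's type.

```cpp
#include <type_traits>

typedef long long dim_t;  // local stand-in for ArrayFire's dim_t

// Hypothetical type with a converting constructor: a bare number can
// silently turn into an object wherever one is expected.
struct ImplicitSized {
    ImplicitSized(dim_t n) : size(n) {}
    dim_t size;
};

// Hypothetical type with an explicit constructor: callers must spell out
// the construction, so accidental conversions no longer compile.
struct ExplicitSized {
    explicit ExplicitSized(dim_t n) : size(n) {}
    dim_t size;
};

dim_t size_of(const ImplicitSized& s) { return s.size; }
dim_t size_of_explicit(const ExplicitSized& s) { return s.size; }

// Compile-time checks of the behaviour described above.
static_assert(std::is_convertible<dim_t, ImplicitSized>::value,
              "a number converts implicitly without 'explicit'");
static_assert(!std::is_convertible<dim_t, ExplicitSized>::value,
              "'explicit' blocks the implicit conversion");
static_assert(std::is_constructible<ExplicitSized, dim_t>::value,
              "direct construction still works");
```

With these definitions, size_of(5) compiles because 5 converts silently, whereas size_of_explicit(5) is rejected by the compiler and has to be written as size_of_explicit(ExplicitSized(5)).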
Contributions
Special thanks to our contributors:
Willy Born
Mike Mullen