Changelog History
-
v3.7.0 Changes
February 13, 2020 v3.7.0
⚡️ Major Updates
- ➕ Added the ability to customize the memory manager (thanks jacobkahn and flashlight) [#2461]
- ➕ Added 16-bit floating point support for several functions [#2413] [#2587] [#2585] [#2583]
- ➕ Added sumByKey, productByKey, minByKey, maxByKey, allTrueByKey, anyTrueByKey, countByKey [#2254]
- ➕ Added confidence connected components [#2748]
- ➕ Added neural network based convolution and gradient functions [#2359]
- ➕ Added a padding function [#2682]
- ➕ Added pinverse for pseudo inverse [#2279]
- ➕ Added support for uniform ranges in approx1 and approx2 functions. [#2297]
- ➕ Added support to write to preallocated arrays for some functions [#2599] [#2481] [#2328] [#2327]
- ➕ Added meanvar function [#2258]
- ➕ Added support for sparse-sparse arithmetic [#2312]
- ➕ Added rsqrt function for reciprocal square root [#2500]
- ➕ Added a lower level af_gemm function for general matrix multiplication [#2481]
- ➕ Added a function to set the cuBLAS math mode for the CUDA backend [#2584]
- Separate debug symbols into separate files [#2535]
- 🖨 Print stacktraces on errors [#2632]
- 👌 Support move constructor for af::array [#2595]
- 🔦 Expose events in the public API [#2461]
- ➕ Add setAxesLabelFormat to format labels on graphs [#2495]
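The by-key reductions listed above (sumByKey and its siblings) reduce values in groups selected by a keys array. The following is a minimal plain C++ sketch of that idea, not ArrayFire's implementation: it assumes that each run of consecutive equal keys forms one group, and the function name `sum_by_key` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative "sum by key" reduction on host vectors.
// Assumption: consecutive elements sharing a key form one group; a key that
// reappears later starts a new group.
// Returns (keys_out, sums_out) with one entry per run of equal keys.
std::pair<std::vector<int>, std::vector<float>>
sum_by_key(const std::vector<int>& keys, const std::vector<float>& vals) {
    assert(keys.size() == vals.size());
    std::vector<int> keys_out;
    std::vector<float> sums_out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            // A new run starts here: open a new group.
            keys_out.push_back(keys[i]);
            sums_out.push_back(vals[i]);
        } else {
            // Same key as the previous element: accumulate into the open group.
            sums_out.back() += vals[i];
        }
    }
    return {keys_out, sums_out};
}
```

Swapping the accumulation step for a product, minimum, maximum, or count gives the analogous sketch for the other by-key variants.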
👌 Improvements
- 👍 Better error messages for systems with driver or device incompatibilities [#2678] [#2448][#2761]
- ⚡️ Optimized unified backend function calls [#2695]
- ⚡️ Optimized anisotropic smoothing [#2713]
- ⚡️ Optimized canny filter for CUDA and OpenCL [#2727]
- 👍 Better MKL search script [#2738][#2743][#2745]
- 👍 Better logging of different submodules in ArrayFire [#2670] [#2669]
- 👌 Improve documentation [#2665] [#2620] [#2615] [#2639] [#2628] [#2633] [#2622] [#2617] [#2558] [#2326][#2515]
- ⚡️ Optimized af::array assignment [#2575]
- ⚡️ Update the k-means example to display the result [#2521]
🛠 Fixes
- 🛠 Fix multi-config generators [#2736]
- 🛠 Fix access errors in canny [#2727]
- 🛠 Fix segfault in the unified backend if no backends are available [#2720]
- 🛠 Fix access errors in scan-by-key [#2693]
- 🛠 Fix sobel operator [#2600]
- 🛠 Fix an issue with the random number generator and s16 [#2587]
- 🛠 Fix issue with boolean product reduction [#2544]
- 🛠 Fix array_proxy move constructor [#2537]
- 🛠 Fix convolve3 launch configuration [#2519]
- 🛠 Fix an issue where the fft function modified the input array [#2520]
- ➕ Added a workaround for the nvidia-opencl runtime if forge dependencies are missing [#2761]
Contributions
Special thanks to our contributors:
@jacobkahn
@WilliamTambellini
@lehins
@r-barnes
@gaika
@ShalokShalom
-
v3.6.4 Changes
May 20, 2019 v3.6.4
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.4.tar.bz2
🛠 Fixes
-
v3.6.3 Changes
April 22, 2019 v3.6.3
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.3.tar.bz2
👌 Improvements
- Graphics are now a runtime dependency instead of a link time dependency #2365
- ⬇️ Reduce the CUDA backend binary size using runtime compilation of kernels #2437
- Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
- Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
- void* pointers are now allowed as arguments to af::array::write() #2367
- Slightly improve the efficiency of JITed tile operations #2472
- 👉 Made random number generation on the CPU backend consistent with CUDA and OpenCL #2435
- 🖐 Handled very large JIT tree generations #2484 #2487
🐛 Bug Fixes
- 🛠 Fixed af::array::array_proxy move assignment operator #2479
- 🛠 Fixed input array dimensions validation in svdInplace() #2331
- 🛠 Fixed the typedef declaration for window resource handle #2357.
- Increase compatibility with GCC 8 #2379
- 🛠 Fixed af::write tests #2380
- 🛠 Fixed a bug in broadcast step of 1D exclusive scan #2366
- 🛠 Fixed OpenGL related build errors on OSX #2382
- 🛠 Fixed multiple array evaluation. Performance improvement. #2384
- 🛠 Fixed buffer overflow and expected output of kNN SSD small test #2445
- 🛠 Fixed MKL linking order to enable threaded BLAS #2444
- ➕ Added validations for forge module plugin availability before calling resource cleanup #2443
- Improve compatibility on MSVC toolchain(_MSC_VER > 1914) with the CUDA backend #2443
- 🛠 Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
- 🛠 Fix errors on exits when using the cuda backend with unified #2470
📚 Documentation
- 🛠 Updated svdInplace() documentation following a bugfix #2331
- 🛠 Fixed a typo in matrix multiplication documentation #2358
- 🛠 Fixed a code snippet demonstrating C-API use #2406
- ⚡️ Updated hamming matcher implementation limitation #2434
- ➕ Added illustration for the rotate function #2453
Misc
- 👉 Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
- Display a more informative error message if CUDA driver is incompatible #2421 #2448
- 🔄 Changed forge resource management to use smart pointers #2452
- 🗄 Deprecated intl and uintl typedefs in API #2360
- 🏗 Enabled graphics by default for all builds starting with v3.6.3 #2365
- 🛠 Fixed several warnings #2344 #2356 #2361
- 🔨 Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
- 🔨 Refactored void* memory allocations to use unsigned char type #2459
- 🗄 Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
- 🛠 Reorganized and fixed some internal backend API #2356
- ⚡️ Updated compilation order of CUDA files to speed up compile time #2368
- ✂ Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
- Marked graphics dependencies as optional in CPack RPM config #2365
- 🔨 Refactored a sparse arithmetic backend API #2379
- Fixed const correctness of af_device_array API #2396
- ⚡️ Update Forge to v1.0.4 #2466
- Manage Forge resources from the DeviceManager class #2381
- 🛠 Fixed non-mkl & non-batch blas upstream call arguments #2401
- 🔗 Link MKL with OpenMP instead of TBB by default
- 👉 Use clang-format to format source code
Contributions
Special thanks to our contributors:
Alessandro Bessi
zhihaoy
Jacob Kahn
William Tambellini
-
v3.6.2 Changes
November 29, 2018 v3.6.2
The source code with sub-modules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.2.tar.bz2
🔋 Features
- 👍 Batching support for cond argument in select() [#2243]
- Broadcast batching for matmul [#2315]
- ➕ Add support for multiple nearest neighbours from nearestNeighbour() [#2280]
👌 Improvements
- 🐎 Performance improvements in morph() [#2238]
- 🛠 Fix linking errors when compiling without Freeimage/Graphics [#2248]
- 🛠 Fixes to improve the usage of ArrayFire as a sub-project [#2290]
- 👍 Allow custom library path for loading dynamic backend libraries [#2302]
🐛 Bug fixes
- 🛠 Fix overflow in dim4::ndims. [#2289]
- ✂ Remove setDevice from af::array destructor [#2319]
- 🛠 Fix pow precision for integral types [#2305]
- 🛠 Fix issues with tile with a large repeat dimension [#2307]
- Fix grid based indexing calculation in af_draw_hist [#2230]
- 🛠 Fix bug when using an af::array for indexing [#2311]
- 🛠 Fix CLBlast errors on exit on Windows [#2222]
📚 Documentation
- 👌 Improve unwrap documentation [#2301]
- 👌 Improve wrap documentation [#2320]
- 🛠 Fix and improve accum documentation [#2298]
- 👌 Improve tile documentation [#2293]
- 📚 Clarify approx* indexing in documentation [#2287]
- 📚 Update examples of select in detailed documentation [#2277]
- ⚡️ Update lookup examples [#2288]
- 📚 Update set documentation [#2299]
Misc
- 🆕 New ArrayFire ASSERT utility functions [#2249][#2256][#2257][#2263]
- 👌 Improve error messages in JIT [#2309]
- af* library and dependencies directory changed to lib64 [#2186]
Contributions
Thank you to our contributors:
Jacob Kahn
Vardan Akopian
-
v3.6.1 Changes
July 06, 2018 v3.6.1
🚀 The source code for this release can be downloaded here:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.1.tar.bz2
👌 Improvements
- FreeImage is now a run-time dependency [#2164]
- ⬇️ Reduced binary size by setting the symbol visibility to hidden [#2168]
- ➕ Add logging to memory manager and unified loader using the AF_TRACE environment variable [#2169][#2216]
- 👌 Improved CPU Anisotropic Diffusion performance [#2174]
- Perform normalization after FFT for improved accuracy [#2185, #2192]
- ⚡️ Updated CLBlast to v1.4.0 [#2178]
- ➕ Added additional validation when using af::seq for indexing [#2153]
- 👍 Perform checks for unsupported cards by the CUDA implementation [#2182]
- Avoid selecting backend if no devices are found. [#2218]
🐛 Bug Fixes
- 🛠 Fixed region when all pixels were the foreground or background [#2152]
- 🛠 Fixed several memory leaks [#2202, #2201, #2180, #2179, #2177, #2175]
- 🛠 Fixed bug in setDevice which didn't allow you to select the last device [#2189]
- 🛠 Fixed bug in min/max where the first element of the array was a NaN value [#2155]
- 🛠 Fixed graphics window indexing [#2207]
- 🛠 Fixed renaming issue when installing cuda libraries on OSX [#2221]
- 🛠 Fixed NSIS installer PATH variable [#2223]
-
v3.6.0 Changes
May 04, 2018 v3.6.0
The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.6.0.tar.bz2
⚡️ Major Updates
- Added the topk() function. 1
- Added batched matrix multiply support. 2 3
- Added anisotropic diffusion, anisotropicDiffusion(). Documentation 3.
🔋 Features
- Added support for batched matrix multiply. 1 2
- New anisotropic diffusion function, anisotropicDiffusion(). Documentation 3.
- New topk() function, which returns the top k elements along a given dimension of the input. Documentation. 4
- 🖨 New gradient diffusion example.
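To illustrate what a top-k selection computes, here is a minimal plain C++ sketch on a one-dimensional host vector. It is not ArrayFire's implementation or API: the function name `top_k` is hypothetical, the sketch always selects the largest values in descending order, and it ignores the dimension argument the release note mentions.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the k largest elements of `values`, in descending order.
// If k exceeds the input size, all elements are returned (sorted).
std::vector<float> top_k(std::vector<float> values, std::size_t k) {
    k = std::min(k, values.size());
    const auto middle =
        values.begin() + static_cast<std::vector<float>::difference_type>(k);
    // partial_sort places the k largest elements (under std::greater)
    // in sorted order at the front, without fully sorting the rest.
    std::partial_sort(values.begin(), middle, values.end(), std::greater<float>());
    values.resize(k);
    return values;
}
```

A partial sort is used because only the first k positions need to be ordered, which is cheaper than a full sort when k is small relative to the input.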
👌 Improvements
- JITed select() and shift() functions for CUDA and OpenCL backends. 1
- Significant CMake improvements. 2 3 4
- 👌 Improved the quality of the random number generator 5
- ✅ Corrected assert function calls in select() tests. 5
- Modified af_colormap struct to match forge's definition. 6
- 👌 Improved Black Scholes example. 7
- 🚀 Installers are generated with CPack beginning with this release. 8
- Refactored black_scholes_options example to use built-in af::erfc function for cumulative normal distribution. 9
- ⬇️ Reduced the scope of mutexes in memory manager 10
- Official installers do not require the CUDA toolkit to be installed starting with v3.6.0.
🐛 Bug fixes
- ⚠ Fixed shfl_down() warnings with CUDA 9. 1
- Disabled CUDA JIT debug flags on ARM architecture. 2
- 🛠 Fixed CLBlast install lib dir for Linux platform where lib directory has arch(64) suffix. 3
- 🛠 Fixed assert condition in 3d morph opencl kernel. 4
- 🛠 Fixed JIT errors with large non-linear kernels 5
- 🛠 Fixed bug in CPU JIT after moddims was called 5
- 🛠 Fixed a deadlock scenario caused by the method MemoryManager::nativeFree 6
📚 Documentation
- 🛠 Fixed variable name typo in vectorization.md. 1
- Fixed AF_API_VERSION value in Doxygen config file. 2
Known issues
- 👍 NVCC does not currently support platform toolset v141 (Visual Studio 2017 R15.6). Use the v140 platform toolset instead. You may pass the toolset version to CMake via the -T flag, like so: cmake -G "Visual Studio 15 2017 Win64" -T v140.
  - To download and install other platform toolsets, visit https://blogs.msdn.microsoft.com/vcblog/2017/11/15/side-by-side-minor-version-msvc-toolsets-in-visual-studio-2017
- ✅ Several OpenCL tests failing on OSX:
canny_opencl, fft_opencl, gen_assign_opencl, homography_opencl, reduce_opencl, scan_by_key_opencl, solve_dense_opencl, sparse_arith_opencl, sparse_convert_opencl, where_opencl
Contributions
Special thanks to our contributors:
Adrien F. Vincent, Cedric Nugteren, Felix, Filip Matzner, HoneyPatouceul, Patrick Lavin, Ralf Stubner, William Tambellini
-
v3.5.1 Changes
September 19, 2017 v3.5.1
The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.1.tar.bz2
Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)
👌 Improvements
- 😌 Relaxed af::unwrap() function's arguments. 1
- 🔄 Changed behavior of af::array::allocated() to specify memory allocated. 1
- ✂ Removed restriction on the number of bins for af::histogram() on CUDA and OpenCL kernels. 1
🐎 Performance
- 👌 Improved JIT performance. 1
- 👌 Improved CPU element-wise operation performance. 1
- 👌 Improved regions performance using texture objects. 1
🐛 Bug fixes
- 🛠 Fixed overflow issues in mean. 1
- 🛠 Fixed memory leak when chaining indexing operations. 1
- 🛠 Fixed bug in array assignment when using an empty array to index. 1
- 🛠 Fixed bug with af::matmul() which occurred when its RHS argument was an indexed vector. 1
- 🛠 Fixed deadlock bug when sparse array was used with a JIT Array. 1
- 🛠 Fixed pixel tests for FAST kernels. 1
- 🛠 Fixed af::replace so that it is now copy-on-write. 1
- 🛠 Fixed launch configuration issues in CUDA JIT. 1
- 🛠 Fixed segfaults and "Pure Virtual Call" error warnings when exiting on Windows. 1 2
- ↪ Workaround for clEnqueueReadBuffer bug on OSX. 1
🏗 Build
- Fixed issues when compiling with GCC 7.1. 1 2
- Eliminated unnecessary Boost dependency from CPU and CUDA backends. 1
Misc
- ⚡️ Updated support links to point to Slack instead of Gitter. 1
-
v3.5.0 Changes
June 23, 2017 v3.5.0
The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.5.0.tar.bz2
Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)
⚡️ Major Updates
- 👍 ArrayFire now supports threaded applications. 1
- ➕ Added Canny edge detector. 1
- ➕ Added Sparse-Dense arithmetic operations. 1
🔋 Features
- ArrayFire Threading
  - af::array can be read by multiple threads
  - All ArrayFire functions can be executed concurrently by multiple threads
  - Threads can operate on different devices to simplify multi-device workloads
- 🆕 New Canny edge detector function, af::canny(). 1
  - Can automatically calculate high threshold with AF_CANNY_THRESHOLD_AUTO_OTSU
  - Supports both L1 and L2 Norms to calculate gradients
- 🆕 New tuned OpenCL BLAS backend, CLBlast.
👌 Improvements
- 📄 Converted CUDA JIT to use NVRTC instead of NVVM.
- 🐎 Performance improvements in af::reorder(). 1
- 🐎 Performance improvements in array::scalar(). 1
- 👌 Improved unified backend performance. 1
- ArrayFire now depends on Forge v1.0. 1
- Can now specify the FFT plan cache size using the af::setFFTPlanCacheSize() function.
- Get the number of physical bytes allocated by the memory manager af_get_allocated_bytes(). 1
- af::dot() can now return a scalar value to the host. 1
🐛 Bug Fixes
- 🛠 Fixed improper release of default Mersenne random engine. 1
- 🛠 Fixed af::randu() and af::randn() ranges for floating point types. 1
- 🛠 Fixed assignment bug in CPU backend. 1
- 🛠 Fixed complex (c32, c64) multiplication in OpenCL convolution kernels. 1
- Fixed inconsistent behavior with af::replace() and replace_scalar(). 1
- Fixed memory leak in af_fir(). 1
- 📜 Fixed memory leaks in af_cast for sparse arrays. 1
- Fixed correctness of af_pow for complex numbers by using Cartesian form. 1
- Corrected af::select() with indexing in CUDA and OpenCL backends. 1
- ↪ Workaround for VS2015 compiler ternary bug. 1
- 🛠 Fixed memory corruption in cuda::findPlan(). 1
- Argument checks in af_create_sparse_array avoid inputs of type int64. 1
🏗 Build fixes
- On OSX, utilize new GLFW package from the brew package manager. 1 2
- 🛠 Fixed CUDA PTX names generated by CMake v3.7. 1
- 👌 Support gcc > 5.x for CUDA. 1
Examples
- 🆕 New genetic algorithm example. 1
📚 Documentation
- ⚡️ Updated README.md to improve readability and formatting. 1
- ⚡️ Updated README.md to mention Julia and Nim wrappers. 1
- 👌 Improved installation instructions in docs/pages/install.md. 1
Miscellaneous
- 👍 A few improvements for ROCm support. 1
- ✂ Removed CUDA 6.5 support. 1
Known issues
- 🏁 Windows
  - The Windows NVIDIA driver version 37x.xx contains a bug which causes fftconvolve_opencl to fail. Upgrade or downgrade to a different version of the driver to avoid this failure.
  - The following tests fail on Windows with NVIDIA hardware: threading_cuda, qr_dense_opencl, solve_dense_opencl.
- 🍎 macOS
  - The Accelerate framework, used by the CPU backend on macOS, leverages Intel graphics cards (Iris) when there are no discrete GPUs available. This OpenCL implementation is known to give incorrect results on the following tests: lu_dense_{cpu,opencl}, solve_dense_{cpu,opencl}, inverse_dense_{cpu,opencl}.
  - Certain tests intermittently fail on macOS with NVIDIA GPUs apparently due to inconsistent driver behavior: fft_large_cuda and svd_dense_cuda.
  - The following tests are currently failing on macOS with AMD GPUs: cholesky_dense_opencl and scan_by_key_opencl.
-
v3.4.2 Changes
December 21, 2016 v3.4.2
The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.2.tar.bz2
Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)
🗄 Deprecation Announcement
This release supports CUDA 6.5 and higher. The next ArrayFire release will support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no longer supporting CUDA 6.5 include:
- CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which is used by ArrayFire's CPU and OpenCL backends.
- Very few ArrayFire users still use CUDA 6.5.
As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to have full capability with ArrayFire.
🐳 Docker
👌 Improvements
- Implemented sparse storage format conversions between AF_STORAGE_CSR and AF_STORAGE_COO. 1
  - Directly convert between AF_STORAGE_COO <--> AF_STORAGE_CSR using the af::sparseConvertTo() function.
  - af::sparseConvertTo() now also supports converting to dense.
- 📜 Added cast support for sparse arrays. 1
  - Casting only changes the values array and the type. The row and column index arrays are not changed.
- Reintroduced automated computation of chart axes limits for graphics functions. 1
  - The axes limits will always be the minimum/maximum of the current and new limit.
  - The user can still set limits from API calls. If the user sets a limit from the API call, then the automatic limit setting will be disabled.
- Using boost::scoped_array instead of boost::scoped_ptr when managing array resources. 1
- 🐎 Internal performance improvements to getInfo() by using const references to avoid unnecessary copying of ArrayInfo objects. 1
- ➕ Added support for scalar af::array inputs for af::convolve() and set functions. 1 2 3
- 🐎 Performance fixes in af::fftConvolve() kernels. 1 2
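The automatic axes-limit behavior described in the Improvements list above (limits only grow to cover new data, and an explicit user setting disables automatic updates) can be sketched as follows. This is an illustrative plain C++ model, not ArrayFire or Forge code; the type `AxisLimits` and its member names are hypothetical.

```cpp
#include <algorithm>

// Hypothetical model of one chart axis.
struct AxisLimits {
    float lo;
    float hi;
    bool user_set;  // true once limits were set explicitly through the API

    AxisLimits(float l, float h) : lo(l), hi(h), user_set(false) {}

    // Explicit limits win and switch off automatic updates.
    void set_by_user(float new_lo, float new_hi) {
        lo = new_lo;
        hi = new_hi;
        user_set = true;
    }

    // Automatic update: take the minimum/maximum of the current limits
    // and the range of the newly plotted data.
    void update_from_data(float data_lo, float data_hi) {
        if (user_set) return;
        lo = std::min(lo, data_lo);
        hi = std::max(hi, data_hi);
    }
};
```

Under this model the visible range never shrinks automatically, so earlier plots stay in view when new data arrives.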
🏗 Build
- 👌 Support for Visual Studio 2015 compilation. 1 2
- 🛠 Fixed FindCBLAS.cmake when PkgConfig is used. 1
🐛 Bug fixes
- 🛠 Fixes to JIT when tree is large. 1 2
- 🛠 Fixed indexing bug when converting dense to sparse af::array as AF_STORAGE_COO. 1
- 🛠 Fixed af::bilateral() OpenCL kernel compilation on OS X. 1
- 🛠 Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr(). 1 2 3
Installers
- 🛠 Major OS X installer fixes. 1
  - Fixed installation scripts.
  - Fixed installation symlinks for libraries.
- 🏁 Windows installer now ships with more pre-built examples.
Examples
- ➕ Added af::choleskyInPlace() calls to cholesky.cpp example. 1
📚 Documentation
- ➕ Added u8 as supported data type in getting_started.md. 1
- 🛠 Fixed typos. 1
CUDA 8 on OSX
- 👍 CUDA 8.0.55 supports Xcode 8. 1
Known Issues
- Known failures with CUDA 6.5. These include all functions that use sorting. As a result, sparse storage format conversion between AF_STORAGE_COO and AF_STORAGE_CSR has been disabled for CUDA 6.5.
-
v3.4.1 Changes
October 15, 2016 v3.4.1
The source code with submodules can be downloaded directly from the following link:
http://arrayfire.com/arrayfire_source/arrayfire-full-3.4.1.tar.bz2
Installer CUDA Version: 8.0 (Required)
Installer OpenCL Version: 1.2 (Minimum)
Installers
- 🐧 Installers for Linux, OS X and Windows
  - CUDA backend now uses CUDA 8.0.
  - Uses Intel MKL 2017.
  - CUDA Compute 2.x (Fermi) is no longer compiled into the library.
- Installer for OS X
  - The libraries shipping in the OS X Installer are now compiled with Apple Clang v7.3.1 (previously v6.1.0).
  - The OS X version used is 10.11.6 (previously 10.10.5).
- Installer for Jetson TX1 / Tegra X1
  - Requires JetPack for L4T 2.3 (containing Linux for Tegra r24.2 for TX1).
  - CUDA backend now uses CUDA 8.0 64-bit.
  - Using CUDA's cusolver instead of CPU fallback.
  - Uses OpenBLAS for CPU BLAS.
  - All ArrayFire libraries are now 64-bit.
👌 Improvements
- ➕ Add sparse array support to af::eval(). 1
- ➕ Add OpenCL-CPU fallback support for sparse af::matmul() when running on a unified memory device. Uses MKL Sparse BLAS.
- When using CUDA libdevice, pick the correct compute version based on device. 1
- 👍 OpenCL FFT now also supports prime factors 7, 11 and 13. 1 2
🐛 Bug Fixes
- 👍 Allow CUDA libdevice to be detected from custom directory.
- 🛠 Fix aarch64 detection on Jetson TX1 64-bit OS. 1
- Add missing definition of af_set_fft_plan_cache_size in unified backend. 1
- 🛠 Fix initial values for af::min() and af::max() operations. 1 2
- 🛠 Fix distance calculation in af::nearestNeighbour for CUDA and OpenCL backend. 1 2
- 🛠 Fix OpenCL bug where scalars were passed incorrectly to compile options. 1
- 🛠 Fix bug in af::Window::surface() with respect to dimensions and ranges. 1
- Fix possible double free corruption in af_assign_seq(). 1
- ➕ Add missing eval for key in af::scanByKey in CPU backend. 1
- Fixed creation of sparse values array using AF_STORAGE_COO. 1 1
Examples
- ➕ Add a Conjugate Gradient solver example to demonstrate sparse and dense matrix operations. 1
CUDA Backend
- When using CUDA 8.0, compute 2.x is no longer in the default compute list.
  - This follows CUDA 8.0 deprecating computes 2.x.
  - Default computes for CUDA 8.0 will be 30, 50, 60.
- 0️⃣ When using CUDA pre-8.0, the default selection remains 20, 30, 50.
- 0️⃣ CUDA backend now uses -arch=sm_30 for PTX compilation as default.
  - Unless compute 2.0 is enabled.
Known Issues
- af::lu() on CPU is known to give incorrect results when built and run on OS X 10.11 or 10.12 and compiled with Accelerate Framework. 1
  - Since the OS X Installer libraries use MKL rather than Accelerate Framework, this issue does not affect those libraries.