Arrays, presumably numerical, in julia.
Julia arrays are 1-indexed, right?
They have arbitrary indices.
is needed to support such (recently introduced) arbitrary indexing which is not reflected in all documentation online.
Best practice is not to count from
If you just need to visit each element, iterate using eachindex.
If you want to iterate by axes, instead of:
function mycopy!(dest::AbstractVector, src::AbstractVector) length(dest) == length(src) || throw( DimensionMismatch("vectors must match")) # OK, now we’re safe to use @inbounds, right? (not anymore!) for i = 1:length(src) @inbounds dest[i] = src[i] end dest end
function mycopy!(dest::AbstractVector, src::AbstractVector) axes(dest) == axes(src) || throw( DimensionMismatch("vectors must match")) for i in LinearIndices(src) @inbounds dest[i] = src[i] end dest end
Looking for the
stack operation, where you combine, e.g. 2D arrays into 3d arrays?
Use cat on a non-existent dimension, e.g.
There is an under-development keyword is the under-development that requires no
Generic multidimensional indexing
If you want to avoid macros but you can’t work out how to get your broadcasts to line up, there is the CartesianIndices system, which is compact, general but not especially clear. Here is a surprisingly simple (sic) one-pole filter implementation
function expfiltdim(x, dim::Integer, α) s = similar(x) Rpre = CartesianIndices(size(x)[1:dim-1]) Rpost = CartesianIndices(size(x)[dim+1:end]) _expfilt!(s, x, α, Rpre, size(x, dim), Rpost) end function _expfilt!(s, x, α, Rpre, n, Rpost) for Ipost in Rpost # Initialize the first value along the filtered dimension for Ipre in Rpre s[Ipre, 1, Ipost] = x[Ipre, 1, Ipost] end # Handle all other entries for i = 2:n for Ipre in Rpre s[Ipre, i, Ipost] = α*x[Ipre, i, Ipost] + (1-α)*s[Ipre, i-1, Ipost] end end end s end
You can avoid any of these for a couple of common cases. EllipsisNotation provides compact intuitive syntactic sugar if you wish to get at the first or last axis.
A[..,1] = [2 1 4 5 2 2 3 6] A[..,2] = [3 2 6 5 3 2 6 6] A[:,:,1] == [2 1 4 5 2 2 3 6] #true
There is also AxisAlgorithms, which does batch matrix multiplications and inversions for a couple of handy cases.
Arrays are column-major
The first index should change fastest. That is, walk over the last index in your outer loop, and apply operations to slices comprising the other indices.
Multidimensional arrays in Julia are stored in column-major order. This means that arrays are stacked one column at a time. This can be verified using the vec function or the syntax
copy_col_rowis much faster than
copy_row_colbecause it follows our rule of thumb that the first element to appear in a slice expression should be coupled with the inner-most loop. This is expected because copy_cols respects the column-based memory layout of the Matrix and fills it one column at a time. Additionally,… it follows our rule of thumb that the first element to appear in a slice expression should be coupled with the inner-most loop.
Einstein summation is a handy way of thinking about lots of tensor operations which is not as esoteric as it sounds, including, e.g. batch matrix multiply. TensorOperations.jl does some optimised einstein summations and is GPU-compatible. Einsum.jl does reputedly more gneral Einstein-type summations but with possibly less optimisation? mcabbott/Tullio.jl: ⅀ is the newest entrant:
Used by itself the macro writes ordinary nested loops much like
Einsum.@einsum. One difference is that it can parse more expressions (such as convolution , and worse). Another is that it will use multi-threading (via
Threads.@spawn) and recursive tiling, on large enough arrays. But it also co-operates with various other packages, provided they are loaded before the macro is called
One of the best scientific computing packages is BandedMatrices.jl. It lets you directly define banded matrices and then overloads things like
\to use the right parts of BLAS. Compared to just using a sparse matrix (the standard MATLAB/Python way), this is SCREAMING FAST (the QR factorization difference is pretty big). Banded matrices show up everywhere in scientific computing (pretty much every PDE has a banded matrix operator). If it’s not a banded matrix, then you probably have a block-banded matrix, and thank god that @dlfivefifty also wrote BlockBandedMatrices.jl which will utilize the same speed tricks but expanded to block banded matrices (so, discretizations of systems of PDEs).
Shunting to GPUs
Coding on GPUs for Julia is, at least in principle, easy. Thus you can possibly glean some nearly-free acceleration by using, for example, CuArrays to push computation onto GPUs. Simon Danisch has written the simplest examples of GPUs usage in julia. Some thought in how you structure algorithms is needed. AFAICT the hippest interface is the vendor-neutral GPUArrays which, e.g. (I think) for CUDA will transparently use CUArrays and CUDANative, for e.g. an NVIDIA GPU. You can also invoke them in a vendor-specific way, which looks easy. There is even activist support for google cloud TPU compilation. How well this vectorises in practice is not clear to me yet.
Ground zero for this appeas to be JuliaGPU. I am not up to speed on what is happening here at the moment.
a.k.a. broadcasting wizardry.
There are a couple of issues here; dot syntax in julia provides a means to both compactly express vectorized operations in julia and also to do smart broadcasting and operation fusing (which compiles functions down to vectorised kernels collapsing intermediate operations.) The details are non-obvious, but you can reportedly often ignore them and just use the @. macro to magically invoke it. In most cases it works transparently on your own functions too.
Note the manual says:
In Julia, vectorized functions are not required for performance, and indeed it is often beneficial to write your own loops (see Performance Tips), but they can still be convenient.
This is directly contradicted shortly after by the same manual and also the ephemeral julia discussion on the net. As some recent packages note you probably do want to use this syntax for vector operations because it works on GPUs, whereas using basic iteration syntax might fail to parallelize to GPUs.
In practice some thought is required to get this working; One might need reshape to get the dimensions lined up, or more complicated indexing. Pankaj Mishra’s note includes an example of how you would do complicated indexing and mutation in parallel over sparse matrices using getindex and setindex!.
This is not a huge problem; vectorised syntax is not so opaque.
But what if I want to do custom broadcasting because, e.g. I can do some operation over a grid faster using an FFT, so the FFT must be taken when I broadcast? How do I define custom broadcasting for this kind of case for some type in julia?
I’ll try to work this out from the moving parts in the ApproxFun system. 🏗
This library provides the @avx macro, which may be used to prefix a for loop or broadcast statement. It then tries to vectorize the loop to improve runtime performance.
The macro assumes that loop iterations can be reordered. It also currently supports simple nested loops, where loop bounds of inner loops are constant across iterations of the outer loop, and only a single loop at each level of noop lest. These limitations should be removed in a future version.