The code snippets below are available in allCode.jl. They've been tested under Julia 1.10.4. Throughout the post, angle brackets denote placeholders (e.g., <function> or <operator>). This is just notation, and the symbols < and > should not be misconstrued as Julia's syntax.
Unit | Acronym | Measure in Seconds |
---|---|---|
Seconds | s | 1 |
Milliseconds | ms | 10⁻³ |
Microseconds | μs | 10⁻⁶ |
Nanoseconds | ns | 10⁻⁹ |
Vector creation can be a performance bottleneck, given the memory-allocation overhead associated with it. The issue arises not only when defining new vectors, but also when we reference a slice like x[1:2], or compute intermediate results on the fly like x .* y in sum(x .* y).
This section introduces one way to address this limitation of the built-in vectors. It's based on so-called static vectors, which are made available by the StaticArrays package. Unlike the built-in vectors, which are allocated on the heap, static vectors are allocated on the stack and are essentially tuples under the hood. This means they're only suitable for collections comprising a few elements. As a rule of thumb, consider using static vectors for up to 75 elements. Beyond this threshold, the overhead associated with their creation and access will outweigh any potential performance gain, or may even result in a fatal error. [note] My recommended number of elements is actually lower than the documentation's suggestion of 100 elements. The reason for this discrepancy is that, as you approach the upper limit, the performance benefits of static vectors relative to regular vectors decrease significantly. As a result, you may find yourself spending more time benchmarking performance than reaping any speed advantages.
Despite being built on top of tuples, static vectors offer several additional advantages. Firstly, they maintain their performance benefits even with larger collections. Secondly, static vectors are more convenient to work with, as they can be manipulated like regular vectors. In fact, the package also supports other static array types like matrices, explaining its name. Finally, StaticArrays provides mutable variants, making them less rigid than tuples, which only exist as immutable objects. In this respect, it's worth noting that, while the mutable version provides performance benefits relative to regular vectors, the immutable option still offers the best performance.
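As a brief illustration of this convenience, the sketch below (assuming StaticArrays is loaded) manipulates an SVector with the same syntax used for regular vectors:

```julia
using StaticArrays

sv = SVector(1.0, 2.0, 3.0)

sv[2]             # indexing works as usual
y = 2 .* sv .+ 1  # broadcasting returns another SVector
sum(sv)           # reductions work too
```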
The examples below assume that the StaticArrays package is already installed, which demands including the line using StaticArrays at the beginning of each session.
The package StaticArrays includes several variants of static vectors. Our focus will mainly be on the type SVector, whose instances we'll refer to as SVectors.
There are several approaches to creating an SVector, each serving a distinct purpose. We present them below. The first creates an SVector from literal values, while the second converts a standard vector into an SVector.
# all 'sx' define the same static vector '[3,4,5]'
sx = SVector(3,4,5)
sx = SVector{3, Int64}(3,4,5)
sx = SA[3,4,5]
sx = @SVector [i for i in 3:5]
sx
3-element SVector{3, Int64} with indices SOneTo(3):
3
4
5
# all 'sx' define a static vector with same elements as 'x'
x = collect(1:10)
sx = SVector(x...)
sx = SVector{length(x), eltype(x)}(x)
sx = SA[x...]
sx = @SVector [a for a in x]
sx
10-element SVector{10, Int64} with indices SOneTo(10):
1
2
3
⋮
9
10
Of these approaches, we'll mainly rely on the function SVector, occasionally employing SA for indexing. [note] The approach based on the macro @SVector comes with some caveats. For instance, it doesn't support specifications based on local variables. This precludes the use of eachindex(x), unless x is a global variable. Note the use of the splatting operator ... to turn a regular vector into an SVector. The operator ... is necessary for the function SVector to transform a collection into a sequence of arguments. [note] For instance, foo(x...) is equivalent to foo(x[1], x[2], x[3]) given a vector or tuple x with 3 elements.
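To make this equivalence concrete, here's a minimal sketch with a hypothetical three-argument function foo:

```julia
foo(a, b, c) = a + b + c

x = [1, 2, 3]
foo(x...)                  # splatting: passes each element as a separate argument
foo(x...) == foo(1, 2, 3)  # true
```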
It's possible to define slices of SVectors, although we must be mindful of how indexing works: a slice remains an SVector when indices are specified as SVectors, whereas the slice becomes an ordinary vector for indices given by ranges or regular vectors. The sole exception to this rule is when the slice references the whole object (i.e., sx[:]
), in which case an SVector is returned.
x = collect(3:10) ; sx = SVector(x...)
# both define static vectors
slice1 = sx[:]
slice2 = sx[SA[1,2]] # or slice2 = sx[SVector(1,2)]
slice1
8-element SVector{8, Int64} with indices SOneTo(8):
3
4
⋮
9
10
slice2
2-element SVector{2, Int64} with indices SOneTo(2):
3
4
x = collect(3:10) ; sx = SVector(x...)
# both define an ordinary vector
slice2 = sx[1:2]
slice2 = sx[[1,2]]
slice2
2-element Vector{Int64}:
3
4
The objects defined in StaticArrays
are built on top of tuples when possible. This entails that SVectors typically don't allocate memory.
x = rand(10)
function foo(x)
a = x[1:2] # 1 allocation (copy of slice)
b = [3,4] # 1 allocation (vector creation)
sum(a) * sum(b) # 0 allocation (scalars don't allocate)
end
@btime foo($x)
29.819 ns (2 allocations: 160 bytes)
x = rand(10)
@views function foo(x)
a = x[1:2] # 0 allocation (view of slice)
b = [3,4] # 1 allocation (vector creation)
sum(a) * sum(b) # 0 allocation (scalars don't allocate)
end
@btime foo($x)
15.015 ns (1 allocation: 80 bytes)
x = rand(10); tup = Tuple(x)
function foo(x)
a = x[1:2] # 0 allocation (slice of tuple)
b = (3,4) # 0 allocation (tuple creation)
sum(a) * sum(b) # 0 allocation (scalars don't allocate)
end
@btime foo($tup)
1.400 ns (0 allocations: 0 bytes)
x = rand(10); sx = SA[x...]
function foo(x)
a = x[SA[1,2]] # 0 allocation (slice of static array)
b = SA[3,4] # 0 allocation (static array creation)
sum(a) * sum(b) # 0 allocation (scalars don't allocate)
end
@btime foo($sx)
1.600 ns (0 allocations: 0 bytes)
The impact of SVectors on memory allocations is especially relevant for operations creating temporary vectors, such as broadcasting.
x = rand(10)
foo(x) = sum(2 .* x)
@btime foo($x)
17.936 ns (1 allocation: 144 bytes)
x = rand(10); sx = SVector(x...)
foo(x) = sum(2 .* x)
@btime foo($sx)
1.800 ns (0 allocations: 0 bytes)
Interestingly, even when an operation doesn't allocate memory with regular vectors, SVectors still provide performance benefits. To illustrate this, consider the function sum(f, <vector>), which sums the elements of <vector> after they're transformed via f. The benchmarks below show that the operation is faster with SVectors, despite involving no memory allocation even with regular vectors.
x = rand(10)
foo(x) = sum(a -> 10 + 2a + 3a^2, x)
@btime foo($x)
4.400 ns (0 allocations: 0 bytes)
x = rand(10); sx = SVector(x...);
foo(x) = sum(a -> 10 + 2a + 3a^2, x)
@btime foo($sx)
2.900 ns (0 allocations: 0 bytes)
In Julia, SVectors are formally defined as objects with type SVector{N,T}, where N specifies the number of elements and T denotes the elements' type. For instance, SVector(4,5,6) has type SVector{3,Int64}, indicating that it comprises 3 elements of type Int64. This implies that the number of elements is part of the SVector type, a feature also present in tuples.
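A quick way to verify that the length is encoded in the type (a minimal sketch, assuming StaticArrays is loaded):

```julia
using StaticArrays

sx = SVector(4, 5, 6)

sx isa SVector{3, Int64}      # true: the length 3 is part of the type
SVector(4, 5) isa SVector{3}  # false: a 2-element SVector has a different type
[4, 5, 6] isa Vector{Int64}   # true, yet the length isn't part of Vector's type
```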
Similar to tuples, the type SVector defines an immutable object, meaning that its elements can't be added, removed, or modified. In case mutable collections are needed, the package StaticArrays provides the type MVector. MVectors and their slices share the same syntax as SVectors, the only difference being that the function SVector is substituted with MVector. This is illustrated in the following.
x = [1,2,3]
sx = SVector(x...)
sx[1] = 0 # throws an error, as SVectors are immutable
x = [1,2,3]
mx = MVector(x...)
mx[1] = 0
mx
3-element MVector{3, Int64} with indices SOneTo(3):
0
2
3
MVectors offer performance benefits over regular vectors, although they're equal to or slower than SVectors. Additionally, MVectors may result in memory allocations. Considering this, if you're certain that the collection won't be mutated, it's recommended to use SVectors—due to their immutable nature, SVectors will perform equally or better than MVectors, but never worse.
To demonstrate this, we present an example where SVectors and MVectors have the same performance and allocations, and another where they differ. The examples also reveal that, regardless of whether an SVector performs better than an MVector, both outperform regular vectors.
x = rand(10)
sx = SVector(x...); mx = MVector(x...)
foo(x) = sum(a -> 10 + 2a + 3a^2, x)
@btime foo($x)
4.400 ns (0 allocations: 0 bytes)
@btime foo($sx)
2.800 ns (0 allocations: 0 bytes)
@btime foo($mx)
2.900 ns (0 allocations: 0 bytes)
x = rand(10)
sx = SVector(x...); mx = MVector(x...)
foo(x) = 10 + 2x + 3x^2
@btime foo.($x)
19.739 ns (1 allocation: 144 bytes)
@btime foo.($sx)
1.600 ns (0 allocations: 0 bytes)
@btime foo.($mx)
6.600 ns (1 allocation: 96 bytes)
The mutability of MVectors makes them ideal for initializing a vector that will eventually be filled via a for-loop. In fact, executing similar(sx)
when sx
is an SVector automatically returns an MVector.
sx = SVector(1,2,3)
mx = similar(sx) # it defines an MVector with undef elements
3-element MVector{3, Int64} with indices SOneTo(3):
2073873450800
-1152921504606846976
0
Next, we consider operations that pre-allocate intermediate outputs in relation to static vectors. The conclusions are twofold. Firstly, if the operation exclusively uses SVectors, there's no need to pre-allocate intermediate results. Secondly, storing the final output as an MVector can entail performance gains relative to a regular vector.
The example we use involves counting the elements in x that are greater than x[i], with the operation computed for each possible index i. Formally, this is captured by sum(x .> x[i]). [note] For simplicity, we use sum to avoid introducing new functions, although it'd be more natural to use count instead. To make the comparison starker, the operations x .> x[i] and sum are written separately.
x = rand(50)
function foo(x; output = similar(x))
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@btime foo($x)
3.188 μs (101 allocations: 5.17 KiB)
x = rand(50)
function foo(x; output = similar(x), temp = similar(x))
for i in eachindex(x)
@. temp = x > x[i]
output[i] = sum(temp)
end
return output
end
@btime foo($x)
695.745 ns (2 allocations: 992 bytes)
x = rand(50); sx = SVector(x...)
function foo(x; output = Vector{Float64}(undef, length(x)))
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@btime foo($sx)
183.661 ns (1 allocation: 496 bytes)
x = rand(50); sx = SVector(x...)
function foo(x; output = similar(x))
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@btime foo($sx)
148.817 ns (1 allocation: 448 bytes)
x = rand(50); sx = SVector(x...)
function foo(x; output = MVector{length(x),eltype(x)}(undef))
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@btime foo($sx)
148.975 ns (1 allocation: 448 bytes)
Note that every implementation requires pre-allocating the vector output
, which leaves us with only one decision to make: whether to pre-allocate the temporary vector temp
. This also explains why every implementation involves at least one memory allocation.
The "No-Preallocation" tab serves as a baseline for comparison. In contrast, the "Pre-allocating" tab reuses a regular vector to compute temp
, while the "SVector" tab converts x
to an SVector sx
without pre-allocating temp
. The benchmarks reveal that the SVector approach outperforms pre-allocation, thanks to the memory allocation avoidance and additional optimization opportunities provided by SVectors.
As for the last two tabs, they continue defining x as an SVector, but additionally treat output as an MVector. One of them does so implicitly, using similar(sx) to initialize output (which returns an MVector), while the other explicitly specifies an MVector. As the benchmarks demonstrate, integrating MVectors into the operation yields further performance gains.
SVectors are defined as objects with a type SVector{N,T}
. Due to this, the type definition requires not only the type T
of its elements, but also the number of elements N
it comprises. This feature, which is shared with MVectors and inherited from tuples, can readily introduce type instabilities if not handled carefully.
A way to avoid this potential issue is to pass any SVector or MVector as a function argument. Alternatively, we can employ the Val technique discussed for tuples, which enables dispatch on the number of elements.
x = rand(50)
function foo(x)
output = MVector{length(x), eltype(x)}(undef)
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@code_warntype foo(x) # type unstable
x = rand(50); sx = SVector(x...)
function foo(x)
output = MVector{length(x), eltype(x)}(undef)
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@code_warntype foo(sx) # type stable
x = rand(50)
function foo(x, ::Val{N}) where N
sx = SVector{N, eltype(x)}(x)
output = MVector{N, eltype(x)}(undef)
for i in eachindex(x)
temp = x .> x[i]
output[i] = sum(temp)
end
return output
end
@code_warntype foo(x, Val(length(x))) # type stable