pic
Personal
Website

9e. Tuples and Static Vectors for Small Collections

PhD in Economics
Remark
The scripts of this section are available here under the name allCode.jl. They've been tested under Julia 1.10.4.

Introduction

Vector creation can be a performance bottleneck, given the memory-allocation overhead associated with it. The issue arises not only when defining new vectors, but also when we reference a slice like x[1:2], or computing intermediate results on the fly like x .* y in sum(x .* y).

This section introduces one way to address this limitation of the built-in vectors. It's based on the so-called static vectors, which are made available by the StaticArrays package. Unlike the built-in vectors, which are allocated on the heap, static vectors are allocated on the stack and are essentially tuples under the hood. This feature determines that they're only suitable for collections comprising a few elements. As a rule of thumb, consider using static vectors for up to 75 elements. Beyond this threshold, the overhead associated with their creation and access will outweigh any potential performance gain, or directly result in a fatal error. [note] My recommended number of elements is actually lower than the documentation's suggested (100 elements). The reason for this discrepancy is that as you approach the upper limit, the performance benefits of static vectors compared to regular vectors decreases significantly. As a result, you may find yourself spending more time benchmarking performance than reaping any speed advantages.

Despite being built on top of tuples, static vectors offer various additional advantages. Firstly, they maintain their performance benefits even with larger collections. Secondly, static vectors are more convenient to work with, as they can be manipulated as regular vectors. In fact, they support other array types like matrices, explaining the package's name. Finally, StaticArrays provides mutable variants, making them less rigid than tuples, which only exist as immutable objects. In this respect, it's worth indicating that, while the mutable version provides performance benefits relative to regular vectors, the immutable option still offers the best performance.

Warning!
To avoid repetition, the entire section assumes all collections contain a few elements. Considering this, any benefit of static vectors that is highlighted is contingent upon this assumption. We'll also suppose that the StaticArrays package is already available in the workspace, which demands including the line using StaticArrays at the beginning of each session.

Creating Static Vectors

The package StaticArrays include several variants of static vectors. Our focus we'll be mainly on the type SVector, which we'll refer to as SVectors.

There are several approaches to creating an SVector, each serving a distinct purpose. We present them below. The first one creates an SVector through literal values, while the other option included converts a standard vector into an SVector.

# all 'sx' define the same static vector '[3,4,5]'


sx = SVector(3,4,5)
sx = SVector{3, Int64}(3,4,5)
sx = SA[3,4,5]
sx = @SVector [i for i in 3:5]
Output in REPL
julia>
sx
3-element SVector{3, Int64} with indices SOneTo(3):
 3
 4
 5
# all 'sx' define a static vector with same elements as 'x'
x = collect(1:10)

sx = SVector(x...)
sx = SVector{length(x), eltype(x)}(x)
sx = SA[x...]
sx = @SVector [a for a in x]
Output in REPL
julia>
sx
10-element SVector{10, Int64} with indices SOneTo(10):
  1
  2
  3
  â‹®
  9
 10

Of these approaches, we'll mainly rely on the function SVector, occasionally employing SA for indexing. [note] The approach based on the macro @SVector requires some caveats. For instance, it doesn't support specifications based on local variables. This precludes the use of eachindex(x), unless x is a global variable. Note the use of the splatting operator ... to turn a regular vector into an SVector. The operator ... is necessary for the function SVector to transform a collection into a sequence of arguments. [note] For instance, foo(x...) is equivalent to foo(x[1], x[2], x[3]) given a vector or tuple x with 3 elements.

It's possible to define slices of SVectors, although we must be mindful of how indexing works: a slice remains an SVector when indices are specified as SVectors, whereas the slice becomes an ordinary vector for indices given by ranges or regular vectors. The sole exception to this rule is when the slice references the whole object (i.e., sx[:]), in which case an SVector is returned.

x = collect(3:10) ; sx = SVector(x...)

# both define static vectors
slice1 = sx[:]
slice2 = sx[SA[1,2]]        # or slice2 = sx[SVector(1,2)]
Output in REPL
julia>
slice1
8-element SVector{8, Int64} with indices SOneTo(8):
  3
  4
  â‹®
  9
 10

julia>
slice2
2-element SVector{2, Int64} with indices SOneTo(2):
 3
 4
x = collect(3:10) ; sx = SVector(x...)

# both define and ordinary vector
slice2 = sx[1:2]
slice2 = sx[[1,2]]
Output in REPL
julia>
slice
2-element Vector{Int64}:
 3
 4

SVectors Don't Allocate and Are Faster

The objects defined in StaticArrays are built on top of tuples when possible. This entails that SVectors typically don't allocate memory.

x = rand(10)

function foo(x)
    a = x[1:2]              # 1 allocation (copy of slice)
    b = [3,4]               # 1 allocation (vector creation)

    sum(a) * sum(b)         # 0 allocation (scalars don't allocate)
end
Output in REPL
julia>
@btime foo($x)
  29.819 ns (2 allocations: 160 bytes)
x = rand(10)

@views function foo(x)
    a = x[1:2]              # 0 allocation (view of slice)
    b = [3,4]               # 1 allocation (vector creation) 

    sum(a) * sum(b)         # 0 allocation (scalars don't allocate)
end
Output in REPL
julia>
@btime foo($x)
  15.015 ns (1 allocation: 80 bytes)
x = rand(10);   tup = Tuple(x)

function foo(x)
    a = x[1:2]              # 0 allocation (slice of tuple)
    b = (3,4)               # 0 allocation (tuple creation)

    sum(a) * sum(b)         # 0 allocation (scalars don't allocate)
end
Output in REPL
julia>
@btime foo($tup)
  1.400 ns (0 allocations: 0 bytes)
x = rand(10);   sx = SA[x...]

function foo(x)
    a = x[SA[1,2]]          # 0 allocation (slice of static array)
    b = SA[3,4]             # 0 allocation (static array creation)

    sum(a) * sum(b)         # 0 allocation (scalars don't allocate)
end
Output in REPL
julia>
@btime foo($sx)
  1.600 ns (0 allocations: 0 bytes)

The impact of SVectors on memory allocations is especially relevant for operations creating temporary vectors, such as broadcasting.

x = rand(10)

foo(x) = sum(2 .* x)
Output in REPL
julia>
@btime foo($x)
  17.936 ns (1 allocation: 144 bytes)
x = rand(10);   sx = SVector(x...)

foo(x) = sum(2 .* x)
Output in REPL
julia>
@btime foo($sx)
  1.800 ns (0 allocations: 0 bytes)

Interestingly, even if an operation doesn't allocate memory under regular vectors, SVectors still provide performance benefits. To illustrate this, consider the function sum(f, <vector>), which sums the elements of <vector> after they're transformed via f. The implementation below shows that the operation is faster with SVectors, despite not allocating memory with regular vectors.

x  = rand(10)

foo(x) = sum(a -> 10 + 2a +  3a^2, x)
Output in REPL
julia>
@btime foo($x)
  4.400 ns (0 allocations: 0 bytes)
x  = rand(10);  sx = SVector(x...);

foo(x) = sum(a -> 10 + 2a +  3a^2, x)
Output in REPL
julia>
@btime foo($sx)
  2.900 ns (0 allocations: 0 bytes)

SVector Type and Its Mutable Variant

In Julia, SVectors are formally defined as objects with type SVector{N,T}, where N specifies the number of elements and T denotes the element's type. For instance, SVector(4,5,6) has type SVector{3,Int64}, indicating that it comprises 3 elements with type Int64. This implies that the number of elements is part of the SVector type, a feature also present in tuples.

Similar to tuples, the type SVector also defines an immutable object, meaning that its elements can't be added, removed, or modified. In case mutable collections are needed, the package StaticArrays provides the type MVector. MVectors and their slices share the same syntax as SVectors, with the only difference that the function Svector is substituted with MVector. This is illustrated in the following.

x  = [1,2,3]
sx = SVector(x...)

sx[1] = 0

Output in REPL
ERROR: setindex!(::SVector{3, Int64}, value, ::Int) is not defined
x  = [1,2,3]
mx = MVector(x...)

mx[1] = 0
Output in REPL
julia>
mx
3-element MVector{3, Int64} with indices SOneTo(3):
 0
 2
 3

MVectors offer performance benefits over regular vectors, although they're equal to or slower than SVectors. Additionally, MVectors may result in memory allocations. Considering this, if you're certain that the collection won't be mutated, it's recommended to use SVectors—due to their immutable nature, SVectors will perform equally or better than MVectors, but never worse.

To demonstrate this, we present an example where SVectors and MVectors have the same performance and allocations, and another where they differ. The examples also reveal that, regardless of whether the Svector performs better than an MVector, both outperform regular vectors.

x  = rand(10)
sx = SVector(x...);  mx = MVector(x...)

foo(x) = sum(a -> 10 + 2a +  3a^2, x)
Output in REPL
julia>
@btime foo($x)
  4.400 ns (0 allocations: 0 bytes)

julia>
@btime foo($sx)
  2.800 ns (0 allocations: 0 bytes)

julia>
@btime foo($mx)
  2.900 ns (0 allocations: 0 bytes)
x  = rand(10)
sx = SVector(x...);  mx = MVector(x...)

foo(x) = 10 + 2x +  3x^2
Output in REPL
julia>
@btime foo.($x)
  19.739 ns (1 allocation: 144 bytes)

julia>
@btime foo.($sx)
  1.600 ns (0 allocations: 0 bytes)

julia>
@btime foo.($mx)
  6.600 ns (1 allocation: 96 bytes)

The mutability of MVectors makes them ideal for initializing a vector that will eventually be filled via a for-loop. In fact, executing similar(sx) when sx is an SVector automatically returns an MVector.

'similar' for SVectors
sx = SVector(1,2,3)

mx = similar(sx)        # it defines an MVector with undef elements
Output in REPL
3-element MVector{3, Int64} with indices SOneTo(3):
        2073873450800
 -1152921504606846976
                    0

Using Static Vectors

Next, we consider operations that pre-allocate intermediate outputs in relation to static vectors. The conclusions are twofold. Firstly, if the operation exclusively uses SVectors, there's no need to pre-allocate intermediate results. Secondly, storing the final output as an MVector can entail performance gains relative to a regular vector.

The example we use involves counting the elements in x that are greater than x[i], with the operation computed for each possible index i. Formally, this is captured by sum(x .> x[i]). [note] For simplicity, we use sum to avoid introducing new functions, although it'd be more natural to use count instead. . To make the comparison starker, the operation x .> x[i] and sum are written separately.

x = rand(50)

function foo(x; output = similar(x))
    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end
Output in REPL
julia>
@btime foo($x)
  3.188 μs (101 allocations: 5.17 KiB)
x = rand(50)

function foo(x; output = similar(x), temp = similar(x))
    for i in eachindex(x)
        @. temp      = x > x[i]
           output[i] = sum(temp)
    end

    return output
end
Output in REPL
julia>
@btime foo($x)
  695.745 ns (2 allocations: 992 bytes)
x = rand(50);   sx = SVector(x...)

function foo(x; output = Vector{Float64}(undef, length(x)))
    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end
Output in REPL
julia>
@btime foo($sx)
  183.661 ns (1 allocation: 496 bytes)
x = rand(50);   sx = SVector(x...)

function foo(x; output = similar(x))
    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end
Output in REPL
julia>
@btime foo($sx)
  148.817 ns (1 allocation: 448 bytes)
x = rand(50);   sx = SVector(x...)

function foo(x; output = MVector{length(x),eltype(x)}(undef))
    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end
Output in REPL
julia>
@btime foo($sx)
  148.975 ns (1 allocation: 448 bytes)

Note that every implementation requires pre-allocating the vector output, which leaves us with only one decision to make: whether to pre-allocate the temporary vector temp. This also explains why every implementation involves at least one memory allocation.

The "No-Preallocation" tab serves as a baseline for comparison. In contrast, the "Pre-allocating" tab reuses a regular vector to compute temp, while the "SVector" tab converts x to an SVector sx without pre-allocating temp. The benchmarks reveal that the SVector approach outperforms pre-allocation, thanks to the memory allocation avoidance and additional optimization opportunities provided by SVectors.

As for the last two tabs, they continue defining x as an SVector, but additionally treat output as an MVector. The last tab in particular does this by using similar(sx) to initialize output, whereas the other tab explicitly specifies an MVector. As the benchmarks demonstrate, integrating MVectors into the operation yields further performance gains.

The Number of Elements is Part of SVector's Type

SVectors are defined as objects with a type SVector{N,T}. Due to this, the type definition requires not only the type T of its elements, but also the number of elements N it comprises. This feature, which is shared with MVectors and inherited from tuples, can readily introduce type instabilities if not handled carefully.

A way to avoid this potential issue is by passing any SVector and MVector as a function argument. Alternatively, we can employ the Val technique discussed for tuples, which enables dispatch by the number of elements.

x = rand(50)

function foo(x)
    output = SVector{length(x), eltype(x)}(undef)
    output = MVector{length(x), eltype(x)}(undef)

    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end

@code_warntype foo(x)                     # type unstable
x = rand(50);   sx = SVector(x...)

function foo(x)
    
    output = MVector{length(x), eltype(x)}(undef)
    
    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end

@code_warntype foo(sx)                    # type stable
x = rand(50)

function foo(x, ::Val{N}) where N
    sx     = SVector{N, eltype(x)}(x)
    output = MVector{N, eltype(x)}(undef)

    for i in eachindex(x)
        temp      = x .> x[i]
        output[i] = sum(temp)
    end

    return output
end

@code_warntype foo(x, Val(length(x)))     # type stable