Home
Chapters

Generic expressions are represented by <something> (e.g., <function> or <operator>). This is just notation, and the symbols < and > should not be misconstrued as Julia's syntax.

Action	Keyboard Shortcut
Previous Section	`Ctrl + 🠘`
Next Section	`Ctrl + 🠚`
List of Subsections	`Ctrl + x`
Search	`Ctrl + Shift + F`
Open All Codes and Outputs in a Post	`Alt + 🠛`
Close All Codes and Outputs in a Post	`Alt + 🠙`

When benchmarking, the equivalence of time measures is as follows.

Unit	Acronym	Measure in Seconds
Seconds	s	1
Milliseconds	ms	10^-3
Microseconds	μs	10^-6
Nanoseconds	ns	10^-9

Links
BOOK in PDF
Dark Mode

Personal

Website

9h. Mutations and Pre-Allocations

Martin Alfaro

PhD in Economics

Code Script

This section's scripts are available here, under the name allCode.jl. They've been tested under Julia 1.11.9.

Introduction

Certain operations involving for-loops may trigger the creation of new vectors during each iteration, leading to repeated memory allocation. In many cases, this dynamic allocation is unnecessary. Particularly, when the vectors only store temporary intermediate results that are discarded after the current iteration.

A comparable issue arises when the same operations must be executed repeatedly with varying inputs, as in simulations or Monte Carlo experiments. If only the final output is required, generating new intermediate vectors for every run results in avoidable allocations and wasted resources.

In these situations, performance can be improved through pre-allocation. This technique consists of initializing a vector before execution begins, then reusing it throughout each computation. By allocating memory upfront and updating the vector in place, the program avoids the repeated overhead of memory allocation.

Remarkably, the technique isn't exclusive to Julia, but rather represents an optimization strategy applicable across programming languages. Its effectiveness ultimately stems from favoring mutations over the creation of new objects, thus minimizing allocations on the heap.

Our presentation will include a review of methods for initializing vectors. This is a prerequisite for implementing a pre-allocation strategy. We then present some scenarios where pre-allocation proves advantageous, with special emphasis on its advantages within nested for-loops.

The Advantages of Mutating Vectors

The benefits of pre-allocation stem from the broader advantages of in-place operations. This strategy encourages reusing existing vectors rather than creating new objects. Because this principle is valuable beyond pre-allocation, we begin by illustrating the performance gains that come from reusing the same memory address.

To make this concrete, consider the operation of sorting a vector x. In Julia, many built-in functions have an in-place function counterpart. Sorting is one such case: sort(x) allocates a new sorted vector, while sort!(x) rearranges the elements of x in place. When preserving the original ordering isn’t necessary, creating a separate copy provides no benefit. In those cases, the mutating version sort!(x) is preferable, as it avoids unnecessary allocations by reusing the memory associated with x. The gains from this approach are presented below.

x = rand(1_000_000)

Output in REPL

julia>

@btime sort($x)

  43.514 ms (9 allocations: 15.267 MiB)

x = rand(1_000_000)

Output in REPL

julia>

@btime sort!($x)

  1.750 ms (0 allocations: 0 bytes)

As we can see, the differences in runtime are substantial. If the same operation were executed repeatedly inside a for-loop, the performance gaps would become even more pronounced.

Pre-allocation builds on the same underlying idea, but applies to situations where a vector like x doesn’t yet exist. Considering this, we begin with a review of how to initialize vectors efficiently.

Initializing Vectors

Remark

The review of methods for vector initialization will be relatively brief and centered on performance considerations. For a more detailed coverage, see the section about vector initialization.

Vector initialization refers to the process of creating a vector that subsequently will be filled with values. The process typically involves two steps: reserving space in memory and populating that space with some initial values. An efficient approach to initializing a vector involves performing only the first step, keeping whatever content is in the memory address. These values held in memory are referred to as undef (undefined). Although Julia will display them as numerical values, they're essentially arbitrary and meaningless.

There are two methods for initializing a vector with undef values. The first one is through a constructor, requiring the specification of the length and element types. The second one is based on the function similar, which creates a vector with the same type and dimension as another existing vector. This approach is particularly useful when the output sought matches the structure of an input.

Below, we compare the performance of several approaches to initializing vectors. In particular, we establish that working with undef values is faster than populating vectors with specific values. To starkly show the differences in execution time, we repeat the process of vector creation 100,000 times.

x           = collect(1:100)
repetitions = 100_000                       # repetitions in a for-loop

function foo(x, repetitions)
    for _ in 1:repetitions
        Vector{Int64}(undef, length(x))
    end
end

Output in REPL

julia>

@btime foo($x, $repetitions)

  3.455 ms (100000 allocations: 85.449 MiB)

x           = collect(1:100)
repetitions = 100_000                       # repetitions in a for-loop

function foo(x, repetitions)
    for _ in 1:repetitions
        similar(x)
    end
end

Output in REPL

julia>

@btime foo($x, $repetitions)

  2.977 ms (100000 allocations: 85.449 MiB)

x           = collect(1:100)
repetitions = 100_000                       # repetitions in a for-loop

function foo(x, repetitions)
    for _ in 1:repetitions
        zeros(Int64, length(x))
    end
end

Output in REPL

julia>

@btime foo($x, $repetitions)

  10.797 ms (100000 allocations: 85.449 MiB)

x           = collect(1:100)
repetitions = 100_000                       # repetitions in a for-loop

function foo(x, repetitions)
    for _ in 1:repetitions
        ones(Int64, length(x))
    end
end

Output in REPL

julia>

@btime foo($x, $repetitions)

  11.003 ms (100000 allocations: 85.449 MiB)

x           = collect(1:100)
repetitions = 100_000                       # repetitions in a for-loop

function foo(x, repetitions)
    for _ in 1:repetitions
        fill(2, length(x))                  # vector filled with integer 2
    end
end

Output in REPL

julia>

@btime foo($x, $repetitions)

  10.170 ms (100000 allocations: 85.449 MiB)

Remark

Recall that _ isn't a keyword, but rather a convention widely adopted by programmers to denote dummy variables. These are variables included solely to meet syntactical requirements, but are never actually used or referenced within the code. In our example, the inclusion of _ is because for-loops must always include a variable to iterate over. While any other symbol could be used, _ signals programmers that its value can be safely ignored.

Initializing Vectors in Functions

We can initialize a vector by passing it to the function as a keyword argument. This strategy even enables the use of similar(x), where x is a previous function argument. Considering this, the following two implementations are equivalent.

function foo(x)
   output = similar(x) 
   # <some calculations using 'output'>
end

function foo(x; output = similar(x))

   # <some calculations using 'output'>
end

When multiple variables of the same type need to be initialized, array comprehensions offer a concise solution. A more efficient alternative is based on the so-called generators, which are the lazy counterpart of array comprehensions. This reduces memory allocations by initializing and assigning the vectors element-wise. The behavior differs from array comprehensions, which first create a vector that holds all the initialized vectors.

Below we illustrate these strategies. Although the example is based on similar(x), note that the same principle applies a constructor like Vector{Float64}(undef, length(x)).

x = [1,2,3]

function foo(x)
    a,b,c = [similar(x) for _ in 1:3]
    # <some calculations using a,b,c>
end

Output in REPL

julia>

@btime foo($x)

  101.973 ns (8 allocations: 320 bytes)

x = [1,2,3]

function foo(x)
    a,b,c = (similar(x) for _ in 1:3)
    # <some calculations using a,b,c>
end

Output in REPL

julia>

@btime foo($x)

  40.751 ns (3 allocations: 144 bytes)

Describing the Pre-Allocation Technique

To describe how pre-allocation works, we'll consider a scenario where the strategy proves to be advantageous. The setting involves a nested for-loop, in which the output of a for-loop serves as an intermediate input for another for-loop. The examples are based not only on explicit for-loops, but also on operations that are internally computed via a for-loop (e.g., broadcasting).

Let's first describe the inner for-loop that will be eventually embedded in an outer for-loop. Suppose we're assessing a worker's performance over a 30-day period, with daily scores recorded on a scale from 0 to 1. Defining successful performance as any score above 0.5, the following code snippets generate vectors indicating the days on which the goal was achieved.

nr_days            = 30
score              = rand(nr_days)

performance(score) = score .> 0.5

Output in REPL

julia>

@btime performance($score)

  49.875 ns (3 allocations: 96 bytes)

nr_days            = 30
score              = rand(nr_days)

function performance(score)
    target = similar(score)

    for i in eachindex(score)
        target[i] = score[i] > 0.5
    end

    return target
end

Output in REPL

julia>

@btime performance($score)

  45.300 ns (2 allocations: 304 bytes)

nr_days            = 30
score              = rand(nr_days)

function performance(score; target=similar(score))
    

    for i in eachindex(score)
        target[i] = score[i] > 0.5
    end

    return target
end

Output in REPL

julia>

@btime performance($score)

  44.777 ns (2 allocations: 304 bytes)

Consider now that the output of this operation (namely, the vector of days in which the target was met) represents an intermediate step in another for-loop. For instance, suppose we have multiple workers, each with recorded performance scores. The goal is to summarize each worker's information through a summary statistic. In particular, the ratio of the standard deviation to the mean, which expresses variability in units of the average.

In practice, this statistic requires computing both the mean and the standard deviation. Below, we show that computing each statistic via reductions isn't efficient. The reason is that, while reductions avoid memory allocations, they require computing the days on which the target was met twice. Instead, it's more efficient to compute and store this vector beforehand, which we then reuse as an input for computing the mean and the standard deviation.

The result highlights a more general principle: when the same intermediate input is required for multiple outputs, the cost of allocating memory for the intermediate vector tends to be lower than its repeated computation. In the example, the reduction is implemented via lazy broadcasting, which internally relies on reduction techniques. The rest implement the result via explicit or implicit for-loops, with the intermediate input stored in a vector.

nr_days            = 30
scores             = [rand(nr_days), rand(nr_days), rand(nr_days)]  # 3 workers



function repeated_call(scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)

      stats[col] = std(@~ scores[col] .> 0.5) / mean(@~ scores[col] .> 0.5)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call($scores)

  685.195 ns (2 allocations: 80 bytes)

nr_days            = 30
scores             = [rand(nr_days), rand(nr_days), rand(nr_days)]   # 3 workers

performance(score) = score .> 0.5

function repeated_call(scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
      target     = performance(scores[col])
      stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call($scores)

  495.696 ns (11 allocations: 368 bytes)

nr_days            = 30
scores             = [rand(nr_days), rand(nr_days), rand(nr_days)]   # 3 workers

function performance(score)
    target = similar(score)

    for i in eachindex(score)
        target[i] = score[i] > 0.5
    end

    return target
end

function repeated_call(scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
      target     = performance(scores[col])
      stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call($scores)

  362.846 ns (8 allocations: 992 bytes)

nr_days            = 30
scores             = [rand(nr_days), rand(nr_days), rand(nr_days)]   # 3 workers

function performance(score; target = similar(score))
    
    
    for i in eachindex(score)
        target[i] = score[i] > 0.5
    end

    return target
end

function repeated_call(scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
      target     = performance(scores[col])
      stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call($scores)

  376.145 ns (8 allocations: 992 bytes)

Once we establish that storing the intermediate vector is more performant, our approach will necessarily involve memory allocations. The methods described above, however, allocate more than required: during each iteration, a new vector is created for each worker to store the days in which the target was met. This intermediate vector doesn't need to be preserved for future use. Indeed, because it's stored in a local variable, it'll be discarded once the iteration concludes.

Such cases are strong candidates for pre-allocating intermediate results. By defining a single vector target, we can reuse it across iterations to compute the days when the target was met for each worker. This strategy incurs the overhead of creating the vector only once, after which it's reused for every worker.

The implementation requires defining the vector target before the execution of the outer for-loop. During each iteration, we then mutate target using an in-place function, which we call performance!. This function can update the contents either with a standard for-loop or by applying the broadcasting operator .=.

nr_days = 30
scores  = [rand(nr_days), rand(nr_days), rand(nr_days)]


performance(score) = score .> 0.5

function repeated_call(scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
        target     = performance(scores[col])
        stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call($scores)

  524.400 ns (11 allocations: 368 bytes)

nr_days = 30
scores  = [rand(nr_days), rand(nr_days), rand(nr_days)]
target  = similar(scores[1])

performance!(target, score) = (@. target = score > 0.5)

function repeated_call!(target, scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
        performance!(target, scores[col])
        stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call!($target, $scores)

  358.638 ns (2 allocations: 80 bytes)

nr_days = 30
scores  = [rand(nr_days), rand(nr_days), rand(nr_days)]
target  = similar(scores[1])

function performance!(target, score)
    for i in eachindex(score)
        target[i] = score[i] > 0.5
    end
end

function repeated_call!(target, scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
        performance!(target, scores[col])
        stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call!($target, $scores)

  339.744 ns (2 allocations: 80 bytes)

Warning! - Use of @. to update values

When your goal is to update the values of a vector, recall that @. has to be placed at the beginning of the statement.

# the following are equivalent and define a new variable
   output  = @. 2  * x
   output  =    2 .* x

# the following are equivalent and update 'output'
@. output  = 2  * x
   output .= 2 .* x

Compared to a for-loop, the method using .= provides a simpler syntax. This is why, when the function performance! is simple enough as in our example, it's common to avoid its definition and directly insert it into the inner for-loop. The possibility also enables implementing the update via a built-in in-place function. For instance, the function map! can be used with this purpose.

nr_days = 30
scores  = [rand(nr_days), rand(nr_days), rand(nr_days)]


function repeated_call(scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
        target     = @. score > 0.5
        stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call($scores)

  822.471 ns (23 allocations: 608 bytes)

nr_days = 30
scores  = [rand(nr_days), rand(nr_days), rand(nr_days)]
target  = similar(scores[1])

function repeated_call!(target, scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
        @. target  = scores[col] > 0.5
        stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call!($target, $scores)

  336.372 ns (2 allocations: 80 bytes)

nr_days = 30
scores  = [rand(nr_days), rand(nr_days), rand(nr_days)]
target  = similar(scores[1])

function repeated_call!(target, scores)
    stats = Vector{Float64}(undef, length(scores))

    for col in eachindex(scores)
        map!(a -> a > 0.5, target, scores[col])
        stats[col] = std(target) / mean(target)
    end

    return stats
end

Output in REPL

julia>

@btime repeated_call!($target, $scores)

  329.814 ns (2 allocations: 80 bytes)

NOTATION

PAGE LAYOUT

LINKS TO SECTIONS

KEYBOARD SHORTCUTS

TIME MEASUREMENT

Dark Mode

Search

Introduction

The Advantages of Mutating Vectors

Initializing Vectors

Initializing Vectors in Functions

Describing the Pre-Allocation Technique