All code snippets in this post are collected in the file allCode.jl
. They've been tested under Julia 1.10.4. Throughout, generic placeholders are denoted as <function>
or <operator>
. This is just notation, and the symbols <
and >
should not be misconstrued as Julia's syntax.
Unit | Acronym | Measure in Seconds |
---|---|---|
Seconds | s | 1 |
Milliseconds | ms | 10⁻³ |
Microseconds | μs | 10⁻⁶ |
Nanoseconds | ns | 10⁻⁹ |
Reductions are a powerful technique for operations that take collections as inputs and return a single element. They arise naturally when we need to compute summary statistics such as the average, variance, or maximum of a collection.
The reduction process works by iteratively applying an operation to pairs of elements, accumulating the results at each step until the final output is obtained. For example, to compute sum(x)
for a vector x
, a reduction would start by adding the first two elements, then add the third element to that running total, and continue this cumulative process until all elements have been summed.
The method is particularly convenient when we need to transform a vector's elements prior to computing a summary statistic. By operating on scalars, reductions sidestep the need to materialize intermediate outputs, thereby reducing memory allocations. For example, to compute the sum of the log-transformed values in x
, a reduction would avoid the creation of the intermediate vector log.(x)
for computing sum(log.(x))
.
A reduction is a process that applies an operator or function to pairs of elements in a collection, ultimately yielding a single output value. Its implementation relies on a for-loop, which updates the result in each iteration.
A classic example of a reduction is the addition of all elements in a vector. This applies +
to pairs of elements, iteratively updating the accumulated sum. The methodology is demonstrated below.
x = rand(100)
foo(x) = sum(x)
foo(x)
48.447
x = rand(100)
function foo(x)
    output = 0.
    for i in eachindex(x)
        output = output + x[i]
    end
    return output
end
foo(x)
48.447
x = rand(100)
function foo(x)
    output = 0.
    for i in eachindex(x)
        output += x[i]
    end
    return output
end
foo(x)
48.447
The last tab makes use of an updating operator, which offers the concise syntax <operator>=
. When this operator is applied, the variable on the left-hand side is employed as the first operand. For instance, x += a
is equivalent to x = x + a
. Other updating operators include -=
and *=
.
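As a quick sanity check, the following snippet (with illustrative values) confirms the equivalence between both forms.

```julia
a = 10
b = 10

a = a + 3    # explicit form
b += 3       # updating operator: shorthand for b = b + 3

a == b       # true, both equal 13
```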
Reductions rely on either a binary operator or a two-argument function. An example of the former is +
, which we used above. As for functions, max(a,b)
, which returns the maximum of the scalars a
and b
, can be used for computing the maximum of a collection x
.
A reduction operation can only be applied if the binary operation satisfies two mathematical requirements:
Associativity: the way in which operations are grouped does not change the result. For example, addition is associative because (a + b) + c = a + (b + c)
.
Existence of an identity element: there exists an element that leaves any other element unchanged when the binary operation is applied. For example, the identity element of addition is 0 because a + 0 = a
.
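As a minimal sketch with integer values, both requirements can be checked directly (note that floating-point addition is only approximately associative due to rounding).

```julia
a, b, c = 2, 3, 4

# associativity: the grouping of operations doesn't change the result
(a + b) + c == a + (b + c)    # true

# identity element: adding 0 leaves any element unchanged
a + 0 == a                    # true
```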
For the operations commonly implemented as reductions, the identity elements are the following.
Operation | Identity Element |
---|---|
Sum | 0 |
Product | 1 |
Maximum | -Inf |
Minimum | Inf |
The identity element constitutes the initial value that we provide to start the iteration. Considering these differences in the identity elements, below we indicate how to implement the reduction for each operation. In each case, the function foo1
indicates the desired outcome, while foo2
provides the same outcome via a reduction.
x = rand(100)
foo1(x) = sum(x)
function foo2(x)
    output = 0.
    for i in eachindex(x)
        output += x[i]
    end
    return output
end
x = rand(100)
foo1(x) = prod(x)
function foo2(x)
    output = 1.
    for i in eachindex(x)
        output *= x[i]
    end
    return output
end
x = rand(100)
foo1(x) = maximum(x)
function foo2(x)
    output = -Inf
    for i in eachindex(x)
        output = max(output, x[i])
    end
    return output
end
x = rand(100)
foo1(x) = minimum(x)
function foo2(x)
    output = Inf
    for i in eachindex(x)
        output = min(output, x[i])
    end
    return output
end
To illustrate the application through a concrete example, consider the operation sum(log.(x))
for a vector x
. This operation transforms x
via log.(x)
and then sums the transformed elements. By default, broadcasting materializes the resulting vector, therefore allocating memory for log.(x)
. However, we're not interested in this result itself: log.(x)
is a temporary vector that will be eventually "reduced" to a scalar. [note] In the section Lazy Operations, we'll explore an alternative, where broadcast doesn't materialize intermediate results. Taking this into account, we should implement an approach that bypasses this unnecessary allocation. Reductions do so by defining a scalar output
, which we iteratively update by summing the transformed values of x
.
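For instance, a reduction for sum(log.(x)) could be sketched as follows, where foo1 materializes the temporary vector and foo2 avoids it.

```julia
x = rand(100)

foo1(x) = sum(log.(x))    # allocates the intermediate vector log.(x)

function foo2(x)          # reduction: accumulates log(x[i]) directly
    output = 0.
    for i in eachindex(x)
        output += log(x[i])
    end
    return output
end

foo1(x) ≈ foo2(x)         # true: same result, without the intermediate vector
```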
Below, we demonstrate how to implement the technique for common reductions.
x = rand(100)
foo1(x) = sum(3 .* x)
function foo2(x)
    output = 0.
    for i in eachindex(x)
        output += 3 * x[i]
    end
    return output
end
@btime foo1($x)
48.528 ns (1 allocation: 896 bytes)
@btime foo2($x)
24.900 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = prod(3 .* x)
function foo2(x)
    output = 1.
    for i in eachindex(x)
        output *= 3 * x[i]
    end
    return output
end
@btime foo1($x)
47.658 ns (1 allocation: 896 bytes)
@btime foo2($x)
43.592 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = maximum(3 .* x)
function foo2(x)
    output = -Inf
    for i in eachindex(x)
        output = max(output, 3 * x[i])
    end
    return output
end
@btime foo1($x)
196.296 ns (1 allocation: 896 bytes)
@btime foo2($x)
171.915 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = minimum(3 .* x)
function foo2(x)
    output = Inf
    for i in eachindex(x)
        output = min(output, 3 * x[i])
    end
    return output
end
@btime foo1($x)
194.364 ns (1 allocation: 896 bytes)
@btime foo2($x)
165.600 ns (0 allocations: 0 bytes)
The previous examples show that reductions require the explicit writing of a for-loop. Unfortunately, this can prove unwieldy to write and read. In light of this, Julia streamlines the process by adding methods to sum
, prod
, maximum
, and minimum
to implement reductions for you.
The syntax is foo(<transforming function>, x)
, where foo
is one of the functions mentioned and x
is the vector to be transformed. For instance, the following examples consider a transformation 3 .* x
. [note] Julia provides the reduce
and mapreduce
functions for implementing additional types of reductions. mapreduce
is the more general of the two, as it additionally allows for transforming the vector by applying the function map
. However, the built-in implementations of sum
, prod
, maximum
, and minimum
are more efficient, as they're customized to those particular operations.
x = rand(100)
foo(x) = sum(a -> 3 * a, x)
@btime foo($x)
7.000 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = prod(a -> 3 * a, x)
@btime foo($x)
7.207 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = maximum(a -> 3 * a, x)
@btime foo($x)
174.550 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = minimum(a -> 3 * a, x)
@btime foo($x)
169.035 ns (0 allocations: 0 bytes)
Reductions can also be performed when the transforming function requires multiple arguments. To do this, you should enclose the variables using zip
and refer to each variable through indices. We illustrate this below, where the transforming function is x .* y
.
x = rand(100); y = rand(100)
foo(x,y) = sum(a -> a[1] * a[2], zip(x,y))
@btime foo($x, $y)
29.276 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = prod(a -> a[1] * a[2], zip(x,y))
@btime foo($x, $y)
48.227 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = maximum(a -> a[1] * a[2], zip(x,y))
@btime foo($x, $y)
171.721 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = minimum(a -> a[1] * a[2], zip(x,y))
@btime foo($x, $y)
166.937 ns (0 allocations: 0 bytes)
Julia provides the functions reduce
and mapreduce
, which offer a generic interface for reduction operations. The difference between them lies in that reduce
applies the reduction directly, while mapreduce
transforms the collection's elements prior to doing so.
It's worth remarking that you should still rely on the dedicated functions sum
, prod
, maximum
, and minimum
for such operations, as they've been optimized for their respective tasks. In this context, our primary use case of reduce
and mapreduce
is when working with packages that provide their own versions. These packages could implement reductions by applying a different approach or algorithm. For instance, the package Folds
provides a parallelized version of both map
and mapreduce
, enabling the utilization of all available CPU cores. Its syntax is identical to Julia's built-in functions.
Let's first consider reduce
, which performs a reduction without transforming the collection. The syntax for the reduction is reduce(<function>,x)
, where <function>
is a two-argument function. The following example demonstrates its usage.
x = rand(100)
foo(x) = reduce(+, x)
@btime foo($x)
6.600 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(*, x)
@btime foo($x)
6.700 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(max, x)
@btime foo($x)
167.733 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(min, x)
@btime foo($x)
167.978 ns (0 allocations: 0 bytes)
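Note that reduce also accepts an init keyword argument to supply the identity element explicitly. For operators like +, Julia already knows the identity element, but for an arbitrary two-argument function, init is required when the collection may be empty. A minimal sketch:

```julia
x = rand(100)

reduce(+, x; init = 0.)                         # same result as reduce(+, x)

# for an arbitrary function, Julia can't infer the identity element,
# so reducing an empty collection requires init
reduce((a, b) -> a + b, Float64[]; init = 0.)   # returns 0.0
```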
Julia additionally provides the mapreduce
function, which combines the functionality of map
and reduce
. Recall that map(foo,x)
transforms each element of the collection x
by applying foo
element-wise. Consequently, mapreduce(<transformation>,<reduction>,x)
first transforms x
's elements through map
, and then applies a reduction to the resulting output.
Note that reduce
can be considered as a special case of mapreduce
, where the latter transforms x
through the identity function, identity(x)
. Likewise, mapreduce(<transformation>,<operator>,x)
produces the same result as reduce(<operator>,map(<transformation>,x))
. However, mapreduce
is more efficient, since it avoids the allocation of the intermediate transformed vector. This is demonstrated below, where we consider a transformation 3 .* x
eventually reduced through sum
. In other words, the output is equivalent to sum(3 .* x)
.
x = rand(100)
foo(x) = mapreduce(a -> 3 * a, +, x)
@btime foo($x)
6.700 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(+, map(a -> 3 * a, x))
@btime foo($x)
54.684 ns (1 allocation: 896 bytes)