All code examples are collected in the script allCode.jl. They've been tested under Julia 1.11.6.
Reductions are a computational technique applied to operations that take a collection as input and return a single element as output. Such operations arise naturally in a wide range of contexts, such as when computing summary statistics (e.g., averages, variances, or maxima of collections).
The process involves iteratively applying an operation to pairs of elements, accumulating the results at each step until the final output is obtained. A classic example of reduction is the summation of all numeric elements in a vector. This involves applying the addition operator + to pairs of elements, iteratively updating the accumulated sum. The process is demonstrated below.
x = rand(100)
foo(x) = sum(x)
foo(x)
48.447
x = rand(100)
function foo(x)
output = 0.0
for i in eachindex(x)
output = output + x[i]
end
return output
end
foo(x)
48.447
x = rand(100)
function foo(x)
output = 0.0
for i in eachindex(x)
output += x[i]
end
return output
end
foo(x)
48.447
The last tab implements the reduction via update operators. These are commonly applied in reductions, as they keep the notation compact by transforming expressions such as x = x + a into x += a.
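As a standalone illustration (with made-up values), the two forms are equivalent:
a = 10
a = a + 5   # explicit form, a becomes 15
a += 5      # compact equivalent, a becomes 20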
Regarding memory allocations, reductions are particularly convenient when a vector's elements need to be transformed prior to aggregating the result. By operating on scalars, reductions sidestep the need to materialize intermediate outputs, thus reducing allocations. For instance, if you have to compute sum(log.(x)), a reduction would avoid allocating the intermediate vector log.(x).
At a formal level, reductions combine a sequence of values iteratively through a binary operation, collapsing them into a single scalar. For a reduction to be valid, the binary operation must satisfy two mathematical properties:
Associativity: the way in which operations are grouped doesn't change the result. For example, addition is associative because \((a + b) + c = a + (b + c)\).
Existence of an identity element: there exists an element that, when combined with any other element through the binary operation, leaves that element unchanged. For example, the identity element of addition is \(0\) because \(a + 0 = a\). Both properties are checked in the snippet after this list.
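A minimal check with illustrative integer values:
(1 + 2) + 3 == 1 + (2 + 3)   # true: grouping doesn't change the result
0 + 5 == 5                   # true: 0 leaves any element unchanged under +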
The identity element is crucial in implementation, as it provides the initial value of the iterative process. Specifically, it serves as the initialization value of the variable that accumulates the final result. Each operation has its own identity element, as summarized in the table below.
| Operation | Identity Element |
|---|---|
| Sum | 0 |
| Product | 1 |
| Maximum | -Inf |
| Minimum | Inf |
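One way to see these identities in Julia is by reducing an empty collection: sum and prod fall back to their identity elements, while max and min require an explicit starting value through the init keyword (maximum(Float64[]) would throw an error).
sum(Float64[])                        # 0.0, the identity of the sum
prod(Float64[])                       # 1.0, the identity of the product
reduce(max, Float64[], init = -Inf)   # -Inf, the identity of the maximum
reduce(min, Float64[], init = Inf)    # Inf, the identity of the minimum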
The implementation of reductions requires binary operations, which can be expressed using binary operators or two-argument functions. For instance, the symbol + can be employed in either form, making output = output + x[i] equivalent to output = +(output, x[i]). The possibility of using functions expands the scope of reductions, enabling operations not directly supported by operators. For instance, the maximum value within a vector x can be found with the max function, where max(a,b) returns the maximum of the scalars a and b.
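Both forms can be tested directly in the REPL (illustrative values):
+(2, 3)     # 5, identical to 2 + 3
max(2, 3)   # 3, a two-argument function with no operator form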
[Figure: reductions implemented with a binary operator and with a two-argument function.]
The following examples demonstrate how to formulate reductions using for-loops. We present foo1 to show the desired outcome explicitly, while foo2 provides the same result through a reduction.
x = rand(100)
foo1(x) = sum(x)
function foo2(x)
output = 0.0    # initialized at the identity element of +
for i in eachindex(x)
output += x[i]
end
return output
end
x = rand(100)
foo1(x) = prod(x)
function foo2(x)
output = 1.0    # initialized at the identity element of *
for i in eachindex(x)
output *= x[i]
end
return output
end
x = rand(100)
foo1(x) = maximum(x)
function foo2(x)
output = -Inf   # initialized at the identity element of max
for i in eachindex(x)
output = max(output, x[i])
end
return output
end
x = rand(100)
foo1(x) = minimum(x)
function foo2(x)
output = Inf    # initialized at the identity element of min
for i in eachindex(x)
output = min(output, x[i])
end
return output
end
One of the primary advantages of reductions is their ability to avoid memory allocation for intermediate results.
To illustrate this capability, consider the operation sum(log.(x)) for a vector x. Its computation involves two steps: transforming x into log.(x), and then summing the transformed elements. By default, broadcasting materializes its results, implying that a new vector is internally created to store the values of log.(x). Consequently, the first step results in memory allocations.
In many cases like this one, however, only the scalar output matters, and the intermediate result is of no interest. In this context, computational strategies that obtain the final output while bypassing the allocation of log.(x) are preferred. Reductions make this possible by defining a scalar output, which is iteratively updated by summing the transformed values of x. This means each element of x is transformed by the logarithm, and the result is then immediately added to the accumulator. In this way, the storage of the intermediate vector log.(x) is entirely avoided. [note] In the section Lazy Operations, we'll explore an alternative approach, based on broadcasting, that also avoids materializing intermediate results.
x = rand(100)
foo1(x) = sum(log.(x))
function foo2(x)
output = 0.0
for i in eachindex(x)
output += log(x[i])
end
return output
end
using BenchmarkTools   # provides the @btime macro
@btime foo1($x)
315.584 ns (1 allocation: 896 bytes)
@btime foo2($x)
296.119 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = prod(log.(x))
function foo2(x)
output = 1.0
for i in eachindex(x)
output *= log(x[i])
end
return output
end
@btime foo1($x)
311.840 ns (1 allocation: 896 bytes)
@btime foo2($x)
296.061 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = maximum(log.(x))
function foo2(x)
output = -Inf
for i in eachindex(x)
output = max(output, log(x[i]))
end
return output
end
@btime foo1($x)
482.602 ns (1 allocation: 896 bytes)
@btime foo2($x)
374.961 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = minimum(log.(x))
function foo2(x)
output = Inf
for i in eachindex(x)
output = min(output, log(x[i]))
end
return output
end
@btime foo1($x)
487.156 ns (1 allocation: 896 bytes)
@btime foo2($x)
368.502 ns (0 allocations: 0 bytes)
So far, reductions with intermediate transformations have been implemented manually through explicit for-loops. While this approach makes the underlying mechanics transparent, it also introduces considerable verbosity.
To address this limitation, Julia provides several streamlined alternatives for expressing reductions in a more concise and idiomatic way. One such alternative is through specialized methods for common reduction functions, including sum, prod, maximum, and minimum. These methods accept a transforming function as their first argument, followed by the collection to be reduced. The general syntax is foo(<transforming function>, x), where foo denotes the reduction function and x is the vector to be transformed and reduced.
The following examples illustrate this approach by applying a logarithmic transformation prior to the reduction.
x = rand(100)
foo(x) = sum(log, x) #same output as sum(log.(x))
@btime foo($x)
294.889 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = prod(log, x) #same output as prod(log.(x))
@btime foo($x)
294.763 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = maximum(log, x) #same output as maximum(log.(x))
@btime foo($x)
579.940 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = minimum(log, x) #same output as minimum(log.(x))
@btime foo($x)
577.516 ns (0 allocations: 0 bytes)
These specialized methods are commonly applied using anonymous functions, as shown below.
x = rand(100)
foo(x) = sum(a -> 2 * a, x) #same output as sum(2 .* x)
@btime foo($x)
6.493 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = prod(a -> 2 * a, x) #same output as prod(2 .* x)
@btime foo($x)
6.741 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = maximum(a -> 2 * a, x) #same output as maximum(2 .* x)
@btime foo($x)
172.547 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = minimum(a -> 2 * a, x) #same output as minimum(2 .* x)
@btime foo($x)
171.490 ns (0 allocations: 0 bytes)
The specialized methods also accept transforming functions with multiple arguments. In this case, the arguments must be combined using zip, with each argument referenced inside the transforming function through indices. This is illustrated below with the transforming operation x .* y.
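To make the indexing explicit, the sketch below (with made-up values) shows that zip yields tuples, whose components are accessed as a[1] and a[2].
x = [1, 2]; y = [10, 20]
a = first(zip(x, y))   # (1, 10): each element of zip(x,y) is a tuple
a[1] * a[2]            # 10: arguments referenced through indices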
x = rand(100); y = rand(100)
foo(x,y) = sum(a -> a[1] * a[2], zip(x,y)) #same output as sum(x .* y)
@btime foo($x, $y)
29.127 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = prod(a -> a[1] * a[2], zip(x,y)) #same output as prod(x .* y)
@btime foo($x, $y)
48.031 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = maximum(a -> a[1] * a[2], zip(x,y)) #same output as maximum(x .* y)
@btime foo($x, $y)
172.580 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = minimum(a -> a[1] * a[2], zip(x,y)) #same output as minimum(x .* y)
@btime foo($x, $y)
166.969 ns (0 allocations: 0 bytes)
Beyond the specific cases discussed, reductions can be applied as long as the operation meets their requirements. To accommodate this generality, Julia provides the functions reduce and mapreduce.
The function reduce applies a binary operation directly to the elements of a collection, combining them into a single result. By contrast, mapreduce first transforms each element of the collection and then applies the reduction, thereby unifying the roles of map and reduce in a single step.
It's worth remarking that reductions for sums, products, maxima, and minima should still be implemented via their dedicated functions. Specifically, sum, prod, maximum, and minimum have been carefully optimized for their respective tasks, typically outperforming the general functions reduce and mapreduce. This is why reduce and mapreduce should be restricted to reductions not covered by specialized methods. [note] Several packages also provide their own versions of reduce and mapreduce that implement reductions in a specific way. For instance, the package Folds offers parallelized versions of both map and mapreduce, enabling the utilization of all available CPU cores. Its syntax is identical to that of Julia's built-in functions.
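As a sketch, and assuming the Folds package is installed, the parallel version of the log-sum reduction would mirror the built-in syntax:
using Folds
x = rand(100)
foo(x) = Folds.mapreduce(log, +, x)   # multithreaded counterpart of mapreduce
foo(x)                                # same output as mapreduce(log, +, x)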
The function reduce uses the syntax reduce(<function>, x), where <function> is a two-argument function. The following examples demonstrate its use.
x = rand(100)
foo(x) = reduce(+, x) #same output as sum(x)
@btime foo($x)
6.168 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(*, x) #same output as prod(x)
@btime foo($x)
6.176 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(max, x) #same output as maximum(x)
@btime foo($x)
167.905 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(min, x) #same output as minimum(x)
@btime foo($x)
167.440 ns (0 allocations: 0 bytes)
Note that all the examples presented could've been implemented as we did previously, by directly applying sum, prod, maximum, and minimum.
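For an operation without a dedicated reduction function, consider the greatest common divisor: gcd is binary and associative with identity element 0, making it a valid reduction (a hypothetical example, included only for illustration).
x = [12, 18, 24]
foo(x) = reduce(gcd, x)   # no specialized method covers this reduction
foo(x)                    # returns 6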
The function mapreduce combines the functions map and reduce: before applying the reduction, mapreduce transforms the vector via the function map. Recall that map(foo, x) transforms each element of the collection x by applying foo element-wise. Thus, mapreduce(<transformation>, <reduction>, x) first transforms x's elements through map, and then applies a reduction to the resulting output.
To illustrate its use, we employ a log transformation.
x = rand(100)
foo(x) = mapreduce(log, +, x) #same output as sum(log.(x))
@btime foo($x)
294.805 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = mapreduce(log, *, x) #same output as prod(log.(x))
@btime foo($x)
294.618 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = mapreduce(log, max, x) #same output as maximum(log.(x))
@btime foo($x)
579.808 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = mapreduce(log, min, x) #same output as minimum(log.(x))
@btime foo($x)
577.505 ns (0 allocations: 0 bytes)
Just like with reduce, note that the examples could've been implemented directly through the functions sum, prod, maximum, and minimum, as we did previously.
Moreover, mapreduce can be used with anonymous functions and functions with multiple arguments. Below, we illustrate these possibilities.
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], +, zip(x,y)) #same output as sum(x .* y)
@btime foo($x, $y)
29.165 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], *, zip(x,y)) #same output as prod(x .* y)
@btime foo($x, $y)
48.221 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], max, zip(x,y)) #same output as maximum(x .* y)
@btime foo($x, $y)
175.634 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], min, zip(x,y)) #same output as minimum(x .* y)
@btime foo($x, $y)
166.995 ns (0 allocations: 0 bytes)
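As an aside, mapreduce also accepts multiple collections directly, applying the transforming function element-wise across them; this sidesteps zip and tuple indexing. This form is a Base feature noted here for completeness, not used in the benchmarks above.
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(*, +, x, y)   # same output as sum(x .* y)
foo(x,y)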
mapreduce(<transformation>, <operator>, x) produces the same result as reduce(<operator>, map(<transformation>, x)). Despite this, mapreduce is preferred in scenarios where the vector input must be transformed first. The reason is that mapreduce avoids the internal memory allocation of the transformed vector, while map doesn't. This aspect is demonstrated below, where sum(2 .* x) is computed through a reduction.
x = rand(100)
foo(x) = mapreduce(a -> 2 * a, +, x)
@btime foo($x)
6.372 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(+, map(a -> 2 * a, x))
@btime foo($x)
43.476 ns (1 allocation: 896 bytes)
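Finally, note that both reduce and mapreduce accept an init keyword argument, which plays the role of the identity element discussed earlier by providing the accumulator's starting value. A minimal sketch:
x = rand(100)
foo(x) = mapreduce(log, +, x, init = 0.0)   # 0.0 is the identity element of +
foo(x)                                      # same output as sum(log.(x))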