All code scripts in this post are available in allCode.jl. They've been tested under Julia 1.11.3. Throughout, placeholders are written in angle brackets (e.g., <function> or <operator>). This is just notation, and the symbols < and > should not be misconstrued as Julia's syntax.
Unit | Acronym | Measure in Seconds |
---|---|---|
Seconds | s | 1 |
Milliseconds | ms | 10⁻³ |
Microseconds | μs | 10⁻⁶ |
Nanoseconds | ns | 10⁻⁹ |
Reductions are a powerful technique for operations that take collections as inputs and return a single element. They arise naturally when we need to compute summary statistics such as an average, variance, or maximum of a collection.
The reduction process works by iteratively applying an operation to pairs of elements, accumulating the results at each step until the final output is obtained. For example, to compute sum(x)
for a vector x
, a reduction would start by adding the first two elements, then add the third element to the total, and continue this cumulative process until all elements have been summed.
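This cumulative process can be traced by hand. The sketch below unrolls the steps of summing a small vector, mirroring the description above (the accumulator name acc is ours):

```julia
x = [1.0, 2.0, 3.0, 4.0]

acc = x[1] + x[2]   # start by adding the first two elements
acc = acc + x[3]    # fold in the third element
acc = acc + x[4]    # fold in the fourth element

acc == sum(x)       # true
```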
The method is particularly convenient when we need to transform a vector's elements prior to computing a summary statistic. By operating on scalars, reductions sidestep the need to materialize intermediate outputs, thereby reducing memory allocations. This means that if, for example, you have to compute sum(log.(x))
, a reduction would avoid the creation of the intermediate vector log.(x)
.
Reductions are typically implemented using a for-loop, with an operator applied to pairs of elements and the resulting output updated in each iteration. A classic example is the summation of all numeric elements in a vector. This involves applying the addition operator +
to pairs of elements, iteratively updating the accumulated sum. This is demonstrated below.
x = rand(100)
foo(x) = sum(x)
foo(x)
48.447
x = rand(100)
function foo(x)
output = 0.
for i in eachindex(x)
output = output + x[i]
end
return output
end
foo(x)
48.447
x = rand(100)
function foo(x)
output = 0.
for i in eachindex(x)
output += x[i]
end
return output
end
foo(x)
48.447
In reductions, it's common to see implementations like the one in the last tab, which are based on update operators. This means that an expression like x += a
is equivalent to x = x + a
.
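As a minimal illustration of update operators (the variable name is ours):

```julia
a = 3
a += 2   # equivalent to a = a + 2, so a is now 5
a *= 4   # update operators also exist for *, -, /, and others; a is now 20
```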
Reductions are implemented by applying either a binary operator or a two-argument function during each iteration. An example of a binary operator is +
, as we used above. However, we could have also used +
as a two-argument function, replacing output = output + x[i]
with output = +(output, x[i])
. The possibility of using functions broadens the scope of reductions. For instance, it allows us to compute the maximum value of a vector x
via the function max
, where max(a,b)
returns the maximum of the scalars a
and b
.
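To make the function-call form concrete, the sketch below (the name sum_fn is ours, not from the post) computes the sum using + applied as a two-argument function:

```julia
x = rand(100)

function sum_fn(x)
    output = 0.0
    for i in eachindex(x)
        output = +(output, x[i])   # + applied as a two-argument function
    end
    return output
end

sum_fn(x) ≈ sum(x)   # true
```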
Formally, a reduction requires the binary operation to satisfy two mathematical requirements:
Associativity: the way in which operations are grouped does not change the result. For example, addition is associative because (a + b) + c = a + (b + c)
.
Existence of an identity element: there exists an element that leaves any other element unchanged when the binary operation is applied. For example, the identity element of addition is 0 because a + 0 = a
.
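Both requirements are easy to check numerically for addition and maximum (the values chosen are arbitrary):

```julia
a, b, c = 2.0, 3.0, 5.0

(a + b) + c == a + (b + c)   # associativity of addition
a + 0 == a                   # 0 is the identity element of addition
max(a, -Inf) == a            # -Inf is the identity element of max
```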
The following list indicates the identity elements of each operation.
Operation | Identity Element |
---|---|
Sum | 0 |
Product | 1 |
Maximum | -Inf |
Minimum | Inf |
Identity elements are relevant because they provide the initial value of the iterative process. Based on these identity elements, we next implement reductions for several operations. The examples use the function foo1
to show the desired outcome, while foo2
provides the same output via a reduction.
x = rand(100)
foo1(x) = sum(x)
function foo2(x)
output = 0.
for i in eachindex(x)
output += x[i]
end
return output
end
x = rand(100)
foo1(x) = prod(x)
function foo2(x)
output = 1.
for i in eachindex(x)
output *= x[i]
end
return output
end
x = rand(100)
foo1(x) = maximum(x)
function foo2(x)
output = -Inf
for i in eachindex(x)
output = max(output, x[i])
end
return output
end
x = rand(100)
foo1(x) = minimum(x)
function foo2(x)
output = Inf
for i in eachindex(x)
output = min(output, x[i])
end
return output
end
We can also visually illustrate how reductions operate in these examples, as we do below.
One of the primary advantages of reductions is that they avoid the memory allocation of intermediate results. To illustrate this, consider the operation sum(log.(x))
for a vector x
. This operation involves transforming x
into log.(x)
and then summing the transformed elements. By default, broadcasting creates a new vector to store the result of log.(x)
, thereby allocating memory for it. However, we're only interested in the final sum and not the intermediate result itself. Therefore, an approach that bypasses the allocation of log.(x)
is beneficial. Reductions accomplish this by defining a scalar output
, which is iteratively updated by summing the transformed values of x
. [note] In the section Lazy Operations, we'll explore an alternative with broadcasting that doesn't materialize intermediate results either.
x = rand(100)
foo1(x) = sum(log.(x))
function foo2(x)
output = 0.
for i in eachindex(x)
output += log(x[i])
end
return output
end
@btime foo1($x)
315.584 ns (1 allocation: 896 bytes)
@btime foo2($x)
296.119 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = prod(log.(x))
function foo2(x)
output = 1.
for i in eachindex(x)
output *= log(x[i])
end
return output
end
@btime foo1($x)
311.840 ns (1 allocation: 896 bytes)
@btime foo2($x)
296.061 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = maximum(log.(x))
function foo2(x)
output = -Inf
for i in eachindex(x)
output = max(output, log(x[i]))
end
return output
end
@btime foo1($x)
482.602 ns (1 allocation: 896 bytes)
@btime foo2($x)
374.961 ns (0 allocations: 0 bytes)
x = rand(100)
foo1(x) = minimum(log.(x))
function foo2(x)
output = Inf
for i in eachindex(x)
output = min(output, log(x[i]))
end
return output
end
@btime foo1($x)
487.156 ns (1 allocation: 896 bytes)
@btime foo2($x)
368.502 ns (0 allocations: 0 bytes)
The previous examples implemented reductions through explicit for-loops. Unfortunately, this approach can compromise readability due to the verbosity of for-loops. To address this issue, Julia offers several streamlined alternatives for implementing reductions.
For common operations, Julia also implements map-reduce computations as additional methods of the functions sum
, prod
, maximum
, and minimum
. These built-in implementations are more efficient than mapreduce
, as they've been optimized for each respective case. Their syntax is given by foo(<transforming function>, x)
, where foo
is one of the functions mentioned and x
is the vector to be transformed. For instance, the following examples consider reductions for the transformed vector log.(x)
.
x = rand(100)
foo(x) = sum(log, x) #same output as sum(log.(x))
@btime foo($x)
294.889 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = prod(log, x) #same output as prod(log.(x))
@btime foo($x)
294.763 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = maximum(log, x) #same output as maximum(log.(x))
@btime foo($x)
579.940 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = minimum(log, x) #same output as minimum(log.(x))
@btime foo($x)
577.516 ns (0 allocations: 0 bytes)
While we've used the built-in function log
for transforming x
, the approach can also be employed with anonymous functions.
x = rand(100)
foo(x) = sum(a -> 2 * a, x) #same output as sum(2 .* x)
@btime foo($x)
6.493 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = prod(a -> 2 * a, x) #same output as prod(2 .* x)
@btime foo($x)
6.741 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = maximum(a -> 2 * a, x) #same output as maximum(2 .* x)
@btime foo($x)
172.547 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = minimum(a -> 2 * a, x) #same output as minimum(2 .* x)
@btime foo($x)
171.490 ns (0 allocations: 0 bytes)
Finally, all these functions accept transforming functions that require multiple arguments. To incorporate this possibility, it's necessary to enclose the multiple variables using zip
and refer to each variable through indices. We illustrate this below, where the transformation computes x .* y
.
x = rand(100); y = rand(100)
foo(x,y) = sum(a -> a[1] * a[2], zip(x,y)) #same output as sum(x .* y)
@btime foo($x, $y)
29.127 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = prod(a -> a[1] * a[2], zip(x,y)) #same output as prod(x .* y)
@btime foo($x, $y)
48.031 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = maximum(a -> a[1] * a[2], zip(x,y)) #same output as maximum(x .* y)
@btime foo($x, $y)
172.580 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = minimum(a -> a[1] * a[2], zip(x,y)) #same output as minimum(x .* y)
@btime foo($x, $y)
166.969 ns (0 allocations: 0 bytes)
Beyond the specific functions we've considered, we can implement any reduction whose operation satisfies the requirements above. This is done through the functions reduce
and mapreduce
. The difference between them is that reduce
applies the reduction directly, while mapreduce
transforms the collection's elements before doing so.
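As a quick sketch of the difference, assuming a vector x:

```julia
x = rand(100)

reduce(+, x)           # reduction only: same result as sum(x)
mapreduce(log, +, x)   # transform with log, then reduce: same as sum(log.(x))
```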
It's worth remarking that reductions with sum
, prod
, max
, and min
should still be done via the dedicated functions, since those have been optimized for their respective tasks. In this context, the primary use case of reduce
and mapreduce
is for other types of reductions not covered, or when packages provide their own implementations of these functions. [note] For instance, the package Folds
provides a parallelized version of both map
and mapreduce
, enabling the utilization of all available CPU cores. Its syntax is identical to Julia's built-in functions.
The function reduce
uses the syntax reduce(<function>, x)
, where <function>
is a two-argument function. The following examples demonstrate its use.
x = rand(100)
foo(x) = reduce(+, x) #same output as sum(x)
@btime foo($x)
6.168 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(*, x) #same output as prod(x)
@btime foo($x)
6.176 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(max, x) #same output as maximum(x)
@btime foo($x)
167.905 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(min, x) #same output as minimum(x)
@btime foo($x)
167.440 ns (0 allocations: 0 bytes)
Note that all the examples provided could've been implemented as we did previously, where we directly applied sum
, prod
, maximum
and minimum
.
The function mapreduce
combines the functions map
and reduce
: before applying the reduction, mapreduce
transforms vectors via the function map
. Recall that map(foo,x)
transforms each element of the collection x
by applying foo
element-wise. Thus, mapreduce(<transformation>,<reduction>,x)
first transforms x
's elements through map
, and then applies a reduction to the resulting output.
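This relation can be verified on a small vector (the values chosen are arbitrary):

```julia
x = [1.0, 4.0, 9.0]

map(sqrt, x)            # elementwise transformation: [1.0, 2.0, 3.0]
mapreduce(sqrt, +, x)   # 6.0, the same as sum(map(sqrt, x))
```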
To illustrate its use, we consider a log
transformation.
x = rand(100)
foo(x) = mapreduce(log, +, x) #same output as sum(log.(x))
@btime foo($x)
294.805 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = mapreduce(log, *, x) #same output as prod(log.(x))
@btime foo($x)
294.618 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = mapreduce(log, max, x) #same output as maximum(log.(x))
@btime foo($x)
579.808 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = mapreduce(log, min, x) #same output as minimum(log.(x))
@btime foo($x)
577.505 ns (0 allocations: 0 bytes)
Just like with reduce
, note that the examples could've been implemented directly as we did previously, through the functions sum
, prod
, maximum
, and minimum
.
mapreduce
can also be used with anonymous functions and with transformations requiring multiple arguments. We illustrate both below; the implementation mirrors what we did with sum
, prod
, maximum
, and minimum
.
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], +, zip(x,y)) #same output as sum(x .* y)
@btime foo($x, $y)
29.165 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], *, zip(x,y)) #same output as prod(x .* y)
@btime foo($x, $y)
48.221 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], max, zip(x,y)) #same output as maximum(x .* y)
@btime foo($x, $y)
175.634 ns (0 allocations: 0 bytes)
x = rand(100); y = rand(100)
foo(x,y) = mapreduce(a -> a[1] * a[2], min, zip(x,y)) #same output as minimum(x .* y)
@btime foo($x, $y)
166.995 ns (0 allocations: 0 bytes)
reduce
can be considered a special case of mapreduce
, in which the transformation applied is the identity function identity(x)
. Likewise, mapreduce(<transformation>,<operator>,x)
produces the same result as reduce(<operator>, map(<transformation>,x))
. However, mapreduce
is more efficient, since it avoids allocating the transformed vector. This is demonstrated below, where we compute sum(2 .* x)
through a reduction.
x = rand(100)
foo(x) = mapreduce(a -> 2 * a, +, x)
@btime foo($x)
6.372 ns (0 allocations: 0 bytes)
x = rand(100)
foo(x) = reduce(+, map(a -> 2 * a, x))
@btime foo($x)
43.476 ns (1 allocation: 896 bytes)