pic
Personal
Website

9f. Lazy Operations

PhD in Economics

Introduction

Computational approaches can be broadly classified into the categories "lazy" and "eager". Eager operations are executed immediately upon definition, providing instant access to the results. So far, most operations on this website have fallen under this category.

In contrast, Lazy operations represent a form of deferring computation. They define the code to be executed, but without calculating its output. The technique is particularly valuable for operations involving heavy intermediate computations, as lazy evaluation sidesteps unnecessary memory allocations: by deferring computations until needed, it's possible to make computations on the fly and fed them directly into the final calculation.

This section provides various implementations for lazy computations. The first approach presented is based on the so-called generators, which are the lazy analogous of array comprehensions. After this, we introduce several functions from the package Iterators, which provide lazy implementations of functions like map and filter.

Generators

Array comprehensions offer a convenient method for creating vectors. They define their elements using a loop-like syntax, which are immediately computed and stored. For their part, generators provide a lazy alternative, deferring the creation of elements until they're needed.

The syntax of generators is nearly identical to array comprehensions, with the exception that generators are enclosed in parentheses ( ) instead of square brackets [ ]. They also retain the same flexibility, including the ability to add conditions and iterate simultaneously over multiple collections.

x = [a for a in 1:10]

y = [a for a in 1:10 if a > 5]
Output in REPL
julia>
x
10-element Vector{Int64}:
  1
  2
  â‹®
  9
 10

julia>
y
5-element Vector{Int64}:
  6
  7
  8
  9
 10
x = (a for a in 1:10)

y = (a for a in 1:10 if a > 5)
Output in REPL
julia>
x
Base.Generator{UnitRange{Int64}, typeof(identity)}(identity, 1:10)

The examples show that array comprehensions compute all their elements at the moment of defining the vector, giving immediate access to it. In contrast, generators formally define an object with type Base.Generator, where operations are described, but no output is materialized.

Generators are particularly valuable for computing reductions of transformed values. By producing values on-demand and fusing them with the reduction function, generators eliminate the materialization of temporary vectors, thereby reducing memory allocations.

To illustrate the performance benefits this creates, let's compute the sum of a vector y's elements. In particular, y arises by transforming another vector x through doubling each element's value. One way to compute this operation is by first creating the vector y and then sum all its elements. Alternatively, a generator bypasses the storage of the intermediate output y and thus reduces memory allocations. By doing so, the transformation is fed directly into sum, allowing the compiler to perform the addition through a cumulative operation on scalars.

x = rand(100)

function foo(x)
    y = [a * 2 for a in x]                  # 1 allocation (same as y = x .* 2)
    
    sum(y)
end
Output in REPL
julia>
@btime foo($x)
  46.945 ns (1 allocation: 896 bytes)
x = rand(100)

function foo(x)
    y = (a * 2 for a in x)                  # 0 allocations
    
    sum(y)
end
Output in REPL
julia>
@btime foo($x)
  23.996 ns (0 allocations: 0 bytes)
x = rand(100)


foo(x) = sum(a * 2 for a in x)              # 0 allocations
Output in REPL
julia>
@btime foo($x)
  23.996 ns (0 allocations: 0 bytes)

The last tab shows that generators can be incorporated directly as a function argument, resulting in a compact syntax. Remarkably, this syntax is applicable to any function accepting a collection as its input.

Iterators

Iterators are formally defined as lazy objects that create values on demand sequentially. Throughout the website, we've already presented numerous scenarios involving iterators. One example is when ranges like 1:10 were employed, which represents a sequence of numbers that can be accessed one-by-one. Ranges are convenient for iterations in a for-loop, as they avoid the materialization of the collection to be iterated. Their lazy nature also explains why we must call the function collect to materialize ranges into vectors. Other examples of iterators are eachindex, enumerate, and zip.

x = 1:10
Output in REPL
julia>
x
1:10
x = collect(1:10)
Output in REPL
julia>
x
10-element Vector{Int64}:
  1
  2
  â‹®
  9
 10

The built-in package Iterators, which is "imported" every time we run a session in Julia, provides various functions for generating lazy sequences. Additionally, it implements lazy analogous of various functions such as Iterators.filter and Iterators.map, which are the lazy analogous of filter and map. [note] The package IterTools extends Iterators further.

The following example focus on these functions, showcasing how they're capable of reducing memory allocations of intermediate computations.

x = collect(1:100)

function foo(x)
    y = filter(a -> a > 50, x)              # 1 allocation 

    sum(y)
end
Output in REPL
julia>
@btime foo($x)
  53.163 ns (1 allocation: 896 bytes)
x = collect(1:100)

function foo(x)
    y = Iterators.filter(a -> a > 50, x)    # 0 allocations 

    sum(y)
end
Output in REPL
julia>
@btime foo($x)
  55.239 ns (0 allocations: 0 bytes)
x = rand(100)

function foo(x) 
    y = map(a -> a * 2, x)                  # 1 allocation

    sum(y)
end
Output in REPL
julia>
@btime foo($x)
  47.963 ns (1 allocation: 896 bytes)
x = rand(100)

function foo(x)
    y = Iterators.map(a -> a * 2, x)        # 0 allocations

    sum(y)
end
Output in REPL
julia>
@btime foo($x)
  23.972 ns (0 allocations: 0 bytes)