5f. Array Indexing

Introduction

Before mutating vectors, you first need to identify the specific elements to be modified. This process is known as vector indexing, and we've already covered basic indexing methods. Specifically, the methods presented were based on vectors and ranges (e.g., x[[1,2,3]] or x[1:3]). While these approaches are effective for simple selections, they fall short in more complex scenarios, such as selecting elements based on conditions.

This section expands our toolkit by introducing more advanced forms of indexing. The techniques primarily build on broadcasting and Boolean operations.

Logical Indexing

Logical indexing, also known as Boolean indexing or masking, enables the user to select elements based on a set of conditions. The functionality is implemented through a Boolean vector y of the same length as x that serves as a filter. Specifically, the expression x[y] will return a new vector, where true retains the element and false discards it. The following example illustrates its use.

Code

x = [1,2,3]
y = [true, false, true]

Output in REPL
julia>
x[y]
2-element Vector{Int64}: 1 3
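
Two properties of logical indexing are worth keeping in mind: x[y] creates a new vector rather than a view of x, and the Boolean vector must match the length of x. The snippet below is a minimal sketch of both, where the names selected, bad_mask, and mask_error are purely illustrative.

```julia
x = [1, 2, 3]
y = [true, false, true]

selected    = x[y]       # new vector holding the elements where `y` is true
selected[1] = 99         # mutating the new vector leaves `x` untouched

# a Boolean vector with a different length than `x` is rejected
bad_mask   = [true, false]
mask_error = try
    x[bad_mask]
catch err
    err                  # the error is stored instead of stopping the script
end
```

After running it, x still equals [1, 2, 3], while mask_error holds a BoundsError.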

Boolean Functions and Operators for Logical Indexing

The power of logical indexing stems from the possibility of broadcasting operators and functions to build Boolean vectors. This feature makes it possible to state complex conditional statements easily.

For instance, the following code snippets leverage broadcasting to select the elements of x that are lower than 10.

x = [1, 2, 3, 100, 200]

y = x[x .< 10]

Output in REPL
julia>
y
3-element Vector{Int64}: 1 2 3

x = [1, 2, 3, 100, 200]

condition(a) = (a < 10)             #function to eventually broadcast
y            = x[condition.(x)]

Output in REPL
julia>
y
3-element Vector{Int64}: 1 2 3
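
To make the intermediate step explicit, the sketch below stores the Boolean vector produced by each approach before using it as an index. The names mask_operator, mask_function, same_mask, and n_selected are illustrative only.

```julia
x = [1, 2, 3, 100, 200]

condition(a) = (a < 10)

mask_operator = x .< 10           # Boolean vector built by broadcasting the operator
mask_function = condition.(x)     # Boolean vector built by broadcasting the function

same_mask  = (mask_operator == mask_function)    # both approaches yield the same vector
n_selected = count(mask_operator)                # number of `true` entries

y = x[mask_operator]
```

Storing the Boolean vector in a variable can be convenient when the same condition is reused, for example to count how many elements satisfy it.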

The expression x .< 10 produces the same Boolean vector as condition.(x), making the two methods interchangeable. However, when dealing with multiple conditions, we must combine them through the logical operators && and ||. [Note: the logical operators && and || were introduced in the section about conditional statements.] This has direct implications for readability, as the use of broadcasting operators can result in verbose code due to the repeated dots in their syntax. This contrasts with the approach based on a function, which minimizes boilerplate code, as the following example shows.

x = [3, 6, 8, 100]

# numbers greater than 5, lower than 10, but not including 8
y = x[(x .> 5) .&& (x .< 10) .&& (x .≠ 8)]

Output in REPL
julia>
y
1-element Vector{Int64}: 6

x = [3, 6, 8, 100]

# numbers greater than 5, lower than 10, but not including 8
condition(a) = (a > 5) && (a < 10) && (a ≠ 8)           #function to eventually broadcast
y            = x[condition.(x)]

Output in REPL
julia>
y
1-element Vector{Int64}: 6

Note that, when the condition is stated directly through operators, all of them must be broadcast. This includes &&, which is a scalar operator and hence must be applied element-wise through .&&.
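
The same logic applies to ||. The sketch below shows a condition combined through .||, along with an alternative spelling of the previous example based on the bitwise operator &. To the best of our knowledge, .&& and .|| require Julia 1.7 or later, whereas .& and .| are also available in earlier versions. The variables y and z are illustrative.

```julia
x = [3, 6, 8, 100]

# numbers lower than 5 or greater than 50
y = x[(x .< 5) .|| (x .> 50)]

# numbers greater than 5, lower than 10, but not including 8
# parentheses are required, since `&` binds more tightly than comparisons
z = x[(x .> 5) .& (x .< 10) .& (x .≠ 8)]
```

Here, y contains 3 and 100, while z contains only 6.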

Logical Indexing Through "in" and "∈"

Remark
The symbols used in this section, ∈ and ∉, can be inserted through Tab Completion:

  • ∈ by \in

  • ∉ by \notin

To select elements of a vector through logical indexing, another approach involves the use of in and ∈. Each of these symbols is available as a function and an operator, and they determine whether an element belongs to a given set. [Note: in and ∈ were introduced in a previous section.]

When using these functions with broadcasting, there's an important caveat to consider. As introduced in the section on broadcasting over one argument only, we need to be mindful of how broadcasting behaves when the function takes multiple arguments.

To demonstrate their use, let's consider a vector x from which we want to select specific elements. Suppose in particular that our goal is to retain the numbers contained in set_numbers. If we use either in.(x, set_numbers) or x .∈ set_numbers, Julia will attempt to iterate simultaneously over pairs of elements from x and set_numbers, throwing an error if their lengths are incompatible. This isn't the desired operation. Rather, our goal is to check whether each element of x belongs to set_numbers. This requires treating set_numbers as a single object, which can be achieved via Ref(set_numbers).
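
The difference between the two behaviors can be sketched as follows. The vector same_size is a hypothetical set chosen to have the same length as x, so that the pairwise version runs without error and its result can be inspected.

```julia
x         = [1, 2, 3]
same_size = [3, 2, 1]                   # hypothetical set with the same length as `x`

pairwise   = in.(x, same_size)          # compares 1 with 3, 2 with 2, and 3 with 1
membership = in.(x, Ref(same_size))     # checks each element of `x` against the whole vector

# with vectors of different lengths, the pairwise version can't be computed at all
size_error = try
    in.([-100, 2, 4, 100], [-100, 100])
catch err
    err                                 # the error is stored instead of stopping the script
end
```

Only the second element of pairwise is true, even though every element of x belongs to same_size. In turn, size_error holds a DimensionMismatch.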

Considering this, the following example demonstrates how to properly use in and ∈. The scenario supposes that the aim is to create a vector y, which contains the minimum and maximum of the vector x.

x           = [-100, 2, 4, 100]
set_numbers = [minimum(x), maximum(x)]

# logical indexing (both versions are equivalent)
bool_indices = in.(x, Ref(set_numbers))    #`Ref(set_numbers)` can be replaced by `(set_numbers,)`
bool_indices = (∈).(x,Ref(set_numbers))

y            = x[bool_indices]

Output in REPL
julia>
bool_indices
4-element BitVector: 1 0 0 1

julia>
y
2-element Vector{Int64}: -100 100

x           = [-100, 2, 4, 100]
set_numbers = [minimum(x), maximum(x)]

# logical indexing
bool_indices = x .∈ Ref(set_numbers)       #only option in operator form, as `in` can't be broadcast as an operator


y            = x[bool_indices]

Output in REPL
julia>
bool_indices
4-element BitVector: 1 0 0 1

julia>
y
2-element Vector{Int64}: -100 100

Remark
The in function has an alternative "curried" version, which allows the user to broadcast in without relying on the expression Ref(set_numbers). Currying techniques were introduced in a previous section. With them, in is broadcast through in(set_numbers).(x), as in the example below.

Code

x           = [-100, 2, 4, 100]
set_numbers = [minimum(x), maximum(x)]

#logical indexing
bool_indices = in(set_numbers).(x)      #no need to use `Ref(set_numbers)`
y            = x[bool_indices]

Output in REPL
julia>
bool_indices
4-element BitVector: 1 0 0 1

julia>
y
2-element Vector{Int64}: -100 100

Remark
The functions and operators in and ∈ allow for negated versions, !in and ∉ (equivalent to !∈), which select elements not belonging to a set.

In the example below, these versions are applied to retain the elements of x that are neither its minimum nor its maximum.

x           = [-100, 2, 4, 100]
set_numbers = [minimum(x), maximum(x)]

#identical vectors for logical indexing
bool_indices = (!in).(x, Ref(set_numbers))
bool_indices = (∉).(x, Ref(set_numbers))          #or `(!∈).(x, Ref(set_numbers))`

Output in REPL
julia>
bool_indices
4-element BitVector: 0 1 1 0

julia>
x[bool_indices]
2-element Vector{Int64}: 2 4

x           = [-100, 2, 4, 100]
set_numbers = [minimum(x), maximum(x)]

#vector for logical indexing
bool_indices = x .∉ Ref(set_numbers)

Output in REPL
julia>
bool_indices
4-element BitVector: 0 1 1 0

julia>
x[bool_indices]
2-element Vector{Int64}: 2 4
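
The same Boolean vector can also be obtained by applying the negation elsewhere. The sketch below shows two possibilities: negating the result of in after broadcasting, and negating the curried version of in before broadcasting. The names bool_indices_a and bool_indices_b are illustrative.

```julia
x           = [-100, 2, 4, 100]
set_numbers = [minimum(x), maximum(x)]

# negate the Boolean vector obtained after broadcasting `in`
bool_indices_a = .!(in.(x, Ref(set_numbers)))

# negate the curried function before broadcasting it
bool_indices_b = (!in(set_numbers)).(x)

y = x[bool_indices_a]
```

Both vectors are identical to those produced by !in and ∉, so the choice is mainly a matter of readability.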

The Functions "findall" and "filter"

We close this section with two additional methods to select elements from a vector. They're provided by the functions findall and filter.

The function findall returns the indices of the elements of a vector x that satisfy a given condition. The condition can be embedded in two ways: through a function that returns a Boolean scalar, or by directly passing a Boolean vector. The following example illustrates both approaches.

x = [5,6,7,8,9]

y = findall(a -> a > 7, x)
z = x[findall(a -> a > 7, x)]

Output in REPL
julia>
y
2-element Vector{Int64}: 4 5

julia>
z
2-element Vector{Int64}: 8 9

x = [5,6,7,8,9]

y = findall(x .> 7)
z = x[findall(x .> 7)]

Output in REPL
julia>
y
2-element Vector{Int64}: 4 5

julia>
z
2-element Vector{Int64}: 8 9
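
As a brief sketch of how findall relates to logical indexing, the snippet below checks that indexing by the returned indices selects the same elements as indexing by the Boolean vector itself. It also covers the case where no element satisfies the condition. The names mask, indices, same_selection, and no_match are illustrative.

```julia
x = [5, 6, 7, 8, 9]

mask    = x .> 7
indices = findall(mask)                      # positions of the `true` entries

same_selection = (x[indices] == x[mask])     # both expressions select the same elements

# when no element satisfies the condition, the result is an empty vector
no_match = findall(a -> a > 100, x)
```

This suggests that findall is mostly useful when the positions themselves are needed, since x[mask] already suffices for selecting the elements.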

The other function is filter, which returns the elements of a vector x that satisfy a given condition. Despite what the name may suggest, filter retains these elements, rather than discarding them. Unlike findall, filter requires the condition to be passed as a function: it doesn't accept a Boolean vector. Apart from this, the main difference between the functions is that filter returns the actual elements that meet the condition, whereas findall returns their corresponding indices.

Function filter

x = [5,6,7,8,9]

y = filter(a -> a > 7, x)

Output in REPL
julia>
y
2-element Vector{Int64}: 8 9
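
To tie the methods of this section together, the sketch below applies one condition with multiple requirements through filter, logical indexing, and findall. The names y_filter, y_logical, and y_findall are illustrative.

```julia
x = [3, 6, 8, 100]

# numbers greater than 5, lower than 10, but not including 8
condition(a) = (a > 5) && (a < 10) && (a ≠ 8)

y_filter  = filter(condition, x)          # the function is passed without broadcasting
y_logical = x[condition.(x)]              # the function is broadcast to build a Boolean vector
y_findall = x[findall(condition, x)]      # indices first, then regular indexing
```

All three approaches return a vector containing only 6. Note that filter takes the function itself, so no dot is needed.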