6d. Useful Functions for Vectors

PhD in Economics

Introduction

When working with data in Julia, you'll often need to manipulate and analyze vectors. This section provides an overview of essential functions for conducting these tasks, including sorting, identifying unique elements, counting occurrences, and ranking data. Our ultimate goal is to apply these functions in a practical context, as we'll do in the next section.

Sorting Vectors

The sort function allows the user to arrange elements in a specific order. By default, it sorts elements in ascending order, but this can be easily reversed to a descending order by setting the keyword argument rev = true. The function comes in two variants: sort, which returns a new sorted copy while preserving the original vector, and the in-place version sort!, which directly updates the vector.

x = [4, 5, 3, 2]

y = sort(x)
Output in REPL
julia>
y
4-element Vector{Int64}:
 2
 3
 4
 5
x = [4, 5, 3, 2]

y = sort(x, rev=true)
Output in REPL
julia>
y
4-element Vector{Int64}:
 5
 4
 3
 2
x = [4, 5, 3, 2]

sort!(x)
Output in REPL
julia>
x
4-element Vector{Int64}:
 2
 3
 4
 5
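
To make the difference between the two variants concrete, the following minimal sketch checks that sort leaves its argument untouched, whereas sort! overwrites it. The variable name x_unchanged is introduced here purely for illustration.

```julia
x = [4, 5, 3, 2]

y           = sort(x)                  # new vector; 'x' keeps its original order
x_unchanged = (x == [4, 5, 3, 2])      # true

sort!(x, rev=true)                     # mutates 'x', which is now [5, 4, 3, 2]
```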

The sort function offers additional flexibility through the keyword argument by, which lets the ordering be determined by a transformation of the elements. Specifically, given a function foo, sort(x, by = foo) orders the elements of x according to the values of foo.(x), while still returning the original elements.

x      = [4, -5, 3]


y      = sort(x, by = abs)      # 'abs' computes the absolute value
Output in REPL
julia>
abs.(x)
3-element Vector{Int64}:
 4
 5
 3

julia>
y
3-element Vector{Int64}:
  3
  4
 -5
x      = [4, -5, 3]

foo(a) = a^2
y      = sort(x, by = foo)      # same as sort(x, by = x -> x^2)
Output in REPL
julia>
foo.(x)
3-element Vector{Int64}:
 16
 25
  9

julia>
y
3-element Vector{Int64}:
  3
  4
 -5
x      = [4, -5, 3]

foo(a) = -a
y      = sort(x, by = foo)      # same as sort(x, by = x -> -x)
Output in REPL
julia>
foo.(x)
3-element Vector{Int64}:
 -4
  5
 -3

julia>
y
3-element Vector{Int64}:
  4
  3
 -5
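
The keyword argument by is not restricted to numeric transformations, and it can be combined with rev. As a small sketch with made-up data, the snippet below sorts strings by their number of characters.

```julia
words = ["banana", "fig", "pear"]

by_length      = sort(words, by = length)               # ["fig", "pear", "banana"]
by_length_desc = sort(words, by = length, rev = true)   # ["banana", "pear", "fig"]
```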

Indices of Sorted Elements

While sort returns the ordered values of a vector, you may also be interested in the indices of the sorted elements. This is the role of sortperm, which returns the indices that arrange x in sorted order: its first entry is the index of the smallest element of x, its second entry is the index of the second smallest, and so on. In other words, x[sortperm(x)] == sort(x) is true. The name sortperm stands for "sorting permutation", reflecting that the function returns the permutation of indices that would sort the original vector.

x          = [1, 2, 3, 4]

sort_index = sortperm(x)
Output in REPL
julia>
sort_index
4-element Vector{Int64}:
 1
 2
 3
 4
x          = [3, 4, 5, 6]

sort_index = sortperm(x)
Output in REPL
julia>
sort_index
4-element Vector{Int64}:
 1
 2
 3
 4
x          = [1, 3, 4, 2]

sort_index = sortperm(x)
Output in REPL
julia>
sort_index
4-element Vector{Int64}:
 1
 4
 2
 3

In the first two examples, the elements are already in ascending order. As a result, sortperm returns the trivial permutation [1, 2, 3, 4], indicating that the original order is already sorted.

In contrast, the last example features an unordered vector x = [1, 3, 4, 2]. The resulting vector [1, 4, 2, 3] indicates that the smallest element 1 is at index 1, the second smallest 2 is at index 4, the third smallest 3 is at index 2, and the largest 4 is at index 3.
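
The reading of the last example can be verified directly. The sketch below indexes x with the permutation and compares the result against sort(x); the names sorted_values and same_as_sort are illustrative.

```julia
x          = [1, 3, 4, 2]
sort_index = sortperm(x)                   # [1, 4, 2, 3]

sorted_values = x[sort_index]              # elements of 'x' picked in sorted order
same_as_sort  = (sorted_values == sort(x)) # true
```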

Similar to sort, sortperm also allows retrieving the indices in descending order. This requires setting the rev keyword argument to true.

x          = [9, 3, 2, 1]

sort_index = sortperm(x, rev=true)
Output in REPL
julia>
sort_index
4-element Vector{Int64}:
 1
 2
 3
 4
x          = [9, 5, 3, 1]

sort_index = sortperm(x, rev=true)
Output in REPL
julia>
sort_index
4-element Vector{Int64}:
 1
 2
 3
 4
x          = [9, 3, 5, 1]

sort_index = sortperm(x, rev=true)
Output in REPL
julia>
sort_index
4-element Vector{Int64}:
 1
 3
 2
 4

The function sortperm also supports the keyword argument by, thus returning the indices of sorted elements based on transformed values.

x      = [4, -5, 3]


value  = sort(x, by = abs)      # 'abs' computes the absolute value
index  = sortperm(x, by = abs)
Output in REPL
julia>
abs.(x)
3-element Vector{Int64}:
 4
 5
 3

julia>
value
3-element Vector{Int64}:
  3
  4
 -5

julia>
index
3-element Vector{Int64}:
 3
 1
 2
x      = [4, -5, 3]

foo(a) = a^2
value  = sort(x, by = foo)      # same as sort(x, by = x -> x^2)
index  = sortperm(x, by = foo)
Output in REPL
julia>
foo.(x)
3-element Vector{Int64}:
 16
 25
  9

julia>
value
3-element Vector{Int64}:
  3
  4
 -5

julia>
index
3-element Vector{Int64}:
 3
 1
 2
x      = [4, -5, 3]

foo(a) = -a
value  = sort(x, by = foo)      # same as sort(x, by = x -> -x)
index  = sortperm(x, by = foo)
Output in REPL
julia>
foo.(x)
3-element Vector{Int64}:
 -4
  5
 -3

julia>
value
3-element Vector{Int64}:
  4
  3
 -5

julia>
index
3-element Vector{Int64}:
 1
 3
 2

An Example

One primary use case of sortperm is to order a variable based on the values of another variable. For example, suppose we want to assess the daily failures of a machine. Focusing on the first three days of the month, the following code snippet ranks these days by their corresponding failure counts.

Days Sorted By Lowest Number of Failures
days     = ["one", "two", "three"]
failures = [8, 2, 4]

index            = sortperm(failures)
days_by_failures = days[index]                     # days sorted by lowest failures
Output in REPL
julia>
index
3-element Vector{Int64}:
 2
 3
 1

julia>
days_by_failures
3-element Vector{String}:
 "two"
 "three"
 "one"

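The same index can reorder several vectors consistently. As a sketch built on the example above, the following variant ranks the days from most to fewest failures and reorders the counts along with them. The names days_ranked and failures_ranked are introduced only for this illustration.

```julia
days     = ["one", "two", "three"]
failures = [8, 2, 4]

index           = sortperm(failures, rev=true)   # [1, 3, 2]
days_ranked     = days[index]                    # ["one", "three", "two"]
failures_ranked = failures[index]                # [8, 4, 2]
```
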
Unique Vectors

The unique function eliminates duplicates from a vector, retaining only a single occurrence of each element. While unique returns a new vector, the in-place version unique! directly updates the original one.

x = [2, 2, 3, 4]

y = unique(x)       # returns a new vector
Output in REPL
julia>
x
4-element Vector{Int64}:
 2
 2
 3
 4

julia>
y
3-element Vector{Int64}:
 2
 3
 4
x = [2, 2, 3, 4]

unique!(x)          # mutates 'x'
Output in REPL
julia>
x
3-element Vector{Int64}:
 2
 3
 4
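
Note that unique keeps elements in the order in which they first appear, rather than sorting them. The sketch below illustrates this with a made-up vector, and additionally uses allunique from Base to test whether a vector contains duplicates.

```julia
x = [3, 1, 3, 2, 1]

y = unique(x)               # [3, 1, 2]: order of first appearance
z = sort(unique(x))         # [1, 2, 3]: sorted unique values

has_no_duplicates_x = allunique(x)   # false
has_no_duplicates_y = allunique(y)   # true
```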

The StatsBase package also offers a related function called countmap, which counts the occurrences of each element in a vector. Formally, it returns a dictionary where each unique element serves as a key and the corresponding value is the number of times that element appears. Since a Dict makes no guarantee about the order of its keys, you must apply the sort function if you want them sorted. Applying sort turns the ordinary dictionary into an object of type OrderedDict.

using StatsBase
x           = [6, 6, 0, 5]

y           = countmap(x)              # Dict with `element => occurrences`

elements    = collect(keys(y))
occurrences = collect(values(y))
Output in REPL
julia>
y
Dict{Int64, Int64} with 3 entries:
  0 => 1
  5 => 1
  6 => 2

julia>
elements
3-element Vector{Int64}:
 0
 5
 6

julia>
occurrences
3-element Vector{Int64}:
 1
 1
 2
using StatsBase
x           = [6, 6, 0, 5]

y           = sort(countmap(x))        # OrderedDict with `element => occurrences`

elements    = collect(keys(y))
occurrences = collect(values(y))
Output in REPL
julia>
y
OrderedCollections.OrderedDict{Int64, Int64} with 3 entries:
  0 => 1
  5 => 1
  6 => 2

julia>
elements
3-element Vector{Int64}:
 0
 5
 6

julia>
occurrences
3-element Vector{Int64}:
 1
 1
 2
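
To see what a count of occurrences amounts to, the following sketch reproduces the idea with Base Julia only. It is not how countmap is implemented, just a hand-rolled equivalent for this example, and the name counts is hypothetical.

```julia
x = [6, 6, 0, 5]

counts = Dict{Int, Int}()
for element in x
    counts[element] = get(counts, element, 0) + 1   # start from 0 if the key is new
end

elements    = sort(collect(keys(counts)))           # keys in ascending order
occurrences = [counts[k] for k in elements]         # counts aligned with 'elements'
```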

Rounding Numbers

Julia provides standard functions to approximate numerical values to a specific precision. They comprise the round, floor, and ceil functions.

  • round: rounds a number to the nearest integer.

  • floor: rounds a number down to the nearest integer.

  • ceil: rounds a number up to the nearest integer.

Below, we show that these functions have multiple methods. They allow the user to specify the output's type (e.g., Int64 or Float64), the number of decimals to be included through the keyword argument digits, and the significant digits.

x = 456.175

round(x)                         # 456.0   

round(x, digits=1)               # 456.2
round(x, digits=2)               # 456.18

round(Int, x)                    # 456

round(x, sigdigits=1)            # 500.0
round(x, sigdigits=2)            # 460.0
x = 456.175

floor(x)                         # 456.0

floor(x, digits=1)               # 456.1
floor(x, digits=2)               # 456.17

floor(Int, x)                    # 456

floor(x, sigdigits=1)            # 400.0
floor(x, sigdigits=2)            # 450.0
x = 456.175

ceil(x)                          # 457.0

ceil(x, digits=1)                # 456.2
ceil(x, digits=2)                # 456.18   

ceil(Int, x)                     # 457   

ceil(x, sigdigits=1)             # 500.0
ceil(x, sigdigits=2)             # 460.0
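
Two details are worth illustrating. First, by default round resolves exact ties by choosing the even integer, which can be overridden by passing a rounding mode such as RoundNearestTiesAway. Second, since these functions act on scalars, rounding a whole vector requires broadcasting. The values below are made up for the sketch.

```julia
a = round(2.5)                          # 2.0 (tie goes to the even integer)
b = round(3.5)                          # 4.0
c = round(2.5, RoundNearestTiesAway)    # 3.0 (tie rounded away from zero)

v = [1.26, 2.34, 3.5]

w = round.(v, digits=1)                 # [1.3, 2.3, 3.5]
k = round.(Int, v)                      # [1, 2, 4]
```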

Rankings

Instead of sorting a vector, you may be interested in determining the rank position of each element. The StatsBase package offers two functions for this purpose, competerank and ordinalrank. The main difference between them lies in how they handle identical elements. Specifically, competerank assigns the same rank to identical elements, while ordinalrank assigns different ranks to these elements. Both functions return a rank where 1 corresponds to the lowest value. If you prefer to invert the ranking, so that the highest value corresponds to a rank of 1, you can add the rev = true keyword argument.

using StatsBase
x = [6, 6, 0, 5]

y = competerank(x)
Output in REPL
julia>
y
4-element Vector{Int64}:
 3
 3
 1
 2
using StatsBase
x = [6, 6, 0, 5]

y = competerank(x, rev=true)
Output in REPL
julia>
y
4-element Vector{Int64}:
 1
 1
 4
 3
using StatsBase
x = [6, 6, 0, 5]

y = ordinalrank(x)
Output in REPL
julia>
y
4-element Vector{Int64}:
 3
 4
 1
 2
using StatsBase
x = [6, 6, 0, 5]

y = ordinalrank(x, rev=true)
Output in REPL
julia>
y
4-element Vector{Int64}:
 1
 2
 4
 3
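
To clarify how ties are treated under competition ranking, the sketch below recreates the ranks by hand using Base Julia only: each element's rank is one plus the number of strictly smaller elements, so equal values necessarily share a rank. The helper compete_sketch is hypothetical and written for illustration; it is not the StatsBase implementation.

```julia
x = [6, 6, 0, 5]

# hypothetical helper: rank = 1 + number of strictly smaller elements
compete_sketch(v) = [1 + count(b -> b < a, v) for a in v]

y = compete_sketch(x)       # [3, 3, 1, 2]
```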

Do not confuse ordinalrank and sortperm
For a vector x, ordinalrank(x)[i] is the position that x[i] occupies in the sorted vector, whereas sortperm(x)[i] is the index in the unsorted vector x of the element that occupies position i after sorting.

using StatsBase
x = [3, 1, 2]

y = ordinalrank(x)
Output in REPL
julia>
y
3-element Vector{Int64}:
 3
 1
 2
using StatsBase
x = [3, 1, 2]

y = sortperm(x)
Output in REPL
julia>
y
3-element Vector{Int64}:
 2
 3
 1
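
The two results are related as inverse permutations. The sketch below uses only Base Julia: invperm applied to sortperm(x) yields, for each element of x, its position after sorting, which for this example matches the ordinalrank output shown above. The names perm and ranks are illustrative.

```julia
x = [3, 1, 2]

perm  = sortperm(x)         # [2, 3, 1]: where each sorted value sits in 'x'
ranks = invperm(perm)       # [3, 1, 2]: where each element of 'x' lands after sorting

recovers_x = (sort(x)[ranks] == x)   # true
```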

Extrema (Maximum and Minimum)

We conclude by identifying the extrema in a vector, along with their corresponding indices. The following examples illustrate the application with the maximum, but analogous functions are available for the minimum.

x = [6, 6, 0, 5]

y = maximum(x)
Output in REPL
julia>
y
6
x = [6, 6, 0, 5]

y = argmax(x)
Output in REPL
julia>
y
1
x = [6, 6, 0, 5]

y = findmax(x)
Output in REPL
julia>
y
(6, 1)
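
For completeness, the analogous calls for the minimum are sketched below, along with extrema, which returns the minimum and maximum together as a tuple. The variable names are chosen only for this illustration.

```julia
x = [6, 6, 0, 5]

lowest       = minimum(x)   # 0
lowest_index = argmin(x)    # 3
both         = findmin(x)   # (0, 3)
min_and_max  = extrema(x)   # (0, 6)
```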

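Additionally, Julia provides the functions max and min, which respectively return the maximum and minimum of their arguments. Unlike maximum and minimum, which take a collection, max and min take the values to compare as separate arguments, making them particularly useful for comparing two values.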

Max Function
x = 3
y = 4

z = max(x,y)
Output in REPL
julia>
z
4
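
Since max and min compare their arguments directly, they can be broadcast to perform element-wise comparisons between vectors, or between a vector and a scalar. The following is a minimal sketch with made-up values.

```julia
x = [1, 5, 3]
y = [4, 2, 6]

pairwise = max.(x, y)                      # [4, 5, 6]: element-wise maximum
floored  = max.(x, 2)                      # [2, 5, 3]: values below 2 are replaced by 2
same     = (max(x...) == maximum(x))       # true: splatting 'x' matches 'maximum'
```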