7d. Preliminaries on Types

Introduction

High performance in Julia depends critically on the notion of type stability. Its definition is relatively straightforward: a function is type-stable when the types of its expressions can be inferred from the types of its arguments. When this property holds, Julia can specialize the computation method, resulting in significant performance gains.

Despite its simplicity, type stability is subject to various nuances, and a careful definition requires a solid foundation in two key areas: Julia's type system and the inner workings of functions. The current section equips you with the necessary knowledge to grasp the former, deferring the internals of functions to the next section. Moreover, we focus on scalars and vectors, leaving more complex objects for subsequent sections.

Before you continue, I recommend reviewing the basics of types introduced here.

Warning!
The subject is covered only to the extent necessary for understanding type stability. Julia's type system is indeed quite vast, and a comprehensive exploration would warrant a dedicated chapter.

Basics of Types

Variables in Julia are mere tags for objects. In turn, objects hold values described by types. The most common types for scalars are Float64 and Int64, with the vector counterparts being Vector{Float64} and Vector{Int64}. Recall that Vector is an alias for a one-dimensional array, so that a type like Vector{Float64} is equivalent to Array{Float64,1}.
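
As a minimal sketch of these statements, the snippet below inspects types with typeof. The variable names are illustrative, and the integer results assume a 64-bit system.

```julia
x = 1.5              # scalar holding a floating-point value
v = [1, 2, 3]        # vector of integers

scalar_type = typeof(x)      # Float64
vector_type = typeof(v)      # Vector{Int64} on a 64-bit system

# `Vector{T}` is an alias for `Array{T,1}`, so both names refer to the same type
same_type = Vector{Float64} === Array{Float64,1}    # true
```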

Int As an Alternative to Int64
You'll notice that packages tend to use Int as the default type for integers. The type Int is an alias that adapts to your CPU's architecture. Most modern computers are 64-bit systems, making Int equivalent to Int64. On 32-bit systems, Int becomes Int32.
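
A quick way to check which case applies on your machine is sketched below. It relies on the constant Sys.WORD_SIZE from Julia's standard library, and the variable names are illustrative.

```julia
word_size = Sys.WORD_SIZE     # 64 on most modern machines, 32 otherwise

# `Int` matches the integer type of the machine's word size
int_matches = Int === (word_size == 64 ? Int64 : Int32)    # true

literal_type = typeof(1)      # integer literals take the type `Int`
```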

Julia's type system is organized in a hierarchical way. This allows for the definition of subsets and supersets of types, which are called subtypes and supertypes in the context of types. For instance, the type Any is a supertype that includes all possible types in Julia, occupying the highest position in any type hierarchy. Another example of a supertype is Number, which encompasses all numeric types (Float64, Float32, Int64, etc.).

Note: Not every pair of types is related through a subtype-supertype hierarchy. For example, Float64 and Vector{String} exist independently, without a hierarchical relationship. This fact will become clearer when the concepts of abstract and concrete types are defined.

Supertypes provide great flexibility for writing code. They enable the grouping of values to define operations in common. For instance, defining + for the abstract type Number ensures its applicability to all number types, regardless of whether they are integers, floats, or their numerical precision.
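
The sketch below walks up the hierarchy with the built-in function supertype, and then defines an operation once for every numeric type. The function double is hypothetical and only serves as an illustration.

```julia
# walk up the hierarchy, starting from a concrete type
supertype(Float64)          # AbstractFloat
supertype(AbstractFloat)    # Real
supertype(Real)             # Number
supertype(Number)           # Any

# hypothetical function, defined once for all numeric types
double(x::Number) = 2 * x

double(3)      # 6
double(1.5)    # 3.0
```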

A special supertype known as Union will be instrumental for our examples. It represents variables that can potentially hold values with different types, and its syntax is Union{<type1>, <type2>, ...}. For example, a variable with type Union{Int64, Float64} could be either an Int64 or Float64. Note that, by definition, unions are always supertypes of their arguments.

Union of Types
Unions of types emerge organically in data analysis, in particular when handling empty entries, which are represented by the type Missing. For instance, if we load a column that contains both integers and empty entries, the resulting data will be stored with type Vector{Union{Missing,Int64}}.
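
A small sketch of this situation follows. The vector column stands in for a loaded data column and is purely illustrative. The types reported assume a 64-bit system.

```julia
column = [1, missing, 3]          # hypothetical column with one empty entry

column_type  = typeof(column)     # Vector{Union{Missing, Int64}}
element_type = eltype(column)     # Union{Missing, Int64}

# each individual entry still has a single concrete type
entry_types = [typeof(entry) for entry in column]    # Int64, Missing, Int64
```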

Abstract and Concrete Types

The hierarchical nature of types makes it possible to represent subtypes and supertypes as trees. This gives rise to the notions of abstract and concrete types.

An abstract type acts as a parent category, which necessarily breaks down into subtypes. The type Any in Julia is a prime example. In contrast, a concrete type represents an irreducible unit lacking subtypes, entailing that it's final.
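
These two notions can be checked directly, as sketched below. The functions isabstracttype and isconcretetype belong to Base, while subtypes comes from the standard library InteractiveUtils, which is loaded automatically in the REPL but not in scripts.

```julia
using InteractiveUtils    # provides `subtypes` outside the REPL

isabstracttype(Any)        # true
isabstracttype(Number)     # true
isconcretetype(Number)     # false
isconcretetype(Float64)    # true

has_children = !isempty(subtypes(Number))    # true: `Number` breaks down into subtypes
is_leaf      = isempty(subtypes(Float64))    # true: `Float64` has no subtypes to list
```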

The diagram below illustrates the difference between abstract and concrete types for scalars. The example is based on the hierarchy of the type Number. It's worth noting that the labels included match the corresponding type name in Julia.

Note: The Signed subtype of Integer allows for the representation of negative and positive integers. Julia also offers the type Unsigned, which only accepts non-negative integers and comprises subtypes such as UInt64 and UInt32.

Hierarchy of "Number"

Note: Dashed red borders indicate abstract types, while solid blue borders indicate concrete types.

For scalars, the distinction between abstract and concrete types is relatively straightforward. On the contrary, the difference for vectors is more subtle, as shown in the diagram below.

Hierarchy of Vectors

Note: Dashed red borders indicate abstract types, while solid blue borders indicate concrete types.

The tree reveals that Vector{T} for a given type T is a concrete type. By definition, this implies that variables can be instances of some Vector{T}. Moreover, Vector{T} can't have subtypes. Consequently, vectors like Vector{Int64} aren't a subtype of Vector{Any}, despite Int64 being a subtype of Any. This feature stands in stark contrast to scalars, where Any is an abstract type. However, it also perfectly aligns with the understanding of vectors as collections of homogeneous elements, in the sense of sharing the same type.
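
The contrast between scalars and vectors can be verified as sketched below. The snippet uses the operator <:, which tests whether one type is a subtype of another and is covered later in this section.

```julia
isconcretetype(Vector{Int64})    # true
isconcretetype(Vector{Any})      # true: `Vector{Any}` is concrete even though `Any` is abstract

Int64 <: Any                     # true
Vector{Int64} <: Vector{Any}     # false: both are concrete and unrelated
```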

Only Concrete Types Can Be Instantiated, Abstract Types Can't

In Julia, instantiation refers to the process of creating an object with a certain type. Importantly, only objects with concrete types can be instantiated, which entails that there can't be values with abstract types. This fundamental principle has implications for certain colloquial expressions we commonly use. For example, when we say that a variable has type Any, it actually means that the variable can assume any concrete type, as long as it's a subtype of Any.

This distinction will become crucial in what follows, particularly for type-annotating variables. It implies that declaring a variable with an abstract type amounts to restricting the set of possible concrete types, with the variable ultimately adopting one of these concrete types.
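
The following sketch illustrates the principle with a container whose elements are only restricted to Any. The vector items is illustrative, and the integer type reported assumes a 64-bit system.

```julia
items = Any[1, 2.5, "hello"]      # elements are only restricted to be subtypes of `Any`

# every value nonetheless has a concrete type
element_types = [typeof(item) for item in items]      # Int64, Float64, String
all_concrete  = all(isconcretetype, element_types)    # true
```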

Relevance for Type Stability

At this point, you may be wondering how all this relates to type stability. The answer is given by how Julia performs computations.

Given an operation, high performance requires specialization of the computation method. We'll see that this is unfeasible in the global scope, as Julia treats global variables as embodying any type. In contrast, when we wrap code in a function, Julia begins by identifying concrete types for each argument. With this information, it attempts to identify concrete types for all expressions within the function. When concrete types can indeed be identified, we say that the function is type stable, and Julia is able to specialize its method. Otherwise, if expressions could adopt various concrete types, performance is substantially degraded. This is because Julia is forced to consider multiple implementations, one for each possible type.

For scalars and vectors, this essentially means that expressions must ultimately operate over primitive types. Examples of numeric primitive types are integers and floating-point numbers, such as Int64, Float64, and Bool. Thus, operations like sum over Vector{Int64} or Vector{Float64} allow for specialization, while operating over Vector{Any} precludes it.
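
As a rough sketch of the idea, the snippet below asks Julia which type it infers for the output of two hypothetical functions, given an Int64 argument. It relies on Base.return_types, a reflection function that is not exported and is used here only for illustration. Dedicated tools for inspecting type stability are left for the sections on functions.

```julia
stable(x)   = x > 0 ? x : zero(x)    # the output always has the same type as `x`
unstable(x) = x > 0 ? x : 0.0        # the output is `Int64` or `Float64` when `x` is an `Int64`

# types inferred for the output when the argument is an `Int64`
stable_type   = first(Base.return_types(stable,   (Int64,)))
unstable_type = first(Base.return_types(unstable, (Int64,)))

isconcretetype(stable_type)      # true: a single concrete type was identified
isconcretetype(unstable_type)    # false: more than one concrete type is possible
```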

String Objects
While our focus is on numeric variables, the standard type String doesn't pose a problem for type stability. This is because String is internally represented as a collection of characters, which are represented by the primitive type Char.

The Operator <: to Identify Supertypes

The remainder of the section is devoted to operators and functions that handle types. Specifically, we'll present the operator <: to identify supertypes, along with several approaches to declaring a variable's type.

It's quite possible that you won't use any of the techniques presented. The reason is that, as we'll see, Julia identifies the types of variables passed to a function. Nevertheless, the operators introduced will be crucial to understand upcoming sections.

Use of <:

The symbol <: assesses whether a type T is a subtype of S. This can be employed as an operator T <: S or as a function <:(T,S). For example, Int64 <: Number or <:(Int64, Number) verifies whether Int64 is a subtype of Number, which would return true.

# all the statements below are `true`
Float64 <: Any
Int64   <: Number
Int64   <: Int64

# all the statements below are `false`
Float64 <: Vector{Any}
Int64   <: Vector{Number}
Int64   <: Vector{Int64}

The fact that Int64 <: Int64 evaluates to true illustrates a fundamental principle: every type is a subtype of itself. Moreover, in the case of concrete types, this is the only subtype, aside from the empty type Union{} that Julia treats as a subtype of every type.

The Keyword where

By combining <: with Union, you can also check if a type belongs to a set of types. For example, Int64 <: Union{Int64, Float64} evaluates whether Int64 equals Int64 or Float64, thus returning true.

The approach can be made more widely applicable by using the keyword where, along with a type parameter T that can take multiple values. The syntax is <type depending on T> where T <: <set of types>, where T can be replaced with any other name.

# all the statements below are `true`
Float64 <: Any
Int64   <: Union{Int64, Float64}
Int64   <: Union{T, String} where T <: Number       # `String` represents text

# all the statements below are `true`
Vector{Float64} <: Vector{T} where T <: Any
Vector{Int64}   <: Vector{T} where T <: Union{Int64, Float64}
Vector{Number}  <: Vector{T} where T <: Any

# all the statements below are `false`
Vector{Float64} <: Vector{Any}
Vector{Int64}   <: Vector{Union{Int64, Float64}}
Vector{Number}  <: Vector{Any}

# all the statements below are `true`
Vector{Float64} <: Vector{<:Any}
Vector{Int64}   <: Vector{<:Union{Int64, Float64}}
Vector{Number}  <: Vector{<:Any}

# all the statements below are `false`
Vector{Float64} <: Vector{Any}
Vector{Int64}   <: Vector{Union{Int64, Float64}}
Vector{Number}  <: Vector{Any}

Types constructed through parameters like T are known as parametric types. In the example above, they allow us to distinguish between a concrete type like Vector{Any} and a set of concrete types Vector{T} where T <: Any, where the latter encompasses Vector{Int64}, Vector{Float64}, Vector{String}, etc.
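
The distinction can be sketched as follows, where the names concrete_vector, vector_family, and candidates are chosen only for illustration.

```julia
concrete_vector = Vector{Any}                  # one specific concrete type
vector_family   = Vector{T} where T <: Any     # a set of types, one per choice of `T`

isconcretetype(concrete_vector)    # true
isconcretetype(vector_family)      # false

# each of these concrete types belongs to the set, but none is a subtype of `Vector{Any}`
candidates  = [Vector{Int64}, Vector{Float64}, Vector{String}]
in_family   = all(V <: vector_family for V in candidates)      # true
in_concrete = any(V <: concrete_vector for V in candidates)    # false
```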

Warning! - The Type Any
When we omit <: and simply write where T, Julia implicitly interprets the statement as where T <: Any.

# all the statements below are `true`
Float64       <: Any
Float64       <: T where T <: Any             # identical to the line above
Vector{Int64} <: Vector{T} where T <: Any

# all the statements below are `true`
Float64       <: Any
Float64       <: T where T                    # identical to the line above
Vector{Int64} <: Vector{T} where T

Type-Annotating Variables

We now indicate how to type-annotate a variable. The technique can be used to either assert a variable's type during an assignment or to restrict the types of function arguments.

There are two approaches to type-annotating variables. The first one relies on the binary operator :: and its syntax is x::<type>. The second approach leverages the Boolean binary operator <:, which must be combined with :: and the keyword where. Specifically, the syntax is x::T where T <: <type>, where T can be replaced with any other character.

Next, we illustrate both, by separately considering type-annotations for assignments and for function arguments.

Assignments

Let's start illustrating the approaches for scalar assignments. Each tab below declares an identical type for x and for y.

x::Int64               = 2      # only reassignments to `Int64` are possible

y::Number              = 2      # only reassignments to `Float64`, `Float32`, `Int64`, etc are possible

Output in REPL
julia> x = 2.5
ERROR: InexactError: Int64(2.5)

julia> y = 2.5
2.5

julia> y = "hello"
ERROR: MethodError: Cannot convert an object of type String to an object of type Number

x::T where T <: Int64  = 2      # only reassignments to `Int64` are possible

y::T where T <: Number = 2      # only reassignments to `Float64`, `Float32`, `Int64`, etc are possible

Output in REPL
julia> x = 2.5
ERROR: InexactError: Int64(2.5)

julia> y = 2.5
2.5

julia> y = "hello"
ERROR: MethodError: Cannot convert an object of type String to an object of type Number

Warning! - Modifying Types
Once you assert a type for x in an assignment, you can't modify x's type afterwards. The only way to fix this is by starting a new Julia session.

The fact that x has the same type across all tabs follows because T <: Int64 only includes Int64. This occurs as Int64 is a concrete type, which by definition has no subtypes other than itself. Considering this, it's common to directly assert scalar types through :: rather than <:.

On the contrary, the choice between :: and <: has different implications when a vector's type is asserted. The reason is that declaring Vector{Number} is quite different from Vector{T} where T <: Number. The former establishes that Vector{Number} is the only possible concrete type, while the latter establishes that the vector's element type is some subtype of Number, as in Vector{Int64} or Vector{Float64}.

x::Vector{Any}                 = [1,2,3]     # `x` will always be `Vector{Any}`

y::Vector{Number}              = [1,2,3]     # `y` will always be `Vector{Number}`

Output in REPL
julia> typeof(x)
Vector{Any} (alias for Array{Any, 1})

julia> typeof(y)
Vector{Number} (alias for Array{Number, 1})

x::Vector{T} where T <: Any    = [1,2,3]     # `x` can be reassigned to `Vector{Float64}`, `Vector{String}`, etc

y::Vector{T} where T <: Number = [1,2,3]     # `y` can be reassigned to `Vector{Float64}`, `Vector{Int64}`, etc

Output in REPL
julia> typeof(x)
Vector{Int64} (alias for Array{Int64, 1})

julia> typeof(y)
Vector{Int64} (alias for Array{Int64, 1})

The principles outlined apply even when a variable's type isn't explicitly annotated. The reason is that an assignment without :: implicitly annotates the variable with Any, where Any is the supertype that encompasses all possible types. Specifically, statements like x = 2 and x::Any = 2 are equivalent.

The same occurs when omitting <: from the expression where T, which implicitly takes T <: Any. Thus, for instance, x = 2 is equivalent to x::T where T = 2 and x::T where T <: Any = 2. Considering this, all the variables below restrict types in the same way.

# all are equivalent
a      = 2
b::Any = 2

# all are equivalent
a                   = 2
b::T where T        = 2
c::T where T <: Any = 2

The default restriction to Any is the reason why we can reassign variables with any value. For instance, given a = 1, executing a = "hello" afterwards is valid because a is implicitly type-annotated with Any.

Warning! - One-liner Statements Using `where`
Be careful with one-liner statements using where, especially when where T is shorthand for where T <: Any. These concise statements can easily lead to confusion.

a::T where T = 2                  # this is not `T = 2`, it's `a = 2`

a::T where {T}        = 2         # slightly less confusing notation
a::T where {T <: Any} = 2         # slightly less confusing notation

foo(x::T) where T = 2             # this is not `T = 2`, it's `foo(x) = 2`

foo(x::T) where {T}        = 2    # slightly less confusing notation
foo(x::T) where {T <: Any} = 2    # slightly less confusing notation

Functions

Function arguments can also be type-annotated. The examples below illustrate this, where the function only processes integers.

function foo1(x::Int64, y::Int64)
    x + y 
end

Output in REPL
julia> foo1(1, 2)
3

julia> foo1(1.5, 2)
ERROR: MethodError: no method matching foo1(::Float64, ::Int64)

function foo2(x::Vector{T}, y::Vector{T}) where T <: Int64
    x .+ y 
end

Output in REPL
julia> foo2([1,2], [3,4])
2-element Vector{Int64}:
 4
 6

julia> foo2([1,2], [3.0, 4.0])
ERROR: MethodError: no method matching foo2(::Vector{Int64}, ::Vector{Float64})

Note that employing the same parameter T for both arguments forces variables to share the same type. Moreover, types like Int64 preclude the use of Float64, even for numbers like 3.0. If you seek to stay flexible, a more suitable approach is to use an abstract type like Number and two different type parameters.

function foo2(x::T, y::T) where T <: Number
    x + y 
end

Output in REPL
julia> foo2(1.5, 2.0)
3.5

julia> foo2(1.5, 2)
ERROR: MethodError: no method matching foo2(::Float64, ::Int64)

function foo3(x::T, y::S) where {T <: Number, S <: Number} 
    x + y 
end

Output in REPL
julia> foo3(1.5, 2.0)
3.5

julia> foo3(1.5, 2)
3.5

In fact, the greatest flexibility is achieved when we don't type-annotate function arguments at all, as they will implicitly default to Any. This can be observed below, where all the tabs define the same function.

function foo(x, y)
    x + y 
end

function foo(x::Any, y::Any)
    x + y 
end

function foo(x::T, y::S) where {T <: Any, S <: Any} 
    x + y 
end

function foo(x::T, y::S) where {T, S} 
    x + y 
end
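
One way to confirm that these definitions coincide is to run them in sequence and count the methods of foo afterwards, as sketched below. Since all four share the same signature, each definition is expected to replace the previous one rather than add a new method.

```julia
function foo(x, y)
    x + y
end

function foo(x::Any, y::Any)
    x + y
end

function foo(x::T, y::S) where {T <: Any, S <: Any}
    x + y
end

function foo(x::T, y::S) where {T, S}
    x + y
end

number_of_methods = length(methods(foo))    # 1: each definition replaced the previous one

foo(1, 2.5)          # 3.5
foo([1, 2], [3, 4])  # works for any arguments that support `+`
```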

Creating Variables with Some Type

To conclude this section, we present a particular approach for defining variables. This replicates the values of another variable x, while constructing the object with a concrete type. The approach relies on the use of special functions called constructors, which create new instances of a concrete type. These functions are useful for transforming a variable x into another type, provided the transformation is possible.

Constructors are implemented by Type(x), where Type should be replaced with the literal name of the type (e.g., Vector{Float64}). Like any other function, it supports broadcasting.

x = 1

y = Float64(x)
z = Bool(x)

Output in REPL
julia> y
1.0

julia> z
true

x = [1, 2, 3]

y = Vector{Any}(x)

Output in REPL
julia> y
3-element Vector{Any}:
 1
 2
 3

x = [1, 2, 3]

y = Float64.(x)

Output in REPL
julia> y
3-element Vector{Float64}:
 1.0
 2.0
 3.0

Remark
We can use parametric types as constructors. Moreover, although abstract types can't be instantiated, some can still be used as constructors. In these cases, Julia will attempt to convert the object to a specific concrete type. As we show below, not all types can be used for this purpose.

x = 1

y = Number(x)

Output in REPL
julia> typeof(y)
Int64

x = [1, 2]

y = (Vector{T} where T)(x)

Output in REPL
julia> typeof(y)
Vector{Int64}

x = 1

z = Any(x)

Output in REPL
ERROR: MethodError: no constructors have been defined for Any

We can alternatively employ the function convert(T,x), which transforms x to the type T when possible.

x = 1

y = convert(Float64, x)
z = convert(Bool, x)

Output in REPL
julia> y
1.0

julia> z
true

x = [1, 2, 3]

y = convert(Vector{Any}, x)

Output in REPL
julia> y
3-element Vector{Any}:
 1
 2
 3

x = [1, 2, 3]

y = convert.(Float64, x)

Output in REPL
julia> y
3-element Vector{Float64}:
 1.0
 2.0
 3.0
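
Both constructors and convert only succeed when the value can be represented exactly in the target type. The sketch below illustrates this through a hypothetical helper try_convert, which is not part of Julia and simply returns nothing when the conversion throws an InexactError.

```julia
# hypothetical helper: returns the converted value, or `nothing` if the value
# can't be represented exactly in type `T`
function try_convert(T, x)
    try
        convert(T, x)
    catch err
        err isa InexactError ? nothing : rethrow()
    end
end

try_convert(Int64, 2.0)    # 2
try_convert(Int64, 2.5)    # nothing, since 2.5 has no exact integer representation
try_convert(Bool, 1)       # true
try_convert(Bool, 2)       # nothing, since only 0 and 1 map to a `Bool`
```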