
8f. Gotchas for Type Stability

Code Script
This section's scripts are available here, under the name allCode.jl. They've been tested under Julia 1.10.0.

Introduction

This section considers various scenarios that cause type instabilities. We dub them "gotchas" because they aren't immediately obvious. We also propose suggestions for addressing them.

Gotcha 1: Integers and Floats

When defining functions, keep in mind that Int64 and Float64 are distinct types. Mixing them may inadvertently introduce type instability.

In the following example, the function foo takes a numeric variable x as its argument and performs two tasks. The first defines a variable y as a transformation of x in which negative values are replaced by zero. The second performs an operation based on the resulting y.

function foo(x)
    y = (x < 0) ?  0  :  x
    
    return [y * i for i in 1:100]
end

@code_warntype foo(1)      # type stable
@code_warntype foo(1.)     # type unstable
function foo(x)
    y = (x < 0) ?  zero(x)  :  x
    
    return [y * i for i in 1:100]
end

@code_warntype foo(1)      # type stable
@code_warntype foo(1.)     # type stable

As for the first tab, given that 0 has type Int64, no type instability arises if x is also an Int64. In contrast, if x is a Float64, the compiler must contemplate the possibility that y could be either an Int64 or a Float64. (A similar problem would occur if we replaced 0 with 0. and x were an integer.)

Julia handles combinations of Int64 and Float64 quite effectively. Therefore, this type instability wouldn't be a significant issue if the operation involving y used y only once. Indeed, @code_warntype would only issue a yellow warning, which could be safely ignored. However, in our example, foo performs an operation that repeatedly uses y, incurring the cost of the type instability multiple times. As a result, @code_warntype issues a red warning, indicating a more serious performance issue.

The second tab proposes a solution based on the function zero, which returns the zero element corresponding to the type of x. The approach isn't limited to zero and can be applied to any value: use either convert(typeof(x), <value>) or oftype(x, <value>) to convert <value> to the type of x.
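For instance, a minimal sketch of this more general approach (arbitrarily replacing negative values with 5 rather than 0):

function foo(x)
    y = (x < 0) ?  oftype(x, 5)  :  x       # or: convert(typeof(x), 5)

    return [y * i for i in 1:100]
end

@code_warntype foo(1)      # type stable
@code_warntype foo(1.)     # type stable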

Gotcha 2: Collections of Collections

Collections of collections emerge naturally in data analysis. For example, the DataFrames package, which is the standard tool for handling datasets, relies on this structure: it defines a table where each column corresponds to a variable. As we haven't introduced this package yet, in the following we consider a vector of vectors, which replicates the same structure.

For the demonstration, suppose we have a vector of vectors named data, along with an operation that depends on one of its inner vectors. The key feature giving rise to type instability is that data has type Vector{Vector}. This is actually a desirable feature in data analysis, as it makes it possible to hold inner vectors of different types (e.g., strings, numbers). However, it also means that data lacks information about the types of its inner vectors: Vector{Vector} only conveys that the elements held are vectors, without specifying the types of the elements those inner vectors contain. Consequently, any function that takes data as an argument and operates on an inner vector will be type unstable.
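To see this concretely, we can inspect the types in the REPL (using the same vectors as in the snippets below):

vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2]

typeof(data)          # Vector{Vector}: the element type is the abstract type Vector
typeof(data[2])       # Vector{Int64}: only known at runtime, not from data's type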

A straightforward solution is to introduce barrier functions that take inner vectors as their arguments. This approach corrects the type instability because the barrier function is compiled for the concrete type of the inner vector it receives.

In the following, we illustrate the approach by supposing that the operation is based on the inner vector vec2. The code snippets in both tabs pass data as the function's argument, with the second one additionally using a barrier function that takes vec2 as its argument. Note that the barrier function mutates its argument, so the values of vec2 (and hence of data) are updated.

vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2] 

function foo(data) 
    for i in eachindex(data[2])
        data[2][i] = 2 * i
    end
end

@code_warntype foo(data)            # type unstable
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2] 

foo(data) = operation!(data[2])

function operation!(x)
    for i in eachindex(x)
        x[i] = 2 * i
    end
end

@code_warntype foo(data)            # barrier-function solution

Gotcha 3: Barrier Functions

While barrier functions can mitigate type instabilities, it's essential to remember that the parent function remains type unstable. This implies that if we fail to resolve the type instability before executing a repeated operation, the performance cost of that instability will be incurred multiple times.

To illustrate this point, let's revisit the last example involving a vector of vectors. Below, we present two incorrect approaches to using a barrier function in this scenario, followed by a demonstration of its proper application.

vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2] 

operation(i) = (2 * i)

function foo(data) 
    for i in eachindex(data[2])
        data[2][i] = operation(i)
    end
end

@code_warntype foo(data)            # type unstable
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2] 

operation!(x,i) = (x[i] = 2 * i)

function foo(data) 
    for i in eachindex(data[2])
        operation!(data[2], i)
    end
end

@code_warntype foo(data)            # type unstable
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2] 

function operation!(x)
    for i in eachindex(x)
        x[i] = 2 * i
    end
end

foo(data) = operation!(data[2])

@code_warntype foo(data)            # barrier-function solution

Gotcha 4: Inference Is by Type, Not by Value

Type inference in a function is based exclusively on the types of its arguments. In particular, this entails that values play no role in the identification of types. To see the implications, let's consider the following concrete example.

Type Inference Is Not By Value
function foo(condition)
    y = condition ?  2.5  :  1
    
    return [y * i for i in 1:100]
end

@code_warntype foo(true)         # type unstable
@code_warntype foo(false)        # type unstable

At first glance, we might erroneously conclude that foo(true) should be type stable: the value of condition is true, so y = 2.5 and therefore y will have type Float64. However, the compiler specializes functions on the types of their arguments, not their values, so it ignores the value of condition when inferring y's type. Ultimately, y is treated as potentially being either an Int64 or a Float64, leading to type instability.

The issue in this example is reminiscent of the first gotcha we explored. Indeed, it can be resolved using similar techniques or, alternatively, by employing a barrier function, as sketched below. More importantly, analyzing the type instability from the perspective of the type-inference process reveals far-reaching consequences that extend beyond this specific example. In particular, it has deep implications for the use of tuples.
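As a quick illustration of the barrier-function fix (the helper name compute is ours):

compute(y) = [y * i for i in 1:100]     # barrier function, compiled for the concrete type of y

function foo(condition)
    y = condition ?  2.5  :  1

    return compute(y)                   # the runtime dispatch on y happens only once, here
end

@code_warntype foo(true)                # the repeated operation inside 'compute' is now type stable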

Identifying a type requires information on all the parameters that characterize it. The type Tuple, in particular, is defined by both the type of each element and the number of elements. For example, x = (1, 2.4) has type Tuple{Int64, Float64}, meaning that x is an object with 2 elements of types Int64 and Float64, respectively. This type differs from that of x = (1, 2.4, 3), which has 3 elements of types Int64, Float64, and Int64. In comparison, the type Vector specifies a single type shared by all elements and carries no information about the number of elements. For instance, both x = [1, 2.4] and x = [1, 2.4, 3] have the same type, Vector{Float64}.
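These differences can be checked directly in the REPL:

typeof((1, 2.4))        # Tuple{Int64, Float64}
typeof((1, 2.4, 3))     # Tuple{Int64, Float64, Int64}

typeof([1, 2.4])        # Vector{Float64}
typeof([1, 2.4, 3])     # Vector{Float64}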

One consequence of this is that, when we create a tuple from a vector, we lack sufficient information to identify the tuple's type. In particular, a tuple created from a Vector{Float64} successfully identifies the type of each element (Float64), but the number of elements remains unknown. Consequently, if we create a tuple inside a function from a vector, the function will be type unstable.

This can be observed in the following examples. The second tab shows that the issue isn't solved by merely passing the number of elements as an argument. The reason is that the compiler only sees the argument's type, not its value: it infers that the number of elements is an integer, but not which integer.

The easiest fix is to define the tuple outside the function and then pass it as an argument. This allows the compiler to gather all the information from the Tuple type.

x       = [1,2,3]


function foo(x)                         # 'Vector{Int64}' has no info on the number of elements
    tuple_x = Tuple(x)          

    2 .+ tuple_x
end

@code_warntype foo(x)                   # type unstable
x       = [1,2,3]


function foo(x, N)                      # The value of 'N' isn't considered, only its type
    tuple_x = NTuple{N, eltype(x)}(x)   

    2 .+ tuple_x
end

@code_warntype foo(x, length(x))        # type unstable
x       = [1,2,3]
tuple_x = Tuple(x)

function foo(x)


    2 .+ x
end

@code_warntype foo(tuple_x)             # type stable

Dispatch by Value

An alternative solution relies on a workaround that allows Julia to dispatch by value. This is achieved through the type Val, which makes it possible to pass information about values to the compiler. Specifically, for any function foo and value a that you want the compiler to know, you need to include ::Val{a} as an argument and then use Val(a) when calling foo.

The following example applies this approach by revisiting the cases studied above. For the implementation, you may find it useful to revisit the keyword where for types, introduced here.

In the first example, the type instability of foo emerged because the compiler didn't know the value of condition. Adding Val, along with the keyword where, allows us to pass the value of condition to the compiler. Note that where is necessary; without it, the compiler won't identify the specific value of condition.

function foo(condition)
    y = condition ?  2.5  :  1
    
    return [y * i for i in 1:100]
end

@code_warntype foo(true)         # type unstable
@code_warntype foo(false)        # type unstable
function foo(::Val{condition}) where condition
    y = condition ?  2.5  :  1
    
    return [y * i for i in 1:100]
end

@code_warntype foo(Val(true))    # type stable
@code_warntype foo(Val(false))   # type stable

Below, we apply the Val approach to the example with tuples.

x = [1,2,3]

function foo(x)
    tuple_x = Tuple(x)          

    2 .+ tuple_x
end

@code_warntype foo(x)                   # type unstable
x = [1,2,3]

function foo(x, ::Val{N}) where N
    tuple_x = NTuple{N, eltype(x)}(x)   

    2 .+ tuple_x    
end

@code_warntype foo(x, Val(length(x)))   # type stable

Gotcha 5: Variables as Default Values for Keyword Arguments

In functions, it's possible to use variables as the default values of keyword arguments. Nonetheless, these variables will be treated as globals, making the function type unstable.

As a best practice, it's recommended to avoid using variables as default values whenever possible. Instead, opt for concrete values. In cases where using a variable as a default value is necessary, there are several strategies you can deploy.

The first set of solutions acts on the variable used as a default value in its global scope: you can type-annotate the global variable or declare it as a const. Additionally, you can define a function that stores the default value, taking advantage of the fact that types are identified when a function is called. The second set of solutions is based on a local approach and requires type-annotating either the keyword argument or the default value. All these strategies are demonstrated below.

foo(; x) = x

β = 1
@code_warntype foo(x=β)         #type stable
foo(; x = 1) = x


@code_warntype foo()            #type stable
foo(; x = β) = x

β = 1
@code_warntype foo()            #type unstable

foo(; x = β) = x

const β = 1
@code_warntype foo()            #type stable

foo(; x = β) = x

β::Int64 = 1
@code_warntype foo()            #type stable

foo(; x = β()) = x

β() = 1
@code_warntype foo()            #type stable

foo(; x::Int64 = β) = x

β = 1
@code_warntype foo()            #type stable
foo(; x = β::Int64) = x

β = 1
@code_warntype foo()            #type stable

Positional Arguments as Keyword Arguments
It's possible to reuse positional arguments as default values for keyword arguments without compromising type stability.

Based on this, we can propose the following solution to the code above, where β is introduced as a positional argument.

Type Stable
foo(β; x = β) = x

β = 1
@code_warntype foo(β)            #type stable

Gotcha 6: Closures Can Easily Introduce Type Instabilities

Closures are created when a function is nested inside another function, granting the inner function access to the outer function's scope. They show up explicitly when we define a function within another function, but also implicitly when we use anonymous functions inside a function's body (e.g., as arguments to filter or map).

The biggest downside of closures is that they can easily introduce type instabilities. And, although there have been some improvements in this area, the issues have been around for several years. This is why it's crucial to be aware of their subtle consequences and how to address them.

Closures Are Common in Coding

Writing code with closures emerges naturally in various contexts. The approach allows us to write self-contained code, keeping all the steps of a task within a single function. One common scenario where closures arise naturally is when we need to prepare data (e.g., define parameters or initialize variables) and then compute an output based on that data. Below, we implement such a scenario with and without a closure. We do it generically, so that you can easily identify the pattern.

function task()    
        # <here, you define parameters and initialize variables>
       
    function output() 
        # <here, you do some computations with the variables and parameters>
    end

    return output()
end

task()

function task()
        # <here, you define parameters and initialize variables>
    
    return output(<variables>, <parameters>)
end

function output(<variables>, <parameters>)
        # <here, you do some computations with the variables and parameters>
end

task()
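To make the templates concrete, here's a minimal hypothetical instantiation of the closure-based version (the computation and names are arbitrary):

function task()
    β = 2.5                       # parameter
    x = collect(1:100)            # initialized variable

    function output()
        return sum(β * i for i in x)
    end

    return output()
end

task()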

While the approach with closures seems natural, it can introduce type instability if we:

  • redefine variables or arguments (e.g., when we update a variable while computing the output),

  • define inner functions in a particular order (with one calling another), or

  • use anonymous functions.

We explore these cases below, where we refer to the containing function as the outer function and the closure as the inner function.

When the Issue Arises

Before considering specific scenarios, let's start by examining three examples. They encompass all the situations where closures can make type instability emerge.

The first example reveals that the location of the inner function could matter.

function foo()
    x            = 1
    bar()        = x
    
    return bar()
end

@code_warntype foo()      # type stable
function foo()
    bar(x)       = x
    x            = 1    
    
    return bar(x)
end

@code_warntype foo()      # type stable
function foo()
    bar()        = x
    x            = 1
    
    return bar()
end

@code_warntype foo()      # type unstable
function foo()
    bar()::Int64 = x::Int64
    x::Int64     = 1       

    return bar()
end

@code_warntype foo()      # type unstable
function foo()    
    x = 1
    
    return bar(x)
end

bar(x) = x

@code_warntype foo()      # type stable

The second example shows that type instability arises if you use closures and simultaneously redefine variables or arguments. This holds even when you redefine a variable x with the same object, including the trivial expression x = x. The example also reveals that type-annotating either the redefined variable or the closure doesn't solve the problem.

function foo()
    x            = 1
    x            = 1            # or 'x = x', or 'x = 2'
    
    return x
end

@code_warntype foo()            # type stable
function foo()
    x            = 1
    x            = 1            # or 'x = x', or 'x = 2'
    bar(x)       = x
    
    return bar(x)
end

@code_warntype foo()            # type stable
function foo()
    x            = 1
    x            = 1            # or 'x = x', or 'x = 2'
    bar()        = x
        
    return bar()
end

@code_warntype foo()            # type unstable
function foo()
    x::Int64     = 1
    x            = 1
    bar()::Int64 = x::Int64
    
    return bar()
end

@code_warntype foo()            # type unstable
function foo()
    x::Int64     = 1
    bar()::Int64 = x::Int64
    x            = 1
    
    return bar()
end

@code_warntype foo()            # type unstable
function foo()
    bar()::Int64 = x::Int64
    x::Int64     = 1
    x            = 1
    
    return bar()
end

@code_warntype foo()            # type unstable
function foo()
    x            = 1
    x            = 1            # or 'x = x', or 'x = 2'    
        
    return bar(x)
end

bar(x) = x

@code_warntype foo()            # type stable

Finally, the last example highlights that the order in which you define closures can matter for type stability. The third tab, in particular, shows that you can avoid the issue by passing the closure being called as an argument. However, this approach is at odds with how Julia code is usually written, as we don't typically pass our own functions as arguments in this way.

function foo(x)
    closure1(x) = x
    closure2(x) = closure1(x)
    
    return closure2(x)
end

@code_warntype foo(1)            # type stable
function foo(x)
    closure2(x) = closure1(x)
    closure1(x) = x
    
    return closure2(x)
end

@code_warntype foo(1)            # type unstable
function foo(x)
    closure2(x, closure1) = closure1(x)
    closure1(x)           = x
    
    return closure2(x, closure1)
end

@code_warntype foo(1)            # type stable
function foo(x)
    closure2(x) = closure1(x)    
    
    return closure2(x)
end

closure1(x) = x

@code_warntype foo(1)            # type stable

Overall, the examples reveal a common pattern: the closure accesses variables or functions from the outer function's scope without receiving them as arguments. If we still want to employ closures, one generic solution is to pass everything the inner function uses as explicit arguments, including any functions it calls.

In the following, we'll examine specific implementations of these patterns. They reveal that the issue can occur more frequently than we might expect. For each scenario, we'll also provide alternative case-dependent solutions, allowing you to keep a closure-based approach. Nonetheless, if you want to stay on the safe side, the general guideline is to avoid closures whenever possible.

"But No One Writes Code like That"

i) Transforming Variables through Conditionals

A common pattern is to transform an argument through a conditional (e.g., to take its absolute value) and then use it inside a closure. The first tab below shows how this introduces type instability; the remaining tabs show ways around it.

x = [1,2]; β = 1

function foo(x, β)
    (β < 0) && (β = -β)         # transform 'β' to use its absolute value

    bar(x) = x * β

    return bar(x)
end

@code_warntype foo(x, β)        # type unstable
x = [1,2]; β = 1

function foo(x, β)
    (β < 0) && (β = -β)         # transform 'β' to use its absolute value

    bar(x,β) = x * β

    return bar(x,β)
end

@code_warntype foo(x, β)        # type stable
x = [1,2]; β = 1

function foo(x, β)
    δ = (β < 0) ? -β : β        # transform 'β' to use its absolute value    

    bar(x) = x * δ

    return bar(x)
end

@code_warntype foo(x, β)        # type stable
x = [1,2]; β = 1

function foo(x, β)
    β = abs(β)                  # 'δ = abs(β)' is preferable (you should avoid redefining variables) 

    bar(x) = x * β

    return bar(x)
end

@code_warntype foo(x, β)        # type stable

Recall that the compiler doesn't dispatch by value, so whether the condition holds is irrelevant. For instance, the type instability would still arise if we wrote 1 < 0 instead of β < 0. Moreover, the value used to redefine β doesn't matter for the issue to arise: the same conclusion holds if you write β = β.

ii) Anonymous Functions inside a Function

Using an anonymous function inside a function is a common form of closure. Accordingly, type instability also arises in the example above if we replace the inner function bar with an anonymous function. To demonstrate this, we apply filter with an anonymous function that keeps the values in x greater than β.

x = [1,2]; β = 1

function foo(x, β)
    (β < 0) && (β = -β)         # transform 'β' to use its absolute value
    
    filter(x -> x > β, x)       # keep elements greater than 'β'
end

@code_warntype foo(x, β)        # type unstable
x = [1,2]; β = 1

function foo(x, β)
    δ = (β < 0) ? -β : β        # define 'δ' as the absolute value of 'β'
    
    filter(x -> x > δ, x)       # keep elements greater than 'δ'
end

@code_warntype foo(x, β)        # type stable
x = [1,2]; β = 1

function foo(x, β)
    β = abs(β)                  # 'δ = abs(β)' is preferable (you should avoid redefining variables) 
    
    filter(x -> x > β, x)       # keep elements greater than β
end

@code_warntype foo(x, β)        # type stable

iii) Variable Updates

The same issue arises when a variable is updated (here, accumulated in a loop) before being captured by a closure.

function foo(x)
    β = 0                      # or 'β::Int64 = 0'
    for i in 1:10
        β = β + i              # equivalent to 'β += i'
    end

    bar() = x + β              # or 'bar(x) = x + β'

    return bar()
end

@code_warntype foo(1)          # type unstable
function foo(x)
    β = 0
    for i in 1:10
        β = β + i
    end

    bar(x,β) = x + β

    return bar(x,β)
end

@code_warntype foo(1)          # type stable

No Dispatch By Value
x = [1,2]; β = 1

function foo(x, β)
    (1 < 0) && (β = β)

    bar(x) = x * β

    return bar(x)
end

@code_warntype foo(x, β)        # type unstable

iv) The Order in Which You Define Functions Could Matter Inside a Function

To illustrate this claim, suppose you want to define a variable x that depends on a parameter β. However, there's a catch: β is measured in one unit (e.g., millimeters), while x requires β to be expressed in a different unit (e.g., centimeters). This means we must rescale β to the appropriate unit before defining x.

Depending on how we implement the operation, a type instability could emerge.

function foo(β)
    x(β)                  =  2 * rescale_parameter(β)
    rescale_parameter(β)  =  β / 10

    return x(β)
end

@code_warntype foo(1)      # type unstable
function foo(β)
    rescale_parameter(β)  =  β / 10
    x(β)                  =  2 * rescale_parameter(β)  
    
    return x(β)
end

@code_warntype foo(1)      # type stable