pic
Personal
Website

8f. Type Stability with Tuples

PhD in Economics

Introduction

A function is considered type stable when, given the types of its arguments, the compiler can accurately predict single concrete types for its expressions. This definition, while universal, takes on different forms when applied to specific objects. So far, we've exclusively concentrated on scalars and vectors, whose conditions for type stability are relatively straightforward.

In this section, we begin the analysis of type stability for other data structures. In particular, we cover tuples, deferring functions and types themselves to subsequent sections. Guaranteeing type stability with tuples is more nuanced compared to vectors, as their type characterization demands more information. Its exploration will challenge our understanding of type stability, demanding a clear grasp of its definition and subtleties.

Warning! - Tuples Are Only Suitable For Small Collections
Remember that tuples should only be used for collections that comprise a few elements. Using them for large collections will result in significant performance degradation, or directly trigger fatal errors.

Features of Tuples

Tuples and vectors are the most ubiquitous forms of collections in Julia, with tuples playing a vital role for two reasons. Firstly, tuples are instrumental in optimizing performance, as they avoid the overhead of memory allocation. This is an aspect that we'll expand on when we analyze static vectors, which can be viewed as tuples with a vector-like representation. The second reason is that exploring tuples automatically encompasses the case of named tuples. These are just tuples with keys given by symbols instead of indices.

In comparison to vectors, tuples possess a more intricate type system. To appreciate this, let's compare the inherent characteristics of each. As for vectors, the necessary information to describe them is relatively minor, as they represent collections of elements sharing a uniform type. Moreover, since vectors can dynamically vary in size, their type doesn't demand information about the number of elements. For instance, the type Vector{Float64} conveys that all elements have type Float64, with the number of elements remaining indeterminate.

For their part, tuples are fixed-size collections that can accommodate heterogeneous types. This makes the type characterization more demanding, requiring the number of elements and the type of each individual element. For instance, consider the variable tup = ("hello", 1), which has type Tuple{String, Int64}. This explicitly indicates that the first element has type String and the second one Int64. Furthermore, it implicitly sets the number of elements to two, with no possibility of appending or removing elements.

The fact that the number of elements is part of the type is evident with tuples containing N elements of the same type T. In this case, Julia provides the convenient alias Ntuple{N, Float64}, which is just syntactic sugar for Tuple{T, T,...,T} where T appears N times. [note] Don't confuse NTuple as an abbreviation for the type NamedTuple.

In the following, we present scenarios where the choice between tuples and vectors have different consequences for type stability.

Tuple Elements Can Have Heterogeneous Types

The type characterizing provides information on each individual element's type. This deems type promotion unnecessary, as it occurs when we define vectors with heterogeneous types. The difference in behavior can be observed in the following example.

tup    = (1, 2, 3.5)                    # type is `Tuple{Int64, Int64, Float64}`

foo(x) = sum(x[1:2])

@code_warntype foo(tup)                 # type stable (output returned is `Int64`)
vector = [1, 2, 3.5]                    # type is `Vector{Float64}` (type promotion)

foo(x) = sum(x[1:2])

@code_warntype foo(vector)              # type stable (output returned is `Float64`)

A consequence of this is that type stability with tuples can be achieved even when its elements have dissimilar types.

tup    = (1, 2, "hello")                # type is `Tuple{Int64, Int64, String}`

foo(x) = sum(x[1:2])

@code_warntype foo(tup)                 # type stable (output is `Int64`)
vector = [1, 2, "hello"]                # type is `Vector{Any}`

foo(x) = sum(x[1:2])

@code_warntype foo(vector)              # type UNSTABLE

Vectors Contain Less Information than Tuples

In some scenarios, our task may demand converting tuples to vectors or vice versa. This conversion can pose challenges for type stability.

Specifically, when creating a vector from a tuple, type instability will arise if the tuple has elements with heterogeneous types. This occurs because vectors must have a uniform type for all their elements. Consequently, the heterogeneous types of the tuple must be encompassed through a supertype. The consequence is that Julia won't be able to specialize.

tup = (1, 2, "hello")         # `Tuple{Int64, Int64, String}`


function foo(tup)
    x = Vector(tup)           # 'x' has type `Vector(Any)}`

    sum(x)
end

@code_warntype foo(tup)       # type UNSTABLE
tup = (1, 2, 3)               # `Tuple{Int64, Int64, Int64}` or just `NTuple{3, Int64}`


function foo(tup)
    x = Vector(tup)           # 'x' has type `Vector(Int64)}`

    sum(x)
end

@code_warntype foo(tup)       # type UNSTABLE

For its part, creating a tuple from a vector will inevitably cause a type instability, regardless of the vector's characteristics. The reason is that vectors don't store information about the number of elements contained. Consequently, the compiler must treat tuples as having a variable number of arguments. This makes the function call type unstable, as each possible number of elements results in a different concrete type.

x   = [1, 2, "hello"]           # 'Vector{Any}' has no info on each individual type


function foo(x)
    tup = Tuple(x)              # 'tup' has type `Tuple`

    sum(tup[1:2])
end

@code_warntype foo(x)           # type UNSTABLE
x   = [1, 2, 3]                 # 'Vector{Int64}' has no info on the number of elements


function foo(x)
    tup = Tuple(x)              # 'tup' has type `Tuple{Vararg(Int64)}` (`Vararg` means "variable arguments")

    sum(tup[1:2])
end

@code_warntype foo(x)           # type UNSTABLE

Dispatch By Value

A key takeaway from the previous example is that defining tuples from vectors invariably introduce type instability. A simple remedy for this is to create the corresponding tuple outside the function, and then pass it as an argument. This is demonstrated in the code snippet below.

Tuple as Function Argument
x   = [1, 2, 3]
tup = Tuple(x)

foo(tup) = sum(tup[1:2])

@code_warntype foo(tup)         # type stable

This approach should be your first consideration when addressing the type instability. Nonetheless, there may be scenarios where defining the tuple inside the function is unavoidable. For these cases, there are a few alternatives we could consider.

Note first that simply passing the vector's number of elements as an argument doesn't solve the issue. The reason is that the compiler generates method instances based on information about types, not values. This means that a function argument like length(x) merely informs that the number of elements has type Int64, without providing any additional insight.

Instead, one effective solution is to define the tuple's length using a literal value, as demonstrated below.

x   = [1, 2, 3]

function foo(x)
    tup = NTuple{length(x), eltype(x)}(x)

    sum(tup)
end

@code_warntype foo(x)        # type UNSTABLE
x   = [1, 2, 3]

function foo(x)
    tup = NTuple{3, eltype(x)}(x)

    sum(tup)
end

@code_warntype foo(tup)       # type stable

The downside of this solution is that it defeats the purpose of having a generic function, as it restricts the function to tuples of a single predetermined size. In this context, to eliminate the type instability without constraining functionality, we need to introduce a more advanced solution. This is based on a technique known as dispatch by value. Since this approach is more complex to implement, textit{I recommend using it only when passing the tuple as a function argument is unfeasible}.

Next, we lay out the principles of dispatch by value, and then apply the technique to the specific case of tuples.

Definition

Dispatch by value enables users to pass information about values to the compiler. Implementing this feature, nonetheless, requires a workaround, since the compiler only gathers information about types. The hack consists of creating a type that stores values as type parameters. In the case of tuples, this type parameter is simply the vector's number of elements.

The functionality is achieved via the built-in type Val, whose syntax can be best explained with an example. Suppose a function foo and a value a that you wish the compiler to know. The technique requires defining foo with a type-annotated argument having no name, ::Val{a}. After this, you must call foo passing an argument Val(a), which instantiates a type with parameter a.

To illustrate the syntax, we revisit an example from previous sections. This considers a variable y that could be an Int64 or Float64, contingent upon a condition. The ambiguity of y's type is then transmitted to any subsequent operation, leading to type instability.

Dispatch by value is implemented by defining the condition as a type parameter of Val. In this way, the compiler will receive information about whether condition is true or false, and therefore know y's type. This makes it possible to specialize its operations.

function foo(condition)
    y = condition ? 1 : 0.5      # either `Int64` or `Float64`
    
    [y * i for i in 1:100]
end

@code_warntype foo(true)         # type UNSTABLE
@code_warntype foo(false)        # type UNSTABLE
function foo(::Val{condition}) where condition
    y = condition ? 1 : 0.5      # either `Int64` or `Float64`
    
    [y * i for i in 1:100]
end

@code_warntype foo(Val(true))    # type stable
@code_warntype foo(Val(false))   # type stable

Warning!
The function argument Val must be defined with {}, but called with (). This is because types define their parameters with {}, while instances of types require functions.

Dispatching by Value with Tuples

Let's now revisit the conversion of vectors to tuples. As we previously discussed, type instability arises because vectors don't store their size as part of their type information, leaving the compiler without sufficient information to determine the tuple's type.

Dispatch by value provides a solution to this issue: by passing the vector's length as a type parameter, the function call becomes type stable.

x = [1, 2, 3]

function foo(x, N)
    tuple_x = NTuple{N, eltype(x)}(x)   

    2 .+ tuple_x    
end

@code_warntype foo(x, length(x))        # type UNSTABLE
x = [1, 2, 3]

function foo(x, ::Val{N}) where N
    tuple_x = NTuple{N, eltype(x)}(x)   

    2 .+ tuple_x    
end

@code_warntype foo(x, Val(length(x)))   # type stable