7e. Functions: Type Inference and Multiple Dispatch

Introduction

In Julia, functions are not only a fundamental building block of programming, but also a key to unlocking high performance. This is by design, as Julia's functions have been engineered from the outset to generate efficient machine code. However, to fully unlock their potential, we must understand the underlying process of function calls.

Essentially, when a function is called, Julia attempts to identify concrete types for its variables and selects the most suitable method accordingly. At the heart of this process are three interconnected mechanisms: dispatch, compilation, and type inference. This section provides a detailed explanation of each.

Relevance of Functions

Beyond technical details, the main takeaway from this subsection is that wrapping operations in functions is a necessary condition for high performance. To understand why, we start by outlining the detrimental consequences of global variables.

Global variables are those defined outside a function's scope, whereas local variables are those defined within a function, including its arguments. The performance problem with global variables is that Julia treats them as potentially holding any value, and therefore any type. The reason is that, even if a global variable holds a particular value at a given moment, the user may reassign it at any point in the script.

The impact of global variables isn't limited to operations in the global scope: it also extends to functions when the variable is used without being passed as an argument. With this in mind, the conclusions that follow apply to both of the cases below.

x     = 2

y     = 3 * x

Output in REPL
julia>
y
6

x     = 2

foo() = 3 * x

Output in REPL
julia>
foo()
6

Recall that an expression like x = 2 is shorthand for x::Any = 2, reflecting that global variables default to Any unless they're type-annotated. Also remember that only concrete types can be instantiated, meaning that values can only adopt a concrete type. Consequently, x::Any shouldn't be understood as x having type Any, but rather as x being able to take any concrete type that is a subtype of Any. Since Any sits at the top of Julia's type hierarchy, this is equivalent to saying that x's type is unrestricted.

Working with such a variable prevents specialization of *, as Julia must account for every method that could apply, one for each possible concrete type of x. In practical terms, Julia generates code with multiple branches, possibly involving type checks, conversions, and object creations. The consequence is degraded performance.

Even if we had type-annotated x with a concrete type, as in x::Int64 = 2, the performance limitations wouldn't completely go away. The reason is that certain optimizations are only possible when the scope of variables is clearly delimited and their values are known locally. Only then does Julia gain a holistic view of all the operations to be performed, allowing for further optimizations.
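As a point of reference, the sketch below illustrates two common workarounds when a global variable can't be avoided: declaring it as const, or type-annotating it in the global scope (the latter requires Julia 1.8 or later). Both give the compiler a stable type to rely on, although wrapping operations in functions remains the preferred solution. The names a, b, foo, and goo are introduced here purely for illustration.

const a  = 2         # constant binding: both the value and the type are fixed
b::Int64 = 3         # typed global (Julia 1.8+): the type is fixed, the value may be reassigned

foo() = 3 * a        # can be specialized, since a's type (and value) is known
goo() = 3 * b        # can be specialized, since b's type is fixed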

Functions address all these considerations, which is achieved by executing a series of steps that we cover next.

Functions and Methods

A function is nothing but a name grouping an arbitrary number of methods. In turn, each method defines the function body for calls with a specific number and types of arguments. The list of methods associated with a function foo can be retrieved by executing methods(foo).

To illustrate these concepts, let's consider an example where we explicitly define methods of a function foo. This is achieved with the aid of the :: operator.

Methods

foo(a,b)                  = a + b
foo(a::String, b::String) = "This is $a and this is $b"

Output in REPL
julia>
foo(1,2)
3

julia>
foo("some text", "more text")
"This is some text and this is more text"

The code snippet defines the behavior of foo, which depends on the argument types at call time. Specifically, it specifies that the body of foo(a,b) applies to any combination of types for a and b, except when both arguments have type String, in which case the relevant function body is foo(a::String, b::String). This explains the different outputs obtained for each function call.

The example also reveals that methods don't have to share the same functionality. Obviously, grouping drastically different operations under the same function's name isn't recommended. However, the technique is valuable when used properly. For instance, it enables the adoption of more efficient algorithms, tailored to the specific types being considered.
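We can confirm that both definitions were registered under the same function by calling methods(foo), as mentioned above. A minimal sketch follows; the exact output format varies across Julia versions, so it's only described in the comments.

methods(foo)    # lists two methods: foo(a, b) and foo(a::String, b::String)
                # (the printed format and source locations depend on the Julia version)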

Furthermore, methods don't need to have the same number of arguments. For instance, defining the following methods for a function bar is possible.

Code

bar(x)       = x
bar(x, y)    = x + y
bar(x, y, z) = x + y + z

Output in REPL
julia>
bar(1)
1

julia>
bar(1, 2)
3

julia>
bar(1, 2, 3)
6

The feature is particularly relevant for extending a function's functionality. To illustrate this point, let's take the function sum. Until now, we've only used its simplest form, sum(x), which adds all the elements of a collection x. However, sum also supports another method sum(<function>, x), where the elements of x are transformed through <function>, before summing them.

Code

x = [2, 3, 4]

y = sum(x)          # 2 + 3 + 4
z = sum(log, x)     # log(2) + log(3) + log(4)
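As a small variation on the example above, the transformation can also be supplied as an anonymous function, which is handy when no named function performs the desired operation.

x = [2, 3, 4]

w = sum(a -> a^2, x)    # 2^2 + 3^2 + 4^2 = 29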

Multiple Dispatch

Given a function and its methods, let's now consider the process triggered when a function is called. To this end, suppose we define the following function foo.

foo(a, b) = 2 + a * b

Output in REPL
julia>
foo(1, 2)
4

foo(a::T, b::S) where {T,S} = 2 + a * b

Output in REPL
julia>
foo(1, 2)
4

The first tab defines the method foo(a,b), which takes arguments without any explicit type restrictions. Recall that untyped arguments are implicitly type-annotated with Any. Consequently, the function body holds for any combination of types of a and b. This can be observed more clearly in the second tab, which defines the same function but explicitly states the assumptions on types: as {T,S} is equivalent to {T<:Any, S<:Any}, a and b accept values of any type.

Method vs Method Instances

When foo(1,2) is executed, Julia employs a mechanism known as multiple dispatch, where it selects a specific method implementation based on the concrete types of a and b. It's worth remarking that this choice is exclusively based on the arguments' types, not their values.

To fully understand multiple dispatch, it's essential to distinguish between methods and method instances. The former defines the function body to be executed, based on the number and types of the arguments in the function call. A method instance, on the other hand, is a specific implementation of the method, compiled for a particular combination of concrete argument types.

The following diagram illustrates the difference, depicting the process that unfolds after running foo(1,2).

Multiple Dispatch

Specifically, foo(1,2) starts by inferring concrete types for a and b, which in this case are Int64. This information is then used to select the relevant function body to be called, which involves searching through all available methods of foo until a match for the concrete types of a and b is found. In our example, foo has only one method, foo(a,b) = 2 + a * b, which applies to all combinations of types of a and b that are subtypes of Any. Consequently, the relevant function body is 2 + a * b.
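The method selected by multiple dispatch can be inspected with the @which macro, available by default in the REPL through the InteractiveUtils standard library. The sketch below only indicates how it'd be invoked, since the printed source location depends on where the method was defined.

@which foo(1, 2)      # reports that the method foo(a, b) was selected
@which foo(1.0, 2)    # the same method applies, as a and b are unrestricted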

The operations to be performed are then forwarded to the compiler, which is in charge of their implementation. This entails choosing a method instance for arguments a and b with concrete type Int64. If such a method instance already exists, Julia will directly use it to compute foo(1,2). Otherwise, a new one will be created and stored for subsequent runs. This procedure is referred to as compilation, which we explain in detail next.

Compilation and Type Inference

Let's continue with the same example, but adding several function calls.

Code

foo(a, b) = 2 + a * b

Output in REPL
julia>
foo(1, 2)
4

julia>
foo(3, 2)
8

julia>
foo(3.0, 2)
8.0

If we've just started a new session, so that foo(1,2) is called for the first time, Julia won't find any method instance for foo(a::Int64, b::Int64). Consequently, it'll have to generate a new implementation for this particular combination of types. Compiling code is the reason a function's first call is slower than subsequent runs, a phenomenon commonly dubbed the Time To First Plot problem.

Note that this code is eventually stored, so that the computation of foo(3, 2) will directly invoke the method instance foo(a::Int64, b::Int64). Nonetheless, the call foo(3.0, 2) requires compiling a new method instance foo(a::Float64, b::Int64), resulting in a first-call latency again.
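This compilation latency can be observed with the @time macro, whose output includes the share of time spent compiling. The calls below are a sketch assuming a fresh session where foo has just been defined; the exact figures depend on the machine and Julia version, so they're omitted.

@time foo(1, 2)      # first call: the time includes compiling a method instance for (Int64, Int64)
@time foo(3, 2)      # reuses the stored method instance, so no compilation occurs
@time foo(3.0, 2)    # new type combination (Float64, Int64): triggers compilation again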

Type Inference

The compilation process follows a series of steps whose understanding is crucial for performance. To explain them, let's consider compilation for the method instance foo(a::Int64, b::Int64).

In this case, the compiler's task is to generate computer instructions to calculate 2 + a * b. This process begins with what's known as type inference, where the compiler attempts to identify concrete types of all variables and expressions involved. In our example, this means inferring concrete types for 2, a = 1, and b = 2.

Type inference establishes a crucial difference between executing operations in the global scope and through functions. In the former, Julia doesn't attempt to infer concrete types when it creates the code for computing the operations, thereby preventing specialization.

Since all variables are Int64 in the example considered, the compiler is capable of creating instructions specialized to variables of type Int64. This is the essence of type stability, where the compiler identifies unique concrete types and specialization is possible.
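One way to confirm that inference succeeds for this method instance is Base.return_types, which reports the return types the compiler infers for a given combination of argument types. A minimal sketch, assuming foo(a,b) = 2 + a * b as defined above:

Base.return_types(foo, (Int64, Int64))    # returns [Int64], a single concrete type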

Type stability is far from guaranteed, as the compiler may be unable to determine a single concrete type. When this is the case, it must consider multiple possibilities, generating code for each potential type. This lack of specialization can degrade performance severely. For instance, this is what occurs below.

Type Unstable

x      = [1, 2, "hello"]    # Vector{Any}

foo(x) = x[1] + x[2]        # type unstable

Output in REPL
julia>
foo(x)
3

In the example, the compiler can't identify a unique concrete type for x[1] and x[2]. Consequently, it can't specialize the operation +.
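Instabilities like this can be diagnosed with the @code_warntype macro, also provided by the InteractiveUtils standard library and available by default in the REPL. It prints the types inferred for each variable and flags those the compiler couldn't pin down. The sketch below only shows the invocation, as the output is verbose and version-dependent.

@code_warntype foo(x)    # the elements' types and the return type are flagged as Any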

Remarks on Type Inference

Below, we present several remarks about type inference that are worth keeping in mind.

Global Variables Inherit Their Global Type

Julia's attempt to identify concrete types is limited to local variables. In contrast, a global variable used inside a function keeps the type it has in the global scope. For instance, consider the following example.

a         = 2
b         = 1

foo(a)    = a * b

Output in REPL
julia>
foo(a)
2

a         = 2
b::Number = 1

foo(a)    = a * b

Output in REPL
julia>
foo(a)
2

In both tabs, b is a global variable that foo accesses without it being passed as an argument. Consequently, b's type is inferred as Any in the first tab, and as Number in the second.
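The standard fix is simply to pass b as a function argument, turning it into a local variable whose concrete type can be inferred. A minimal sketch:

a         = 2
b         = 1

foo(a, b) = a * b       # b is now an argument, hence a local variable with an inferred concrete type

Output in REPL
julia>
foo(a, b)
2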

Type-Annotating Function Arguments Is Not Necessary For Performance

Given the concepts discussed so far, it's clear that identifying concrete types is key for achieving performance. In fact, as we'll see in upcoming sections, type stability is directly related to this, as it refers to whether the concrete types within a function can be accurately predicted.

From this, it could be wrongly inferred that type-annotating function arguments is essential for performance (or, at least, that it could provide a boost). However, thanks to type inference, doing so is actually redundant. Furthermore, it may be counterproductive, as it unnecessarily constrains the function to specific types. To appreciate this loss of flexibility, compare the following scripts.

foo1(a::Float64, b::Float64) = a * b

Output in REPL
julia>
foo1(0.5, 2.0)
1.0

julia>
foo1(1, 2)
ERROR: MethodError: no method matching foo1(::Int64, ::Int64)

foo2(a, b) = a * b

Output in REPL
julia>
foo2(0.5, 2.0)
1.0

julia>
foo2(1, 2)
2

The function in the first tab only accepts arguments of type Float64. Note that even integer arguments are rejected, as function arguments aren't converted to a common type. By contrast, the function in the second tab performs the same operation for Float64 inputs, but additionally allows for other types. The flexibility stems from the implicit type annotation Any of the function arguments.
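If some restriction is nonetheless desired, a common middle ground is to annotate arguments with an abstract type such as Number. This keeps the function open to any numeric type without hurting performance, since specialization still occurs on the concrete types at call time. The name foo3 below is introduced here purely for illustration.

foo3(a::Number, b::Number) = a * b

foo3(0.5, 2.0)     # 1.0
foo3(1, 2)         # 2
foo3("a", "b")     # ERROR: MethodError, since String isn't a subtype of Number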

Packages Commonly Type-Annotate Function Arguments
When inspecting the code of packages, you may notice that function arguments are often type-annotated. The reason for this isn't related to performance, but rather to ensure the function's intended usage, safeguarding against inadvertent type mismatches.

For instance, suppose a function computes a theater's revenue as nr_tickets * price. In Julia, the operator * multiplies numbers, but it also concatenates strings. Therefore, without type annotations, the function can be misused. This is shown in the first tab, with the second tab addressing the issue by restricting the argument types.

revenue1(nr_tickets, price) = nr_tickets * price

Output in REPL
julia>
revenue1(3, 2)
6

julia>
revenue1("this is ", "allowed")
"this is allowed"

revenue2(nr_tickets::Int64, price::Number) = nr_tickets * price

Output in REPL
julia>
revenue2(3, 2)
6

julia>
revenue2("this is ", "not allowed")
ERROR: MethodError: no method matching revenue2(::String, ::String)