8d. Type Stability with Global Variables

Introduction

Variables can be categorized as local or global according to the code block in which they live. Specifically, global variables can be accessed and modified anywhere in the code, while local variables are only accessible within a specific scope. In the context of this section, the scope of interest is a function, so local variables will exclusively refer to function arguments and variables defined within the function.

The distinction is especially relevant for this chapter, since global variables are a common source of type instability. The issue arises because an untyped global variable can be reassigned at any time to a value of any type, so Julia can't assume a specific concrete type for it. As a result, the compiler is forced to consider multiple possibilities for any computation involving these variables. This prevents specialization, leading to reduced performance.
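
As a minimal sketch of why the compiler can't commit to a single type, the snippet below reassigns an untyped global variable to values of different types. The names r1 and r2 are purely illustrative.

```julia
x = 2            # the global currently holds an Int64
foo() = 2 * x

r1 = foo()       # 4, an Int64

x = 2.5          # the same global now holds a Float64
r2 = foo()       # 5.0, a Float64
```

The same call foo() returns an Int64 first and a Float64 later, which is why code compiled for foo has to handle any possible type of x.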

The current section explores two approaches to working with global variables: type annotations and constants. Defining global variables as constants is a natural choice when the values are truly fixed, such as π = 3.14159. More broadly, constants can be used in any scenario where the variable remains unmodified throughout the script. Compared to type annotations, constants offer better performance, as the compiler knows both the type and the value, rather than just the type. This allows for further optimizations, effectively making a constant within a function indistinguishable from a literal value. Here, literal values are values expressed directly in the code (e.g., 1, "hello", or true), in contrast to values coming from a variable.

Warning! - You Should Still Wrap Code in a Function
Even if you implement the fixes proposed for global variables, optimal performance still calls for wrapping tasks in functions. The reason is that the compiler applies optimizations to functions that aren't possible in the global scope.

When Are We Using Global Variables?

Before exploring approaches for handling global variables, let's first identify scenarios in which global variables arise. To this end, we present two cases. The first example presents the simplest scenario possible, where operations are performed directly in the global scope. The second example illustrates a more nuanced case, where a function accesses and operates on a global variable.

The third example serves as a counterpoint, implementing the same operations but within a self-contained function. By definition, self-contained functions exclusively operate with locally defined variables. Consequently, comparing the last two examples highlights the performance lost by relying on global variables.

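# Example 1: operations performed directly in the global scope
# all operations are type UNSTABLE (they're defined in the global scope)
x = 2

y = 2 * x
z = log(y)

# Example 2: function operating on a global variable
x = 2

function foo()
    y = 2 * x
    z = log(y)

    return z
end

@code_warntype foo() # type UNSTABLE

# Example 3: self-contained function
x = 2

function foo(x)
    y = 2 * x
    z = log(y)

    return z
end

@code_warntype foo(x)  # type stable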
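
As a complement to @code_warntype, the following sketch compares the return type that Julia infers for each version. The names foo_global and foo_local are hypothetical, and Base.return_types is an unexported helper, so treat this as an inspection aid rather than a stable API.

```julia
x = 2

# relies on the global variable `x`
function foo_global()
    y = 2 * x
    return log(y)
end

# self-contained: `x` is an argument
function foo_local(x)
    y = 2 * x
    return log(y)
end

# inferred return type of each method (assumes a single method per function)
inferred_global = first(Base.return_types(foo_global, Tuple{}))
inferred_local  = first(Base.return_types(foo_local, Tuple{Int64}))

println(inferred_global)   # expected to be the catch-all type Any
println(inferred_local)    # expected to be the concrete type Float64
```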

Self-contained functions offer advantages that extend beyond performance gains: they enhance readability, predictability, testability, and reusability. These benefits were briefly considered in a previous section, and stem from viewing functions as embodying a specific task.

Specifically, self-contained functions are easier to reason about, as understanding their logic doesn't require tracking variables across the entire script. Moreover, their output depends solely on their input parameters, not on the state of the script's global variables. This makes them more predictable, which also simplifies debugging. Finally, by acting as a standalone program with a clear, well-defined purpose, they can be reused for similar tasks, reducing code duplication and making code easier to maintain.
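
The predictability argument can be sketched with a small example. The names scale, rescale_global, and rescale_local are hypothetical and only serve to illustrate the point.

```julia
scale = 2

# depends on the global variable `scale`
rescale_global(v) = scale * v

# self-contained: everything it needs is passed as an argument
rescale_local(v, scale) = scale * v

a1 = rescale_global(10)      # 20
scale = 3
a2 = rescale_global(10)      # 30: same call, different result

b1 = rescale_local(10, 2)    # 20
b2 = rescale_local(10, 2)    # 20: same inputs, same result
```

The output of rescale_global depends on when it's called relative to changes in scale, whereas rescale_local can be understood and tested in isolation.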

Achieving Type Stability With Global Variables

The previous example emphasized the benefits of self-contained functions, providing compelling reasons to avoid global variables. Despite this, global variables can be highly convenient in certain scenarios, such as when we work with true constants. With this in mind, we next present two approaches that let us work with global variables while addressing their performance penalty.

Constant Global Variables

Declaring global variables as constants requires adding the const keyword before the variable's name. For instance, const x = 3. This approach can be applied to variables of any type, including collections.

# constant holding a scalar
const a = 5
foo()   = 2 * a

@code_warntype foo()        # type stable

# constant holding a collection
const b = [1, 2, 3]
foo()   = sum(b)

@code_warntype foo()        # type stable
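
When a constant holds a collection, what stays fixed is the binding (the variable keeps pointing to the same array), not the array's contents. The sketch below illustrates this with a hypothetical function total.

```julia
const b = [1, 2, 3]
total() = sum(b)

s1 = total()      # 6

push!(b, 4)       # mutates the contents; `b` still refers to the same array
b[1] = 10         # also allowed, for the same reason

s2 = total()      # 19, since `b` is now [10, 2, 3, 4]
```

Since the type of b can't change, total remains type stable, although the compiler can't treat the array's elements as known values.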

Warning!
Global variables should only be declared as constants if their value will remain unchanged throughout the script. The possibility of redefining constants was introduced to facilitate testing during interactive use. In this way, users avoid the need to restart a Julia session for each new constant value. However, the practice assumes that all dependent functions are re-declared when the constant's value is modified: any function that isn't redefined will still rely on the constant's original value. Considering this, if you absolutely need to reassign the value of a constant, you should re-run the entire script.

To illustrate the potential consequences of overlooking this practice, let's compare the following code snippets that execute the function foo. Both define a constant x = 1, which is subsequently redefined as x = 2. The first example runs the script without re-executing the definition of foo, in which case the value returned by foo is still based on x = 1. By contrast, the second example emulates the re-execution of the entire script. This is achieved by rerunning foo's definition, ensuring that foo uses the updated value of x.

# foo is NOT redefined after x changes
const x = 1
foo()   = x
foo()             # it gives 1

x       = 2
foo()             # it still gives 1

# foo is redefined after x changes
const x = 1
foo()   = x
foo()             # it gives 1

x       = 2
foo()   = x
foo()             # it gives 2
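
If a value genuinely needs to change, a possible alternative is to keep the binding constant and update its contents. This isn't one of the approaches covered in this section: the sketch below relies on Ref from Base, and the name x_ref is hypothetical.

```julia
const x_ref = Ref(1)     # constant container holding an Int64
foo() = x_ref[]          # read the value currently stored

v1 = foo()               # 1

x_ref[] = 2              # update the contents; the constant itself isn't redefined
v2 = foo()               # 2
```

Because only the contents change, foo doesn't need to be redefined. The compiler knows the type of the stored value but not the value itself, so the behavior is closer to that of a type-annotated global than to a true constant.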

Type-Annotating a Global Variable

The second approach to address type instability involves asserting a concrete type for a global variable. This is done by including the operator :: after the variable's name (e.g., x::Int64). Note that type annotations on global variables require Julia 1.8 or later.

x::Int64           = 5
foo()              = 2 * x

@code_warntype foo()     # type stable

y::Vector{Float64} = [1, 2, 3]
foo()              = sum(y)

@code_warntype foo()     # type stable

z::Vector{Number}  = [1, 2, 3]
foo()              = sum(z)

@code_warntype foo()     # type UNSTABLE

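Note that annotating a global variable with an abstract type won't resolve the type instability issue. The same applies to containers whose element type is abstract, such as Vector{Number} in the last example.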
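
To check whether a candidate annotation is concrete, a quick sketch is to query isconcretetype, including on the element type of containers. The variable names on the left-hand side are only illustrative.

```julia
scalar_concrete   = isconcretetype(Int64)                      # true
scalar_abstract   = isconcretetype(Number)                     # false

# for containers, the element type matters too
elements_concrete = isconcretetype(eltype(Vector{Float64}))    # true
elements_abstract = isconcretetype(eltype(Vector{Number}))     # false

# Vector{Number} itself counts as a concrete type, even though its elements don't
container_itself  = isconcretetype(Vector{Number})             # true
```

The last line shows why checking the container alone isn't enough: operations such as sum work on the elements, whose type remains unknown to the compiler.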

Differences Between Approaches

The two approaches explored have distinct implications for both code behavior and performance. The key difference is that type annotations assert a variable's type, while constants additionally fix its value. Next, we analyze each consequence.

Differences in Code

Unlike the case of constants, type annotations allow you to reassign a global variable without unexpected consequences. This means you don't need to re-run the entire script when redefining the variable.

x::Int64 = 5
foo()    = 2 * x
foo()               # output is 10

x        = 2        # reassign x, without redefining foo
foo()               # output is 4
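
A type annotation also constrains what can be assigned afterwards: new values are converted to the declared type, and values that can't be converted are rejected. A minimal sketch, assuming Julia 1.8 or later and using the hypothetical names counter, converted, and rejected:

```julia
counter::Int64 = 5

counter   = 2.0          # converted to the declared type Int64
converted = counter      # 2

# a value that can't be converted exactly raises an error
rejected = try
    global counter = 2.5
    false
catch err
    err isa InexactError
end
```

After the failed assignment, counter still holds its previous value, so functions relying on it keep working with an Int64.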

Differences in Performance

Type-annotated global variables are more flexible, as we only need to declare their types without committing to a specific value. However, this flexibility comes at the cost of performance, since it prevents certain optimizations that are feasible with constants. These optimizations are possible because constants convey not only their type, but also that their value is fixed. Within a function, this allows constants to behave like literal values, embedded directly in the code. Consequently, the compiler can potentially replace certain expressions with their result.

The following code demonstrates a scenario where this occurs. It consists of an operation that can be pre-calculated if the global variable's value is known. Declaring the global variable as a constant therefore enables the compiler to replace this operation with its result, making it equivalent to a hard-coded value. By contrast, merely type-annotating the global variable only allows the code to be specialized for the type provided. To make the effect stark, we call this operation in a for-loop.

using BenchmarkTools

# constant global variable
const k1  = 2

function foo()
    for _ in 1:100_000
       2^k1
    end
end

Output in REPL
julia> @btime foo()
  0.800 ns (0 allocations: 0 bytes)

# type-annotated global variable
k2::Int64 = 2

function foo()
    for _ in 1:100_000
       2^k2
    end
end

Output in REPL
julia> @btime foo()
  115.600 μs (0 allocations: 0 bytes)
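
One way to look into what the compiler does in each case is to print the optimized code with @code_typed. The sketch below uses the hypothetical names K_CONST, K_TYPED, pow_const, and pow_typed. The exact output varies across Julia versions, so no particular listing is assumed here.

```julia
using InteractiveUtils   # provides @code_typed outside the REPL

const K_CONST   = 2
K_TYPED::Int64  = 2

pow_const() = 2^K_CONST
pow_typed() = 2^K_TYPED

# with the constant, the listing may reduce to returning a precomputed value
println(@code_typed(pow_const()))

# with the type annotation, the listing should still read the global and compute the power
println(@code_typed(pow_typed()))
```

Both functions return the same value. What differs is how much of the work is done at compile time.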

Remark
Even without declaring a variable as a constant, an invariant operation isn't necessarily recomputed. To illustrate this, suppose we want to compute each element of x relative to the sum of the elements. A naive approach would involve a for-loop with sum(x) incorporated into the loop body, resulting in the repeated computation of sum(x). If, instead, we calculate shares through x ./ sum(x), then sum(x) is evaluated only once: it's an ordinary (non-dotted) function call, so its result is computed before the broadcasted division is applied to each element.

using BenchmarkTools

# sum(x) recomputed in every iteration
x           = rand(100_000)

function foo(x)
    y    = similar(x)

    for i in eachindex(x,y)
        y[i] = x[i] / sum(x)
    end

    return y
end

Output in REPL
julia> @btime foo($x)
  633.245 ms (2 allocations: 781.30 KiB)

# broadcasting, with sum(x) computed once
x           = rand(100_000)

foo(x) = x ./ sum(x)

Output in REPL
julia> @btime foo($x)
  49.400 μs (2 allocations: 781.30 KiB)

# sum stored in a constant global variable
x           = rand(100_000)
const sum_x = sum(x)

foo(x) = x ./ sum_x

Output in REPL
julia> @btime foo($x)
  41.500 μs (2 allocations: 781.30 KiB)
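
If an explicit for-loop is preferred, the repeated work can be avoided by computing the sum once before the loop. The following is a minimal sketch, where shares and total are hypothetical names.

```julia
function shares(x)
    total = sum(x)           # computed once, outside the loop
    y     = similar(x)

    for i in eachindex(x, y)
        y[i] = x[i] / total
    end

    return y
end

result = shares([1.0, 1.0, 2.0])    # [0.25, 0.25, 0.5]
```

This keeps the function self-contained, without requiring a global constant such as sum_x.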