
8d. Type Stability with Global Variables


Introduction

Variables can be categorized according to the code block in which they live. Specifically, global variables can be accessed and modified anywhere in the code, while local variables are only accessible within a specific scope. For the purpose of this section, local variables will refer exclusively to those existing in a function's scope, thus encompassing both function arguments and variables defined inside the function.

The distinction is relevant, as global variables are a common source of type instability. The issue arises because the type of an ordinary global variable can change at any point during execution, so the compiler can't assume a concrete type for it. As a result, the compiler has to account for multiple possibilities when processing computations, preventing the specialization of operations. The consequence is a computational slowdown.

The current section explores two approaches to working with global variables without incurring this issue: type annotations and constants. Defining a global variable as a constant is a natural choice when its value is truly fixed, as in the case of π = 3.14159. More broadly, constants can be used in any scenario where the variable remains unmodified throughout the script. One particular advantage of constants is that they're more performant than type-annotated global variables. The reason is that the compiler knows both the type and the value of a constant, allowing for further optimizations. Indeed, the behavior of a constant within a function is indistinguishable from that of a literal value, where literal values refer to values expressed directly in the code rather than through a variable (e.g., 1, "hello", or true).
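
To make this equivalence concrete, here is a small sketch (the names with_const and with_literal, and the values used, are illustrative rather than taken from the original examples). A function relying on a constant and one relying on a hardcoded literal typically compile down to the same code.

const c = 3                     # both the type and the value are known to the compiler

with_const()   = 2 * c          # relies on the constant
with_literal() = 2 * 3          # relies on literal values directly

with_const()                    # 6
with_literal()                  # 6

# In the REPL, @code_llvm with_const() and @code_llvm with_literal() typically
# show the same trivial body, with the result 6 computed at compile time
# (the exact output varies across Julia versions).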

Warning! - You Should Still Wrap Code in a Function
Even if you apply the fixes proposed for global variables, functions enable optimizations that aren't possible in the global scope. Consequently, you should still wrap tasks in functions for optimal performance.
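
To make the warning concrete, consider the sketch below (the names are arbitrary and timings are omitted). Even when the global variable is a constant, running the loop at the top level tends to be slower than wrapping the same task in a function, since the accumulator total is itself an untyped global.

const v = rand(1_000)

# task performed directly in the global scope
total = 0.0
for i in eachindex(v)
    global total += v[i]
end

# the same task wrapped in a function
function total_v()
    t = 0.0
    for i in eachindex(v)
        t += v[i]
    end

    return t
end

total_v()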

When Do We Use Global Variables?

Before exploring approaches for handling global variables, let's first identify scenarios where they're commonly used. The simplest case consists of operations performed in the global scope, which is presented in the first tab below. The second tab illustrates a more subtle case, where a function references a global variable.

To highlight the benefits of avoiding global variables, the final tab showcases a self-contained function as an alternative to the latter case. By definition, self-contained functions operate exclusively with locally defined variables, eliminating any dependence on global variables.

# all operations are type UNSTABLE (they're defined in the global scope)
x = 2

y = 2 * x 
z = log(y)
# foo reads the global variable x inside its body
x = 2

function foo() 
    y = 2 * x 
    z = log(y)

    return z
end

@code_warntype foo() # type UNSTABLE
# x is now passed as an argument, making foo self-contained
x = 2

function foo(x) 
    y = 2 * x 
    z = log(y)

    return z
end

@code_warntype foo(x)  # type stable

Beyond any performance gain, self-contained functions offer additional advantages. For one, they're easier to reason about, as they don't require tracking variables across the entire script to grasp their purpose. Moreover, by not depending on global variables, their output is independent of the script's state. These features allow self-contained functions to embody a specific task, giving them a clear, well-defined functionality, which in turn enables their reuse for similar tasks.

Achieving Type Stability With Global Variables

The previous example emphasized the benefits of self-contained functions, providing compelling reasons to avoid global variables. In Julia, global variables are further discouraged due to their detrimental impact on performance, as they introduce type instability.

While global variables pose issues, they can be convenient in certain scenarios. For instance, it'd be highly impractical to define constants locally in every single function. Considering this, we present two approaches that let us work with global variables, while addressing their performance penalty: declaring a global variable as a constant and annotating it with a concrete type. Next, we explore each separately.

Constant Global Variables

Declaring global variables as constants requires adding the const keyword before the variable's name. For instance, const x = 3. This approach can be applied to any type of variable, including collections.

const a = 5             # constant bound to a scalar
foo()   = 2 * a

@code_warntype foo()        # type stable
const b = [1, 2, 3]     # constant bound to a collection
foo()   = sum(b)

@code_warntype foo()        # type stable
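
One caveat, added here for completeness: const fixes the variable's binding and type, not the contents of a mutable collection. Reusing the constant b defined above, its elements can still be modified without compromising type stability.

b[1] = 10           # allowed: mutates the contents, the binding is unchanged
push!(b, 4)         # also allowed

foo() = sum(b)

@code_warntype foo()        # still type stable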

Warning!
Global variables should only be declared constants if their value will remain unchanged throughout the script. The possibility of redefining constants was introduced to facilitate testing during interactive use. In this way, users avoid the need to restart a Julia session for each new constant value. However, the practice assumes that all dependent functions are re-declared when the constant's value is modified: any function that isn't redefined will still rely on the constant's original value. Considering this, if you absolutely need to reassign the value of a constant, you should re-run the entire script.

To illustrate the potential consequences of overlooking this practice, let's compare the following code snippets, both of which execute the function foo. Each defines a constant x = 1 that is subsequently redefined as x = 2. The first snippet doesn't re-execute the definition of foo, so the value returned by foo is still based on x = 1. In contrast, the second snippet emulates the re-execution of the entire script by rerunning foo's definition, ensuring that foo uses the updated value of x.

# foo is NOT redefined after the constant is reassigned
const x = 1
foo()   = x
foo()             # it gives 1

x     = 2
foo()             # it still gives 1

# foo IS redefined after the constant is reassigned (emulating a full re-run)
const x = 1
foo()   = x
foo()             # it gives 1

x     = 2
foo() = x 
foo()             # it gives 2

Type-Annotating a Global Variable

The second approach to addressing type instability involves asserting a concrete type for a global variable. This is done by appending the :: operator, followed by a concrete type, to the variable's name (e.g., x::Int64).

x::Int64 = 5
foo()    = 2 * x

@code_warntype foo()     # type stable

y::Vector{Float64} = [1, 2, 3]
foo()              = sum(y)

@code_warntype foo()     # type stable

z::Vector{Number}  = [1, 2, 3]      # Number is an abstract element type
foo()              = sum(z)

@code_warntype foo()     # type UNSTABLE

Note that the annotation only resolves the type instability if it pins down concrete types: declaring a global variable with an abstract type, or with a container of abstractly typed elements as in Vector{Number} above, won't do.

Differences Between Approaches

The two approaches explored have distinct implications for both code behavior and performance. The key difference is that type-annotations only assert a variable's type, while constants additionally fix its value. We analyze each consequence in turn.

Differences in Code

Unlike constants, type-annotated global variables can be reassigned without unexpected consequences. This means you don't need to re-run the entire script when redefining the variable.

Redefining a Global Variable

x::Int64 = 5
foo()    = 2 * x
foo()               # output is 10

x        = 2
foo()               # output is 4 (foo picks up the new value without being redefined)
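
One nuance worth noting (the snippet below is an illustrative addition, not part of the original example): values assigned to a type-annotated global are converted to the declared type, and assignments that can't be converted raise an error.

x::Int64 = 5

x = 2.0             # accepted: converted to the Int64 value 2
x = 2.5             # ERROR: InexactError (2.5 can't be converted to an Int64)
x = "two"           # ERROR: a String can't be converted to an Int64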

Differences in Performance

While type-annotations provide more flexibility, they can also prevent certain optimizations achievable with constants. This is because the compiler treats a constant's value as fixed, so the constant behaves like a literal value embedded directly in the code.

The following code shows one scenario where this feature makes a difference. It involves repeating an operation whose result is fully determined once the global variable's value is known. From the compiler's point of view, a constant is equivalent to a hardcoded value, allowing it to replace the operation with its result. On the contrary, when a global variable is only type-annotated, the compiler can't assume that its value is fixed. Consequently, it creates a generic method instance that recomputes the operation in each iteration. This explains the difference in timing between the tabs.

using BenchmarkTools    # provides @btime

const k1  = 2

function foo()
    for _ in 1:100_000
       2^k1
    end
end
Output in REPL
julia> @btime foo()
  0.800 ns (0 allocations: 0 bytes)
k2::Int64 = 2

function foo()
    for _ in 1:100_000
       2^k2
    end
end
Output in REPL
julia> @btime foo()
  115.600 μs (0 allocations: 0 bytes)
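
To see where the timing gap comes from, the sketch below isolates the operation inside the loop (the names p1, p2, pow_const, and pow_typed are ours, not from the original code) and inspects the code generated for each version.

const p1  = 2
p2::Int64 = 2

pow_const() = 2^p1      # the value of p1 is known at compile time
pow_typed() = 2^p2      # only the type of p2 is known

# output omitted, as it varies across Julia versions
@code_llvm pow_const()      # typically reduces to returning the literal 4
@code_llvm pow_typed()      # reads p2 and computes the power at run time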

Remark
Don't infer from this example that repeated computation of an invariant term inside a for-loop is inevitable. Whether the term is evaluated once or on every iteration depends on how the operation is expressed.

To illustrate this, let's calculate the shares of x's elements, defined as each element's value relative to the sum of x's elements. A naive approach would involve a for-loop with sum(x) incorporated into the loop body, resulting in the repeated computation of sum(x). If, instead, we calculate the shares through x ./ sum(x), the term sum(x) is an argument of the broadcast and is therefore evaluated only once, before the element-wise division takes place. This achieves a speed similar to a version where sum(x) is stored in a constant.

x           = rand(100_000)


function foo(x)
    y    = similar(x)
    
    for i in eachindex(x,y)
        y[i] = x[i] / sum(x)    # sum(x) is recomputed on every iteration
    end

    return y
end
Output in REPL
julia> @btime foo($x)
  633.245 ms (2 allocations: 781.30 KiB)
x           = rand(100_000)


foo(x) = x ./ sum(x)    # sum(x) is evaluated once, before broadcasting
Output in REPL
julia> @btime foo($x)
  49.400 μs (2 allocations: 781.30 KiB)
x           = rand(100_000)
const sum_x = sum(x)

foo(x) = x ./ sum_x     # uses the precomputed constant sum_x
Output in REPL
julia> @btime foo($x)
  41.500 μs (2 allocations: 781.30 KiB)
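
As a final remark, a constant isn't strictly required here: storing sum(x) in a local variable keeps the function self-contained, in line with the recommendation from earlier in this section. A minimal sketch of this alternative:

x      = rand(100_000)

function foo(x)
    s = sum(x)          # computed a single time and stored locally

    return x ./ s
end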