3d. Variable Scope & Relevance of Functions

Introduction

Variable scope refers to the code block in which a variable is accessible. The concept allows us to distinguish between global variables, which are accessible in any part of the code, and local variables, which only exist within a specific code block. Because of scopes, the same name x can refer to different objects, depending on where it's called.

When it comes to functions, Julia adheres to specific rules for variable scope. Specifically, given a variable x defined outside a function:

  • if a new variable x is defined inside a function or is passed to the function as an argument, then x is considered local to that function. Moreover, any reference to x within the function refers to the local variable, without any relation to the variable x defined outside the function;

  • if the function uses x, but x is neither defined inside the function nor passed as an argument, then x refers to the variable defined outside the function (i.e., the global variable).

The rules can be more effectively understood by illustrating them, as we do next.

Global and Local Variables

A variable that is local to a function only exists within the function's scope. Consequently, if we attempt to reference it outside the function, Julia will indicate that the variable doesn't exist.

Variables local to a function encompass: i) the function arguments, and ii) variables defined in the function body. Any other variable appearing in a function necessarily refers to a global variable.

Identifying local variables in a function is crucial, as a local variable may share the same name as a global variable without them being related. The distinction between global and local variables is easier to grasp through the following examples.

x = "hello"

function foo(x)                # 'x' is local, unrelated to 'x = hello' above
    y = x + 2                  # 'y' is local, 'x' refers to the function argument 
    
    return x,y
end

Output in REPL
julia> foo(1)
(1, 3)          # (local x, local y)

julia> x
"hello"

julia> y
ERROR: UndefVarError: y not defined

z = 2

function foo(x)                 
    y = x + z                   # 'x' refers to the function argument, 'z' refers to the global

    return x,y,z
end

Output in REPL
julia> foo(1)
(1, 3, 2)       # (local x, local y, global z)

julia> x
ERROR: UndefVarError: x not defined

julia> z
2
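The examples above cover function arguments and globals that are only read. The remaining case from the rules, a variable assigned inside the function body, can be sketched as follows. The function name `shadow` and the values are hypothetical, chosen only for illustration.

```julia
x = 10                 # global variable

function shadow()
    x = 1              # assigning inside the function creates a new local 'x'
    return x
end

result = shadow()      # the local 'x' is returned; the global 'x' is not modified
```

After running this, `result` holds the local value, while the global `x` keeps the value it had before the call.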

The Role of Functions

A function should be understood as a self-contained mini-program that represents a specific task. Under this interpretation, local variables simply act as labels that help articulate the mechanics of the task. Indeed, variables local to a function are inaccessible outside of it. (Local variables play a similar role to integration variables in math. Formally, \(t\) in \(\int f\left(t\right)\,\mathrm{d}t\) for some function \(f\) is just a symbol indicating the variable over which we're integrating. This is why the integral can be equivalently expressed through \(x\), as \(\int f\left(x\right)\,\mathrm{d}x\), or through any other label in place of \(t\).)

To make this interpretation concrete, consider a variable x and another variable y computed by transforming x through a function f. Formally, y = f(x), where for simplicity we assume that the transformation doubles x, so that y = 2 * x. The following are three approaches to computing y.

Approach 1

x = 3

double() = 2 * x
y        = double()

Approach 2

x = 3

double(x) = 2 * x
y         = double(x)

Approach 3

x = 3

double(🐒) = 2 * 🐒
y          = double(x)

The function in Approach 1 utilizes the global variable x. This practice is highly discouraged for several reasons. First, it prevents the function from being reused, as it's specifically designed to double the global variable x, rather than acting as a mini-program that doubles any variable.

Second, the inclusion of the global variable x compromises the function's self-containment, as the function's output depends on the value of x at the moment of execution. If you work on a long project, this feature makes the code more prone to bugs.
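A minimal sketch of this dependence, reusing the function from Approach 1. The names `first_call` and `second_call` are hypothetical and only serve the illustration.

```julia
x = 3
double() = 2 * x           # reads the global 'x' each time it's called

first_call  = double()     # computed with x = 3

x = 10                     # the global is reassigned somewhere else in the code
second_call = double()     # same call, but now computed with x = 10
```

The two calls look identical, yet they return different values. Nothing in the call itself reveals why, which is what makes this pattern hard to debug in a long project.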

Lastly, global variables have a detrimental impact on performance, a topic we'll study later on the website. In fact, relying on global variables inside functions is a well-known performance killer in Julia.

In contrast, Approach 2 refers to x as a local variable. This x is unrelated to the global variable x: it simply serves as a label to identify the variable to be doubled. Indeed, we could've replaced it with any other label, as demonstrated in Approach 3 through the monkey emoji, 🐒.

By avoiding references to any variable outside its scope, Approach 2 makes the function self-contained. This allows users to predict the consequence of applying double by simply inspecting the function, eliminating the need to review the entire codebase. Thus, Approach 2 aligns with the interpretation of a function as a self-contained mini-program: the function embodies the task of doubling a variable, which makes it reusable and applicable to any variable. In this context, applying double to the global variable x is just one possible use case.
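As a brief sketch of this reusability, the same function from Approach 2 can be applied to several inputs. The variables `a` and `b` and the result names are hypothetical.

```julia
double(x) = 2 * x          # 'x' is only a label for whatever value is passed in

a = 3
b = 2.5

doubled_a       = double(a)     # applied to an integer variable
doubled_b       = double(b)     # applied to a floating-point variable
doubled_literal = double(21)    # applied directly to a value
```

None of these calls requires a global variable named x to exist.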

Recommendations For The Use Of Functions

Structuring code around functions offers numerous advantages. However, to fully realize these benefits, it's essential to follow certain principles when writing code. This section outlines a few of them and should be considered a mere introduction to the subject. We'll investigate the topic further when we explore high performance.

Avoid Global Variables in Functions

Global variables are strongly discouraged. This is not only due to the reasons mentioned previously, but also because they can have a devastating impact on performance. The easiest solution to this issue is to pass global variables as function arguments. This practice will actually become second nature once you start viewing functions as self-contained mini-programs. Specifically, by adopting this perspective, you'll conceive local variables as labels to describe a task, rather than references to global variables. This shift in mindset can help you write more efficient and maintainable code.
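One possible way to apply this suggestion is sketched below. The names `factor`, `scale_global`, and `scale` are hypothetical, introduced only for this illustration.

```julia
factor = 3

# relies on the global 'factor' (discouraged)
scale_global(value) = factor * value

# 'factor' is now a function argument, hence local to the function
scale(value, factor) = factor * value

from_global   = scale_global(2)      # depends on the current value of the global
from_argument = scale(2, factor)     # the global is passed in explicitly
other_factor  = scale(2, 10)         # works with any other factor too
```

With the second definition, everything the function depends on appears in its argument list.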

Avoid Redefining Variables within Functions

This suggestion applies to both local variables and function arguments. Redefining these variables can have several disadvantages, including reduced code readability and potential performance degradation. Therefore, it's recommended that you define new variables instead of redefining existing ones. This approach is demonstrated in the following example.

function foo(x)
   x      = 2 + x           # redefines the argument
   
   y      = 2 * x
   y      = x + y           # redefines a local variable
end

function foo(x)
   z      = 2 + x           # new variable
   
   y      = 2 * z           # 'z' plays the role of the redefined 'x'
   output = z + y           # new variable
end
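One situation where redefinition may affect performance is when the new value has a different type than the old one, since Julia tends to generate faster code when a variable keeps a single type. The sketch below assumes an integer argument, and the function names are hypothetical.

```julia
function halve_redefined(n)
    n = n / 2              # for an integer argument, 'n' now holds a Float64
    return n
end

function halve(n)
    half = n / 2           # new variable, 'n' keeps its original type
    return half
end

redefined_result = halve_redefined(4)
new_name_result  = halve(4)
```

Both functions return the same value, but in the second one each name refers to a value of a single type throughout the function.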

Modularity

We've emphasized the importance of viewing functions as self-contained mini-programs, designed to perform specific tasks. This perspective leads us to highlight the importance of modularity: the practice of breaking down a program into multiple small functions, each with its own distinct purpose, inputs, and outputs.

The primary benefit of modularity is the ability to work with independent code blocks. By keeping these blocks separate, we can decompose complex problems into multiple manageable tasks, making it easier to test and debug code. Additionally, modularity makes it possible to eventually improve or substitute parts of the code, without breaking the entire program.

A helpful way to understand this principle is by considering the analogy of building a Lego minifigure. In the first step, multiple blocks are created independently, each representing a specific part of the figure, such as the legs, torso, arms, and head. Then, in the second stage, these individual blocks are brought together and assembled into an integrated minifigure.

This two-step approach offers several advantages. By focusing on each block individually, we can concentrate and refine each part without worrying about the entire structure. Additionally, it provides great flexibility: since each block is created independently, we can modify specific blocks without having to rebuild the entire figure. For instance, if we want to change the figure's head, we can simply swap out the corresponding block, without starting from scratch.
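A minimal sketch of the same idea in code, where each small function plays the role of one block. The function names and the task itself are hypothetical, chosen only to keep the example short.

```julia
# individual blocks, each performing one task
square(a)            = a * a
sum_of_squares(a, b) = square(a) + square(b)

# assembly: the blocks are combined into the final computation
hypotenuse(a, b)     = sqrt(sum_of_squares(a, b))

side = hypotenuse(3, 4)
```

Each function can be tested on its own, and any of them can later be replaced by a different implementation without touching the others, as long as its inputs and outputs stay the same.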

The principle of modularity is closely tied to the suggestion of writing short functions. Some proponents even argue that functions should be limited to fewer than five lines of code. Indeed, entire books have been written based on this principle. Although this viewpoint may be considered rather extreme, it clearly emphasizes the advantages of avoiding lengthy functions.