allCode.jl
. They've been tested under Julia 1.10.4. <function>
or <operator>
).
This is just notation, and the symbols `<` and `>` should not be misconstrued as Julia's syntax.
Unit | Acronym | Measure in Seconds |
---|---|---|
Seconds | s | 1 |
Milliseconds | ms | 10⁻³ |
Microseconds | μs | 10⁻⁶ |
Nanoseconds | ns | 10⁻⁹ |
This section considers various scenarios that cause type instabilities. We dub them "gotchas", as they aren't immediately obvious, and we suggest ways to address each one.
When working with scalars, keep in mind that `Int64` and `Float64` are distinct types. Mixing them may inadvertently introduce type instability.
In the following example, the function `foo` takes a numeric variable `x` as its argument and performs two tasks. The first one defines a variable `y` by a transformation of `x`, where all negative values are replaced by zero. The second task runs an operation based on the resulting `y`.
function foo(x)
y = (x < 0) ? 0 : x
return [y * i for i in 1:100]
end
@code_warntype foo(1) # type stable
@code_warntype foo(1.) # type UNSTABLE
function foo(x)
y = (x < 0) ? zero(x) : x
return [y * i for i in 1:100]
end
@code_warntype foo(1) # type stable
@code_warntype foo(1.) # type stable
As for the first tab, given that `0` has type `Int64`, no type instability arises if `x` is `Int64` too. By contrast, if `x` is `Float64`, the compiler must contemplate the possibility that `y` could be either an `Int64` or a `Float64`. [note] A similar problem would occur if we replaced `0` with `0.` and `x` were an integer.
Julia handles combinations of `Int64` and `Float64` quite effectively. The type instability above therefore wouldn't be a significant issue if the operation involving `y` used `y` only once. Indeed, `@code_warntype` would then issue only a yellow warning, which could be safely ignored. In our example, however, `foo` performs an operation that uses `y` repeatedly, incurring the cost of the type instability multiple times. As a result, `@code_warntype` issues a red warning, indicating a more serious performance issue.
The second tab proposes a solution based on a function that returns the zero element corresponding to the type of `x`. This approach isn't limited to zero, and can be applied to any value. To do so, use either `convert(typeof(x), <value>)` or `oftype(x, <value>)` to convert `<value>` to the same type as `x`.
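As a rough illustration of the `oftype` variant, the sketch below uses a hypothetical function `clamp_floor` (not part of the original examples) that replaces values below 1 with 1. The literal `1` is first converted to the type of `x`, so both branches of the conditional produce the same type.

```julia
# Hypothetical example: replace values below 1 with 1, keeping the type of `x`.
function clamp_floor(x)
    floor_value = oftype(x, 1)                # equivalent to convert(typeof(x), 1)
    y = (x < floor_value) ? floor_value : x   # both branches have the type of `x`
    return [y * i for i in 1:3]
end

clamp_floor(0.5)    # `y` is a Float64 in either branch
clamp_floor(-2)     # `y` is an Int64 in either branch
```

Running `@code_warntype clamp_floor(0.5)` is one way to check that no `Union` appears for `y`.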
In data analysis, collections of collections emerge naturally. An example of this data structure is given by the `DataFrames` package, which is the standard tool for handling datasets. A DataFrame defines a table where each column corresponds to a variable. As we haven't introduced this package, we'll keep matters simple and consider a vector of vectors, which replicates a similar structure.
Specifically, suppose a vector `data` comprising multiple inner vectors. The issue is that a function taking `data` as its argument and operating on an inner vector will be type unstable. The feature of `data` responsible for this is that its type is `Vector{Vector}`. This flexibility is desirable in data analysis, as it allows for holding inner vectors of different types (e.g., strings, numbers). However, it also means that `data` carries no information about the types of its inner vectors: `Vector{Vector}` only indicates that its elements are vectors, without specifying the type of their elements.
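The lack of information can be inspected directly. The following sketch builds the same kind of `data` object and queries its types. The concrete type of an inner vector only becomes available once the element is retrieved at runtime.

```julia
vec1 = ["a", "b", "c"]; vec2 = [1, 2, 3]
data = [vec1, vec2]

typeof(data)                    # Vector{Vector}
eltype(data)                    # Vector, with the inner element type left unspecified
isconcretetype(eltype(data))    # false: the compiler can't know what `data[2]` holds
typeof(data[2])                 # Vector{Int64}, only known at runtime
```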
The simplest solution is to include a barrier function that takes the inner vector as its argument. This approach is able to correct the type instability, as the function will attempt to identify concrete types for the inner vectors passed.
In the following, we illustrate the approach by supposing that the operation is based on an inner vector `vec2`. The code snippets in both tabs pass `data` as a function's argument, with the second one additionally using a barrier function that takes `vec2` as its argument. Note that the barrier function mutates its argument in place, so the value of `vec2`, and hence of `data`, is updated.
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2]
function foo(data)
for i in eachindex(data[2])
data[2][i] = 2 * i
end
end
@code_warntype foo(data) # type UNSTABLE
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2]
foo(data) = operation!(data[2])
function operation!(x)
for i in eachindex(x)
x[i] = 2 * i
end
end
@code_warntype foo(data) # barrier-function solution
While barrier functions can mitigate type instabilities, it's essential to keep in mind that the parent function remains type unstable. This implies that if we fail to resolve the type instability before executing a repeated operation, the performance cost of that instability will be incurred multiple times.
To illustrate this point, let's revisit the last example involving a vector of vectors. Below, we present two incorrect ways of using a barrier function in such a scenario, followed by a demonstration of its proper application.
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2]
operation(i) = (2 * i)
function foo(data)
for i in eachindex(data[2])
data[2][i] = operation(i)
end
end
@code_warntype foo(data) # type UNSTABLE
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2]
operation!(x,i) = (x[i] = 2 * i)
function foo(data)
for i in eachindex(data[2])
operation!(data[2], i)
end
end
@code_warntype foo(data) # type UNSTABLE
vec1 = ["a", "b", "c"] ; vec2 = [1, 2, 3]
data = [vec1, vec2]
function operation!(x)
for i in eachindex(x)
x[i] = 2 * i
end
end
foo(data) = operation!(data[2])
@code_warntype foo(data) # barrier-function solution
The compiler creates method instances based exclusively on types, with values playing no part in the process. To see the implications, let's consider the following concrete example.
function foo(condition)
y = condition ? 2.5 : 1
return [y * i for i in 1:100]
end
@code_warntype foo(true) # type UNSTABLE
@code_warntype foo(false) # type UNSTABLE
At first glance, we might erroneously conclude that `foo(true)` should be type stable: the value of `condition` is `true`, so that `y = 2.5` and therefore `y` will have type `Float64`. However, values don't participate in multiple dispatch, meaning that Julia's compiler ignores the value of `condition` when inferring `y`'s type. Ultimately, `y` is treated as potentially being either `Int64` or `Float64`, leading to type instability.
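To make this concrete, the sketch below isolates the conditional in a hypothetical helper `pick` and asks Julia what it infers when only the argument type `Bool` is known. It relies on `Base.return_types`, an introspection utility that isn't part of the exported API, so treat it as a diagnostic aid rather than something to use in production code.

```julia
# Hypothetical helper reproducing the conditional from `foo`.
pick(condition) = condition ? 2.5 : 1

# Inference only sees the type `Bool`, so both branches remain possible.
Base.return_types(pick, (Bool,))    # a one-element vector holding Union{Float64, Int64}

# At runtime, each call still returns a single concrete type.
typeof(pick(true))     # Float64
typeof(pick(false))    # Int64
```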
The issue in this example is reminiscent of the first "gotcha" we explored. Indeed, it can be resolved using similar techniques, or alternatively employing a barrier function. However, analyzing the type instability from the type-inference process reveals far-reaching consequences, extending beyond this specific example. In particular, it has deep implications for the use of tuples.
Identifying a type requires information on all the parameters that characterize it. The type `Tuple` in particular is defined by both the type of each element and the number of elements. For example, `x = (1, 2.4)` has type `Tuple{Int64, Float64}`, meaning that `x` is an object with 2 elements of types `Int64` and `Float64` respectively. This type differs from that of `x = (1, 2.4, 3)`, which has 3 elements with types `Int64`, `Float64`, and `Int64` respectively. In comparison, the type `Vector` records a single element type shared by all elements, and doesn't require information on the number of elements. For instance, both `x = [1, 2.4]` and `x = [1, 2.4, 3]` have the same type, given by `Vector{Float64}`.
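The contrast between the two types can be checked in the REPL. The snippet below is a minimal sketch using the same values as in the text.

```julia
# Tuples: the type records each element's type and the number of elements.
typeof((1, 2.4))       # Tuple{Int64, Float64}
typeof((1, 2.4, 3))    # Tuple{Int64, Float64, Int64}

# Vectors: the type records one element type and nothing about the length.
typeof([1, 2.4])       # Vector{Float64}
typeof([1, 2.4, 3])    # Vector{Float64}
typeof([1, 2.4]) == typeof([1, 2.4, 3])    # true
```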
One consequence of this is that when we create tuples using vectors as their source, we lack sufficient information to identify a tuple's type. In particular, a tuple created from an object of type `Vector{Float64}` successfully identifies the type of each element (`Float64`), but the number of elements remains unknown. Consequently, if we create a tuple inside a function based on a vector, the function will be type unstable.
This can be observed in the following examples. The second tab shows that the issue won't be solved by merely passing the number of elements as an argument. The reason is that the compiler only sees the argument's type, not its value: it infers that the number of elements is an integer, but not which integer.
The easiest fix is to define the tuple outside the function and then pass it as an argument. This allows the compiler to gather all the information from the `Tuple` type.
x = [1,2,3]
function foo(x) # 'Vector{Int64}' has no info on the number of elements
tuple_x = Tuple(x)
2 .+ tuple_x
end
@code_warntype foo(x) # type UNSTABLE
x = [1,2,3]
function foo(x, N) # The value of 'N' isn't considered, only its type
tuple_x = NTuple{N, eltype(x)}(x)
2 .+ tuple_x
end
@code_warntype foo(x, length(x)) # type UNSTABLE
x = [1,2,3]
tuple_x = Tuple(x)
function foo(x)
2 .+ x
end
@code_warntype foo(tuple_x) # type stable
An alternative solution relies on a workaround that allows Julia to dispatch by value. This is achieved by the type `Val`, which makes it possible to pass information regarding values to the compiler. Specifically, for any function `foo` and value `a` that you want the compiler to know, you need to include `::Val{a}` as an argument and then use `Val(a)` when calling `foo`.
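The mechanism works because `Val(a)` builds an object whose type contains the value `a` as a parameter. The sketch below shows this with a hypothetical function `describe`, introduced here purely for illustration.

```julia
# The value becomes part of the type, so different values yield different types.
typeof(Val(3))        # Val{3}
typeof(Val(true))     # Val{true}

# Hypothetical function with one method per value of the flag.
describe(::Val{true})  = "flag is on"
describe(::Val{false}) = "flag is off"

describe(Val(true))     # dispatches to the first method
describe(Val(false))    # dispatches to the second method
```

Since the method is selected from the type `Val{true}` or `Val{false}`, the compiler effectively knows the value when compiling each method.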
The following example applies this approach by revisiting the cases studied above. For the implementation, you could find it useful to revisit the keyword `where` for types introduced here.
In the first example, the type instability of `foo` emerged because the value of `condition` wasn't known by the compiler. Adding `Val`, along with the keyword `where`, passes the value of `condition` to the compiler. Note that `where` is necessary, as otherwise the compiler won't identify the specific value of `condition`.
function foo(condition)
y = condition ? 2.5 : 1
return [y * i for i in 1:100]
end
@code_warntype foo(true) # type UNSTABLE
@code_warntype foo(false) # type UNSTABLE
function foo(::Val{condition}) where condition
y = condition ? 2.5 : 1
return [y * i for i in 1:100]
end
@code_warntype foo(Val(true)) # type stable
@code_warntype foo(Val(false)) # type stable
Below, we apply the `Val` approach to the example with tuples.
x = [1,2,3]
function foo(x)
tuple_x = Tuple(x)
2 .+ tuple_x
end
@code_warntype foo(x) # type UNSTABLE
x = [1,2,3]
function foo(x, ::Val{N}) where N
tuple_x = NTuple{N, eltype(x)}(x)
2 .+ tuple_x
end
@code_warntype foo(x, Val(length(x))) # type stable
In functions, it's possible to use variables as default values when defining keyword arguments. Nonetheless, these variables will be treated as global, making the function type unstable.
As a best practice, it's recommended to avoid using variables as default values whenever possible. Instead, opt for concrete values. In cases where using a variable as a default value is necessary, there are several strategies you could deploy.
The first set of solutions acts on the variable used as a default value, in its global scope. They include type-annotating the global variable or defining it as a `const`. Additionally, you could define a function that returns the default value, taking advantage of the fact that types are identified when a function is called. The second set of solutions is based on a local approach. They require type-annotating either the keyword argument or the default value. All these strategies are demonstrated below.
foo(; x) = x
β = 1
@code_warntype foo(x=β) #type stable
foo(; x = 1) = x
@code_warntype foo() #type stable
foo(; x = β) = x
β = 1
@code_warntype foo() #type UNSTABLE
foo(; x = β) = x
const β = 1
@code_warntype foo() #type stable
foo(; x = β) = x
β::Int64 = 1
@code_warntype foo() #type stable
foo(; x = β()) = x
β() = 1
@code_warntype foo() #type stable
foo(; x::Int64 = β) = x
β = 1
@code_warntype foo() #type stable
foo(; x = β::Int64) = x
β = 1
@code_warntype foo() #type stable
Based on this, we can propose another solution to the code above, where `β` is introduced as a positional argument. The default value of `x` then refers to a local argument rather than a global variable.
foo(β; x = β) = x
β = 1
@code_warntype foo(β) #type stable
Closures are created when a function is nested inside another function, granting the inner function access to the outer function's scope. They show up explicitly when defining functions within a function, but also implicitly when using anonymous functions as function arguments in a function definition.
The biggest downside of closures is that they can easily introduce type instabilities. Although there have been some improvements in this area, the issues have been around for several years. This is why it's crucial to be aware of their subtle consequences and how to address them.
Closures emerge naturally in various contexts. They allow us to write self-contained code, keeping all steps of a task within a single function. One common scenario where closures are natural is when we need to prepare data (e.g., defining parameters or initializing variables), and then compute an output based on that data. Below, we implement a scenario like this with and without a closure. We do it generically, so that you can easily identify the pattern.
function task()
# <here, you define parameters and initialize variables>
function output()
# <here, you do some computations with the variables and parameters>
end
return output()
end
task()
function task()
# <here, you define parameters and initialize variables>
return output(<variables>, <parameters>)
end
function output(<variables>, <parameters>)
# <here, you do some computations with the variables and parameters>
end
task()
While the approach with closures seems natural, it can introduce type instability if we:
redefine variables/arguments (e.g., when we update a variable in an output),
affect the order in which we define functions, or
use anonymous functions.
We explore these cases below, where we refer to the containing function as the outer function and the closure as the inner function.
Before considering specific scenarios, let's start by examining three examples. They encompass all the possible situations where closures could give rise to type instability.
The first example reveals that the location of the inner function could matter.
function foo()
x = 1
bar() = x
return bar()
end
@code_warntype foo() # type stable
function foo()
bar(x) = x
x = 1
return bar(x)
end
@code_warntype foo() # type stable
function foo()
bar() = x
x = 1
return bar()
end
@code_warntype foo() # type UNSTABLE
function foo()
bar()::Int64 = x::Int64
x::Int64 = 1
return bar()
end
@code_warntype foo() # type UNSTABLE
function foo()
x = 1
return bar(x)
end
bar(x) = x
@code_warntype foo() # type stable
The second example shows that type instability arises if you use closures and simultaneously redefine variables or arguments. This holds even when you redefine a variable `x` with the same object, including a trivial expression such as `x = x`. The example also reveals that type-annotating either the redefined variable or the closure doesn't solve the problem.
function foo()
x = 1
x = 1 # or 'x = x', or 'x = 2'
return x
end
@code_warntype foo() # type stable
function foo()
x = 1
x = 1 # or 'x = x', or 'x = 2'
bar(x) = x
return bar(x)
end
@code_warntype foo() # type stable
function foo()
x = 1
x = 1 # or 'x = x', or 'x = 2'
bar() = x
return bar()
end
@code_warntype foo() # type UNSTABLE
function foo()
x::Int64 = 1
x = 1
bar()::Int64 = x::Int64
return bar()
end
@code_warntype foo() # type UNSTABLE
function foo()
x::Int64 = 1
bar()::Int64 = x::Int64
x = 1
return bar()
end
@code_warntype foo() # type UNSTABLE
function foo()
bar()::Int64 = x::Int64
x::Int64 = 1
x = 1
return bar()
end
@code_warntype foo() # type UNSTABLE
function foo()
x = 1
x = 1 # or 'x = x', or 'x = 2'
return bar(x)
end
bar(x) = x
@code_warntype foo() # type stable
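One way to see what happens behind the scenes is to return the closure itself and inspect the fields it stores. The sketch below uses two hypothetical functions, `boxed` and `unboxed`. It assumes the current behavior where a captured variable that is reassigned gets wrapped in a `Core.Box`, which is an implementation detail and may change across Julia versions.

```julia
# Hypothetical example: `x` is assigned twice and captured by the closure.
function boxed()
    x = 1
    x = 1
    bar() = x
    return bar        # return the closure itself, not its result
end

# Hypothetical example: `x` is assigned once before being captured.
function unboxed()
    x = 1
    bar() = x
    return bar
end

fieldtypes(typeof(boxed()))      # the captured `x` is stored as a Core.Box
fieldtypes(typeof(unboxed()))    # the captured `x` is stored as an Int64
```

A `Core.Box` holds a value of any type, which is why the compiler can't infer what `bar()` returns in the first case.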
Finally, the last example highlights that the order in which you define closures could matter for type stability. The third tab in particular shows that you could avoid the issue if you pass any subsequent closure as an argument. However, this approach is at odds with how Julia works in general, as we don't usually include functions as arguments of functions.
function foo(x)
closure1(x) = x
closure2(x) = closure1(x)
return closure2(x)
end
@code_warntype foo(1) # type stable
function foo(x)
closure2(x) = closure1(x)
closure1(x) = x
return closure2(x)
end
@code_warntype foo(1) # type UNSTABLE
function foo(x)
closure2(x, closure1) = closure1(x)
closure1(x) = x
return closure2(x, closure1)
end
@code_warntype foo(1) # type stable
function foo(x)
closure2(x) = closure1(x)
return closure2(x)
end
closure1(x) = x
@code_warntype foo(1) # type stable
Overall, the examples reveal a common pattern: the closure accesses variables or functions from the outer function's scope, without passing them as arguments. If we still want to employ closures, one generic solution is to specify all the arguments of the inner function, including the functions used.
In the following, we'll examine specific implementations of these patterns. They reveal that the issue can occur more frequently than we might expect. For each scenario, we'll also provide alternative case-dependent solutions, allowing you to keep a closure approach. Nonetheless, if you want to stay on the safe side, the general guideline is to avoid closures whenever possible.
x = [1,2]; β = 1
function foo(x, β)
(β < 0) && (β = -β) # transform 'β' to use its absolute value
bar(x) = x * β
return bar(x)
end
@code_warntype foo(x, β) # type UNSTABLE
x = [1,2]; β = 1
function foo(x, β)
(β < 0) && (β = -β) # transform 'β' to use its absolute value
bar(x,β) = x * β
return bar(x,β)
end
@code_warntype foo(x, β) # type stable
x = [1,2]; β = 1
function foo(x, β)
δ = (β < 0) ? -β : β # transform 'β' to use its absolute value
bar(x) = x * δ
return bar(x)
end
@code_warntype foo(x, β) # type stable
x = [1,2]; β = 1
function foo(x, β)
β = abs(β) # 'δ = abs(β)' is preferable (you should avoid redefining variables)
bar(x) = x * β
return bar(x)
end
@code_warntype foo(x, β) # type stable
Recall that the compiler doesn't dispatch by value, and so whether the condition holds is irrelevant. For instance, the type instability would still hold if we wrote `1 < 0` instead of `β < 0`. Moreover, the value used to redefine `β` is unimportant for the issue to arise, with the same conclusion holding if you write `β = β`.
Using an anonymous function inside a function is a common type of closure. Consequently, type instability also arises in the example above if we replace the inner function `bar` with an anonymous function. To demonstrate this, we apply `filter` with an anonymous function that selects the values in `x` greater than `β`.
x = [1,2]; β = 1
function foo(x, β)
(β < 0) && (β = -β) # transform 'β' to use its absolute value
filter(x -> x > β, x) # keep elements greater than 'β'
end
@code_warntype foo(x, β) # type UNSTABLE
x = [1,2]; β = 1
function foo(x, β)
δ = (β < 0) ? -β : β # define 'δ' as the absolute value of 'β'
filter(x -> x > δ, x) # keep elements greater than 'δ'
end
@code_warntype foo(x, β) # type stable
x = [1,2]; β = 1
function foo(x, β)
β = abs(β) # 'δ = abs(β)' is preferable (you should avoid redefining variables)
filter(x -> x > β, x) # keep elements greater than β
end
@code_warntype foo(x, β) # type stable
function foo(x)
β = 0 # or 'β::Int64 = 0'
for i in 1:10
β = β + i # equivalent to 'β += i'
end
bar() = x + β # or 'bar(x) = x + β'
return bar()
end
@code_warntype foo(1) # type UNSTABLE
function foo(x)
β = 0
for i in 1:10
β = β + i
end
bar(x,β) = x + β
return bar(x,β)
end
@code_warntype foo(1) # type stable
x = [1,2]; β = 1
function foo(x, β)
(1 < 0) && (β = β)
bar(x) = x * β
return bar(x)
end
@code_warntype foo(x, β) # type UNSTABLE
To illustrate this claim, suppose you want to define a variable `x` that depends on a parameter `β`. However, there's a catch: `β` is measured in one unit (e.g., meters), while `x` requires `β` to be in a different unit of measurement (e.g., centimeters). This implies that we must rescale `β` to the appropriate unit before defining `x`.
Depending on how we implement the operation, a type instability could emerge.
function foo(β)
x(β) = 2 * rescale_parameter(β)
rescale_parameter(β) = β / 10
return x(β)
end
@code_warntype foo(1) # type UNSTABLE
function foo(β)
rescale_parameter(β) = β / 10
x(β) = 2 * rescale_parameter(β)
return x(β)
end
@code_warntype foo(1) # type stable
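Mirroring the earlier closure examples, another option is to define the helper outside `foo`, so that the inner function calls an ordinary top-level function and the order of definitions inside `foo` no longer matters. The following is a sketch of that variant, reusing the names from the tabs above.

```julia
# The helper lives outside `foo`, so it isn't a closure defined after its first use.
rescale_parameter(β) = β / 10

function foo(β)
    x(β) = 2 * rescale_parameter(β)
    return x(β)
end

foo(1)    # returns a Float64
```

As with the previous tabs, `@code_warntype foo(1)` can be used to confirm the result.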