allCode.jl
. It's been tested under Julia 1.9.2
.For more accurate benchmarks, each variable is wrapped in
ref(x) = (Ref(x))[]
and interpolated through $
. Action | Keyboard Shortcut |
---|---|
Open All Codes and Outputs in a Post | Ctrl + 🠙 |
Close All Codes and Outputs in a Post | Ctrl + 🠛 |
Close Any Popped Up Window (like this one) | Esc |
Previous Post | Ctrl + 🠘 |
Next Post | Ctrl + 🠚 |
List of Sections in a Post | Ctrl + x |
List of Keyboard Shortcuts | Ctrl + z |
allCode.jl
. It's been tested under Julia 1.9.2
.ref(x) = (Ref(x))[]
and interpolated through $
. Goal of the post: To present strategies for quantifying a model, maintaining a clear link between the equations and code, and without compromising speed. The programming language used is Julia.
Motivation: While code readability aligns with the old programming maxim that "code is read more often than it's written", it may act against speed. Furthermore, code readability doesn't guarantee a clear mapping between the code and the mathematical model.
Benefits of having a tight link between code and equations: it makes code easier to understand for both the programmer and third parties, additionally reducing the possibility of bugs when translating equations into code.
Tasks for which the post can be relevant: the approaches presented seek a balance between speed and code readability. This implies that, while there could be other methods that increase computational speed further, the emphasis is on approaches not demanding many additional lines of code.
The main conclusions of the post are:
Unicode characters require more typing, but are effective in establishing a direct link with the mathematical terms.
While for-loops commonly represent the most efficient approach in Julia, they can affect readability. A vectorized form can circumvent this and simultaneously achieve an identical performance.
We consider a simple exercise modeling an industry. The goal is to compute the impact of tougher import competition on consumer welfare. On a technical level, this requires solving a non-linear system of equations.
Our data consists of
each domestic firm's market share
total import market share
The market structure considered is extremely simple: an oligopoly with no entry/exit and competition in prices. In this simple model, tougher import competition only affects domestic firms by reducing their market share, which in turn benefits consumers by reducing markups.
There is a set of firms Ω. The notation for a given firm i
qi​ → i's quantity
pi​ → i's price
zi​ → i's "quality" (exogenous).
Moreover, the industry values are:
E → industry expenditure (exogenous)
P → price index
W:=PE​ → welfare
Sm​ → total market share of imports
We'll make use of a CES demand function. This is well-suited for expressing all our variables in percentage variations through the "hat-algebra" procedure. Specifically, the demand for firm i is
where the price index is given by P:=[∑i∈Ω​(zi​)δ(pi​)1−σ]1−σ1​.
This demand determines a price elasticity given by
Given a revenue ri​:=qi​pi​, the market share of i defined by si​:=Ej​ri​​. This can be expressed as si​=(zi​)δ(pi​)1−σ(P)σ−1., which substituting in the definition of the price index becomes
Optimal price of a firm i can be expressed as a function of its market share by using that
where μ(εi​)=εi​−1εi​​ and ε(si​)=σ−si​(σ−1).
Finally, there's equilibrium in the industry when the sum of market shares equals one:
We consider two scenarios:
initial situation (data observed), denoted ∗
counterfactual (hypothetical situation), denoted ∗∗
In the counterfactual, there's an increase in the import market share, Sm​, which is formally defined by ΔSm​:=Sm∗∗​−Sm∗​>0.
We'll express the system using hat notation. This means that for any variable x, the equilibrium conditions are expressed through
Our goal is to compute the impact on welfare, measured as real income W:=PE​. As industry expenditure is given, we seek to compute
We next state the system of equations to be solved, which pins down P. The derivations are unimportant and are included below as optional.
The derivations exploit that x∗∗=xx∗ for any variable x.
Subtracting each (0)
Using si​:=si∗​si∗∗​​ and the definition of si​, we determine that
Using the definition of pi​,
Markups in turn are
and using εi∗∗​=εi​εi∗​
Finally, i's elasticity hat can be expressed as a function of the market share, using that
Given εi∗​=σ−(σ−1)si∗​.
Identifying P can be done by solving for (P,(si​)i∈Ω​) simultaneously. This requires solving the following system of equations,
i∈Ω∑​(1−si​)si∗​=ΔSm​ si​=(p​i​)1−σ(P)σ−1where (2) holds for each i and
p​i​=εi​εi∗​−1εi​(εi∗​−1)​ εi​=σ−(σ−1)si∗​σ−(σ−1)si∗​si​​Note that p​i​(εi​) and εi​(si​), so that they're fully determined by si​.
Information at our disposal to solve the system:
si∗​ for each i∈Ω.
εi∗​ for each i∈Ω, computed by knowledge of si∗​.
Parameters: only σ, for which we'll assume a value.
Shock: ΔSm​, for which we'll assume a value.
The system of equations is solved using the package NLsolve
. Additionally, we'll apply various optimization stemming from the packages LoopVectorization
and LazyArrays
. Below, we provide how the mock data is constructed and the list of packages used.
# parameters
const σ::Float64 = 3
# shock and outcome sought
ΔSₘ = 0.10
welfare_change(PÌ‚) = round( (1/PÌ‚ - 1) * 100, digits=2)
# (mock) data
nr_firms = 500
Sₘ = 0.05
sáµ¢ = rand(Pareto(log(4, 5)), nr_firms) |>
x -> (x ./ sum(x) .* (1 - Sₘ))
elasticity(share, σ) = σ + share - share*σ
eᵢ = elasticity.(sᵢ, Ref(σ))
using Distributions, Random
Random.seed!(1234)
using NLsolve
using LazyArrays, LoopVectorization
The goal is to compute P, from which we then can use welfare_change
to obtain the result. This is why we directly focus on the computation of P.
Notice that the code uses Unicode characters, which aligns directly with the notation used for the equations. Unfortunately, Unicode characters break code alignment with =
, hence hindering readability. Nevertheless, the benefits of using Unicode characters seem to outweigh this issue.
The mock data comprises 500 firms, which is a large number for the use of StaticArrays
. Moreover, we have information on si​, from which we can determine εi​. We also establish a shock where import market share increases by 10 percentage points, represented by ΔSm​=0.10, in a scenario where imports represent 5% of the market share.
Lastly, the only parameter required to compute the result is σ, which we suppose equals 3. Although we set σ as a const
, we'll include it as an argument function whenever it's applied. This is done solely to emphasize the dependence of the function on this value.
The package NLsolve
requires specifying the equations to be solved through residuals. Each residual represents the difference between the value provided by an equation against its value under a proposed solution. The code includes the residuals through a vector res
, whose entries need to be zero in equilibrium. Likewise, the solutions are represented by a vector called sol
. The rest of the variables in the code use the same notation as the equations, thanks to the use of Unicode characters.
One downside of NLsolve
is that its solver function must depend only on the residuals and the solutions—you can't add parameters to the function. Therefore, to achieve type stability, you need to wrap the solver and computation functions within the same function.
We start with a baseline implementation relying on for-loops. This will help us explain what the code does.
function computation(sᵢ, ΔSₘ, eᵢ, σ)
êᵢ = similar(sᵢ); p̂ᵢ = similar(sᵢ)
function solver!(res, sol)
PÌ‚ = sol[1]
ŝᵢ = sol[2:length(sᵢ)+1]
for i in eachindex(êᵢ)
êᵢ[i] = (σ - (σ - 1) * sᵢ[i] * ŝᵢ[i]) / (σ - (σ - 1) * sᵢ[i])
p̂ᵢ[i] = êᵢ[i] * (eᵢ[i] - 1) / (êᵢ[i] * eᵢ[i] - 1)
end
sum_shares = [(1 - ŝᵢ[i]) * sᵢ[i] for i in eachindex(sᵢ)]
res[1] = sum(sum_shares) - ΔSₘ
for i in 2:length(sáµ¢)+1
res[i] = ŝᵢ[i-1] - p̂ᵢ[i-1]^(1 - σ) * P̂^(σ - 1)
end
end
function execute()
sol0 = vcat(0.9, 0.8 .* ones(length(sáµ¢)))
solution = nlsolve(solver!, sol0)
converged(solution) || error("Failed to converge")
return solution.zero
end
return execute()
end
@btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));
50.182 ms (8094 allocations: 41.60 MiB)
Next, we'll propose several ways to express the function solver!
and its components, which capture the system of equations and variables to be solved. Most of the computational time is spent on the multiple calls to solver!
. As the rest of the terms have a negligible impact on time, it's possible to reexpress them if that increases readability.
Moreover, the function execute
sets the initial conditions and the call to the solver. It'll stay the same in all the code examples.
The first entry sol[1]
represents P, while the other entries in sol
refer to si​ for each i∈Ω. Likewise, res[1]
expresses (1), with the rest of the entries in res
referring to equation (2) for each i∈Ω.
The improvements proposed are below.
function computation(sᵢ, ΔSₘ, eᵢ, σ)
êᵢ, p̂ᵢ = (similar(sᵢ) for _ in 1:2)
function solver!(res, sol)
PÌ‚ = sol[1]
ŝᵢ = @view sol[2:length(sᵢ)+1]
sum_shares = ((1 - ŝᵢ[i]) * sᵢ[i] for i in eachindex(sᵢ))
res[1] = sum(sum_shares) - ΔSₘ
rest = @view res[2:length(sáµ¢)+1] # to update the results below
@turbo for i in eachindex(êᵢ)
êᵢ[i] = (σ - (σ - 1) * sᵢ[i] * ŝᵢ[i]) / (σ - (σ - 1) * sᵢ[i])
p̂ᵢ[i] = êᵢ[i] * (eᵢ[i] - 1) / (êᵢ[i] * eᵢ[i] - 1)
rest[i] = ŝᵢ[i] - p̂ᵢ[i]^(1 - σ) * P̂^(σ - 1)
end
end
function execute()
sol0 = vcat(0.9, 0.8 .* ones(length(sáµ¢)))
solution = nlsolve(solver!, sol0)
converged(solution) || error("Failed to converge")
return solution.zero
end
return execute()
end
@btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));
29.187 ms (64 allocations: 9.74 MiB)
The modifications and their justifications are the following. First, the line êᵢ = similar(sᵢ); p̂ᵢ = similar(sᵢ)
is wordy, and can be simplified by using êᵢ, p̂ᵢ = (similar(sᵢ) for _ in 1:2)
. This makes it possible to initialize multiple variables in one line. Note we could've also used an array comprehension instead of a generator, with a minimal impact on performance (the comprehension is only called once as it's not part of solver!
).
Second, slices consisting of a scalar don't allocate memory, while slices with more than two elements do. This determines that the slice of sol
comprising each si​ allocates. We circumvent this by the use of @view
.
Third, we're not interested in sum_shares
itself. Rather, it acts as an auxiliary that is eventually called to compute sum(sum_shares)
. In the baseline case, sum_shares
is defined through an array comprehension. This stores the result in a new vector, and therefore allocates memory. Instead, we can delay its computation through a generator, allowing the result to fuse with sum
—Julia can avoid the allocation when it knows that the vector will ultimately be "reduced" to a scalar.
Fourth, the package LoopVectorization
provides the macro @turbo
. This significantly improves the speed of for-loops and vectorized code when the operations involve multiple terms.
Fifth, all the for-loops to compute the entries greater than 2 of res
can be combined into a single for-loop. This requires defining a view of the object, which we call rest
, enabling us to include rest
and the auxiliaries for calculating εi​ and p​i​.
Combining these computations into one for-loop is crucial for efficiently parallelizing the code, as it gathers the three most demanding calculations of solver!
. The number of firms in our example is too small to see any gain from parallelizing code. Nonetheless, it's worth remarking that, if necessary, we can combine the advantages of both parallelizing code and applying @turbo
through the macro @tturbo
.
The code based on for-loops allows for speed enhancements. However, coming up with a version that is both efficient and readable can be challenging. In this section, we propose a vectorized version that improves on readability, while maintaining the same speed as the for-loop version.
The baseline code is as follows.
function computation(sᵢ, ΔSₘ, eᵢ, σ)
function solver!(res, sol)
PÌ‚ = sol[1]
ŝᵢ = sol[2:length(sᵢ)+1]
êᵢ = @. (σ - (σ - 1) * sᵢ * ŝᵢ) / (σ - (σ - 1) * sᵢ)
p̂ᵢ = @. êᵢ * (eᵢ - 1) / (êᵢ * eᵢ - 1)
res[1] = sum((1 .- ŝᵢ) .* sᵢ) - ΔSₘ
res[2:length(sᵢ)+1] = @. ŝᵢ - p̂ᵢ^(1 - σ) * P̂^(σ - 1)
end
function execute()
sol0 = vcat(0.9, 0.8 .* ones(length(sáµ¢)))
solution = nlsolve(solver!, sol0)
converged(solution) || error("Failed to converge")
return solution.zero
end
return execute()
end
@btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));
53.760 ms (20137 allocations: 89.38 MiB)
The following code snippet provides the same type of enhancements that we included in the for-loop version.
function computation(sᵢ, ΔSₘ, eᵢ, σ)
êᵢ, p̂ᵢ = (similar(sᵢ) for _ in 1:2)
idx_shares = 2:length(sáµ¢)+1
function solver!(res, sol)
PÌ‚ = sol[1]
ŝᵢ = @view sol[idx_shares]
res[1] = sum(@~ (1 .- ŝᵢ) .* sᵢ) - ΔSₘ
@turbo @. êᵢ = (σ - (σ - 1) * sᵢ * ŝᵢ) / (σ - (σ - 1) * sᵢ)
@turbo @. p̂ᵢ = êᵢ * (eᵢ - 1) / (êᵢ * eᵢ - 1)
@turbo res[idx_shares] .= @. ŝᵢ - p̂ᵢ^(1 - σ) * P̂^(σ - 1)
end
function execute()
sol0 = vcat(0.9, 0.8 .* ones(length(sáµ¢)))
solution = nlsolve(solver!, sol0)
converged(solution) || error("Failed to converge")
return solution.zero
end
return execute()
end
@btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));
28.782 ms (67 allocations: 9.74 MiB)
Several of the speed improvements applied to for-loops have a direct analogous in the vectorized code. This is the case of the use of @view
and @turbo
. The main difference is the use of the macro @~
from the package LazyArrays
. This is applied to the expression (1 .- ŝᵢ) .* s
, which corresponds to sum_shares
in the for-loop version. It has a similar effect as applying a generator: it delays its computation, and therefore allows for fusing the operation with sum
. Technically, @~
implements the vectorized code lazily instead of eagerly.
For readability, we improved upon the baseline code in various ways. First, we've applied the macro @.
to avoid the need to write .
for each element-wise operation. Moreover, we incorporated idx_shares
, which represents the indices of res
representing equation (2) for each i∈Ω.
We conclude by providing an alternative for the last code presented. It improves readability further by separating the system of equations to be solved and the definition of the intermediate variables, p​i​ and εi​. The code is also well-suited for cases where the computations of p​i​ and εi​ are more complex.
The approach relies on in-place functions for p​i​ and εi​. The function solver!
uses them by initializing p​i​ and εi​, and updating these variables in each iteration.
The auxiliary functions are defined below. We provide two alternatives. The second one in particular simply shows that we could repeat the vectorized form from the previous code.
function êᵢ!(êᵢ, ŝᵢ, sᵢ, σ)
@turbo for i in eachindex(êᵢ)
numerator = σ - (σ - 1) * sᵢ[i] * ŝᵢ[i]
denominator = σ - (σ - 1) * sᵢ[i]
êᵢ[i] = numerator / denominator
end
end
function p̂ᵢ!(p̂ᵢ, êᵢ, eᵢ)
@turbo for i in eachindex(êᵢ)
numerator = êᵢ[i] * (eᵢ[i] - 1)
denominator = êᵢ[i] * eᵢ[i] - 1
p̂ᵢ[i] = numerator / denominator
end
end
êᵢ!(êᵢ, ŝᵢ, sᵢ, σ) = @turbo @. êᵢ = (σ - (σ - 1) * sᵢ * ŝᵢ) / (σ - (σ - 1) * sᵢ)
p̂ᵢ!(p̂ᵢ, êᵢ, eᵢ) = @turbo @. p̂ᵢ = êᵢ * (eᵢ - 1) / (êᵢ * eᵢ - 1)
The solver then initializes êᵢ
and p̂ᵢ
, and updates them in each iteration by executing êᵢ!
and p̂ᵢ!
. The rest of the code is identical to the previous section.
function computation(sᵢ, ΔSₘ, eᵢ, σ)
êᵢ, p̂ᵢ = (similar(sᵢ) for _ in 1:2)
idx_shares = 2:length(sáµ¢)+1
@views function solver!(res, sol)
PÌ‚ = sol[1]
ŝᵢ = sol[idx_shares]
êᵢ!(êᵢ, ŝᵢ, sᵢ, σ)
p̂ᵢ!(p̂ᵢ, êᵢ, eᵢ)
res[1] = sum(@~ (1 .- ŝᵢ) .* sᵢ) - ΔSₘ
@turbo @. res[idx_shares] = ŝᵢ - p̂ᵢ^(1 - σ) * P̂^(σ - 1)
end
function execute()
sol0 = vcat(0.9, 0.8 .* ones(length(sáµ¢)))
solution = nlsolve(solver!, sol0)
converged(solution) || error("Failed to converge")
return solution.zero
end
return execute()
end
@btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));
28.925 ms (67 allocations: 9.74 MiB)
Note the use of @views
before the function statement. This is a neat way to indicate that each slice within the function should be treated as a view, rather than a copy.