T.E.C.H. - Martin Alfaro

Action	Keyboard Shortcut
Open All Codes and Outputs in a Post	`Ctrl + 🠙`
Close All Codes and Outputs in a Post	`Ctrl + 🠛`
Close Any Popped Up Window (like this one)	`Esc`
Previous Post	`Ctrl + 🠘`
Next Post	`Ctrl + 🠚`
List of Sections in a Post	`Ctrl + x`
List of Keyboard Shortcuts	`Ctrl + z`

Introduction

Goal of the post: To present strategies for quantifying a model, maintaining a clear link between the equations and code, and without compromising speed. The programming language used is Julia.

Motivation: While code readability aligns with the old programming maxim that "code is read more often than it's written", it may act against speed. Furthermore, code readability doesn't guarantee a clear mapping between the code and the mathematical model.

Benefits of having a tight link between code and equations: it makes code easier to understand for both the programmer and third parties, additionally reducing the possibility of bugs when translating equations into code.

Tasks for which the post can be relevant: the approaches presented seek a balance between speed and code readability. This implies that, while there could be other methods that increase computational speed further, the emphasis is on approaches not demanding many additional lines of code.

The main conclusions of the post are:

Unicode characters require more typing, but are effective in establishing a direct link with the mathematical terms.
While for-loops commonly represent the most efficient approach in Julia, they can affect readability. A vectorized form can circumvent this and simultaneously achieve an identical performance.

Warning!

The aspects regarding Julia don't require any knowledge of Economics. The post basically is about solutions for a non-linear system of equations. If you're only interested in the Julia implementation and don't care about Economics, go directly to this section. There, I start by summarizing the system of equations to solve, regardless of how they were obtained.

Generalities of the Setup

We consider a simple exercise modeling an industry. The goal is to compute the impact of tougher import competition on consumer welfare. On a technical level, this requires solving a non-linear system of equations.

Our data consists of

each domestic firm's market share
total import market share

The market structure considered is extremely simple: an oligopoly with no entry/exit and competition in prices. In this simple model, tougher import competition only affects domestic firms by reducing their market share, which in turn benefits consumers by reducing markups.

Setup

There is a set of firms $\Omega$ . The notation for a given firm $i$

$q_{i}$ $\rightarrow$ $i$ 's quantity
$p_{i}$ $\rightarrow$ $i$ 's price
$z_{i}$ $\rightarrow$ $i$ 's "quality" (exogenous).

Moreover, the industry values are:

$E$ $\rightarrow$ industry expenditure (exogenous)
$\mathbb{P}$ $\rightarrow$ price index
$\mathbb{W}:=\frac{E}{\mathbb{P}}$ $\rightarrow$ welfare
$S_{m}$ $\rightarrow$ total market share of imports

We'll make use of a CES demand function. This is well-suited for expressing all our variables in percentage variations through the "hat-algebra" procedure. Specifically, the demand for firm $i$ is

\begin{aligned} q_{i}=E\left(z_{i}\right)^{\delta}\left(p_{i}\right)^{-\sigma}\left(\mathbb{P}\right)^{\sigma-1} \end{aligned}

where the price index is given by $\mathbb{P}:=\left[\sum_{i\in\Omega}\left(z_{i}\right)^{\delta}\left(p_{i}\right)^{1-\sigma}\right]^{\frac{1}{1-\sigma}}$ .

This demand determines a price elasticity given by

\begin{aligned} \varepsilon_{i}=\sigma-\left(\sigma-1\right)s_{i}. \end{aligned}

Given a revenue $r_{i}:=q_{i}p_{i}$ , the market share of $i$ defined by $s_{i}:=\frac{r_{i}}{E_{j}}$ . This can be expressed as $s_{i}=\left(z_{i}\right)^{\delta}\left(p_{i}\right)^{1-\sigma}\left(\mathbb{P}\right)^{\sigma-1}.$ , which substituting in the definition of the price index becomes

\begin{aligned} s_{i}=\frac{\left(z_{i}\right)^{\delta}\left(p_{i}\right)^{1-\sigma}}{\sum_{i\in\Omega}\left(z_{i}\right)^{\delta}\left(p_{i}\right)^{1-\sigma}}. \end{aligned}

Optimal Prices and Equilibrium

Optimal price of a firm $i$ can be expressed as a function of its market share by using that

\begin{aligned} p_{i}=\mu\left[\varepsilon_{i}\left(s_{i}\right)\right]c_{i} \end{aligned}

where $\mu\left(\varepsilon_{i}\right)=\frac{\varepsilon_{i}}{\varepsilon_{i}-1}$ and $\varepsilon\left(s_{i}\right)=\sigma-s_{i}\left(\sigma-1\right)$ .

Finally, there's equilibrium in the industry when the sum of market shares equals one:

\begin{aligned} \sum_{i\in\omega}s_{i}+S_{m}=1 \end{aligned}

Scenarios

We consider two scenarios:

initial situation (data observed), denoted $*$
counterfactual (hypothetical situation), denoted $**$

In the counterfactual, there's an increase in the import market share, $S_{m}$ , which is formally defined by $\Delta S_{m}:=S_{m}^{**}-S_{m}^{*}>0$ .

We'll express the system using hat notation. This means that for any variable $x$ , the equilibrium conditions are expressed through

\begin{aligned} \widehat{x}:=\frac{x^{**}}{x^{*}}. \end{aligned}

Our goal is to compute the impact on welfare, measured as real income $\mathbb{W}:=\frac{E}{\mathbb{P}}$ . As industry expenditure is given, we seek to compute

\begin{aligned} \mathbb{\frac{W^{**}-\mathbb{W}^{*}}{\mathbb{W}^{*}}}=\frac{1}{\widehat{\mathbb{P}}}-1 \end{aligned}

We next state the system of equations to be solved, which pins down $\widehat{\mathbb{P}}$ . The derivations are unimportant and are included below as optional.

Summary of Equations to be Solved

Identifying $\widehat{\mathbb{P}}$ can be done by solving for $\left(\widehat{\mathbb{P}},\left(\widehat{s}_{i}\right)_{i\in\Omega}\right)$ simultaneously. This requires solving the following system of equations,

\sum_{i\in\Omega}\left(1-\widehat{s}_{i}\right)s_{i}^{*}=\Delta S_{m}

\widehat{s}_{i}=\left(\widehat{p}_{i}\right)^{1-\sigma}\left(\widehat{\mathbb{P}}\right)^{\sigma-1}

where (2) holds for each $i$ and

\widehat{p}_{i}=\frac{\widehat{\varepsilon}_{i}\left(\varepsilon_{i}^{*}-1\right)}{\widehat{\varepsilon}_{i}\varepsilon_{i}^{*}-1}

\widehat{\varepsilon}_{i}=\frac{\sigma-\left(\sigma-1\right)s_{i}^{*}\widehat{s}_{i}}{\sigma-\left(\sigma-1\right)s_{i}^{*}}

Note that $\widehat{p}_{i}\left(\widehat{\varepsilon}_{i}\right)$ and $\widehat{\varepsilon}_{i}\left(\widehat{s}_{i}\right)$ , so that they're fully determined by $\widehat{s}_{i}$ .

Information at our disposal to solve the system:

$s_{i}^{*}$ for each $i\in\Omega$ .
$\varepsilon_{i}^{*}$ for each $i\in\Omega$ , computed by knowledge of $s_{i}^{*}$ .
Parameters: only $\sigma$ , for which we'll assume a value.
Shock: $\Delta S_{m}$ , for which we'll assume a value.

This text is hidden but takes up space. This text is hidden but takes up space. This text is hidden but takes up space. This text is hidden but takes up space. This text is hidden but takes up space. This text is hidden but takes up space.

Data and Julia Packages

The system of equations is solved using the package NLsolve. Additionally, we'll apply various optimization stemming from the packages LoopVectorization and LazyArrays. Below, we provide how the mock data is constructed and the list of packages used.

# parameters
const σ::Float64 = 3

# shock and outcome sought 
ΔSₘ               = 0.10
welfare_change(P̂) = round( (1/P̂ - 1) * 100, digits=2)

# (mock) data
nr_firms = 500
Sₘ       = 0.05
sᵢ        = rand(Pareto(log(4, 5)), nr_firms) |> 
           x -> (x ./ sum(x) .* (1 - Sₘ))

elasticity(share, σ) = σ + share - share*σ
eᵢ                   = elasticity.(sᵢ, Ref(σ))

using Distributions, Random
Random.seed!(1234)

using NLsolve
using LazyArrays, LoopVectorization

The goal is to compute $\widehat{\mathbb{P}}$ , from which we then can use welfare_change to obtain the result. This is why we directly focus on the computation of $\widehat{\mathbb{P}}$ .

Notice that the code uses Unicode characters, which aligns directly with the notation used for the equations. Unfortunately, Unicode characters break code alignment with =, hence hindering readability. Nevertheless, the benefits of using Unicode characters seem to outweigh this issue.

The mock data comprises 500 firms, which is a large number for the use of StaticArrays. Moreover, we have information on $s_{i}$ , from which we can determine $\varepsilon_{i}$ . We also establish a shock where import market share increases by 10 percentage points, represented by $\Delta S_{m}=0.10$ , in a scenario where imports represent 5% of the market share.

Lastly, the only parameter required to compute the result is $\sigma$ , which we suppose equals 3. Although we set $\sigma$ as a const, we'll include it as an argument function whenever it's applied. This is done solely to emphasize the dependence of the function on this value.

Remark

The example is extremely simple, and aims to illustrate the benefits of a balance between speed and code clarity. Speed considerations could be disregarded in our particular example, as the running time is negligible.

Remark

It can be shown that all the approaches to computing the result are identical and type stable.

Solving the System: For-Loops

The package NLsolve requires specifying the equations to be solved through residuals. Each residual represents the difference between the value provided by an equation against its value under a proposed solution. The code includes the residuals through a vector res, whose entries need to be zero in equilibrium. Likewise, the solutions are represented by a vector called sol. The rest of the variables in the code use the same notation as the equations, thanks to the use of Unicode characters.

One downside of NLsolve is that its solver function must depend only on the residuals and the solutions—you can't add parameters to the function. Therefore, to achieve type stability, you need to wrap the solver and computation functions within the same function.

We start with a baseline implementation relying on for-loops. This will help us explain what the code does.

Baseline

function computation(sᵢ, ΔSₘ, eᵢ, σ)
    êᵢ = similar(sᵢ); p̂ᵢ = similar(sᵢ)

    function solver!(res, sol)
        P̂  = sol[1]
        ŝᵢ  = sol[2:length(sᵢ)+1]

        for i in eachindex(êᵢ)
            êᵢ[i] = (σ - (σ - 1) * sᵢ[i] * ŝᵢ[i]) / (σ - (σ - 1) * sᵢ[i])
            p̂ᵢ[i] = êᵢ[i] * (eᵢ[i] - 1) / (êᵢ[i] * eᵢ[i] - 1)
        end

        sum_shares = [(1 - ŝᵢ[i]) * sᵢ[i] for i in eachindex(sᵢ)]
        res[1]     = sum(sum_shares) - ΔSₘ

        for i in 2:length(sᵢ)+1
            res[i] = ŝᵢ[i-1] - p̂ᵢ[i-1]^(1 - σ) * P̂^(σ - 1)
        end

    end

    function execute()
        sol0     = vcat(0.9, 0.8 .* ones(length(sᵢ)))
        solution = nlsolve(solver!, sol0)

        converged(solution) || error("Failed to converge")
        return solution.zero
    end

    return execute()
end

Output in REPL

julia> @btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));

  50.182 ms (8094 allocations: 41.60 MiB)

Next, we'll propose several ways to express the function solver! and its components, which capture the system of equations and variables to be solved. Most of the computational time is spent on the multiple calls to solver!. As the rest of the terms have a negligible impact on time, it's possible to reexpress them if that increases readability.

Moreover, the function execute sets the initial conditions and the call to the solver. It'll stay the same in all the code examples.

The first entry sol[1] represents $\widehat{\mathbb{P}}$ , while the other entries in sol refer to $\widehat{s}_{i}$ for each $i\in\Omega$ . Likewise, res[1] expresses (1), with the rest of the entries in res referring to equation (2) for each $i\in\Omega$ .

Improvements

The improvements proposed are below.

Improvements

function computation(sᵢ, ΔSₘ, eᵢ, σ)
    êᵢ, p̂ᵢ = (similar(sᵢ) for _ in 1:2)

    function solver!(res, sol)
        P̂  = sol[1]
        ŝᵢ  = @view sol[2:length(sᵢ)+1]

        sum_shares = ((1 - ŝᵢ[i]) * sᵢ[i] for i in eachindex(sᵢ))
        res[1]     = sum(sum_shares) - ΔSₘ

        rest  = @view res[2:length(sᵢ)+1]          # to update the results below
            @turbo for i in eachindex(êᵢ)
                êᵢ[i]     =  (σ - (σ - 1) * sᵢ[i] * ŝᵢ[i]) / (σ - (σ - 1) * sᵢ[i])
                p̂ᵢ[i]     =  êᵢ[i] * (eᵢ[i] - 1) / (êᵢ[i] * eᵢ[i] - 1)
                rest[i]  =  ŝᵢ[i] - p̂ᵢ[i]^(1 - σ) * P̂^(σ - 1)
            end
    end

    function execute()
        sol0     = vcat(0.9, 0.8 .* ones(length(sᵢ)))
        solution = nlsolve(solver!, sol0)

        converged(solution) || error("Failed to converge")
        return solution.zero
    end

    return execute()
end

Output in REPL

julia> @btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));

  29.187 ms (64 allocations: 9.74 MiB)

The modifications and their justifications are the following. First, the line êᵢ = similar(sᵢ); p̂ᵢ = similar(sᵢ) is wordy, and can be simplified by using êᵢ, p̂ᵢ = (similar(sᵢ) for _ in 1:2). This makes it possible to initialize multiple variables in one line. Note we could've also used an array comprehension instead of a generator, with a minimal impact on performance (the comprehension is only called once as it's not part of solver!).

Second, slices consisting of a scalar don't allocate memory, while slices with more than two elements do. This determines that the slice of sol comprising each $\widehat{s}_{i}$ allocates. We circumvent this by the use of @view.

Third, we're not interested in sum_shares itself. Rather, it acts as an auxiliary that is eventually called to compute sum(sum_shares). In the baseline case, sum_shares is defined through an array comprehension. This stores the result in a new vector, and therefore allocates memory. Instead, we can delay its computation through a generator, allowing the result to fuse with sum—Julia can avoid the allocation when it knows that the vector will ultimately be "reduced" to a scalar.

Fourth, the package LoopVectorization provides the macro @turbo. This significantly improves the speed of for-loops and vectorized code when the operations involve multiple terms.

Fifth, all the for-loops to compute the entries greater than 2 of res can be combined into a single for-loop. This requires defining a view of the object, which we call rest, enabling us to include rest and the auxiliaries for calculating $\widehat{\varepsilon}_{i}$ and $\widehat{p}_{i}$ .

Combining these computations into one for-loop is crucial for efficiently parallelizing the code, as it gathers the three most demanding calculations of solver!. The number of firms in our example is too small to see any gain from parallelizing code. Nonetheless, it's worth remarking that, if necessary, we can combine the advantages of both parallelizing code and applying @turbo through the macro @tturbo.

Vectorized Code

The code based on for-loops allows for speed enhancements. However, coming up with a version that is both efficient and readable can be challenging. In this section, we propose a vectorized version that improves on readability, while maintaining the same speed as the for-loop version.

The baseline code is as follows.

Baseline

function computation(sᵢ, ΔSₘ, eᵢ, σ)
    function solver!(res, sol)
        P̂  = sol[1]
        ŝᵢ  = sol[2:length(sᵢ)+1]

        êᵢ = @. (σ - (σ - 1) * sᵢ * ŝᵢ) / (σ - (σ - 1) * sᵢ)
        p̂ᵢ = @. êᵢ * (eᵢ - 1) / (êᵢ * eᵢ - 1)


        res[1]             = sum((1 .- ŝᵢ) .* sᵢ) - ΔSₘ
        res[2:length(sᵢ)+1] = @. ŝᵢ - p̂ᵢ^(1 - σ) * P̂^(σ - 1)
    end

    function execute()
        sol0     = vcat(0.9, 0.8 .* ones(length(sᵢ)))
        solution = nlsolve(solver!, sol0)

        converged(solution) || error("Failed to converge")
        return solution.zero
    end

    return execute()
end

Output in REPL

julia> @btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));

  53.760 ms (20137 allocations: 89.38 MiB)

The following code snippet provides the same type of enhancements that we included in the for-loop version.

Improvements

function computation(sᵢ, ΔSₘ, eᵢ, σ)
    êᵢ, p̂ᵢ      = (similar(sᵢ) for _ in 1:2)
    idx_shares = 2:length(sᵢ)+1

    function solver!(res, sol)
        P̂      = sol[1]
        ŝᵢ      = @view sol[idx_shares]
        
        res[1] = sum(@~ (1 .- ŝᵢ) .* sᵢ) - ΔSₘ
        
        @turbo @. êᵢ = (σ - (σ - 1) * sᵢ * ŝᵢ) / (σ - (σ - 1) * sᵢ)
        @turbo @. p̂ᵢ = êᵢ * (eᵢ - 1) / (êᵢ * eᵢ - 1)
                          
        @turbo res[idx_shares] .= @. ŝᵢ - p̂ᵢ^(1 - σ) * P̂^(σ - 1)
    end

    function execute()
        sol0     = vcat(0.9, 0.8 .* ones(length(sᵢ)))
        solution = nlsolve(solver!, sol0)

        converged(solution) || error("Failed to converge")
        return solution.zero
    end

    return execute()
end

Output in REPL

julia> @btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));

  28.782 ms (67 allocations: 9.74 MiB)

Several of the speed improvements applied to for-loops have a direct analogous in the vectorized code. This is the case of the use of @view and @turbo. The main difference is the use of the macro @~ from the package LazyArrays. This is applied to the expression (1 .- ŝᵢ) .* s, which corresponds to sum_shares in the for-loop version. It has a similar effect as applying a generator: it delays its computation, and therefore allows for fusing the operation with sum. Technically, @~ implements the vectorized code lazily instead of eagerly.

For readability, we improved upon the baseline code in various ways. First, we've applied the macro @. to avoid the need to write . for each element-wise operation. Moreover, we incorporated idx_shares, which represents the indices of res representing equation (2) for each $i\in\Omega$ .

Vectorized Code: My Preferred Solution

We conclude by providing an alternative for the last code presented. It improves readability further by separating the system of equations to be solved and the definition of the intermediate variables, $\widehat{p}_{i}$ and $\widehat{\varepsilon}_{i}$ . The code is also well-suited for cases where the computations of $\widehat{p}_{i}$ and $\widehat{\varepsilon}_{i}$ are more complex.

The approach relies on in-place functions for $\widehat{p}_{i}$ and $\widehat{\varepsilon}_{i}$ . The function solver! uses them by initializing $\widehat{p}_{i}$ and $\widehat{\varepsilon}_{i}$ , and updating these variables in each iteration.

The auxiliary functions are defined below. We provide two alternatives. The second one in particular simply shows that we could repeat the vectorized form from the previous code.

function êᵢ!(êᵢ, ŝᵢ, sᵢ, σ)
    @turbo for i in eachindex(êᵢ)
        numerator   = σ - (σ - 1) * sᵢ[i] * ŝᵢ[i]
        denominator = σ - (σ - 1) * sᵢ[i]
        
        êᵢ[i] = numerator / denominator
    end
end

function p̂ᵢ!(p̂ᵢ, êᵢ, eᵢ)
    @turbo for i in eachindex(êᵢ)
        numerator   = êᵢ[i] * (eᵢ[i] - 1)
        denominator = êᵢ[i] * eᵢ[i] - 1

        p̂ᵢ[i] = numerator / denominator
    end   
end

êᵢ!(êᵢ, ŝᵢ, sᵢ, σ) = @turbo @. êᵢ = (σ - (σ - 1) * sᵢ * ŝᵢ) / (σ - (σ - 1) * sᵢ)
p̂ᵢ!(p̂ᵢ, êᵢ, eᵢ)    = @turbo @. p̂ᵢ = êᵢ * (eᵢ - 1) / (êᵢ * eᵢ - 1)

The solver then initializes êᵢ and p̂ᵢ, and updates them in each iteration by executing êᵢ! and p̂ᵢ!. The rest of the code is identical to the previous section.

Solver Implementation

function computation(sᵢ, ΔSₘ, eᵢ, σ)
    êᵢ, p̂ᵢ      = (similar(sᵢ) for _ in 1:2)
    idx_shares = 2:length(sᵢ)+1

    @views function solver!(res, sol)
        P̂  = sol[1]
        ŝᵢ  = sol[idx_shares]
        
        êᵢ!(êᵢ, ŝᵢ, sᵢ, σ)
        p̂ᵢ!(p̂ᵢ, êᵢ, eᵢ)        

                  res[1]          = sum(@~ (1 .- ŝᵢ) .* sᵢ) - ΔSₘ
        @turbo @. res[idx_shares] = ŝᵢ - p̂ᵢ^(1 - σ) * P̂^(σ - 1)
    end

    function execute()
        sol0     = vcat(0.9, 0.8 .* ones(length(sᵢ)))
        solution = nlsolve(solver!, sol0)

        converged(solution) || error("Failed to converge")
        return solution.zero
    end

    return execute()
end

Output in REPL

julia> @btime computation(ref($sᵢ), ref($ΔSₘ), ref($eᵢ), ref($σ));

  28.925 ms (67 allocations: 9.74 MiB)

Note the use of @views before the function statement. This is a neat way to indicate that each slice within the function should be treated as a view, rather than a copy.

NOTATION AND KEYBOARD SHORTCUTS

Introduction

Generalities of the Setup

Setup

Optimal Prices and Equilibrium

Scenarios

Derivations - Optional

Equilibrium

Market Shares

Prices

Summary of Equations to be Solved

Data and Julia Packages

Solving the System: For-Loops

Improvements

Vectorized Code

Vectorized Code: My Preferred Solution