pic
Personal
Website

9d. Slice Views to Decrease Allocations

PhD in Economics

Introduction

Previously, we defined a slice as a subvector derived from a parent vector x. Common examples include expressions such as x[1:2], which extracts elements at positions 1 and 2, or x[x .> 0], which selects only those elements that are positive. By default, these operations create a copy of the data and therefore allocate memory. The only exception to this rule occurs when a slice comprises a single object.

In this section, we address the issue of memory allocations associated with slices. To do this, we highlight the role of views, which bypass the need for a copy by directly referencing the parent object. The strategy is particularly effective when slices are indexed through ranges. On the contrary, it's not suitable for slices that employ Boolean indexing, like x[x .> 0], where memory allocation will still occur even with views.

Interestingly, we'll show scenarios where copying data could actually be faster than using views, despite the additional memory allocation involved. This apparent paradox emerges because copied data is stored in a contiguous block of memory, which facilitates more efficient access patterns.

Views of Slices

We begin by showing that views don't allocate memory when a slice is indexed by a range. This behavior can yield performance improvements over regular slices, which create a copy of the data by default.

x = [1, 2, 3]

foo(x) = sum(x[1:2])           # allocations from the slice 'x[1:2]'
Output in REPL
julia>
@btime foo($x)
  11.568 ns (2 allocations: 80 bytes)
x = [1, 2, 3]

foo(x) = sum(@view(x[1:2]))    # it doesn't allocate
Output in REPL
julia>
@btime foo($x)
  1.944 ns (0 allocations: 0 bytes)

Keep in mind, though, that views created through Boolean indexing neither reduce memory allocations nor are more performant. This is why you shouldn't rely on views of these objects to speed up computations. This fact is illustrated below.

x = rand(1_000)

foo(x) = sum(x[x .> 0.5])
Output in REPL
julia>
@btime foo($x)
  294.075 ns (6 allocations: 4.10 KiB)
x = rand(1_000)

foo(x) = @views sum(x[x .> 0.5])
Output in REPL
julia>
@btime foo($x)
  462.459 ns (6 allocations: 4.10 KiB)

Copying Data May Be Faster

Although views can reduce memory allocations, there are scenarios where copying data can result in faster performance. A detailed comparison of copies versus views will be provided in Part II. Here, we simply remark on this possibility.

Essentially, the choice between copies and views reflects a fundamental trade-off between memory allocation and data access patterns. On the one hand, newly created vectors store data in contiguous blocks of memory, enabling more efficient CPU access and allowing for certain optimizations. On the other hand, views avoid allocation, but may also require accessing data scattered across non-contiguous memory regions.

Below, we illustrate a scenario in which the overhead of creating a copy is outweighed by the benefits of contiguous memory access, making copying the more efficient choice.

x = rand(100_000)

foo(x) = max.(x[1:2:length(x)], 0.5)
Output in REPL
julia>
@btime foo($x)
  27.871 μs (6 allocations: 781.39 KiB)
x = rand(100_000)

foo(x) = max.(@view(x[1:2:length(x)]), 0.5)
Output in REPL
julia>
@btime foo($x)
  131.474 μs (3 allocations: 390.70 KiB)