Merged
1 change: 1 addition & 0 deletions src/SUMMARY.md
@@ -43,6 +43,7 @@
- [Debugging and Testing](./incrcomp-debugging.md)
- [Profiling Queries](./queries/profiling.md)
- [Salsa](./salsa.md)
- [Memory Management in Rustc](./memory.md)
- [Lexing and Parsing](./the-parser.md)
- [`#[test]` Implementation](./test-implementation.md)
- [Panic Implementation](./panic-implementation.md)
3 changes: 2 additions & 1 deletion src/appendix/glossary.md
@@ -6,6 +6,7 @@ them better.

Term | Meaning
------------------------|--------
arena/arena allocation | an _arena_ is a large memory buffer from which other memory allocations are made. This style of allocation is called _arena allocation_. See [this chapter](../memory.md) for more info.
AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely.
binder | a "binder" is a place where a variable or type is declared; for example, the `<T>` is a binder for the generic type parameter `T` in `fn foo<T>(..)`, and \|`a`\|` ...` is a binder for the parameter `a`. See [the background chapter for more](./background.html#free-vs-bound)
bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expression \|`a`\|` a * 2`. See [the background chapter for more](./background.html#free-vs-bound)
@@ -33,7 +34,7 @@ ICE | internal compiler error. When the compiler crashes.
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type.
infcx | the inference context (see `librustc/infer`)
intern | interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a `Symbol`) rather than the data itself, to reduce memory usage.
intern | interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a `Symbol`) rather than the data itself, to reduce memory usage and number of allocations. See [this chapter](../memory.md) for more info.
IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it.
IRLO | `IRLO` or `irlo` is sometimes used as an abbreviation for [internals.rust-lang.org](https://internals.rust-lang.org).
item | a kind of "definition" in the language, such as a static, const, use statement, module, struct, etc. Concretely, this corresponds to the `Item` type.
88 changes: 88 additions & 0 deletions src/memory.md
@@ -0,0 +1,88 @@
# Memory Management in Rustc

Rustc tries to be pretty careful about how it manages memory. The compiler allocates
_a lot_ of data structures throughout compilation, and if we are not careful,
it will take a lot of time and space to do so.

One of the main ways the compiler manages this is by using arenas and interning.

## Arenas and Interning

We create a LOT of data structures during compilation. For performance reasons,
we allocate them from a global memory pool; they are each allocated once from a
long-lived *arena*. This is called _arena allocation_. This system reduces
allocations/deallocations of memory. It also allows for easy comparison of
types for equality: for each interned type `X`, we implemented [`PartialEq for
X`][peqimpl], so we can just compare pointers. The [`CtxtInterners`] type
contains a bunch of maps of interned types and the arena itself.

[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
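
As a rough illustration of the interning idea described above, here is a
minimal, self-contained sketch. It is not rustc's implementation: `Kind`,
`Interner`, and `intern` are hypothetical names made up for this example, and
`Box::leak` stands in for a real arena so that the sketch can hand out
`&'static` references without any unsafe code.

```rust
use std::collections::HashSet;

/// A toy stand-in for a compiler type (hypothetical, not rustc's `TyKind`).
#[derive(Debug, PartialEq, Eq, Hash)]
enum Kind {
    Bool,
    Int,
    Array(&'static Kind, u64),
}

/// A toy interner: every structurally distinct `Kind` is allocated at most once.
struct Interner {
    seen: HashSet<&'static Kind>,
}

impl Interner {
    fn new() -> Self {
        Interner { seen: HashSet::new() }
    }

    /// Returns the existing pointer if `kind` was interned before,
    /// otherwise allocates it once and remembers it.
    fn intern(&mut self, kind: Kind) -> &'static Kind {
        if let Some(&existing) = self.seen.get(&kind) {
            return existing;
        }
        // Assumption for the sketch: leaking a `Box` plays the role of the
        // long-lived arena. The memory is never freed.
        let fresh: &'static Kind = Box::leak(Box::new(kind));
        self.seen.insert(fresh);
        fresh
    }
}

/// Builds `[Int; 4]` twice and reports whether both results are the same pointer.
fn interned_arrays_share_a_pointer() -> bool {
    let mut interner = Interner::new();
    let int = interner.intern(Kind::Int);
    let first = interner.intern(Kind::Array(int, 4));
    let second = interner.intern(Kind::Array(int, 4));
    std::ptr::eq(first, second)
}
```

Because each distinct value lives at exactly one address, `std::ptr::eq` is
enough to decide whether two interned values are structurally identical; the
structural hash and comparison only happen once, at interning time.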

### Example: `ty::TyS`

Take [`ty::TyS`] as an example; it represents a type in the compiler (you
can read more [here](./ty.md)). Each time we want to construct a type, the
compiler doesn’t naively allocate from the buffer. Instead, we check if that
type was already constructed. If it was, we just get the same pointer we had
before; otherwise, we make a fresh pointer. With this scheme, if we want to know
if two types are the same, all we need to do is compare the pointers, which is
efficient. `TyS` is carefully set up so that you never construct one on the stack.
You always allocate them from this arena and you always intern them so they are
unique.

At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
to that buffer is freed and our `'tcx` references would be invalid.
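
The relationship between the buffer and the lifetime of the references it hands
out can be sketched with a tiny fixed-capacity arena. This is only a toy under
stated assumptions: `Arena`, `with_capacity`, `alloc`, and `len` are
hypothetical names, the slots are `OnceCell`s so that the sketch stays in safe
Rust, and it panics when full instead of fetching another buffer as the real
arenas do.

```rust
use std::cell::{Cell, OnceCell};

/// A toy arena with a single fixed-size chunk (hypothetical, not rustc's arena).
struct Arena<T> {
    slots: Vec<OnceCell<T>>,
    next: Cell<usize>,
}

impl<T> Arena<T> {
    fn with_capacity(capacity: usize) -> Self {
        Arena {
            slots: (0..capacity).map(|_| OnceCell::new()).collect(),
            next: Cell::new(0),
        }
    }

    /// Moves `value` into the next free slot. The returned reference borrows
    /// from `&self`, so it cannot outlive the arena.
    fn alloc(&self, value: T) -> &T {
        let index = self.next.get();
        // Assumption for the sketch: a full chunk is a hard error.
        let slot = self.slots.get(index).expect("arena chunk is full");
        self.next.set(index + 1);
        slot.get_or_init(|| value)
    }

    /// Number of values allocated so far.
    fn len(&self) -> usize {
        self.next.get()
    }
}

/// Allocates two strings and uses both references while the arena is alive.
fn total_len_of_two_allocations() -> usize {
    let arena = Arena::with_capacity(4);
    let a = arena.alloc(String::from("i32"));
    let b = arena.alloc(String::from("u32"));
    // `a` stays valid after the second allocation because slots never move.
    a.len() + b.len()
    // `arena` is dropped here; the borrow checker rejects any attempt to
    // return `a` or `b` from this function.
}
```

Everything is freed in one go when the arena is dropped, which mirrors how the
`'tcx` data goes away when compilation finishes.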

In addition to types, there are a number of other arena-allocated data structures that you can
allocate, and which are found in the `ty` module. Here are a few examples:

- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
defined and discussed in depth in the `AdtDef and DefId` section.
- [`Predicate`] defines something the trait system has to prove (see `traits` module).

[subst]: ./generic_arguments.html#subst
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html

[`ty::TyS`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html

## The tcx and how it uses lifetimes

The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
context:

```rust,ignore
tcx: TyCtxt<'tcx>
// ----
// |
// arena lifetime
```

As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
the arenas, anyhow).
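
To make the role of the lifetime parameter more concrete, here is a small
sketch of a context handle that is cheap to copy and hands out references tied
to the arena's lifetime. All names (`Arena`, `GlobalCtxt`, `Ctxt`, `mk_name`)
are hypothetical and chosen for illustration only; the arena is the same kind
of safe, fixed-capacity toy as one might write with `OnceCell`, not rustc's
real arena.

```rust
use std::cell::{Cell, OnceCell};

/// Toy fixed-capacity arena of strings (hypothetical).
struct Arena {
    slots: Vec<OnceCell<String>>,
    next: Cell<usize>,
}

impl Arena {
    fn with_capacity(capacity: usize) -> Self {
        Arena {
            slots: (0..capacity).map(|_| OnceCell::new()).collect(),
            next: Cell::new(0),
        }
    }

    fn alloc(&self, value: String) -> &str {
        let index = self.next.get();
        let slot = self.slots.get(index).expect("arena chunk is full");
        self.next.set(index + 1);
        slot.get_or_init(|| value).as_str()
    }
}

/// Owns the arena; lives for the whole "compilation".
struct GlobalCtxt {
    arena: Arena,
}

/// A copyable handle, analogous in spirit to `TyCtxt<'tcx>`.
#[derive(Clone, Copy)]
struct Ctxt<'tcx> {
    gcx: &'tcx GlobalCtxt,
}

impl<'tcx> Ctxt<'tcx> {
    /// The result carries `'tcx`, not the lifetime of this particular copy
    /// of the handle, so it stays usable wherever the arena is alive.
    fn mk_name(self, name: &str) -> &'tcx str {
        self.gcx.arena.alloc(name.to_owned())
    }
}

/// Any function that receives the handle can allocate `'tcx` data.
fn first_and_second<'tcx>(cx: Ctxt<'tcx>) -> (&'tcx str, &'tcx str) {
    (cx.mk_name("bool"), cx.mk_name("char"))
}
```

Passing the handle by value keeps signatures short: a single `'tcx` parameter
describes the lifetime of every arena-allocated value a function touches.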

### A Note On Lifetimes

The Rust compiler is a fairly large program containing lots of big data
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
references are heavily relied upon to minimize unnecessary memory use. This
manifests itself in the way people can plug into the compiler (i.e. the
[driver](./rustc-driver.md)), preferring a "push"-style API (callbacks) instead
of the more Rust-ic "pull" style (think the `Iterator` trait).

Thread-local storage and interning are used a lot throughout the compiler to reduce
duplication while also preventing a lot of the ergonomic issues due to many
pervasive lifetimes. The [`rustc::ty::tls`][tls] module is used to access these
thread-locals, although you should rarely need to touch it.

[tls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/tls/index.html
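
The "push"-style API mentioned above can be sketched as follows. This is a
hedged illustration rather than the driver's actual interface: `Ctxt`,
`with_ctxt`, and `total_name_len` are hypothetical names. The point is that the
data borrowed by the context only lives inside `with_ctxt`, so the caller
supplies a callback instead of pulling the context out.

```rust
/// A toy context borrowing data that lives only for the duration of a call
/// (hypothetical, not rustc's `TyCtxt`).
#[derive(Clone, Copy)]
struct Ctxt<'tcx> {
    names: &'tcx [String],
}

/// Push style: the owner of the data calls `f` with a context.
/// The higher-ranked bound means `R` cannot mention `'tcx`, so nothing
/// borrowed from the context can escape the callback.
fn with_ctxt<R>(f: impl for<'tcx> FnOnce(Ctxt<'tcx>) -> R) -> R {
    let names = vec![String::from("i32"), String::from("u32")];
    let cx = Ctxt { names: &names };
    f(cx)
    // `names` is freed here, after the callback has finished.
}

/// A caller computes an owned result inside the callback.
fn total_name_len() -> usize {
    with_ctxt(|cx| cx.names.iter().map(|name| name.len()).sum::<usize>())
}
```

A "pull"-style function returning `Ctxt<'tcx>` directly would have to keep
`names` alive somewhere else, which is exactly the kind of lifetime plumbing
the callback design avoids.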
13 changes: 0 additions & 13 deletions src/rustc-driver.md
@@ -32,19 +32,6 @@ replaces this functionality.
> **Warning:** By its very nature, the internal compiler APIs are always going
> to be unstable. That said, we do try not to break things unnecessarily.

## A Note On Lifetimes

The Rust compiler is a fairly large program containing lots of big data
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
references are heavily relied upon to minimize unnecessary memory use. This
manifests itself in the way people can plug into the compiler, preferring a
"push"-style API (callbacks) instead of the more Rust-ic "pull" style (think
the `Iterator` trait).

Thread-local storage and interning are used a lot through the compiler to reduce
duplication while also preventing a lot of the ergonomic issues due to many
pervasive lifetimes. The `rustc::ty::tls` module is used to access these
thread-locals, although you should rarely need to touch it.

[cb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html
[rd_rc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.run_compiler.html
117 changes: 30 additions & 87 deletions src/ty.md
@@ -119,12 +119,41 @@ field of type [`TyKind`][tykind], which represents the key type information. `Ty
which represents different kinds of types (e.g. primitives, references, abstract data types,
generics, lifetimes, etc). `TyS` also has 2 more fields, `flags` and `outer_exclusive_binder`. They
are convenient hacks for efficiency and summarize information about the type that we may want to
know, but they don’t come into the picture as much here.
know, but they don’t come into the picture as much here. Finally, `ty::TyS`s
are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
type. This allows us to do cheap comparisons for equality, along with the other
benefits of interning.

[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html#structfield.kind
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html

## Allocating and working with types

To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
that correspond mostly to the various kinds of types. For example:

```rust,ignore
let array_ty = tcx.mk_array(elem_ty, len * 2);
```

These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
allocate exactly the same type twice).

> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
> – however, this is almost never what you want to do unless you happen to be hashing and looking
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
> particularly once inference is involved. If you are going to be testing for type equality, you
> probably need to start looking into the inference code to do it right.

You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
`tcx.types.char`, etc (see [`CommonTypes`] for more).

[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
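
The NB above about `==` can be illustrated with a toy model. The names here
(`Kind`, `resolve`, `same_type`) are hypothetical, derived structural equality
stands in for the pointer comparison on interned types, and the `solved` map is
an assumed stand-in for whatever the inference code has figured out so far; the
real machinery is considerably more involved.

```rust
use std::collections::HashMap;

/// Toy types: a concrete type and an inference variable (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Kind {
    Int,
    Infer(u32),
}

/// Follows solved inference variables until reaching a concrete type or an
/// unsolved variable. Assumes `solved` contains no cycles.
fn resolve(kind: Kind, solved: &HashMap<u32, Kind>) -> Kind {
    match kind {
        Kind::Infer(var) => match solved.get(&var) {
            Some(&target) => resolve(target, solved),
            None => kind,
        },
        other => other,
    }
}

/// Compares two types only after resolving what inference already knows.
fn same_type(a: Kind, b: Kind, solved: &HashMap<u32, Kind>) -> bool {
    resolve(a, solved) == resolve(b, solved)
}
```

With `solved` mapping variable `0` to `Int`, a plain `Kind::Infer(0) == Kind::Int`
is `false` even though `same_type(Kind::Infer(0), Kind::Int, &solved)` is
`true`, which is the sort of mismatch the note warns about.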

## `ty::TyKind` Variants

Note: `TyKind` is **NOT** the functional programming concept of *Kind*.

Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:
@@ -147,8 +176,6 @@ types in the compiler.
There are a lot of related types, and we’ll cover them in time (e.g regions/lifetimes,
“substitutions”, etc).

## `ty::TyKind` Variants

There are a bunch of variants on the `TyKind` enum, which you can see by looking at the rustdocs.
Here is a sampling:

@@ -191,90 +218,6 @@ will discuss this more later.
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variant.Error
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variants

## Interning

We create a LOT of types during compilation. For performance reasons, we allocate them from a global
memory pool, they are each allocated once from a long-lived *arena*. This is called _arena
allocation_. This system reduces allocations/deallocations of memory. It also allows for easy
comparison of types for equality: we implemented [`PartialEq for TyS`][peqimpl], so we can just
compare pointers. The [`CtxtInterners`] type contains a bunch of maps of interned types and the
arena itself.

[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena

Each time we want to construct a type, the compiler doesn’t naively allocate from the buffer.
Instead, we check if that type was already constructed. If it was, we just get the same pointer we
had before, otherwise we make a fresh pointer. With this schema if we want to know if two types are
the same, all we need to do is compare the pointers which is efficient. `TyS` which represents types
is carefully setup so you never construct them on the stack. You always allocate them from this
arena and you always intern them so they are unique.

At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
to that buffer is freed and our `'tcx` references would be invalid.


## The tcx and how it uses lifetimes

The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
context:

```rust,ignore
tcx: TyCtxt<'tcx>
// ----
// |
// arena lifetime
```

As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
the arenas, anyhow).

## Allocating and working with types

To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
that correspond mostly to the various kinds of types. For example:

```rust,ignore
let array_ty = tcx.mk_array(elem_ty, len * 2);
```

These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
allocate exactly the same type twice).

> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
> – however, this is almost never what you want to do unless you happen to be hashing and looking
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
> particularly once inference is involved. If you are going to be testing for type equality, you
> probably need to start looking into the inference code to do it right.

You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
`tcx.types.char`, etc (see [`CommonTypes`] for more).

[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html

## Beyond types: other kinds of arena-allocated data structures

In addition to types, there are a number of other arena-allocated data structures that you can
allocate, and which are found in this module. Here are a few examples:

- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
defined and discussed in depth in the `AdtDef and DefId` section.
- [`Predicate`] defines something the trait system has to prove (see `traits` module).

[subst]: ./generic_arguments.html#subst
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html

## Import conventions

Although there is no hard and fast rule, the `ty` module tends to be used like so:
9 changes: 2 additions & 7 deletions src/type-inference.md
@@ -43,17 +43,12 @@ tcx.infer_ctxt().enter(|infcx| {
})
```

Each inference context creates a short-lived type arena to store the
fresh types and things that it will create, as described in the
[chapter on the `ty` module][ty-ch]. This arena is created by the `enter`
function and disposed of after it returns.

[ty-ch]: ty.html

Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
fresh `'cx`, while `'tcx` is the same as outside the inference context.
(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)

[ty-ch]: ty.html

The `tcx.infer_ctxt` method actually returns a builder, which means
there are some kinds of configuration you can do before the `infcx` is
created. See `InferCtxtBuilder` for more information.