Skip to content

Commit 7ff227a

Browse files
fix: improve description of boxing/allocation for int types
There was one correct and one incorrect description; the incorrect one has been deleted and now refers to the correct one.
1 parent a07e6f1 commit 7ff227a

File tree

6 files changed

+35
-21
lines changed

6 files changed

+35
-21
lines changed

Manual/BasicTypes/Char.lean

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ tag := "char-runtime"
4444
As a {ref "inductive-types-trivial-wrappers"}[trivial wrapper], characters are represented identically to {lean}`UInt32`.
4545
In particular, characters are represented as 32-bit immediate values in monomorphic contexts.
4646
In other words, a field of a constructor or structure of type {lean}`Char` does not require indirection to access.
47-
In polymorphic contexts, characters are boxed.
47+
In polymorphic contexts, characters are {tech}[boxed].
4848

4949

5050
# Syntax

Manual/BasicTypes/Int.lean

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,7 @@ Integers are specially supported by Lean's implementation.
2525
The logical model of the integers is based on the natural numbers: each integer is modeled as either a natural number or the negative successor of a natural number.
2626
Operations on the integers are specified using this model, which is used in the kernel and in interpreted code.
2727
In these contexts, integer code inherits the performance benefits of the natural numbers' special support.
28-
In compiled code, integers are represented as efficient arbitrary-precision integers, and sufficiently small numbers are stored as unboxed values that don't require indirection through a pointer.
28+
In compiled code, integers are represented as efficient arbitrary-precision integers, and sufficiently small numbers are stored as values that don't require indirection through a pointer.
2929
Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
3030

3131
# Logical Model
@@ -46,7 +46,7 @@ Integers can also be represented as a pair of natural numbers in which one is su
4646
tag := "int-runtime"
4747
%%%
4848

49-
Like {ref "nat-runtime"}[natural numbers], sufficiently-small integers are represented as unboxed values: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer.
49+
Like {ref "nat-runtime"}[natural numbers], sufficiently-small integers are represented without pointers: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer.
5050
If an integer is too large to fit in the remaining bits, it is instead allocated as an ordinary Lean object that consists of an object header and an arbitrary-precision integer.
5151

5252
# Syntax

Manual/BasicTypes/Nat.lean

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ The {deftech}[natural numbers] are nonnegative integers.
2222
Logically, they are the numbers 0, 1, 2, 3, …, generated from the constructors {lean}`Nat.zero` and {lean}`Nat.succ`.
2323
Lean imposes no upper bound on the representation of natural numbers other than physical constraints imposed by the available memory of the computer.
2424

25-
Because the natural numbers are fundamental to both mathematical reasoning and programming, they are specially supported by Lean's implementation. The logical model of the natural numbers is as an {tech}[inductive type], and arithmetic operations are specified using this model. In Lean's kernel, the interpreter, and compiled code, closed natural numbers are represented as efficient arbitrary-precision integers. Sufficiently small numbers are unboxed values that don't require indirection through a pointer. Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
25+
Because the natural numbers are fundamental to both mathematical reasoning and programming, they are specially supported by Lean's implementation. The logical model of the natural numbers is as an {tech}[inductive type], and arithmetic operations are specified using this model. In Lean's kernel, the interpreter, and compiled code, closed natural numbers are represented as efficient arbitrary-precision integers. Sufficiently small numbers are values that don't require indirection through a pointer. Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
2626

2727
# Logical Model
2828
%%%
@@ -107,10 +107,10 @@ In the kernel, there are special `Nat` literal values that use a widely-trusted,
107107
Basic functions such as addition are overridden by primitives that use this representation.
108108
Because they are part of the kernel, if these primitives did not correspond to their definitions as Lean functions, it could undermine soundness.
109109

110-
In compiled code, sufficiently-small natural numbers are represented as unboxed values: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer, and the remaining bits are used to store the number.
111-
31 bits are available on 32-bits architectures for unboxed {lean}`Nat`s, while 63 bits are available on 64-bit architectures.
110+
In compiled code, sufficiently-small natural numbers are represented without pointer indirections: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer, and the remaining bits are used to store the number.
111+
31 bits are available on 32-bits architectures for pointer-free {lean}`Nat`s, while 63 bits are available on 64-bit architectures.
112112
In other words, natural numbers smaller than $`2^{31} = 2,147,483,648` or $`2^{63} = 9,223,372,036,854,775,808` do not require allocations.
113-
If an natural number is too large for the unboxed representation, it is instead allocated as an ordinary Lean object that consists of an object header and an arbitrary-precision integer value.
113+
If an natural number is too large for this representation, it is instead allocated as an ordinary Lean object that consists of an object header and an arbitrary-precision integer value.
114114

115115
## Performance Notes
116116
%%%

Manual/BasicTypes/UInt.lean

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -60,13 +60,13 @@ Signed integers wrap the corresponding unsigned integers, and use a twos-complem
6060
tag := "fixed-int-runtime"
6161
%%%
6262

63-
In compiled code, fixed-width integer types that fit in one less bit than the platform's pointer size are always represented unboxed, without additional allocations or indirections.
63+
In compiled code in contexts that require {tech}[boxed] representations, fixed-width integer types that fit in one less bit than the platform's pointer size are always represented without additional allocations or indirections.
6464
This always includes {lean}`Int8`, {lean}`UInt8`, {lean}`Int16`, and {lean}`UInt16`.
65-
On 64-bit architectures, {lean}`Int32` and {lean}`UInt32` are also unboxed.
66-
On 32-bit architectures, {lean}`Int32` and {lean}`UInt32` are boxed, which means they may be represented by a pointer to an object on the heap.
67-
{lean}`ISize`, {lean}`USize`, {lean}`Int64` and {lean}`UInt64` are boxed on all architectures.
65+
On 64-bit architectures, {lean}`Int32` and {lean}`UInt32` are also represented without pointers.
66+
On 32-bit architectures, {lean}`Int32` and {lean}`UInt32` require a pointer to an object on the heap.
67+
{lean}`ISize`, {lean}`USize`, {lean}`Int64` and {lean}`UInt64` may require pointers on all architectures.
6868

69-
Even though some fixed-with integer types require boxing in general, the compiler is able to represent them without boxing in code paths that use only a specific fixed-width type rather than being polymorphic, potentially after a specialization pass.
69+
Even though some fixed-with integer types require boxing in general, the compiler is able to represent them without boxing or pointer indirections in code paths that use only a specific fixed-width type rather than being polymorphic, potentially after a specialization pass.
7070
This applies in most practical situations where these types are used: their values are represented using the corresponding unsigned fixed-width C type when a constructor parameter, function parameter, function return value, or intermediate result is known to be a fixed-width integer type.
7171
The Lean run-time system includes primitives for storing fixed-width integers in constructors of {tech}[inductive types], and the primitive operations are defined on the corresponding C types, so boxing tends to happen at theedges” of integer calculations rather than for each intermediate result.
7272
In contexts where other types might occur, such as the contents of polymorphic containers like {name}`Array`, these types are boxed, even if an array is statically known to contain only a single fixed-width integer type.{margin}[The monomorphic array type {lean}`ByteArray` avoids boxing for arrays of {lean}`UInt8`.]

Manual/Language/InductiveTypes.lean

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -371,11 +371,7 @@ axiom α : Prop
371371
```
372372

373373
* The representation of the fixed-width integer types {lean}`UInt8`, …, {lean}`UInt64`, {lean}`Int8`, …, {lean}`Int64`, and {lean}`USize` depends on the whether the code is compiled for a 32- or 64-bit architecture.
374-
Fixed-width integer types that are strictly smaller than the architecture's pointer type are stored unboxed by setting the lowest bit of a pointer to `1`.
375-
Integer types that are at least as large as the architecture's pointer type may be boxed or unboxed, depending on whether a concrete value fits in one fewer bits than the pointer type.
376-
If so, it is encoded by setting the lowest bit of the value to `1` (checked by {c}`lean_is_scalar`).
377-
Otherwise, the value is represented is a pointer to a fixed-size Lean object on the heap.
378-
In the C FFI, these values are marshalled into the appropriate C types {c}`uint8_t`, …, {c}`uint64_t`, and {c}`size_t`, respectively.{margin}[Fixed-width signed integer types are also represented as unsigned C integers in the FFI.]
374+
Their representation is described {ref "fixed-int-runtime"}[in a dedicated section].
379375

380376
* {lean}`Char` is represented by `uint32_t`. Because {lean}`Char` values never require more than 21 bits, they are always unboxed.
381377

@@ -386,7 +382,7 @@ axiom α : Prop
386382
* {lean}`Decidable α` is represented the same way as `Bool` {TODO}[Aren't Decidable and Bool just special cases of the rules for trivial constructors and irrelevance?]
387383

388384
* {lean}`Nat` and {lean}`Int` are represented by {c}`lean_object *`.
389-
A run-time {lean}`Nat` or {lean}`Int` value is either a pointer to an opaque arbitrary-precision integer object or, if the lowest bit of the “pointer” is `1` (checked by {c}`lean_is_scalar`), an encoded unboxed natural number or integer ({c}`lean_box`/{c}`lean_unbox`). {TODO}[Move these to FFI section or Nat chapter]
385+
Their representations are described in more detail in {ref "nat-runtime"}[the section on natural numbers] and {ref "int-runtime"}[the section on integers].
390386

391387
:::
392388

Manual/Runtime.lean

Lines changed: 21 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,24 @@ These services include:
4646
There are many primitive operators.
4747
They are described in their respective sections under {ref "basic-types"}[Basic Types].
4848

49+
# Boxing
50+
%%%
51+
tag := "boxing"
52+
%%%
53+
54+
:::paragraph
55+
Lean values may be represented at runtime in two ways:
56+
* {deftech}_Boxed_ values may be pointers to heap values or require shifting and masking.
57+
* {deftech}_Unboxed_ values are immediately available.
58+
:::
59+
60+
Boxed values are either a pointer to an object, in which case the lowest-order bit is 0, or an immediate value, in which case the lowest-order bit is 1 and the value is found by shifting the representation to the right by one bit.
61+
62+
Types with an unboxed representation, such as {name}`UInt8` and {tech}[enum inductive] types, are represented as the corresponding C types in contexts where the compiler can be sure that the value has said type.
63+
In some contexts, such as generic container types like {name}`Array`, otherwise-unboxed values must be boxed prior to storage.
64+
In other words, {name}`Bool.not` is called with and returns unboxed `uint8_t` values because the {tech}[enum inductive] type {name}`Bool` has an unboxed representation, but the individual {name}`Bool` values in an {lean}`Array Bool` are boxed.
65+
A field of type {lean}`Bool` in an inductive type's constructor is represented unboxed, while {lean}`Bool`s stored in polymorphic fields that are instantiated as {lean}`Bool` are boxed.
66+
4967

5068
# Reference Counting
5169
%%%
@@ -385,15 +403,15 @@ local macro "..." : term => ``(«...»)
385403
In the {tech key:="application binary interface"}[ABI], Lean types are translated to C types as follows:
386404

387405
* The integer types {lean}`UInt8`, …, {lean}`UInt64`, {lean}`USize` are represented by the C types {c}`uint8_t`, ..., {c}`uint64_t`, {c}`size_t`, respectively.
388-
If their {ref "fixed-int-runtime"}[run-time representation] requires boxing, then they are unboxed at the FFI boundary.
406+
If their {ref "fixed-int-runtime"}[run-time representation] requires {tech key:="boxed"}[boxing], then they are unboxed at the FFI boundary.
389407
* {lean}`Char` is represented by {c}`uint32_t`.
390408
* {lean}`Float` is represented by {c}`double`.
391409
* {name}`Nat` and {name}`Int` are represented by {c}`lean_object *`.
392-
Their runtime values is either a pointer to an opaque bignum object or, if the lowest bit of the "pointer" is 1 ({c}`lean_is_scalar`), an encoded unboxed natural number or integer ({c}`lean_box`/{c}`lean_unbox`).
410+
Their runtime values is either a pointer to an opaque bignum object or, if the lowest bit of the "pointer" is 1 ({c}`lean_is_scalar`), an encoded natural number or integer ({c}`lean_box`/{c}`lean_unbox`).
393411
* A universe {lean}`Sort u`, type constructor {lean}`...Sort u`, or proposition {lean}`p`​` :`{lean}` Prop` is {tech}[irrelevant] and is either statically erased (see above) or represented as a {c}`lean_object *` with the runtime value {c}`lean_box(0)`
394412
* The ABI for other inductive types that don't have special compiler support depends on the specifics of the type.
395413
It is the same as the {ref "run-time-inductives"}[run-time representation] of these types.
396-
Its runtime value is a pointer to an object of a subtype of {c}`lean_object` (see the "Inductive types" section below) or the unboxed value {c}`lean_box(cidx)` for the {c}`cidx`th constructor of an inductive type if this constructor does not have any relevant parameters.
414+
Its runtime value is either a pointer to an object of a subtype of {c}`lean_object` (see the "Inductive types" section below) or it is the value {c}`lean_box(cidx)` for the {c}`cidx`th constructor of an inductive type if this constructor does not have any relevant parameters.
397415

398416
```lean (show := false)
399417
variable (u : Unit)

0 commit comments

Comments
 (0)