fix: improve description of boxing/allocation for int types

david-christiansen · david-christiansen · commit 7ff227a584e8 · 2025-07-02T12:07:33.000+02:00
There was one correct and one incorrect description; the incorrect one
has been deleted and now refers to the correct one.
diff --git a/Manual/BasicTypes/Char.lean b/Manual/BasicTypes/Char.lean
@@ -44,7 +44,7 @@ tag := "char-runtime"
 As a {ref "inductive-types-trivial-wrappers"}[trivial wrapper], characters are represented identically to {lean}`UInt32`.
 In particular, characters are represented as 32-bit immediate values in monomorphic contexts.
 In other words, a field of a constructor or structure of type {lean}`Char` does not require indirection to access.
-In polymorphic contexts, characters are boxed.
+In polymorphic contexts, characters are {tech}[boxed].
 
 
 # Syntax
diff --git a/Manual/BasicTypes/Int.lean b/Manual/BasicTypes/Int.lean
@@ -25,7 +25,7 @@ Integers are specially supported by Lean's implementation.
 The logical model of the integers is based on the natural numbers: each integer is modeled as either a natural number or the negative successor of a natural number.
 Operations on the integers are specified using this model, which is used in the kernel and in interpreted code.
 In these contexts, integer code inherits the performance benefits of the natural numbers' special support.
-In compiled code, integers are represented as efficient arbitrary-precision integers, and sufficiently small numbers are stored as unboxed values that don't require indirection through a pointer.
+In compiled code, integers are represented as efficient arbitrary-precision integers, and sufficiently small numbers are stored as values that don't require indirection through a pointer.
 Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
 
 # Logical Model
@@ -46,7 +46,7 @@ Integers can also be represented as a pair of natural numbers in which one is su
 tag := "int-runtime"
 %%%
 
-Like {ref "nat-runtime"}[natural numbers], sufficiently-small integers are represented as unboxed values: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer.
+Like {ref "nat-runtime"}[natural numbers], sufficiently-small integers are represented without pointers: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer.
 If an integer is too large to fit in the remaining bits, it is instead allocated as an ordinary Lean object that consists of an object header and an arbitrary-precision integer.
 
 # Syntax
diff --git a/Manual/BasicTypes/Nat.lean b/Manual/BasicTypes/Nat.lean
@@ -22,7 +22,7 @@ The {deftech}[natural numbers] are nonnegative integers.
 Logically, they are the numbers 0, 1, 2, 3, …, generated from the constructors {lean}`Nat.zero` and {lean}`Nat.succ`.
 Lean imposes no upper bound on the representation of natural numbers other than physical constraints imposed by the available memory of the computer.
 
-Because the natural numbers are fundamental to both mathematical reasoning and programming, they are specially supported by Lean's implementation. The logical model of the natural numbers is as an {tech}[inductive type], and arithmetic operations are specified using this model. In Lean's kernel, the interpreter, and compiled code, closed natural numbers are represented as efficient arbitrary-precision integers. Sufficiently small numbers are unboxed values that don't require indirection through a pointer. Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
+Because the natural numbers are fundamental to both mathematical reasoning and programming, they are specially supported by Lean's implementation. The logical model of the natural numbers is as an {tech}[inductive type], and arithmetic operations are specified using this model. In Lean's kernel, the interpreter, and compiled code, closed natural numbers are represented as efficient arbitrary-precision integers. Sufficiently small numbers are values that don't require indirection through a pointer. Arithmetic operations are implemented by primitives that take advantage of the efficient representations.
 
 # Logical Model
 %%%
@@ -107,10 +107,10 @@ In the kernel, there are special `Nat` literal values that use a widely-trusted,
 Basic functions such as addition are overridden by primitives that use this representation.
 Because they are part of the kernel, if these primitives did not correspond to their definitions as Lean functions, it could undermine soundness.
 
-In compiled code, sufficiently-small natural numbers are represented as unboxed values: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer, and the remaining bits are used to store the number.
-31 bits are available on 32-bits architectures for unboxed {lean}`Nat`s, while 63 bits are available on 64-bit architectures.
+In compiled code, sufficiently-small natural numbers are represented without pointer indirections: the lowest-order bit in an object pointer is used to indicate that the value is not, in fact, a pointer, and the remaining bits are used to store the number.
+31 bits are available on 32-bits architectures for pointer-free {lean}`Nat`s, while 63 bits are available on 64-bit architectures.
 In other words, natural numbers smaller than $`2^{31} = 2,147,483,648` or $`2^{63} = 9,223,372,036,854,775,808` do not require allocations.
-If an natural number is too large for the unboxed representation, it is instead allocated as an ordinary Lean object that consists of an object header and an arbitrary-precision integer value.
+If an natural number is too large for this representation, it is instead allocated as an ordinary Lean object that consists of an object header and an arbitrary-precision integer value.
 
 ## Performance Notes
 %%%
diff --git a/Manual/BasicTypes/UInt.lean b/Manual/BasicTypes/UInt.lean
@@ -60,13 +60,13 @@ Signed integers wrap the corresponding unsigned integers, and use a twos-complem
 tag := "fixed-int-runtime"
 %%%
 
-In compiled code, fixed-width integer types that fit in one less bit than the platform's pointer size are always represented unboxed, without additional allocations or indirections.
+In compiled code in contexts that require {tech}[boxed] representations, fixed-width integer types that fit in one less bit than the platform's pointer size are always represented without additional allocations or indirections.
 This always includes {lean}`Int8`, {lean}`UInt8`, {lean}`Int16`, and {lean}`UInt16`.
-On 64-bit architectures, {lean}`Int32` and {lean}`UInt32` are also unboxed.
-On 32-bit architectures, {lean}`Int32` and {lean}`UInt32` are boxed, which means they may be represented by a pointer to an object on the heap.
-{lean}`ISize`, {lean}`USize`, {lean}`Int64` and {lean}`UInt64` are boxed on all architectures.
+On 64-bit architectures, {lean}`Int32` and {lean}`UInt32` are also represented without pointers.
+On 32-bit architectures, {lean}`Int32` and {lean}`UInt32` require a pointer to an object on the heap.
+{lean}`ISize`, {lean}`USize`, {lean}`Int64` and {lean}`UInt64` may require pointers on all architectures.
 
-Even though some fixed-with integer types require boxing in general, the compiler is able to represent them without boxing in code paths that use only a specific fixed-width type rather than being polymorphic, potentially after a specialization pass.
+Even though some fixed-with integer types require boxing in general, the compiler is able to represent them without boxing or pointer indirections in code paths that use only a specific fixed-width type rather than being polymorphic, potentially after a specialization pass.
 This applies in most practical situations where these types are used: their values are represented using the corresponding unsigned fixed-width C type when a constructor parameter, function parameter, function return value, or intermediate result is known to be a fixed-width integer type.
 The Lean run-time system includes primitives for storing fixed-width integers in constructors of {tech}[inductive types], and the primitive operations are defined on the corresponding C types, so boxing tends to happen at the “edges” of integer calculations rather than for each intermediate result.
 In contexts where other types might occur, such as the contents of polymorphic containers like {name}`Array`, these types are boxed, even if an array is statically known to contain only a single fixed-width integer type.{margin}[The monomorphic array type {lean}`ByteArray` avoids boxing for arrays of {lean}`UInt8`.]
diff --git a/Manual/Language/InductiveTypes.lean b/Manual/Language/InductiveTypes.lean
@@ -371,11 +371,7 @@ axiom α : Prop
 ```
 
  * The representation of the fixed-width integer types {lean}`UInt8`, …, {lean}`UInt64`, {lean}`Int8`, …, {lean}`Int64`, and {lean}`USize` depends on the whether the code is compiled for a 32- or 64-bit architecture.
-   Fixed-width integer types that are strictly smaller than the architecture's pointer type are stored unboxed by setting the lowest bit of a pointer to `1`.
-   Integer types that are at least as large as the architecture's pointer type may be boxed or unboxed, depending on whether a concrete value fits in one fewer bits than the pointer type.
-   If so, it is encoded by setting the lowest bit of the value to `1` (checked by {c}`lean_is_scalar`).
-   Otherwise, the value is represented is a pointer to a fixed-size Lean object on the heap.
-   In the C FFI, these values are marshalled into the appropriate C types {c}`uint8_t`, …, {c}`uint64_t`, and {c}`size_t`, respectively.{margin}[Fixed-width signed integer types are also represented as unsigned C integers in the FFI.]
+  Their representation is described {ref "fixed-int-runtime"}[in a dedicated section].
 
  * {lean}`Char` is represented by `uint32_t`. Because {lean}`Char` values never require more than 21 bits, they are always unboxed.
 
@@ -386,7 +382,7 @@ axiom α : Prop
  * {lean}`Decidable α` is represented the same way as `Bool` {TODO}[Aren't Decidable and Bool just special cases of the rules for trivial constructors and irrelevance?]
 
  * {lean}`Nat` and {lean}`Int` are represented by {c}`lean_object *`.
-   A run-time {lean}`Nat` or {lean}`Int` value is either a pointer to an opaque arbitrary-precision integer object or, if the lowest bit of the “pointer” is `1` (checked by {c}`lean_is_scalar`), an encoded unboxed natural number or integer ({c}`lean_box`/{c}`lean_unbox`). {TODO}[Move these to FFI section or Nat chapter]
+  Their representations are described in more detail in {ref "nat-runtime"}[the section on natural numbers] and {ref "int-runtime"}[the section on integers].
 
 :::
 
diff --git a/Manual/Runtime.lean b/Manual/Runtime.lean
@@ -46,6 +46,24 @@ These services include:
 There are many primitive operators.
 They are described in their respective sections under {ref "basic-types"}[Basic Types].
 
+# Boxing
+%%%
+tag := "boxing"
+%%%
+
+:::paragraph
+Lean values may be represented at runtime in two ways:
+* {deftech}_Boxed_ values may be pointers to heap values or require shifting and masking.
+* {deftech}_Unboxed_ values are immediately available.
+:::
+
+Boxed values are either a pointer to an object, in which case the lowest-order bit is 0, or an immediate value, in which case the lowest-order bit is 1 and the value is found by shifting the representation to the right by one bit.
+
+Types with an unboxed representation, such as {name}`UInt8` and {tech}[enum inductive] types, are represented as the corresponding C types in contexts where the compiler can be sure that the value has said type.
+In some contexts, such as generic container types like {name}`Array`, otherwise-unboxed values must be boxed prior to storage.
+In other words, {name}`Bool.not` is called with and returns unboxed `uint8_t` values because the {tech}[enum inductive] type {name}`Bool` has an unboxed representation, but the individual {name}`Bool` values in an {lean}`Array Bool` are boxed.
+A field of type {lean}`Bool` in an inductive type's constructor is represented unboxed, while {lean}`Bool`s stored in polymorphic fields that are instantiated as {lean}`Bool` are boxed.
+
 
 # Reference Counting
 %%%
@@ -385,15 +403,15 @@ local macro "..." : term => ``(«...»)
 In the {tech key:="application binary interface"}[ABI], Lean types are translated to C types as follows:
 
 * The integer types {lean}`UInt8`, …, {lean}`UInt64`, {lean}`USize` are represented by the C types {c}`uint8_t`, ..., {c}`uint64_t`, {c}`size_t`, respectively.
-  If their {ref "fixed-int-runtime"}[run-time representation] requires boxing, then they are unboxed at the FFI boundary.
+  If their {ref "fixed-int-runtime"}[run-time representation] requires {tech key:="boxed"}[boxing], then they are unboxed at the FFI boundary.
 * {lean}`Char` is represented by {c}`uint32_t`.
 * {lean}`Float` is represented by {c}`double`.
 * {name}`Nat` and {name}`Int` are represented by {c}`lean_object *`.
-  Their runtime values is either a pointer to an opaque bignum object or, if the lowest bit of the "pointer" is 1 ({c}`lean_is_scalar`), an encoded unboxed natural number or integer ({c}`lean_box`/{c}`lean_unbox`).
+  Their runtime values is either a pointer to an opaque bignum object or, if the lowest bit of the "pointer" is 1 ({c}`lean_is_scalar`), an encoded natural number or integer ({c}`lean_box`/{c}`lean_unbox`).
 * A universe {lean}`Sort u`, type constructor {lean}`... → Sort u`, or proposition {lean}`p`​` :`{lean}` Prop` is {tech}[irrelevant] and is either statically erased (see above) or represented as a {c}`lean_object *` with the runtime value {c}`lean_box(0)`
 * The ABI for other inductive types that don't have special compiler support depends on the specifics of the type.
   It is the same as the {ref "run-time-inductives"}[run-time representation] of these types.
-  Its runtime value is a pointer to an object of a subtype of {c}`lean_object` (see the "Inductive types" section below) or the unboxed value {c}`lean_box(cidx)` for the {c}`cidx`th constructor of an inductive type if this constructor does not have any relevant parameters.
+  Its runtime value is either a pointer to an object of a subtype of {c}`lean_object` (see the "Inductive types" section below) or it is the value {c}`lean_box(cidx)` for the {c}`cidx`th constructor of an inductive type if this constructor does not have any relevant parameters.
 
   ```lean (show := false)
   variable (u : Unit)