Skip to content

Adopt a Zig Memory Model #6396

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
mb64 opened this issue Sep 23, 2020 · 17 comments
Open

Adopt a Zig Memory Model #6396

mb64 opened this issue Sep 23, 2020 · 17 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mb64
Copy link
Contributor

mb64 commented Sep 23, 2020

TL;DR: Zig should adopt an official memory model. Some options are to use the C/C++ memory model (easy), or its own better memory model (very hard).

(Note that this issue is about the semantics of concurrent access, not other just-as-important things like pointer aliasing rules and provenance. That should also be specified, but it's the job of another issue.)

The problem

The memory model determines which concurrent programs are valid, and is important as the ground truth for what transformations the compiler is allowed to do. Currently, the Zig memory model is by default "whatever LLVM says". It would be nice to have a Zig memory model independent of LLVM. LLVM memory model

This is by no means urgent -- despite being the subject of significant formal verification efforts, Rust still lacks its own memory model, instead defaulting to the C11/C++11 spec.

Here are some reasons to not just stick with LLVM's model:

  • The Zig language would be tied to the LLVM release schedule. If LLVM changes what it does in the future, Zig would be forced to either change with it, or implement some kind of backwards compatibility on top of it.
  • Other Zig backends (Stage2, as well as any other future backend) are forced to have feature parity with LLVM
  • The LLVM memory model is more docs than spec, so it lacks the formalisms that would unambiguously deal with all edge cases. This also has implications for other Stage2/other backends -- if LLVM starts compiling some code in a more conservative way than Stage2, that shouldn't retroactively make Stage2's behavior a bug.

The LLVM memory model also (somewhat necessarily) incorporates a lot of cruft, from C, Java, and probably more -- what could "starting over" with a memory model fix?

  • Reordering of dependent loads is allowed by C/C++ relaxed loads, and maybe by Java non-volatiles (this is unclear to me, maybe someone else can say for sure). LLVM never allows reordering of dependent loads, by simply not supporting the DEC Alpha processor. If Zig is to ever be run on the DEC Alpha, Zig will have to take a stance on this.

  • LLVM doesn't support C/C++ consume operations, which are arguably broken. Instead, Clang changes all consumes to acquires. Consume semantics were introduced for a reason though, so it's worth thinking about how Zig could succeed where the C/C++ standard failed.

  • LLVM has unordered for Java non-volatile, and monotonic for C/C++ relaxed. It also asserts that monotonic is strictly more ordered than unordered. However, C/C++ relaxed allows some things that Java non-volatile doesn't. This section of the Java memory model has an example:

    Initially x = 0, y = 0.

    Thread 1 Thread 2
    int tmp1 = x; int tmp2 = y;
    if (tmp1 == 1) y = 1; if (tmp2 == 1) x = 1;

    Assuming relaxed loads/stores to x and y, C/C++ allows this program to end with x = y = 1, but Java non-volatiles disallow this result. This is largely considered a bug in the C/C++ spec, but (afaik) all fixes are either too restrictive or require significant changes to how the memory model is specified.

Proposed solution

  • Start off by officially adopting the C/C++ memory model, minus consume. (This is what Rust has currently.)
    • Introduce AtomicOrder.Relaxed, which gets translated to LLVM's monotonic, and remove/deprecate AtomicOrder.Unordered and AtomicOrder.Monotonic.
    • More importantly, improve the AtomicOrder docs, including stating that Zig follows the C11/C++11 memory model.
  • If consume is deemed useful in the future, introduce something like consume in a way that fixes the poor C/C++ semantics. This could be implemented by LLVM with acquire, and in a more fine-grained way in Stage2.
  • Zig might eventually wish to replace the C/C++ memory model with its own memory model, as it's currently doing with other C infrastructure like libc. This would be very hard, and I don't expect it to happen soon, if ever.
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 23, 2020
@andrewrk andrewrk added this to the 0.7.0 milestone Sep 23, 2020
@andrewrk
Copy link
Member

Note that in the language specification (#75) the memory model will be specified without referencing an external definition (even if it happens to match C11).

@rohlem
Copy link
Contributor

rohlem commented Sep 23, 2020

I've also extensively tried to wrap my head around memory models, so I'll try to partake in the discussion when it eventually happens.
Just a couple of quick notes for the interested:

  • Different memory orders (or "AtomicOrder") are a high-level way of reasoning which CPU instructions to use; a more-strictly-defined order means you get more guarantees ("sane behaviour"), but need stronger fencing (and less optimizations through reordering). Consume wasn't "broken" conceptually (IIUC), but it didn't end up mapping to the set of instructions (on older-generation ARM targets I think?) it was designed for - compilers incorrectly assuming so had a broken implementation.
  • The C/C++11 model is sort of the 'best attempt' at an efficient abstraction so far, but it is not (yet?) bullet-proof due to "out-of-thin-air" behaviour. Since its inception, dozens of papers have been published trying to rectify this, most of them incur stronger ordering/costs. The newest most promising one seems to address all issues without penalty, and hasn't been disproved for now. Summary, Presentation, Paper
  • The C/C++11 model turns programs with race conditions wholly-undefined. One sweeping solution is to "localize" their effects, which is done by the "Multicore OCaml" memory model (Paper, Implementation with doc comment); they essentially have to disallow certain optimizations, and use fencing store instructions where concurrent access can occur (or might have occured).
  • (race conditions cont.) While undefined behaviour is a bad fit for Zig, ReleaseFast will probably not want to forfeit optimizations, and I think it's generally difficult to transition from a "safer" model, unless we can provide some runtime detection (for Debug / ReleaseSafe, which I've read is a non-trivial task) and guarantee it doesn't interfer with any of the optimization passes (which may be the source of race conditions themselves).

If LLVM changes what it does in the future, Zig would be forced to either change with it, or implement some kind of backwards compatibility on top of it.

As long as clang supports C/C++11 - 20+, adopting their memory model puts us in the same boat, which seems reasonably stable to me (though maybe only on their target platforms).
I agree that it's a difficult topic, which shouldn't (and needn't) be rushed. IMO 1.0 can probably do with picking the stable sections of the state-of-the-art C/C++ model that we're interested in supporting.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 13, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@HugoFlorentino
Copy link
Contributor

HugoFlorentino commented Aug 12, 2021

Andrew, have you ever read the way Pony language deals with concurrency? From what I have read, even though it is an actor-based oriented language using a GC, the type system has been formally verified, and the reference capabilities leave no option to write unsafe code. Maybe Zig could include something similar, it seems like a very elegant approach to safe concurrency.

Some interesting documents worth reading:

https://sites.google.com/a/athaydes.com/renato-athaydes/posts/fearlessconcurrencyhowclojurerustponyerlanganddartletyouachievethat

https://www.ponylang.io/media/papers/a_prinicipled_design_of_capabilities_in_pony.pdf

https://uu.diva-portal.org/smash/get/diva2:1363822/FULLTEXT01.pdf

http://www.ponylang.org/media/papers/opsla237-clebsch.pdf

@matu3ba
Copy link
Contributor

matu3ba commented Nov 6, 2021

Some stuff worth to follow:
Discussion on how Rust could decide about their memory model with article by maintainer of Linux Kernel Memory Model title "What Memory Model Should the Rust Language Use?" by Paul E. McKenney.
TLDR; "My recommendation is therefore to adopt the untroublesome portions of the C/C++ memory model in safe mode, and the rest in unsafe mode."

@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@matu3ba
Copy link
Contributor

matu3ba commented Nov 29, 2021

More relevant discussion on the c side of things:
propsal n2676 for c memory model. tldr;
"additional machinery may well be desirable for PNVI-ae and PNVI-ae-udi
to give programmers more control of the provenance of the results of byte manipulations, and of what is left marked
as exposed. The design of that machinery should ideally be based on the treatment of representation-byte-accessed
pointer values by existing compiler alias analyses and optimisations"

also @SpexGuy "The primary goal there is to ensure that ptr->int->ptr casts are preserved. Which is good, but there are many other operations that can cause provenance information to be lost. Loading a pointer from memory, for example, or obtaining one from an extern fn."

Contrary positions: how iso c became unusable for OS devs holding the same position as Linus Torvalds and M. Anton Ertl:
define operations in a virtual machine and ... then each operation (like "load" vs "load_acquire" vs "load_volatile") would have well-defined semantics in that virtual machine. And then a compiler writer would be able to do any optimization he dam,n well please, as long as it is semantically 100% equivalent.
Unfortunately Ertl and Torvalds dont express how these semantics would fit with the reality of how optimizing compilers work (see below).

Ralf Jung explains that making stuff too platform specific does not reflect reality of what compilers do and gives an example why this is a bad idea:
"any out-of-bounds write could potentially modify any other variable (at least any variable that has an address in memory). This can make even the most basic parts of high-quality code generation, such as register allocation, tricky or impossible: a variable that has its address taken has to be re-loaded from that same address any time an out-of-bounds write might have happened, since that write might just have hit the right address to change this variable’s value."
(He also mentions that some compilers may be sufficiently simple in not doing optimizations that they would be ok with doing that.)

Further more Jung mentions that there is no data on what is possible in performance improvements with more UB, but that likely one does not use C in the beginning for that.

@mb64
Copy link
Contributor Author

mb64 commented Nov 30, 2021

stuff about provenance

That's an interesting discussion, but it might be better on a different issue. (So far this issue has largely only been about concurrent memory accesses.) Do we have an issue for discussing pointer provenance and aliasing rules?

@matu3ba
Copy link
Contributor

matu3ba commented Nov 30, 2021

it might be better on a different issue

The context will remain the same, so it would be linked anyway.

Do we have an issue for discussing pointer provenance and aliasing rules?

If the thread becomes to long, we can make linked meta-issues. Only proposals to use them, but not how and why with semantics: #1108 #7742
Closely related are the comptime memory rules #7770 #9646

There was also a related proposal, which sounds to me like it goes into direction of storing the provenance explicit by the programmer (though I have no idea what and how provenance is planned to be encoded and used): #7887

Related to pointer semantics is #15232.

@matu3ba
Copy link
Contributor

matu3ba commented Feb 13, 2022

One formal description of ReleaseSafe could be inspirited by "A Formal Model of Checked C" by Li et al, although (bound-)"checked c" and the paper do no statements on allocation errors like double free.
link to checked c
Likewise, Li et al do not document the performance/compilation time cost of converting fat pointers+checks into normal "weaker typed" ones. link to paper on arxiv

@matu3ba
Copy link
Contributor

matu3ba commented Mar 20, 2022

Another relevant idea on the Rust side is to put the provenance information as being part of the type akind to what CHERI is doing: https://gankra.github.io/blah/fix-rust-pointers/.
Article is also a great read as introduction into 1. Allocations and 2. Pointer Provenance.
It is suggested to 1. distinguish between Pointers And Addresses, 2. special syntax for places and offsets (access via adress and raw pointer).

This has been implemented and is described: https://gankra.github.io/blah/tower-of-weakenings/.
Article suggest "The Tower Of Weakenings" to have something super strict and checkable on default with weakening to allow optimizations etc.
Article also describes PNVI-ae and PNVI-ae-udi more in detail.

@matu3ba
Copy link
Contributor

matu3ba commented Apr 12, 2022

An update with accurate description of PNVI-ae-udi and the Rust equivalent was
posted by Ralf Jung.
PNVI-ae-udi means "provenance not via integers adress expose user disambiguation".

Rust

  • ptrtoint casting has sideeffects
  • except: if inttoptr is never used
    • strict provenance experiment
      to evaluate if int->ptr can be replaced
    • could be ensured with the type system by tracking provenance with permissions
  • ptr-int transmutation (union access as ptr from int)
    • use case 1: chaining as casts
    • use case 2: some way to hold arbitrary data in container of a given time
    • => evaluate on removing it from the language
  • on inttoptr guess provenance from all provenances previously marked as "exposed" (globally all)
    • any ptrtoint casting exposes the provenance of "ptr": its permission is “up for grabs”
    • this includes potential aliasing etc
  • code that does not syntactically contain exposing pointer-integer
    or integer-pointer casts can forget that such casts exist at all.

C

  • PNVI-ae-udi does not account for the restrict modifier of C
  • udi part: set of valid guesses in C is a lot more restricted
  • ptr-int transmutation (union access as ptr from int)
    • legacy code
    • no type that can be used to hold data without losing provenance
    • not all loads at int type have provenance, only loads at char type

LLVM

  • ptr-int transmutation with many semantic options for LLVM to choose
    • load behaves like ptrtoint: strip provenance + expose ptr
    • load strips provenance without exposing ptr
    • load is UB or return poison
    • load produces int with provenance and any computation on such int is UB
  • int-ptr transmutation:
    • load has same effects of inttoptr
    • load creates ptr with "invalid" provenance
    • load procudes poison

Example, where optimization will still fail with reason (not in Ralf's work), but described in comments

  • call a C function that returns a pointer
    • unkown provenance
    • do not optimize anything based on pointers as fallback
    • optimization would require additional guarantee
      1. pointer was not exposed/modified and provided by us or
      2. accurate provenance information are stored in between compilation units
    • "blog post is about the specification, where we always know which function is being called"
    • "compiler has to assume it has suitable provenance and that provenance might already be exposed. Or it might not. The compiler has to do something that is correct in both cases."

@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@matu3ba
Copy link
Contributor

matu3ba commented Apr 24, 2022

"Memory Consistency & Atomicity" consists of 2 parts, which have an effect on program correctness:

  1. compiler reorderings (for optimizations)
  2. execution reorderings (which instructions are emitted to ensure the CPU cores do the correct thing)

More details of memory barriers https://stackoverflow.com/a/61711095

// Software Memory Models in C++
// memory_order | fences
// relaxed      | None
// consume      | LoadLoad, LoadStore (fences only for vars that are data-dependent on that atomic)
// acquire      | LoadLoad, LoadStore
// release      | LoadStore, StoreStore
// acq_rel      | LoadLoad, LoadStore, StoreStore
// seq_cst      | All (default order)

LoadStore fence means that reorderings from

Load instruction
Store instruction

into

Store instruction
Load instruction

are prevented for both compiler and CPU core.

Default of all fences looks to be LoadStore + another fence, so LoadStore comes up always as a combination in the C11/C11++ memory model.

Not sure, if this is library is relevant: http://mintomic.github.io/.

@matu3ba
Copy link
Contributor

matu3ba commented Apr 25, 2022

Current problems of C11 memory model (execution semantics for multiple CPU cores) outlined here: https://llvm.org/devmtg/2017-03/assets/slides/weak_memory_concurrency_in_c_cxx11_and_llvm.pdf
The C11 memory model is

  1. not mathematically sane: not monotone
  2. not too weak: Reasoning principles for C11 subsets.
  3. too strong: Compilation to Power and ARM is broken.
  4. not actually useful: It disallows intended program transformations (optimzations)

I think this is the related paper: https://people.mpi-sws.org/~viktor/papers/popl2015-c11comp.pdf

Lots of work going on with different approaches, but desired optimizations could not be verified yet https://arxiv.org/pdf/2108.06944.pdf

@matu3ba
Copy link
Contributor

matu3ba commented Aug 8, 2022

An idealized human readable model (currently restricted to sequential) memory access and incomplete for Rust is given in minirust. This is intended as the specification later to be verified against.
More interactively given is https://github.com/nikomatsakis/a-mir-formality/. However, that is still operational by the underlying language.
"PLT Redex is a domain-specific language designed for specifying and debugging operational semantics. Write down a grammar and the reduction rules, and PLT Redex allows you to interactively explore terms and to use randomized test generation to attempt to falsify properties of your semantics. "

A full specification is aimed for in https://github.com/ferrocene/specification, but this specification is like C and C++ wishful axiomatic ("style is axiomatic") and yet without any formal proof or reasoning.

@matu3ba
Copy link
Contributor

matu3ba commented Aug 10, 2022

Suggested by Ralf, "Promising 2.0: Global Optimizations in Relaxed Memory Concurrency" by Lee et al. looks like the latest state of art on a model for weak memory semantics and summarizes the main problems of other models including the shortcomings of C11. Link https://dl.acm.org/doi/pdf/10.1145/3385412.3386010

TLDR;

    1. Weak memory models will always be more complex to support thread-local + global optimizations affecting each other, but these optimizations are opt-in without creation of unsoundness.
    1. C11 or other models either dont support both or not all possible hardware semantics/optimizations.
    1. More research is needed to compare the tradeoffs, for example to fix the C11 thin-air problem, see https://arxiv.org/pdf/2007.09944.pdf for results+proofs and https://people.mpi-sws.org/~viktor/papers/esop2021-decidable.pdf for future work "Finally, studying the decidability problem for related models that solve the thin-air problem (e.g., Paviotti et al. [27 ]) is interesting and kept as future work."

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@matu3ba
Copy link
Contributor

matu3ba commented Jul 22, 2023

Some notes from the excellent to read "Rust Atomics and Locks Low-Level Concurrency in Practice" by Mara Bos (https://marabos.nl/atomics/):

  1. Zig currently supports only hardware fences and compiler fences (for stuff on the same thread like signal handler and interrupt handler/exceptions) are not yet supported, as far as I understand. Ideally, this could be solved via an annotated inline assembly pass from comptime to allow optimal flexibility, but probably intrinsics will be offered to keep things simple at performance cost (ARM etc). (Zig has calling convention for that)
  2. (To be checked) Zig semantics currently are according to the C11 memory model with hopefully change (likely according to LLVM semantics) to eliminate the "out of thin air problem" in the formal model.
  3. LLVM is currently not able to model and optimize low level primitives at scale and thus intrinsics should be preferred instead of inline assembly.

TODO Hardware protocols (MOESI etc), functionality, future and influence of hardware platform + configuration

@matu3ba
Copy link
Contributor

matu3ba commented Oct 11, 2023

For example programs and a tool for checking, see "Exploring C Semantics and Pointer Provenance"
Looks like there is a broad zoo of memory models, here an overview from "VIP: Verifying Real-World C Idioms with Integer-Pointer Casts" link:

Most notably, "concrete models, in which pointer values are simple machine words or integers" are used in HOL, seL4, "abstract models like that of CompCert in which pointers are represented as a pair of an abstract allocation identifier and an offset in the corresponding allocation block" and "abstract models can only support restricted forms of conversions between pointers and integer".
PNVI-ae-udi looks to be the most realisting one to make it into the C standard, but its complexity make it non appealing for verification.
"The main idea of VIP is to provide the user with a new construct, copy_alloc_id, by means of which they can explicitly annotate integer-to-pointer casts with a pointer value from which to copy the provenance."

More interestingly though, "PNVI-ae-udi tracks (1) ambiguities in provenance and (2) provenance exposure whereas VIP
does not. On the other hand, (3) VIP tracks provenance in integers (in a limited way, for round-trip
casts) whereas PNVI-ae-udi does not, and (4) VIP relies on the copy_alloc_id primitive, which is
not available in PNVI-ae-udi.".

Appendum (20231012):

  • Afaiu, temporal (undereferenced) out of bounds pointers are also not forced to be UB in contrast to C.
  • From what I understand, the one past the end ambiguous provenance of pointers originates from pointer that were casted to integers losing their provenance (applicative to PNVI-ae-udi, but not to VIP model).
  • This model looks much simpler, but I have no clue what LLVM is or will be doing. Question is in https://discord.com/channels/636084430946959380/636732535434510338/1161785114603298947, but I think llvm discord is more chatter and I'll ask on IRC.

@zxubian
Copy link

zxubian commented Dec 2, 2023

Some resources:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

6 participants