
Native Assembler: Improvements, Tweaks, Enhancements #7561

Open
ghost opened this issue Dec 27, 2020 · 7 comments

@ghost

ghost commented Dec 27, 2020

Split out from #2081 (comment). See also #5241 for inline assembly improvements.

Zig Native Assembler

For system interfacing without libc in LLVM-less builds, we will need our own inline assembler. From there, it's not much more work to have a standalone assembler as well. This presents us with an opportunity to make some improvements. It won't be wise to stray too far, though, as people will need to port their existing code.

Proposed Changes

  • Intel syntax for x86/64, except directives start with . and symbols end with :
  • scas-style local and relative labels instead of numeric ones: .label: is referenced as b .label, and _: is referenced as b _(+|-)+ (a bare b _ is not allowed, as it would be ambiguous)
  • .pub sym:, .use sym for sharing symbols within a compilation unit
  • .export sym:, .extern ("mod")? sym for sharing symbols between objects
  • No .comm or .global
  • .end takes a symbol(s): .end sym1, sym2 == .size sym1, (. - sym1) [\n] .size sym2, (. - sym2); all non-local symbols must be .ended; replaces .size
  • Relax all guarantees of relative symbol layout beyond .end boundaries
  • Prepend or replace a symbol with its loader address: 0x8000 pin:, 0xff00: -- only possible at the start of a coherent region (all previous symbols have been .ended); linker will detect clashes/range errors
  • On the Zig side, a use keyword to access .pub symbols within the compilation unit: use const func: fn callconv(.c) (u64, bool) u64;
  • Cull some redundant/historical symbols (flexible)
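The `.end sym1, sym2` equivalence in the list above can be sketched as a small source-to-source rewrite. This is only an illustration in Python, not part of the proposal: the function name `desugar_end` is hypothetical, and it assumes one directive per line with symbols separated by commas.

```python
def desugar_end(line):
    """Rewrite `.end a, b` into one `.size sym, (. - sym)` line per symbol.

    Lines that are not an `.end` directive are returned unchanged.
    Hypothetical helper; the proposal only states the equivalence.
    """
    stripped = line.strip()
    if not stripped.startswith(".end"):
        return [line]
    rest = stripped[len(".end"):]
    # Require whitespace after the directive so that e.g. `.endfoo`
    # is not mistaken for `.end`; a bare `.end` is also left alone.
    if not rest or not rest[0].isspace():
        return [line]
    symbols = [s.strip() for s in rest.split(",") if s.strip()]
    return [f".size {sym}, (. - {sym})" for sym in symbols]


if __name__ == "__main__":
    for out in desugar_end(".end sym1, sym2"):
        print(out)
```

Because each symbol gets its own `.size` line at the same position, two symbols ended together simply share an end address, which is what allows the overlapping described in the notes.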

Notes

  • Directives are carefully chosen so as not to clash with existing GNU/LLVM definitions. This way, an LLVM build of Zig can compile both Zig-flavoured and GNU-flavoured asm with no ambiguity.
  • .pub/.use take advantage of Zig's compilation units: .pub symbols are not necessarily exported, and .export symbols are not necessarily public. Symbols from pre-compiled object files cannot be .used; they must be .external (see below).
  • .pub symbols populate a single global namespace; the amorphous organisation permitted by assembly means a strictly hierarchical symbol-sharing model would be untenable. Explicit .use at least makes this much more manageable.
  • .extern "mod" provides some primitive namespacing for libraries, as with Zig extern -- without this, making the use of multiple libraries tenable requires a single global namespace for every symbol in every library on the system. The interpretation of mod is left to the linker, to facilitate versioning of libraries or different library paths; lack of "mod" is always taken to mean another explicit input object file (i.e. an argument to zig build-(exe|lib)). An unresolved symbol is a compile error.
  • .public and .exported symbols must be declared as such at the symbol definition site, i.e. .pub sym: both declares the symbol sym and marks it public, and there is no way to separate these actions. This facilitates locating such a symbol by a simple global text search.
  • .global makes symbols impossible to track down, and .comm glosses over potential naming errors; their functionality is subsumed by .pub/.use.
  • .end was chosen rather than some kind of hierarchical structure or dividing by non-local symbols to allow overlapping of symbols, as well as sequencing:
one:
  ; Some code
two:
.end one
  ; This code comes right after `one`, if both are included, but both need not be if optimised

This presents an interesting edge case: a local label may be dropped by a non-local symbol while its use would still be valid. I'm not aware of a clean solution to this.

  • Prepending a section with a loader address may clash with section declarations in other files, and hence is best left to the build system; also, a specific loader address typically implies specific symbol addresses as well, and we make no guarantees of symbol layout within sections, so it would just be more complication for no reason. This at least gets us a bit closer to #3206 (Zig-based alternative to linker scripts).
  • I considered a hypothetical @sImport(), but the need for strong typing would have made it untenable. Collecting all public symbols into a namespace wouldn't have helped, for the same reason: since builtins will be required anyway, and the lack of explicit source file dependencies means assembly building will have to be coordinated by build.zig anyway, there is no harm in accessing symbols individually. (Note: this still only applies to symbols within the same compilation unit; .exported symbols from prebuilt objects still come through Zig extern, as usual.)
  • Under this system, there is a way for Zig code to directly use symbols from asm, but not the other way around. Unfortunately, there is no clean way around this: asm's flat symbol model can be expressed inside Zig's hierarchical model without recreating it from scratch, but not vice versa. This means that using Zig symbols from asm requires two compilation steps and the use of export/.extern. This is annoying, but it is considered acceptable: Zig is typically the driver of asm and not the other way around, and inline asm with #5241 (Enhancement: New Inline Assembly) makes any structure possible if need be.
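The .pub/.use model from the notes above (one flat namespace per compilation unit, with every .use naming a symbol explicitly) can be sketched as a two-pass resolver. This is a rough Python illustration under stated assumptions, not the proposed implementation: `resolve_uses` is a hypothetical name, duplicate .pub definitions are assumed to be an error, and an unresolved .use is assumed to be a compile error in the same way the notes describe for .extern.

```python
def resolve_uses(files):
    """Map each (file, symbol) that appears in a `.use` to its defining file.

    `files` maps a file name to its list of source lines; together they
    stand in for one compilation unit. Raises ValueError on a duplicate
    `.pub` or an unresolved `.use` (both assumptions, see lead-in).
    """
    public = {}
    # Pass 1: collect `.pub sym:` definitions into one flat namespace.
    for name, lines in files.items():
        for line in lines:
            parts = line.split()
            if len(parts) >= 2 and parts[0] == ".pub" and parts[1].endswith(":"):
                sym = parts[1][:-1]
                if sym in public:
                    raise ValueError(
                        f"duplicate .pub symbol {sym!r} in {public[sym]} and {name}"
                    )
                public[sym] = name
    # Pass 2: every `.use sym` must name a symbol collected in pass 1.
    resolved = {}
    for name, lines in files.items():
        for line in lines:
            parts = line.split()
            if len(parts) == 2 and parts[0] == ".use":
                sym = parts[1]
                if sym not in public:
                    raise ValueError(f"unresolved .use {sym!r} in {name}")
                resolved[(name, sym)] = public[sym]
    return resolved


if __name__ == "__main__":
    unit = {
        "a.s": [".pub fast_copy:", "  ret", ".end fast_copy"],
        "b.s": [".use fast_copy", "  call fast_copy"],
    }
    print(resolve_uses(unit))
```

Since a definition is always spelled `.pub sym:` at the definition site, pass 1 is the same plain text search that the notes cite as a benefit for human readers.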
@Vexu added the proposal label Dec 27, 2020
@Vexu added this to the 0.9.0 milestone Dec 27, 2020
@pixelherodev
Contributor

Pending the inevitable arguments against it, I'm in full support of this.

I think some of the specific syntax needs to be altered, but aside from that, a "Zig-flavored asm" makes a great deal of sense IMO. This also conveniently paves the way for asm files per input file instead of per compilation unit (thanks to .use), which allows the assembling pass to be parallelized with a single process per asm file by build.zig instead of needing tracking or dependency code inside of the assembler. This is a great parallel (pun intended) to a Makefile with a bunch of objects built from asm.

I think it might even be able to integrate with incremental compilation; if we have our own syntax, we can just effectively pretend that a .s file is a .zig file with a single asm statement.

@ghost
Author

ghost commented Dec 30, 2020

I think some of the specific syntax needs to be altered,

Which syntax, specifically?

@pixelherodev
Contributor

@EleanorNB Actually, most of the syntax is fine on a second look, I think.

.pub sym, .use sym for sharing symbols within a compilation unit

Does this mean multiple input asm files can be compiled into one compilation unit? And does @use imply that an asm file can become part of a .zig compilation unit as well?

@ghost
Author

ghost commented Dec 30, 2020

That's right. Zig has multi-file compilation units, we might as well use them. (That's also the only way that .extern "lib" sym makes sense, at least until I make my own OS with a custom object format.)

@pixelherodev
Contributor

If you're serious about making a new object format, it'd probably be easier to add support to an existing OS :P

@pixelherodev
Contributor

@EleanorNB Do you intend to support macros in the asm? If the goal is to be able to write asm directly, I think that makes sense (and a simple macro processor is pretty straightforward).

On the other hand, "macro support" could be a build.zig feature instead, using an extension to build.zig that parses the asm, performs macro replacement, and gives the resultant file as an input instead of the source. This removes the complexity from the compiler, while still allowing asm programmers to add support themselves in a manner best suited to their project.
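To make the idea of a preprocessing step outside the compiler concrete, here is a minimal textual macro expander. It is written in Python purely for illustration and is not how build.zig would do it. The name `expand_macros`, the `\param` placeholder style, and passing macros in as a dictionary are all assumptions. Expansion is a single pass, so macros that invoke other macros are not handled.

```python
import re


def expand_macros(source, macros):
    """Expand macro invocations in `source` by textual substitution.

    `macros` maps a macro name to (parameter names, body lines). A line
    whose first word is a macro name is replaced by the body, with each
    `\\param` placeholder replaced by the matching argument.
    Hypothetical sketch: single pass, no nested macro expansion.
    """
    out = []
    for line in source.splitlines():
        words = line.split(None, 1)
        if words and words[0] in macros:
            params, body = macros[words[0]]
            args = [a.strip() for a in words[1].split(",")] if len(words) > 1 else []
            if len(args) != len(params):
                raise ValueError(
                    f"macro {words[0]!r} expects {len(params)} argument(s), got {len(args)}"
                )
            binding = dict(zip(params, args))
            for template in body:
                # Placeholders that do not name a parameter are left as-is.
                out.append(
                    re.sub(
                        r"\\(\w+)",
                        lambda m: binding.get(m.group(1), m.group(0)),
                        template,
                    )
                )
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    macros = {"push2": (["a", "b"], ["  push \\a", "  push \\b"])}
    print(expand_macros("start:\n  push2 rax, rbx\n  ret", macros))
```

The expanded text would then be handed to the assembler as the input file, leaving the assembler itself free of any macro logic.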

It's worth noting that you could technically do that for Zig code as well, which could theoretically be used to replace some expensive comptime calls with cheap textual substitution. I can already hear people screaming in revulsion at the idea - but the point isn't, "hey, this is a good idea!" it's that it can be done independently of the compiler! The advantages of a native build system :)

@matu3ba
Contributor

matu3ba commented Jan 5, 2021

@pixelherodev Some examples of how you intend the assembler to look, and of what parsing and checking complexity (syntax and types) you intend, would be great.

There is a tradeoff between syntax+type complexity (and hence compile-time speed), functional complexity and safety classes.

@andrewrk modified the milestones: 0.9.0, 0.10.0 May 19, 2021