Skip to content

AstGen: detect and error on files included in multiple packages #13670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 23, 2023

Conversation

mlugg
Copy link
Member

@mlugg mlugg commented Nov 27, 2022

Previously, if a source file was referenced from multiple packages, it just became owned by the first one AstGen happened to reach; this was a problem, because it could lead to inconsistent behaviour in the compiler based on a race condition. This could be fixed by just analyzing such files multiple times - however, it was pointed out by Andrew that it might make more sense to enforce files being part of at most a single package. Having a file in multiple packages would not only impact compile times (due to Sema having to run multiple times on potentially a lot of code) but is also a confusing anti-pattern which more often than not is a mistake on the part of the user.

This PR also changes how errors are formatted slightly. Previously, entire_file module errors were turned into plain errors for output, which caused the filename to be omitted (which was quite unhelpful). This tweaks them to be reported like any other error, just with no outputted source line. This only affects entire_file errors, which were just used for retryable AstGen/File errors.

Resolves: #13662

@matu3ba
Copy link
Contributor

matu3ba commented Nov 27, 2022

Not sure, if this is the correct place to discuss, but I found no issue on package semantics yet:

    1. Do symbols referencing another package (+ be used) need to construct a directed acyclic graph?
    1. What should happen, if a file is included as being part of one (smaller) package and in another as being part of one (bigger) package? This can happen, if a dependency of a dependency also uses/needs this package (for for example as smaller version) and if Zig intends to reduce package duplicates.

@Jarred-Sumner
Copy link
Contributor

a confusing anti-pattern which more often than not is a mistake on the part of the user.

Will this affect when package A depends on package B and package B depends on package A?

An example from Bun where interdependent packages are a useful language feature:

PathString.toJS function that relies on the "javascript_core" package, and the "javascript_core" package which relies on the "strings" package.
// macOS sets file path limit to 1024
// Since a pointer on x64 is 64 bits and only 46 bits are used
// We can safely store the entire path slice in a single u64.
pub const PathString = packed struct {
    const PathIntLen = std.math.IntFittingRange(0, bun.MAX_PATH_BYTES);
    pub const use_small_path_string = @bitSizeOf(usize) - @bitSizeOf(PathIntLen) >= 53;
    pub const PathInt = if (use_small_path_string) PathIntLen else usize;
    pub const PointerIntType = if (use_small_path_string) u53 else usize;
    ptr: PointerIntType = 0,
    len: PathInt = 0,

    const JSC = @import("javascript_core");
    pub fn fromJS(value: JSC.JSValue, global: *JSC.JSGlobalObject, exception: JSC.C.ExceptionRef) PathString {
        if (!value.jsType().isStringLike()) {
            JSC.JSError(JSC.getAllocator(global), "Only path strings are supported for now", .{}, global, exception);
            return PathString{};
        }
        var zig_str = JSC.ZigString.init("");
        value.toZigString(&zig_str, global);

        return PathString.init(zig_str.slice());
    }

    pub inline fn asRef(this: PathString) JSC.JSValueRef {
        return this.toValue().asObjectRef();
    }

    pub fn toJS(this: PathString, ctx: JSC.C.JSContextRef, _: JSC.C.ExceptionRef) JSC.C.JSValueRef {
        var zig_str = JSC.ZigString.init(this.slice());
        zig_str.detectEncoding();

        return zig_str.toValueAuto(ctx.ptr()).asObjectRef();
    }

    pub inline fn slice(this: anytype) string {
        @setRuntimeSafety(false); // "cast causes pointer to be null" is fine here. if it is null, the len will be 0.
        return @intToPtr([*]u8, @intCast(usize, this.ptr))[0..this.len];
    }

    pub inline fn sliceAssumeZ(this: anytype) stringZ {
        @setRuntimeSafety(false); // "cast causes pointer to be null" is fine here. if it is null, the len will be 0.
        return @intToPtr([*:0]u8, @intCast(usize, this.ptr))[0..this.len];
    }

    pub inline fn init(str: string) @This() {
        @setRuntimeSafety(false); // "cast causes pointer to be null" is fine here. if it is null, the len will be 0.

        return @This(){
            .ptr = @truncate(PointerIntType, @ptrToInt(str.ptr)),
            .len = @truncate(PathInt, str.len),
        };
    }

    pub inline fn isEmpty(this: anytype) bool {
        return this.len == 0;
    }

    pub const empty = @This(){ .ptr = 0, .len = 0 };
    comptime {
        if (!bun.Environment.isWasm) {
            if (use_small_path_string and @bitSizeOf(@This()) != 64) {
                @compileError("PathString must be 64 bits");
            } else if (!use_small_path_string and @bitSizeOf(@This()) != 128) {
                @compileError("PathString must be 128 bits");
            }
        }
    }
};

@mlugg
Copy link
Member Author

mlugg commented Nov 29, 2022

That use case should be fine - this PR doesn't make any package dependencies illegal, only multiple packages directly depending on the same source file.

@matklad
Copy link
Contributor

matklad commented Dec 6, 2022

Having a file in multiple packages would not only impact compile times (due to Sema having to run multiple times on potentially a lot of code) but is also a confusing anti-pattern which more often than not is a mistake on the part of the user.

A file in multiple packages would be pretty horrible for IDE support as well. In a correct IDE, the very first thing you'd ask to do, eg, completions in foo.zig is "what is the root file of the package foo.zig belongs too?", and if the answer is "multiple", IDE operations become somewhat ill-defined. Definitely have this problem in Rust.

@mlugg
Copy link
Member Author

mlugg commented Dec 6, 2022

Ooh, that's an interesting point too, hadn't considered that. This PR's basically good to go, I just need to ask a question in this Thursday's compiler meeting about how to restructure the compiler tests to make them actually work again

@matu3ba
Copy link
Contributor

matu3ba commented Dec 6, 2022

IDE operations become somewhat ill-defined. Definitely have this problem in Rust.

I do agree. Solving this would add complexity with additional lookup in metadata for every request. It sounds better to allow users to prune unused code with tooling/the package manager.

(Since the language semantics allow such analysis for most use cases.)

@mlugg
Copy link
Member Author

mlugg commented Dec 6, 2022

Not sure, if this is the correct place to discuss, but I found no issue on package semantics yet:

    1. Do symbols referencing another package (+ be used) need to construct a directed acyclic graph?
    1. What should happen, if a file is included as being part of one (smaller) package and in another as being part of one (bigger) package? This can happen, if a dependency of a dependency also uses/needs this package (for for example as smaller version) and if Zig intends to reduce package duplicates.

Oops, I meant to reply to this ages ago, I'll have a think once I'm home

@mlugg mlugg force-pushed the fix/astgen-ambiguous-package branch from 55b9fe9 to b871f2a Compare December 7, 2022 08:04
@matu3ba
Copy link
Contributor

matu3ba commented Dec 7, 2022

I think this deserves a regression test, as this is super annoying to track down if it regresses. But the decision is of course up to a core team member.

@mlugg
Copy link
Member Author

mlugg commented Dec 15, 2022

I was just looking at adding a test for this error, and I'm not really sure how to do it. It can't live under cases/compile_errors with the current structure, since TestContext is unaware of packages (src/test.zig just creates a main_pkg with an empty table), but it also can't be a standalone test since there's no way (that I can think of) to expect a compile error with those.

@andrewrk Is there a strategy I've not thought of, or should I try and extend TestContext.Case with the ability to have files in packages? I imagine it wouldn't be too difficult to get something that works - add something like a packages: std.ArrayList(struct { name: []const u8, root_src: []const u8 }) to Case and have the source files themselves be specified in files as usual.

@mlugg mlugg force-pushed the fix/astgen-ambiguous-package branch from b871f2a to 8401eff Compare December 25, 2022 11:04
@mlugg mlugg force-pushed the fix/astgen-ambiguous-package branch 2 times, most recently from e690afb to 7968d29 Compare January 22, 2023 16:33
mlugg added 2 commits January 22, 2023 19:00
By @Vexu's suggestion, since fetching the name from the parent package
is error-prone and complex, and optimising Package for size isn't really
a priority.
Previously, if a source file was referenced from multiple packages, it
just became owned by the first one AstGen happened to reach; this was a
problem, because it could lead to inconsistent behaviour in the compiler
based on a race condition. This could be fixed by just analyzing such
files multiple times - however, it was pointed out by Andrew that it
might make more sense to enforce files being part of at most a single
package. Having a file in multiple packages would not only impact
compile times (due to Sema having to run multiple times on potentially a
lot of code) but is also a confusing anti-pattern which more often than
not is a mistake on the part of the user.

Resolves: ziglang#13662
@mlugg mlugg force-pushed the fix/astgen-ambiguous-package branch from 7968d29 to 5f9186d Compare January 22, 2023 19:00
@Vexu Vexu merged commit 2200205 into ziglang:master Jan 23, 2023
@mlugg mlugg deleted the fix/astgen-ambiguous-package branch March 10, 2023 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

non-deterministic resolving of packages leads to build failures
5 participants