Skip to content

Store tasks by path instead of file name #1535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

gabritto
Copy link
Member

@gabritto gabritto commented Aug 7, 2025

This fixes the following crash found while building an internal repo:

panic: ManyToManySet.Set: key already exists: c:/users/gabrielaa/code/.../removable.d.ts

goroutine 1 [running]:
github.com/microsoft/typescript-go/internal/collections.(*ManyToManySet[...]).Set(0xa65e80, {0xc002fc0700, 0x5d}, 0xc000c07800)
        C:/Users/gabrielaa/code/typescript-go/internal/collections/manytomanyset.go:31 +0x33a
github.com/microsoft/typescript-go/internal/incremental.newSnapshotForProgram(0xc0000036c0, 0x0, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/incremental/snapshot.go:373 +0x572
github.com/microsoft/typescript-go/internal/incremental.NewProgram(0xc0000036c0, 0x0, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/incremental/program.go:36 +0x25
github.com/microsoft/typescript-go/internal/execute.performIncrementalCompilation({0xa54370, 0xc0000ceba0}, 0xc0000e61e0, 0xc0001f4e00, 0xc000037d10, 0x36d6ac, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/execute/tsc.go:247 +0x377
github.com/microsoft/typescript-go/internal/execute.tscCompilation({0xa54370, 0xc0000ceba0}, 0xc0000e60f0, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/execute/tsc.go:194 +0xb53
github.com/microsoft/typescript-go/internal/execute.CommandLine({0xa54370, 0xc0000ceba0}, {0xc00008a150, 0x2, 0x3}, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/execute/tsc.go:63 +0x17e
main.runMain()
        C:/Users/gabrielaa/code/typescript-go/cmd/tsgo/main.go:23 +0x109
main.main()
        C:/Users/gabrielaa/code/typescript-go/cmd/tsgo/main.go:10 +0x13

The issue was that in the file loader, we had a map that stored tasks by their file name, but that was insufficient to guarantee that we'd only have one task per file, because in a case insensitive file system, it was possible for the same file to be included in the list of tasks with two names differing only by case (see new test in tsc_test.go). That would later result in the crash above when processing the same file twice to build a snapshot. The fix is to map tasks by path instead of name, which accounts for case sensitiveness.

@Copilot Copilot AI review requested due to automatic review settings August 7, 2025 18:44
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a crash in the incremental compiler that occurred when the same file was referenced with different casing on case-insensitive file systems. The fix changes the file loader to store tasks by their normalized path instead of file name, ensuring unique file handling.

  • Fixed panic in ManyToManySet.Set when duplicate file entries existed due to case differences
  • Updated task storage from filename-based to path-based mapping throughout the file loading system
  • Added comprehensive test case to verify proper handling of case-insensitive file names

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Show a summary per file
File Description
testdata/baselines/reference/tsc/commandLine/Compile-incremental-with-case-insensitive-file-names.js New baseline test output for case-insensitive file name compilation
internal/execute/tsctestrunner_test.go Added case sensitivity configuration support to test runner
internal/execute/tsc_test.go Added test case demonstrating the case-insensitive file name bug fix
internal/execute/testsys_test.go Updated test system constructor to accept case sensitivity parameter
internal/compiler/projectreferenceparsetask.go Changed task identification from filename to path-based
internal/compiler/projectreferencefilemapper.go Updated references to use path-based task lookup
internal/compiler/parsetask.go Enhanced parse tasks to store and use normalized paths
internal/compiler/fileloadertask.go Changed worker task interface and storage from filename to path-based
internal/compiler/fileloader.go Updated all task creation to include path information

@jakebailey
Copy link
Member

The issue was that in the file loader, we had a map that stored tasks by their file name, but that was insufficient to guarantee that we'd only have one task per file,

This is by design though, no? Otherwise we can't detect when the case mismatch happens and report the error? What did Strada do here?

files FileMap
cwd string
edits []*testTscEdit
useCaseInsensitiveFileNames bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This casing is backwards, but I guess this is to reduce how much true we write? (we normally just write true)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignoreCase ?

export const foo2: Foo2 = { foo: "b" };
//// [/home/project/tsconfig.json] *new*
{
"compilerOptions": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wonder why its showing like this

Copy link
Member

@sheetalkamat sheetalkamat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good apart from the suggestion to rename it as ignoreCase on test input

@sheetalkamat
Copy link
Member

The issue was that in the file loader, we had a map that stored tasks by their file name, but that was insufficient to guarantee that we'd only have one task per file,

This is by design though, no? Otherwise we can't detect when the case mismatch happens and report the error? What did Strada do here?

That needs to happen at reporting time though not when parsing the file - file should be parsed and added only once

@jakebailey
Copy link
Member

Race tests are failing, so that will have to get fixed.

Copy link
Member

@jakebailey jakebailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Marking as needs review just to note it, since a rerun could possibly "pass" but be missing the right changes)

@sheetalkamat
Copy link
Member

Is it because you are setting t.path by default now but have not deleted t.path setting in parseTask.load

@gabritto gabritto requested a review from jakebailey August 7, 2025 21:27
@jakebailey
Copy link
Member

Race mode still fails, seemingly due to a logical race because it's nondeterministic which case wins.

@sheetalkamat
Copy link
Member

you want to use file.Path() in baselineProgram so the casing is always correct

@jakebailey
Copy link
Member

This failure is largely why I thought deduping via Path would be wrong, because you can't settle ties consistently on the fly without ending up in a situation where you'd have to swap one file out for another, which could have entirely different imports and therefore you'd need to undo any work from the "wrongly cased" file, which is a nightmare.

@jakebailey
Copy link
Member

I guess doing Path for the printout papers over it a bit, but I suspect there's still a logical race here if the two loaded files have totally different contents.

@gabritto
Copy link
Member Author

gabritto commented Aug 7, 2025

I guess doing Path for the printout papers over it a bit, but I suspect there's still a logical race here if the two loaded files have totally different contents.

They can't have different contents because the file system is case insensitive, no? So they're same file in the FS, it is just the compiler who in some places thinks they're different.

@jakebailey
Copy link
Member

jakebailey commented Aug 7, 2025

Oh, you're right, yes. My mistake.

Though it is still a race which case gets "preserved", because the two different FileNames require two different SourceFiles. Do we pick a winner consistently for that case?

@sheetalkamat
Copy link
Member

I guess doing Path for the printout papers over it a bit, but I suspect there's still a logical race here if the two loaded files have totally different contents.

How? for case insensitve files the contents will match - as you said the work for imports is different story !!! unfortunately we are not yet reporting errors for those so we will not fail for those

@gabritto
Copy link
Member Author

gabritto commented Aug 7, 2025

Oh, you're right, yes. My mistake.

Though it is still a race which case gets "preserved", because the two different FileNames require two different SourceFiles. Do we pick a winner consistently for that case?

I don't know that we do. Would it matter? I think the only different thing is the file name itself, but I'd hope we're using path instead in the places where this can matter.

@jakebailey
Copy link
Member

jakebailey commented Aug 7, 2025

It's used all over the place, whenever a string has to be shown to a user for sure but elsewhere too. The Path is supposed to be an internal detail used only for map keys and so on. Basically everything else uses the FileName.

@jakebailey
Copy link
Member

If we are deduping during file loading it's going to be fine to swap out the source file; whenever we reencounter a task with a different path, we can just pick the min of them or something and swap out the SourceFile. I don't think we hold a reference to anything else from there, and since the contents match, everything else should work.

@sheetalkamat
Copy link
Member

But as you mentioned its not just "file" but the files included by import, triple slash etc as well right ? We need to go fix up file paths in all of those as well right? (eg if directory name casing differs)

@jakebailey
Copy link
Member

That sounds right, but I don't see how we get away with not doing that, because there's no guarantee after this PR that things are handled in a consistent order that won't eventually break some test when run concurrently, no?

@gabritto
Copy link
Member Author

gabritto commented Aug 7, 2025

If we are deduping during file loading it's going to be fine to swap out the source file; whenever we reencounter a task with a different path, we can just pick the min of them or something and swap out the SourceFile. I don't think we hold a reference to anything else from there, and since the contents match, everything else should work.

I can update this PR to update the normalized file path with the smaller option when we find a repeated parser task (and maybe do the same for project reference tasks), but I guess that wouldn't be enough and right now I am not sure I can tell all of the places that would need updating, and that may be too much work. To me file names should really only be used as a "preferred name" to display things to users.

The alternative to this PR would be to make snapshot creation sort of go in the opposite direction and use file names. That way it would also have more than one file that only differs by name and are really the same file.

@jakebailey
Copy link
Member

I was going to say we could do a cleanup pass, noting the conflicts (we have to report them anyway) and fixing them later, but we do store off actual AST nodes like jsxRuntimeImportSpecifiers and importHelpersImportSpecifiers.

Though arguably, those two are "weird" and nobody likes that we use AST nodes to store this information.

@sheetalkamat
Copy link
Member

I was going to say we could do a cleanup pass, noting the conflicts (we have to report them anyway) and fixing them later, but we do store off actual AST nodes like jsxRuntimeImportSpecifiers and importHelpersImportSpecifiers.

Though arguably, those two are "weird" and nobody likes that we use AST nodes to store this information.

Even if we were not storing those imports which are easy to recreate btw, the problem is multi fold

  • if you parse file without considering "path" you are going to be parsing file and its import tree multiple times (depending on if the casing was problematic in directory or drive letter)
  • if you use "path" as your key and parse only once - recreating just file or even those import specifier stuff is not enough - what happens to files that get imported - we would need to fix casing for those (to avoid race) and then re-check if file was referenced and report error . this will get out of hand quickly.

so may be deduplicating though can be expensive in those error scenarios might be better from complexity perspective.
and while i am writing this, i remember not all scenarios report error either - you could have drive letter different and its then we will not report error but we could end up doing too much duplicate work

can we always refer file by "name" on the disk = realpath that way we will have consistency all the time and the only record we need to keep is how the file was referenced?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants