Store tasks by path instead of file name #1535

gabritto · 2025-08-07T18:44:48Z

This fixes the following crash found while building an internal repo:

panic: ManyToManySet.Set: key already exists: c:/users/gabrielaa/code/.../removable.d.ts

goroutine 1 [running]:
github.com/microsoft/typescript-go/internal/collections.(*ManyToManySet[...]).Set(0xa65e80, {0xc002fc0700, 0x5d}, 0xc000c07800)
        C:/Users/gabrielaa/code/typescript-go/internal/collections/manytomanyset.go:31 +0x33a
github.com/microsoft/typescript-go/internal/incremental.newSnapshotForProgram(0xc0000036c0, 0x0, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/incremental/snapshot.go:373 +0x572
github.com/microsoft/typescript-go/internal/incremental.NewProgram(0xc0000036c0, 0x0, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/incremental/program.go:36 +0x25
github.com/microsoft/typescript-go/internal/execute.performIncrementalCompilation({0xa54370, 0xc0000ceba0}, 0xc0000e61e0, 0xc0001f4e00, 0xc000037d10, 0x36d6ac, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/execute/tsc.go:247 +0x377
github.com/microsoft/typescript-go/internal/execute.tscCompilation({0xa54370, 0xc0000ceba0}, 0xc0000e60f0, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/execute/tsc.go:194 +0xb53
github.com/microsoft/typescript-go/internal/execute.CommandLine({0xa54370, 0xc0000ceba0}, {0xc00008a150, 0x2, 0x3}, 0x0)
        C:/Users/gabrielaa/code/typescript-go/internal/execute/tsc.go:63 +0x17e
main.runMain()
        C:/Users/gabrielaa/code/typescript-go/cmd/tsgo/main.go:23 +0x109
main.main()
        C:/Users/gabrielaa/code/typescript-go/cmd/tsgo/main.go:10 +0x13

The issue was that in the file loader, we had a map that stored tasks by their file name, but that was insufficient to guarantee that we'd only have one task per file, because on a case-insensitive file system, the same file could be included in the list of tasks under two names differing only by case (see the new test in tsc_test.go). That would later result in the crash above, when the same file was processed twice while building a snapshot. The fix is to map tasks by path instead of name, which accounts for case sensitivity.
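To make the difference concrete, here is a minimal sketch of the two keying strategies. It is not the code from this PR: `toPath`, `task`, `collectByName`, and `collectByPath` are hypothetical names, and lower-casing is only a stand-in for the compiler's real path canonicalization.

```go
package main

import (
	"fmt"
	"strings"
)

// toPath is a hypothetical stand-in for path normalization. On a
// case-insensitive file system it folds the name to lower case, so names
// that differ only by case map to the same key.
func toPath(fileName string, ignoreCase bool) string {
	if ignoreCase {
		return strings.ToLower(fileName)
	}
	return fileName
}

// task is a simplified parse task: fileName is the spelling that was
// requested, path is the canonical key.
type task struct {
	fileName string
	path     string
}

// collectByName mimics the old behavior: one task per distinct file name.
func collectByName(fileNames []string, ignoreCase bool) []*task {
	seen := map[string]*task{}
	var tasks []*task
	for _, name := range fileNames {
		if _, ok := seen[name]; ok {
			continue
		}
		t := &task{fileName: name, path: toPath(name, ignoreCase)}
		seen[name] = t
		tasks = append(tasks, t)
	}
	return tasks
}

// collectByPath mimics the fix: one task per canonical path.
func collectByPath(fileNames []string, ignoreCase bool) []*task {
	seen := map[string]*task{}
	var tasks []*task
	for _, name := range fileNames {
		path := toPath(name, ignoreCase)
		if _, ok := seen[path]; ok {
			continue
		}
		t := &task{fileName: name, path: path}
		seen[path] = t
		tasks = append(tasks, t)
	}
	return tasks
}

func main() {
	names := []string{"/home/project/src/Foo.ts", "/home/project/src/foo.ts"}
	fmt.Println(len(collectByName(names, true))) // 2: the same file is queued twice
	fmt.Println(len(collectByPath(names, true))) // 1: duplicates collapse on the path key
}
```

With name keys, two tasks share one path, which is the kind of duplicate that a later structure keyed by path would reject.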

Copilot

Pull Request Overview

This PR fixes a crash in the incremental compiler that occurred when the same file was referenced with different casing on case-insensitive file systems. The fix changes the file loader to store tasks by their normalized path instead of file name, ensuring unique file handling.

- Fixed panic in ManyToManySet.Set when duplicate file entries existed due to case differences
- Updated task storage from filename-based to path-based mapping throughout the file loading system
- Added comprehensive test case to verify proper handling of case-insensitive file names

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `testdata/baselines/reference/tsc/commandLine/Compile-incremental-with-case-insensitive-file-names.js` | New baseline test output for case-insensitive file name compilation |
| `internal/execute/tsctestrunner_test.go` | Added case sensitivity configuration support to test runner |
| `internal/execute/tsc_test.go` | Added test case demonstrating the case-insensitive file name bug fix |
| `internal/execute/testsys_test.go` | Updated test system constructor to accept case sensitivity parameter |
| `internal/compiler/projectreferenceparsetask.go` | Changed task identification from filename to path-based |
| `internal/compiler/projectreferencefilemapper.go` | Updated references to use path-based task lookup |
| `internal/compiler/parsetask.go` | Enhanced parse tasks to store and use normalized paths |
| `internal/compiler/fileloadertask.go` | Changed worker task interface and storage from filename to path-based |
| `internal/compiler/fileloader.go` | Updated all task creation to include path information |

jakebailey · 2025-08-07T18:46:58Z

The issue was that in the file loader, we had a map that stored tasks by their file name, but that was insufficient to guarantee that we'd only have one task per file,

This is by design though, no? Otherwise we can't detect when the case mismatch happens and report the error? What did Strada do here?

jakebailey · 2025-08-07T18:48:19Z

internal/execute/tsctestrunner_test.go

+	files                       FileMap
+	cwd                         string
+	edits                       []*testTscEdit
+	useCaseInsensitiveFileNames bool


This casing is backwards, but I guess this is to reduce how much true we write? (we normally just write true)

ignoreCase ?

sheetalkamat · 2025-08-07T19:22:58Z

.../baselines/reference/tsc/commandLine/Compile-incremental-with-case-insensitive-file-names.js

+export const foo2: Foo2 = { foo: "b" };
+//// [/home/project/tsconfig.json] *new* 
+        {
+"compilerOptions": {


Wonder why it's showing like this.

sheetalkamat

Looks good apart from the suggestion to rename it as ignoreCase on test input

sheetalkamat · 2025-08-07T19:27:33Z

The issue was that in the file loader, we had a map that stored tasks by their file name, but that was insufficient to guarantee that we'd only have one task per file,

This is by design though, no? Otherwise we can't detect when the case mismatch happens and report the error? What did Strada do here?

That needs to happen at reporting time though not when parsing the file - file should be parsed and added only once
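One way to picture "parse once, report later" is to record every spelling by which a path was referenced and inspect that record after loading. This is only a sketch under assumptions: the `references` type and its methods are hypothetical, and lower-casing stands in for real path canonicalization.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// references records, per canonical path, every spelling by which the file
// was referenced. Hypothetical type, not part of the compiler.
type references map[string]map[string]struct{}

// add notes that fileName was referenced. Lower-casing is an assumed
// stand-in for canonicalization on a case-insensitive file system.
func (r references) add(fileName string) {
	path := strings.ToLower(fileName)
	if r[path] == nil {
		r[path] = map[string]struct{}{}
	}
	r[path][fileName] = struct{}{}
}

// casingConflicts returns one sorted message per path that was referenced
// under more than one spelling.
func (r references) casingConflicts() []string {
	var out []string
	for path, names := range r {
		if len(names) < 2 {
			continue
		}
		spellings := make([]string, 0, len(names))
		for n := range names {
			spellings = append(spellings, n)
		}
		sort.Strings(spellings)
		out = append(out, fmt.Sprintf("%s referenced as %s", path, strings.Join(spellings, ", ")))
	}
	sort.Strings(out)
	return out
}

func main() {
	r := references{}
	for _, name := range []string{"/p/Foo.ts", "/p/foo.ts", "/p/bar.ts"} {
		r.add(name)
	}
	for _, msg := range r.casingConflicts() {
		fmt.Println(msg) // /p/foo.ts referenced as /p/Foo.ts, /p/foo.ts
	}
}
```

Because the messages are built from sorted data after loading has finished, the report does not depend on the order in which references were discovered.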

jakebailey · 2025-08-07T20:38:34Z

Race tests are failing, so that will have to get fixed.

jakebailey

(Marking as needs review just to note it, since a rerun could possibly "pass" but be missing the right changes)

sheetalkamat · 2025-08-07T20:42:33Z

Is it because you are now setting t.path by default but have not deleted the t.path assignment in parseTask.load?

jakebailey · 2025-08-07T21:36:56Z

Race mode still fails, seemingly due to a logical race because it's nondeterministic which case wins.

sheetalkamat · 2025-08-07T21:38:43Z

you want to use file.Path() in baselineProgram so the casing is always correct

jakebailey · 2025-08-07T21:39:42Z

This failure is largely why I thought deduping via Path would be wrong, because you can't settle ties consistently on the fly without ending up in a situation where you'd have to swap one file out for another, which could have entirely different imports and therefore you'd need to undo any work from the "wrongly cased" file, which is a nightmare.

jakebailey · 2025-08-07T21:40:33Z

I guess doing Path for the printout papers over it a bit, but I suspect there's still a logical race here if the two loaded files have totally different contents.

gabritto · 2025-08-07T21:41:52Z

I guess doing Path for the printout papers over it a bit, but I suspect there's still a logical race here if the two loaded files have totally different contents.

They can't have different contents because the file system is case insensitive, no? So they're the same file in the FS; it is just the compiler that in some places thinks they're different.

jakebailey · 2025-08-07T21:42:51Z

Oh, you're right, yes. My mistake.

Though it is still a race which case gets "preserved", because the two different FileNames require two different SourceFiles. Do we pick a winner consistently for that case?

sheetalkamat · 2025-08-07T21:43:21Z

I guess doing Path for the printout papers over it a bit, but I suspect there's still a logical race here if the two loaded files have totally different contents.

How? For case-insensitive files the contents will match. As you said, the work for imports is a different story! Unfortunately, we are not yet reporting errors for those, so we will not fail for those.

gabritto · 2025-08-07T21:44:52Z

Oh, you're right, yes. My mistake.

Though it is still a race which case gets "preserved", because the two different FileNames require two different SourceFiles. Do we pick a winner consistently for that case?

I don't know that we do. Would it matter? I think the only different thing is the file name itself, but I'd hope we're using path instead in the places where this can matter.

jakebailey · 2025-08-07T21:49:57Z

It's used all over the place, whenever a string has to be shown to a user for sure but elsewhere too. The Path is supposed to be an internal detail used only for map keys and so on. Basically everything else uses the FileName.

jakebailey · 2025-08-07T21:53:31Z

If we are deduping during file loading it's going to be fine to swap out the source file; whenever we reencounter a task with a different path, we can just pick the min of them or something and swap out the SourceFile. I don't think we hold a reference to anything else from there, and since the contents match, everything else should work.
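A rough sketch of the "pick the min" idea, assuming a mutex-guarded map keyed by path. `taskSet`, `add`, and `preferredName` are hypothetical names, and lower-casing again stands in for real canonicalization. It only settles which spelling of the file itself is kept; it does nothing for the names of files reached through that file's imports.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// taskSet keeps one entry per canonical path and remembers the
// lexicographically smallest spelling seen for it. Hypothetical type.
type taskSet struct {
	mu     sync.Mutex
	byPath map[string]string // canonical path -> preferred file name
}

func newTaskSet() *taskSet {
	return &taskSet{byPath: map[string]string{}}
}

// add registers fileName and reports whether its path is new, meaning the
// caller should schedule a parse. On a repeat it keeps the smaller spelling,
// so the winner does not depend on arrival order.
func (s *taskSet) add(fileName string) bool {
	path := strings.ToLower(fileName) // assumed canonicalization
	s.mu.Lock()
	defer s.mu.Unlock()
	prev, ok := s.byPath[path]
	if !ok {
		s.byPath[path] = fileName
		return true
	}
	if fileName < prev {
		s.byPath[path] = fileName
	}
	return false
}

// preferredName returns the spelling currently kept for a canonical path.
func (s *taskSet) preferredName(path string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.byPath[path]
}

func main() {
	names := []string{"/home/project/src/foo.ts", "/home/project/src/Foo.ts"}
	s := newTaskSet()
	var wg sync.WaitGroup
	for _, n := range names {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			s.add(name)
		}(n)
	}
	wg.Wait()
	// Same result whichever goroutine ran first, since "F" sorts before "f".
	fmt.Println(s.preferredName("/home/project/src/foo.ts"))
}
```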

sheetalkamat · 2025-08-07T22:09:12Z

But as you mentioned, it's not just the "file" but also the files included by import, triple slash, etc., right? We need to go fix up file paths in all of those as well, right? (e.g. if the directory name casing differs)

jakebailey · 2025-08-07T22:14:49Z

That sounds right, but I don't see how we get away with not doing that, because there's no guarantee after this PR that things are handled in a consistent order that won't eventually break some test when run concurrently, no?

gabritto · 2025-08-07T22:29:28Z

If we are deduping during file loading it's going to be fine to swap out the source file; whenever we reencounter a task with a different path, we can just pick the min of them or something and swap out the SourceFile. I don't think we hold a reference to anything else from there, and since the contents match, everything else should work.

I can update this PR to update the normalized file path with the smaller option when we find a repeated parser task (and maybe do the same for project reference tasks), but I guess that wouldn't be enough and right now I am not sure I can tell all of the places that would need updating, and that may be too much work. To me file names should really only be used as a "preferred name" to display things to users.

The alternative to this PR would be to make snapshot creation go in the opposite direction and use file names. That way the snapshot would also contain more than one entry for files that differ only by name and are really the same file.

jakebailey · 2025-08-07T23:48:00Z

I was going to say we could do a cleanup pass, noting the conflicts (we have to report them anyway) and fixing them later, but we do store off actual AST nodes like jsxRuntimeImportSpecifiers and importHelpersImportSpecifiers.

Though arguably, those two are "weird" and nobody likes that we use AST nodes to store this information.

sheetalkamat · 2025-08-08T06:29:28Z

I was going to say we could do a cleanup pass, noting the conflicts (we have to report them anyway) and fixing them later, but we do store off actual AST nodes like jsxRuntimeImportSpecifiers and importHelpersImportSpecifiers.

Though arguably, those two are "weird" and nobody likes that we use AST nodes to store this information.

Even if we were not storing those imports (which are easy to recreate, btw), the problem is manifold:

- If you parse a file without considering "path", you are going to parse the file and its import tree multiple times (depending on whether the casing was problematic in the directory or the drive letter).
- If you use "path" as your key and parse only once, recreating just the file, or even the import specifier stuff, is not enough. What happens to the files that get imported? We would need to fix the casing for those (to avoid the race) and then re-check whether the file was referenced and report an error. This will get out of hand quickly.

So maybe deduplicating, though it can be expensive in those error scenarios, might be better from a complexity perspective.
And while I am writing this, I remember that not all scenarios report an error either: the drive letter could differ, in which case we would not report an error but could end up doing too much duplicate work.

Can we always refer to a file by its "name" on disk (its realpath)? That way we will have consistency all the time, and the only record we need to keep is how the file was referenced.

use path for task to fix crash

52e7108

Copilot AI review requested due to automatic review settings August 7, 2025 18:44

Merge branch 'main' into gabritto/taskpath

5ff2ee0

Copilot AI reviewed Aug 7, 2025

gabritto requested review from sheetalkamat and jakebailey August 7, 2025 18:45

jakebailey reviewed Aug 7, 2025

sheetalkamat reviewed Aug 7, 2025

sheetalkamat approved these changes Aug 7, 2025

rename property to ignoreCase and update baselines

d22b5db

sheetalkamat approved these changes Aug 7, 2025

jakebailey requested changes Aug 7, 2025

Merge branch 'main' into gabritto/taskpath

a597313

gabritto requested a review from jakebailey August 7, 2025 21:27

use path in test baselining

791c7a2

gabritto mentioned this pull request Aug 8, 2025

Crash in incremental due to inconsistent casing #1549

Open
