Skip to content

Conversation

mythmon
Copy link
Member

@mythmon mythmon commented Feb 27, 2024

This is an alternate approach to Windows support (#90), in contrast to #881. The two PRs share a lot of the common cross platform fixes, but differe in how paths are handled.

In #881, every path was "branded" as either a FilePath or a UrlPath, and there were helpers that helped convert to, from, and between these types. In this PR, every path is kept as a "posix" style path (with forward slashes), until we have to interact with an external system like writing to a file or calling rollup. As late as possible the path is converted to use the OS path format, which is a no-op on macOS and Linux.

Comparing the two, #881 changed 87 files, with a line change of +2020/-1264. This PR touched 68 files, with a line change of +1186/-646. All other things being equal, I like smaller changes. The line count of #881 is inflated, due to it often making lines longer and causing extra wrapping.

From a subjective point of view, the two approaches feel quite different to work with. When working with the branding path types, the issue of OS vs URL paths is felt everywhere, and needs to be constantly thought about. However, it is almost always an easy thing to figure out. With this PR, the internal consistency meant that (as long as you use the right imports) the thought about different paths is almost completely invisible. However, when it is felt, it is a lot harder to figure out, since there aren't any guard rails from the type system.

In this PR I also had to figure out to deal with Windows' driver letters, which #881 got to pretty much ignore. The approach I went with is to turn a Windows path like C:\Users\Amaya\ to /C:/Users/Amaya/. The file system handlers then have to know how to recognize that prefix and handle it appropriately. Getting that working correctly was about 1/4 of the time I spent on this project. I'm only confident I got it working due to good test coverage.

Assuming that CI passes here, my vote would be for this PR. I like the pattern of concentrating "the hard parts" into well defined interfaces instead of spreading it lighly over the whole code base. This is a leaky abstraction due to drive letters, but in the spirit of worse is better I think that it is the right trade off.

@mythmon mythmon requested review from Fil, mbostock and visnup February 27, 2024 01:07
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for exploring this pathway. To me this feels much better than #881.

It leaves open the question of tracking which path is a url or a file, but this could be done as a separate PR, and maybe is just an issue of how we name variables rather than typescripting it (??).

The part that feels redundant is the special handling of \r\n. In my opinion (and it's just a weak opinion at this point), we should not change anything here in the code and the data assets; and instead, make tests smarter (for instance, we don't care if a python data loader returns "python\r\n" or "python\n" — instead let's test that the trimmed output is "python").

# Ask git to automatically ensure that text files have posix-style line endings.
* text eol=lf

# Git can usually detect binary files, but lets mark the binary files we have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Git can usually detect binary files, but lets mark the binary files we have
# Git can usually detect binary files, but let’s mark the binary files we have

*.tgz binary
*.webp binary
*.xlsx binary
*.zip binary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't plan to run the repo under Windows, except for unit tests—and in that case I'd rather see a failure if something changes. Can we just remove the gitattributes file and live with whatever this results in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add this to support local development; it is required for Windows CI. I could remove the extra binary attributes and just trust Git to handle binary detection.

@mythmon mythmon requested a review from Fil February 28, 2024 17:18
@mythmon
Copy link
Member Author

mythmon commented Feb 28, 2024

Note that the "Waiting for status to be reported" state of CI is because I changed the names of the expected CI jobs. We'll need to have a sort of sea-change to merge this, where we update the CI to expect the new jobs.

One thing we might consider is to introduce a "Capstone" job which itself depends on all the needed jobs. That gives us an option to change CI without having to change Github settings in the future.

@mythmon mythmon force-pushed the mythmon/240222/normalize-paths branch from 3e21af1 to 0c33d22 Compare February 28, 2024 18:45
@mythmon
Copy link
Member Author

mythmon commented Mar 2, 2024

Closing in favor of #944.

@mythmon mythmon closed this Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants