Open
Labels
A-filesystem (Area: issues with filesystems), C-bug (Category: bug), Command-package, O-macos (OS: macOS), O-windows (OS: Windows), S-triage (Status: This issue is waiting on initial triage.)
Description
Problem
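It seems that Cargo excludes already-packaged files using an exact name comparison, which doesn't always match how the file system decides that two names are equal. On a case-insensitive file system this lets the same file be archived twice under differently-cased names: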
Archiving Cargo.lock
Archiving Cargo.toml
Archiving Cargo.toml.orig
Archiving README.md
Archiving readme.Md
Archiving src/main.rs
Example crate:
https://docs.rs/crate/rosu/0.6.0/source/
Steps
[package]
…
readme = "README.md"
echo case > readme.Md
cargo package
The same applies to license-file and cargo.lock.
Possible Solution(s)
Theoretically there could be other gotchas of this kind, e.g. the HFS+ file system on macOS forces file names into the NFD Unicode form, while most text is in NFC form, so codepoint-by-codepoint comparisons of the same visible name come out unequal. However, HFS+ is on its way out, so perhaps a simple case-insensitive comparison will suffice.
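To illustrate the two kinds of mismatch mentioned above, here is a minimal sketch using only the Rust standard library. The helper `eq_ignore_case` and the two constants are hypothetical names made up for this example and are not part of Cargo. It assumes that lowercasing both sides is an acceptable stand-in for "a simple case-insensitive comparison".

```rust
/// Hypothetical helper (not Cargo's code): compares two file names by
/// lowercasing both sides, as a simple case-insensitive check would.
fn eq_ignore_case(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

/// The same visible file name written in two Unicode forms.
/// NFC uses the precomposed character U+00E9.
const NFC_NAME: &str = "caf\u{e9}.md";
/// NFD uses a plain `e` followed by the combining acute accent U+0301.
const NFD_NAME: &str = "cafe\u{301}.md";
```

With these definitions, `eq_ignore_case("README.md", "readme.Md")` is true, so the case variant from the report would be caught. `NFC_NAME` and `NFD_NAME` still compare as different, with or without lowercasing, because the standard library does not perform Unicode normalization; catching that case would need a separate normalization step.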
Notes
No response
Version
cargo 1.79.0-nightly (499a61ce7 2024-03-26)
Activity
Title changed from "Cargo packages duplicate README on case-insensitive file systems" to "Cargo packages duplicate files on case-insensitive file systems"
kornelski commented on Apr 7, 2024
In the same vein, if there's a TARGET/ directory, it doesn't get excluded when packaging.
heisen-li commented on Apr 9, 2024
@rustbot label Command-package
VorpalBlade commented on Apr 23, 2024
While that sounds lovely, in what locale? For the languages I speak it is relatively straightforward, but my understanding is that case handling is lossy in some languages, such as German (ẞ is Ss in upper case I think?) and Turkish (I believe they have the letter "i" both with and without a dot, and the upper/lower case there isn't straightforward, but don't ask me how exactly).
As a Swedish/English speaker this is all hearsay though, and I don't know how e.g. Windows or Mac OS handle these, though I think I heard that NTFS stores a case normalisation table at file system creation time based on the locale set at that point?
ChrisDenton commented on Apr 23, 2024
On Windows, the NTFS up case table is initialized when the drive is first formatted. So it'll depend on the Windows version that did that. It is however language neutral and only acts on the Basic Multilingual Plane.
Also, depending on the configuration, NTFS can be case sensitive. In Windows this can even be set differently for each directory.
VorpalBlade commented on Apr 23, 2024
Hm, maybe I'm thinking of FAT and Windows 9x then? Pretty sure things differed depending on code pages and such there. Not sure how modern OSes interacting with FAT32/exFAT deal with that. Hopefully it is somewhat sane on any Windows version Rust still supports.
ChrisDenton commented on Apr 23, 2024
Ah yes, FAT32 is indeed a mess. But then I'm also not sure how well Cargo and rustc support it, as it lacks a lot of filesystem features that may be expected. Probably it does at least work if it's only read from (e.g. the target directory is on another drive).
kornelski commented on Apr 24, 2024
It is a messy problem, but fortunately the detection algorithm doesn't need to produce user-facing text, so it doesn't need to be perfect from a linguistic perspective. It only needs to detect potential collisions between file names. Crates that work with only a specific combination of Windows locale and NTFS vintage are not generally useful, so the detection can also err on the side of over-normalizing (e.g. normalize all dotless ı's to i, forbid all control characters, check against both lower and upper case, treat codepoints with multiple transliterations/decompositions as a wildcard, etc.).
But for a start, even a simple .lowercase() will handle more than enough for the accidental variations of Readme.Md.
cargo package includes multiple README files when the case is not all upper-case #14020
cargoHash might be different on linux and darwin systems NixOS/nixpkgs#308089
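The over-normalizing detection kornelski describes in the last comment could be sketched roughly as follows. This is an illustration under stated assumptions, not Cargo's implementation: `collision_key` and `find_collisions` are hypothetical names, the key is built with the standard library's `to_uppercase` and `to_lowercase` (there is no `.lowercase()` method on Rust strings), and dropping every combining dot above (U+0307) is a deliberately crude way to fold the Turkish dotted İ onto i. The "wildcard" idea for ambiguous decompositions is not attempted.

```rust
use std::collections::BTreeMap;

/// Hypothetical over-normalizing key used only to detect collisions.
/// Returns `None` when the name contains a control character, which a
/// caller could treat as "forbidden".
fn collision_key(name: &str) -> Option<String> {
    if name.chars().any(char::is_control) {
        return None;
    }
    let key: String = name
        .chars()
        // Upper-case first, then lower-case, so that names differing only
        // in either direction end up equal (dotless ı -> I -> i, ß -> SS -> ss).
        .flat_map(char::to_uppercase)
        .flat_map(char::to_lowercase)
        // Lower-casing İ (U+0130) yields `i` plus U+0307; drop the mark.
        .filter(|c| *c != '\u{307}')
        .collect();
    Some(key)
}

/// Groups names by their collision key and returns every group that has
/// more than one member. Names rejected by `collision_key` are skipped
/// here; a real check would report them separately.
fn find_collisions<'a>(names: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: BTreeMap<String, Vec<&'a str>> = BTreeMap::new();
    for &name in names {
        if let Some(key) = collision_key(name) {
            groups.entry(key).or_default().push(name);
        }
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```

Run over the file list from the original report, this groups README.md with readme.Md and leaves the other entries alone. Because the key intentionally over-normalizes, it can flag names that some file systems would keep distinct, which matches the stated goal of erring on the side of detecting too much.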