Fix ZipArchiveEntry names shown as corrupted on other zip programs #65886
Tagging subscribers to this area: @dotnet/area-system-io-compression
Issue Details
Fixes #65750
runtime/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs Line 874 in afc2f11
A test may or may not be required, as .NET is resilient to the error reported; I think that is because it decodes with UTF8 regardless. cc @GrabYourPitchforks.
_storedEntryName = value;

if (isUTF8)
I am not sure if I understand. When Utf8 is detected, we enable the Unicode flag?
Yes. Maybe the name of the flag should be utf8FileNameAndComment to be very clear.
Per the spec: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
Bit 11: Language encoding flag (EFS). If this bit is set,
the filename and comment fields for this file
MUST be encoded using UTF-8. (see APPENDIX D)
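
To illustrate why the flag matters to other zip programs, here is a minimal reader-side sketch. The names `Bit11Sketch`, `Utf8Flag`, and `DecodeName` are hypothetical, and the Latin1 fallback is an assumption made for illustration; real tools typically fall back to a legacy code page instead. Only the bit 11 value comes from the spec text above.

```csharp
using System;
using System.Text;

static class Bit11Sketch
{
    // Bit 11 (EFS) of the general purpose bit flag, per APPNOTE.
    public const ushort Utf8Flag = 0x800;

    // Hypothetical helper: choose the decoder based on the flag.
    // Assumption: Latin1 stands in for whatever legacy encoding a reader uses.
    public static string DecodeName(byte[] raw, ushort flags)
    {
        Encoding enc = (flags & Utf8Flag) != 0 ? Encoding.UTF8 : Encoding.Latin1;
        return enc.GetString(raw);
    }

    static void Main()
    {
        // The writer stored the name as UTF-8 bytes.
        byte[] raw = Encoding.UTF8.GetBytes("你好.txt");

        // With bit 11 set, the reader recovers the original name.
        Console.WriteLine(DecodeName(raw, Utf8Flag) == "你好.txt"); // True

        // With bit 11 clear, the same bytes decode to a different string.
        Console.WriteLine(DecodeName(raw, 0) == "你好.txt"); // False
    }
}
```

A reader that always decodes with UTF-8 would not notice a missing flag, which may be why the problem only shows up in other zip programs.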
> Yes. Maybe the name of the flag should be utf8FileNameAndComment to be very clear.
That would definitely help ;) thanks for the pointer to the docs!
{
    _generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
}
else
{
    _generalPurposeBitFlag &= ~BitFlagValues.UnicodeFileNameAndComment;
}
The problem is that if the comment has utf8 characters, but the filename does not, now you're removing the utf8 flag.
My intention was to detect utf8 characters in either filename or comment. If one of the two had them, then the general purpose bit flag bit 11 would be turned on. Hence the |=.
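
A minimal sketch of the concern, assuming a simplified model of the two code paths. `FlagOrderingSketch`, `RequiresUtf8`, `AfterName`, and `AfterComment` are hypothetical names, and the printable-ASCII check is an assumption, not the library's actual detection logic.

```csharp
using System;

static class FlagOrderingSketch
{
    const ushort Utf8Bit = 0x800; // bit 11 of the general purpose bit flag

    // Assumption: anything outside printable ASCII needs UTF-8.
    public static bool RequiresUtf8(string s)
    {
        foreach (char c in s)
        {
            if (c < 32 || c > 126) return true;
        }
        return false;
    }

    // Name path: sets or clears the bit outright.
    public static ushort AfterName(ushort flags, string name) =>
        RequiresUtf8(name) ? (ushort)(flags | Utf8Bit) : (ushort)(flags & ~Utf8Bit);

    // Comment path: only ever turns the bit on (the |= behavior).
    public static ushort AfterComment(ushort flags, string comment) =>
        RequiresUtf8(comment) ? (ushort)(flags | Utf8Bit) : flags;

    static void Main()
    {
        // Name first, then comment: the comment's bit survives.
        ushort a = AfterComment(AfterName(0, "readme.txt"), "你好");
        Console.WriteLine((a & Utf8Bit) != 0); // True

        // Comment first, then name: the ASCII name clears the comment's bit.
        ushort b = AfterName(AfterComment(0, "你好"), "readme.txt");
        Console.WriteLine((b & Utf8Bit) != 0); // False
    }
}
```

Clearing the bit in the name path is only safe if that path always runs before the comment can set the bit.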
This should probably look like the code you added in the Comment setter.
Keep in mind that the FullName setter is private, because the name is only set once, in the constructor. The user cannot change it. They have to create a new ZipArchiveEntry instance if they want to change the name.
Hmm, now that I think about it more, maybe it makes sense to set it to 0 or 1 here since, as I said, this is called only in the constructor, which means Comment has not modified this bit yet.
Adding/removing the flag was already being performed before your changes, so this is really going back to updating _generalPurposeBitFlag in the ctor before it spreads.
if (isUTF8)
{
    _generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
This is fine, since you're avoiding removing the bit flag unexpectedly if filename already set it to true.
if (_hasUnicodeEntryNameOrComment)
    _generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
else
    _generalPurposeBitFlag &= ~BitFlagValues.UnicodeFileNameAndComment;
I'm ok with removing this if we are directly modifying the general purpose bit flag when detecting the comment or filename encoding.
The change makes sense. Just one request before approving it: Can you generate a few zip files, and then open them with 7-zip or winzip to verify they are correct? (both apps show entry comments IIRC, I don't remember if the Windows File Explorer does).
Create two zips where you pass the UTF8 encoding to the constructor, then:
- One zip file where an entry has utf8 characters in the name, but none in the comments.
- Another zip file where an entry has utf8 characters in the comment, but none in the filename.
Create two zips like the ones above, but instead of UTF8, you pass Latin1 encoding to the constructor.
@carlossanlop I tried that locally and verified that the filename and comment were shown correctly in 7zip; winzip doesn't show comments for UTF8. FWIW, this is the code I used:

    [Theory]
    [InlineData("folder-with-unicode-filenames", true)]
    [InlineData("folder-with-ascii-filenames", false)]
    public static void test(string test, bool unicodeFileName)
    {
        // Produce one archive per encoding passed to the ZipArchive constructor.
        foreach (var encoding in new[] { Encoding.UTF8, Encoding.Latin1 })
        {
            var pwd = @"C:\Users\david\Desktop\zip-file-main\" + test;
            using var stream = File.Create($@"C:\Users\david\Desktop\zip-file-main\out22-{test}-{(unicodeFileName ? "unicodeFileName" : "unicodeComment")}-{encoding.EncodingName}.zip");
            var directory = new DirectoryInfo(pwd);
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Create, true, encoding))
            {
                foreach (FileInfo file in directory.EnumerateFiles("*", SearchOption.AllDirectories))
                {
                    string relativePath = file.FullName.Substring(pwd.Length + 1); // +1 prevents it from including the leading backslash
                    string zipEntryName = relativePath.Replace('\\', '/'); // Normalize slashes
                    ZipArchiveEntry entry = zip.CreateEntryFromFile(file.FullName, zipEntryName);
                    entry.Comment = unicodeFileName ? "Comment" : "你好"; // If the filename is Unicode, use an ASCII comment, and vice versa.
                }
            }
        }
    }
This fix is blocking dotnet/aspnetcore from updating to the latest SDK. It would be good to make some progress towards getting it merged to unblock us.
There were many unrelated failures in Edit: They failed again, but other CI legs passed. I'll open an issue to investigate the CI failures and will merge this.
Fixes #65750
Due to changes in #59442, _generalPurposeBitFlag didn't have the proper value when it was written here:
runtime/src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Line 874 in afc2f11
A test may or may not be required, as .NET is resilient to the error reported; I think that is because it decodes with UTF8 regardless. cc @GrabYourPitchforks.