Skip to content

Fix ZipArchiveEntry names shown as corrupted on other zip programs #65886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 3, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,7 @@ internal static void AddEntry(ZipArchive archive, string name, string contents,
protected const string Utf8LowerCaseOUmlautChar = "\u00F6";
protected const string Utf8CopyrightChar = "\u00A9";
protected const string AsciiFileName = "file.txt";
protected const string UnicodeFileName = "\u4f60\u597D.txt";
// The o with umlaut is a character that exists in both latin1 and utf8
protected const string Utf8AndLatin1FileName = $"{Utf8LowerCaseOUmlautChar}.txt";
// emojis only make sense in utf8
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,6 @@ public partial class ZipArchiveEntry
private List<ZipGenericExtraField>? _lhUnknownExtraFields;
private byte[] _fileComment;
private readonly CompressionLevel? _compressionLevel;
private bool _hasUnicodeEntryNameOrComment;

// Initializes a ZipArchiveEntry instance for an existing archive entry.
internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
Expand Down Expand Up @@ -84,8 +83,6 @@ internal ZipArchiveEntry(ZipArchive archive, ZipCentralDirectoryFileHeader cd)
_fileComment = cd.FileComment;

_compressionLevel = null;

_hasUnicodeEntryNameOrComment = (_generalPurposeBitFlag & BitFlagValues.UnicodeFileNameAndComment) != 0;
}

// Initializes a ZipArchiveEntry instance for a new archive entry with a specified compression level.
Expand Down Expand Up @@ -144,8 +141,6 @@ internal ZipArchiveEntry(ZipArchive archive, string entryName)
{
_archive.AcquireArchiveStream(this);
}

_hasUnicodeEntryNameOrComment = false;
}

/// <summary>
Expand Down Expand Up @@ -197,7 +192,11 @@ public string Comment
set
{
_fileComment = ZipHelper.GetEncodedTruncatedBytesFromString(value, _archive.EntryNameAndCommentEncoding, ushort.MaxValue, out bool isUTF8);
_hasUnicodeEntryNameOrComment |= isUTF8;

if (isUTF8)
{
_generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, since you're avoiding removing the bit flag unexpectedly if filename already set it to true.

}
}
}

Expand All @@ -218,11 +217,19 @@ private set
ArgumentNullException.ThrowIfNull(value, nameof(FullName));

_storedEntryNameBytes = ZipHelper.GetEncodedTruncatedBytesFromString(
value, _archive.EntryNameAndCommentEncoding, 0 /* No truncation */, out bool hasUnicodeEntryName);
value, _archive.EntryNameAndCommentEncoding, 0 /* No truncation */, out bool isUTF8);

_hasUnicodeEntryNameOrComment |= hasUnicodeEntryName;
_storedEntryName = value;

if (isUTF8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I understand. When Utf8 is detected, we enable the Unicode flag?

Copy link
Contributor

@carlossanlop carlossanlop Feb 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe the name of the flag should be utf8FileNameAndComment to be very clear.

Per the spec: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

 Bit 11: Language encoding flag (EFS).  If this bit is set,
                the filename and comment fields for this file
                MUST be encoded using UTF-8. (see APPENDIX D)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe the name of the flag should be utf8FileNameAndComment to be very clear.

That would definitely help ;) thanks for the pointer to the docs!

{
_generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
}
else
{
_generalPurposeBitFlag &= ~BitFlagValues.UnicodeFileNameAndComment;
}
Comment on lines +225 to +231
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that if the comment has utf8 characters, but the filename does not, now you're removing the utf8 flag.

My intention was to detect utf8 characters in either filename or comment. If one of the two had them, then the general purpose bit flag bit 11 would be turned on. Hence the |=.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably look like the code you added in the Comment setter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that the FullName setter is private, because the name is only set once, in the constructor. The user cannot change it. They have to create a new ZipArchiveEntry instance if they want to change the name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm now that I think of it more, maybe it makes sense to set it to 0 or 1 here, since as I said, this is called in the constructor only, which means Comment has not modified this bit yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding/removing the flag was being performed before your changes so this is really going back to update _generalPurposeBitFlag in the ctor before it spreads.


DetectEntryNameVersion();
}
}
Expand Down Expand Up @@ -505,11 +512,6 @@ internal void WriteCentralDirectoryFileHeader()
extraFieldLength = (ushort)bigExtraFieldLength;
}

if (_hasUnicodeEntryNameOrComment)
_generalPurposeBitFlag |= BitFlagValues.UnicodeFileNameAndComment;
else
_generalPurposeBitFlag &= ~BitFlagValues.UnicodeFileNameAndComment;

Comment on lines -508 to -512
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with removing this if we are directly modifying the general purpose bit flag when detecting the comment or filename encoding.

writer.Write(ZipCentralDirectoryFileHeader.SignatureConstant); // Central directory file header signature (4 bytes)
writer.Write((byte)_versionMadeBySpecification); // Version made by Specification (version) (1 byte)
writer.Write((byte)CurrentZipPlatform); // Version made by Compatibility (type) (1 byte)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -554,7 +554,7 @@ internal struct ZipEndOfCentralDirectoryBlock
public uint OffsetOfStartOfCentralDirectoryWithRespectToTheStartingDiskNumber;
public byte[] ArchiveComment;

public static void WriteBlock(Stream stream, long numberOfEntries, long startOfCentralDirectory, long sizeOfCentralDirectory, byte[]? archiveComment)
public static void WriteBlock(Stream stream, long numberOfEntries, long startOfCentralDirectory, long sizeOfCentralDirectory, byte[] archiveComment)
{
BinaryWriter writer = new BinaryWriter(stream);

Expand All @@ -574,10 +574,10 @@ public static void WriteBlock(Stream stream, long numberOfEntries, long startOfC
writer.Write(startOfCentralDirectoryTruncated);

// Should be valid because of how we read archiveComment in TryReadBlock:
Debug.Assert((archiveComment == null) || (archiveComment.Length <= ZipFileCommentMaxLength));
Debug.Assert(archiveComment.Length <= ZipFileCommentMaxLength);

writer.Write(archiveComment != null ? (ushort)archiveComment.Length : (ushort)0); // zip file comment length
if (archiveComment != null)
writer.Write((ushort)archiveComment.Length); // zip file comment length
if (archiveComment.Length > 0)
writer.Write(archiveComment);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,21 @@ public static void CreateNormal_VerifyDataDescriptor()
AssertDataDescriptor(memoryStream, false);
}

[Theory]
[InlineData(UnicodeFileName, UnicodeFileName, true)]
[InlineData(UnicodeFileName, AsciiFileName, true)]
[InlineData(AsciiFileName, UnicodeFileName, true)]
[InlineData(AsciiFileName, AsciiFileName, false)]
public static void CreateNormal_VerifyUnicodeFileNameAndComment(string fileName, string entryComment, bool isUnicodeFlagExpected)
{
using var ms = new MemoryStream();
using var archive = new ZipArchive(ms, ZipArchiveMode.Create);

CreateEntry(archive, fileName, fileContents: "xxx", entryComment);

AssertUnicodeFileNameAndComment(ms, isUnicodeFlagExpected);
}

[Fact]
public static void CreateNormal_With2SameEntries_ThrowException()
{
Expand All @@ -198,11 +213,12 @@ private static string ReadStringFromSpan(Span<byte> input)
return Text.Encoding.UTF8.GetString(input.ToArray());
}

private static void CreateEntry(ZipArchive archive, string fileName, string fileContents)
private static void CreateEntry(ZipArchive archive, string fileName, string fileContents, string entryComment = null)
{
ZipArchiveEntry entry = archive.CreateEntry(fileName);
using StreamWriter writer = new StreamWriter(entry.Open());
writer.Write(fileContents);
entry.Comment = entryComment;
}

private static void AssertDataDescriptor(MemoryStream memoryStream, bool hasDataDescriptor)
Expand All @@ -211,5 +227,12 @@ private static void AssertDataDescriptor(MemoryStream memoryStream, bool hasData
Assert.Equal(hasDataDescriptor ? 8 : 0, fileBytes[6]);
Assert.Equal(0, fileBytes[7]);
}

private static void AssertUnicodeFileNameAndComment(MemoryStream memoryStream, bool isUnicodeFlagExpected)
{
byte[] fileBytes = memoryStream.ToArray();
Assert.Equal(0, fileBytes[6]);
Assert.Equal(isUnicodeFlagExpected ? 8 : 0, fileBytes[7]);
}
}
}