This project demonstrates a bug in SixLabors.ImageSharp where Unicode text written to the EXIF UserComment
tag (0x9286) is encoded using UTF-16LE instead of UTF-16BE as required by the EXIF specification.

According to the EXIF specification (JEITA CP-3451, section 4.6.5 "User Comments"), when the UserComment tag uses Unicode encoding, the data should start with the 8-byte identifier UNICODE\0, followed by the comment text encoded in UTF-16BE (big endian).
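
The layout described above can be sketched in a few lines of Python. This is only an illustration of the byte layout, not code from this project or from ImageSharp; the names `UNICODE_PREFIX` and `build_user_comment` are hypothetical.

```python
# Hypothetical helper illustrating the UserComment layout described above:
# an 8-byte character-code identifier followed by the UTF-16 text, no BOM.
UNICODE_PREFIX = b"UNICODE\x00"  # 7 ASCII letters plus one NUL byte = 8 bytes


def build_user_comment(text: str, big_endian: bool = True) -> bytes:
    """Return the prefix followed by the text encoded as UTF-16BE or UTF-16LE."""
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return UNICODE_PREFIX + text.encode(codec)


if __name__ == "__main__":
    # Big-endian payload, as the README says the specification requires.
    print(build_user_comment("Hi").hex(" "))
    # Little-endian payload, as the README reports ImageSharp writes.
    print(build_user_comment("Hi", big_endian=False).hex(" "))
```

For the two-character string "Hi", the only difference between the two outputs is the byte order inside each 16-bit code unit: `00 48 00 69` versus `48 00 69 00`.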

However, ImageSharp appears to encode the text payload using UTF-16LE (little endian), while still using the correct UNICODE\0 prefix. This causes the UserComment to be misinterpreted by other EXIF readers that strictly follow the specification.
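
To see what such a misinterpretation looks like, the following Python sketch encodes a string as UTF-16LE and then decodes the same bytes as UTF-16BE, which is what a strictly big-endian reader would do. It is an assumption-laden illustration, not the behavior of any particular EXIF tool; `misread_as_big_endian` is a hypothetical name.

```python
def misread_as_big_endian(text: str) -> str:
    """Encode text as UTF-16LE, then decode those bytes as UTF-16BE."""
    le_payload = text.encode("utf-16-le")
    # errors="replace" guards against byte pairs that happen to form
    # unpaired surrogates when read in the wrong byte order.
    return le_payload.decode("utf-16-be", errors="replace")


if __name__ == "__main__":
    garbled = misread_as_big_endian("Hi")
    # Each code unit has its two bytes swapped: U+0048 becomes U+4800, and so on.
    print([hex(ord(ch)) for ch in garbled])  # ['0x4800', '0x6900']
```

ASCII characters end up in unrelated Unicode blocks, so the comment shows up as unreadable text rather than failing outright.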

The program performs the following steps:
- Configuration: Sets the input (image.webp) and output (output.webp) paths, and the comment string ("Hello, World! こんにちわ世界"), which contains both ASCII and non-ASCII characters to ensure Unicode encoding is triggered.
- Image Processing (ImageSharp):
  - Loads the input image (image.webp).
  - Gets or creates an ExifProfile.
  - Sets the UserComment tag using exif.SetValue(ExifTag.UserComment, comment).
  - Saves the modified image to output.webp.
- Verification (MetadataExtractor):
  - Reads the metadata from the saved output.webp using MetadataExtractor.
  - Retrieves the raw byte array associated with the UserComment tag (0x9286) from the ExifSubIfdDirectory.
  - Compares the actual byte array against:
    - The expected 8-byte prefix (UNICODE\0).
    - The expected byte sequence for the comment string encoded in UTF-16LE (incorrect).
    - The expected byte sequence for the comment string encoded in UTF-16BE (correct according to the spec).
  - Prints the comparison results to the console.
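
The comparison in the verification step can be sketched independently of ImageSharp and MetadataExtractor. The Python snippet below is a minimal sketch that assumes the raw UserComment bytes have already been extracted by some other means; `classify_user_comment` is a hypothetical name, and the `raw` value in the usage section is simulated rather than read from output.webp.

```python
UNICODE_PREFIX = b"UNICODE\x00"  # expected 8-byte identifier


def classify_user_comment(raw: bytes, expected_text: str) -> str:
    """Report which UTF-16 byte order the payload after the prefix uses."""
    if raw[:8] != UNICODE_PREFIX:
        return "prefix mismatch"
    payload = raw[8:]
    if payload == expected_text.encode("utf-16-be"):
        return "UTF-16BE"
    if payload == expected_text.encode("utf-16-le"):
        return "UTF-16LE"
    return "unknown"


if __name__ == "__main__":
    comment = "Hello, World! こんにちわ世界"
    # Simulated tag contents with a little-endian payload (assumption:
    # this mirrors what the README reports, it is not read from a file).
    raw = UNICODE_PREFIX + comment.encode("utf-16-le")
    print(classify_user_comment(raw, comment))
```

Checking against both encodings, rather than only the expected one, is what lets the check tell a wrong byte order apart from a payload that is corrupted in some other way.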

To run the demonstration:
- Ensure you have a .NET SDK installed (tested with the .NET 9 SDK, but compatible versions should also work).
- Place a sample WebP image named image.webp in the project directory.
- Navigate to the project directory (ImageSharpExifUserCommentEncodingBug) in your terminal.
- Run the command: dotnet run

The console output should show:
- The expected byte sequences for the prefix, UTF-16LE payload, and UTF-16BE payload.
- The actual bytes read from the UserComment tag by MetadataExtractor.
- Confirmation that the prefix matches UNICODE\0.
- Confirmation that the actual payload bytes match the UTF-16LE sequence, followed by the message:
  Verification: Payload matches UTF-16 LE bytes. (BUG CONFIRMED - Should be UTF-16 BE)

This confirms that ImageSharp wrote the payload using the incorrect endianness (LE) instead of the specification-required BE.

The repository includes the output.webp file generated by running this code. You can inspect this file using external EXIF tools to verify that the UserComment tag contains data encoded as UTF-16LE instead of the expected UTF-16BE.