nirvash/ImageSharpExifUserCommentEncodingBug

ImageSharp EXIF UserComment Unicode Encoding Bug (UTF-16LE vs UTF-16BE)

This project demonstrates a bug in SixLabors.ImageSharp where Unicode text written to the EXIF UserComment tag (0x9286) is encoded using UTF-16LE instead of UTF-16BE as required by the EXIF specification.

Problem Description

According to the EXIF specification (JEITA CP-3451, section 4.6.5 "User Comments"), when using Unicode encoding for the UserComment tag, the data should start with an 8-byte identifier UNICODE\0 followed by the comment text encoded in UTF-16BE (Big Endian).

However, ImageSharp appears to encode the text payload using UTF-16LE (Little Endian), while still using the correct UNICODE\0 prefix. This causes the UserComment to be misinterpreted by other EXIF readers that strictly follow the specification.
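
To make the difference concrete, here is a small Python sketch (not part of this repository, whose code is C#) that builds the UserComment bytes both ways. The helper name build_user_comment is made up for illustration; only the UNICODE\0 prefix and the two encodings come from the description above.

```python
# Illustrative only: 8-byte character code followed by the encoded text.
UNICODE_PREFIX = b"UNICODE\x00"


def build_user_comment(text: str, big_endian: bool) -> bytes:
    """Return the prefix plus the text encoded as UTF-16 without a BOM."""
    codec = "utf-16-be" if big_endian else "utf-16-le"
    return UNICODE_PREFIX + text.encode(codec)


comment = "Hello, World! こんにちわ世界"
be = build_user_comment(comment, big_endian=True)   # layout the spec calls for
le = build_user_comment(comment, big_endian=False)  # layout observed in the bug

# First character 'H' (U+0048): BE stores 00 48, LE stores 48 00.
print(be[8:10].hex(" "))  # 00 48
print(le[8:10].hex(" "))  # 48 00
```

Both variants have the same length and the same prefix, so a reader cannot tell them apart without looking at the byte order of the payload itself.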

Code Explanation (Program.cs)

  1. Configuration: Sets the input (image.webp) and output (output.webp) paths, and defines the comment string ("Hello, World! こんにちわ世界"), which mixes ASCII and non-ASCII characters to ensure Unicode encoding is triggered.
  2. Image Processing (ImageSharp):
    • Loads the input image (image.webp).
    • Gets or creates an ExifProfile.
    • Sets the UserComment tag using exif.SetValue(ExifTag.UserComment, comment).
    • Saves the modified image to output.webp.
  3. Verification (MetadataExtractor):
    • Reads the metadata from the saved output.webp using MetadataExtractor.
    • Retrieves the raw byte array associated with the UserComment tag (0x9286) from the ExifSubIfdDirectory.
    • Compares the actual byte array against:
      • The expected 8-byte prefix (UNICODE\0).
      • The expected byte sequence for the comment string encoded in UTF-16LE (Incorrect).
      • The expected byte sequence for the comment string encoded in UTF-16BE (Correct according to spec).
    • Prints the comparison results to the console.
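
The comparison in step 3 can be restated as the following Python sketch. It is an assumption about the shape of the check, not a copy of Program.cs, and the function name classify_user_comment and its return labels are hypothetical. The raw bytes are passed in directly, so no EXIF library is needed to follow the logic.

```python
UNICODE_PREFIX = b"UNICODE\x00"


def classify_user_comment(raw: bytes, text: str) -> str:
    """Report which UTF-16 byte order the payload after the prefix matches."""
    if raw[:8] != UNICODE_PREFIX:
        return "no-unicode-prefix"
    payload = raw[8:]
    # Check LE first, mirroring the order in the list above.
    if payload == text.encode("utf-16-le"):
        return "utf-16-le"
    if payload == text.encode("utf-16-be"):
        return "utf-16-be"
    return "unknown"


comment = "Hello, World! こんにちわ世界"

# Simulated tag contents, standing in for the bytes read from output.webp.
buggy = UNICODE_PREFIX + comment.encode("utf-16-le")
fixed = UNICODE_PREFIX + comment.encode("utf-16-be")

print(classify_user_comment(buggy, comment))  # utf-16-le
print(classify_user_comment(fixed, comment))  # utf-16-be
```

A result of utf-16-le corresponds to the "BUG CONFIRMED" message described under Expected Result.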

How to Run

  1. Ensure you have a .NET SDK installed (tested with .NET 9 SDK, but should work with compatible versions).
  2. Place a sample WebP image named image.webp in the project directory.
  3. Navigate to the project directory (ImageSharpExifUserCommentEncodingBug) in your terminal.
  4. Run the command: dotnet run

Expected Result

The console output should show:

  • The expected byte sequences for the prefix, UTF-16LE payload, and UTF-16BE payload.
  • The actual bytes read from the UserComment tag by MetadataExtractor.
  • Confirmation that the prefix matches UNICODE\0.
  • Confirmation that the actual payload bytes match the UTF-16LE sequence, followed by the message:
    Verification: Payload matches UTF-16 LE bytes. (BUG CONFIRMED - Should be UTF-16 BE)
    

This confirms that ImageSharp wrote the payload using the incorrect endianness (LE) instead of the specification-required BE.

Included Output File (output.webp)

The repository includes the output.webp file generated by running this code. You can inspect this file using external EXIF tools to verify that the UserComment tag contains data encoded as UTF-16LE instead of the expected UTF-16BE.
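
If no EXIF tool is at hand, a rough check is to search the raw file for the UNICODE\0 marker and look at the bytes that follow. The Python sketch below does this under two assumptions: that the EXIF data is stored uncompressed in the file so the marker can be found by a plain byte search, and that the comment starts with an ASCII character (as "Hello, World! ..." does). The function names are hypothetical, and a real EXIF parser is the more reliable option.

```python
from pathlib import Path
from typing import Optional

MARKER = b"UNICODE\x00"


def peek_after_marker(data: bytes, count: int = 16) -> Optional[bytes]:
    """Return up to `count` bytes following the first marker, or None."""
    idx = data.find(MARKER)
    if idx < 0:
        return None
    start = idx + len(MARKER)
    return data[start:start + count]


def guess_byte_order(payload: bytes) -> str:
    """Heuristic for ASCII-leading text: the zero byte is first in BE, second in LE."""
    if len(payload) < 2:
        return "unknown"
    if payload[0] == 0 and payload[1] != 0:
        return "big-endian"
    if payload[0] != 0 and payload[1] == 0:
        return "little-endian"
    return "unknown"


if __name__ == "__main__":
    path = Path("output.webp")
    if path.exists():
        payload = peek_after_marker(path.read_bytes())
        if payload is None:
            print("UNICODE marker not found")
        else:
            print(payload.hex(" "), "->", guess_byte_order(payload))
```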
