Skip to content

Collect unknown app1 data for further processing #267

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

taavit
Copy link
Contributor

@taavit taavit commented Jan 4, 2023

I'm working on project with some non-exif tags data, and only currently only exif tag from App(1) is collected. This PR collects all unknown app1 data, allowing user of library its further processing.

Copy link
Member

@197g 197g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern isn't seriously blocking but if you want to provide a more idiomatic iterator API, please do so.

Otherwise, I'd return Vec<Vec<u8>> and not worry too much about the additional clone, just like we do for icc_profile. Consistency will be more important here than that one allocation (in terms of maintanance anyways).

src/decoder.rs Outdated
@@ -197,6 +199,11 @@ impl<R: Read> Decoder<R> {
self.exif_data.as_deref()
}

/// Returns collection of raw app data that couldn't be recognized
pub fn unknown_data(&self) -> &Vec<Vec<u8>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it's almost never optimal to return &Vec<_>. Doing this always requires the field to stay a vector and the caller can't do much with it anyways, it should be a reference to a slice or indeed an iterator over slices.

Suggested change
pub fn unknown_data(&self) -> &Vec<Vec<u8>> {
pub fn unknown_data(&self) -> &[Vec<u8>] {

or even

Suggested change
pub fn unknown_data(&self) -> &Vec<Vec<u8>> {
pub fn unknown_data(&self) -> impl Iterator<Item=&'_ [u8]> + '_ {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ended with, although I'm not sure if map is a right tool here.

   /// Returns collection of raw app data that couldn't be recognized
   pub fn unknown_data(&self) -> impl Iterator<Item=&'_ [u8]> + '_ {
       self.unknown_data.iter().map(|v| { v.as_slice() })
   }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally lead towards &[Vec<u8>]. The type signature of the method necessitates that all the raw app data has already been loaded before calling it so there doesn't seem to be much benefit to returning an iterator.

src/decoder.rs Outdated
@@ -197,6 +199,11 @@ impl<R: Read> Decoder<R> {
self.exif_data.as_deref()
}

/// Returns collection of raw app data that couldn't be recognized
pub fn unknown_data(&self) -> impl Iterator<Item=&'_ [u8]> + '_ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling this unknown_data feels a bit vague.

Also, I'm noticing that the "unknown" part of the name signals a possible backwards compatibility issue: if later versions of the crate do recognize some other app data, will the user have to switch interfaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then maybe store all AppData, including known one. So in the future, when crate recognizes more tags, user will switch from custom implementation of "raw" to crate one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really makes me consider if data from all App(n) variants should be collected?

Just as an overview, there's a whole lot of specific cases that would access these and which we currently do not store: https://exiftool.org/TagNames/JPEG.html

In that case however the Unknown variant should collect the n so that the caller always knowns which kind of app segment the data is from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my purpose I need only data from App1. However, because goal of this library is to be generic, collect all data from all apps, and leave it to the user how to process them makes sense to me.

What way of storing would you suggest? Iterator of pairs (u8, [u8])? Map?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a simple struct? Should be semantically cleaner than sticking to tuples. And with that, fintelias preference in the other comment makes more sense to me, i.e. the public API can return &[AppSegment] for accessing them instead of that impl-iterator type.

struct AppSegment {
    pub kind: u8,
    pub data: Vec<u8>,
}

Copy link
Contributor

@fintelia fintelia Jan 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the exif data can just be stored as an index into the Vec<AppSegment> rather than needing to keep around an extra copy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between exif data [u8] and app_segment with exif [u8] is header *b"Exif\x00\x00".

@hfiguiere
Copy link
Contributor

I was looking at adding explicit XMP support, which is an APP1 segment starts with the string http://ns.adobe.com/xap/1.0/\0. This would probably conflict with this, but complement it too.

I wonder how we can proceed.

@hfiguiere hfiguiere mentioned this pull request Apr 25, 2023
}
},
_ => {},
}

skip_bytes(reader, length - bytes_read)?;
Copy link
Contributor

@hfiguiere hfiguiere Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this call cause the following warning:

warning: function is never used: `skip_bytes`
   --> src/parser.rs:154:4
    |
154 | fn skip_bytes<R: Read>(reader: &mut R, length: usize) -> Result<()> {
    |    ^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

tests/lib.rs Outdated
let path = Path::new("tests")
.join("reftest")
.join("images")
.join("unknown_app1.jpg");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could use the existing ycck.jpg. It has all the same app1 and more.

@taavit
Copy link
Contributor Author

taavit commented Apr 27, 2023

I was looking at adding explicit XMP support, which is an APP1 segment starts with the string http://ns.adobe.com/xap/1.0/\0. This would probably conflict with this, but complement it too.

I wonder how we can proceed.

Maybe app0/app1 could be separated from parsing jpeg data, then it could be more generic, and allows custom parser to be injected. Otherwise handling new custom app1 data will need changes to this library, or creating forks.

- removed unused function
@taavit taavit deleted the branch image-rs:master April 28, 2023 20:02
@taavit taavit closed this Apr 28, 2023
@taavit taavit deleted the master branch April 28, 2023 20:02
@taavit taavit mentioned this pull request Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants