-
Notifications
You must be signed in to change notification settings - Fork 91
Collect unknown app1 data for further processing #267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern isn't seriously blocking but if you want to provide a more idiomatic iterator API, please do so.
Otherwise, I'd return Vec<Vec<u8>>
and not worry too much about the additional clone, just like we do for icc_profile
. Consistency will be more important here than that one allocation (in terms of maintanance anyways).
src/decoder.rs
Outdated
@@ -197,6 +199,11 @@ impl<R: Read> Decoder<R> { | |||
self.exif_data.as_deref() | |||
} | |||
|
|||
/// Returns collection of raw app data that couldn't be recognized | |||
pub fn unknown_data(&self) -> &Vec<Vec<u8>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it's almost never optimal to return &Vec<_>
. Doing this always requires the field to stay a vector and the caller can't do much with it anyways, it should be a reference to a slice or indeed an iterator over slices.
pub fn unknown_data(&self) -> &Vec<Vec<u8>> { | |
pub fn unknown_data(&self) -> &[Vec<u8>] { |
or even
pub fn unknown_data(&self) -> &Vec<Vec<u8>> { | |
pub fn unknown_data(&self) -> impl Iterator<Item=&'_ [u8]> + '_ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've ended with, although I'm not sure if map is a right tool here.
/// Returns collection of raw app data that couldn't be recognized
pub fn unknown_data(&self) -> impl Iterator<Item=&'_ [u8]> + '_ {
self.unknown_data.iter().map(|v| { v.as_slice() })
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally lead towards &[Vec<u8>]
. The type signature of the method necessitates that all the raw app data has already been loaded before calling it so there doesn't seem to be much benefit to returning an iterator.
src/decoder.rs
Outdated
@@ -197,6 +199,11 @@ impl<R: Read> Decoder<R> { | |||
self.exif_data.as_deref() | |||
} | |||
|
|||
/// Returns collection of raw app data that couldn't be recognized | |||
pub fn unknown_data(&self) -> impl Iterator<Item=&'_ [u8]> + '_ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling this unknown_data
feels a bit vague.
Also, I'm noticing that the "unknown" part of the name signals a possible backwards compatibility issue: if later versions of the crate do recognize some other app data, will the user have to switch interfaces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then maybe store all AppData, including known one. So in the future, when crate recognizes more tags, user will switch from custom implementation of "raw" to crate one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really makes me consider if data from all App(n)
variants should be collected?
Just as an overview, there's a whole lot of specific cases that would access these and which we currently do not store: https://exiftool.org/TagNames/JPEG.html
In that case however the Unknown
variant should collect the n
so that the caller always knowns which kind of app segment the data is from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my purpose I need only data from App1. However, because goal of this library is to be generic, collect all data from all apps, and leave it to the user how to process them makes sense to me.
What way of storing would you suggest? Iterator of pairs (u8, [u8])? Map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a simple struct? Should be semantically cleaner than sticking to tuples. And with that, fintelias preference in the other comment makes more sense to me, i.e. the public API can return &[AppSegment]
for accessing them instead of that impl-iterator type.
struct AppSegment {
pub kind: u8,
pub data: Vec<u8>,
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the exif data can just be stored as an index into the Vec<AppSegment>
rather than needing to keep around an extra copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between exif data [u8]
and app_segment with exif [u8]
is header *b"Exif\x00\x00"
.
I was looking at adding explicit XMP support, which is an APP1 segment starts with the string I wonder how we can proceed. |
} | ||
}, | ||
_ => {}, | ||
} | ||
|
||
skip_bytes(reader, length - bytes_read)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removing this call cause the following warning:
warning: function is never used: `skip_bytes`
--> src/parser.rs:154:4
|
154 | fn skip_bytes<R: Read>(reader: &mut R, length: usize) -> Result<()> {
| ^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
tests/lib.rs
Outdated
let path = Path::new("tests") | ||
.join("reftest") | ||
.join("images") | ||
.join("unknown_app1.jpg"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use the existing ycck.jpg
. It has all the same app1 and more.
Maybe app0/app1 could be separated from parsing jpeg data, then it could be more generic, and allows custom parser to be injected. Otherwise handling new custom app1 data will need changes to this library, or creating forks. |
- removed unused function
I'm working on project with some non-exif tags data, and only currently only exif tag from App(1) is collected. This PR collects all unknown app1 data, allowing user of library its further processing.