Non-determinism in JSON export #1211

Closed
davidstrauss opened this issue Nov 12, 2020 · 7 comments

@davidstrauss

davidstrauss commented Nov 12, 2020

Description of issue or feature request:

I've been working to create a deterministic test fixture generator for PHP-TUF. I've rooted out the apparent sources of most meaningful non-determinism by fixing the clock and using a fixed well of keypairs. However, some of the JSON export appears to have different behavior on different systems.

Shown below is the diff I see when comparing generated data on GitHub Actions (on Python 3.9 with ubuntu-latest) versus on my laptop (also Python 3.9 but with Fedora 33). We've pinned all known dependencies using pipenv, so I don't think it's that.

This causes a cascading set of differences because other files use hashes of snapshot.json.

Could TUF canonicalize even the JSON data that isn't directly signed?

Current behavior:
[Image: Screenshot from 2020-11-12 12-30-35, showing the diff described above]

Expected behavior:

Deterministic (ideally canonical) output of JSON that contains the same functional data.
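
To make the request concrete, here is a minimal sketch using only the Python standard library. It is not python-tuf code: the metadata fragment and the helper names `dump_deterministic` and `sha256_hex` are made up for illustration. It shows how two dictionaries holding the same data can serialize to different bytes, and therefore different hashes, unless the keys are sorted and the separators are fixed.

```python
import hashlib
import json

# Hypothetical metadata fragment; the field values are illustrative only.
meta_a = {"version": 1, "meta": {"targets.json": {"version": 1}}, "_type": "snapshot"}
# Same data, but the keys were inserted in a different order.
meta_b = {"_type": "snapshot", "meta": {"targets.json": {"version": 1}}, "version": 1}


def dump_deterministic(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so equal data gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, indent=1, separators=(",", ": "))
    return text.encode("utf-8")


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# json.dumps keeps insertion order by default, so the bytes and hashes differ.
naive_a = json.dumps(meta_a).encode("utf-8")
naive_b = json.dumps(meta_b).encode("utf-8")
print(sha256_hex(naive_a) == sha256_hex(naive_b))  # False

# With sorted keys the two serializations are byte-for-byte identical.
print(sha256_hex(dump_deterministic(meta_a)) == sha256_hex(dump_deterministic(meta_b)))  # True
```

This is deterministic output, not canonical JSON in any formal sense; it only pins down key order and whitespace for one serializer.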

@davidstrauss
Author

My suspicion is that one system but not another includes an accelerated JSON implementation that's preferentially used when available but doesn't cause any errors/warnings when missing.
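
One way to check this suspicion for the standard-library `json` module is sketched below. It relies on `json.encoder.c_make_encoder` and `json.decoder.c_scanstring`, which are undocumented CPython internals that are set to `None` when the optional `_json` C extension cannot be imported. The helper name `json_accelerator_status` is hypothetical. The check says nothing about third-party JSON libraries, and it is an assumption that the standard-library module is the serializer in play here.

```python
import json.decoder
import json.encoder


def json_accelerator_status() -> dict:
    """Report whether CPython's optional C helpers for the json module were importable."""
    return {
        # Both attributes fall back to None if the _json extension is missing.
        "c_make_encoder": json.encoder.c_make_encoder is not None,
        "c_scanstring": json.decoder.c_scanstring is not None,
    }


print(json_accelerator_status())
```

Running this on both systems and comparing the results would show whether they differ in this respect.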

@jku
Member

jku commented Nov 13, 2020

So these are both files generated by python-tuf?

And the request is that while the spec does not (to my knowledge) require it, python-tuf tools should make efforts to produce deterministic output.

I think I would agree with that.

@lukpueh
Member

lukpueh commented Nov 13, 2020

> Could TUF canonicalize even the JSON data that isn't directly signed?

Note that python-tuf currently does not canonicalize any JSON metadata on the wire (not even the payload, a.k.a. the "signed" part), although there is a proposal to change this at least for the "signed" part, so that no JSON parsing of untrusted metadata is required (see secure-systems-lab/dsse#2).

Canonicalization of the entire metadata is not required by the spec, because file hashes of targets.json, $delegated-targets.json (in snapshot.json) and snapshot.json (in timestamp.json) are generated and then re-generated for client verification over the same file blob (without need for JSON parsing/canonicalization).
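
A minimal sketch of that point, using only the standard library rather than python-tuf itself: the hash is computed over the exact bytes that were written, and the client recomputes it over the exact bytes it received. The field layout loosely follows the `hashes`/`length` entries in TUF metadata, but the values and the helper names `file_hash` and `verify_blob` are illustrative assumptions.

```python
import hashlib
import json


def file_hash(blob: bytes) -> str:
    """Hex SHA-256 digest over the raw file bytes; no JSON parsing involved."""
    return hashlib.sha256(blob).hexdigest()


def verify_blob(blob: bytes, expected: dict) -> bool:
    """Client-side check: compare length and hash of the received bytes."""
    return len(blob) == expected["length"] and file_hash(blob) == expected["hashes"]["sha256"]


# Repository side: serialize once, then hash the exact bytes that will be served.
snapshot_blob = json.dumps(
    {"signed": {"_type": "snapshot", "version": 1}, "signatures": []}
).encode("utf-8")
expected = {"hashes": {"sha256": file_hash(snapshot_blob)}, "length": len(snapshot_blob)}

# Client side: the same blob verifies, whatever its formatting happens to be.
print(verify_blob(snapshot_blob, expected))  # True

# Re-serializing the same data with different whitespace yields a different blob.
reserialized = json.dumps(json.loads(snapshot_blob), indent=2).encode("utf-8")
print(verify_blob(reserialized, expected))  # False
```

The second check is why byte-level differences between two generation runs cascade into the files that record these hashes, even though each run on its own is verifiable.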

Regardless, @erickt has made a similar request in #1154 (in a similar context, i.e. interoperability testing).

I'm fine with implementing his suggestion.

@davidstrauss
Author

> So these are both files generated by python-tuf?

Yes, the exact same version as well (currently tuf==0.14.0).

> there is a proposal to change this at least for the "signed" part to not require any JSON parsing of untrusted metadata

This is also something we'd appreciate on the PHP-TUF side.

> Canonicalization of the entire metadata is not required by the spec

I'm not suggesting this behavior is in violation of the spec, just that it would make testing more reliable if the fixtures we generate can be more consistent.

@lukpueh
Member

lukpueh commented Nov 19, 2020

> > Canonicalization of the entire metadata is not required by the spec
>
> I'm not suggesting this behavior is in violation of the spec, just that it would make testing more reliable if the fixtures we generate can be more consistent.

Sure. I just wanted to clarify (also for myself) that the lack of canonicalization when generating metadata hashes to be included in other metadata is not an issue for client verification. :)

Let's go with @erickt's one-liner patch proposal in #1154. I can submit a PR...

@lukpueh
Member

lukpueh commented Nov 23, 2020

#1217 fixes the signature order part of this issue
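
For readers unfamiliar with the signature order problem, the sketch below illustrates the general idea of putting the `signatures` list into a stable order before writing. It is an illustration only and not the actual change made in #1217; the helper name `sort_signatures`, the key IDs, and the signature values are all hypothetical.

```python
import json


def sort_signatures(metadata: dict) -> dict:
    """Return a shallow copy of the metadata with signatures ordered by keyid."""
    ordered = dict(metadata)
    ordered["signatures"] = sorted(metadata["signatures"], key=lambda sig: sig["keyid"])
    return ordered


# Two runs that attached the same (made-up) signatures in a different order.
run_1 = {
    "signed": {"_type": "root", "version": 1},
    "signatures": [{"keyid": "bbbb", "sig": "02"}, {"keyid": "aaaa", "sig": "01"}],
}
run_2 = {
    "signed": {"_type": "root", "version": 1},
    "signatures": [{"keyid": "aaaa", "sig": "01"}, {"keyid": "bbbb", "sig": "02"}],
}

out_1 = json.dumps(sort_signatures(run_1), sort_keys=True, indent=1)
out_2 = json.dumps(sort_signatures(run_2), sort_keys=True, indent=1)
print(out_1 == out_2)  # True
```

Sorting dictionary keys alone would not help here, because `sort_keys` does not reorder list elements.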

@jku
Member

jku commented Feb 11, 2022

State as I understand it:

  • We try to be deterministic in Metadata API: e.g. JSON dictionary content is sorted on output
  • We do not canonicalize the output JSON (signed content is canonicalized but that is not used as the output format)

I'm closing this as it is about the legacy code, which we no longer maintain. If you see a similar issue using the Metadata API, please open a new issue.
