Skip to content

Protecting integrity of @context #956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mark-moir opened this issue Oct 24, 2022 · 14 comments
Closed

Protecting integrity of @context #956

mark-moir opened this issue Oct 24, 2022 · 14 comments
Assignees

Comments

@mark-moir
Copy link

Section 8.2 (Content Integrity Protection) highlights the dangers of depending on remote content that is not content-integrity protected, and suggests that developers use schemes such as Cryptographic Hyperlinks or IPFS to protect them.

Section B1 (Base Context) includes the content of (required) https://www.w3.org/2018/credentials/v1 and provides its SHA-256 digest. However, this content itself includes links that are not content-integrity protected (e.g., https://w3id.org/security#).

Doesn't Section B1 therefore create a false sense of security?

@msporny
Copy link
Member

msporny commented Oct 28, 2022

Doesn't Section B1 therefore create a false sense of security?

It's debatable if it does or not. :)

Ideally, yes, everything would be content protected all the way down to the human readable text... but we're just not there yet (from an infrastructure standpoint).

The security vocabulary has been brought under the purview of the VCWG: https://github.com/w3c/vc-data-integrity/tree/main/vocab/security

So, the sorts of attacks we're talking about are attacks against humans or semantic reasoners. At present, no one is using semantic inference on these documents, and if they do, I'd expect them to keep a static copy of all machine-readable vocabulary documents. That leaves the attack on the human-readable text, in this case, it's probably unlikely that someone could trick a human into mis-implementing something based on what we've written in the vocabulary (because there are test suites that text for implementation correctness/conformance).

The important thing is that everyone is using the same URLs for the terms, and in order to ensure that, all we need to do is embed the context in the spec and provide a SHA-256 digest for it.

Does that help?

@dlongley
Copy link
Contributor

Doesn't Section B1 therefore create a false sense of security?

No, I don't think so. But if you think so, you should draw out a concrete scenario where it's a significant issue so that it could be discussed and mitigations could be considered if it seems worthwhile to do so.

There's only so much that cryptographic primitives can provide; a false sense of security might be thinking that you are totally safe if the natural language content behind a link was protected. That content can still be misinterpreted. It's an unavoidable problem.

So, there's a spectrum of protection here. Additional protections usually come at some cost and comparisons between an infeasible, perfect system and a feasible, imperfect system need to be understood as such.

@mark-moir
Copy link
Author

Ideally, yes, everything would be content protected all the way down to the human readable text... but we're just not there yet (from an infrastructure standpoint).

Fair enough, but I think it would be preferable to defend against "creating a false sense of security" by addressing this point explicitly in Section B.1. As it stands, the section provides no hint that it doesn't fully address the issue, or what would be required in future to do so.

The security vocabulary has been brought under the purview of the VCWG: https://github.com/w3c/vc-data-integrity/tree/main/vocab/security

So, the sorts of attacks we're talking about are attacks against humans or semantic reasoners. At present, no one is using semantic inference on these documents, and if they do, I'd expect them to keep a static copy of all machine-readable vocabulary documents.

They'd still need to be confident that they have the correct static copy, just as for the Base context, right? Section B1 explicitly addresses this for the Base context, but is silent on the remaining documents.

That leaves the attack on the human-readable text, in this case, it's probably unlikely that someone could trick a human into mis-implementing something based on what we've written in the vocabulary (because there are test suites that text for implementation correctness/conformance).

The important thing is that everyone is using the same URLs for the terms, and in order to ensure that, all we need to do is embed the context in the spec and provide a SHA-256 digest for it.

I don't follow why ensuring everyone has the same URLs is "the important thing", without any guarantee that everyone will have the same content from those URLs.

I do see references to hash links in the Implementation Guide, so can see that you've thought about how this issue could be addressed, but this is not used in the URLs included in the Base context.

Does that help?

Yes, somewhat, thanks. For context, I have been observing the various arguments: from @SmithSamuelM (e.g., here) that it's critical to ensure the integrity of content is protected (with which I agree) and that therefore JSON-LD is a non-starter (I don't yet agree); and from you (@msporny) that it's fine, people who don't want the extensibility of JSON-LD and don't want to do (formal) JSON-LD processing can be satisfied by only including the required base context, and this does not require (formal) JSON-LD processing.

In trying to form an opinion about these seemingly competing positions, I made the observation that caused me to create this issue.

@mark-moir
Copy link
Author

So, there's a spectrum of protection here. Additional protections usually come at some cost and comparisons between an infeasible, perfect system and a feasible, imperfect system need to be understood as such.

@dlongley I agree regarding spectrum and especially "need to be understood as such". My "false sense of security" question asks whether the current spec is clear enough in this regard. See above response to @msporny, in which I say I think not yet.

Thanks.

@brentzundel brentzundel added discuss ready for PR This issue is ready for a Pull Request to be created to resolve it labels Jan 11, 2023
@msporny
Copy link
Member

msporny commented Jan 11, 2023

Hey @mark-moir could you please provide some text that we could put in the specification that would address your concern? Having concrete text at this point would be helpful.

@mark-moir
Copy link
Author

Hi @msporny. The following would address my immediate "false sense of security" concern:

This section serves as a reminder of the importance of ensuring that, when verifying a Verifiable Presentation, the Verifier has information that is consistent with what the Issuer of the credential had when signing the Verifiable Credential from which the Verifiable Presentation is derived. This information may include at least: 1) contents of the credential itself; 2) any content included by reference in the credential; 3) any content transitively included by reference.

Digital signatures or other mechanisms such as Zero Knowledge Proofs address 1). Including the SHA-256 digest of the Base Context in this section addresses 2) for the Base Context.

However, it does not do so for content referenced by the Base Context (e.g., https://w3id.org/security#), and thus does not address 3). Furthermore, it does not address 2) for use-case-specific content that cannot be anticipated in this specification.

Therefore, a more comprehensive approach is required in future to ensure that Verifiers have sufficient information (such as cryptographic hashes of all referenced content; see the Content Integrity section of the Implementation Guide) to verify that content it either fetches remotely or caches locally is consistent with what the Issuer intended.

What do you think?

@msporny
Copy link
Member

msporny commented Jan 15, 2023

What do you think?

@mark-moir thank you, that's an excellent start that we can word smith into something that will find its way into the spec. Do you want to raise the PR, or do you want me to raise it (with a few edits here and there)? I'll note that Oracle is not a member of the WG, so you might need to get your W3C AC Rep to add you before you submit the PR (to make the IPR release). If not, I've got a decent idea of what you're looking for now and can take a shot at some text, though I'd rather just start w/ yours as long as we have an appropriate IPR release in place. Oracle is a W3C member, so it shouldn't be difficult for you to join the group. Let me know how you'd like to proceed.

@mark-moir
Copy link
Author

Thanks @msporny. As you note, there would be some administrative overhead required for me to do a PR, and even then, according to my understanding, I would need to add an Oracle copyright notice, which I can't imagine is desirable for this small change. I'm happy for you to start with my input, edit as you see fit (or not), and include it in a PR.

@msporny
Copy link
Member

msporny commented Jan 15, 2023

I would need to add an Oracle copyright notice

This wouldn't be needed, it (more or less) happens automatically when you join the group representing Oracle.

That said, I hear you, I'll assign it to myself and do my best to represent your suggestion above.

@msporny msporny self-assigned this Jan 15, 2023
@iherman
Copy link
Member

iherman commented Feb 16, 2023

The issue was discussed in a meeting on 2022-09-15

  • no resolutions were taken
View the transcript

5.2. Protecting integrity of @context (issue vc-data-model#956)

See github issue vc-data-model#956.

David Chadwick: When I do the PR, Joe, you can suggest edits to the wording.

Manu Sporny: I need to get to this.

@msporny
Copy link
Member

msporny commented Feb 17, 2023

@mark-moir, PR #1051 has been raised to address your concern. I had to modify your text to try to align it with the rest of the spec. Please let me know if this text works for you.

@msporny msporny added pr exists and removed ready for PR This issue is ready for a Pull Request to be created to resolve it discuss labels Feb 17, 2023
@mark-moir
Copy link
Author

Hi @msporny, thanks! Here are some comments on #1051.

Regarding:

The SHA-256 digest of JSON-LD Contexts, as well as locally cached copies, address item #2 in the previous paragraph.

  • The digest doesn't "do" anything. I think something like: "knowing the SHA256 digest of any content linked as in item #2 in the previous paragraph enables a verifier to confirm that the content is the same as the issuer or holder intended" is better.
  • "knowing" is a bit vague here; maybe someone can tighten it to indicate that the digest is available in some way already validated (e.g., baked into the standard as in the original case, or included in content that has already been validated).
  • I think whether the content is fetched remotely or locally cached is a red herring here. Either way, the digest should be confirmed before the content is used (in the case of locally cached content it could be confirmed when caching, or upon use).

Regarding:

See the Content Integrity section of the Verifiable Credential Implementation Guide to verify that content it either fetches remotely or caches locally is consistent with what the issuer or holder intended.

  • I think it would make sense to change "verify" to something like "for discussion of mechanisms for verifying" (simply looking at the section does not verify anything 😄)
  • I agree with this comment from @dlongley to use passive voice here too.

@msporny
Copy link
Member

msporny commented Apr 3, 2023

Hi @msporny, thanks! Here are some comments on #1051.

Great, thanks for the feedback @mark-moir!

In the future, please leave the feedback in the PR (ideally as change suggestions), as trying to coordinate all of the input into the section from various issues, the PR, and email (yes, someone emailed me directly about this PR) delays the time it takes to process the PR. In any case, grateful to you that you took a look.

Regarding:

  • The digest doesn't "do" anything. I think something like: "knowing the SHA256 digest of any content linked as in item #2 in the previous paragraph enables a verifier to confirm that the content is the same as the issuer or holder intended" is better.

Fixed in 4659e91.

  • "knowing" is a bit vague here; maybe someone can tighten it to indicate that the digest is available in some way already validated (e.g., baked into the standard as in the original case, or included in content that has already been validated).

Searched for a word that would be better, but failed to come up with one. I don't think "knowing" is misleading, in any case, it is "knowledge that is known." :)

  • I think whether the content is fetched remotely or locally cached is a red herring here. Either way, the digest should be confirmed before the content is used (in the case of locally cached content it could be confirmed when caching, or upon use).

Fixed in 4659e91 (by removing "locally cached" and focusing solely on the cryptographic hash).

See the Content Integrity section of the Verifiable Credential Implementation Guide to verify that content it either fetches remotely or caches locally is consistent with what the issuer or holder intended.

  • I think it would make sense to change "verify" to something like "for discussion of mechanisms for verifying" (simply looking at the section does not verify anything smile)

Fixed in 4659e91 (by removing "verify" and just clearly stating that readers should go and look at that section for more information).

Fixed in aa4d74c

@msporny
Copy link
Member

msporny commented Apr 4, 2023

PR #1051 has been merged, closing.

@msporny msporny closed this as completed Apr 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants