proof in @context and the use of @container #881
Relevant sections of the JSON-LD TR:
https://www.w3.org/TR/json-ld11/#graph-containers
^ pretty sure this is the culprit... it means that if you expand a credential, you lose the relationship between the credential and its proof.
This states that URGNA2012 (Universal RDF Graph Normalization Algorithm 2012) didn't do this as it only dealt with RDF Graphs, not RDF Datasets, and so we just shoved all the RDF signature data into the default graph (and some people were rightfully upset by that).

When the RDF 1.1 work expanded to include RDF Datasets (part of the driver there was to support concepts that JSON-LD supported but the core RDF data model at the time didn't support), we separated the "data to be signed" from the "signature information" to ensure a cleaner separation between the two types of data. That became the URDNA2015 (Universal RDF Dataset Canonicalization Algorithm 2015).

Hopefully the benefits of this architectural separation between original data and signature data are clear... if they're not, I'm happy to try and elaborate on how jumbling "data to be signed" with "the signature" leads to dirty data over time, especially when you shove it into / take it out of graph databases.

As for what neo4j is doing there... you might ask them how they link statements between RDF Graphs in an RDF Dataset... might just be a limitation of their tooling. The JSON-LD Playground doesn't seem to suffer from the same limitation.
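As an illustrative sketch (all identifiers here are invented for the example, and the `"eyJ..."` value is a placeholder), the separation looks roughly like this in N-Quads: the claims sit in the default graph, while the signature statements are grouped under their own blank-node graph label.

```
<urn:uuid:example-credential> <https://www.w3.org/2018/credentials#credentialSubject> <did:example:subject> .
<urn:uuid:example-credential> <https://w3id.org/security#proof> _:proofGraph .
_:b0 <http://purl.org/dc/terms/created> "2017-01-01T00:00:00Z" _:proofGraph .
_:b0 <https://w3id.org/security#jws> "eyJ..." _:proofGraph .
```

The fourth column on the proof statements is the graph label, which is exactly the blank node that a property-graph import can fail to follow.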
@OR13 can you give the JSON-LD you use to make that neo4j graph?
@msporny thanks, I figured that is what was happening.
^ these are the URIs that neo4j assigns to the blank nodes (based on a default graph config):
... so it is possible to query over the gap between the graphs, you just have to do some string magic.
Hrm, that feels a bit weird. It looks like they're sharing some part of the bnode ID space, but then tacking something on at the end (re: https://v.jsld.org/ -- that's a neat visualization tool :). Note that the […]
^ exactly, I suspect that with an updated graph config in neo4j the link would be imported as […]
I fear I've missed something important along the way... Are you saying that, in RDF Dataset Canonicalization, "the data being signed" is always in the default graph, and not in a named graph?

This is (or will be) problematic for systems (such as Virtuoso) where the default graph is the union of all named graphs (plus, at least in Virtuoso's case, a special not-really-named graph which is populated by inserts that do not specify a target named graph)...

Further, in such systems, this re-blurs the lines between "the data being signed" and "the proof data", as the named graph containing the latter is included in the default graph containing the former -- i.e., the default graph contains both the "data being signed" and "the proof data"...
No, this is unrelated to RDF Dataset Canonicalization. As for Data Integrity proofs, the above separation of concerns and process may have been better described by just saying that a proof always exists in its own named graph so as to isolate it from other data.

So, whenever you create a proof (when using proof sets as opposed to proof chains), you remove any existing proof named graphs from the default graph, then sign the entire (canonicalized) dataset, then add back the existing proof named graphs and add the new proof named graph that represents the new proof to the default graph. Does this clarify?
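A minimal sketch of that proof-set flow in Python. Everything here is an illustrative stand-in, not the actual Data Integrity algorithm: `canonicalize` fakes URDNA2015 with a sorted serialization, proof quads are detected by a caller-supplied predicate, and "signing" is just a caller-supplied function over a SHA-256 digest.

```python
import hashlib

def canonicalize(quads):
    # Stand-in for URDNA2015: any deterministic serialization of the dataset.
    return "\n".join(sorted(quads)).encode("utf-8")

def add_proof(quads, is_proof_quad, sign):
    # 1. Remove any existing proof named graphs from the dataset.
    data_quads = [q for q in quads if not is_proof_quad(q)]
    existing_proofs = [q for q in quads if is_proof_quad(q)]
    # 2. Sign the entire canonicalized dataset (existing proofs excluded).
    signature = sign(hashlib.sha256(canonicalize(data_quads)).hexdigest())
    # 3. Add back the existing proof graphs, plus a NEW named graph
    #    (_:proofGraph2) holding the new proof.
    new_proof = f'_:p1 <https://w3id.org/security#proofValue> "{signature}" _:proofGraph2 .'
    return data_quads + existing_proofs + [new_proof]

# Usage: one claim plus one pre-existing proof quad (invented identifiers).
dataset = [
    '<urn:c> <urn:says> "hello" .',
    '_:p0 <https://w3id.org/security#proofValue> "old" _:proofGraph1 .',
]
signed = add_proof(dataset, lambda q: "proofGraph" in q, lambda h: h[:8])
```

The point of the sketch is that the new signature is computed over the claim quads only, so adding a second proof later would not invalidate the first.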
@dlongley --
"The default graph" seems not to be the correct label for all of the above instances, and even if it were, in Virtuoso (for instance), you cannot "remove any existing proof named graphs from the default graph" unless you are dropping those "existing proof named graphs" from the quad store, because all existing named graphs are part of the default graph (except when specific SPARQL clauses are used to change the definition of the default graph for that query, which does not appear to be part of the process you're describing).
Sorry to potentially add to the confusion. I think I follow but want to check (this also feels like we're diverging into a separate topic so I can take this elsewhere if you want):
If the proof graph(s) are always decoupled during signing, then the metadata about the signature generation is not part of the signature? 👇🏻 indeed
+1 for finding better terminology to avoid confusion as needed. EDIT: I presume you could implement the above using a specific SPARQL query as you mentioned (to "change the definition of the default graph") if you need to interact with the data that way via a quad store (as opposed to in memory). |
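For example (a hypothetical sketch, not part of the spec), one way to approximate "the dataset minus the proof graphs" in SPARQL today is to match quads explicitly and exclude any graph that contains proof statements:

```sparql
# Sketch: keep only quads from graphs with no proofValue statements,
# approximating "the data to be signed" in a union-default-graph store.
SELECT ?s ?p ?o ?g
WHERE {
  GRAPH ?g { ?s ?p ?o }
  FILTER NOT EXISTS {
    GRAPH ?g { ?x <https://w3id.org/security#proofValue> ?v }
  }
}
```

This avoids needing "FROM with exclusion" at the cost of a per-graph existence check.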
I think responding to individual concerns without a comprehensive response (i.e., what the spec says or should say) on the entire process is leading to more confusion here. But at the risk of introducing more confusion in just responding to your particular query, a Data Integrity proof involves signing over a hash of both the canonicalized dataset (with any existing proofs in the default graph removed when using "proof sets") and over a hash of the canonicalized metadata for the new proof. In other words, all data is signed except for the signature itself (which is not logically possible to sign over, since it is an output of the process).
The above should clarify that the answer to this is: "No".
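Sketched with hashlib (the byte inputs here are placeholders standing in for URDNA2015-canonicalized N-Quads; the function name is invented for illustration):

```python
import hashlib

def signing_input(canonical_proof_options: bytes, canonical_dataset: bytes) -> bytes:
    # The value handed to the signature algorithm is the concatenation of
    # two hashes: hash(proof metadata/options) followed by hash(dataset,
    # with existing proof graphs removed when using proof sets).
    return (hashlib.sha256(canonical_proof_options).digest()
            + hashlib.sha256(canonical_dataset).digest())

msg = signing_input(b"canonical proof options", b"canonical dataset")
```

So the proof metadata is covered by the signature; only the signature value itself is not.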
@dlongley, thank you. That's how I originally had thought about it. Crystal clear now.
@dlongley --
Still trying to parse this... It appears that the "both" is misplaced in the sentence and/or the "over a hash of both" is missing one of the things being hashed. Maybe --
-- or --
-- or --
-- or something I'm not seeing yet...
The canonicalized metadata is hashed, producing […]
AFAIK, the "Data Integrity Proofs" or what used to be called "Linked Data Proofs" have not changed in this regard since 2017... Here is an example where I tested them against Mastodon: (Mastodon is the original web5, get on my level haters).
I was also working on LD signatures back then, when the signatures/proofs still used to be in the same graph as the data, and I remember it felt like the right decision to move the signatures/proofs into their own named graphs as it is now.
@OR13 The example doesn't parse in rdf4j, probably because it doesn't yet support JSON-LD 1.1: eclipse-rdf4j/rdf4j#3654. Jena 4.4.0 (2022-01-30) also gave an error.
@TallTed should we post an issue to SPARQL 1.2, "FROM should allow the exclusion of graphs"? Maybe not, because to fulfill the goal of "separate the data you're signing", a repository would store the VC in a named graph: storing hundreds or millions of VCs in the default graph would not allow you to separate them.
@VladimirAlexiev -- I think there are some scenarios where a […]
A simpler one-liner to reproduce the issue (beware: it deletes everything, so don't run this outside of a new database):
Then view the data with:
Here is a snippet of CQL that adds a link relationship between the proof node and "similar blank nodes"... This is an incredibly expensive hacky work around:

```cypher
MATCH
  (n0:Resource),
  (n1:Resource),
  (n2:Resource)
WHERE
  (n0)-[:proof]->(n1) AND
  apoc.text.levenshteinSimilarity(n1.uri, n2.uri) > 0.8 AND
  apoc.text.levenshteinSimilarity(n1.uri, n2.uri) < 1
MERGE (n1)-[link:DATA_INTEGRITY_PROOF]->(n2)
RETURN n0, n1, n2
```

After this link has been added, the graphs are connected.
@VladimirAlexiev I had the same issue with JSON-LD v1.1 before... It's a major reason to convert from the standard JSON representation of a credential to the n-quad or framed versions... which seem to be better supported by graph databases. I suppose the next step should be to create 3 or 4 VCs and import them all, and then look at the graph again. I would expect to be able to see that they are "proofs for the same information", but from different actors, over time.
A much smarter way to join the graphs after import:

```cypher
MATCH
  (n1:Resource),
  (n2:Resource)
WHERE
  split(n1.uri, '-')[1] = split(n2.uri, '-')[1] AND
  NOT EXISTS(n1.jws) AND
  EXISTS(n2.jws)
MERGE (n1)-[link:DATA_INTEGRITY_PROOF]->(n2)
RETURN n1, n2
```

^ this doesn't work, though, because of the way the blank node identifiers are assigned during a bulk import... In this case, 3 credentials are imported, but each has a proof with a blank node id that looks like:

... because they were imported at the same time... even though the credentials were issued at different times. On the other side of the gap, we have:

After import, we can tell they are all related by looking at […]. A few thoughts:

My goal:

it seems the naive solutions to this problem are causing me to trade one goal for another.
Importing objects that might contain blank nodes one at a time seems to work. Left hand side:
Right hand side:
It's now possible to join by looking at the middle component of the […]:

```cypher
MATCH
  (credential:Resource),
  (signature:Resource)
WHERE
  ()-[:proof]->(credential) AND
  EXISTS(signature.jws) AND
  split(credential.uri, '-')[1] = split(signature.uri, '-')[1]
MERGE (credential)-[link:DATA_INTEGRITY_PROOF]->(signature)
RETURN credential, signature, link
```

After this relationship is added:
Unfortunately, this won't help you with Verifiable Presentations... because the proofs on the credentials will have a similar blank node identifier to the proof on the presentation: Left:
Right:
Same problem as before. The problem here is worse, though... since we also have the dangling "holder":

```json
"holder": {"@id": "cred:holder", "@type": "@id"},
"proof": {"@id": "sec:proof", "@type": "@id", "@container": "@graph"},
"verifiableCredential": {"@id": "cred:verifiableCredential", "@type": "@id", "@container": "@graph"}
```

I'm less sure how to fix this since:
It should be possible to import the credentials individually, then the presentation, and then define relationships between them... but having to do that for every VP is going to add a LOT of overhead.

... it does work... After importing each item one at a time... the graphs for a VP can be joined. But I lost the […]:

```cypher
MATCH
  (vp { uri: 'urn:uuid:7ea1be55-fe46-443e-a0ce-eb5e40f47aaa' }),
  (vc { uri: 'urn:uuid:a96c9e16-adc3-48c7-8746-0e1b8c3535ba' })
MERGE
  (vp)-[link:PRESENTED]->(vc)
RETURN vc, vp, link
```
Blank nodes are extremely useful, just like other forms of pronoun. However, they are not appropriate for use in all cases; sometimes, a proper noun (a/k/a a URI, URN, IRI, such as a DID) is more appropriate. I submit that these are such cases.
I added a similar import for VC-JWTs here: transmute-industries/verifiable-data#198. This raises interesting questions, since VC-JWT has an […]. I can see benefits to both approaches... but it's interesting to note that by default both LD Proofs and VC-JWT don't import the proof as connected to the credential.
The issue was discussed in a meeting on 2022-08-03. View the transcript (6.7).
blocked by #947
I think we still need to address the graph container issue in the core data model vs the security formats. The Data Integrity side is easy, but how does this map to the […]
The issue was discussed in a meeting on 2023-04-04. View the transcript (1.4).
a note: the […]
@filip26 wrote:
There are a number of implementations that don't have this behavior. Can you please provide the section of the JSON-LD or VC specification that you feel triggers this behavior?
I don't know what step(s) in the algorithm cause the behavior, but I question the claim that other implementations do not have this issue. This example with @graph produces […]
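For reference, here is a minimal made-up document illustrating the behavior under discussion: a term defined with `"@container": "@graph"` places its contents into a fresh blank-node named graph on conversion to RDF, so the subject is linked to the proof's contents only via a blank node graph label.

```json
{
  "@context": {
    "proof": {
      "@id": "https://w3id.org/security#proof",
      "@container": "@graph"
    }
  },
  "@id": "urn:uuid:example",
  "proof": { "http://purl.org/dc/terms/created": "2022-01-01" }
}
```

The created-date triple ends up inside a blank-node graph rather than in the default graph alongside `urn:uuid:example`.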
The issue was discussed in a meeting on 2023-04-19. View the transcript (2.1).
The issue was discussed in a meeting on 2023-05-17. View the transcript (2.1).
This issue can be closed when #1158 is merged.
The issue was discussed in a meeting on 2023-06-28. View the transcript (2.8).
@iherman on the call today, you asserted that the current JSON-LD context behavior wrt proof is correct. I wanted to share some implementation experience with the working group on applying the current proof graphs, as they are generated with the current normative contexts, when converting from JSON-LD to RDF.

It is true that when importing a graph for a […]. This behavior was previously ambiguous, but will now be consistent thanks to making the context normative. It impacts whether software systems will process these data models as RDF graphs. Regardless of what the context says the RDF should be, a graph processing verifier might decide to attach proofs to credentials, or credentials to presentations, in order to generate more efficient graph queries.

At Transmute, we've obviously been using neo4j a lot, as have a lot of companies that are interested in modern graph APIs and moving beyond just doing what RDF allows (especially while we wait to see what RDF-star will allow). Here is a link to a tool we use to evaluate JSON-LD DIDs and VCs: https://github.com/transmute-industries/transmute Here is a link to an open source US Customs program that also uses neo4j: https://github.com/US-CBP/GTAS

While I personally don't agree with the RDF graph that is now normative, as you can see, I am comfortable working around its flaws to produce graphs that preserve the relationships we see in JSON, specifically the relationships between […].

I think most folks will be surprised to learn that while […]. Similarly, folks will probably be surprised to learn that a verifiable presentation will not contain credentials when imported, for the same reason, and that its proof will also be treated the same way. This causes graph processors to "forget where things came from" after importing JSON-LD as RDF. I find this behavior undesirable, but obviously we can work around it, and now our workaround will be consistent, thanks to making the […]
As I said on the call, this issue predates the working group's intelligent decision to make the context normative, and this issue can be closed when this PR is merged:
The PR is merged, I presume this issue is now moot and can be set as pending close. @brentzundel @Sakurann @OR13 ? |
Indeed! Consumers of verifiable credentials as RDF are now assured of a specific graph structure by the application of our normative context. This makes reliable extension or translation possible. This issue should be closed.
This issue has been addressed, closing. |
I've been using Neo4j a lot lately.
One of my favorite features is the ability to preview (framed) JSON-LD.
For example:
For simple cases this works fine... but when I attempt to apply this to spec compliant verifiable credentials, I get a weird blank node issue with the proof block.
Here is a picture of what I mean:
Notice the 2 blank nodes that separate these disjoint subgraphs.
I believe this is caused by the way the proof block is defined in the v1 context: https://github.com/w3c/vc-data-model/blob/v1.1/contexts/credentials/v1#L45
This is a lot of complexity... for one of the most important term definitions the standard provides.
I believe this is also the cause of the "double blank node" issue I observed above.
I think what happens is that a first blank node is created for the proof, and since that node has @container: @graph, instead of being able to trace the relationships directly from credential to proof to verification method, each proof is being treated as a disjoint subgraph, and the relationship is not being preserved during preview / import...
This is really not ideal, since I am interested in querying changes in these proofs over time for credentials, and that relationship is not being imported.
I suspect this is solvable with a more complicated graph config: https://neo4j.com/labs/neosemantics/4.0/config/
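For example, neosemantics sets import behavior via n10s.graphconfig.init. The option names below exist in neosemantics 4.x, but I have not verified that any combination of them preserves the credential-to-proof link across the named-graph boundary, so treat this as an unconfirmed sketch:

```cypher
// Unverified sketch: re-initialize the graph config before running the
// n10s RDF import, shortening vocab URIs and mapping rdf:type to labels.
CALL n10s.graphconfig.init({
  handleVocabUris: "SHORTEN",
  handleRDFTypes: "LABELS"
});
```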
But I wonder if we might correct this behavior in VC Data Model 2.0, such that RDF representations don't have this odd behavior when imported as labeled property graphs.
Anyone know how to solve this?