
"Individual claims can be merged together to express a graph of information about a subject." is not a true statement, in general #790

Closed
mwherman2000 opened this issue Aug 12, 2021 · 14 comments
Labels
pending close Close if no objection within 7 days

@mwherman2000

mwherman2000 commented Aug 12, 2021

In https://www.w3.org/TR/vc-data-model/#claims, the following statement is not always true; fundamentally, and in general, it is not true:

Individual claims can be merged together to express a graph of information about a subject.

A VC can exist as an Unbound Credential; that is, a VC that has no subject (i.e. no credentialSubject id). In the specification, there is no requirement for a VC to refer to a subject. credentialSubject id is optional.

Ditto for the following statements from the same section, which are, in general and fundamentally, not true:

A claim is a statement about a subject. A subject is a thing about which claims can be made. Claims are expressed using subject-property-value relationships.

A Claim is simply a name-value pair ...a named value: "color" : "red"

A set of Claims is related by its inclusion in the same collection of Claims inside a specific Credential. If one of those Claims is named id, then the set of Claims is assumed to be associated with that identifier, in the same way that the set of Claims can be assumed to be associated with the color red if the collection also contains a claim of the form "color" : "red".

Figure 2 and Figure 3 are not true for the same reason.
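To make the name-value reading above concrete, here is a minimal Python sketch (an illustration, not anything defined by the specification). Each credential's claims are modeled as a flat dictionary, an id is just one more pair, and two collections can only be tied to the same subject by comparing their id values. The function name same_subject and the example values are hypothetical.

```python
from typing import Optional


def same_subject(claims_a: dict, claims_b: dict) -> Optional[bool]:
    """Compare two flat name-value claim collections.

    Returns True or False when both collections include an "id" claim,
    and None when either one lacks it, because under this reading there
    is then nothing that associates the collections with each other.
    """
    if "id" not in claims_a or "id" not in claims_b:
        return None
    return claims_a["id"] == claims_b["id"]


bound = {"id": "did:example:123", "color": "red"}
unbound = {"color": "red", "size": "large"}

print(same_subject(bound, {"id": "did:example:123", "size": "large"}))  # True
print(same_subject(bound, unbound))  # None
```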

@mwherman2000 mwherman2000 changed the title "Individual claims can be merged together to express a graph of information about a subject." is not a true statement "Individual claims can be merged together to express a graph of information about a subject." is not a true statement, in general Aug 12, 2021
@msporny
Member

msporny commented Aug 12, 2021

A Claim is simply a name-value pair ...a named value: "color" : "red"

This is not how the VC Data Model defines claim:

https://w3c.github.io/vc-data-model/#claims

This statement, which is claimed to not be true:

Individual claims can be merged together to express a graph of information about a subject.

is true, because the claims consist of 3-tuple values (subject-property-value), not 2-tuple values (property-value, as is suggested above). If claims were only 2-tuple values, the statement wouldn't be accurate as you state... but that redefines the word "claim" in a way that would break the VC Data Model specification.
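A minimal Python sketch of the 3-tuple reading described above: because every claim carries its own subject, claims taken from separate credentials can be merged by plain set union and then grouped by subject. The identifiers and the worksFor property are illustrative assumptions; a real implementation would operate on full IRIs produced by JSON-LD expansion rather than short property names.

```python
from collections import defaultdict


def merge_claims(*claim_sets):
    """Merge several sets of (subject, property, value) claims into one graph."""
    graph = set()
    for claims in claim_sets:
        graph |= set(claims)
    return graph


def describe(graph):
    """Group merged triples by subject: {subject: {(property, value), ...}}."""
    by_subject = defaultdict(set)
    for subject, prop, value in graph:
        by_subject[subject].add((prop, value))
    return dict(by_subject)


from_university = {("did:example:alice", "alumniOf", "https://example.edu")}
from_employer = {("did:example:alice", "worksFor", "https://example.com")}

graph = merge_claims(from_university, from_employer)
print(describe(graph)["did:example:alice"])
```

Duplicate claims collapse naturally because the graph is a set, which is the property that makes "merging" well defined for triples but not for bare name-value pairs.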

@mwherman2000
Author

mwherman2000 commented Aug 12, 2021

#FYHumor: I found this pioneer bucket while helping a neighbor round up his cows yesterday. It's a bit battered, has been patched up more than once, the handle has fallen off, and it has a few holes in the bottom (it doesn't hold water anymore), but it still looks like a bucket. Maybe we could use it as a logo for version 1 of the VC data model specification? ;-) :-)

[image]

@mwherman2000
Author

mwherman2000 commented Aug 12, 2021

the claims consist of 3-tuple values (subject-property-value), not 2-tuple values (property-value, as is suggested above). If claims were only 2-tuple values

You can't have 3-tuples stipulated in the specification when the first element (the subject) is completely optional and can be completely and opaquely missing without detection (as in an Unbound Credential). The only way this might work is if there were a formal grammar or some sort of machine-readable specification for a VC grammar.

The whole JSON vs. JSON-LD thing is a mess.

The specification is so much like the bucket above that I really do have to sit back with a "humorous expression on my face".

Do what you need to do, I'm headed in a different direction: #788 (comment)

@msporny
Member

msporny commented Aug 12, 2021

You can't have 3-tuples stipulated in the specification when the first element (the subject) is completely optional and can be completely and opaquely missing without detection (as in an Unbound Credential).

It's optional to specify, but always exists.

The only way this might work is if there was a formal grammar

JSON-LD (and RDF) provides that grammar and data model.
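As a rough illustration of "optional to specify, but always exists", here is a minimal Python sketch of how flattening JSON into triples can still give an unnamed subject a node of its own, using a generated blank node label in the _:b0 style used by N-Triples and N-Quads. It deliberately ignores @context, term expansion to IRIs, and JSON-LD value objects, so it is a simplification rather than the JSON-LD algorithm; the function name to_triples is hypothetical.

```python
import itertools


def to_triples(node: dict, blank_ids):
    """Flatten a JSON object into (subject, property, value) triples.

    A node without an "id" still gets a subject: a generated blank node
    label such as "_:b0". Nested objects become nodes of their own and are
    linked to their parent by a triple.
    """
    subject = node["id"] if "id" in node else f"_:b{next(blank_ids)}"
    triples = []
    for prop, value in node.items():
        if prop == "id":
            continue
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                child, child_triples = to_triples(item, blank_ids)
                triples.append((subject, prop, child))
                triples.extend(child_triples)
            else:
                triples.append((subject, prop, item))
    return subject, triples


credential_subject = {
    "studentId": "018283291",
    "alumniOf": {"id": "https://example.edu", "name": "Example University"},
}
subject, triples = to_triples(credential_subject, itertools.count())
print(subject)  # _:b0
for triple in triples:
    print(triple)
```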

The whole JSON vs. JSON-LD thing is a mess.

I don't disagree with you there. How do you intend to solve it?

Do what you need to do, I'm headed in a different direction: #788 (comment)

Remember that you have 60+ people (the number of people/implementers in the WG) you're going to need to convince of that new direction. More power to you if you figure out something simpler that achieves consensus.

@mwherman2000
Author

mwherman2000 commented Aug 12, 2021

...but that redefines the word "claim" in a way that would break the VC Data Model specification.

#cliquespeak

The above is not a concern if the specification is already, or potentially, fundamentally broken. That is what we're trying to determine.

@mwherman2000
Author

mwherman2000 commented Aug 12, 2021 via email

@kdenhartog kdenhartog assigned kdenhartog and unassigned kdenhartog Aug 13, 2021
@kdenhartog
Member

kdenhartog commented Aug 13, 2021

A VC can exist as an Unbound Credential; that is, a VC that has no subject (i.e. no credentialSubject id). In the specification, there is no requirement for a VC to refer to a subject. credentialSubject id is optional.

I think there may be a conflation occurring here that's worth pointing out. I'll try to explain this using a proof by contradiction argument. Bear with me:

Just because credentialSubject.id is absent doesn't mean that the subject doesn't exist. Rather, the subject is just not identified. The @id property serves two functions in JSON-LD. First, it acts as the subject of an RDF statement, which is where the traditional sense of the subject assertion comes into play. Second, it acts as an identifier that the holder can use for proof of possession. However, because plain JSON doesn't apply the fundamental concepts of RDF, I believe there's a conflation: when a VC is missing credentialSubject.id, it is assumed to no longer have a subject. In fact, in JSON-LD, when the credentialSubject object has no id property, under the hood it is automatically assigned an ephemeral one. So it does still have an identifier, but no longer one that the holder can use to prove possession. This is how blank node identifiers work. In the case of plain JSON, it just becomes a poorly formed document. So the subject is effectively still defined: it's whoever the issuer was making claims about. The only difference is whether the verifier can work out who that is later on. Let's take a look at a few examples to see what I mean:

{
  "id": "http://example.edu/credentials/1872",
  "type": ["VerifiableCredential", "AlumniCredential"],
  "issuer": "https://example.edu/issuers/565049",
  "issuanceDate": "2010-01-01T19:23:24Z",
  "credentialSubject": {
    "studentId": "018283291",
    "alumniOf": {
      "id": "https://example.edu",
      "name": [{
        "value": "Example University",
        "lang": "en"
      }, {
        "value": "Exemple d'Université",
        "lang": "fr"
      }]
    }
  },
  "proof": { ... }
}

In this case the subject identifier is actually still defined, even though it's not in the credentialSubject.id property: the subject has been identified by the credentialSubject.studentId property. Unfortunately, it's still a poorly formed document that can't be authenticated, because the holder wasn't identified. In some cases that's not necessary, though, such as when the document isn't being used for authorization. One example is a machine learning model that weights claims based on the issuer. In this case, the data is still valuable because it is signed by the issuer, and the holder is irrelevant.
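A relying party that accepts this reasoning could look for a subject identifier beyond credentialSubject.id. Below is a minimal Python sketch, assuming the relying party keeps its own hypothetical list of issuer-specific identifier properties (studentId from the example above, permanentResidentId from the one further down); nothing in the VC Data Model defines such a list.

```python
from typing import Optional, Tuple

# Hypothetical, relying-party-specific list of properties treated as
# subject identifiers when credentialSubject.id is absent.
FALLBACK_ID_PROPERTIES = ("studentId", "permanentResidentId")


def subject_identifier(credential_subject: dict) -> Optional[Tuple[str, str]]:
    """Return (property, value) for the first identifier found, or None."""
    if "id" in credential_subject:
        return ("id", credential_subject["id"])
    for prop in FALLBACK_ID_PROPERTIES:
        if prop in credential_subject:
            return (prop, credential_subject[prop])
    return None


print(subject_identifier({"studentId": "018283291", "alumniOf": {}}))
# ('studentId', '018283291')
print(subject_identifier({"alumniOf": {}}))  # None
```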

Following me so far? OK, now here's where my logic is going to get a bit tricky.

Since we're recognizing that the issuer is stating claims, we should also recognize that an identifier is merely one attribute describing an entity, and removing an attribute doesn't remove the entity. Put another way, by analogy: if I don't include my name on my passport, I don't suddenly disappear off the face of the planet. Rather, my passport is just potentially a poorly formatted document (depending on the consumption use case). So, in the case of JSON where the credentialSubject.id attribute is missing, we don't suddenly get to say that the subject is gone. Rather, we just need to recognize that the identifier for that subject is no longer included and that, in reality, the issuer produced a poorly defined document. Just as I don't disappear when my passport omits my name, the practical consequence is that the issuing authority produced an ambiguous passport that makes it hard to identify me.

So where's the conflation here?

Well, just because the passport is missing my name doesn't mean it can't still be bound to me and verified by a customs agent. If my picture is still included in the passport, then the customs agent can still visually authenticate me, even though they don't know my name. So the conflation here is the presumption that the id property serves both as an identifier of the subject and as the proof of possession by the holder, and that without the id the subject suddenly disappears. I believe that presumption is incorrect. The conflation is likely occurring because this is how it works for DIDs, but that's not the be-all and end-all: the issuer can add other properties that allow the holder to prove possession in a variety of different ways, or that identify the subject in other ways.

For example, consider this credential:

{
  "id": "https://issuer.oidp.uscis.gov/credentials/83627465",
  "type": ["VerifiableCredential", "PermanentResidentCard"],
  "issuer": "did:example:28394728934792387",
  "identifier": "83627465",
  "name": "Permanent Resident Card",
  "description": "Government of Example Permanent Resident Card.",
  "issuanceDate": "2019-12-03T12:19:52Z",
  "expirationDate": "2029-12-03T12:19:52Z",
  "credentialSubject": {
    "id": "PermanentResidentId123",
    "type": ["PermanentResident", "Person"],
    "givenName": "JOHN",
    "familyName": "SMITH",
    "gender": "Male",
    "image": "...kJggg==",
    "residentSince": "2015-01-01",
    "lprCategory": "C09",
    "lprNumber": "999-999-999",
    "commuterClassification": "C1",
    "birthCountry": "Bahamas",
    "birthDate": "1958-07-17"
  },
  "proof": {
    "type": "ImageVerification",
    "verificationProcess": "Confirm image matches the person in front of you"
  }
}

In this example, the id is merely an additional attribute that is unique to the issuer. The credential is validly formed and there is still a subject, but the subject is not just that identifier. Rather, the subject is the entity being described by the issuer via the claims in the credentialSubject section. In this case, though, the verifier is expected to verify the binding between the holder and the subject using the image, and the holder is unidentified. In fact, I could publish this VC on my Google Drive and embed a link to it in a QR code, and then the holder becomes Google (the host of Google Drive). Similarly, we could remove the id property entirely, or remove it and add a different credentialSubject.permanentResidentId = 123, and no matter which of those changes occurs, the verifier could still verify the subject via the image and still wouldn't care who the holder is.
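The point that binding can come from something other than the id can be sketched as a small decision function. This is a minimal Python sketch under stated assumptions: the labels "proof-of-possession" and "visual", the function name binding_method, and the order of the checks are all hypothetical, not terms from the VC Data Model.

```python
from typing import Optional


def binding_method(credential_subject: dict) -> Optional[str]:
    """Suggest how a verifier might bind the presenter to the subject.

    The labels and the order of the checks are illustrative assumptions,
    not terms defined by the VC Data Model.
    """
    subject_id = credential_subject.get("id")
    # A DID-style id lets the holder prove control of it, e.g. by signing
    # a presentation with a key associated with that DID.
    if isinstance(subject_id, str) and subject_id.startswith("did:"):
        return "proof-of-possession"
    # An image lets a person compare the photo with whoever is presenting,
    # as in the ImageVerification proof in the example above.
    if "image" in credential_subject:
        return "visual"
    # No binding mechanism: the claims may still be useful when the
    # identity of the holder does not matter.
    return None


resident = {"id": "PermanentResidentId123", "image": "...kJggg==", "givenName": "JOHN"}
print(binding_method(resident))  # visual
print(binding_method({k: v for k, v in resident.items() if k != "id"}))  # visual
print(binding_method({"id": "did:example:abc"}))  # proof-of-possession
```

Removing the id from the resident record does not change the result, which mirrors the argument that the image, not the identifier, carries the binding here.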

So the point here is that, in JSON, the absence of credentialSubject.id does not convey any additional information about the subject or the holder. And in JSON-LD, if the subject is given a randomly generated ephemeral identifier to form proper RDF statements, that identifier doesn't necessarily convey additional information about the subject or about proof of possession by the holder. The absence of credentialSubject.id only conveys that the issuer potentially malformed the document on issuance if they chose not to include a credentialSubject.id, a subject identifier (not always necessary), a holder identifier (not always necessary), or a subject authentication mechanism (not always necessary). It's merely a poorly formed document that's not very useful to the relying party depending on it, and it should be treated as such.

@David-Chadwick
Contributor

There is an alternative argument, based on proof of possession by the holder rather than on the identifier of the subject. It goes as follows. The issuer has a set of claims about an entity, or indeed about a set of entities (such as the parents of a newborn baby, or a marriage certificate). The issuer is entitled to give this set of claims to anyone (the holder, as per Figure 1 of the Recommendation) that it deems entitled to hold it. In order to bind this set of claims to the holder, the VC is bound (directly or indirectly) to a cryptographic key of the holder, so that the holder can subsequently prove possession of the VC by including the VC in a VP and signing the VP. This does not break the existing JSON, the RDF or JSON-LD, or the existing model, since the triple is now "holder with cryptographic id holds property with value".
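The holder-bound reading above can be sketched by re-expressing a credential's claims with the holder's key as the subject of each triple. This is a minimal Python sketch, not a real proof-of-possession scheme: the holderKey field, the key identifier, and the spouse1/spouse2 claims are hypothetical, and the cryptographic verification of the VP signature is assumed to have already happened elsewhere.

```python
from typing import Any, List, Tuple


def holder_claims(vc: dict, presentation_signer: str) -> List[Tuple[str, str, Any]]:
    """Re-express a VC's claims as triples whose subject is the holder's key.

    vc["holderKey"] is a hypothetical field naming the key the issuer bound
    the credential to. presentation_signer is the key that signed the VP;
    the cryptographic signature check itself is assumed to have happened
    already and is not shown here.
    """
    if vc.get("holderKey") != presentation_signer:
        raise ValueError("presentation was not signed by the key the VC is bound to")
    return [
        (presentation_signer, prop, value)
        for prop, value in vc["credentialSubject"].items()
        if prop != "id"
    ]


marriage_vc = {
    "holderKey": "did:example:holder#key-1",
    "credentialSubject": {"spouse1": "Alice", "spouse2": "Bob"},
}
for triple in holder_claims(marriage_vc, "did:example:holder#key-1"):
    print(triple)
```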

Remember that all the description of subject triples in section 3 is non-normative. So is the section about bearer credentials in 7.9, which says that the subject ID is not specified so as not to divulge information about the holder. So both arguments are equally valid and consistent with both the non-normative and normative parts of the Recommendation. The real issue is that the Recommendation has conflicting non-normative parts, some of which imply the subject ID relates to the holder and some of which imply it relates to the subject. For the vast majority of VCs this is not an issue, since subject == holder, but it is the edge cases that are causing the disconnect.

Perhaps the next version of the Recommendation can describe both of these viewpoints.

@mwherman2000
Author

mwherman2000 commented Aug 13, 2021 via email

@brentzundel
Member

brentzundel commented Aug 13, 2021 via email

@mwherman2000
Author

mwherman2000 commented Aug 14, 2021 via email

@TallTed
Member

TallTed commented Aug 16, 2021

@mwherman2000 (and some others) --

Please take care to edit your email-based replies, to remove excess quoted content, especially when that quoted content is not properly marked as such.

Your latest comment above includes the full content of three preceding comments, but they are not properly marked, so it is not obvious that they are quoted content. GitHub doesn't even know how to hide the content it sent to you, nor can humans (even those who like reading a message sequence bottom-to-top while reading each message top-to-bottom) easily discern who wrote what, or when.

That makes the bounds of your own comments even harder to discern, and makes long threads like this one even harder to comprehend than they already are without such confusing embeds.

Increasing the difficulty of reading and understanding your words is counterproductive, if, as I expect, you really do want us to read, understand, and buy into your arguments.

As I include in my email .signature --

--
A: Yes.                          http://www.idallen.com/topposting.html
| Q: Are you sure?           
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

-- and it only gets worse when top- and bottom-posting are mixed.

@brentzundel brentzundel added pending close Close if no objection within 7 days and removed v2.0 labels Jul 26, 2022
@brentzundel
Member

I do not believe this issue calls for any changes to the specification. I am marking the issue pending close and recommend closing it after 7 days.

@brentzundel
Member

No response since marked pending close, closing.

6 participants