Possible oversights in processing when @type is not an IRI? #446

Closed
trwnh opened this issue Nov 18, 2024 · 6 comments

@trwnh

trwnh commented Nov 18, 2024

JSON-LD 1.1 requires @type to be an IRI...

From 3.5 Specifying the Type: https://www.w3.org/TR/json-ld11/#specifying-the-type

In Linked Data, types are uniquely identified with an IRI.

From 9.16 Keywords: https://www.w3.org/TR/json-ld11/#keywords

The @type keyword MAY be aliased and MAY be used as a key in a node object or a value object, where its value MUST be a term, IRI reference, or a compact IRI (including blank node identifiers).

So it seems pretty clear to me that the intent and also normative requirement is for every entry in @type to be an IRI, ultimately.

...but there are documents in the wild that don't follow this...

HOWEVER: There are documents, producers, and other specs that currently exist that use raw string literals within type which is aliased to @type by a @context declaration. Notably, ATProto has this to say: https://atproto.com/specs/did#did-documents

The PDS service network location for the account is found under the service array, with id ending #atproto_pds, and type matching AtprotoPersonalDataServer

At the same time, there is no ATProto-specific context document or declaration: https://web.plc.directory/did/did:plc:ewvi7nxzyoun6zhxrhs64oiz

{
  "@context": [  // note there is no atproto context
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/multikey/v1",
    "https://w3id.org/security/suites/secp256k1-2019/v1"
  ],
  "alsoKnownAs": [
    "at://atproto.com"
  ],
  "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
  "service": [
    {
      "id": "#atproto_pds",
      "serviceEndpoint": "https://enoki.us-east.host.bsky.network",
      "type": "AtprotoPersonalDataServer"  // this is a string literal, not an IRI
    }
  ],
  "verificationMethod": [
    {
      "controller": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
      "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto",
      "publicKeyMultibase": "zQ3shunBKsXixLxKtC5qeSG9E4J5RkGN57im31pcTzbNQnm5w",
      "type": "Multikey"
    }
  ]
}

...but it doesn't seem to cause any errors?

I would expect this to be disallowed by the JSON-LD spec, but the JSON-LD Playground seems to be okay with it?

The following non-IRI value for @type shows up in expanded form:

        "@type": [
          "AtprotoPersonalDataServer"
        ]

When compacting against an empty context, it also seemingly works:

{  // note there is no context
  "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz",
  "https://w3id.org/security#verificationMethod": {
    "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto",
    "@type": "https://w3id.org/security#Multikey",
    "https://w3id.org/security#controller": {
      "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz"
    },
    "https://w3id.org/security#publicKeyMultibase": {
      "@type": "https://w3id.org/security#multibase",
      "@value": "zQ3shunBKsXixLxKtC5qeSG9E4J5RkGN57im31pcTzbNQnm5w"
    }
  },
  "https://www.w3.org/ns/activitystreams#alsoKnownAs": {
    "@id": "at://atproto.com"
  },
  "https://www.w3.org/ns/did#service": {
    "@id": "#atproto_pds",
    "@type": "AtprotoPersonalDataServer",  // this is a string literal, not an IRI
    "https://www.w3.org/ns/did#serviceEndpoint": {
      "@id": "https://enoki.us-east.host.bsky.network"
    }
  }
}

So, what gives?

Obviously the most correct thing here is to ask the ATProto people to provide a namespace and/or context document for their extension terms, but I'm wondering if the JSON-LD processing algorithms should detect these non-IRI types and possibly give a warning or error or otherwise elide them... or is it OK for these "non-IRI types" to exist?

Note that the RDF conversion to N-Quads (in the playground example above, for instance) will not output the @type of the single item in did:service.

This is partially due to a different problem that I'm not sure whether I should file an issue about or not -- relative IRI references for @id. In short, it seems like #atproto_pds is not automatically picking up on the "id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz" of the top-level object.

No matter; we can change the service node's identifier from #atproto_pds to did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds and two additional quads get output:

<did:plc:ewvi7nxzyoun6zhxrhs64oiz> <https://www.w3.org/ns/did#service> <did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds> .
<did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds> <https://www.w3.org/ns/did#serviceEndpoint> <https://enoki.us-east.host.bsky.network> .

But we don't get a quad for rdf:type. So this backs up the notion that the "non-IRI types" are at least invalid RDF.
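The dropping behavior described above can be illustrated with a small Python sketch. This is not the JSON-LD to RDF algorithm itself, only a simplified stand-in: `node_to_triples` and `is_absolute` are hypothetical helpers, blank nodes and literal values are ignored, and "absolute" is approximated as "starts with a URI scheme".

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Rough approximation of "has a URI scheme" (an assumption, not a full IRI check).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute(iri):
    """Return True if the string starts with something that looks like a scheme."""
    return bool(_SCHEME.match(iri))


def node_to_triples(node):
    """Emit (subject, predicate, object) tuples for one expanded node.

    Anything that is not an absolute IRI is silently skipped, mimicking
    the non-"safe" behavior discussed in this thread.
    """
    subject = node.get("@id", "")
    if not is_absolute(subject):
        return []  # a relative subject means nothing is emitted for this node
    triples = []
    for type_iri in node.get("@type", []):
        if is_absolute(type_iri):
            triples.append((subject, RDF_TYPE, type_iri))
    for prop, values in node.items():
        if prop.startswith("@") or not is_absolute(prop):
            continue
        for value in values:
            obj = value.get("@id", "")  # only node references are handled here
            if is_absolute(obj):
                triples.append((subject, prop, obj))
    return triples


service = {
    "@id": "did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds",
    "@type": ["AtprotoPersonalDataServer"],
    "https://www.w3.org/ns/did#serviceEndpoint": [
        {"@id": "https://enoki.us-east.host.bsky.network"}
    ],
}
for triple in node_to_triples(service):
    print(triple)
```

With the absolute @id, only the serviceEndpoint statement comes out and the type is dropped; with the original "#atproto_pds" @id, the sketch emits nothing for the node at all.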

The question that remains: are they invalid JSON-LD as well? If not, should they be? If yes, what is to be done when processing them?

@davidlehn
Contributor

I think you've described the correct behavior of everything. It's perhaps not as intuitive as it should be. The relative IRIs are not errors normally, and they will get through expansion, but when going to n-quads, will be dropped. If you want to see those types of issues on the JSON-LD Playground, go to the options tab and enable "safe" mode. Or use the canonized tab. In this case, it will report relative @id reference safe mode errors for the #atproto_pds id and the AtprotoPersonalDataServer type.

I'm not sure how that id is used, but as you note, it should probably be prefixed with the top level id as did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds. That's how DID documents do things and how the verificationMethod in the same data above does it. It does look like that type should be defined in a context somewhere with a full URL.

jsonld.js (used on the playground) added the "safe mode" to be more strict about acceptable JSON-LD. It raises errors in many places where the algorithms would otherwise drop data or suggest warnings. This is important when the data is canonized and digitally signed. In such cases, dropped data (and other issues) cause serious issues. It's not the default on the playground due to the official processing not having that behavior. One day hopefully the playground can make it more clear how to use that feature. "safe mode" itself needs to be written up (probably by me) and the community can refine it so it's more generally available in tooling.
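To make the idea concrete, here is a rough Python sketch of the kind of check a strict mode could perform on expanded JSON-LD. It is not how jsonld.js implements "safe mode"; `find_relative_refs` is a hypothetical helper, and the scheme test is only an approximation of "is an absolute IRI".

```python
import re

# Rough approximation of "has a URI scheme" (an assumption, not a full IRI check).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_absolute(value):
    """Treat blank node identifiers and scheme-prefixed strings as acceptable."""
    return value.startswith("_:") or bool(_SCHEME.match(value))


def find_relative_refs(node, path="$"):
    """Walk expanded JSON-LD and collect relative @id / @type values.

    Returns a list of (location, value) pairs instead of dropping data.
    """
    problems = []
    if isinstance(node, list):
        for index, item in enumerate(node):
            problems += find_relative_refs(item, f"{path}[{index}]")
    elif isinstance(node, dict):
        for key, value in node.items():
            if key == "@id":
                if isinstance(value, str) and not is_absolute(value):
                    problems.append((f"{path}.@id", value))
            elif key == "@type":
                types = value if isinstance(value, list) else [value]
                for type_value in types:
                    if isinstance(type_value, str) and not is_absolute(type_value):
                        problems.append((f"{path}.@type", type_value))
            else:
                problems += find_relative_refs(value, f"{path}.{key}")
    return problems


service = {
    "@id": "#atproto_pds",
    "@type": ["AtprotoPersonalDataServer"],
    "https://www.w3.org/ns/did#serviceEndpoint": [
        {"@id": "https://enoki.us-east.host.bsky.network"}
    ],
}
for location, value in find_relative_refs(service):
    print(f"relative reference at {location}: {value!r}")
```

A caller that signs or canonizes data could raise an error whenever this list is non-empty, rather than letting the statements disappear.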

@niklasl

niklasl commented Jan 29, 2025

Shouldn't the @type value here be expanded against the base IRI (commonly the document URL), which is used if @vocab is absent, per step 8 of the IRI Expansion algorithm? (This is what the playground does if a custom base URL is specified.)

@w3cbot

w3cbot commented Feb 26, 2025

This was discussed during the #json-ld meeting on 26 February 2025.

View the transcript

w3c/json-ld-syntax#446

After discussion, this does not seem to be a bug, suggest closing.


@pchampin
Contributor

@trwnh thanks for this detailed description.

As @niklasl points out, there is nothing wrong (in theory) with the 2nd version of the ATProto example (expanded against an empty context). The value of @type is a valid (relative) IRI reference, which can then be resolved against the base IRI. And if you manually set a base IRI in the JSON-LD playground's options, and convert this example to n-quads (after removing the comments), you see the rdf:type appearing.

In the absence of a base URI (which is the playground's default), those relative IRI references can't be resolved, and are therefore silently dropped when converting to RDF. The fact that it is silent is indeed an issue, and there are plans to fix this in the future. Those plans are already implemented in the playground: in the options, select "Safe: true", and now you get the error you were expecting.

@trwnh
Author

trwnh commented Feb 27, 2025

I'm still slightly confused about what the correct behavior should be... This might be a playground bug, but putting a base IRI of "custom URL" into the options tab produces this RDF statement:

<did:plc:ewvi7nxzyoun6zhxrhs64oiz#atproto_pds> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <did:AtprotoPersonalDataServer> .

Note that did:AtprotoPersonalDataServer is incorrect. More discussion in json-ld/json-ld.org#849 (comment) as well. The only way to get it to produce a "correct" statement is to insert a # into the type, which is not what you want to do:

"type": "#AtprotoPersonalDataServer"

I say "correct" because I'm not sure it actually is correct. I don't know whether it should be a # or a / or a /# or something else. Given a base IRI of did:plc:ewvi7nxzyoun6zhxrhs64oiz and a type of AtprotoPersonalDataServer... well, I could see arguments for any of these:

<did:plc:ewvi7nxzyoun6zhxrhs64oizAtprotoPersonalDataServer>
<did:plc:ewvi7nxzyoun6zhxrhs64oiz#AtprotoPersonalDataServer>
<did:plc:ewvi7nxzyoun6zhxrhs64oiz/AtprotoPersonalDataServer>
<did:plc:ewvi7nxzyoun6zhxrhs64oiz/#AtprotoPersonalDataServer>

Barring the first one which is probably wrong, that is. At least per discussion on json-ld/json-ld.org#849 there is a pointer to RFC 3986 saying exactly this:

If the base URI has a defined authority component and an empty path, then return a string consisting of "/" concatenated with the reference's path; otherwise,

Which I interpret as did:plc:ewvi7nxzyoun6zhxrhs64oiz/AtprotoPersonalDataServer being correct, maybe? In any case, it shouldn't be did:AtprotoPersonalDataServer. But I also recognize that this issue might not be the right venue for that -- it started in json-ld/json-ld.org#849 and might end up being filed against jsonld.js if the bug originates there instead.
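For anyone who wants to check the cases by hand, below is a minimal Python sketch of the reference-resolution steps in RFC 3986, section 5.2 (parsing regex from Appendix B, "merge" from 5.2.3, "remove_dot_segments" from 5.2.4). The function names are illustrative, and this is a reading of the RFC rather than the code jsonld.js runs. Note that under this reading the base did:plc:ewvi7nxzyoun6zhxrhs64oiz has no authority component (there is no "//"), so the quoted rule would not apply and the "otherwise" branch of the merge step would be taken.

```python
import re

# URI-splitting regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(ref):
    """Return (scheme, authority, path, query, fragment); undefined parts are None."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986, section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            cut = path.find("/", 1)
            cut = len(path) if cut == -1 else cut
            output.append(path[:cut])
            path = path[cut:]
    return "".join(output)


def merge(base_authority, base_path, ref_path):
    """RFC 3986, section 5.2.3."""
    if base_authority is not None and base_path == "":
        return "/" + ref_path
    # Keep everything up to and including the last "/" of the base path
    # (nothing at all if the base path contains no "/").
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base, ref):
    """RFC 3986, section 5.2.2 (strict mode), followed by recomposition (5.3)."""
    b_scheme, b_auth, b_path, b_query, _ = split_uri(base)
    scheme, auth, path, query, fragment = split_uri(ref)
    if scheme is not None:
        path = remove_dot_segments(path)
    else:
        scheme = b_scheme
        if auth is not None:
            path = remove_dot_segments(path)
        else:
            auth = b_auth
            if path == "":
                path = b_path
                query = b_query if query is None else query
            elif path.startswith("/"):
                path = remove_dot_segments(path)
            else:
                path = remove_dot_segments(merge(b_auth, b_path, path))
    result = scheme + ":" if scheme is not None else ""
    result += "//" + auth if auth is not None else ""
    result += path
    result += "?" + query if query is not None else ""
    result += "#" + fragment if fragment is not None else ""
    return result


base = "did:plc:ewvi7nxzyoun6zhxrhs64oiz"
print(resolve(base, "AtprotoPersonalDataServer"))
print(resolve(base, "#AtprotoPersonalDataServer"))
print(resolve("http://example.com", "AtprotoPersonalDataServer"))
```

In this sketch the DID base path is "plc:ewvi7nxzyoun6zhxrhs64oiz", which contains no "/", so the merge step discards it entirely; only the fragment-only reference keeps the full DID.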

For this issue, I would say I'd consider it resolved by "safe mode" in part, and in part by fixing the expansion.


To answer my own questions from the original issue (and as a quick sanity check):

is it OK for these "non-IRI types" to exist?

These "non-IRI types" are actually relative references

are they invalid JSON-LD as well? If not, should they be? If yes, what is to be done when processing them?

They are not invalid JSON-LD but they are invalid RDF unless a base IRI is provided for expansion. The "non-IRI type" will attempt to expand against @vocab, then if not present it will attempt to expand against @base, then if not present it will attempt to expand against the base IRI provided to the JSON-LD processor as a parameter.
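The fallback order in that last paragraph can be sketched in a few lines of Python. This is a deliberate simplification of IRI expansion: term definitions, compact IRIs, keywords, and a relative or null @base are all ignored, `expand_type` and its parameter names are hypothetical, and `urllib.parse.urljoin` is used for resolution, which only resolves relative references for schemes it knows (such as https), not for did: bases.

```python
from urllib.parse import urljoin


def expand_type(value, vocab=None, context_base=None, processor_base=None):
    """Expand a bare @type string using the fallback order described above.

    vocab          -- the @vocab mapping of the active context, if any
    context_base   -- an absolute @base set in the context, if any
    processor_base -- the base IRI passed to the processor as an option, if any
    """
    if vocab is not None:
        # @vocab wins: plain string concatenation.
        return vocab + value
    base = context_base if context_base is not None else processor_base
    if base is not None:
        # Otherwise the value is a relative reference resolved against the base.
        return urljoin(base, value)
    # No vocabulary and no base: the value stays relative.
    return value


print(expand_type("AtprotoPersonalDataServer"))
print(expand_type("AtprotoPersonalDataServer", vocab="https://example.org/ns#"))
print(expand_type("AtprotoPersonalDataServer", processor_base="https://example.org/doc"))
```

The example.org IRIs are placeholders. With neither a vocabulary nor a base, the string comes back unchanged, which is the case that later gets dropped during RDF conversion.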

@gkellogg
Member

This issue might be better discussed on Stack Overflow, where more people might be able to help with basic usage; it does not represent a problem with the specifications, themselves.
