Reconsider Processing Levels #213

Closed
BigBlueHat opened this issue Aug 15, 2019 · 7 comments

Comments

@BigBlueHat
Member

Extracting JSON-LD from HTML (in the API spec) is the one distinct feature between the two defined Processor Levels: "full" vs. "pure JSON."

Given the Link header solution proposed for #204 and the way it solves for #172, we should reconsider whether extracting JSON-LD from HTML is the role of a JSON-LD processor or rather the role of something run prior to JSON-LD processing (i.e. a data block extractor).

The steps for embedding in HTML and referencing specific JSON-LD data blocks could remain normative, inasmuch as they are expressions of HTML data blocks and id-based fragment identifiers.
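
As an illustration of the "data block extractor" idea above, here is a minimal sketch of a step that runs before any JSON-LD processing. It uses only Python's standard library; the names `JsonLdBlockCollector` and `extract_json_ld` are hypothetical and not taken from any spec or implementation. It assumes data blocks are `script` elements typed `application/ld+json`, and that a fragment identifier selects a block by its `id`.

```python
import json
from html.parser import HTMLParser


class JsonLdBlockCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # (id or None, raw text) pairs, in document order
        self._current = None  # (id, [text chunks]) while inside a matching script

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Compare the media type only, ignoring parameters such as ";profile=...".
        media_type = (attrs.get("type") or "").split(";")[0].strip().lower()
        if tag == "script" and media_type == "application/ld+json":
            self._current = (attrs.get("id"), [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            block_id, chunks = self._current
            self.blocks.append((block_id, "".join(chunks)))
            self._current = None


def extract_json_ld(html, fragment=None):
    """Return parsed JSON-LD found in an HTML string.

    With a fragment, return the single data block whose id matches it;
    otherwise return a list of all data blocks in document order.
    """
    collector = JsonLdBlockCollector()
    collector.feed(html)
    collector.close()
    if fragment is not None:
        for block_id, text in collector.blocks:
            if block_id == fragment:
                return json.loads(text)
        raise KeyError(f"no JSON-LD data block with id {fragment!r}")
    return [json.loads(text) for _, text in collector.blocks]
```

The output of `extract_json_ld` is plain parsed JSON, so it could be handed to any JSON-LD processor without that processor knowing anything about HTML. How multiple blocks should be combined is left open here; returning a list is just one possible choice.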

@gkellogg
Member

Encapsulating the behavior in the Document Loader seems like a clean separation of concerns to me. Rather than running prior to a JSON-LD processor, this effectively allows a processor to delegate the details to an implementation of the Document Loader which can handle this.
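
A rough sketch of that delegation, assuming a loader that receives a fetch function and a table of per-media-type handlers. Everything here is illustrative: `make_document_loader`, `Fetch`, `Handler`, and the handler table are hypothetical, and `RemoteDocument` is only loosely modeled on the structure of the same name in the API spec.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

# fetch(url) -> (content_type, body_text); injected so the sketch needs no network.
Fetch = Callable[[str], Tuple[str, str]]
# handler(body_text, fragment) -> parsed JSON-LD
Handler = Callable[[str, Optional[str]], Any]


@dataclass
class RemoteDocument:
    document_url: str
    content_type: str
    document: Any


def make_document_loader(fetch: Fetch, handlers: Dict[str, Handler]):
    """Build a loader that hands non-JSON media types to pluggable handlers."""

    def load_document(url: str) -> RemoteDocument:
        # Split off the fragment so a handler can use it to pick a data block.
        base, _, fragment = url.partition("#")
        content_type, body = fetch(base)
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "application/json" or media_type.endswith("+json"):
            document = json.loads(body)
        elif media_type in handlers:
            # The processor never sees HTML; the handler returns parsed JSON.
            document = handlers[media_type](body, fragment or None)
        else:
            raise ValueError(f"no handler registered for {media_type}")
        return RemoteDocument(base, media_type, document)

    return load_document
```

A caller would register an HTML handler only where HTML support is wanted, for example `make_document_loader(fetch, {"text/html": my_html_extractor})`, while an embedded deployment could pass an empty table and stay pure-JSON.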

@azaroth42
Contributor

Is the proposal to normatively require that functionality of any conformant document loader, or to retain the processing levels just to move how the functionality is implemented?

@azaroth42
Contributor

I support:

  • A cleaner separation of the HTML normative requirements in the current documentation, as precursor work to be able to later extract it into another spec, when we have a second implemented container format. This is "simply" editorial work, albeit quite a lot of shuffling of content around.
  • Moving the HTML extraction requirements to be explicitly part of the document loader specs, and to call them out as features of a particular class of document loader. In doing that we keep the separation, and we enable implementations to signal their support for the functionality or not without an in-your-face conformance requirement at the beginning of the spec.
  • With the above two bullets, the removal of the processor classes from the spec.
  • The people making the call for change doing a good chunk of the work ;)

I would object to:

  • Normatively requiring all document loader implementations to support extraction from HTML (as it would mean we would never get to TR, as we would never have multiple independent implementations)
  • Not specifying extraction at all (as it provides no guidance to implementers as to what to do, nor to publishers as to what can be expected to work ... the point of interoperability and standards, after all)
  • Extracting the details to a new spec (as we would not meet our chartered deadlines, given process and just the time to go back and check that the extraction was clean)

@iherman
Member

iherman commented Aug 16, 2019

This issue was discussed in a meeting.

  • No actions or resolutions
Benjamin Young: See Syntax issue #213
Benjamin Young: This issue discusses multiple processing levels.
… If the link header approach solves the use case for linking to a JSON-LD context in HTML, then we probably don’t need a full processor level for HTML processing.
Ivan Herman: If I remember well (we’re talking about schema.org), JSON-LD in HTML came up in a discussion where website producers had difficulties producing microdata/RDFa when they had different data in databases, and wanted an easier way to dump data from their databases as JSON.
… If we don’t say anything about JSON-LD in HTML, then how can we trust any JSON-LD processor out there that this JSON-LD in HTML will be used?
… The only way at the moment is to put it in a different file.
… Also, the reason why they did it back then is because that’s the easiest way to produce JSON-LD data.
Rob Sanderson: I agree. We have normatively defined how JSON-LD is expressed in HTML. So we’ve opened the door to this. This requires all processors to handle HTML. These levels allow some processors not to handle HTML.
… I’m fine with putting it in the docloader spec.
… Main question is: can we have a conforming processor that cannot handle HTML?
Benjamin Young: These data blocks have been used for a long time before JSON-LD. Anything can be placed in there. This is part of the HTML spec.
… (part around data blocks in HTML5 spec)
… A piece of software that exists now, which extracts data from data blocks and forwards it to any processor, can also be used to handle extracting JSON-LD from HTML.
Benjamin Young: See HTML data blocks
Dave Longley: I feel like talking about HTML in a JSON-LD processor is conflating formats. We need a cleaner separation of concerns.
… HTML is not the only format in which JSON-LD can be included.
Benjamin Young: +1
Dave Longley: It is a mistake to define these different levels. Instead, we should see it as plugging in a JSON-LD processor in another piece of software.
Gregg Kellogg: There is no standard way to handle these data blocks. What happens when you have a doc with multiple JSON-LD docs? And combined with RDFa?
Rob Sanderson: +1 to gkellogg
Gregg Kellogg: We have to define these things normatively.
… JSON-LD is not just about getting LD from JSON, it is more. We also tackle issues regarding link headers etc.
Ivan Herman: +1 to gkellogg
Dave Longley: “JSON-LD in HTML” almost feels like a separate spec to me … and we have a mechanism to hook this up to a JSON-LD processor – a document loader; we could define other extensions this way … it provides a clear pattern.
Benjamin Young: supports clarifying multiple blocks, data sets, etc.
Benjamin Young: +1 to dlongley
Ivan Herman: +1 to gkellogg
Dave Longley: +1 that we should specify how to do it – but we need a cleaner architecture and repeatable extension pattern
Rob Sanderson: If we have normative recs about JSON-LD in HTML, and have two processor classes, then we need different processor levels.
Dave Longley: -1 to making this about processor classes
Rob Sanderson: dlongley - the issue is explicitly about processor classes?
Dave Longley: azaroth: yes, it is … and i think people are confused about my position/benjamin’s … we aren’t arguing against defining how to do JSON-LD in HTML
Dave Longley: azaroth: it’s about processor classes and the architecture.
Benjamin Young: The point isn’t that we shouldn’t normatively describe this. I would like a spec on graphs in HTML.
… We should however not put the extraction concern of extracting JSON-LD from HTML in this spec.
Ruben Taelman: .. There should be a separate thing in front of this.
Dave Longley: +1 to benjamin, this does not scale and is a bad architecture choice.
Ivan Herman: If I am a user, I put data into HTML as JSON-LD. How do I make sure that it is understood?
… Do I have to write a separate processor? I have to know how, otherwise it is useless.
… For example, RDFa processors do a basic entailment, based on flags. Processors can only do things based on these flags. If I know that someone’s processor supports flags, then I can use a specific processor based on these capabilities. We need something like this, otherwise this is useless to users.
Dave Longley: Processor classes won’t solve ivan’s problem.
… If you want to process JSON-LD from HTML, you have to look at that separate spec. Different document loaders can support these things. This is a better architecture, regarding separation of concerns.
Benjamin Young: +1 to dlongley’s summation
Gregg Kellogg: The concern people have is that to be a full processor, you have to process HTML. Defining these as capabilities may be better.
… I would support extracting HTML bits from our current spec to something else.
Dave Longley: we don’t have to split this up now
Dave Longley: let’s make sure we CAN split it up later.
Dave Longley: we can have the right architecture now and split later.
Rob Sanderson: I agree with dlongley and gkellogg.
Benjamin Young: +1
Rob Sanderson: It would be cleaner in a different spec. But we don’t have time to split into a different spec. Maybe something for JSON-LD 2.0.
Dave Longley: we’re not talking about a ton of changes.
Ivan Herman: If we do this, we won’t adhere to our timetable.
Dave Longley: no, no no… not saying that.
Dave Longley: ok, next call :)
Benjamin Young: Let’s take this to next call, as we will need it.

@ajs6f
Member

ajs6f commented Aug 19, 2019

enable implementations to signal their support for the functionality or not

Would this be by some means in the API, some characterization?

@gkellogg
Member

Version announcement would seem to require some new API interface intended to return information about the service, including whether it supports JSON-LD embedded in HTML. Frankly, I can't imagine that anyone would rely on this rather than just reading the documentation of the appropriate implementation. We could consider some specific error mode to raise when HTML support would be required, but cannot be handled by the implementation, as an after-the-fact announcement mechanism.

The intention of removing this was to make it easier for implementations to deal with HTML before the fact, even though the spec describes it as a function of the document loader. In the case of something like jsonld.js, I would think that it can be used for extracting embedded JSON-LD if put together with an HTML parsing package, which might be suitable in some scenarios, without burdening the implementation to carry this code when used in an embedded environment, where sources are always using pure-JSON.
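
To make the "specific error mode" idea concrete, here is a minimal sketch of a pure-JSON loader that fails loudly when it is handed HTML. The exception name `HtmlExtractionUnsupported` and the function `load_json_only` are hypothetical; this does not reproduce an error code defined by the spec.

```python
import json


class HtmlExtractionUnsupported(Exception):
    """Hypothetical error: the document is HTML, but this loader cannot extract from it."""


def load_json_only(content_type, body):
    """Parse a JSON document, refusing HTML instead of silently mis-parsing it."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        raise HtmlExtractionUnsupported(
            f"{media_type} requires HTML script extraction, "
            "which this loader does not provide"
        )
    return json.loads(body)
```

A caller that catches this error learns after the fact that HTML support is missing, which is the announcement mechanism described above. It is no substitute for the implementation's documentation.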

gkellogg added a commit to w3c/json-ld-api that referenced this issue Aug 19, 2019
…ntLoader algorithm to contain HTML-related processing. Also, removes processor levels describing this as a processor supporting "HTML script extraction".

For w3c/json-ld-syntax#213.
gkellogg added a commit that referenced this issue Aug 19, 2019
…ipt extraction" feature that processors may implement. Also, removes processor levels describing this as a processor supporting "HTML script extraction".

For #213.
@iherman
Member

iherman commented Aug 23, 2019

This issue was discussed in a meeting.

  • RESOLVED: gkellogg to merge #135 and #214 after reviewers have approved and close the relevant issues
Encapsulate HTML processing
Rob Sanderson: See Syntax #214
Rob Sanderson: See API #135
Rob Sanderson: Discussion from last week has resulted in some PRs.
Ivan Herman: Gregg not here this week.
Pierre-Antoine Champin: dlongley: PRs are moving in a direction I would agree with
Rob Sanderson: I would agree as well, pushing things into the document loader as discussed last week.
… I guess the issue to discuss is – is there anyone who is not comfortable yet otherwise we should accept those PRs.
Pierre-Antoine Champin: scribeassist: pchampin
Rob Sanderson: Any objections to the approach?
Ivan Herman: I read through the documents and we have done the work.
Pierre-Antoine Champin: dlongley: I would like to wait for other reviews before merging (including mine)
Proposed resolution: gkellogg to merge #135 and #214 after reviewers have approved (Rob Sanderson)
Proposed resolution: gkellogg to merge #135 and #214 after reviewers have approved and close the relevant issues (Rob Sanderson)
Dave Longley: +1
Ruben Taelman: +1
Rob Sanderson: +1
Benjamin Young: +1
Pierre-Antoine Champin: +1
Ivan Herman: +1
David I. Lehn: +1
Resolution #2: gkellogg to merge #135 and #214 after reviewers have approved and close the relevant issues

gkellogg added a commit that referenced this issue Aug 26, 2019
…ipt extraction" feature that processors may implement. Also, removes processor levels describing this as a processor supporting "HTML script extraction".

For #213.
gkellogg added a commit to w3c/json-ld-api that referenced this issue Aug 26, 2019
…ntLoader algorithm to contain HTML-related processing. Also, removes processor levels describing this as a processor supporting "HTML script extraction".

For w3c/json-ld-syntax#213.