Skip to content

Implement just-in-time context resolution. #342

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 9, 2019

Conversation

dlongley
Copy link
Member

@dlongley dlongley commented Dec 7, 2019

This PR removes the preprocessing step for resolving contexts and replaces its functionality with JIT context resolution. This should fix problems with mutating JSON literal content. This feature also allows document loaders to return an additional tag property that jsonld.js can use to more efficiently cache and reuse already processed contexts.

  • Add optional tag feature processing to returned RemoteDocuments.
    A tag will be understood to mean that the same context document
    needn't be processed twice. A special tag value of static is
    interpreted to mean that a context does not even need to be retrieved
    more than once.
  • Enables greater reuse of already processed contexts and quicker
    discovery (via static tag) of already processed contexts for
    a given context URL (instead of requiring the context content
    itself to be seached for and found in a cache, only its URL
    needs to be found).
  • Addresses Do not process @context in JSON literals #339.

- Add optional `tag` feature processing to returned RemoteDocuments.
  A `tag` will be understood to mean that the same context document
  needn't be processed twice. A special `tag` value of `static` is
  interpreted to mean that a context does not even need to be retrieved
  more than once.
- Enables greater reuse of already processed contexts and quicker
  discovery (via `static` tag) of already processed contexts for
  a given context URL (instead of requiring the context content
  itself to be seached for and found in a cache, only its URL
  needs to be found).
- Addresses #339.
Copy link
Collaborator

@gkellogg gkellogg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, LGTM, aside from some bits I don't quite understand without walking through it with an example.

Presumably, base is handled properly per recent tests.

const resolved = await options.contextResolver.resolve({
context: localCtx,
documentLoader: options.documentLoader,
base: options.base
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the actual document location, or potential value of @base?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's ever @base (could be wrong). It is either a document location (if the API was invoked with a URL) or the base passed via the API options. The tests pass.

if(!resolved) {
// not resolved yet, resolve
resolved = await this._resolveRemoteContext(
{url: ctx, documentLoader, base, cycles});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have the affect of serializing all remote context fetches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_resolveRemoteContext will fetch all remote contexts (recursively) and return an array of resolved contexts. Note that this does not include any scoped contexts, i.e., it doesn't deeply inspect the resolved contexts for those, but it will resolve any relative URLs encountered in scoped contexts to ensure the base URL used is proper. Any scoped contexts will be resolved later, JIT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by this you meant "the await" -- then, yes, an array of context URLs will be loaded serially now. A future PR can add more complexity here to parallelize that.

let remoteDoc;

try {
remoteDoc = await documentLoader(url);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we're going to need to pass in API arguments to the documentLoader for options such as extractAllScripts, but this can be handled in a different PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I didn't want to do anything with that in this PR but it will need to be addressed in a subsequent PR.

@dlongley
Copy link
Member Author

dlongley commented Dec 7, 2019

@gkellogg,

Presumably, base is handled properly per recent tests.

The tests pass -- including the URL resolution follows RFC3986 ones.

@@ -90,6 +91,11 @@ const wrapper = function(jsonld) {
/** Registered RDF dataset parsers hashed by content-type. */
const _rdfParsers = {};

// resolved context cache
// TODO: consider basing max on context size rather than number
const RESOLVED_CONTEXT_CACHE_MAX_SIZE = 100;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this adjustable? Should there be some API so it can be adjustable or a custom cache could be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not adjustable in the previous version, so there's no change here. We can make it so in a future PR.

// resolve, cache, and return context
const resolved = await this.resolve(
{context, documentLoader, base, cycles});
this._cacheResolvedContext({key: url, resolved, tag: remoteDoc.tag});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some docs on the tag feature. Is 'static' special?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static is a special value, yes. We can document in a separate PR.

@davidlehn davidlehn merged commit 185ff74 into master Dec 9, 2019
@davidlehn davidlehn deleted the jit-context-processing branch December 9, 2019 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants