Consistent RDF permalinks with content negotiation #146
Comments
Q: Should |
It is possible to add the same version of a workflow twice or more to CWLViewer, by having multiple branch names referencing the same commit ID or by adding a version directly via the commit ID. In the case of the standard
I think it could make sense to always redirect to the full commitId version - I intended the One thing that could happen later is that the master Now someone else comes back again at https://w3id.org/cwl/v/git/deadbeef/wf.cwl to find out about that If we present the permalink in the UI as well as in the RDF - according to Identifiers 21st century lesson 8 - then people might have put that link into their papers and so on, so we should keep supporting it. As we have to do that anyway, why not always go to the commit?
One challenge is that "Explore" will be more polluted by older commits if we "never forget". It might make sense to optimize this somehow to prune public listing of commits that are no longer on a branch (if it was important it should have been!). We would still only keep around commit ROs etc that have been previously visited (and hence more noteworthy than an in-between commit) so it shouldn't be too explosive. |
I changed prefix from See wiki page https://github.com/common-workflow-language/cwlviewer/wiki/Permalinks |
We should use consistent permalinks in URIs across our RDF to identify a workflow or a workflow file.
Currently (v1.1) we have:
- `http://sparql:3030/cwlviewer/github.com/genome/cancer-genomics-workflow/blob/be7e682c6a2d0b24b949e022aeae7786bd8434ed/strelka/workflow.cwl` that exposes the origin of the git repository, its commit and file path
- `file:///data/git/1a2b5d62cde8555e5932907b28189585a2bf99d2/fp_filter/workflow.cwl` that exposes the working directory for the git clone
- `.ro/annotations/workflow.ttl` annotations contain URIs like `https://github.com/raw/common-workflow-language/workflows/master/workflows/make-to-cwl/dna.cwl#main`
I propose we replace all of those (possibly with search-replace on the `cwltool --print-rdf` output) to use a single location-free URI like:
`https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl`
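The search-replace idea could look roughly like the following sketch. It assumes the working directory layout seen in the `file:///data/git/{commit}/...` example above, where the directory name is the full commit id; the function name `to_permalinks` is hypothetical and not part of CWL Viewer or cwltool.

```python
import re

# Assumption: clones live under /data/git/<full sha1>/ as in the example URI above.
FILE_URI = re.compile(r"file:///data/git/([0-9a-f]{40})/")

def to_permalinks(rdf_text):
    """Rewrite local file:// URIs in serialized RDF to location-free permalinks."""
    return FILE_URI.sub(r"https://w3id.org/cwl/view/git/\1/", rdf_text)

if __name__ == "__main__":
    line = "<file:///data/git/1a2b5d62cde8555e5932907b28189585a2bf99d2/fp_filter/workflow.cwl#main>"
    print(to_permalinks(line))
```

A plain text substitution like this works on any serialization, but it would also rewrite matching strings inside literals, so a real implementation might prefer to rewrite URIs at the RDF graph level.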
Permalink URI scheme
The new URI scheme is composed like this:
- `https://w3id.org/cwl/view/` - fixed prefix at permalink service https://w3id.org/ (`/cwl` is our namespace)
- `{scm}` - source code management protocol, currently only `git` supported
- `{commit}` - full git commit sha1 id (no branches or short commits allowed)
- `{path}` - relative path to `.cwl` file within a checkout of that git commit
- `#{anchor}` - an optional anchor, e.g. `#main` as-is from `cwltool --print-rdf`; not passed on to server
Anyone can construct a URI according to the above scheme for a given git commit and file - even if the commit only exists on a local disk or in a private git repository that the CWL Viewer does not know about.
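As a minimal sketch of composing such a URI, the scheme above can be expressed as a small function. The name `build_permalink` is hypothetical, and rejecting absolute paths and `..` segments is an assumption about how the edge cases might be handled, not something this proposal specifies.

```python
import re
from urllib.parse import quote

PREFIX = "https://w3id.org/cwl/view/"
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")  # no branches or short commits

def build_permalink(commit, path, anchor=None, scm="git"):
    """Compose {prefix}{scm}/{commit}/{path}[#{anchor}] per the scheme above."""
    if scm != "git":
        raise ValueError("only git is currently supported")
    if not FULL_SHA1.fullmatch(commit):
        raise ValueError("commit must be a full 40-character sha1 id")
    # Assumption: paths must stay inside the checkout of that commit.
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError("path must be relative and stay inside the checkout")
    uri = PREFIX + scm + "/" + commit + "/" + quote(path)
    if anchor:
        # The anchor is kept client-side; it is not passed on to the server.
        uri += "#" + anchor.lstrip("#")
    return uri

if __name__ == "__main__":
    print(build_permalink(
        "933bf2a1a1cce32d88f88f136275535da9df0954",
        "workflows/lobSTR/lobSTR-workflow.cwl",
        anchor="#main",
    ))
```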
These make good Linked Data identifiers for specific CWL workflow definitions because:
- `cwl` file and its neighbors can't change within the git commit
Anyone generating the URIs should be aware of some edge cases:
- `../../outside.cwl` )
Resolving
Resolving any URI starting with `https://w3id.org/cwl/view/git/{rest}` will HTTP 302 redirect to the corresponding resource `https://view.commonwl.org/git/{rest}` representing that path in that commit.
Unknown commit?
If the public CWL Viewer has never heard about the commit `933bf2a1a1cce32d88f88f136275535da9df0954`, there is not much more to say.
Content-negotiation
But if it is known and CWL Viewer finds a matching graph for that file in that commit, then the client can content-negotiate to get various RDF serializations like `text/turtle` or `application/ld+json`.
Notice how the returned RDF uses the location-independent `w3id.org` namespace, not `view.commonwl.org`.
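From the client side, the redirect rule and the `Accept` header can be sketched as below. Both function names are hypothetical, and the request object is only constructed, not sent, so nothing here depends on the service being deployed.

```python
from urllib.request import Request

W3ID_PREFIX = "https://w3id.org/cwl/view/git/"
VIEWER_PREFIX = "https://view.commonwl.org/git/"

def redirect_target(permalink):
    """Where the proposed 302 redirect would point: same {rest}, different host."""
    if not permalink.startswith(W3ID_PREFIX):
        raise ValueError("not a CWL Viewer git permalink")
    return VIEWER_PREFIX + permalink[len(W3ID_PREFIX):]

def rdf_request(permalink, media_type="text/turtle"):
    """Build (but do not send) a request asking for an RDF serialization."""
    # The #anchor is never sent to the server, so drop it from the request URL.
    url, _, _anchor = permalink.partition("#")
    return Request(url, headers={"Accept": media_type})

if __name__ == "__main__":
    link = W3ID_PREFIX + "933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl"
    print(redirect_target(link))
    req = rdf_request(link + "#main", "application/ld+json")
    print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would follow the redirect automatically; the workflow inside the returned graph is then looked up by the original w3id.org URI including its anchor.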
YAML
If the client asks for the CWL file with type `application/x-yaml` or `application/octet-stream`, and the git repository has a public "raw" option, then the server can redirect to that.
HTML and JSON API
If the user asks for `text/html`, it is probably a browser, so CWL Viewer will redirect to the normal workflow rendering.
This also works for `application/json`, which then gives the JSON API output.
Images
OK, let's be cool and do images as well.
Research Object Bundle
...and of course our Research Object Bundle if the client asks for `application/ro+zip` or `application/zip`.
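Taken together, the media types listed in the sections above amount to a dispatch table. The sketch below is one way a server might pick a representation from an `Accept` header; the function name and the representation labels are hypothetical, image types are left out because this proposal does not name any, and wildcards such as `*/*` are deliberately ignored to keep it short.

```python
# Media types named in this proposal, mapped to hypothetical representation labels.
REPRESENTATIONS = {
    "text/turtle": "rdf",
    "application/ld+json": "rdf",
    "application/x-yaml": "raw-cwl",
    "application/octet-stream": "raw-cwl",
    "text/html": "html",
    "application/json": "json-api",
    "application/ro+zip": "ro-bundle",
    "application/zip": "ro-bundle",
}

def pick_representation(accept_header):
    """Return (media_type, label) for the best supported type, or None."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        media_type, _, params = item.strip().partition(";")
        media_type = media_type.strip().lower()
        quality = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if media_type in REPRESENTATIONS and quality > 0:
            # Highest q wins; ties go to the type listed first.
            candidates.append((-quality, position, media_type))
    if not candidates:
        return None
    best = min(candidates)[2]
    return best, REPRESENTATIONS[best]

if __name__ == "__main__":
    print(pick_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(pick_representation("application/json;q=0.5, application/ld+json"))
```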
Packed workflows
If there's a packed CWL file with nested workflows, then a workflow is not matchable by its filename alone, as you also need to know the `#{anchor}`. This is not a problem for the RDF output, as it will contain all workflows found in the packed CWL file, and you just match by `#anchor`.
However, it can be a problem for the HTML and JSON rendering, which with #103 would have alternative URIs depending on the selected nested workflow. So it could be confusing to redirect to the top-level workflow (if that can even be determined) as the user won't find their `#nested1/step/nestedstep2` in there; we don't expand nested workflows in the UI.
So if the user asks for `text/html` or `application/json` for a packed workflow (multiple workflows found), then we'll give an error, with links to the candidates using #103 escaped URIs.
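The packed-workflow rule can be sketched as follows. Both function names and the outcome labels are hypothetical; the error status code and the #103 escaped URI format are not defined here, so the sketch returns only the candidate anchors.

```python
from urllib.parse import urldefrag

def split_anchor(permalink):
    """Client side: separate the document URI from the optional #anchor."""
    url, fragment = urldefrag(permalink)
    return url, (fragment or None)

def html_or_json_outcome(workflow_ids):
    """Server side: decide what to do for text/html or application/json.

    workflow_ids are the workflow anchors found in the requested file.
    """
    found = sorted(set(workflow_ids))
    if len(found) == 1:
        return ("redirect", found)   # unambiguous: go to the normal rendering
    if len(found) > 1:
        return ("error", found)      # packed file: list candidates instead
    return ("unknown", found)

if __name__ == "__main__":
    print(split_anchor(
        "https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl#main"
    ))
    print(html_or_json_outcome(["#main", "#nested1"]))
```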