Description
We should use consistent permalinks in URIs across our RDF to identify a workflow or a workflow file.
Currently (v1.1) we have:
- SPARQL uses graph URIs like
http://sparql:3030/cwlviewer/github.com/genome/cancer-genomics-workflow/blob/be7e682c6a2d0b24b949e022aeae7786bd8434ed/strelka/workflow.cwl
that expose the origin of the git repository, its commit and the file path.
- Statements within such graphs contain URIs like
file:///data/git/1a2b5d62cde8555e5932907b28189585a2bf99d2/fp_filter/workflow.cwl
that expose the working directory of the git clone.
- The research object's
.ro/annotations/workflow.ttl
annotation contains URIs like https://github.com/raw/common-workflow-language/workflows/master/workflows/make-to-cwl/dna.cwl#main
I propose we replace all of those (possibly with search-replace on the cwltool --print-rdf
output) with a single location-free URI like: https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl
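The search-replace idea could be sketched roughly as below. This is only an illustration, assuming the RDF is available as text and that the caller already knows both the checkout URI prefix and the full commit id; the function name `rewrite_checkout_uris` and its parameters are hypothetical, not part of CWL Viewer or cwltool.

```python
def rewrite_checkout_uris(rdf_text: str, checkout_uri: str, commit: str) -> str:
    """Replace a local checkout prefix with the location-free permalink prefix.

    checkout_uri is the file:// URI of the git working directory (assumed to be
    known by the caller); commit is the full git commit id of that checkout.
    """
    if not checkout_uri.endswith("/"):
        checkout_uri += "/"
    permalink_prefix = f"https://w3id.org/cwl/view/git/{commit}/"
    # Plain textual replacement, as suggested in the proposal; anchors such as
    # "#main" follow the path and are therefore preserved unchanged.
    return rdf_text.replace(checkout_uri, permalink_prefix)
```

A textual replace is the simplest option; a real implementation might prefer to rewrite URIs through an RDF library so that literals containing the same string are left alone.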
Permalink URI scheme
The new URI scheme is composed like this:
https://w3id.org/cwl/view/{scm}/{commit}/{path}#{anchor}
- https://w3id.org/cwl/view/ - fixed prefix at the permalink service https://w3id.org/ (/cwl is our namespace)
- {scm} - source code management protocol, currently only git is supported
- {commit} - full git commit SHA-1 id (no branches or short commits allowed)
- {path} - relative path to the .cwl file within a checkout of that git commit
- #{anchor} - an optional anchor, e.g. #main, as-is from cwltool --print-rdf; not passed on to the server
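As a minimal sketch of how a client might compose such a URI, the helper below follows the template above. The name `build_permalink` is hypothetical, and two details are assumptions not stated in the proposal: the commit id is expected in lowercase hex, and the path is percent-encoded with `urllib.parse.quote` (slashes kept).

```python
import re
from urllib.parse import quote

PREFIX = "https://w3id.org/cwl/view/"
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")  # assumption: lowercase hex only


def build_permalink(commit, path, anchor=None, scm="git"):
    """Compose {prefix}{scm}/{commit}/{path}#{anchor} per the scheme above."""
    if scm != "git":
        raise ValueError("only git is currently supported")
    if not FULL_SHA1.fullmatch(commit):
        raise ValueError("commit must be a full 40-character SHA-1 id")
    if path.startswith("/"):
        raise ValueError("path must be relative to the checkout")
    uri = f"{PREFIX}{scm}/{commit}/{quote(path)}"
    if anchor:
        # Accept both "main" and "#main"
        uri += "#" + anchor.lstrip("#")
    return uri
```

For example, `build_permalink("933bf2a1a1cce32d88f88f136275535da9df0954", "workflows/lobSTR/lobSTR-workflow.cwl", "#main")` yields the lobSTR permalink with the `#main` anchor appended.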
Anyone can construct a URI according to the above scheme for a given git commit and file - even if the commit only exists on a local disk or in a private git repository that the CWL Viewer does not know about.
These make good Linked Data identifiers for specific CWL workflow definitions because:
- The cwl file and its neighbors can't change within the git commit
- The URI is the same wherever the git repository is pushed or hosted
Anyone generating the URIs should be aware of some edge cases:
- The CWL file has uncommitted changes
- The CWL file is within a git submodule, which could follow a movable branch (without any commits appearing in the master git repository)
- The CWL file is not tracked in the git repository (e.g. ../../outside.cwl)
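The last edge case can be caught before a URI is generated. The check below is a rough sketch using only path arithmetic; the function name `path_stays_in_checkout` is hypothetical. It does not detect uncommitted changes or submodules, which would require inspecting the git repository itself.

```python
import posixpath


def path_stays_in_checkout(relative_path: str) -> bool:
    """Return False for paths that are absolute or escape the checkout root."""
    if relative_path.startswith("/"):
        return False
    # normpath collapses "a/../b" and leaves leading ".." segments in place
    normalized = posixpath.normpath(relative_path)
    return normalized != ".." and not normalized.startswith("../")
```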
Resolving
Resolving any URI starting with https://w3id.org/cwl/view/git/{rest}
results in an HTTP 302 redirect to the corresponding resource https://view.commonwl.org/git/{rest}
representing that path in that commit:
GET https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
HTTP/1.1 302 Found
Location: https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl
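The redirect rule amounts to a prefix swap. A small sketch of that mapping, with a hypothetical function name `redirect_target`, could look like this; the anchor is dropped first because fragments are never sent to the server.

```python
from urllib.parse import urldefrag

W3ID_PREFIX = "https://w3id.org/cwl/view/git/"
VIEWER_PREFIX = "https://view.commonwl.org/git/"


def redirect_target(permalink):
    """Return the Location a 302 would point to, or None if not a permalink."""
    url, _fragment = urldefrag(permalink)  # "#main" etc. stays client-side
    if not url.startswith(W3ID_PREFIX):
        return None
    return VIEWER_PREFIX + url[len(W3ID_PREFIX):]
```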
Unknown commit?
If the public CWL Viewer has never heard of the commit 933bf2a1a1cce32d88f88f136275535da9df0954
there is not much more to say:
HEAD https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
HTTP/1.1 404 Not Found
Unknown git commit `933bf2a1a1cce32d88f88f136275535da9df0954`
Content-negotiation
But if the commit is known and CWL Viewer finds a matching graph for that file in that commit, the client can content-negotiate to get various RDF serializations like text/turtle
or application/ld+json
:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: text/turtle
HTTP/1.1 200 OK
Vary: Accept
Content-Type: text/turtle
@prefix cwl: <https://w3id.org/cwl/cwl#>.
<https://w3id.org/cwl/view/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl#main> a cwl:Workflow .
# ....
Notice how the returned RDF uses the location-independent w3id.org
namespace, not view.commonwl.org.
YAML
If the client asks for the CWL file with type application/x-yaml
or application/octet-stream
, and the git repository has a public "raw" option, then the server can redirect to that:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/x-yaml
HTTP/1.1 302 Found
Vary: Accept
Location: https://cdn.rawgit.com/common-workflow-language/workflows/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl
GET https://cdn.rawgit.com/common-workflow-language/workflows/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/x-yaml
HTTP/1.1 200 OK
Content-Type: application/octet-stream
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow
inputs:
...
HTML and JSON API
If the user asks for text/html
, it is probably a browser. So CWL Viewer will redirect to the normal workflow rendering:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: text/html
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/workflows/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
This also works for application/json
, which then gives the JSON API output:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/json
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/workflows/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
GET https://view.commonwl.org/workflows/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/json
HTTP/1.1 200 OK
Vary: Accept
Content-Type: application/json
{
"retrievedFrom": {
"owner": "common-workflow-language",
"repoName": "workflows",
"branch": "master",
"path": "workflows/lobSTR/lobSTR-workflow.cwl",
"url": "https://github.com/common-workflow-language/workflows/tree/master/workflows/lobSTR/lobSTR-workflow.cwl"
},
"retrievedOn": 1499175275743,
"lastCommit": "920c6be45f08e979e715a0018f22c532b024074f",
"label": "lobSTR-workflow.cwl",
...
}
Images
OK, let's be cool and do images as well.
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: image/svg+xml
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/graph/svg/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
Research Object Bundle
...and of course our Research Object Bundle, if the client asks for application/ro+zip
or application/zip
:
GET https://view.commonwl.org/git/933bf2a1a1cce32d88f88f136275535da9df0954/workflows/lobSTR/lobSTR-workflow.cwl HTTP/1.1
Accept: application/ro+zip
HTTP/1.1 302 Found
Vary: Accept
Location: https://view.commonwl.org/robundle/github.com/common-workflow-language/workflows/blob/lobstr-v1/workflows/lobSTR/lobSTR-workflow.cwl
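Taken together, the negotiation described in the sections above can be summarized as a lookup from media type to target. The sketch below is one possible way to express that, not the CWL Viewer implementation: `workflows`, `graph/svg` and `robundle` are the path segments seen in the redirects above, while the labels `rdf` and `raw` and both function names are made up for illustration. Wildcards such as `*/*` are deliberately ignored to keep it short.

```python
ROUTES = {
    "text/turtle": "rdf",                # served directly (hypothetical label)
    "application/ld+json": "rdf",
    "application/x-yaml": "raw",         # redirect to public raw file (hypothetical label)
    "application/octet-stream": "raw",
    "text/html": "workflows",
    "application/json": "workflows",
    "image/svg+xml": "graph/svg",
    "application/ro+zip": "robundle",
    "application/zip": "robundle",
}


def parse_accept(header):
    """Return media types ordered by descending q-value (ties keep header order)."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((q, media_type))
    entries.sort(key=lambda entry: -entry[0])  # list.sort is stable
    return [media_type for q, media_type in entries if q > 0]


def choose_route(accept):
    """Pick the first supported media type from an Accept header, else None."""
    for media_type in parse_accept(accept):
        if media_type in ROUTES:
            return ROUTES[media_type]
    return None
```

Whatever the implementation, each response should carry `Vary: Accept` as in the examples, since the same URL yields different results per media type.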
Packed workflows
If there's a packed CWL file with nested workflows, then a workflow is not matchable by its filename alone, as you also need to know the #{anchor}
. This is not a problem for the RDF output, as it will contain all workflows found in the packed CWL file, and you just match by #anchor
.
However, it can be a problem for the HTML and JSON rendering, which with #103 would have alternative URIs depending on the selected nested workflow. So it could be confusing to redirect to the top-level workflow (if that can even be determined), as the user won't find their #nested1/step/nestedstep2 in there; we don't expand nested workflows in the UI.
So if the user asks for text/html
or application/json
for a packed workflow (multiple workflows found), then we'll give an error, with links to the candidates using #103 escaped URIs:
GET https://view.commonwl.org/git/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl HTTP/1.1
Accept: text/html
HTTP/1.1 300 Multiple Choices
Vary: Accept
Content-Type: text/uri-list
https://view.commonwl.org/workflows/example.com/blob/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl%23main
https://view.commonwl.org/workflows/example.com/blob/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl%23nested1
https://view.commonwl.org/workflows/example.com/blob/adc83b19e793491b1c6ea0fd8b46cd9f32e592fc/packed.cwl%23nested2
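The body of such a 300 response could be produced along these lines. This is a sketch only: the function name `multiple_choices_body` is hypothetical, the anchors are assumed to have been extracted from the packed file already, and the `%23` escaping simply mirrors the example above rather than any final #103 design.

```python
from urllib.parse import quote


def multiple_choices_body(base_url, anchors):
    """Build a text/uri-list body with one escaped candidate URI per anchor."""
    lines = [
        # safe="" makes quote() escape "#" as %23 (and "/" in nested anchors)
        base_url + quote("#" + anchor.lstrip("#"), safe="")
        for anchor in anchors
    ]
    # text/uri-list terminates each line with CRLF
    return "\r\n".join(lines) + "\r\n"
```

Calling it with the `packed.cwl` URL from the example and the anchors `main`, `nested1` and `nested2` gives the three candidate lines shown above.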