Commit aec5bb5

Merge remote-tracking branch 'upstream/master' into unicode_literal_usage
2 parents 9f964a8 + e9c6a38 commit aec5bb5

166 files changed, with 4292 additions and 39 deletions

README.rst

Lines changed: 206 additions & 0 deletions
@@ -139,6 +139,212 @@ The easiest way to use cwltool to run a tool or workflow from Python is to use a
 
   # result["out"] == "foo"
 
+Leveraging SoftwareRequirements (Beta)
+--------------------------------------
+
+CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
+may in turn use to resolve to packages in various package managers or
+dependency management systems such as `Environment Modules
+<http://modules.sourceforge.net/>`__.
+
+Utilizing ``SoftwareRequirement`` hints using cwltool requires an optional
+dependency; for this reason, be sure to specify the ``deps`` modifier when
+installing cwltool. For instance::
+
+  $ pip install 'cwltool[deps]'
+
+Installing cwltool in this fashion enables several new command line options.
+The most general of these options is ``--beta-dependency-resolvers-configuration``.
+This option allows one to specify a dependency resolvers configuration file.
+This file may be specified as either XML or YAML and simply describes the
+plugins to enable to "resolve" ``SoftwareRequirement`` dependencies.
+
+To discuss some of these plugins and how to configure them, first consider the
+following ``hint`` definition for an example CWL tool.
+
+.. code:: yaml
+
+  SoftwareRequirement:
+    packages:
+    - package: seqtk
+      version:
+      - r93
+
+Now imagine deploying cwltool on a cluster with Software Modules installed
+and that a ``seqtk`` module is available at version ``r93``. This means cluster
+users likely won't have the ``seqtk`` binary on their ``PATH`` by default, but after
+sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
+available on the ``PATH``. A simple dependency resolvers configuration file, called
+``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
+the correct module environment before executing the above tool would simply be:
+
+.. code:: yaml
+
+  - type: module
+
+The outer list indicates that one plugin is being enabled; the plugin parameters are
+defined as a dictionary for this one list item. There is only one required parameter
+for the plugin above: ``type``, which defines the plugin type. This parameter
+is required for all plugins. The available plugins and the parameters
+available for each are documented (incompletely) `here
+<https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
+Unfortunately, this documentation is in the context of Galaxy tool ``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly.
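The structure described above (an outer list of plugin entries, each a dictionary with a required ``type``) can be sketched as a small check. This is only an illustration under stated assumptions: ``validate_resolvers_conf`` is a hypothetical helper, not part of cwltool or galaxy-lib, and the input is the plain Python data that a YAML parser would be expected to produce for the file above.

```python
# Hypothetical helper, not part of cwltool or galaxy-lib: checks only the
# shape described in the README text (a list of dictionaries with "type").
def validate_resolvers_conf(conf):
    """Return the plugin types in order, raising ValueError on malformed input."""
    if not isinstance(conf, list):
        raise ValueError("configuration must be a list of plugin entries")
    plugin_types = []
    for index, entry in enumerate(conf):
        if not isinstance(entry, dict):
            raise ValueError("entry %d must be a dictionary" % index)
        if "type" not in entry:
            raise ValueError("entry %d lacks the required 'type' parameter" % index)
        plugin_types.append(entry["type"])
    return plugin_types


# The parsed form of the one-line YAML file "- type: module".
module_conf = [{"type": "module"}]
plugin_types = validate_resolvers_conf(module_conf)
```

Any further per-plugin parameters (such as ``base_path``) would sit alongside ``type`` in the same dictionary; the sketch deliberately does not validate them.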
+
+cwltool is distributed with an example of such a seqtk tool and a corresponding
+sample job. It could be executed from the cwltool root using a dependency resolvers
+configuration file such as the above one using the command::
+
+  cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
+      tests/seqtk_seq.cwl \
+      tests/seqtk_seq_job.json
+
+This example demonstrates both that cwltool can leverage
+existing software installations and also handle workflows with dependencies
+on different versions of the same software and libraries. However, the above
+example does require an existing module setup, so it is impossible to test this example
+"out of the box" with cwltool. For a more isolated test that demonstrates all
+the same concepts, the resolver plugin type ``galaxy_packages`` can be used.
+
+"Galaxy packages" are a lighter weight alternative to Environment Modules that are
+really just defined by a way to lay out directories into packages and versions
+to find little scripts that are sourced to modify the environment. They have
+been used for years in the Galaxy community to adapt Galaxy tools to cluster
+environments but require neither knowledge of Galaxy nor any special tools to
+set up. These should work just fine for CWL tools.
+
+The cwltool source code repository's test directory is set up with a very simple
+directory that defines a set of "Galaxy packages" (but really just defines one
+package named ``random-lines``). The directory layout is simply::
+
+  tests/test_deps_env/
+    random-lines/
+      1.0/
+        env.sh
+
+If the ``galaxy_packages`` plugin is enabled and pointed at the
+``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
+such as the following is encountered:
+
+.. code:: yaml
+
+  hints:
+    SoftwareRequirement:
+      packages:
+      - package: 'random-lines'
+        version:
+        - '1.0'
+
+Then cwltool will simply find that ``env.sh`` file and source it before executing
+the corresponding tool. That ``env.sh`` script is only responsible for modifying
+the job's ``PATH`` to add the required binaries.
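The lookup described above can be sketched in a few lines. This is an illustration of the directory convention only, not the galaxy-lib implementation: ``find_env_script`` is a hypothetical name, and the demo builds a throwaway directory tree rather than relying on ``tests/test_deps_env`` being present.

```python
import os
import tempfile


def find_env_script(base_path, package, version):
    """Return <base_path>/<package>/<version>/env.sh if it exists, else None.

    Hypothetical helper illustrating the "Galaxy packages" layout.
    """
    candidate = os.path.join(base_path, package, version, "env.sh")
    return candidate if os.path.isfile(candidate) else None


# Build a temporary layout mirroring tests/test_deps_env for the demo.
demo_root = tempfile.mkdtemp()
package_dir = os.path.join(demo_root, "random-lines", "1.0")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "env.sh"), "w") as handle:
    # Placeholder content; the real env.sh adjusts PATH for the tool.
    handle.write('export PATH="$PATH:/hypothetical/bin"\n')

found = find_env_script(demo_root, "random-lines", "1.0")
missing = find_env_script(demo_root, "random-lines", "2.0")
```

A resolver following this convention would source the script at ``found`` in the job's shell before running the tool, and would fall through to the next configured plugin when the lookup returns ``None``.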
+
+This is a full example that works since resolving "Galaxy packages" has no
+external requirements. Try it out by executing the following command from cwltool's
+root directory::
+
+  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
+      tests/random_lines.cwl \
+      tests/random_lines_job.json
+
+The resolvers configuration file in the above example was simply:
+
+.. code:: yaml
+
+  - type: galaxy_packages
+    base_path: ./tests/test_deps_env
+
+It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not
+match the module names for a given cluster. Such requirements can be re-mapped
+to specific deployed packages and/or versions using another file specified using
+the resolver plugin parameter ``mapping_files``. We will
+demonstrate this using ``galaxy_packages`` but the concepts apply equally well
+to Environment Modules or Conda packages (described below) for instance.
+
+So consider the resolvers configuration file
+(``tests/test_deps_env_resolvers_conf_rewrite.yml``):
+
+.. code:: yaml
+
+  - type: galaxy_packages
+    base_path: ./tests/test_deps_env
+    mapping_files: ./tests/test_deps_mapping.yml
+
+And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):
+
+.. code:: yaml
+
+  - from:
+      name: randomLines
+      version: 1.0.0-rc1
+    to:
+      name: random-lines
+      version: '1.0'
+
+This says that if cwltool encounters a requirement of ``randomLines`` at version
+``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as ``random-lines`` at
+version ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl``
+that contains such a source ``SoftwareRequirement``. To try out this example with
+mapping, execute the following command from the cwltool root directory::
+
+  cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
+      tests/random_lines_mapping.cwl \
+      tests/random_lines_job.json
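The rewrite rule expressed by the mapping file can be sketched as follows. This is a minimal illustration, assuming the mapping file has already been parsed into a list of ``from``/``to`` dictionaries; ``apply_mappings`` is a hypothetical function and the matching rules in galaxy-lib may differ in detail.

```python
def apply_mappings(name, version, mappings):
    """Return the (name, version) to resolve after applying the first matching rule.

    Hypothetical sketch: a rule matches when the names are equal and, if the
    rule names a source version, the versions are equal as strings.
    """
    for rule in mappings:
        source = rule["from"]
        if source["name"] != name:
            continue
        if "version" in source and str(source["version"]) != str(version):
            continue
        target = rule["to"]
        return target.get("name", name), target.get("version", version)
    # No rule matched: resolve the requirement as written in the tool.
    return name, version


# Parsed form of tests/test_deps_mapping.yml as shown in the README text.
mappings = [
    {
        "from": {"name": "randomLines", "version": "1.0.0-rc1"},
        "to": {"name": "random-lines", "version": "1.0"},
    }
]

rewritten = apply_mappings("randomLines", "1.0.0-rc1", mappings)
untouched = apply_mappings("seqtk", "r93", mappings)
```

Here ``rewritten`` is the package the ``galaxy_packages`` plugin would then look up under ``base_path``, while requirements with no matching rule pass through unchanged.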
+
+The previous examples demonstrated leveraging existing infrastructure to
+provide requirements for CWL tools. If instead a real package manager is used,
+cwltool has the opportunity to install requirements as needed. While initial
+support for Homebrew/Linuxbrew plugins is available, the most developed such
+plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
+of allowing multiple versions of a package to be installed simultaneously,
+not requiring elevated permissions to install Conda itself or packages using
+Conda, and being cross platform. For these reasons, cwltool may run as a normal
+user, install its own Conda environment and manage multiple versions of Conda packages
+on both Linux and Mac OS X.
+
+The Conda plugin can be endlessly configured, but a sensible set of defaults
+that has proven a powerful stack for dependency management within the Galaxy tool
+development ecosystem can be enabled by simply passing cwltool the
+``--beta-conda-dependencies`` flag.
+
+With this we can use the seqtk example above without Docker and without
+any externally managed services; cwltool should install everything it needs
+and create an environment for the tool. Try it out with the following command::
+
+  cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
+
+The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
+that allow disambiguation of package names. If the mapping files described above
+allow deployers to adapt tools to their infrastructure, this mechanism allows
+tools to adapt their requirements to multiple package managers. To demonstrate
+this within the context of seqtk, we can simply break the package name we
+use and then specify a specific Conda package as follows:
+
+.. code:: yaml
+
+  hints:
+    SoftwareRequirement:
+      packages:
+      - package: seqtk_seq
+        version:
+        - '1.2'
+        specs:
+        - https://anaconda.org/bioconda/seqtk
+        - https://packages.debian.org/sid/seqtk
+
+The example can be executed using the command::
+
+  cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
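One way a resolver could use the ``specs`` URIs to recover the Conda package despite the broken ``package`` name is sketched below. This is an assumption-laden illustration, not the galaxy-lib logic: ``conda_target_from_specs`` is a hypothetical function, and it assumes that anaconda.org URIs take the form ``https://anaconda.org/<channel>/<package>`` as in the example above.

```python
ANACONDA_PREFIX = "https://anaconda.org/"


def conda_target_from_specs(package, specs):
    """Return (channel, package_name) from the first anaconda.org spec.

    Hypothetical sketch: falls back to (None, package) when no spec
    points at anaconda.org, so the declared package name is used as-is.
    """
    for uri in specs or []:
        if uri.startswith(ANACONDA_PREFIX):
            parts = uri[len(ANACONDA_PREFIX):].strip("/").split("/")
            if len(parts) >= 2:
                return parts[0], parts[1]
    return None, package


specs = [
    "https://anaconda.org/bioconda/seqtk",
    "https://packages.debian.org/sid/seqtk",
]
resolved = conda_target_from_specs("seqtk_seq", specs)
fallback = conda_target_from_specs("seqtk_seq", [])
```

The Debian URI is ignored by this Conda-oriented sketch; a resolver for another package manager would pick out its own URI from the same list in the same way.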
+
+The plugin framework for managing resolution of these software requirements
+is maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__, a small, portable subset of the Galaxy
+project. More information on configuration and implementation can be found
+at the following links:
+
+- `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
+- `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
+- `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
+- `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
+- `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__
 
 Cwltool control flow
 --------------------

cwltool/builder.py

Lines changed: 11 additions & 1 deletion
@@ -49,14 +49,24 @@ def __init__(self): # type: () -> None
         self.pathmapper = None # type: PathMapper
         self.stagedir = None # type: Text
         self.make_fs_access = None # type: Type[StdFsAccess]
-        self.build_job_script = None # type: Callable[[List[str]], str]
         self.debug = False # type: bool
         self.mutation_manager = None # type: MutationManager
 
         # One of "no_listing", "shallow_listing", "deep_listing"
         # Will be default "no_listing" for CWL v1.1
         self.loadListing = "deep_listing" # type: Union[None, str]
 
+        self.find_default_container = None # type: Callable[[], Text]
+        self.job_script_provider = None # type: Any
+
+    def build_job_script(self, commands):
+        # type: (List[str]) -> Text
+        build_job_script_method = getattr(self.job_script_provider, "build_job_script", None) # type: Callable[[Builder, List[str]], Text]
+        if build_job_script_method:
+            return build_job_script_method(self, commands)
+        else:
+            return None
+
     def bind_input(self, schema, datum, lead_pos=None, tail_pos=None):
         # type: (Dict[Text, Any], Any, Union[int, List[int]], List[int]) -> List[Dict[Text, Any]]
         if tail_pos is None:
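The delegation introduced in this hunk can be illustrated with a stand-in. This is a sketch under stated assumptions: the ``Builder`` class below reproduces only the attribute and method shown in the diff, and ``EnvSourcingProvider`` is a hypothetical provider invented for illustration (the diff only requires that a provider expose ``build_job_script(builder, commands)``).

```python
from shlex import quote


class Builder(object):
    """Stand-in carrying only the pieces shown in the hunk above."""

    def __init__(self):
        self.job_script_provider = None

    def build_job_script(self, commands):
        # Delegate to the provider if it offers build_job_script, else None.
        method = getattr(self.job_script_provider, "build_job_script", None)
        if method:
            return method(self, commands)
        return None


class EnvSourcingProvider(object):
    """Hypothetical provider: source an environment script, then run the command."""

    def __init__(self, env_script):
        self.env_script = env_script

    def build_job_script(self, builder, commands):
        command_line = " ".join(quote(part) for part in commands)
        return "#!/bin/bash\n. %s\n%s\n" % (quote(self.env_script), command_line)


builder = Builder()
without_provider = builder.build_job_script(["seqtk", "seq"])

builder.job_script_provider = EnvSourcingProvider("/hypothetical/env.sh")
script = builder.build_job_script(["seqtk", "seq", "in.fa"])
```

Returning ``None`` when no provider is set lets the caller fall back to launching the command directly, which matches how ``_job_popen`` treats an empty ``job_script_contents`` later in this commit.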

cwltool/draft2tool.py

Lines changed: 10 additions & 0 deletions
@@ -176,9 +176,19 @@ class CommandLineTool(Process):
     def __init__(self, toolpath_object, **kwargs):
         # type: (Dict[Text, Any], **Any) -> None
         super(CommandLineTool, self).__init__(toolpath_object, **kwargs)
+        self.find_default_container = kwargs.get("find_default_container", None)
 
     def makeJobRunner(self, use_container=True): # type: (Optional[bool]) -> JobBase
         dockerReq, _ = self.get_requirement("DockerRequirement")
+        if not dockerReq and use_container:
+            default_container = self.find_default_container(self)
+            if default_container:
+                self.requirements.insert(0, {
+                    "class": "DockerRequirement",
+                    "dockerPull": default_container
+                })
+                dockerReq = self.requirements[0]
+
         if dockerReq and use_container:
             return DockerCommandLineJob()
         else:
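The fallback added in this hunk can be sketched in isolation. This is a hedged illustration, not the cwltool code: ``pick_docker_requirement`` is a hypothetical function, the callable here takes no arguments (as in the type comment in ``builder.py``, whereas the hunk above passes the tool instance), and the ``None`` guard is an addition of this sketch, since ``kwargs.get`` in the hunk may leave the callable unset.

```python
def pick_docker_requirement(requirements, use_container, find_default_container=None):
    """Return the DockerRequirement to use, synthesizing one from a default image.

    Hypothetical sketch of the fallback: only when containers are requested
    and the tool declares no DockerRequirement is the default consulted.
    """
    docker_req = next(
        (req for req in requirements if req.get("class") == "DockerRequirement"),
        None,
    )
    if docker_req is None and use_container and find_default_container is not None:
        image = find_default_container()
        if image:
            docker_req = {"class": "DockerRequirement", "dockerPull": image}
            # Mirror the diff: the synthesized requirement goes first in the list.
            requirements.insert(0, docker_req)
    return docker_req


requirements = []
chosen = pick_docker_requirement(requirements, True, lambda: "debian:stable")
no_default = pick_docker_requirement([], True)
```

Inserting the synthesized requirement at the front of the list means later lookups for ``DockerRequirement`` see it exactly as if the tool author had declared it.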

cwltool/job.py

Lines changed: 19 additions & 15 deletions
@@ -35,6 +35,7 @@
 
 PYTHON_RUN_SCRIPT = """
 import json
+import os
 import sys
 import subprocess
 
@@ -43,6 +44,7 @@
 commands = popen_description["commands"]
 cwd = popen_description["cwd"]
 env = popen_description["env"]
+env["PATH"] = os.environ.get("PATH")
 stdin_path = popen_description["stdin_path"]
 stdout_path = popen_description["stdout_path"]
 stderr_path = popen_description["stderr_path"]
@@ -69,7 +71,7 @@
 if sp.stdin:
     sp.stdin.close()
 rcode = sp.wait()
-if isinstance(stdin, file):
+if stdin is not subprocess.PIPE:
     stdin.close()
 if stdout is not sys.stderr:
     stdout.close()
@@ -147,7 +149,6 @@ def _setup(self): # type: () -> None
         _logger.debug(u"[job %s] initial work dir %s", self.name,
                       json.dumps({p: self.generatemapper.mapper(p) for p in self.generatemapper.files()}, indent=4))
 
-
     def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
         # type: (List[Text], MutableMapping[Text, Text], bool, Text) -> None
 
@@ -191,15 +192,19 @@ def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"):
                     os.makedirs(dn)
                 stdout_path = absout
 
-            build_job_script = self.builder.build_job_script # type: Callable[[List[str]], str]
+            commands = [Text(x).encode('utf-8') for x in runtime + self.command_line]
+            job_script_contents = None # type: Text
+            builder = getattr(self, "builder", None) # type: Builder
+            if builder is not None:
+                job_script_contents = builder.build_job_script(commands)
             rcode = _job_popen(
-                [Text(x).encode('utf-8') for x in runtime + self.command_line],
+                commands,
                 stdin_path=stdin_path,
                 stdout_path=stdout_path,
                 stderr_path=stderr_path,
                 env=env,
                 cwd=self.outdir,
-                build_job_script=build_job_script,
+                job_script_contents=job_script_contents,
             )
 
             if self.successCodes and rcode in self.successCodes:
@@ -330,8 +335,12 @@ def run(self, pull_image=True, rm_container=True,
             env = cast(MutableMapping[Text, Text], os.environ)
             if docker_req and kwargs.get("use_container") is not False:
                 img_id = docker.get_from_requirements(docker_req, True, pull_image)
-            elif kwargs.get("default_container", None) is not None:
-                img_id = kwargs.get("default_container")
+            if img_id is None:
+                find_default_container = self.builder.find_default_container
+                default_container = find_default_container and find_default_container()
+                if default_container:
+                    img_id = default_container
+                    env = cast(MutableMapping[Text, Text], os.environ)
 
             if docker_req and img_id is None and kwargs.get("use_container"):
                 raise Exception("Docker image not available")
@@ -398,14 +407,9 @@ def _job_popen(
         env, # type: Union[MutableMapping[Text, Text], MutableMapping[str, str]]
         cwd, # type: Text
         job_dir=None, # type: Text
-        build_job_script=None, # type: Callable[[List[str]], str]
+        job_script_contents=None, # type: Text
 ):
     # type: (...) -> int
-
-    job_script_contents = None
-    if build_job_script:
-        job_script_contents = build_job_script(commands)
-
     if not job_script_contents and not FORCE_SHELLED_POPEN:
 
         stdin = None # type: Union[IO[Any], int]
@@ -485,8 +489,8 @@ def _job_popen(
                 ["bash", job_script.encode("utf-8")],
                 shell=False,
                 cwd=job_dir,
-                stdout=subprocess.PIPE,
-                stderr=subprocess.PIPE,
+                stdout=sys.stderr, # The nested script will output the paths to the correct files if they need
+                stderr=sys.stderr, # to be captured. Else just write everything to stderr (same as above).
                 stdin=subprocess.PIPE,
             )
             if sp.stdin:
