write up how to reference a local script #158

Closed
dshepelev15 opened this issue Aug 8, 2019 · 14 comments · Fixed by #299
Comments

@dshepelev15

Can someone explain how to use InitialWorkDirRequirement without any input parameters?
For instance, I need to put my shell scripts next to my CWL script and run them from it. Is there a way to avoid using absolute paths to the scripts? I tried using an environment variable like this:

baseCommand: sh
arguments: [$SCRIPT_FOLDER/script.sh]

But it is not working.

Then I found the InitialWorkDirRequirement. I read that with it I can put my scripts inside the CWL script and use relative paths to the shell scripts, but I didn't find any examples of how to use it without input parameters.

Any ideas?

@mr-c
Member

mr-c commented Aug 8, 2019

Hello @dshepelev15

I recommend one of the following:

  • Add the folder your scripts are in to the PATH environment variable. Then there is no need for InitialWorkDirRequirement and you can refer to the script name directly: baseCommand: scriptname.sh. Later you can put your script in a Docker format software container when you want to share your work.

  • Add an input of type: File that is the script itself:

cwlVersion: v1.0
class: CommandLineTool

inputs:
  my_script:
    type: File
    inputBinding:
      position: 0

  # other inputs go here

baseCommand: sh

outputs: []
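
As a rough sketch of the second option, the job file that supplies the script could be generated from Python. The helper name write_job_file and the file names job.json and script.sh are assumptions made for illustration; only the my_script input name comes from the tool description above. A relative path in the job file is expected to be resolved against the job file's own location, so the script can sit next to it without an absolute path.

```python
import json
import os
import tempfile


def write_job_file(job_path, script_name):
    """Write a CWL input object that passes script_name as the my_script File input.

    write_job_file is a hypothetical helper, not part of cwltool.
    """
    job = {
        "my_script": {
            "class": "File",
            # Relative path: assumed to sit in the same directory as the job file.
            "path": script_name,
        }
    }
    with open(job_path, "w") as handle:
        json.dump(job, handle, indent=4)
    return job


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        job_path = os.path.join(workdir, "job.json")
        write_job_file(job_path, "script.sh")
        with open(job_path) as handle:
            print(handle.read())
```

The resulting job.json would then be passed to the runner together with the tool description, in the same way as any other inputs file.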

@mr-c changed the title from "InitialWorkDir syntax without input parameters" to "write up how to reference a local script" on Aug 8, 2019
@dshepelev15
Author

dshepelev15 commented Aug 8, 2019

No, I can't refer to the scripts that way, because I run the CWL script programmatically from a different folder and it doesn't find the related scripts.
For instance, I get: script.sh not found
Maybe there is an example of using InitialWorkDirRequirement, or of using environment variables (via --preserve-environment), for this? Or are there other ways?

@mr-c
Member

mr-c commented Aug 8, 2019

In CWL we have to be explicit about everything, so either the script needs to be

  1. available on the system PATH (for cwltool doing local execution, no --preserve-environment needed)

or

  2. part of the Docker format software container (either a fixed path or again on the system PATH defined in the container)

or

  3. it needs to be an input to the CommandLineTool description itself (as shown in my previous comment)

InitialWorkDirRequirement could be used to arrange the input data alongside one or more scripts (themselves also inputs) if they expect to be in the same folder; is that what you're asking for?

@dshepelev15
Author

  1. I can't set the system PATH variable before running, because I run multiple CWL scripts in parallel and each one has its own directory of shell scripts.
  2. I don't use Docker for the scripts now, but for the future, what would you recommend for keeping my shell scripts next to the CWL script?
  3. I don't need to use input parameters for the scripts.

I'm asking for a way to run a shell script from a CWL script without using its absolute path.

@mr-c
Member

mr-c commented Aug 9, 2019

The PATH environment variable doesn't need to be set system wide. Just append the extra directory path to it prior to calling cwltool. If you can link to the code where you are running CWL I can provide more advice.

@dshepelev15
Author

dshepelev15 commented Aug 10, 2019

For instance, I have the following scripts:

runner.cwl

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: sh
arguments: [line_counter.sh]
stdout: result.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
  input_file:
    type: string
    label: file
    inputBinding:
      position: 2
outputs:
  result:
    type: stdout

line_counter.sh

echo $1: `cat $2 | wc -l`

JSON file for the input parameters:
inputs.json

{
    "message": "Something",
    "input_file": "path_to_file_is_here"
}

I also have a Python script that executes it:
execution_script.py

from cwltool.factory import Factory, RuntimeContext
from cwltool.main import main

def run_script():
    cwl_script_path = 'your_folder/runner.cwl'
    inputs_path = 'your_folder/inputs.json'

    res = main([
            cwl_script_path,
            inputs_path
        ]
    )

if __name__ == '__main__':
    run_script()

And then I get the following logs after running python3 execution_script.py:

execution_script.py 1.0.20190228155703
Resolved '/my_path/run_count_lines.cwl' to 'file:///my_path/runner.cwl'
[job runner.cwl] /private/tmp/docker_tmpr_sazwyg$ sh \
    line_counter.sh \
    My_message \
    my_path/test.txt > /private/tmp/docker_tmpr_sazwyg/result.txt
sh: line_counter.sh: No such file or directory
Could not collect memory usage, job ended before monitoring began.
[job runner.cwl] completed permanentFail
{
    "result": {
        "location": "file:///my_path/result.txt",
        "basename": "result.txt",
        "class": "File",
        "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "size": 0,
        "path": "my_path/result.txt"
    }
}
Final process status is permanentFail

@mr-c
Member

mr-c commented Aug 11, 2019

@dshepelev15 A couple comments

Paths to input files must always be of type: File

so here is a revised inputs.json

{
    "message": "Something",
    "input_file": {
        "class": "File",
        "path": "README.md"
    }
}

We'll also need to add #!/bin/bash to the top of the script, so here is the updated line_counter.sh:

#!/bin/bash
echo $1: `cat $2 | wc -l`

Since we will be dynamically adding the directory where our CWL description is located to the PATH we should move the script name into the baseCommand; here's the updated runner.cwl:

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: line_counter.sh
stdout: result.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
  input_file:
    type: File
    label: file
    inputBinding:
      position: 2
outputs:
  result:
    type: stdout

Finally, here is the updated execution_script.py:

import os

from cwltool.main import main

def run_script():
    cwl_script_path = '/home/michael/cwltool/runner.cwl'
    inputs_path = '/home/michael/cwltool/inputs.json'

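    original_path = os.environ["PATH"]

    # Prepend the directory that holds runner.cwl and line_counter.sh to PATH
    os.environ["PATH"] = "{}:{}".format(
        os.path.dirname(cwl_script_path),
        os.environ["PATH"])

    try:
        res = main([
                cwl_script_path,
                inputs_path
            ]
        )
    finally:
        # Restore PATH in place; rebinding os.environ to a plain dict copy
        # would stop later changes from reaching the real process environment.
        os.environ["PATH"] = original_path  # not needed here, but better to be safe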

if __name__ == '__main__':
    run_script()

Here is what I get when I run this:

$ chmod a+x line_counter.sh
$ python execution_script.py 
INFO execution_script.py 1.0.20190808141559
INFO Resolved '/home/michael/cwltool/runner.cwl' to 'file:///home/michael/cwltool/runner.cwl'
INFO [job runner.cwl] /tmp/2tfzlx_n$ line_counter.sh \
    Something \
    /tmp/tmpesjv4vno/stg231b1e17-2004-4191-bd65-4f6e8cd8e571/README.rst > /tmp/2tfzlx_n/result.txt
INFO [job runner.cwl] completed success
{
    "result": {
        "location": "file:///home/michael/cwltool/result.txt",
        "basename": "result.txt",
        "class": "File",
        "checksum": "sha1$ba78fc87000f5fc33b412762514d64cc81c56a5a",
        "size": 15,
        "path": "/home/michael/cwltool/result.txt"
    }
}
INFO Final process status is success
$ cat result.txt 
Something: 725

@dshepelev15
Author

Okay, thank you so much, it works!
But doesn't setting the PATH variable this way make the change global? Am I right?

For instance, suppose I have multiple processes that each change my PATH variable, and they execute two CWL scripts whose bash scripts have the same name. The PATH variable will then contain both directories, so the first CWL script may execute the second bash script and vice versa. Is that right? If so, how can I solve that problem?

@dshepelev15
Author

I found an explanation of that:
https://stackoverflow.com/questions/24642811/set-env-var-in-python-multiprocessing-process. It's fine with multiple processes, because each process has its own copy of the environment.
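
A minimal sketch of that isolation, assuming a POSIX system with sh available: two directories each hold a script with the same name, and each child process gets its own PATH through the env argument of subprocess.run, so neither run can pick up the other's script. The helper names make_script and run_with_private_path and the script name hello.sh are made up for this illustration and are not part of cwltool.

```python
import os
import subprocess
import tempfile


def make_script(directory, name, message):
    """Write an executable shell script that echoes message (demo helper)."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\necho {}\n".format(message))
    os.chmod(path, 0o755)
    return path


def run_with_private_path(script_dir, script_name):
    """Run script_name with script_dir prepended to PATH for that child only."""
    env = os.environ.copy()  # a copy, so the parent's os.environ is left untouched
    env["PATH"] = script_dir + os.pathsep + env.get("PATH", "")
    # The child shell looks up script_name using its own PATH.
    result = subprocess.run(
        ["sh", "-c", script_name], env=env,
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dir_a, \
            tempfile.TemporaryDirectory() as dir_b:
        make_script(dir_a, "hello.sh", "from-a")
        make_script(dir_b, "hello.sh", "from-b")
        print(run_with_private_path(dir_a, "hello.sh"))
        print(run_with_private_path(dir_b, "hello.sh"))
```

The same idea should carry over to launching separate runner processes, each with the directory of its own CWL script at the front of PATH.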

@dshepelev15
Author

dshepelev15 commented Aug 11, 2019

Can you explain what I need to write in the input JSON file for an input of array type? For instance, an array of files.

@mr-c
Member

mr-c commented Aug 12, 2019

@dshepelev15 Glad to hear that it worked!

For your last question, does https://www.commonwl.org/user_guide/09-array-inputs/index.html help?
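
As a hedged illustration of what that guide describes: for an input declared with an array type such as File[], the input object holds a list with one File entry per file. The sketch below builds such an object in Python; the helper name file_array_job, the input name input_files, and the file names are assumptions for this example only.

```python
import json


def file_array_job(input_name, paths):
    """Build an input object whose input_name is an array of File entries.

    file_array_job is a hypothetical helper, not part of cwltool.
    """
    return {
        input_name: [{"class": "File", "path": path} for path in paths]
    }


if __name__ == "__main__":
    job = file_array_job("input_files", ["a.txt", "b.txt"])
    # Write this text to the inputs file passed to the runner.
    print(json.dumps(job, indent=4))
```

The matching tool description would need an input with the same name and an array-of-File type for this job object to validate.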

@raginigupta6

Can I reference a Python file the same way in the tool description, by specifying baseCommand: ["python", "filename.py"]?

@raginigupta6

Can this be done to run a python script:

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
baseCommand: pythonexample.py
inputs:
  pythonscript:
    type: File
    inputBinding:
      position: 0
outputs: []

@raginigupta6

For instance, I am defining a CWL tool that refers to a Python script which adds two numbers given as inputs:

cwlVersion: v1.0
class: CommandLineTool
baseCommand: ["python", "-m", "add_step1"]

inputs:
  x:
    type: int
    inputBinding:
      position: 1
  y:
    type: int
    inputBinding:
      position: 2

stdout: cwl.output.json

outputs:
  answer:
    type: int
And my Python script is as follows:

import click
import json

@click.command()
@click.argument('x', type=int)
@click.argument('y', type=int)
def add(x, y):
    click.echo(json.dumps({'answer': x+y}))

if __name__ == '__main__':
    add()


3 participants