
inputs array of secondaryFiles not processed #88

Closed
@jeremiahsavage

Description

@jeremiahsavage

I am attempting to associate a secondaryFile (BAI) with each File in an array (of BAMs).
I get desired behavior with the Jan26 release [0], but the following Jan27 release [1] breaks our cwl.

This is the cwl snippet:

cwlVersion: "cwl:draft-2"
requirements:
  - import: node-engine.cwl
  - import: envvar-global.cwl
  - class: DockerRequirement
    dockerPull: quay.io/___
class: CommandLineTool
inputs:
  - id: "#input_bam_path"
    type:
      type: array
      items: File
      inputBinding:
        prefix: --bam_path
        secondaryFiles:
          - engine: node-engine.cwl
            script: |
              {
              return {"path": $self.path.slice(0,-4)+".bai", "class": "File"};
              }

Running it with --debug gives the desired bindings with the Jan26 release [2], but not with the Jan27 release [3].

I ran a diff [4] of the two releases. It looks like secondaryFiles are now in the schema instead of the binding. Is the post-Jan27 behavior the intended behavior for our cwl?
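For readers unfamiliar with the expression in the snippet above: it drops the last four characters of each BAM path (".bam") and appends ".bai", then returns a File object. A minimal Python sketch of the same string operation follows. The function names `bai_path_for` and `as_file_object` are hypothetical and are not part of cwltool.

```python
def bai_path_for(bam_path: str) -> str:
    """Mirror the JavaScript `path.slice(0, -4) + ".bai"` from the snippet.

    Assumes the path ends in a four-character suffix such as ".bam".
    """
    return bam_path[:-4] + ".bai"


def as_file_object(path: str) -> dict:
    """Build the File object shape that the expression returns."""
    return {"path": path, "class": "File"}


# Hypothetical path, used only for illustration.
bam = "/tmp/test/sample.1.bam"
print(as_file_object(bai_path_for(bam)))
```

Note that this yields `sample.1.bai` rather than `sample.1.bam.bai`, which matches the secondaryFiles paths in the Jan26 debug output [2].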

[0]
https://pypi.python.org/packages/da/93/fd0885312894cda09ad4bcb04c7091ec7b6da15ab10e14f468cdc54caed5/cwltool-1.0.20160126211726.tar.gz

[1]
https://pypi.python.org/packages/8e/b3/c9326f44854d8ca71668070fb09746f4b0c1fe4f5d749de7db3d737eee88/cwltool-1.0.20160127144612.tar.gz

[2]

    {
        "secondaryFiles": [
            "${\nreturn {\"path\": self.path.slice(0,-4)+\".bai\", \"class\": \"File\"};\n}\n"
        ], 
        "prefix": "--bam_path", 
        "do_eval": {
            "path": "/tmp/job633706396_ubuntu/SCRATCH/47b42e81-2500-4ebc-a0c2-acd3187cc2f0_513/test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bam", 
            "class": "File", 
            "secondaryFiles": [
                {
                    "path": "/tmp/job633706396_ubuntu/SCRATCH/47b42e81-2500-4ebc-a0c2-acd3187cc2f0_513/test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bai", 
                    "class": "File"
                }
            ]
        }, 
        "valueFrom": {
            "path": "/tmp/job633706396_ubuntu/SCRATCH/47b42e81-2500-4ebc-a0c2-acd3187cc2f0_513/test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bam", 
            "class": "File", 
            "secondaryFiles": [
                {
                    "path": "/tmp/job633706396_ubuntu/SCRATCH/47b42e81-2500-4ebc-a0c2-acd3187cc2f0_513/test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bai", 
                    "class": "File"
                }
            ]
        }, 
        "position": [
            0, 
            0, 
            "input_bam_path", 
            "input_bam_path"
        ]
    }, 

[3]

    {
        "position": [
            0, 
            0, 
            "input_bam_path", 
            "input_bam_path"
        ], 
        "prefix": "--bam_path", 
        "do_eval": {
            "path": "/tmp/job557517512_test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bam", 
            "class": "File"
        }, 
        "valueFrom": {
            "path": "/tmp/job557517512_test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bam", 
            "class": "File"
        }
    }, 

[4]
https://gist.github.com/jeremiahsavage/c82b38027be30eccbf9b47361c8f7fbb

Activity

tetron

tetron commented on May 28, 2016

@tetron
Member

What happens if you try to run it with the latest cwltool?

You are correct that secondaryFiles moved up one level between draft-2 and draft-3. However, if you are writing draft-2 documents you should be using the draft-2 syntax. Recent versions of cwltool (since 3 weeks ago or so) have tightened up validation for each specific spec version. Prior to that, cwltool incorrectly accepted invalid documents that blended draft-2 and draft-3 syntax; it is possible that this is where you are getting into trouble. Does that help?
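To check which placement a given document uses, a small diagnostic helper can report where `secondaryFiles` appears in a parsed input parameter. This is only a sketch over plain dictionaries (as produced by any YAML loader). The function name and the set of locations checked are assumptions taken from the snippets posted in this thread, not cwltool's validation logic.

```python
def locate_secondary_files(param: dict) -> list:
    """Return the dotted locations where `secondaryFiles` appears in one input parameter.

    Only placements that appear in this thread are checked.
    """
    found = []
    if "secondaryFiles" in param:
        found.append("secondaryFiles")
    binding = param.get("inputBinding")
    if isinstance(binding, dict) and "secondaryFiles" in binding:
        found.append("inputBinding.secondaryFiles")
    schema = param.get("type")
    if isinstance(schema, dict):
        if "secondaryFiles" in schema:
            found.append("type.secondaryFiles")
        nested = schema.get("inputBinding")
        if isinstance(nested, dict) and "secondaryFiles" in nested:
            found.append("type.inputBinding.secondaryFiles")
    return found


# The draft-2 input from the issue description, reduced to the relevant keys.
draft2_param = {
    "id": "#input_bam_path",
    "type": {
        "type": "array",
        "items": "File",
        "inputBinding": {"prefix": "--bam_path", "secondaryFiles": ["<expression>"]},
    },
}
print(locate_secondary_files(draft2_param))
```

A helper like this only tells you where the key sits; whether that location is valid for a given cwlVersion is decided by the spec and by cwltool's validator.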

jeremiahsavage

jeremiahsavage commented on May 31, 2016

@jeremiahsavage
ContributorAuthor

With latest cwltool 1.0.20160523144113 the secondaryFile is not picked up.

    {
        "position": [
            0, 
            0, 
            "input_bam_path", 
            "input_bam_path"
        ], 
        "prefix": "--bam_path", 
        "do_eval": {
            "path": "/var/lib/cwl/job805995621_ubuntu/SCRATCH/47b42e81-2500-4ebc-a0c2-acd3187cc2f0_513/test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bam", 
            "containerfs": true, 
            "class": "File"
        }, 
        "valueFrom": {
            "path": "/var/lib/cwl/job805995621_ubuntu/SCRATCH/47b42e81-2500-4ebc-a0c2-acd3187cc2f0_513/test/C440.TCGA-IN-8462-01A-11D-2340-08.1.bam", 
            "containerfs": true, 
            "class": "File"
        }
    }, 

I'll try altering the syntax.

jeremiahsavage

jeremiahsavage commented on Jun 1, 2016

@jeremiahsavage
ContributorAuthor

I've created a self-contained test (just used echo) in cwl:draft-3 which also doesn't associate secondaryFiles. This is the debug output:

$ cwltool --strict --debug ~/code/cocleaning-cwl/tools/array_secondary_3.cwl.yaml --bam_path test.bam --bam_path test.bam --bam_path test.bam
/home/jeremiah/.virtualenvs/p2/bin/cwltool 1.0.20160523144113
Parsed job order from command line: {
    "bam_path": [
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }
    ], 
    "id": "/home/jeremiah/code/cocleaning-cwl/tools/array_secondary_3.cwl.yaml", 
    "job_order": null
}
[job array_secondary_3.cwl.yaml] initializing from file:///home/jeremiah/code/cocleaning-cwl/tools/array_secondary_3.cwl.yaml
[job array_secondary_3.cwl.yaml] {
    "bam_path": [
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }
    ], 
    "job_order": null
}
[job array_secondary_3.cwl.yaml] path mappings is {
    "test.bam": [
        "/home/jeremiah/Downloads/test.bam", 
        "/home/jeremiah/Downloads/test.bam"
    ]
}
[job array_secondary_3.cwl.yaml] command line bindings is [
    {
        "shellQuote": false, 
        "position": [
            1, 
            0
        ], 
        "valueFrom": null, 
        "do_eval": "${ var bam_list = \"\"; for (var i = 0; i < inputs.bam_path.length; i ++) { bam_list += \" echo \" + inputs.bam_path[i].path.split('/').slice(-1)[0] + \" >> bam.list &&\" } return bam_list.slice(0,-2) }"
    }
]
[job array_secondary_3.cwl.yaml] /home/jeremiah/Downloads$ /bin/sh \
    -c \
     echo test.bam >> bam.list && echo test.bam >> bam.list && echo test.bam >> bam.list 
[job array_secondary_3.cwl.yaml] completed success
[job array_secondary_3.cwl.yaml] {
    "bam_list": {
        "size": 81, 
        "path": "/home/jeremiah/Downloads/bam.list", 
        "checksum": "sha1$b4231c7a667fc5149b8e9856688fc3713817ad32", 
        "class": "File"
    }
}
Final process status is success
[job array_secondary_3.cwl.yaml] Removing temporary directory /tmp/tmpE5ZoVg
{
    "bam_list": {
        "size": 81, 
        "path": "/home/jeremiah/Downloads/bam.list", 
        "checksum": "sha1$b4231c7a667fc5149b8e9856688fc3713817ad32", 
        "class": "File"
    }
}

and this is the cwl:

#!/usr/bin/env cwl-runner

cwlVersion: "cwl:draft-3"

requirements:
  - class: ShellCommandRequirement
  - class: InlineJavascriptRequirement

class: CommandLineTool

inputs:
  - id: bam_path
    type:
      type: array
      items: File
    secondaryFiles: |
      ${
      return {"path": $self.path.slice(0,-4)+".bai", "class": "File"};
      }

outputs:
  - id: bam_list
    type: File
    outputBinding:
      glob: "bam.list"
  # - id: bai_list
  #   type: File
  #   outputBinding:
  #     glob: "bai.list"

arguments:
  - valueFrom: ${
        var bam_list = "";
        for (var i = 0; i < inputs.bam_path.length; i ++) {
          bam_list += " echo " + inputs.bam_path[i].path.split('/').slice(-1)[0] + " >> bam.list &&"
        }
        return bam_list.slice(0,-2)
        }
    position: 1
    shellQuote: false

baseCommand: []
Shenglai

Shenglai commented on Jun 2, 2016

@Shenglai

With latest cwltool 1.0.20160523144113 the secondaryFile is not picked up even with cwlVersion: "cwl:draft-2"

the cwl is:

#!/usr/bin/env cwl-runner

cwlVersion: "cwl:draft-2"

requirements:
  - import: node-engine.cwl
  - import: envvar-global.cwl

class: CommandLineTool

inputs:
  - id: "#bam_path"
    type:
      type: array
      items: File
      inputBinding:
        secondaryFiles:
          - engine: node-engine.cwl
            script: |
              {
              return {"path": $self.path.slice(0,-4)+".bai", "class": "File"};
              }

outputs:
  - id: "#afile"
    type: File
    outputBinding:
      glob: "test.txt"

baseCommand: [touch, test.txt]

and the debug output is:

/usr/local/bin/cwl-runner 1.0.20160531173804
Parsed job order from command line: {
    "bam_path": [
        {
            "path": "chr22.normal.bam", 
            "class": "File"
        }, 
        {
            "path": "chr22.tumor.bam", 
            "class": "File"
        }
    ], 
    "id": "test.cwl.yaml", 
    "job_order": null
}
[job test.cwl.yaml] initializing from file:///mnt/benchmark/smallcase/test.cwl.yaml
[job test.cwl.yaml] {
    "bam_path": [
        {
            "path": "chr22.normal.bam", 
            "class": "File"
        }, 
        {
            "path": "chr22.tumor.bam", 
            "class": "File"
        }
    ], 
    "job_order": null
}
[job test.cwl.yaml] path mappings is {
    "chr22.tumor.bam": [
        "/mnt/benchmark/smallcase/chr22.tumor.bam", 
        "/mnt/benchmark/smallcase/chr22.tumor.bam"
    ], 
    "chr22.normal.bam": [
        "/mnt/benchmark/smallcase/chr22.normal.bam", 
        "/mnt/benchmark/smallcase/chr22.normal.bam"
    ]
}
[job test.cwl.yaml] command line bindings is [
    {
        "position": [
            -1000000, 
            0
        ], 
        "valueFrom": "touch"
    }, 
    {
        "position": [
            -1000000, 
            1
        ], 
        "valueFrom": "test.txt"
    }, 
    {
        "position": [
            0, 
            0, 
            "bam_path", 
            "bam_path"
        ], 
        "valueFrom": {
            "path": "/mnt/benchmark/smallcase/chr22.normal.bam", 
            "containerfs": true, 
            "class": "File"
        }, 
        "do_eval": {
            "path": "/mnt/benchmark/smallcase/chr22.normal.bam", 
            "containerfs": true, 
            "class": "File"
        }
    }, 
    {
        "position": [
            1, 
            0, 
            "bam_path", 
            "bam_path"
        ], 
        "valueFrom": {
            "path": "/mnt/benchmark/smallcase/chr22.tumor.bam", 
            "containerfs": true, 
            "class": "File"
        }, 
        "do_eval": {
            "path": "/mnt/benchmark/smallcase/chr22.tumor.bam", 
            "containerfs": true, 
            "class": "File"
        }
    }
]
[job test.cwl.yaml] /mnt/benchmark/smallcase$ touch \
    test.txt \
    /mnt/benchmark/smallcase/chr22.normal.bam \
    /mnt/benchmark/smallcase/chr22.tumor.bam
[job test.cwl.yaml] completed success
[job test.cwl.yaml] {
    "afile": {
        "size": 0, 
        "path": "/mnt/benchmark/smallcase/test.txt", 
        "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", 
        "class": "File"
    }
}
Final process status is success
[job test.cwl.yaml] Removing temporary directory /tmp/tmpQx7xTP
{
    "afile": {
        "size": 0, 
        "path": "/mnt/benchmark/smallcase/test.txt", 
        "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", 
        "class": "File"
    }
}
jeremiahsavage

jeremiahsavage commented on Jun 2, 2016

@jeremiahsavage
ContributorAuthor

Fixed with #91

jeremiahsavage

jeremiahsavage commented on Jun 2, 2016

@jeremiahsavage
ContributorAuthor

Actually, the fix only works if the cwl is draft-2, and causes a crash if it is draft-3. So don't merge.

jeremiahsavage

jeremiahsavage commented on Jun 3, 2016

@jeremiahsavage
ContributorAuthor

Now tested successfully with draft-3 using the below cwl and output (also tested with a gatk indelrealigner tool that requires bai for each bam).

cwl:

#!/usr/bin/env cwl-runner

cwlVersion: "cwl:draft-3"

requirements:
  - class: ShellCommandRequirement
  - class: InlineJavascriptRequirement

class: CommandLineTool

inputs:
  - id: bam_path
    type:
      type: array
      items: File
      secondaryFiles: |
        ${
        return {"path": self.path.slice(0,-4)+".bai", "class": "File"};
        }

outputs:
  - id: bam_list
    type: File
    outputBinding:
      glob: "bam.list"

arguments:
  - valueFrom: ${
        var bam_list = "";
        for (var i = 0; i < inputs.bam_path.length; i ++) {
          bam_list += " echo " + inputs.bam_path[i].path.split('/').slice(-1)[0] + " >> bam.list &&"
        }
        return bam_list.slice(0,-2)
        }
    position: 1
    shellQuote: false

baseCommand: []

output

(p2_fix)[jeremiah@localhost Downloads]$ cwltool --strict --debug ~/code/cocleaning-cwl/tools/array_secondary_3.cwl.yaml --bam_path test.bam --bam_path test.bam --bam_path test.bam
/home/jeremiah/.virtualenvs/p2_fix/bin/cwltool 1.0.20160531173804
Parsed job order from command line: {
    "bam_path": [
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }
    ], 
    "id": "/home/jeremiah/code/cocleaning-cwl/tools/array_secondary_3.cwl.yaml", 
    "job_order": null
}
[job array_secondary_3.cwl.yaml] initializing from file:///home/jeremiah/code/cocleaning-cwl/tools/array_secondary_3.cwl.yaml
[job array_secondary_3.cwl.yaml] {
    "bam_path": [
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }, 
        {
            "path": "test.bam", 
            "class": "File"
        }
    ], 
    "job_order": null
}
[job array_secondary_3.cwl.yaml] path mappings is {
    "test.bam": [
        "/home/jeremiah/Downloads/test.bam", 
        "/home/jeremiah/Downloads/test.bam"
    ], 
    "test.bai": [
        "/home/jeremiah/Downloads/test.bai", 
        "/home/jeremiah/Downloads/test.bai"
    ]
}
[job array_secondary_3.cwl.yaml] command line bindings is [
    {
        "shellQuote": false, 
        "position": [
            1, 
            0
        ], 
        "valueFrom": null, 
        "do_eval": "${ var bam_list = \"\"; for (var i = 0; i < inputs.bam_path.length; i ++) { bam_list += \" echo \" + inputs.bam_path[i].path.split('/').slice(-1)[0] + \" >> bam.list &&\" } return bam_list.slice(0,-2) }"
    }
]
[job array_secondary_3.cwl.yaml] /home/jeremiah/Downloads$ /bin/sh \
    -c \
     echo test.bam >> bam.list && echo test.bam >> bam.list && echo test.bam >> bam.list 
[job array_secondary_3.cwl.yaml] completed success
[job array_secondary_3.cwl.yaml] {
    "bam_list": {
        "size": 54, 
        "path": "/home/jeremiah/Downloads/bam.list", 
        "checksum": "sha1$05dd8404a864139f7f5afd0cd560044c28a8f41e", 
        "class": "File"
    }
}
Final process status is success
[job array_secondary_3.cwl.yaml] Removing temporary directory /tmp/tmp4bF0hg
{
    "bam_list": {
        "size": 54, 
        "path": "/home/jeremiah/Downloads/bam.list", 
        "checksum": "sha1$05dd8404a864139f7f5afd0cd560044c28a8f41e", 
        "class": "File"
    }
}
(p2_fix)[jeremiah@localhost Downloads]$ 
mr-c

mr-c commented on Jun 6, 2016

@mr-c
Member

Hello @jeremiahsavage, thank you for your issue and PR.

I'm trying to understand your use case better: As @chapmanb points out in https://groups.google.com/d/msg/common-workflow-language/u9q03lFBHpQ/L9on3M1MAgAJ it would seem that the secondaryFiles with a string value of .bai or ^.bai should meet your needs.
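For context, the string form of secondaryFiles is a suffix pattern: each leading caret (^) strips one file extension from the primary path, and the rest of the string is appended. The sketch below follows that rule as the CWL specification describes it. The function name is hypothetical, and this is not cwltool's own implementation.

```python
def apply_secondary_pattern(primary_path: str, pattern: str) -> str:
    """Apply a secondaryFiles string pattern to a primary file path.

    For each leading "^", drop the last extension of the file name (the last
    "." and everything after it); then append the remainder of the pattern.
    """
    path = primary_path
    while pattern.startswith("^"):
        pattern = pattern[1:]
        # Only look at the file name so dots in directory names are ignored.
        directory, sep, basename = path.rpartition("/")
        dot = basename.rfind(".")
        if dot != -1:
            basename = basename[:dot]
        path = directory + sep + basename
    return path + pattern


for pat in (".bai", "^.bai"):
    print(pat, "->", apply_secondary_pattern("test.bam", pat))
```

Under this rule, `.bai` would look for `test.bam.bai`, while `^.bai` would look for `test.bai`, which is the naming used by the index files in this thread.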

jeremiahsavage

jeremiahsavage commented on Jun 6, 2016

@jeremiahsavage
ContributorAuthor

Hi @mr-c. Yes, I've tried the .bai and ^.bai methods suggested. But with the latest (post-January 26) versions of cwltool, secondaryFiles are not attached to the items of an array. With the Jan26 release and earlier, the bug does not exist. Perhaps @chapmanb is using a branch prior to Jan26?

I used a lot of print() statements to see that secondaryFiles was not being passed to each item of the array at the point the patch touches.

The fix allows cwl written in draft-2 or draft-3 to pass secondaryFiles through arrays.

For example.
cwl:draft-2 case:
cwl: https://gist.github.com/jeremiahsavage/f49146e32e098697494d74b20ea10526
latest cwltool debug output (no bai): https://gist.github.com/jeremiahsavage/72172b5eeca50a654ee1732bd131317e
with patch debug output (gets bai): https://gist.github.com/jeremiahsavage/a8b923d9e50ed6ba34875f7ccae4d206

cwl:draft-3 case:
cwl (^.bai): https://gist.github.com/jeremiahsavage/760ffdae6a5220d327b997269b5f52ee
latest cwltool debug output (no bai): https://gist.github.com/jeremiahsavage/760ffdae6a5220d327b997269b5f52ee
with patch debug output (gets bai): https://gist.github.com/jeremiahsavage/f9b3e989d5d185015e1d7317e14591e1
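The behaviour described above, where every item of a File array ends up with its own secondaryFiles entry, can be sketched as follows. This is an illustration of the intended result only, not the patch in #91. The function name and the `derive_path` callback are hypothetical.

```python
import copy


def attach_secondary_files(files: list, derive_path) -> list:
    """Return copies of the File objects, each with one derived secondary file.

    `derive_path` maps a primary path to the secondary path, for example the
    ".bam" -> ".bai" rule used in this thread.
    """
    result = []
    for f in files:
        item = copy.deepcopy(f)  # leave the caller's job order untouched
        item["secondaryFiles"] = [
            {"path": derive_path(item["path"]), "class": "File"}
        ]
        result.append(item)
    return result


bams = [
    {"path": "chr22.normal.bam", "class": "File"},
    {"path": "chr22.tumor.bam", "class": "File"},
]
with_bai = attach_secondary_files(bams, lambda p: p[:-4] + ".bai")
for item in with_bai:
    print(item)
```

The resulting objects have the same shape as the `valueFrom` entries in the Jan26 debug output: a File with a nested `secondaryFiles` list.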

chapmanb

chapmanb commented on Jun 7, 2016

@chapmanb
Member

Jeremiah;
I'm using 1.0.20160427142240 from bioconda (https://bioconda.github.io/). I'm not sure about the very latest version; Peter would be most helpful in assessing that. It would be useful to know if you still run into issues with the version in bioconda. You can install with:

/path/to/anaconda/bin/conda install -c bioconda cwltool

Sorry to not have a good idea why you're seeing this but hopefully this helps some.

jeremiahsavage

jeremiahsavage commented on Jun 7, 2016

@jeremiahsavage
ContributorAuthor

Hi Brad and Michael,

I've created a true minimal test case to show the issue and the fix. In this test, the process will actually fail instead of just showing the pickup of the secondaryFile in the --debug output I was reporting before.

Three step instructions at:
https://github.com/jeremiahsavage/array_secondary

Docker has to be involved: when docker is not used, paths are not redirected (to something like /var/lib/cwl/job445880475), so the issue of secondaryFiles not being passed is masked.

I've tested with the latest bioconda (to show issue) and with the fix (in a python virtualenv). The output is in the README.md
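Why Docker exposes the problem can be shown with a simplified model of path redirection: only files listed in the job (primary files plus their secondaryFiles) get a path inside the container directory. Everything below is an assumption made for illustration. The function name, the `/data` host paths, and the flat staging layout are hypothetical, and cwltool's real path mapping is more involved.

```python
def container_paths(files: list, container_dir: str) -> dict:
    """Map each listed host path to a path under `container_dir`.

    A file that is not listed (directly or as a secondary file) gets no entry,
    so it would not be visible to a tool running inside the container.
    """
    mapping = {}
    for f in files:
        listed = [f] + f.get("secondaryFiles", [])
        for entry in listed:
            host_path = entry["path"]
            basename = host_path.rsplit("/", 1)[-1]
            mapping[host_path] = container_dir + "/" + basename
    return mapping


job_dir = "/var/lib/cwl/job445880475"
bam_only = [{"path": "/data/test.bam", "class": "File"}]
bam_and_bai = [{
    "path": "/data/test.bam",
    "class": "File",
    "secondaryFiles": [{"path": "/data/test.bai", "class": "File"}],
}]

without_bai = container_paths(bam_only, job_dir)
with_bai = container_paths(bam_and_bai, job_dir)
print("/data/test.bai" in without_bai, "/data/test.bai" in with_bai)
```

Without Docker the tool reads the BAM in its original directory, where the index already sits beside it, so a missing secondaryFiles entry goes unnoticed.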

mr-c

mr-c commented on Jun 7, 2016

@mr-c
Member

Thank you Jeremiah for your detailed debugging. We are a bit swamped with
getting some other changes done in time for 1.0 of the standard. My work
day is over; but I will take a look tomorrow.

jeremiahsavage

jeremiahsavage commented on Jun 23, 2016

@jeremiahsavage
ContributorAuthor

Just a ping. It would be great if this was fixed in 1.0
test: https://github.com/jeremiahsavage/array_secondary
fix: #91

mr-c

mr-c commented on Oct 6, 2016

@mr-c
Member

Fix is now in #170 (but needs some assistance)

kmhernan

kmhernan commented on Oct 6, 2016

@kmhernan

@mr-c I am having this issue with v1.0 and cwltool 1.0.20160913171024

For example:

  known_vcf:
    type:
      type: array
      items: File
      inputBinding:
        prefix: --known
    doc: "Input VCF file(s) with known indels."
    default: null
    secondaryFiles:
      - ".tbi"
    inputBinding:
      position: 3

This does not bind the secondaryFiles, and no information about them is available in the --debug output.


          inputs array of secondaryFiles not processed · Issue #88 · common-workflow-language/cwltool