Skip to content

[Windows] Discussing issues in Cwltool windows compatibility #416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 of 2 tasks
kapilkd13 opened this issue Jun 7, 2017 · 21 comments
Closed
1 of 2 tasks

[Windows] Discussing issues in Cwltool windows compatibility #416

kapilkd13 opened this issue Jun 7, 2017 · 21 comments

Comments

@kapilkd13
Copy link
Contributor

kapilkd13 commented Jun 7, 2017

This issue is created to discuss problems that may occur in implementing windows compatibility for cwltool

  • need new conformance test to replicate @jvdzwaan's workflow with inlineJS issue
  • need documentation: What versions of Windows are supported? Link to docker-{engine,machine} installation instructions. Any known issues. Command(s) to verify installation and setup under MS Windows.
@kapilkd13
Copy link
Contributor Author

kapilkd13 commented Jun 7, 2017

On running cwltool test on windows. I am getting [Errno 13] Permission denied: 'C:\\users\\kapilk~1\\appdata\\local\\temp\\tmpwhycvl in here The reason is tempfile.NamedTemporaryFile() opens a temporary file which we latter access without closing it. This file access using file.name without closing file is allowed in unix but not on windows. Now if we try to close file before accessing it, file gets deleted as this is NamedTemporaryFile normal behaviour. Currently I am passing delete=False as a argument to this function, hence these temp files are not deleted when file is being closed. But then these temp files are not deleted and they remain in our tmp folder. Is there a better way to handle this.

@kapilkd13
Copy link
Contributor Author

kapilkd13 commented Jun 7, 2017

Hi I made some changes to remove symlinking on a local branch https://github.com/common-workflow-language/cwltool/compare/master...kapilkd13:testing-windows?expand=1 for testing, but I am getting https://travis-ci.org/kapilkd13/cwltool/jobs/240462166 these errors. Can anyone help me with these failing test. I can't understand the reason behind these failures.
Oddly all failing test cases use tempfile.mkdtemp to create temp directory whose docs says Creates a temporary directory in the most secure manner possible. There are no race conditions in the directory’s creation. The directory is readable, writable, and searchable only by the creating user ID. I think it is possible that this temp dir is not searchable/writable but then how did it worked earlier with symlinks and now it doesn't.

@mr-c
Copy link
Member

mr-c commented Jun 8, 2017

Hey @kapilkd13
If I was debugging these errors I would use something like https://pypi.python.org/pypi/pdbpp/ to troubleshoot the tests

@kapilkd13
Copy link
Contributor Author

Thanks @mr-c Although some bugs are still remaining, I solved other bugs using pdb++.

mr-c added a commit that referenced this issue Jun 22, 2017
8b6ddb1 Merge pull request #461 from MarkRobbo/rdfs-namespace
eb64c56 Add rdfs namespaces where used
e5f4086 Merge pull request #416 from common-workflow-language/mr-c-patch-1
4cf45da Merge pull request #459 from common-workflow-language/airflow-badge
100635d add badge and promote Airflow upwards
db0c837 Merge pull request #407 from common-workflow-language/fix-support-link
0d88a24 Merge pull request #455 from ohsu-comp-bio/cwl-tes-implementation
8d76233 Merge pull request #453 from common-workflow-language/fix-conflicting-ids
3496956 fix conflicting IDs
15a7d51 Merge pull request #452 from common-workflow-language/hmenager-patch-1
78940a8 Add ELIXIR + BioExcel participating organizations
05fcef8 remove funnel from list since we are no longer maintaining it
09c2d0c Merge pull request #445 from common-workflow-language/toil-backend-update
873ebfa Merge pull request #448 from common-workflow-language/skanwal-patch-1
d52c2b1 Update README.md
77f38c8 Merge pull request #447 from common-workflow-language/avro-spec
615646e Fix broken avro spec link
a863f80 Merge pull request #441 from common-workflow-language/more-contributors
9e84f13 Merge pull request #444 from common-workflow-language/mr-c-patch-2
506508b Merge pull request #432 from michael-kotliar/master
dfbb124 update list of schedulers and platforms for Toil
f7c68c4 add matching SVG
c2c1f1e Added more contributors
50e9d25 Merge pull request #439 from common-workflow-language/link-cwlgen
6362e57 Add link to python-cwlgen.
04c3f8e Add Airflow in Implementations Section of Readme
e150035 Merge pull request #426 from common-workflow-language/fix-wc-test
6acb016 Fix wc test by using wc -l to avoid padding issues on different implementions of 'wc' (!)
d80919e Revert "Fix identifier conflicts (checked by cwltool for the first time) by giving ids"
f2ea521 Merge pull request #418 from mdmiller53/master
2ade169 Update README.md
9930c4a Merge pull request #417 from common-workflow-language/fix-id-conflicts
fcc593a Fix identifier conflicts (checked by cwltool for the first time) by giving ids to embedded tools and workflows.
60b4701 Move blog link out of support section
d038d86 Merge pull request #355 from StarvingMarvin/master
f6b6f74 Merge pull request #409 from common-workflow-language/LourensVeen-uml-sketch
e4b4b99 Add UML sketch
81d5e1e Merge branch 'master' of https://github.com/common-workflow-language/common-workflow-language
074a420 Merge branch 'master' of github.com:StarvingMarvin/common-workflow-language
1f5ccec Use explicit anchor
b0e1212 fix support link
1452e65 Merge pull request #406 from sharmatime/patch-1
b60a48c Update README.md
aa44386 Merge pull request #400 from common-workflow-language/v1.1.0-dev1
e6dc6e4 Use ShellCommandRequirement for stderr tests.
7098e43 turn off shell quoting for stderr tests
8de0b2b sync remaining updates made to v1.0
8ab5eb9 stderr redirect tests use "echo" instead of checking for the usage message of "egrep".
a90334a Merge pull request #399 from common-workflow-language/Consonance
9566984 add platform support list for Consonance
d0fc2df Add missing files.
f0f8b67 Sync v1.1 conformance test suite to changes in v1.0 suite.
50f052e Merge pull request #340 from Jeltje/master
f8a2d76 Add Consonance
776eb3b Merge pull request #397 from common-workflow-language/ohofmann-clarifications
af0b112 participating organization clarifications
c7958cf Merge pull request #395 from common-workflow-language/dir5-allow-symlink
12c12ce Merge pull request #366 from common-workflow-language/support-link-and-collabs
142e34e Make the "dir5.cwl" pass if the implementation uses symlinks.
1d5c5b9 document Toil's slurm support
7063fc0 Merge pull request #386 from common-workflow-language/cwl-ecosystem-update
85976a7 Add latest additions to the CWL ecosystem
f956af5 Merge pull request #385 from common-workflow-language/fix/issue/195
3a5aef3 Additional participating organizations
103c976 Merge pull request #383 from andrewjesaitis/userguide_typo
f95eb57 Use v1.0 style id's in inputs/outputs example
2d7707c Merge pull request #376 from common-workflow-language/open-stand
6396df0 long overdue linking to open-stand.org
2d74be4 even dhough slim is larger than alpine, it reuses debian:8 that is already pulled for other tests, so overall smaller overhead
a4822d8 specifying python2 as baseCommand in order for tests to work with systems where python3 is the default
360e2b9 Merge pull request #371 from common-workflow-language/smoe-userguide-typo-contianer
1a571ba UserGuide.yml trivial typo
3bd20eb Merge pull request #368 from bamuzesc/patch-1
bed0329 removed redundant line 4 "baseCommand: javac"
3003fc4 support instructions link, list of collabs
37c3e0a Merge pull request #360 from zrilak/master
2bc1b13 Tweaks to README
87d982f Merge branch 'master' of github.com:common-workflow-language/common-workflow-language
4f80e57 some concluding remark
047c9ad UserGuide: about nested workflow
d5d124c include empty.yml
3b41eda outputSource within output parameter
70f528a CWL View link
dc7d391 even dhough slim is larger than alpine, it reuses debian:8 that is already pulled for other tests, so overall smaller overhead
40b65c2 Merge remote-tracking branch 'upstream/master'
25975b0 specifying python2 as baseCommand in order for tests to work with systems where python3 is the default
9ce806b empty.yml can't be quite empty..?
37ef31c How to make empty.yml
1104bf7 Fix typos in UserGuide
f5c5f4d Update README.md
dd083bd Clarify Xenon CWL work
97b3698 Update organizations
5473858 Merge pull request #344 from common-workflow-language/import-hint-test
a26c1ae add test for importing hints
fd09b15 fixed typos

git-subtree-dir: cwltool/schemas
git-subtree-split: 8b6ddb1
@kapilkd13
Copy link
Contributor Author

kapilkd13 commented Jul 1, 2017

This is related to the InlinejavascriptRequirement issue on windows. As the windows consider stdin,stdout as file descriptors the current select module do not allow us to do Non Blocking I/O. In order to overcome this issue I created a commit that used threads. here 69fc0fb. Onn initial testing on conformance test this solution looked fine. But thanks to Janneke, she pointed out a workflow on which this workflow failed. So Now the problem was:
My current patch worked and did gave output.

WorkflowException: :1:1: Expression evaluation error:
:1:1: Long-running script killed after 20 seconds: returncode was: -9
:1:1: script was:
:1:1: 01 "use strict";
:1:1: 02 var inputs = {};
:1:1: 03 var self = null;
:1:1: 04 var runtime = {
:1:1: 05     "outdirSize": 1024, 
:1:1: 06     "ram": 1024, 
:1:1: 07     "tmpdirSize": 1024, 
:1:1: 08     "cores": 1, 
:1:1: 09     "tmpdir": "/tmp/tmpgcXGd6", 
:1:1: 10     "outdir": "/tmp/tmpGqSJ21"
:1:1: 11 };
:1:1: 12 (function(){return ((1+1));})()
:1:1: stdout was: 
:1:1: 2
:1:1: stderr was: 

Process finished with exit code 1

notice stdout value 2, it is correct so expression is evaluated correctly. The problem is that as I am using threads instead to do this I/O read. on joining them the thread gets blocked on os.read statement as it expects input. I tried to find a way to read data without blocking/ way to check if data is in stdout and only read if it exists. On windows I can't find a reliable way. isatty didn't help.
There are now two possible solution that i know of.
Sol1: promote treads to daemon. Dont join them and they will die once program executes completely. But it has a catch, As @tetron told me we try to use same jsprocess as to evaluate multiple js expression as process is expensive. So Now if we use daemon threads when evaluating 2nd expression, thread from first expresion are still alive and would capture 2nd process output also. Its solution: use new js process for each expression evaluation.
Sol 2: use process instead of threads. Although there will be minor speed loss we will have better handling over it. So when we think we have received output, we can terminate process. This way we can utilise single js process.
Personally I am inclined to second method. what do you think @mr-c @anton-khodak ,Janneke

Update
Not a good news. I tried using multiprocess on windows but due to pickling I cant pass nodejs object and when I pass nodejs.stdout file descriptor to new process, it get closed. So I tried passing nodejs.stdout.fileno() and tried opening it later in new process using os.fdopen but this gave me Bad-descriptor Error. Note that if I pass sys.stdout.fileno(), it gets opened in new process. I can't seem to find solution to it. I made a SO question here. https://stackoverflow.com/questions/44866552/python-2-7-windowsunable-to-read-one-processnot-host-stdout-from-host-process complete description is here. Any suggestion is appreciated.

@tetron
Copy link
Member

tetron commented Jul 5, 2017

For the short term to get unblocked (hah) I suggest using Popen.communicate() to evaluate a single expression and running nodejs to completion. This will make expression evaluations expensive, however the common case of simple parameter references are already evaluated directly by cwltool.

I checked the Python standard library implementation of subprocess.communicate(). At least on Windows it uses threads to read from stdout and stderr, presumably because non-blocking reads/select are not available.

I found one stack overflow discussion about nonblocking reads. It's complicated:

https://stackoverflow.com/questions/34504970/non-blocking-read-on-os-pipe-on-windows

A longer term solution would be to provide an option that integrates a Javascript engine directly:

https://github.com/sony/v8eval

@kapilkd13
Copy link
Contributor Author

Thanks @tetron . For now I would go with the option to use new nodejs instance for each expression. Later I would work on the option to integrate Js engine directly. Thanks again! 😃

@kapilkd13
Copy link
Contributor Author

kapilkd13 commented Jul 6, 2017

I have fixed InlineJs problem with one process per expression for now. I want to discuss on a new issue that I am facing. @jvdzwaan gave me a CommandLineTool

#!/usr/bin/env cwl-runner
cwlVersion: cwl:v1.0
class: CommandLineTool
baseCommand: frog
arguments:
  - prefix: --outputdir=
    separate: false
    valueFrom: $(runtime.outdir)
hints:
  - class: DockerRequirement
    dockerPull: proycon/lamachine
inputs:
  - id: dir_in
    type: Directory
    inputBinding:
      position: 1
      prefix: --testdir=
      separate: false
  - id: skip
    type: string?
    inputBinding:
      position: 2
      prefix: --skip=
      separate: false
outputs:
  - id: frogout
    type:
      type: array
      items: File
    outputBinding:
glob: "*.out"

Notice outputdir is taking value from runtime.outdir. And later this value is used in linux docker during execution. Now Since we are working on windows, path resolution is as per windows and so runtime.outdir contains path like C:\foo\bar but in docker we need something like /c/foo/bar. When passing outdir or other usual file path to docker I convert them appropriately as I know they are path but incase of outputdir its just a variable and I don't know if its a path or not and should I convert it or not.
One possibility is that I convert all paths(like runtime.outdir, input.path etc) to docker compatible, but since these path are used before and after execution a lot in staging, file copying, path mapping etc and for this, these paths must be converted back to windows compatible. I don't know if this is the best solution or not.
Another idea is to make a list like pathlist and put runtime.outdir and all other variables that contain path to it. On variable resolution if a variable is pointing to any of these the value would be converted appropriately. I haven't tried this. I want to know if this is possible.
Last solution that comes to my mind is somehow user specify that variable he wants to be resolved is a path. Then on windows we can make changes appropriately. Maybe instead of

  - prefix: --outputdir=
    separate: false
valueFrom: $(runtime.outdir)

something like

  - prefix: --outputdir=
    separate: false
valueFrom: $(runtime.outdir)
type:path

But it doesn't sound a good idea to me also 😺

what do you think @mr-c @tetron @jvdzwaan @anton-khodak what direction should i take.

@tetron
Copy link
Member

tetron commented Jul 6, 2017

The values of $(runtime.outdir) and $(runtime.tmpdir) and $(input.somevar.path) must be in the environment of the tool, so when running inside a Docker container they should be /c/foo/bar in your example.

The job of the PathMapper class is to translate the host paths (which would be Windows path, or better file:/// URIs) to the correct paths inside the container.

@kapilkd13
Copy link
Contributor Author

@tetron Alright. I would try modifying these values itself instead of the values that are being passed. I thought maybe we are using runtime.outdir after completion to move output file to another folder. I will try and report the output. Thanks!

@tetron
Copy link
Member

tetron commented Jul 6, 2017

There's an internal outdir variable in cwltool, but the value of$(runtime.outdir) itself is specifically the value from the perspective of the tool (inside the container).

tetron pushed a commit that referenced this issue Jul 23, 2017
@kapilkd13
Copy link
Contributor Author

Since we have achieved windows compatibility. I think we can close this.

@KerstenBreuer
Copy link

I am still getting the same behavior with the latest cwltool and docker for windows on windows 10 pro 1809.

Even a simple sleep test (where nothing is actually written to the filesystem) cause this result:

sleep.cwl:

cwlVersion: v1.0
class: CommandLineTool
hints:
  DockerRequirement:
    dockerPull: ubuntu:latest
  ResourceRequirement:
    coresMin: 1
    ramMin: 5000
    tmpdirMin: 1000
baseCommand: ["sleep"]
inputs:
  sleep_time:
    type: int
    inputBinding:
      position: 1
outputs: []

Workflow:

cwlVersion: v1.0
class: Workflow
inputs:
  sleep_time:
    type: int
steps:
  sleep:
    run: "../tools/sleep.cwl"
    in:
      sleep_time:
        source: sleep_time
    out: []
outputs: []

I am getting following error:

PermissionError: [Errno 13] Permission denied: 'D:\\test\tmpledy674c'

Anyone know how to fix this?

@KerstenBreuer
Copy link

The above error does not occur for older releases of cwltool: 1.0.20180809224403

@mr-c mr-c reopened this Feb 18, 2019
@mr-c
Copy link
Member

mr-c commented Feb 18, 2019

@KerstenBreuer Thank you for your report.

Can you share the output of cwltool -debug [rest of the options]

@KerstenBreuer
Copy link

Sure thing, the complete output is:

C:\Users\kerst\AppData\Local\Programs\Python\Python37-32\Scripts\cwltool 1.0.20181217162649
Resolved '../CWL/workflows/wf_sleep.cwl' to 'file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl'
[workflow ] initialized from file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl
[workflow ] start
[workflow ] {
    "sleep_time": 10
}
[workflow ] starting step sleep
[job step sleep] job input {
    "file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl#sleep/sleep_time": 10
}
[job step sleep] evaluated job input to {
    "file:///D:/cwl_test_lsf/CWL/workflows/wf_sleep.cwl#sleep/sleep_time": 10
}
[step sleep] start
[job sleep] initializing from file:///D:/cwl_test_lsf/CWL/tools/sleep.cwl as part of step sleep
[job sleep] {
    "sleep_time": 10
}
[job sleep] path mappings is {}
[job sleep] command line bindings is [
    {
        "position": [
            -1000000,
            0
        ],
        "datum": "sleep"
    },
    {
        "position": [
            1,
            "sleep_time"
        ],
        "datum": 10
    }
]
[job sleep] initial work dir {}
[job sleep] Skipping Docker software container '--memory' limit despite presence of ResourceRequirement with ramMin and/or ramMax setting. Consider running with --strict-memory-limit for increased portability assurance.
[job sleep] D:\cwl_test_lsf\test_outov4h5ajz$ docker \
    run \
    -i \
    --volume=/D/cwl_test_lsf/test_outov4h5ajz:/vvYZzU:rw \
    --volume=/D/cwl_test_lsf/test_outakop6qwd:/tmp:rw \
    --workdir=/vvYZzU \
    --read-only=true \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/vvYZzU \
    --cidfile=D:\cwl_test_lsf\test_out\tmpip6bvnmm\20190218184117-175991.cid \
    ubuntu:latest \
    sleep \
    10
Exception while running job
Traceback (most recent call last):
  File "c:\users\kerst\appdata\local\programs\python\python37-32\lib\site-packages\cwltool\job.py", line 308, in _execute
    monitor_function=monitor_function
  File "c:\users\kerst\appdata\local\programs\python\python37-32\lib\site-packages\cwltool\job.py", line 769, in _job_popen
    monitor_function(sproc)
  File "c:\users\kerst\appdata\local\programs\python\python37-32\lib\site-packages\cwltool\job.py", line 692, in docker_monitor
    with open(stats_file.name, mode="w") as stats_file_handle:
PermissionError: [Errno 13] Permission denied: 'D:\\cwl_test_lsf\\test_out\\tmp9zb55c9f'
[job sleep] completed permanentFail
[job sleep] {}
[step sleep] produced output {}
[step sleep] completed permanentFail
[workflow ] completed permanentFail
[workflow ] {}
[job sleep] Removing input staging directory D:\cwl_test_lsf\test_outxafmhgjn
[job sleep] Removing temporary directory D:\cwl_test_lsf\test_outakop6qwd
{}
Final process status is permanentFail

Thanks for taking care of this.

@mr-c
Copy link
Member

mr-c commented Feb 19, 2019

    with open(stats_file.name, mode="w") as stats_file_handle:
PermissionError: [Errno 13] Permission denied: 'D:\\cwl_test_lsf\\test_out\\tmp9zb55c9f'

Does this file exist and what permissions does it have?

@KerstenBreuer
Copy link

No the file was not generated.

@ZimmerA
Copy link

ZimmerA commented May 15, 2019

Same Problem here, any solutions so far?

@KerstenBreuer
Copy link

I narrowed it down. The bug appears first in release 1.0.20190228155703. The last working release is 1.0.20181201184214.

@mr-c
Copy link
Member

mr-c commented Jun 24, 2021

We have dropped MS Windows compatibility, see https://github.com/common-workflow-language/cwltool#ms-windows-users for how to run using WSL 2.

@mr-c mr-c closed this as completed Jun 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants