-
-
Notifications
You must be signed in to change notification settings - Fork 232
Make data relocation at the end of workflow runs faster #850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #850 +/- ##
==========================================
- Coverage 73.82% 72.71% -1.11%
==========================================
Files 34 34
Lines 6429 6429
Branches 1626 1626
==========================================
- Hits 4746 4675 -71
- Misses 1231 1300 +69
- Partials 452 454 +2
Continue to review full report at Codecov.
|
cwltool/process.py
Outdated
outdir = (runtimeContext.outdir or | ||
tempfile.mkdtemp(prefix=getdefault(runtimeContext.tmp_outdir_prefix, | ||
DEFAULT_TMP_PREFIX))) | ||
outdir = fs_access.realpath(outdir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? I wouldn't call fs_access.realpath
on the output of tempfile.mkdtemp
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not mean to change the behaviour here, just the formatting. having said that, I understood that fs_access.realpath
was already being called on the output of tempfile.mkdtemp
Did I misunderstand the statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, missed that this was a pure reformat. I'll investigate further, elsewhere.
I'm happy with the performance of the code right now, although it is still scanning the destination folder. I've tried to simplify the code so further changes are easier to do, if they are needed. (like saving all symlinks in the outputs and then only changing those, without going through all the folder in the output folder) The only thing I'm not sure about is if I included the I've not replaced the |
cwltool/process.py
Outdated
@@ -287,20 +279,38 @@ def relocateOutputs(outputObj, # type: Union[Dict[Text, Any],List[Di | |||
if action not in ("move", "copy"): | |||
return outputObj | |||
|
|||
def moveIt(src, dst): | |||
def _collectDirEntries(obj): | |||
# type: (Union[Dict[Text, Any], List[Dict[Text, Any]]], List[Dict[Text, Any]]) -> Iterator[Dict[Text, Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error: Type signature has too many arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed! Tried to also fix the type problems from the consolidation into a single temp variable, then processing it contents that, but couldn't make it work.
The two failures might have been because I used the wrong variable in a specific location. (strange the normal tests do not complain)
Not sure why this happens in Travis' python 3.5 when it fine on python 3.6+ |
The error
Is due to lack of type annotations for scandir. |
Getting back to the complex flow of an inlined function makes it work. I tried to make the behaviour clearer by removing an indent level. It's fishy, though, if the order is to 'move', why doesn't move actually move the files and they are copied instead? |
e4f3fca
to
7dd073f
Compare
e0f649a
to
056d910
Compare
Seeing the coverage report, the most complex parts of the function are not being used: Mainly the folder merging on the relocate subfunction and relinking of symlinks at the end of the process. Is there any case where this functionality may be useful? Code seems related to #317 |
@psafont could you add some unit tests? These functions are pretty self contained so it should be straightforward to write unit tests for them. The solution to not enough coverage for the tricky bits is to improve the coverage :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good opportunity to add tests.
Since this seems to be part of an extension I'll add tests to |
534644d
to
5e4660c
Compare
bc80d40
to
4bdf862
Compare
After reenabling the tests on Windows with Docker there are 2 types of failures, aside from the SSH transient ones:
Do we want to fix those in this merge request or is it better if we do so in another one, tracked with an issue? (all of them happen before and after my changes) |
Change image for workflows needing python to make it compatible with Windows. Changed style to pytest, as part of common-workflow-language#885 Uncomment all code, but still skip those tests
When moving files, only calculate realpaths of links instead of doing it for all the files in the destination directory
Also changed some names and reduced indentation levels
Remove two parameters that are always used with default values and the logic associated with dealing with non-default values
@psafont you can create another issue/PR for those un-related test failures. Thanks! |
@psafont Shall I merge this? |
Go ahead! |
for all the files in the destination directory