dvc run hangs when unzipping archive with large number of files #434

Closed
maidens opened this issue Jan 19, 2018 · 2 comments

maidens commented Jan 19, 2018

I'm running into issues when trying to unzip a large archive with dvc run. I've put together a minimal working example of the issue I'm having.

mkdir myrepo
cd myrepo
git init
dvc init
dvc import https://johnmaidens.com/test_files.zip data/
dvc run unzip data/test_files.zip -d data/

When I do this, about 1500 of the files get unzipped, then the program hangs. However, if I run the last line without dvc run, it works fine. I was able to reproduce the issue on both macOS and Ubuntu using dvc version 0.8.6. If I interrupt the program while it's hanging, I get the following stack trace.

Computer:myrepo maidens$ dvc run unzip data/test_files.zip -d data/
^CTraceback (most recent call last):
  File "/Users/maidens/anaconda/bin/dvc", line 11, in <module>
    sys.exit(main())
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/main.py", line 60, in main
    Runtime.run(CmdRun)
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/runtime.py", line 41, in run
    sys.exit(instance.run())
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/command/run.py", line 62, in run
    self.parsed_args.shell)
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/command/run.py", line 75, in run_and_commit_if_needed
    shell)
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/command/run.py", line 88, in run_command
    repo_change = RepositoryChange(cmd_args, self.settings, stdout, stderr, shell=shell)
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/repository_change.py", line 30, in __init__
    Executor.exec_cmd_only_success(cmd_args, stdout, stderr, shell=shell)
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/executor.py", line 52, in exec_cmd_only_success
    stderr_file=stderr_file, cwd=cwd, shell=shell)
  File "/Users/maidens/anaconda/lib/python2.7/site-packages/dvc/executor.py", line 23, in exec_cmd
    p.wait()
  File "/Users/maidens/anaconda/lib/python2.7/subprocess.py", line 1073, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/Users/maidens/anaconda/lib/python2.7/subprocess.py", line 121, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

Anyway, I love what you're building with dvc. Any help I can get debugging this issue would be much appreciated!

Contributor

efiop commented Jan 19, 2018

Hi @maidens!
Thanks for trying out dvc!
I was able to reproduce this bug and can confirm it. It is caused by an overflowing pipe. The fix will be released with the next version soon. In the meantime, you can work around the bug by adding the -q option to unzip, so that it doesn't flood stdout.
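
For anyone curious how an overflowing pipe leads to a hang like the one in the stack trace above, here is a minimal sketch. It is not dvc's actual executor code. It assumes the child's stdout is connected to a pipe and that the parent calls wait() before reading from it. The child command, the function names, and the 1,000,000-byte output size are all made up for illustration, and the sketch uses Python 3 (for the wait() timeout) rather than the Python 2.7 shown in the trace.

```python
import subprocess
import sys

# Hypothetical noisy child: writes 10**6 bytes to stdout, far more than a
# typical OS pipe buffer holds (often around 64 KiB on Linux).
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 10**6)"]


def wait_without_draining(timeout=1.0):
    """Deadlock-prone pattern: wait on the child while nobody reads its pipe.

    Returns True if the child was still blocked when the timeout expired.
    """
    p = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    try:
        # The child fills the pipe buffer and blocks in write(); the parent
        # blocks here. Without a timeout, this would never return.
        p.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        p.kill()
        p.communicate()  # drain the pipe and reap the killed child
        return True


def drain_with_communicate():
    """Safer pattern: communicate() reads the pipe while waiting for exit."""
    p = subprocess.Popen(CHILD, stdout=subprocess.PIPE)
    out, _ = p.communicate()
    return p.returncode, len(out)


if __name__ == "__main__":
    print("hung while waiting:", wait_without_draining())
    print("communicate result:", drain_with_communicate())
```

The -q workaround helps for the same reason: with unzip kept quiet, the pipe buffer never fills up, so the child is never blocked on a write while the parent is waiting for it to exit.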

efiop self-assigned this Jan 19, 2018
efiop added the bug (Did we break something?) label Jan 19, 2018

maidens commented Jan 19, 2018

Thanks for the quick response! That makes sense. I can confirm that suppressing the output to stdout fixes the problem.
