Skip to content

UI progress cleanup #2846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jan 29, 2020
Merged

UI progress cleanup #2846

merged 23 commits into from
Jan 29, 2020

Conversation

casperdcl
Copy link
Contributor

@casperdcl casperdcl commented Nov 26, 2019

Extra progress bar concern:

We started using an extra progress bar that is counting targets, as described and shown here: #2658 . In most cases though there is a single target - dvc add file or dvc add directory. That extra progress bar is not needed in those cases. It's needed in case of multiple targets or if -R option is specified.

most issues fixed:
asciicast

large dir fix, no outer bar for single target, more progress output rather than hanging:
asciicast

recursive:
asciicast

@casperdcl
Copy link
Contributor Author

@efiop could you list scenarios that you refer to for bad (too little or too much verbosity)? e.g.:

  • dvc add single_file_over_10_gb
  • dvc add single_dir_containing_one_file_over_10_gb_and_lots_of_small_files_under_1kb
  • dvc add single_dir_containing_many_small_files_total_over_10_gb

Also you mentioned RemoteBASE so I presume fetch/push/pull are issues too?

@efiop
Copy link
Contributor

efiop commented Nov 26, 2019

@casperdcl Correct, I think you've listed them all. Same applies to push/pull/fetch/checkout too.

@casperdcl casperdcl self-assigned this Nov 26, 2019
@casperdcl casperdcl added discussion requires active participation to reach a conclusion enhancement Enhances DVC feature request Requesting a new feature performance improvement over resource / time consuming tasks refactoring Factoring and re-factoring research ui user interface / interaction labels Nov 26, 2019
@casperdcl
Copy link
Contributor Author

@efiop should we also consider these despite not being recommended?

  • dvc add -R single_dir_containing_many_small_files_total_over_10_gb
  • dvc add single_dir_containing_many_small_files_total_over_10_gb/*.*

@efiop
Copy link
Contributor

efiop commented Nov 27, 2019

@casperdcl Good point! What are your thoughts on that? 🙂

@casperdcl
Copy link
Contributor Author

er... just wondering whether I should bother profiling them too to find out where more progress/feedback is required for users

@efiop
Copy link
Contributor

efiop commented Nov 27, 2019

@casperdcl I guess both of those cases are dvc add small_file times N. dvc add small_file is a base scenario and is handled already and times N part is handled by the progress bar that you've introduced earlier. So unless I'm missing something, we are already done there (or will be done indirectly).

@casperdcl
Copy link
Contributor Author

@efiop there is much fun to be had with that last commit (a941967371b4839f629e6a253792178624c016bb), I added progress everywhere except check_modified_graph - should actually catch all cases I think. Might not even want to revert any of that additional feedback.

@casperdcl
Copy link
Contributor Author

So stage.commit() is a big time-consuming culprit which we hadn't added any progress for. Turns out copying files takes as long as reading & computing md5sums 🙃

@shcheklein
Copy link
Member

A bit offtopic, sorry about that. Just to share some possible solutions to the UI when there are a few stages (top level progress, below done with [1/4], [2/4] .. ), each of stages includes a progress bar (name of the stage + remove the bar when stage is done), multiple threads progress (in this case show them all, but they are very different - probably application specific):

https://asciinema.org/a/rUwhTMR06Rz7MIFEeFxFsjrHd

@casperdcl
Copy link
Contributor Author

casperdcl commented Dec 18, 2019

@shcheklein

cleanup output when if fails #2699 (comment)

when does this happen?

make sure that we show HINT message to enable cache optimization properly when copy is taking too long already

in what case(s) does the hint (I assume it's a message recommending reflink caches) fail to show?

@shcheklein
Copy link
Member

@casperdcl

when does this happen?

Try just some random path:

$ dvc add asedfsdfsd

in what case(s) does the hint (I assume it's a message recommending reflink caches) fail to show?

yes, this is the "reflink" message. cache.slow_link_warning here https://dvc.org/doc/command-reference/config#cache

@pared has more context about the limits we have now. You can try some huge file with a cache set to "copy". The purpose I created that checkbox for is to reconsider this policy, the way the message is written in the context of recent changes with progress bars, etc. It should be clearly visible if we hit a threshold (we can keep it the same for now?) along with progress bar (below them?).

@casperdcl
Copy link
Contributor Author

casperdcl commented Dec 18, 2019

@shcheklein right.

For dvc add nonexistent_file, what should the output be? 1/1 is bad imho. Should we go for 0/0 or 0/1 though?

About the reflink message, it uses logger.warning so plays fine with the progressbars (

dvc/dvc/logger.py

Lines 197 to 198 in e031366

"class": "dvc.logger.LoggerHandler",
"level": "WARNING",
, yes will appear above the bars).

About changing the default thresholds, I think that should be a separate issue.

@codecov
Copy link

codecov bot commented Dec 18, 2019

Codecov Report

Merging #2846 into master will decrease coverage by 0.92%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2846      +/-   ##
==========================================
- Coverage   92.66%   91.73%   -0.93%     
==========================================
  Files         139      138       -1     
  Lines        8726     8704      -22     
==========================================
- Hits         8086     7985     -101     
- Misses        640      719      +79
Impacted Files Coverage Δ
dvc/remote/gdrive.py 30.53% <0%> (-62.69%) ⬇️
dvc/lock.py 58.66% <0%> (-9.39%) ⬇️
dvc/dependency/repo.py 88.88% <0%> (-9.3%) ⬇️
dvc/istextfile.py 94.11% <0%> (-5.89%) ⬇️
dvc/command/get.py 87.5% <0%> (-4.4%) ⬇️
dvc/scm/tree.py 93.1% <0%> (-3.57%) ⬇️
dvc/version.py 83.33% <0%> (-2.78%) ⬇️
dvc/remote/ssh/connection.py 84.23% <0%> (-2.77%) ⬇️
dvc/repo/get.py 97.82% <0%> (-2.18%) ⬇️
dvc/utils/fs.py 94.84% <0%> (-2.08%) ⬇️
... and 109 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a42d4f2...ab115df. Read the comment docs.

@casperdcl casperdcl changed the title [WIP] UI progress cleanup UI progress cleanup Jan 5, 2020
@casperdcl casperdcl mentioned this pull request Jan 5, 2020
3 tasks
@casperdcl
Copy link
Contributor Author

casperdcl commented Jan 27, 2020

rebased, fixed conflicts; updated description with before & after recordings.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, thanks @casperdcl ! Let's merge and iterate over.

@efiop efiop merged commit 57dae7a into iterative:master Jan 29, 2020
@casperdcl casperdcl deleted the add branch January 29, 2020 20:09
@casperdcl casperdcl mentioned this pull request Jan 29, 2020
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion requires active participation to reach a conclusion enhancement Enhances DVC feature request Requesting a new feature performance improvement over resource / time consuming tasks refactoring Factoring and re-factoring research ui user interface / interaction
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add: cleanup and improve output
3 participants