
status: takes too long to get status #6543

Closed
@raylutz

Bug Report

Description

I have DVC set up in the root of my project folder, which is at

C:\Users\raylu\Documents\Github\audit-engine

The stage file is located at

resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml

I issue this command:

dvc status -R -v -v -v --show-json  resources\WI_Ozaukee_20201103\dvc

I expect it to walk only the subtree under

C:\Users\raylu\Documents\Github\audit-engine\resources\WI_Ozaukee_20201103\dvc

to look for dvc.yaml stage files. Instead, it appears to walk the full tree below

C:\Users\raylu\Documents\Github\audit-engine

and this takes 75 seconds (there are 112 GB of data in the project).
That is just a hunch, though. To test it, we temporarily moved the .dvc folder into the folder

C:\Users\raylu\Documents\Github\audit-engine\resources\WI_Ozaukee_20201103\dvc

and then the command takes only 5.6 seconds, which is still fairly long. It should take only a second or two, because getting the ETags of the three S3 files is very fast and DVC needs to find only one stage file. Something seems wrong here.
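To make the expected behavior concrete, here is a minimal sketch of a search that looks for dvc.yaml files only below the target directory given on the command line. The helper name find_stage_files is hypothetical and this is not DVC's actual implementation; it only illustrates the scope of the walk this report expects `dvc status -R <target>` to have.

```python
import os
from pathlib import Path
from typing import List


def find_stage_files(target: str) -> List[Path]:
    """Hypothetical helper: collect dvc.yaml files found under `target` only.

    Illustrates the expected scope of the walk; it is NOT how DVC
    implements `dvc status -R`.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(target):
        # Prune DVC/Git metadata directories so they are never descended into.
        dirnames[:] = [d for d in dirnames if d not in (".dvc", ".git")]
        if "dvc.yaml" in filenames:
            found.append(Path(dirpath) / "dvc.yaml")
    return sorted(found)
```

Called as find_stage_files(r"resources\WI_Ozaukee_20201103\dvc"), a walk like this only visits directories under that subtree, so its cost should not depend on the 112 GB stored elsewhere in the project.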

Reproduce

To reproduce this, DVC must be configured with no SCM, no remote, and no cache, and `dvc status` must be run with -R so that it finds the dvc.yaml stage files. We have only one.
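When comparing runs (75 seconds with .dvc at the project root versus 5.6 seconds with it moved into the subtree), a small wall-clock timer around the CLI call keeps the measurements consistent. This is a sketch using only the Python standard library; time_command is a hypothetical helper name, and it assumes `dvc` is on PATH when you call it.

```python
import subprocess
import time
from typing import Sequence, Tuple


def time_command(args: Sequence[str]) -> Tuple[int, float]:
    """Run a command, discard its output, and return (exit code, seconds elapsed)."""
    start = time.perf_counter()
    completed = subprocess.run(args, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed
```

For example, time_command(["dvc", "status", "-R", "--show-json", r"resources\WI_Ozaukee_20201103\dvc"]) could be run once from the project root with .dvc in each location, so both timings are taken the same way.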

Expected

See above.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 2.6.4 (pip)
---------------------------------
Platform: Python 3.7.6 on Windows-10-10.0.19041-SP0
Supports:
        http (requests = 2.24.0),
        https (requests = 2.24.0),
        s3 (s3fs = 2021.8.0, boto3 = 1.17.106)

Additional Information (if any):
I will attach the profile dump and plot.

Profile Dump

https://cdn.discordapp.com/attachments/882823608949411850/884465153716920380/dump.prof

https://cdn.discordapp.com/attachments/882823608949411850/884467942111203348/image_output.png
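To see where the time goes in the attached dump, the standard-library pstats module can list the most expensive calls. This sketch assumes dump.prof is a standard cProfile/pstats file (the report does not say how it was produced); top_functions is a hypothetical helper name.

```python
import io
import pstats


def top_functions(profile_path: str, limit: int = 20) -> str:
    """Return a text report of the `limit` calls with the highest cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buf)
    # Shorten file paths and order by cumulative time, so directory-walking
    # functions that dominate the run appear near the top.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()
```

Printing top_functions("dump.prof") should show whether filesystem traversal outside resources\WI_Ozaukee_20201103\dvc accounts for most of the 75 seconds.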

Metadata

Assignees: No one assigned

Labels: A: status (Related to the dvc diff/list/status), performance (improvement over resource / time consuming tasks)
