
remote: reduce traverse weight multiplier #3704

Closed
@pmrowla

Description


In `remote.cache_exists()`, the traverse (full remote listing) method is weighted to account for performance overhead that makes it inherently slower than querying individual objects:

dvc/dvc/remote/base.py, lines 923 to 928 at 18e8f07:

# For sufficiently large remotes, traverse must be weighted to account
# for performance overhead from large lists/sets.
# From testing with S3, for remotes with 1M+ files, object_exists is
# faster until len(checksums) is at least 10k~100k
if remote_size > self.TRAVERSE_THRESHOLD_SIZE:
traverse_weight = traverse_pages * self.TRAVERSE_WEIGHT_MULTIPLIER

The initial weight multiplier value (20) can now be lowered, thanks to the remote performance improvements made since the original traverse/no-traverse optimization PR.
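The decision being tuned here can be sketched as follows. This is a minimal illustration, not DVC's actual implementation: the constant names mirror those in `dvc/remote/base.py`, but the values and the helper function `use_traverse` are assumptions for the example.

```python
# Illustrative sketch of the traverse-vs-object_exists decision in
# cache_exists(). Constant names mirror dvc/remote/base.py; the values
# and this helper are hypothetical, chosen only to show the weighting.

TRAVERSE_THRESHOLD_SIZE = 500_000     # illustrative value
TRAVERSE_WEIGHT_MULTIPLIER = 20       # the multiplier this issue proposes lowering
LIST_OBJECT_PAGE_SIZE = 1_000         # objects returned per listing page


def use_traverse(checksum_count: int, remote_size: int) -> bool:
    """Return True if a full remote listing (traverse) is estimated to be
    cheaper than issuing one existence query per checksum."""
    traverse_pages = remote_size / LIST_OBJECT_PAGE_SIZE
    if remote_size > TRAVERSE_THRESHOLD_SIZE:
        # Large remotes: penalize traverse for the overhead of building
        # and intersecting very large checksum lists/sets.
        traverse_weight = traverse_pages * TRAVERSE_WEIGHT_MULTIPLIER
    else:
        traverse_weight = traverse_pages
    return checksum_count >= traverse_weight


# With a 1M-object remote (1000 pages, weighted to 20000), querying
# 10k checksums individually still wins; at 100k, traverse wins.
print(use_traverse(10_000, 1_000_000))   # False
print(use_traverse(100_000, 1_000_000))  # True
```

Lowering `TRAVERSE_WEIGHT_MULTIPLIER` shifts this break-even point, so traverse is chosen for smaller checksum counts than before.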

Metadata

Labels: performance (improvement over resource / time consuming tasks)
