
Commit 82a3afd

Improve performance of delegate_hashed_bins
Due to the performance overhead of deepcopy(), which roledb uses extensively, the delegate() function is rather slow. This is especially noticeable when calling delegate_hashed_bins() with a large number_of_bins.

To reduce the number of deepcopy() operations, we remove the direct calls to delegate() and instead use the newly added helper functions to replicate its behaviour, with only a single update to the roledb. On my laptop, this reduces a 16k-bin delegation from 1 hr 24 min to 33 s.

Ideally, once Issue #1005 has been properly fixed, this commit can be reverted and we can once again just call delegate() here.

Signed-off-by: Joshua Lock <[email protected]>
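To illustrate why a single roledb update helps, here is a toy model (not TUF code). It assumes a hypothetical `ToyRoleDB` that deep-copies on every read and write, loosely modelling the roledb behaviour described in issue #1005. When the parent role is updated once per bin, every call copies the whole delegations list, which keeps growing, so total copying grows quadratically with the number of bins. Queueing the roleinfos and writing them once copies that list only twice.

```python
import copy
import time


class ToyRoleDB:
    """Hypothetical stand-in for a role database that deep-copies on every
    read and write, loosely modelling the roledb behaviour in issue #1005."""

    def __init__(self):
        # 'unclaimed' is the delegating role named in the diff's comment.
        self._store = {'unclaimed': {'delegations': {'roles': []}}}
        self.deepcopy_calls = 0

    def _copy(self, obj):
        self.deepcopy_calls += 1
        return copy.deepcopy(obj)

    def get_roleinfo(self, rolename):
        return self._copy(self._store[rolename])

    def update_roleinfo(self, rolename, roleinfo):
        self._store[rolename] = self._copy(roleinfo)


def delegate_one_at_a_time(db, roleinfos):
    # Like calling delegate() per bin: one read and one write per role,
    # each copying the ever-growing list of delegated roles.
    for info in roleinfos:
        parent = db.get_roleinfo('unclaimed')
        parent['delegations']['roles'].append(info)
        db.update_roleinfo('unclaimed', parent)


def delegate_batched(db, roleinfos):
    # Like queueing roleinfos and updating the roledb once at the end.
    parent = db.get_roleinfo('unclaimed')
    parent['delegations']['roles'].extend(roleinfos)
    db.update_roleinfo('unclaimed', parent)


if __name__ == '__main__':
    infos = [{'name': 'bin-{}'.format(i), 'keyids': ['hypothetical-keyid'],
              'threshold': 1, 'terminating': False,
              'path_hash_prefixes': ['{:03x}'.format(i)]}
             for i in range(300)]
    for strategy in (delegate_one_at_a_time, delegate_batched):
        db = ToyRoleDB()
        start = time.perf_counter()
        strategy(db, infos)
        elapsed = time.perf_counter() - start
        print('{}: {} deep copies, {:.3f}s'.format(
            strategy.__name__, db.deepcopy_calls, elapsed))
```

Both strategies leave the same roles in the store. The batched version makes 2 deep copies regardless of the number of bins. The per-bin version makes 2 × 300 = 600, and each one copies a longer list than the one before. Absolute timings will vary by machine.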
1 parent 5ed1b5b commit 82a3afd


1 file changed: 47 additions, 3 deletions


tuf/repository_tool.py

Lines changed: 47 additions & 3 deletions
@@ -2546,14 +2546,58 @@ def delegate_hashed_bins(self, list_of_targets, keys_of_hashed_bins,
       hash_prefix = _get_hash(target_path.replace('\\', '/').lstrip('/'))[:prefix_length]
       ordered_roles[int(hash_prefix, 16) // bin_size]["target_paths"].append(target_path)

+    keyids, keydict = _keys_to_keydict(keys_of_hashed_bins)
+
+    # A queue of roleinfo's that need to be updated in the roledb
+    delegated_roleinfos = []
+
     for bin_rolename in ordered_roles:
+      # TODO: originally we just called self.delegate() for each item in this
+      # iteration. However, this is *extremely* slow when creating a large
+      # number of hashed bins, i.e. 16k as is recommended for PyPI usage in
+      # PEP 458: https://www.python.org/dev/peps/pep-0458/
+      # The source of the slowness is the interactions with the roledb, which
+      # causes several deep copies of roleinfo dictionaries:
+      # https://github.com/theupdateframework/tuf/issues/1005
+      # Once the underlying issues in #1005 are resolved, i.e. some combination
+      # of the intermediate and long-term fixes, we may simplify here by
+      # switching back to just calling self.delegate(), but until that time we
+      # queue roledb interactions and perform all updates to the roledb in one
+      # operation at the end of the iteration.
+
+      relative_paths = {}
+      targets_directory_length = len(self._targets_directory)
+      for path in bin_rolename['target_paths']:
+        relative_paths.update({path[targets_directory_length:]: {}})
+
       # Delegate from the "unclaimed" targets role to each 'bin_rolename'
-      self.delegate(bin_rolename['name'], keys_of_hashed_bins, [],
-          list_of_targets=bin_rolename['target_paths'],
-          path_hash_prefixes=bin_rolename['target_hash_prefixes'])
+      target = self._create_delegated_target(bin_rolename['name'], keyids,
+          paths=relative_paths)
+
+      roleinfo = {'name': bin_rolename['name'],
+          'keyids': keyids,
+          'threshold': 1,
+          'terminating': False,
+          'path_hash_prefixes': bin_rolename['target_hash_prefixes']}
+      delegated_roleinfos.append(roleinfo)
+
+      for key in keys_of_hashed_bins:
+        target.add_verification_key(key)
+
+      # Add the new delegation to the top-level 'targets' role object (i.e.,
+      # 'repository.targets()').
+      if self.rolename != 'targets':
+        self._parent_targets_object.add_delegated_role(bin_rolename['name'],
+            target)
+
+      # Add 'new_targets_object' to the 'targets' role object (this object).
+      self.add_delegated_role(bin_rolename['name'], target)
       logger.debug('Delegated from ' + repr(self.rolename) + ' to ' + repr(bin_rolename))


+    self._update_roledb_delegations(keydict, delegated_roleinfos)
+
+


   def add_target_to_bin(self, target_filepath, number_of_bins, fileinfo=None):
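The context lines at the top of the hunk assign each target to a bin by its hash prefix, and the new loop builds one roleinfo per bin before a single roledb update. The sketch below reproduces that bookkeeping in isolation, under stated assumptions. SHA-256 stands in for TUF's `_get_hash()`. `number_of_bins` is assumed to be a power of two, and `prefix_length` and `bin_size` are assumed to be derived as shown. `bin_details()`, `plan_hashed_bins()` and the `bin-<index>` role names are hypothetical illustrations, not TUF API.

```python
import hashlib


def bin_details(number_of_bins):
    """Return (prefix_length, bin_size) for a power-of-two number_of_bins."""
    # Hex digits needed to write the largest bin index, e.g. 16384 -> 0x3fff -> 4.
    prefix_length = len('{:x}'.format(number_of_bins - 1))
    # Each bin covers an equal, contiguous range of hash prefixes.
    bin_size = 16 ** prefix_length // number_of_bins
    return prefix_length, bin_size


def plan_hashed_bins(target_paths, number_of_bins, keyids):
    """Assign paths to bins and queue one roleinfo per bin (hypothetical helper)."""
    prefix_length, bin_size = bin_details(number_of_bins)
    bins = []
    for index in range(number_of_bins):
        first = index * bin_size
        bins.append({
            'name': 'bin-{}'.format(index),  # hypothetical naming scheme
            'target_paths': [],
            'target_hash_prefixes': ['{:0{}x}'.format(p, prefix_length)
                                     for p in range(first, first + bin_size)]})

    for path in target_paths:
        # Hash the path as it appears in metadata: '/' separators, no leading '/'.
        normalized = path.replace('\\', '/').lstrip('/')
        digest = hashlib.sha256(normalized.encode('utf-8')).hexdigest()
        hash_prefix = digest[:prefix_length]
        bins[int(hash_prefix, 16) // bin_size]['target_paths'].append(path)

    # Build every roleinfo first so the caller can write them all in one update.
    roleinfos = [{'name': b['name'], 'keyids': keyids, 'threshold': 1,
                  'terminating': False,
                  'path_hash_prefixes': b['target_hash_prefixes']}
                 for b in bins]
    return bins, roleinfos
```

For 16k bins (16384), `bin_details()` gives a 4-digit prefix and 4 prefixes per bin. A path whose hash starts with a given prefix always lands in the bin whose `path_hash_prefixes` lists that prefix. Returning the full list of roleinfos mirrors the commit's approach of deferring all roledb writes to one call at the end.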
