Commit 845f307

Jussi Kukkonen committed
ADR: Add New repository library design
Document the decision to build a repository library on top of Metadata API.

Signed-off-by: Jussi Kukkonen <[email protected]>
1 parent acb201d commit 845f307

2 files changed: +129 -0 lines changed

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# Repository library design built on top of Metadata API

## Context and Problem Statement

The Metadata API provides a modern Python API for accessing individual pieces
of metadata. It does not provide any wider context to help someone looking to
implement a TUF repository.

The legacy python-tuf implementation offers tools for this but suffers from
some issues (as do many other implementations):
* There is a _very_ large amount of code to maintain: repo.py,
  repository_tool.py and repository_lib.py alone are almost 7000 lines of code.
* The "library like" parts of the implementation do not form a good coherent
  API: methods routinely have a large number of arguments, code still depends
  on globals in a major way and the application (repo.py) still implements a
  lot of "repository code" itself.
* The "library like" parts of the implementation make decisions that look like
  application decisions. As an example, repository_tool loads _every_ metadata
  file in the repository: this is fine for a CLI that operates on a small
  repository but is unlikely to be a good choice for PyPI.

## Decision Drivers

* There is a consensus on removing the legacy code from python-tuf due to
  maintainability issues
* Metadata API makes modifying metadata far easier than the legacy code base:
  this makes significantly different designs possible
* Not providing a "repository library" (and leaving implementers on their own)
  may be a short-term solution because of the previous point, but it does seem
  like the project would benefit from some shared repository code and shared
  repository design
* Maintainability of new library code must be a top concern
* Allowing a wide range of repository implementations (from CLI tools to
  minimal in-memory implementations to large scale applications like Warehouse)
  would be good: unfortunately these can have wildly differing requirements

## Considered Options

1. No repository packages
2. repository_tool-like API
3. Minimal repository abstraction

## Decision Outcome

Option 3: Minimal repository abstraction

While option 1 might be used temporarily, the goal should be to implement a
minimal repository abstraction as soon as possible: this should give the
project a path forward where the maintenance burden is reasonable and results
should be usable very soon. The python-tuf repository functionality can be
later extended as ideas are experimented with in upstream projects and in
python-tuf example code.

The concept is still unproven but validating the design should be
straightforward: the decision could be re-evaluated in a few months if not in
weeks.

## Pros and Cons of the Options

### No repository packages

Metadata API makes editing the repository content vastly simpler. There are
already repository implementations built with it (RepositorySimulator in
python-tuf tests is an in-memory implementation, while
repository-editor-for-tuf is an external CLI tool) so clearly a repository
library is not an absolute requirement.

Not providing repository packages in python-tuf does mean that external
projects could experiment and create implementations without adding to the
maintenance burden of python-tuf. This would be the easiest way to iterate on
many different designs and hopefully find good ones in the end.

That said, there are some tricky parts of repository maintenance (e.g.
initialization, snapshot update, hashed bin management) that would benefit from
having a canonical implementation. Likewise, a well-designed library could make
some repeated actions (e.g. version bumps, expiry updates, signing) much easier
to manage.

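
As an illustration of why a canonical implementation could help, a snapshot
update might be sketched roughly as follows. This is a minimal sketch using
plain Python data instead of the Metadata API: `Snapshot` and
`update_snapshot()` are hypothetical names, and the only rule enforced here is
that a listed metadata version never decreases, since clients would treat a
decrease as a potential rollback.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical stand-in for snapshot metadata: its own version plus the
    version of every targets metadata file it lists."""

    version: int = 1
    meta: dict[str, int] = field(default_factory=dict)


def update_snapshot(snapshot: Snapshot, current: dict[str, int]) -> bool:
    """Record the current targets metadata versions in the snapshot.

    Returns True if the snapshot changed (its version is then bumped) and
    False if there was nothing new to publish. Raises ValueError if a
    listed version would decrease.
    """
    for name, version in current.items():
        if version < snapshot.meta.get(name, 0):
            raise ValueError(f"{name}: version would decrease")

    if snapshot.meta == current:
        return False  # no change: do not bump the snapshot version

    snapshot.meta = dict(current)
    snapshot.version += 1
    return True


snapshot = Snapshot()
changed = update_snapshot(snapshot, {"targets.json": 1})
```

Every repository implementation needs some form of this logic, which is the
argument for sharing it instead of rewriting it per application.
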
### repository_tool-like API

It won't be possible to support the repository_tool API as it is, but a similar
one would certainly be an option.

This would likely be the easiest upgrade path for any repository_tool users out
there. The implementation would not be a huge amount of work as Metadata API
makes many things easier.

However, repository_tool (and parts of repo.py) are not a great API. It is
likely that a similar API would suffer from some of the same issues: it might
end up being a substantial amount of code that is only a good fit for one
application.

### Minimal repository abstraction

python-tuf could define a tiny repository API that
* provides carefully selected core functionality (like core snapshot update)
  but...
* does not implement all repository actions itself, instead it makes it easy
  for the application code to do them
* leaves application details to specific implementations (examples of decisions
  a library should not always decide: "are targets stored with the repo?",
  "which versions of metadata are stored?", "when to load metadata?", "when to
  unload metadata?", "when to bump metadata version?", "what is the new expiry
  date?", "which targets versions should be part of new snapshot?")

python-tuf could also provide one or more implementations of this abstraction
as examples -- this could include a repo.py- or repository_tool-like
implementation.

This could be a compromise that allows:
* low maintenance burden on python-tuf: initial library could be tiny
* sharing the important, canonical parts of a TUF repository implementation
* ergonomic repository modification, meaning most actions do not have to be in
  the core code
* very different repository implementations using the same core code and the
  same abstract API

The approach does have some downsides:
* It is not a drop-in replacement for repository_tool or repo.py
* A prototype has been implemented (see Links below) but the concept is still
  unproven

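
The split between library and application described in this section could
look roughly like the sketch below. All names (`Repository`, `open()`,
`close()`, `edit()`, `InMemoryRepository`) are assumptions made for
illustration, and metadata is modelled as plain dictionaries rather than
Metadata API objects. The library part only provides the `edit()` helper; the
application decides where metadata is stored, which versions are kept, and
when version and expiry change.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator
from contextlib import contextmanager
from copy import deepcopy
from datetime import datetime, timedelta, timezone


class Repository(ABC):
    """Hypothetical minimal abstraction: storage and policy live in subclasses."""

    @abstractmethod
    def open(self, role: str) -> dict:
        """Return the current metadata for `role`, ready for editing."""

    @abstractmethod
    def close(self, role: str, md: dict) -> None:
        """Finalize edited metadata (version, expiry, ...) and store it."""

    @contextmanager
    def edit(self, role: str) -> Iterator[dict]:
        """Shared helper: open metadata, let the caller modify it, close it.

        If the caller raises inside the `with` block, close() is not called
        and the edit is discarded.
        """
        md = self.open(role)
        yield md
        self.close(role, md)


class InMemoryRepository(Repository):
    """Example application: keeps every metadata version in memory."""

    def __init__(self) -> None:
        self.history: dict[str, list[dict]] = {}

    def open(self, role: str) -> dict:
        versions = self.history.get(role)
        if versions:
            return deepcopy(versions[-1])  # never mutate stored versions
        return {"version": 0, "expires": None, "targets": {}}

    def close(self, role: str, md: dict) -> None:
        # Application decisions: always bump the version, expire in one day.
        md["version"] += 1
        md["expires"] = datetime.now(timezone.utc) + timedelta(days=1)
        self.history.setdefault(role, []).append(md)


repo = InMemoryRepository()
with repo.edit("targets") as targets:
    targets["targets"]["file.txt"] = {"length": 5}
```

A CLI tool or a large service would implement `open()` and `close()` very
differently (files on disk, a database, lazy loading) while sharing `edit()`
and any other core helpers.
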
## Links

* [Design document for minimal repository abstraction](https://docs.google.com/document/d/1YY83J4ihztsi1Qv0dJ22EcqND8dT80AGTduwgh0trpY)
* [Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)

docs/adr/index.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ This log lists the architectural decisions for tuf.
- [ADR-0008](0008-accept-unrecognised-fields.md) - Accept metadata that includes unrecognized fields
- [ADR-0009](0009-what-is-a-reference-implementation.md) - Primary purpose of the reference implementation
+- [ADR-0010](0010-repository-library-design.md) - Repository library design built on top of Metadata API

<!-- adrlogstop -->
