Commit b2d8572

Merge pull request #1693 from jku/add-repo-lib-design-adr
ADR: Add New repository library design
2 parents 942e6d2 + f6ede42 commit b2d8572

5 files changed

+363
-0
lines changed
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
1+
# Repository library design built on top of Metadata API
2+
3+
4+
## Context and Problem Statement
5+
6+
The Metadata API provides a modern Python API for accessing individual pieces
of metadata. It does not, however, offer any broader help to someone looking to
implement a TUF repository.
9+
10+
The legacy python-tuf implementation offers tools for this but suffers from
11+
some issues (as do many other implementations):
12+
* There is a _very_ large amount of code to maintain: repo.py,
  repository_tool.py and repository_lib.py alone are almost 7000 lines of code.
* The "library like" parts of the implementation do not form a good coherent
  API: methods routinely have a large number of arguments, code still depends
  on globals in a major way and the application (repo.py) still implements a
  lot of "repository code" itself.
* The "library like" parts of the implementation make decisions that look like
  application decisions. As an example, repository_tool loads _every_ metadata
  file in the repository: this is fine for a CLI that operates on a small
  repository but is unlikely to be a good choice for a large scale server.
22+
23+
24+
## Decision Drivers
25+
26+
* There is a consensus on removing the legacy code from python-tuf due to
27+
maintainability issues
28+
* Metadata API makes modifying metadata far easier than legacy code base: this
29+
makes significantly different designs possible
30+
* Not providing a "repository library" (and leaving implementers on their own)
31+
may be a short term solution because of the previous point, but to make
32+
adoption easier and to help adopters create safe implementations the project
33+
would benefit from some shared repository code and a shared repository design
34+
* Maintainability of new library code must be a top concern
35+
* Allowing a wide range of repository implementations (from CLI tools to
36+
minimal in-memory implementations to large scale application servers)
37+
would be good: unfortunately these can have wildly differing requirements
38+
39+
40+
## Considered Options
41+
42+
1. No repository packages
43+
2. repository_tool -like API
44+
3. Minimal repository abstraction
45+
46+
47+
## Decision Outcome
48+
49+
Option 3: Minimal repository abstraction
50+
51+
While option 1 might be used temporarily, the goal should be to implement a
52+
minimal repository abstraction as soon as possible: this should give the
53+
project a path forward where the maintenance burden is reasonable and results
54+
should be usable very soon. The python-tuf repository functionality can be
55+
later extended as ideas are experimented with in upstream projects and in
56+
python-tuf example code.
57+
58+
The concept is still unproven but validating the design should be
straightforward: the decision could be re-evaluated in a few months if not in
weeks.
60+
61+
62+
## Pros and Cons of the Options
63+
64+
### No repository packages
65+
66+
Metadata API makes editing the repository content vastly simpler. There are
67+
already repository implementations built with it[^1] so clearly a repository
68+
library is not an absolute requirement.
69+
70+
Not providing repository packages in python-tuf does mean that external
71+
projects could experiment and create implementations without adding to the
72+
maintenance burden of python-tuf. This would be the easiest way to iterate many
73+
different designs and hopefully find good ones in the end.
74+
75+
That said, there are some tricky parts of repository maintenance (e.g.
76+
initialization, snapshot update, hashed bin management) that would benefit from
77+
having a canonical implementation, both for easier adoption of python-tuf and
78+
as a reference for other implementations. Likewise, a well designed library
79+
could make some repeated actions (e.g. version bumps, expiry updates, signing)
80+
much easier to manage.
81+
82+
### repository_tool -like API
83+
84+
It won't be possible to support the repository_tool API as it is but a similar
85+
one would certainly be an option.
86+
87+
This would likely be the easiest upgrade path for any repository_tool users out
88+
there. The implementation would not be a huge amount of work as Metadata API
89+
makes many things easier.
90+
91+
However, repository_tool (and parts of repo.py) are not a great API. A similar
API would likely suffer from some of the same issues: it might end up being a
substantial amount of code that is only a good fit for one application.
94+
95+
### Minimal repository abstraction
96+
97+
python-tuf could define a tiny repository API that
98+
* provides carefully selected core functionality (like core snapshot update)
99+
* does not implement all repository actions itself, instead it makes it easy
100+
for the application code to do them
101+
* leaves application details to specific implementations (examples of decisions
102+
a library should not always decide: "are targets stored with the repo?",
103+
"which versions of metadata are stored?", "when to load metadata?", "when to
104+
unload metadata?", "when to bump metadata version?", "what is the new expiry
105+
date?", "which targets versions should be part of new snapshot?")
106+
107+
python-tuf could also provide one or more implementations of this abstraction
108+
as examples -- this could include a _repo.py_- or _repository_tool_-like
109+
implementation.
110+
111+
This could be a compromise that allows:
112+
* low maintenance burden on python-tuf: initial library could be tiny
113+
* sharing the important, canonical parts of a TUF repository implementation
114+
* ergonomic repository modification, meaning most actions do not have to be in
115+
the core code
116+
* very different repository implementations using the same core code and the
117+
same abstract API
118+
119+
The approach does have some downsides:
120+
* it is not a drop-in replacement for repository_tool or repo.py
* a prototype has been implemented (see Links below) but the concept is still
  unproven
123+
124+
More details in [Design document](../repository-library-design.md).
125+
126+
## Links
127+
* [Design document for minimal repository abstraction](../repository-library-design.md)
128+
* [Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)
129+
130+
131+
[^1]:
    [RepositorySimulator](https://github.com/theupdateframework/python-tuf/blob/develop/tests/repository_simulator.py)
    in python-tuf tests is an in-memory implementation, while
    [repository-editor-for-tuf](https://github.com/vmware-labs/repository-editor-for-tuf)
    is an external command line repository maintenance tool.
136+

docs/adr/index.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ This log lists the architectural decisions for tuf.

 - [ADR-0008](0008-accept-unrecognised-fields.md) - Accept metadata that includes unrecognized fields
 - [ADR-0009](0009-what-is-a-reference-implementation.md) - Primary purpose of the reference implementation
+- [ADR-0010](0010-repository-library-design.md) - Repository library design built on top of Metadata API

 <!-- adrlogstop -->

docs/repository-library-design.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
1+
# Python-tuf repository API proposal: _minimal repository abstraction_
2+
3+
This is an attachment to ADR 10: _Repository library design built on top of
4+
Metadata API_, and documents the design proposal in Dec 2021.
5+
6+
## Design principles
7+
8+
The primary goals of this repository library design are:
1. Support the full range of repository implementations: from command line
   “repository editing” tools to production repositories like PyPI
2. Provide canonical solutions for the difficult repository problems but avoid
   making implementation decisions
3. Keep python-tuf maintenance burden in mind: less is more
14+
15+
Why does this design look so different from both legacy python-tuf code and
16+
other implementations?
17+
* Most existing implementations are focused on a specific use case (typically a
  command line application): this is a valid design choice but works against
  goal #1
* The problem space contains many application decisions. Many implementations
  solve this by creating functions with 15 arguments: this design tries to find
  another way (#2)
* The Metadata API makes modifying individual pieces of metadata simpler. This,
  combined with good repository API design, should enable more variance in
  where things are implemented: the repository library does not have to
  implement every little detail as we can safely let specific implementations
  handle things (see goal #3)
* This variance means we can start by implementing a minimal design: as
  experience from implementations is collected, we can then move implementation
  details into the library (goals #2, #3)
31+
32+
## Design
33+
34+
### Application and library components
35+
36+
![Design: Application and library components](repository-library-design-ownership.jpg)
37+
38+
The design expects a fully functional repository application to contain code at
39+
three levels:
40+
* Repository library (abstract classes that are part of python-tuf)
  * The Repository abstract class provides an ergonomic abstract metadata
    editing API for all code levels to use. It also provides implementations
    for some core edit actions like _snapshot update_.
  * A small amount of related functionality is also provided (private key
    management API, maybe repository validation).
  * The library is very small: possibly a few hundred lines of code.
* Concrete Repository implementation (typically part of application code,
  implements interfaces provided by the repository API in python-tuf)
  * Contains the “application level” decisions that the Repository abstraction
    requires to operate: examples of application decisions include
    * _When should “targets” metadata next expire when it is edited?_
    * _What is the current “targets” metadata version? Where do we load it
      from?_
    * _Where to store current “targets” after editing? Should the previous
      version be deleted from storage?_
* Actual application
  * Uses the Repository API to do the repository actions it needs to do
58+
59+
For context here’s a trivial example showing what “ergonomic editing” means --
60+
this key-adding code could be in the application (or later, if common patterns
61+
are found, in the python-tuf library):
62+
63+
```python
with repository.edit("targets") as targets:
    # adds a key for role1 (as an example, arbitrary edits are allowed)
    targets.add_key("role1", key)
```
68+
69+
This code loads current targets metadata for editing, adds the key to a role,
70+
and handles version and expiry bumps before persisting the new targets version.
71+
The reason for the context manager style is that it manages two things
72+
simultaneously:
73+
* It hides the complexity of loading and persisting metadata, and of updating
  expiry and versions, from the editing code (by putting it in the repository
  implementation that is defined in python-tuf but implemented by the
  application)
* It still allows completely arbitrary edits on the metadata in question: the
  library does not need to anticipate what the application wants to do, and on
  the other hand the library can still provide e.g. snapshot functionality
  without knowing about the application decisions mentioned in the previous
  point.
81+
82+
Other designs do not seem to manage both of these.
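
To make the context manager idea concrete, here is a minimal in-memory sketch. It does not use the real Metadata API: `ToyMetadata` and `ToyRepository` are hypothetical stand-ins, and the seven-day expiry period and "always bump the version" policy are assumed application decisions, not part of the proposal.

```python
import copy
from contextlib import contextmanager
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator, List


@dataclass
class ToyMetadata:
    """Hypothetical stand-in for a signed metadata payload."""

    version: int = 0
    expires: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    keys: Dict[str, List[str]] = field(default_factory=dict)

    def add_key(self, role: str, key: str) -> None:
        self.keys.setdefault(role, []).append(key)


class ToyRepository:
    """In-memory repository: storage and expiry policy are application choices."""

    def __init__(self) -> None:
        self._store: Dict[str, ToyMetadata] = {}

    def _load(self, role: str) -> ToyMetadata:
        # Return a copy so that abandoned edits never reach the store
        return copy.deepcopy(self._store.get(role, ToyMetadata()))

    def _save(self, role: str, md: ToyMetadata) -> None:
        self._store[role] = md

    @contextmanager
    def edit(self, role: str) -> Iterator[ToyMetadata]:
        md = self._load(role)
        yield md  # the caller makes arbitrary edits here
        # Reached only if the with-block finished without raising
        md.version += 1
        md.expires = datetime.now(timezone.utc) + timedelta(days=7)
        self._save(role, md)


repository = ToyRepository()
with repository.edit("targets") as targets:
    targets.add_key("role1", "key-id-1")
```

The editing code never touches versions, expiry or storage. If the `with` block raises, nothing after the `yield` runs, so the half-edited copy is simply discarded.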
83+
84+
### How the components are used
85+
86+
![Design: How components are used](repository-library-design-usage.jpg)
87+
88+
The core idea here is that because editing is ergonomic enough, when new
89+
functionality (like “developer uploads new targets”) is added, _it can be added
90+
at any level_: the application might add a `handle_new_target_files()` method
91+
that adds a bunch of targets into the metadata, but one of the previous layers
92+
could offer that as a helper function as well: code in both cases would look
93+
similar as it would use the common editing interface.
94+
95+
The proposed design is purposefully spartan in that the library provides
very few high-level actions (the prototype only provided _sign_ and
_snapshot_): everything else is left to the implementer at this point. As we
gain experience of common usage patterns we can start providing other features
as well.
100+
101+
There are a few additional items worth mentioning:
102+
* Private key management: the Repository API should come with a “keyring
103+
abstraction” -- a way for the application to provide roles’ private keys for
104+
the Repository to use. Some implementations could be provided as well.
105+
* Validating repository state: the design is very much focused on enabling
106+
efficient editing of individual metadata. Implementations are also likely to
107+
be interested in validating (after some edits) that the repository is correct
108+
according to client workflow and that it contains the expected changes. The
109+
Repository API should provide some validation, but we should recognise that
110+
validation may be implementation specific.
111+
* Improved metadata editing: There are a small number of improvements that
112+
could be made to metadata editing. These do not necessarily need to be part
113+
of the repository API: they could be part of Metadata API as well
114+
115+
It would make sense for python-tuf to ship with at least one concrete
Repository implementation: possibly a repo.py lookalike. This implementation
should not be part of the library but an example.
118+
119+
## Details
120+
121+
This section includes links to a Proof of Concept implementation in
122+
[repository-editor-for-tuf](https://github.com/vmware-labs/repository-editor-for-tuf/):
123+
it should not be seen as the exact proposed API but a prototype of the ideas.
124+
125+
The ideas in this document map to POC components like this:
126+
127+
| Concept | repository-editor-for-tuf implementation |
|-|-|
| Repository API | [librepo/repo.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/librepo/repo.py), [librepo/keys.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/librepo/repo.py) |
| Example of repository implementation | [git_repo.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/git_repo.py) |
| Application code | [cli.py (command line app)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/cli.py), [keys_impl.py (keyring implementation)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/keys_impl.py) |
| Repository validation | [verifier.py (very rough, not intended for python-tuf)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/verifier.py) |
| Improved Metadata editing | [helpers.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/helpers.py) |
134+
135+
136+
### Repository API
137+
138+
Repository itself is a minimal abstract class: the value of this class is in
defining the abstract method signatures (most importantly `_load()`, `_save()`,
`edit()`) that enable ergonomic metadata editing. The Repository class in this
proposal includes concrete implementations only for the following:
142+
* `sign()` -- signing without editing metadata payload
143+
* `snapshot()` -- updates snapshot and timestamp metadata based on given input.
144+
Note that a concrete Repository implementation could provide an easier to use
145+
snapshot that does not require input (see example in git_repo.py)
146+
147+
More concrete method implementations (see cli.py for examples) could be added
148+
to Repository itself but none seem essential at this point.
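
As a rough illustration of what "updates snapshot and timestamp metadata based on given input" could mean, the sketch below works on toy data classes rather than real metadata. The names `ToySnapshot`, `ToyTimestamp` and the `current_versions` argument are assumptions made for this example; the prototype's actual signature may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToySnapshot:
    """Hypothetical snapshot payload: maps "ROLE.json" to a version."""

    version: int = 1
    meta: Dict[str, int] = field(default_factory=dict)


@dataclass
class ToyTimestamp:
    """Hypothetical timestamp payload: records the snapshot version."""

    version: int = 1
    snapshot_version: int = 1


def snapshot(
    snap: ToySnapshot, ts: ToyTimestamp, current_versions: Dict[str, int]
) -> bool:
    """Update snapshot and timestamp from the given role versions.

    Returns True if the snapshot changed, False if it was already current.
    """
    new_meta = {f"{role}.json": ver for role, ver in current_versions.items()}
    if new_meta == snap.meta:
        return False  # nothing to do: avoid a needless version bump

    snap.meta = new_meta
    snap.version += 1
    # Timestamp must always point at the newest snapshot
    ts.snapshot_version = snap.version
    ts.version += 1
    return True
```

The caller decides which role versions belong in the new snapshot; the function only applies that decision consistently to both pieces of metadata.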
149+
150+
The current prototype API defines five abstract methods that take care of
151+
access to metadata storage, expiry updates, version updates and signing. These
152+
must be implemented in the concrete implementation:
153+
154+
* **keyring()**: A property that returns the private key mapping that should be
155+
used for signing.
156+
157+
* **_load()**: Loads metadata from storage or cache. Is used by edit() and
158+
sign().
159+
160+
* **_save()**: Signs and persists metadata in cache/storage. Is used by edit()
161+
and sign().
162+
163+
* **edit()**: The ContextManager that enables ergonomic metadata
164+
editing by handling expiry and version number management.
165+
166+
* **init_role()**: Initializes new metadata, handling expiry and version
  number. (_init_role is in a way a special case of edit and should potentially
  be integrated there_).
169+
170+
The API requires a “Keyring” abstraction that the repository code can use to
look up a set of signers for a specific role. Specific implementations of
Keyring could include a file-based keyring for testing, env-var keyring for CI
use, etc. Some implementations should be provided in the python-tuf code base
and more could be implemented in applications.
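
A keyring can be as simple as a mapping from role name to a list of signers. The sketch below is an assumption-laden toy, not the prototype's API: `ToySigner` uses an HMAC from the standard library purely as a placeholder for a real signature scheme, and the `TOY_TUF_KEY_` environment variable prefix is invented for this example.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Mapping


class ToySigner:
    """Hypothetical signer; HMAC stands in for a real signature scheme."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def sign(self, payload: bytes) -> str:
        return hmac.new(self._secret, payload, hashlib.sha256).hexdigest()


def dict_keyring(secrets: Mapping[str, List[bytes]]) -> Dict[str, List[ToySigner]]:
    """Keyring built from in-memory secrets, e.g. for tests."""
    return {role: [ToySigner(s) for s in vals] for role, vals in secrets.items()}


def env_keyring(
    roles: List[str], prefix: str = "TOY_TUF_KEY_"
) -> Dict[str, List[ToySigner]]:
    """Keyring built from environment variables, e.g. for CI use.

    Roles without a matching variable are left out of the keyring.
    """
    keyring: Dict[str, List[ToySigner]] = {}
    for role in roles:
        value = os.environ.get(prefix + role.upper())
        if value is not None:
            keyring[role] = [ToySigner(value.encode())]
    return keyring
```

Because both functions return the same mapping shape, repository code that signs with `keyring[role]` does not need to know where the keys came from.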
175+
176+
_Prototype status: Prototype Repository and Keyring abstractions exist in
177+
librepo/repo.py._
178+
179+
### Example concrete Repository implementation
180+
181+
The design decisions that the included example `GitRepository` makes are not
182+
important but provide an example of what is possible:
183+
* Metadata versions are stored in files in git, with filenames that allow
184+
serving the metadata directory as is over HTTP
185+
* Version bumps are made based on git status (so edits in staging area only
186+
bump version once)
187+
* “Current version” when loading metadata is decided based on filenames on disk
188+
* Files are removed once they are no longer part of the snapshot (to keep
189+
directory uncluttered)
190+
* Expiry times are decided based on an application specific metadata field
191+
* Private keys can be stored in a file or in environment variables (for CI use)
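
As one example of such a decision, deciding the "current version" from filenames could look roughly like this. The sketch assumes files are named `VERSION.ROLE.json`; `current_version` is a hypothetical helper written for this document and is not taken from git_repo.py.

```python
import re
from typing import Iterable, Optional

# Matches names such as "3.targets.json": version, then role name
_VERSIONED = re.compile(r"^(\d+)\.(.+)\.json$")


def current_version(filenames: Iterable[str], role: str) -> Optional[int]:
    """Return the highest version found for role, or None if there is none."""
    versions = []
    for name in filenames:
        match = _VERSIONED.match(name)
        if match and match.group(2) == role:
            # Compare as integers so that 10 sorts after 2
            versions.append(int(match.group(1)))
    return max(versions, default=None)
```

An application would typically pass in the directory listing of its metadata directory, for example `current_version(os.listdir(metadata_dir), "targets")`.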
192+
193+
Note that GitRepository implementation is significantly larger than the
194+
Repository interface -- but all of the complexity in GitRepository is really
195+
related to the design decisions made there.
196+
197+
_Prototype status: The GitRepository example exists in git_repo.py._
198+
199+
### Validating repository state
200+
201+
This is mostly undesigned but something built on top of TrustedMetadataSet
(currently an ngclient component) might work as a way to easily check specific
aspects like:
* Is top-level metadata valid according to the client workflow?
* Is a role included in the snapshot and the delegation tree?
206+
207+
It’s likely that different implementations will have different needs though: a
command line app for small repos might want to validate by loading all metadata
into memory, but a server application hosting tens of thousands of pieces of
metadata is unlikely to do so.
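
The second check above can be illustrated on plain dictionaries. This is a toy sketch, not something built on TrustedMetadataSet: the `snapshot_meta` and `delegations` structures and both function names are assumptions made for the example, and no signatures are verified.

```python
from typing import Dict, List, Set


def delegated_roles(
    delegations: Dict[str, List[str]], start: str = "targets"
) -> Set[str]:
    """Collect every role reachable from start via delegations."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        role = stack.pop()
        if role in seen:
            continue  # guards against delegation cycles
        seen.add(role)
        stack.extend(delegations.get(role, []))
    return seen


def check_role(
    role: str, snapshot_meta: Dict[str, int], delegations: Dict[str, List[str]]
) -> List[str]:
    """Return a list of problems found for role (empty if none)."""
    problems = []
    if f"{role}.json" not in snapshot_meta:
        problems.append(f"{role} is not in snapshot")
    if role not in delegated_roles(delegations):
        problems.append(f"{role} is not in delegation tree")
    return problems
```

A small command line tool could run such a check for every role after each edit, whereas a large server would more likely check only the roles it just changed.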
211+
212+
_Prototype status: A very rough implementation exists in verifier.py: this is
unlikely to be very useful._
214+
215+
### Improved metadata editing
216+
217+
Currently the identified improvement areas are:
218+
* Metadata initialization: this could potentially be improved by adding
219+
default argument values to Metadata API constructors
220+
* Modifying and looking up data about roles in delegating metadata
  (root/targets): the two do similar things but do not have identical APIs.
  This may be a very specific use case and not interesting for some
  applications
224+
225+
_Prototype status: Some potential improvements have been collected in
226+
helpers.py_
