
Commit bcab2e9

author
Jussi Kukkonen
committed
Include the design doc in repo
* Also add some new diagrams in the design doc
* Fix some issues in ADR

Signed-off-by: Jussi Kukkonen <[email protected]>
1 parent 845f307 commit bcab2e9

4 files changed

+218
-16
lines changed

docs/adr/0010-repository-library-design.md

Lines changed: 24 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -18,7 +18,7 @@ some issues (as do many other implementations):
1818
* The "library like" parts of the implementation make decisions that look like
1919
application decisions. As an example, repository_tool loads _every_ metadata
2020
file in the repository: this is fine for CLI that operates on a small
21-
repository but is unlikely to be a good choice for PyPI.
21+
repository but is unlikely to be a good choice for a large scale server.
2222

2323

2424
## Decision Drivers
@@ -28,12 +28,12 @@ some issues (as do many other implementations):
2828
* Metadata API makes modifying metadata far easier than legacy code base: this
2929
makes significantly different designs possible
3030
* Not providing a "repository library" (and leaving implementers on their own)
31-
may be a short term solution because of the previous point, but it does seem
32-
like the project would benefit from some shared repository code and shared
33-
repository design
31+
may be a short term solution because of the previous point, but to make
32+
adoption easier and to help adopters create safe implementations the project
33+
would benefit from some shared repository code and a shared repository design
3434
* Maintainability of new library code must be a top concern
3535
* Allowing a wide range of repository implementations (from CLI tools to
36-
minimal in-memory implementations to large scale applications like Warehouse)
36+
minimal in-memory implementations to large scale application servers)
3737
would be good: unfortunately these can have wildly differing requirements
3838

3939

@@ -64,9 +64,7 @@ forward: decision could be re-evaluated in a few months if not in weeks.
6464
### No repository packages
6565

6666
Metadata API makes editing the repository content vastly simpler. There are
67-
already repository implementations built with it (RepositorySimulator in
68-
python-tuf tests is an in-memory implementation, while
69-
repository-editor-for-tuf is an external CLI tool) so clearly a repository
67+
already repository implementations built with it[^1] so clearly a repository
7068
library is not an absolute requirement.
7169

7270
Not providing repository packages in python-tuf does mean that external
@@ -76,9 +74,10 @@ different designs and hopefully find good ones in the end.
7674

7775
That said, there are some tricky parts of repository maintenance (e.g.
7876
initialization, snapshot update, hashed bin management) that would benefit from
79-
having a canonical implementation. Likewise, a well designed library could make
80-
some repeated actions (e.g. version bumps, expiry updates, signing) much easier
81-
to manage.
77+
having a canonical implementation, both for easier adoption of python-tuf and
78+
as a reference for other implementations. Likewise, a well designed library
79+
could make some repeated actions (e.g. version bumps, expiry updates, signing)
80+
much easier to manage.
8281

8382
### repository_tool -like API
8483

@@ -97,8 +96,7 @@ being a substantial amount of code that is only a good fit for one application.
9796

9897
python-tuf could define a tiny repository API that
9998
* provides carefully selected core functionality (like core snapshot update)
100-
but...
101-
* does not implement all repository actions itself, instead i makes it easy
99+
* does not implement all repository actions itself, instead it makes it easy
102100
for the application code to do them
103101
* leaves application details to specific implementations (examples of decisions
104102
a library should not always decide: "are targets stored with the repo?",
@@ -107,7 +105,7 @@ python-tuf could define a tiny repository API that
107105
date?", "which targets versions should be part of new snapshot?")
108106

109107
python-tuf could also provide one or more implementations of this abstraction
110-
as examples -- this could include a repo.py- or repository_tool-like
108+
as examples -- this could include a _repo.py_- or _repository_tool_-like
111109
implementation.
112110

113111
This could be a compromise that allows:
@@ -123,6 +121,16 @@ The approach does have some downsides:
123121
* A prototype has been implemented (see Links below) but the concept is still
124122
unproven
125123

124+
More details in [Design document](../repository-library-design.md).
125+
126126
## Links
127-
[Design document for minimal repository abstraction](https://docs.google.com/document/d/1YY83J4ihztsi1Qv0dJ22EcqND8dT80AGTduwgh0trpY)
128-
[Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)
127+
* [Design document for minimal repository abstraction](../repository-library-design.md)
128+
* [Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)
129+
130+
131+
[^1]:
132+
[RepositorySimulator](https://github.com/theupdateframework/python-tuf/blob/develop/tests/repository_simulator.py)
133+
in python-tuf tests is an in-memory implementation, while
134+
[repository-editor-for-tuf](https://github.com/vmware-labs/repository-editor-for-tuf)
135+
is an external command line repository maintenance tool.
136+
[Two binary image files added (49 KB and 42 KB); contents not shown in the diff]

docs/repository-library-design.md

Lines changed: 194 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,194 @@
1+
# Python-tuf repository API proposal: _minimal repository abstraction_
2+
3+
This is an attachment to ADR 10: _Repository library design built on top of
4+
Metadata API_, and documents the design proposal in Dec 2020.
5+
6+
## Design principles
7+
8+
Primary goals of this repository library design are
9+
1. Support full range of repository implementations: from command line
10+
“repository editing” tools to production repositories like PyPI
11+
2. Provide canonical solutions for the difficult repository problems but avoid
12+
making implementation decisions
13+
3. Keep python-tuf maintenance burden in mind: less is more
14+
15+
Why does this design look so different from both legacy python-tuf code and
16+
other implementations?
17+
* Most existing implementations are focused on a specific use case (typically a
18+
command line application): this is a valid design choice but severely limits
19+
goal #1
20+
* The problem space contains many application decisions. Many implementations
21+
solve this by creating functions with 15 arguments: this design tries to find
22+
another way (#2)
23+
* The Metadata API makes modifying individual pieces of metadata simpler. This,
24+
combined with good repository API design, should enable more variance in
25+
where things are implemented: The repository library does not have to
26+
implement every little detail as we can safely let specific implementations
27+
handle things, see goal #3
28+
* This variance means we can start by implementing a minimal design: as
29+
experience from implementations is collected, we can then move implementation
30+
details into the library (goals #2, #3)
31+
32+
## Design
33+
34+
![Design: Application and library components](repository-library-design-ownership.jpg)
35+
36+
The design expects a fully functional repository application to contain code at
37+
three levels:
38+
* Repository library (abstract classes that are part of python-tuf)
39+
* The Repository abstract class provides an ergonomic metadata editing API
40+
for all code levels to use. It also implements some core edit actions like
41+
snapshot update
42+
* A small amount of related functionality is also provided (private key
43+
management API, maybe repository validation)
44+
* The library itself is very small: possibly a few hundred lines of code
45+
* Concrete Repository implementation (typically part of application code,
46+
implements interfaces provided by the repository API in python-tuf)
47+
* Contains the “application level” decisions that the Repository abstraction
48+
requires to operate: examples of application decisions include
49+
* _When should “targets” metadata next expire when it is edited?_
50+
* _What is the current “targets” metadata version? Where do we load it
51+
from?_
52+
* _Where to store current “targets” after editing? Should the previous
53+
version be deleted from storage?_
54+
* Actual application
55+
* Uses the Repository API to do the repository actions it needs to do
56+
57+
For context here’s a trivial example showing what “ergonomic editing” means --
58+
this key-adding code could be in the application or in the python-tuf library:
59+
60+
```python
61+
with repository.edit("targets") as targets:
62+
    # adds a key for role1 (as an example, arbitrary edits are allowed)
63+
    targets.add_key("role1", key)
64+
```
65+
66+
This code loads current targets metadata for editing, adds the key to a role,
67+
and handles version and expiry bumps before persisting the new targets version.
68+
The reason for the context manager style is that it manages two things
69+
simultaneously:
70+
* Hides the complexity of loading and persisting metadata, and updating expiry
71+
and versions from the editing code (by putting it in the repository
72+
implementation – which may still be provided by the application)
73+
* Still allows completely arbitrary edits on the metadata in question: the
74+
library does not need to anticipate what the application wants to do, and the
75+
library can still provide e.g. snapshot functionality without knowing about
76+
the application decisions mentioned in the previous point.
77+
78+
Other designs do not seem to manage both of these.
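
To make the two points above concrete, here is a minimal sketch of how an `edit()` context manager can sit on top of abstract `_load()` and `_save()` methods. It is not the prototype's code: plain dicts stand in for Metadata API objects, and `InMemoryRepository` and its version-bump policy are hypothetical illustrations of the application decisions.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from copy import deepcopy
from typing import Dict, Iterator, List


class Repository(ABC):
    """Hypothetical stand-in for the abstract Repository class."""

    @abstractmethod
    def _load(self, role: str) -> dict:
        """Return the current metadata for a role (application decision)."""

    @abstractmethod
    def _save(self, role: str, md: dict) -> None:
        """Bump version/expiry as the application sees fit, then persist."""

    @contextmanager
    def edit(self, role: str) -> Iterator[dict]:
        md = self._load(role)
        yield md  # the caller makes arbitrary edits here
        self._save(role, md)  # not reached if the edit raised


class InMemoryRepository(Repository):
    """Hypothetical concrete implementation: keeps every version in a list."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[dict]] = {
            "targets": [{"version": 1, "keys": {}}]
        }

    def _load(self, role: str) -> dict:
        # Copy so that a failed edit never touches stored metadata
        return deepcopy(self.versions[role][-1])

    def _save(self, role: str, md: dict) -> None:
        md["version"] += 1  # application decision: bump on every save
        self.versions[role].append(md)


repo = InMemoryRepository()
with repo.edit("targets") as targets:
    targets["keys"]["role1"] = "public-key-placeholder"
```

The editing code never mentions storage or versions, and the abstract class never needs to know what the edit was.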
79+
80+
![Design: How components are used](repository-library-design-usage.jpg)
81+
82+
The core idea here is that because editing is ergonomic enough, when new
83+
functionality (like “developer uploads new targets”) is added, _it can be added
84+
at any level_: the application might add a `handle_new_target_files()` method
85+
that adds a bunch of targets into the metadata, but one of the previous layers
86+
could offer that as a helper function as well: code in both cases would look
87+
similar as it would use the common editing interface.
88+
89+
There are a few additional items worth mentioning:
90+
* Private key management: the Repository API should come with a “keyring
91+
abstraction” -- a way for the application to provide roles’ private keys for
92+
the Repository to use. Some implementations could be provided as well.
93+
* Validating repository state: the design is very much focused on enabling
94+
efficient editing of individual metadata. Implementations are also likely to
95+
be interested in validating (after some edits) that the repository is correct
96+
according to client workflow and that it contains the expected changes. The
97+
Repository API should provide some validation, but we should recognise that
98+
validation may be implementation specific.
99+
* Improved metadata editing: There are a small number of improvements that
100+
could be made to metadata editing. These do not necessarily need to be part
101+
of the repository API: they could be part of Metadata API as well
102+
103+
It would make sense for python-tuf to ship with at least one concrete
104+
Repository implementation: possibly a repo.py look-alike. This implementation
105+
should not be part of the library but an example.
106+
107+
## Details
108+
109+
This section includes links to a Proof of Concept implementation in
110+
[repository-editor-for-tuf](https://github.com/vmware-labs/repository-editor-for-tuf/):
111+
it should not be seen as the exact proposed API but a prototype of the ideas.
112+
113+
The ideas in this document map to POC components like this:
114+
115+
| Concept | repository-editor-for-tuf implementation |
116+
|-|-|
117+
| Repository API | [librepo/repo.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/librepo/repo.py), [librepo/keys.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/librepo/keys.py) |
118+
| Example of repository implementation | [git_repo.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/git_repo.py) |
119+
| Application code | [cli.py (command line app)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/cli.py), [keys_impl.py (keyring implementation)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/keys_impl.py) |
120+
| Repository validation | [verifier.py (very rough, not intended for python-tuf)](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/verifier.py) |
121+
| Improved Metadata editing | [helpers.py](https://github.com/vmware-labs/repository-editor-for-tuf/blob/main/tufrepo/helpers.py) |
122+
123+
124+
### Repository API
125+
126+
Repository itself is a minimal abstract class: The value of this class is in
127+
defining the abstract method signatures (most importantly `_load()`, `_save()`,
128+
`edit()`) that enable ergonomic metadata editing. The Repository class in this
129+
proposal includes concrete implementations only for the following:
130+
* `sign()` -- signing without editing metadata payload
131+
* `snapshot()` -- updates snapshot and timestamp metadata based on given input.
132+
Note that a concrete Repository implementation could provide an easier to use
133+
snapshot that does not require input (see example in git_repo.py)
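
As a rough illustration of a `snapshot()` that works "based on given input", the sketch below takes the current role versions as an argument and updates snapshot and timestamp only when something changed. The function signature and the use of plain dicts instead of Metadata API objects are assumptions made for this example; signing and expiry handling are left out.

```python
from typing import Dict


def snapshot(snapshot_md: dict, timestamp_md: dict,
             current_versions: Dict[str, int]) -> bool:
    """Update both dicts in place; return True if the snapshot changed."""
    wanted = {f"{role}.json": {"version": version}
              for role, version in current_versions.items()}
    if snapshot_md["meta"] == wanted:
        return False  # nothing changed: no version bumps

    snapshot_md["meta"] = wanted
    snapshot_md["version"] += 1
    # timestamp always points at the newest snapshot version
    timestamp_md["meta"] = {
        "snapshot.json": {"version": snapshot_md["version"]}
    }
    timestamp_md["version"] += 1
    return True


snap = {"version": 1, "meta": {"targets.json": {"version": 1}}}
stamp = {"version": 1, "meta": {"snapshot.json": {"version": 1}}}
unchanged = snapshot(snap, stamp, {"targets": 1})
changed = snapshot(snap, stamp, {"targets": 2, "role1": 1})
```

A concrete implementation could wrap this in a no-argument method that gathers `current_versions` from its own storage.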
134+
135+
More concrete implementations (see cli.py for examples) could be added to
136+
Repository itself but none seem essential at this point.
137+
138+
The API requires a “Keyring” abstraction that the repository code can use to
139+
look up a set of signers for a specific role. Specific implementations of
140+
Keyring could include a file-based keyring for testing, env-var keyring for CI
141+
use, etc. Some implementations should be provided in the python-tuf code base
142+
and more could be implemented in applications.
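
A keyring of this kind might be sketched as follows. Everything here is hypothetical: the `Keyring` and `EnvVarKeyring` names, the `signers_for()` method, and the `TUF_KEY_` variable prefix are invented for illustration, and an HMAC over the payload stands in for real signing (an actual implementation would return proper signer objects backed by private keys).

```python
import hashlib
import hmac
from abc import ABC, abstractmethod
from typing import Callable, List, Mapping

# Stand-in signer type: takes a payload, returns a hex "signature"
Signer = Callable[[bytes], str]


class Keyring(ABC):
    """Hypothetical abstraction: maps a role name to its signers."""

    @abstractmethod
    def signers_for(self, role: str) -> List[Signer]:
        """Return the signers available for the given role."""


class EnvVarKeyring(Keyring):
    """Hypothetical CI-style keyring reading secrets from an environment."""

    def __init__(self, environ: Mapping[str, str],
                 prefix: str = "TUF_KEY_") -> None:
        self._environ = environ
        self._prefix = prefix

    def signers_for(self, role: str) -> List[Signer]:
        secret = self._environ.get(self._prefix + role.upper())
        if secret is None:
            return []  # no key available for this role
        key = secret.encode()
        # HMAC is only a placeholder for a real signature scheme
        return [lambda payload: hmac.new(key, payload,
                                         hashlib.sha256).hexdigest()]


keyring = EnvVarKeyring({"TUF_KEY_TARGETS": "not-a-real-secret"})
signatures = [sign(b"payload") for sign in keyring.signers_for("targets")]
```

In real use the mapping passed in would be `os.environ`; passing it explicitly keeps the sketch easy to test.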
143+
144+
_Prototype status: Prototype Repository and Keyring abstractions exist in
145+
librepo/repo.py._
146+
147+
### Example of Repository implementation
148+
149+
The design decisions that the included example `GitRepository` makes are not
150+
important but provide an example of what is possible:
151+
* Metadata versions are stored in files in git, with filenames that allow
152+
serving the metadata directory as is over HTTP
153+
* Version bumps are made based on git status (so edits in staging area only
154+
bump version once)
155+
* “Current version” when loading metadata is decided based on filenames on disk
156+
* Files are removed once they are no longer part of the snapshot (to keep
157+
directory uncluttered)
158+
* Expiry times are decided based on an application specific metadata field
159+
* Private keys can be stored in a file or in environment variables (for CI use)
160+
161+
Note that GitRepository implementation is significantly larger than the
162+
Repository interface -- but all of the complexity in GitRepository is really
163+
related to the design decisions made there.
164+
165+
_Prototype status: The GitRepository example exists in git_repo.py._
166+
167+
### Validating repository state
168+
169+
This is mostly undesigned but something built on top of TrustedMetadataSet
170+
(currently ngclient component) might work as a way to easily check specific
171+
aspects like:
172+
* Is top-level metadata valid according to client workflow?
173+
* Is a role included in the snapshot and the delegation tree?
174+
175+
It’s likely that different implementations will have different needs though: a
176+
command line app for small repos might want to validate loading all metadata
177+
into memory, but a server application hosting tens of thousands of pieces of
178+
metadata is unlikely to do so.
179+
180+
_Prototype status: A very rough implementation exists in verifier.py: this is
181+
unlikely to be very useful._
182+
183+
### Improved metadata editing
184+
185+
Currently the identified improvement areas are:
186+
* Metadata initialization: this could potentially be improved by adding
187+
default argument values to Metadata API constructors
188+
* Modifying and looking up data about roles in delegating metadata
189+
(root/targets): they do similar things but root and targets do not have an
190+
identical API. This may be a very specific use case and not interesting
191+
for some applications
192+
193+
_Prototype status: Some potential improvements have been collected in
194+
helpers.py_
