| 1 | +# Repository library design built on top of Metadata API |
| 2 | + |
| 3 | + |
| 4 | +## Context and Problem Statement |
| 5 | + |
| 6 | +The Metadata API provides a modern Python API for accessing individual pieces |
| 7 | +of metadata. It does not provide any wider context help to someone looking to |
| 8 | +implement a TUF repository. |
| 9 | + |
| 10 | +The legacy python-tuf implementation offers tools for this but suffers from |
| 11 | +some issues (as do many other implementations): |
| 12 | +* There is a _very_ large amount of code to maintain: repo.py, |
| 13 | + repository_tool.py and repository_lib.py alone are almost 7000 lines of code. |
| 14 | +* The "library like" parts of the implementation do not form a good coherent |
| 15 | + API: methods routinely have a large number of arguments, code still depends |
| 16 | + on globals in a major way and the application (repo.py) still implements a
| 17 | + lot of "repository code" itself.
| 18 | +* The "library like" parts of the implementation make decisions that look like |
| 19 | + application decisions. As an example, repository_tool loads _every_ metadata |
| 20 | + file in the repository: this is fine for a CLI that operates on a small
| 21 | + repository but is unlikely to be a good choice for PyPI. |
| 22 | + |
| 23 | + |
| 24 | +## Decision Drivers |
| 25 | + |
| 26 | +* There is a consensus on removing the legacy code from python-tuf due to |
| 27 | + maintainability issues |
| 28 | +* Metadata API makes modifying metadata far easier than the legacy code base: this
| 29 | + makes significantly different designs possible |
| 30 | +* Not providing a "repository library" (and leaving implementers on their own) |
| 31 | + may be a short term solution because of the previous point, but it does seem |
| 32 | + like the project would benefit from some shared repository code and shared |
| 33 | + repository design |
| 34 | +* Maintainability of new library code must be a top concern |
| 35 | +* Allowing a wide range of repository implementations (from CLI tools to |
| 36 | + minimal in-memory implementations to large scale applications like Warehouse) |
| 37 | + would be good: unfortunately these can have wildly differing requirements |
| 38 | + |
| 39 | + |
| 40 | +## Considered Options |
| 41 | + |
| 42 | +1. No repository packages |
| 43 | +2. repository_tool -like API |
| 44 | +3. Minimal repository abstraction |
| 45 | + |
| 46 | + |
| 47 | +## Decision Outcome |
| 48 | + |
| 49 | +Option 3: Minimal repository abstraction |
| 50 | + |
| 51 | +While option 1 might be used temporarily, the goal should be to implement a |
| 52 | +minimal repository abstraction as soon as possible: this should give the |
| 53 | +project a path forward where the maintenance burden is reasonable and results |
| 54 | +should be usable very soon. The python-tuf repository functionality can be |
| 55 | +later extended as ideas are experimented with in upstream projects and in |
| 56 | +python-tuf example code. |
| 57 | + |
| 58 | +The concept is still unproven but validating the design should be
| 59 | +straightforward: the decision could be re-evaluated in a few months if not in weeks.
| 60 | + |
| 61 | + |
| 62 | +## Pros and Cons of the Options |
| 63 | + |
| 64 | +### No repository packages |
| 65 | + |
| 66 | +Metadata API makes editing the repository content vastly simpler. There are |
| 67 | +already repository implementations built with it (RepositorySimulator in |
| 68 | +python-tuf tests is an in-memory implementation, while |
| 69 | +repository-editor-for-tuf is an external CLI tool) so clearly a repository |
| 70 | +library is not an absolute requirement. |
| 71 | + |
| 72 | +Not providing repository packages in python-tuf does mean that external |
| 73 | +projects could experiment and create implementations without adding to the |
| 74 | +maintenance burden of python-tuf. This would be the easiest way to iterate many |
| 75 | +different designs and hopefully find good ones in the end. |
| 76 | + |
| 77 | +That said, there are some tricky parts of repository maintenance (e.g. |
| 78 | +initialization, snapshot update, hashed bin management) that would benefit from |
| 79 | +having a canonical implementation. Likewise, a well designed library could make |
| 80 | +some repeated actions (e.g. version bumps, expiry updates, signing) much easier |
| 81 | +to manage. |
| 82 | + |
| 83 | +### repository_tool -like API |
| 84 | + |
| 85 | +It won't be possible to support the repository_tool API as it is but a similar |
| 86 | +one would certainly be an option. |
| 87 | + |
| 88 | +This would likely be the easiest upgrade path for any repository_tool users out |
| 89 | +there. The implementation would not be a huge amount of work as Metadata API |
| 90 | +makes many things easier. |
| 91 | + |
| 92 | +However, repository_tool (and parts of repo.py) are not a great API. It is |
| 93 | +likely that a similar API suffers from some of the same issues: it might end up |
| 94 | +being a substantial amount of code that is only a good fit for one application. |
| 95 | + |
| 96 | +### Minimal repository abstraction |
| 97 | + |
| 98 | +python-tuf could define a tiny repository API that |
| 99 | +* provides carefully selected core functionality (like core snapshot update) |
| 100 | + but... |
| 101 | +* does not implement all repository actions itself, instead it makes it easy
| 102 | + for the application code to do them |
| 103 | +* leaves application details to specific implementations (examples of decisions |
| 104 | + a library should not always decide: "are targets stored with the repo?", |
| 105 | + "which versions of metadata are stored?", "when to load metadata?", "when to |
| 106 | + unload metadata?", "when to bump metadata version?", "what is the new expiry |
| 107 | + date?", "which targets versions should be part of new snapshot?") |
| 108 | + |
| 109 | +python-tuf could also provide one or more implementations of this abstraction |
| 110 | +as examples -- this could include a repo.py- or repository_tool-like |
| 111 | +implementation. |
| 112 | + |
| 113 | +This could be a compromise that allows: |
| 114 | +* low maintenance burden on python-tuf: initial library could be tiny |
| 115 | +* sharing the important, canonical parts of a TUF repository implementation |
| 116 | +* ergonomic repository modification, meaning most actions do not have to be in |
| 117 | + the core code |
| 118 | +* very different repository implementations using the same core code and the |
| 119 | + same abstract API |
| 120 | + |
| 121 | +The approach does have some downsides: |
| 122 | +* It is not a drop-in replacement for repository_tool or repo.py
| 123 | +* A prototype has been implemented (see Links below) but the concept is still |
| 124 | + unproven |
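
To make the idea of a minimal abstraction more concrete, here is a rough Python sketch of what such a split between core code and application code might look like. It is only an illustration under stated assumptions: `Role`, `Repository`, `InMemoryRepository`, `edit()` and `update_snapshot()` are hypothetical names invented for this example, `Role` is a plain stand-in rather than the Metadata API, signing is omitted, and nothing here is taken from the prototype or the design document.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator


@dataclass
class Role:
    """Hypothetical stand-in for one piece of metadata (not the Metadata API)."""
    name: str
    version: int = 0
    expires: datetime = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Only meaningful for the "snapshot" role: role name -> recorded version
    meta: Dict[str, int] = field(default_factory=dict)


class Repository(ABC):
    """Tiny core: storage, version and expiry decisions stay in subclasses."""

    @abstractmethod
    def open(self, role: str) -> Role:
        """Load metadata for a role (from disk, a database, memory, ...)."""

    @abstractmethod
    def close(self, role: str, md: Role) -> None:
        """Finalize metadata (version, expiry, signing) and store it."""

    @contextmanager
    def edit(self, role: str) -> Iterator[Role]:
        """Core helper: open, let the caller modify, then close.

        If the caller raises inside the block, close() is never called,
        so a failed edit is not stored.
        """
        md = self.open(role)
        yield md
        self.close(role, md)

    def update_snapshot(self, changed: Dict[str, int]) -> None:
        """Core helper: record new role versions in the snapshot role."""
        with self.edit("snapshot") as snapshot:
            snapshot.meta.update(changed)


class InMemoryRepository(Repository):
    """One possible application: keeps everything in a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, Role] = {}

    def open(self, role: str) -> Role:
        return self._store.get(role, Role(role))

    def close(self, role: str, md: Role) -> None:
        # Application decisions: always bump the version, expire in 7 days
        md.version += 1
        md.expires = datetime.now(timezone.utc) + timedelta(days=7)
        self._store[role] = md


repo = InMemoryRepository()
with repo.edit("targets") as targets:
    pass  # application-specific changes to targets would go here
repo.update_snapshot({"targets": targets.version})
```

In this sketch the core class contains only `edit()` and `update_snapshot()`, while the answers to "when to bump metadata version?" and "what is the new expiry date?" live entirely in the subclass. A CLI tool or a large-scale service could supply very different `open()` and `close()` implementations without changing the core.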
| 125 | + |
| 126 | +## Links |
| 127 | +* [Design document for minimal repository abstraction](https://docs.google.com/document/d/1YY83J4ihztsi1Qv0dJ22EcqND8dT80AGTduwgh0trpY)
| 128 | +* [Prototype implementation of minimal repository abstraction](https://github.com/vmware-labs/repository-editor-for-tuf/)