| 1 | +--- |
| 2 | +title: Internals and Technical Details |
| 3 | +--- |
| 4 | + |
| 5 | +# Internals and Technical Details |
| 6 | + |
| 7 | +!!! note |
| 8 | + |
| 9 | + This page is **not useful** to *users* of trusted publishers! |
| 10 | + |
| 11 | + It's intended primarily for PyPI developers and developers of other |
| 12 | + package indices looking to support similar authentication models. |
| 13 | + |
| 14 | +## How trusted publishing works |
| 15 | + |
| 16 | +PyPI's trusted publishing functionality is built on top of |
| 17 | +[OpenID Connect], or "OIDC" for short. |
| 18 | + |
| 19 | +OIDC gives *services* (like GitHub Actions) a way to *provably identify* |
| 20 | +themselves: an authorized entity (such as a GitHub user, or an automated |
| 21 | +workflow) can present an *OIDC token* to a third-party service. That |
| 22 | +third-party service can then verify the token and determine whether the |
| 23 | +presenting entity is authorized to perform some other action. |
| 24 | + |
| 25 | +In the context of trusted publishing, the machinery is as follows: |
| 26 | + |
| 27 | +* *OIDC identity providers* like GitHub ("providers" for short) generate OIDC |
| 28 | + tokens that contain scoped *claims*, which convey appropriate authorization |
| 29 | + scopes. |
| 30 | + |
| 31 | + * For example, the `repo` claim might be bound to the value |
| 32 | + `octo-org/example`, indicating that the token should be authorized |
| 33 | + to access resources for which `octo-org/example` is a valid repository. |
| 34 | + |
| 35 | +* *Trusted publishers* are pieces of configuration on PyPI that tell PyPI |
| 36 | + *which* OIDC providers to trust, and *when* (i.e., which specific set |
| 37 | + of claims to consider valid). |
| 38 | + |
| 39 | + * For example, a trusted publisher configuration for GitHub Actions might |
| 40 | + specify `repo: octo-org/example` with `workflow: release.yml` and |
| 41 | + `environment: release`, indicating that a presented OIDC token **must** |
| 42 | + contain exactly those claims to be considered valid. |
| 43 | + |
| 44 | +* *Token exchange* is how PyPI converts OIDC tokens into credentials |
| 45 | + (PyPI API tokens) that can be used to authenticate against the package upload |
| 46 | + endpoint. |
| 47 | + |
| 48 | + * Token exchange boils down to a matching process between a presented |
| 49 | + OIDC token and every trusted publisher currently configured on PyPI: |
| 50 | + the token's signature is first verified (to ensure that it's actually |
| 51 | + coming from the expected provider), and then its claims are matched |
| 52 | + against zero or more projects with registered trusted publishers. |
| 53 | + |
| 54 | + If the OIDC token corresponds to one or more trusted publishers, then |
| 55 | + a short-lived (15-minute) PyPI API token is issued. This API token |
| 56 | + is scoped to every project with a matching trusted publisher, meaning |
| 57 | + that it can be used to upload to multiple projects (if so configured). |
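The matching step described above can be sketched roughly as follows. This is a minimal illustration, not PyPI's implementation: `TrustedPublisher`, `projects_for_token`, and the project name `example-cli` are hypothetical, the token's signature is assumed to have been verified already, and a "match" is assumed to mean that every configured claim appears in the token with the same value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrustedPublisher:
    """Hypothetical stand-in for one trusted publisher configuration."""

    project: str
    required_claims: dict[str, str]

    def matches(self, claims: dict[str, str]) -> bool:
        # Assumption: every configured claim must be present with an equal value.
        return all(claims.get(k) == v for k, v in self.required_claims.items())


def projects_for_token(
    verified_claims: dict[str, str], publishers: list[TrustedPublisher]
) -> list[str]:
    """Return every project whose publisher matches the (already verified) claims."""
    return sorted({p.project for p in publishers if p.matches(verified_claims)})


release_claims = {
    "repo": "octo-org/example",
    "workflow": "release.yml",
    "environment": "release",
}
publishers = [
    TrustedPublisher("example", dict(release_claims)),
    TrustedPublisher("example-cli", dict(release_claims)),
    TrustedPublisher("unrelated", {"repo": "other-org/other", "workflow": "release.yml"}),
]

# One token can match several projects; the issued API token would be scoped to all of them.
matched = projects_for_token(release_claims, publishers)
```

An empty result corresponds to the case where no API token is issued.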
| 58 | + |
| 59 | +If everything goes correctly, a successful trusted publishing flow results in |
| 60 | +a short-lived PyPI API token *without any user interaction*, which in turn |
| 61 | +offers security and ergonomic benefits to PyPI packagers: users no longer |
| 62 | +have to worry about token provisioning or revocation. |
| 63 | + |
| 64 | +## Q&A |
| 65 | + |
| 66 | +### Why does trusted publishing use a "two-phase" token exchange? |
| 67 | + |
| 68 | +As noted above, trusted publishing uses a "token exchange" mechanism, which |
| 69 | +happens in two phases: |
| 70 | + |
| 71 | +1. The uploading client presents an OIDC token, which PyPI verifies. |
| 72 | + If valid, PyPI responds with a valid and appropriately scoped PyPI API token. |
| 73 | + |
| 74 | +1. The uploading client takes the valid PyPI API token that it was given |
| 75 | + and uses it as normal. |
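As a toy model of the two phases above (not PyPI's API), the sketch below keeps everything in memory. `ToyIndex`, `exchange`, `upload`, and the `verify_oidc` callable are hypothetical names, and the issued token is an opaque random string rather than a real Macaroon.

```python
import secrets
import time

TOKEN_LIFETIME_SECONDS = 15 * 60  # the 15-minute lifetime mentioned earlier on this page


class ToyIndex:
    """In-memory stand-in for a package index that performs token exchange."""

    def __init__(self, verify_oidc):
        # verify_oidc: hypothetical callable mapping an OIDC token to matching project names.
        self._verify_oidc = verify_oidc
        self._issued = {}

    def exchange(self, oidc_token, now=None):
        """Phase 1: verify the OIDC token and mint a scoped, short-lived API token."""
        now = time.time() if now is None else now
        projects = self._verify_oidc(oidc_token)
        if not projects:
            raise PermissionError("no matching trusted publisher")
        api_token = "pypi-" + secrets.token_urlsafe(32)
        self._issued[api_token] = (frozenset(projects), now + TOKEN_LIFETIME_SECONDS)
        return api_token

    def upload(self, api_token, project, now=None):
        """Phase 2: the API token is checked like any other API token."""
        now = time.time() if now is None else now
        entry = self._issued.get(api_token)
        if entry is None:
            return False
        projects, expires_at = entry
        return project in projects and now < expires_at


index = ToyIndex(lambda tok: ["sampleproject"] if tok == "good-oidc-token" else [])
api_token = index.exchange("good-oidc-token", now=0)
accepted = index.upload(api_token, "sampleproject", now=60)
```

Note that `upload` never sees the OIDC token: everything after the exchange deals only with the index's own token format.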
| 76 | + |
| 77 | +In principle, this is more complicated than necessary: PyPI could |
| 78 | +instead take the OIDC token *directly* and treat it as a special case during |
| 79 | +API token handling, skipping a network round-trip between the uploading |
| 80 | +client and the package index. |
| 81 | + |
| 82 | +While conceptually simpler, a "one-phase" token exchange presents problems |
| 83 | +of its own: |
| 84 | + |
| 85 | +1. *Isolation of concerns*: conceptually, an OIDC token is an *externally |
| 86 | + issued* token, with external concerns: it has failure modes that aren't |
| 87 | + internal to PyPI itself (e.g. a failure of the issuing identity provider |
| 88 | + to sign correctly). |
| 89 | + |
| 90 | + Keeping these concerns isolated from PyPI's actual business logic |
| 91 | + ensures that they remain encapsulated and do not impose design |
| 92 | + or security constraints on PyPI itself (e.g., mandating that |
| 93 | + PyPI use OIDC tokens in places where they are a poor fit). |
| 94 | + |
| 95 | +1. *Complications to existing authentication and authorization logic*: |
| 96 | + PyPI has a large pre-existing body of AuthN and AuthZ code. Most of the |
| 97 | + existing code for API tokens is directly adapted to the PyPI API token |
| 98 | + format, which is based on |
| 99 | + [Macaroons]. |
| 100 | + |
| 101 | + Handling OIDC tokens (which are [JSON Web Tokens] under the hood) would have |
| 102 | + required significant duplication of existing codepaths, which in turn |
| 103 | + means an increased testing (and vulnerability) surface. By exchanging |
| 104 | + OIDC tokens for API tokens in PyPI's existing format, our implementation |
| 105 | + could reuse our existing (and well-tested) codepaths without any significant |
| 106 | + changes. |
| 107 | + |
| 108 | +1. *Automatic secret scanning and revocation challenges*: PyPI is a partner |
| 109 | + in [GitHub's secret scanning system], which allows PyPI to automatically |
| 110 | + revoke PyPI API tokens that are accidentally leaked in public repositories. |
| 111 | + |
| 112 | + This system relies on PyPI tokens having a unique prefix: they all begin |
| 113 | + with `pypi-`. Without that prefix, GitHub would be unable to efficiently |
| 114 | + scan public repositories for tokens. |
| 115 | + |
| 116 | + OIDC tokens are issued by independent providers, meaning that PyPI has |
| 117 | + no ability to impose a `pypi-` prefix on them. Moreover, OIDC tokens |
| 118 | + are strictly defined as [JSON Web Tokens], meaning that they appear |
| 119 | + as mostly unstructured random characters. This makes them difficult to scan |
| 120 | + for. Finally, even an effective scanner for JWTs would need to report |
| 121 | + every compromised JWT to both its issuer (e.g., GitHub itself) *and* its |
| 122 | + consumer (e.g., PyPI), introducing complexity and additional |
| 123 | + failure modes during revocation. |
| 124 | + |
| 125 | + Exchanging OIDC tokens for PyPI API tokens completely sidesteps all of these |
| 126 | + problems. |
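To illustrate why a fixed prefix helps scanning, here is a minimal sketch. The regular expression is an assumption made for illustration only; it is not the pattern that GitHub or PyPI actually uses, and the sample strings are made up.

```python
import re

# Hypothetical pattern: the literal "pypi-" prefix followed by a run of URL-safe characters.
PYPI_TOKEN_PATTERN = re.compile(r"\bpypi-[A-Za-z0-9_-]{16,}")


def find_candidate_tokens(text: str) -> list[str]:
    """Return substrings of `text` that look like prefixed API tokens."""
    return PYPI_TOKEN_PATTERN.findall(text)


leaked_config = 'TOKEN = "pypi-AgEIcHlwaS5vcmcCJGFi"'
jwt_like = "eyJhbGciOi.eyJzdWIi.c2ln"  # three dot-separated base64url segments, no issuer-chosen prefix

prefixed_hits = find_candidate_tokens(leaked_config)
jwt_hits = find_candidate_tokens(jwt_like)
```

The prefixed token is found with a single cheap pattern, while the JWT-like string offers nothing specific to PyPI to search for.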
| 127 | + |
| 128 | +While these reasons are documented for PyPI, they are likely some of the |
| 129 | +same reasons why other "federated" consumers of OIDC (like cloud providers) |
| 130 | +use similar "two-phase" exchange mechanisms. |
| 131 | + |
| 132 | +### Why is the PyPI project to publisher relationship "many-many"? |
| 133 | + |
| 134 | +If you play around with trusted publishers on PyPI, you'll notice that |
| 135 | +PyPI projects can have multiple publishers, and individual publishers |
| 136 | +can be registered to multiple projects. |
| 137 | + |
| 138 | +This is a "many-many" relationship between PyPI projects and their trusted |
| 139 | +publishers which, like "two-phase" exchange, seems more complicated in |
| 140 | +principle than necessary. |
| 141 | + |
| 142 | +In practice, this many-many relationship addresses publishing patterns commonly |
| 143 | +used by the Python packaging community: |
| 144 | + |
| 145 | +1. *One publisher, many projects*: it's not uncommon for several related |
| 146 | + PyPI projects to share a single source repository. Moreover, it's not |
| 147 | + uncommon for several related PyPI projects to share the same release |
| 148 | + workflow, due to tandem releases (e.g., a simultaneous release |
| 149 | + of a library package and its corresponding CLI tool). |
| 150 | + |
| 151 | + Trusted publishing's design accommodates this use case: maintainers |
| 152 | + can use the same `release.yml` workflow for all of their packages, |
| 153 | + rather than having to split it up by packages. |
| 154 | + |
| 155 | +1. *One project, many publishers*: PyPI contains a large number of built |
| 156 | + distributions ("wheels"), some of which are "binary wheels" that contain |
| 157 | + processor, operating system, or platform-specific binaries. |
| 158 | + |
| 159 | + Because these binaries are specific to individual platforms, they frequently |
| 160 | + must be built on separate platforms, often on dedicated builder |
| 161 | + configurations for each platform. |
| 162 | + |
| 163 | + From there, it is common to have each individual platform builder also |
| 164 | + perform releases for that platform: Linux-specific wheels are uploaded |
| 165 | + by the Linux builder, etc. |
| 166 | + |
| 167 | + This is arguably **not best practice**, in terms of reliability and isolation |
| 168 | + of concerns: the best practice would be to *collect* all platform-specific |
| 169 | + builds in a final platform-agnostic publishing step, which could then |
| 170 | + be a single publisher. |
| 171 | + |
| 172 | + However, in the interest of getting trusted publishers into users' hands |
| 173 | + without requiring them to make significant unrelated changes to their builds, |
| 174 | + the trusted publishing feature allows users to register multiple |
| 175 | + publishers against a single project. Consequently, `sampleproject` |
| 176 | + can be published from both `release-linux.yml` and `release-macos.yml` |
| 177 | + without needing to be refactored into a single `release.yml`. |
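The many-many relationship described above can be modeled as a pair of lookup tables. The sketch below is only an illustration under assumed names: `PublisherRegistry` and the `example` / `example-cli` projects are hypothetical, and publishers are identified by workflow filename alone for brevity.

```python
from collections import defaultdict


class PublisherRegistry:
    """Hypothetical in-memory model of project <-> publisher registrations."""

    def __init__(self):
        self._publishers_by_project = defaultdict(set)
        self._projects_by_publisher = defaultdict(set)

    def register(self, project: str, publisher: str) -> None:
        # Record the link in both directions so either side can be looked up.
        self._publishers_by_project[project].add(publisher)
        self._projects_by_publisher[publisher].add(project)

    def publishers_for(self, project: str) -> list[str]:
        return sorted(self._publishers_by_project.get(project, ()))

    def projects_for(self, publisher: str) -> list[str]:
        return sorted(self._projects_by_publisher.get(publisher, ()))


registry = PublisherRegistry()

# One project, many publishers: per-platform release workflows.
registry.register("sampleproject", "release-linux.yml")
registry.register("sampleproject", "release-macos.yml")

# One publisher, many projects: a shared release workflow for tandem releases.
registry.register("example", "release.yml")
registry.register("example-cli", "release.yml")
```

Both `registry.publishers_for("sampleproject")` and `registry.projects_for("release.yml")` return two entries, which is the situation a one-to-one model could not express.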
| 178 | + |
| 179 | +[OpenID Connect]: https://openid.net/connect/ |
| 180 | + |
| 181 | +[Macaroons]: https://en.wikipedia.org/wiki/Macaroons_(computer_science) |
| 182 | + |
| 183 | +[JSON Web Tokens]: https://en.wikipedia.org/wiki/JSON_Web_Token |
| 184 | + |
| 185 | +[GitHub's secret scanning system]: https://docs.github.com/en/code-security/secret-scanning/about-secret-scanning |