Commit 31e4b27

woodruffw and di authored
docs: internal docs for trusted publishers (#13537)
Signed-off-by: William Woodruff <[email protected]> Co-authored-by: Dustin Ingram <[email protected]>
1 parent a4058e3 commit 31e4b27

2 files changed: +186 -0 lines changed

docs/mkdocs-user-docs.yml

Lines changed: 1 addition & 0 deletions
@@ -61,3 +61,4 @@ nav:
 - "trusted-publishers/using-a-publisher.md"
 - "trusted-publishers/security-model.md"
 - "trusted-publishers/troubleshooting.md"
+- "trusted-publishers/internals.md"
Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
---
title: Internals and Technical Details
---

# Internals and Technical Details

!!! note

    This page is **not useful** to *users* of trusted publishers!

    It's intended primarily for PyPI developers and developers of other
    package indices looking to support similar authentication models.

## How trusted publishing works

PyPI's trusted publishing functionality is built on top of
[OpenID Connect], or "OIDC" for short.

OIDC gives *services* (like GitHub Actions) a way to *provably identify*
themselves: an authorized entity (such as a GitHub user, or an automated
workflow) can present an *OIDC token* to a third-party service. That
third-party service can then verify the token and determine whether the
presenting entity is authorized to perform some other action.
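
To make "verify the token" concrete, here is a minimal sketch of issuing and checking a JWT-shaped token using only the Python standard library. It is not PyPI's implementation. Real OIDC providers typically sign with asymmetric keys that they publish for relying parties; this sketch swaps in a shared-secret HMAC (`HS256`) purely so that it runs without third-party packages. The issuer URL, audience, key, and helper names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_demo_token(claims: dict, key: bytes) -> str:
    """Play the provider's role: produce header.payload.signature."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def verify_demo_token(token: str, key: bytes, issuer: str, audience: str) -> dict:
    """Play the relying party's role: check the signature, then the standard claims."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature does not verify")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("iss") != issuer or claims.get("aud") != audience:
        raise ValueError("token was not issued by, or for, the expected party")
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token has expired")
    return claims


# Hypothetical values for demonstration only.
DEMO_KEY = b"demo-shared-secret"
demo_claims = {
    "iss": "https://idp.example",
    "aud": "example-index",
    "exp": int(time.time()) + 300,
    "repo": "octo-org/example",
}
demo_token = sign_demo_token(demo_claims, DEMO_KEY)
verified = verify_demo_token(demo_token, DEMO_KEY, "https://idp.example", "example-index")
```

Only after these checks pass does it make sense to look at application-level claims such as `repo`; a token whose payload was altered after signing fails at the signature step.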

In the context of trusted publishing, the machinery is as follows:

* *OIDC identity providers* like GitHub ("providers" for short) generate OIDC
  tokens that contain scoped *claims*, which convey appropriate authorization
  scopes.

    * For example, the `repo` claim might be bound to the value
      `octo-org/example`, indicating that the token should be authorized
      to access resources for which `octo-org/example` is a valid repository.

* *Trusted publishers* are pieces of configuration on PyPI that tell PyPI
  *which* OIDC providers to trust, and *when* (i.e., which specific set
  of claims to consider valid).

    * For example, a trusted publisher configuration for GitHub Actions might
      specify `repo: octo-org/example` with `workflow: release.yml` and
      `environment: release`, indicating that a presented OIDC token **must**
      contain exactly those claims to be considered valid.

* *Token exchange* is how PyPI converts OIDC tokens into credentials
  (PyPI API tokens) that can be used to authenticate against the package upload
  endpoint.

    * Token exchange boils down to a matching process between a presented
      OIDC token and every trusted publisher currently configured on PyPI:
      the token's signature is first verified (to ensure that it's actually
      coming from the expected provider), and then its claims are matched
      against zero or more projects with registered trusted publishers.

      If the OIDC token corresponds to one or more trusted publishers, then
      a short-lived (15 minute) PyPI API token is issued. This API token
      is scoped to every project with a matching trusted publisher, meaning
      that it can be used to upload to multiple projects (if so configured).
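
The matching step described above can be sketched as follows. This is an illustrative model rather than PyPI's actual code: it assumes the token's signature has already been verified, and the `TrustedPublisher` class, the `exchange` function, and the simplified claim names (`repo`, `workflow`, `environment`, taken from the examples above) are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustedPublisher:
    """One piece of publisher configuration, registered to one project."""

    project: str
    issuer: str
    required_claims: tuple  # pairs of (claim name, expected value)

    def matches(self, issuer: str, claims: dict) -> bool:
        # Every configured claim must be present with exactly the expected value.
        return issuer == self.issuer and all(
            claims.get(name) == value for name, value in self.required_claims
        )


def exchange(issuer: str, claims: dict, publishers: list):
    """Return the scope for a new API token, or None if no publisher matches.

    Assumes the OIDC token's signature was already verified against `issuer`.
    """
    projects = sorted({p.project for p in publishers if p.matches(issuer, claims)})
    if not projects:
        return None
    # One short-lived token, scoped to every project with a matching publisher.
    return {"projects": projects, "lifetime_seconds": 15 * 60}


# Hypothetical configuration for demonstration only.
GITHUB = "https://token.actions.githubusercontent.com"
RELEASE = (("repo", "octo-org/example"), ("workflow", "release.yml"), ("environment", "release"))
PUBLISHERS = [
    TrustedPublisher("sampleproject", GITHUB, RELEASE),
    TrustedPublisher("sampleproject-cli", GITHUB, RELEASE),
    TrustedPublisher("unrelated", GITHUB, (("repo", "octo-org/other"),)),
]
presented = {"repo": "octo-org/example", "workflow": "release.yml", "environment": "release"}
scope = exchange(GITHUB, presented, PUBLISHERS)
```

Here `scope` covers both `sampleproject` and `sampleproject-cli`, because the same claims satisfy both registrations, while a token missing any required claim matches nothing.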

If everything goes correctly, a successful trusted publishing flow results in
a short-lived PyPI API token *without any user interaction*, which in turn
offers security and ergonomic benefits to PyPI packagers: users no longer
have to worry about token provisioning or revocation.

## Q&A

### Why does trusted publishing use a "two-phase" token exchange?

As noted above, trusted publishing uses a "token exchange" mechanism, which
happens in two phases:

1. The uploading client presents an OIDC token, which PyPI verifies.
    If valid, PyPI responds with a valid and appropriately scoped PyPI API token.

1. The uploading client takes the valid PyPI API token that it was given
    and uses it as normal.
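
The two phases can be sketched with an in-memory stand-in for the index. Everything here is hypothetical (the `FakeIndex` class, its method names, and the opaque token strings); it exists only to show that the upload path never needs to understand OIDC tokens.

```python
import secrets
import time


class FakeIndex:
    """In-memory stand-in for a package index; not PyPI's real implementation."""

    def __init__(self, trusted_oidc_tokens: dict):
        # Hypothetical shortcut: maps an already-verified OIDC token to its projects.
        self._trusted = trusted_oidc_tokens
        self._api_tokens = {}

    def mint_token(self, oidc_token: str) -> str:
        """Phase 1: exchange an OIDC token for a short-lived, scoped API token."""
        projects = self._trusted.get(oidc_token)
        if projects is None:
            raise PermissionError("OIDC token matches no trusted publisher")
        api_token = "pypi-" + secrets.token_urlsafe(32)
        self._api_tokens[api_token] = (set(projects), time.time() + 15 * 60)
        return api_token

    def upload(self, api_token: str, project: str, filename: str) -> str:
        """Phase 2: the ordinary upload path, which only ever sees API tokens."""
        projects, expires_at = self._api_tokens.get(api_token, (set(), 0.0))
        if time.time() >= expires_at or project not in projects:
            raise PermissionError("API token is invalid, expired, or out of scope")
        return f"uploaded {filename} to {project}"


index = FakeIndex({"oidc-token-from-ci": ["sampleproject"]})
api_token = index.mint_token("oidc-token-from-ci")  # phase 1
result = index.upload(api_token, "sampleproject", "sampleproject-1.0.tar.gz")  # phase 2
```

Presenting the OIDC token directly to `upload` fails in this sketch, which mirrors the separation the two-phase design is meant to keep.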

In principle, this is more complicated than necessary: PyPI could
instead take the OIDC token *directly* and treat it as a special case during
API token handling, skipping a network round-trip between the uploading
client and the package index.

While conceptually simpler, a "one-phase" token exchange presents problems
of its own:

1. *Isolation of concerns*: conceptually, an OIDC token is an *externally
    issued* token, with external concerns: it has failure modes that aren't
    internal to PyPI itself (e.g. a failure of the issuing identity provider
    to sign correctly).

    Keeping these concerns isolated from PyPI's actual business logic
    ensures that they remain encapsulated and do not impose design
    or security constraints on PyPI itself (e.g., mandating that
    PyPI use OIDC tokens in places where they are a poor fit).

1. *Complications to existing authentication and authorization logic*:
    PyPI has a large pre-existing body of AuthN and AuthZ code. Most of the
    existing code for API tokens is directly adapted to the PyPI API token
    format, which is based on
    [Macaroons].

    Handling OIDC tokens (which are [JSON Web Tokens] under the hood) would have
    required significant duplication of existing codepaths, which in turn
    means an increased testing (and vulnerability) surface. By exchanging
    OIDC tokens for API tokens in PyPI's existing format, our implementation
    could reuse our existing (and well-tested) codepaths without any significant
    changes.

1. *Automatic secret scanning and revocation challenges*: PyPI is a partner
    in [GitHub's secret scanning system], which allows PyPI to automatically
    revoke PyPI API tokens that are accidentally leaked in public repositories.

    This system relies on PyPI tokens having a unique prefix: they all begin
    with `pypi-`. Without that prefix, GitHub would be unable to efficiently
    scan public repositories for tokens.

    OIDC tokens are issued by independent providers, meaning that PyPI has
    no ability to impose a `pypi-` prefix on them. Moreover, OIDC tokens
    are strictly defined as [JSON Web Tokens], meaning that they appear
    as mostly unstructured random characters. This makes them difficult to scan
    for. Finally, even an effective scanner for JWTs would need to report
    every compromised JWT to both its issuer (e.g., GitHub itself) *and* its
    consumer (e.g., PyPI), introducing complexity and additional
    failure modes during revocation.

    Exchanging OIDC tokens for PyPI API tokens completely sidesteps all of these
    problems.

While these reasons are documented for PyPI, they are likely some of the
same reasons why other "federated" consumers of OIDC (like cloud providers)
use similar "two-phase" exchange mechanisms.
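
The secret-scanning argument (the third reason above) can be illustrated with a small sketch. The regular expression below is an assumption for demonstration: only the `pypi-` prefix comes from the text above, while the character set and minimum length are made up, and this is not GitHub's actual scanner. The helper that builds a JWT-shaped string uses a dummy signature.

```python
import base64
import json
import re

# Assumed pattern for illustration: the `pypi-` prefix followed by a URL-safe body.
PYPI_TOKEN_RE = re.compile(r"\bpypi-[A-Za-z0-9_-]{16,}")


def find_pypi_tokens(text: str) -> list:
    """Return every substring of `text` that looks like a prefixed API token."""
    return PYPI_TOKEN_RE.findall(text)


def make_jwt_shaped_string(claims: dict) -> str:
    """Build header.payload.signature with a dummy (non-cryptographic) signature."""

    def b64url(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

    header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{b64url(b'not-a-real-signature')}"


fake_api_token = "pypi-" + "A" * 40  # placeholder body, not a real token
jwt_like = make_jwt_shaped_string({"repo": "octo-org/example"})
leaked_file = f'API_TOKEN = "{fake_api_token}"\nOIDC_TOKEN = "{jwt_like}"\n'
hits = find_pypi_tokens(leaked_file)
```

The prefix-based pattern picks out the API token and nothing else; the JWT-shaped string carries no index-specific marker, so nothing in it ties it to PyPI.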

### Why is the PyPI project to publisher relationship "many-many"?

If you play around with trusted publishers on PyPI, you'll notice that
PyPI projects can have multiple publishers, and individual publishers
can be registered to multiple projects.

This is a "many-many" relationship between PyPI projects and their trusted
publishers which, like "two-phase" exchange, seems more complicated in
principle than necessary.

In practice, this many-many relationship addresses publishing patterns commonly
used by the Python packaging community:

1. *One publisher, many projects*: it's not uncommon for several related
    PyPI projects to share a single source repository. Moreover, it's not
    uncommon for several related PyPI projects to share the same release
    workflow, due to tandem releases (e.g., a simultaneous release
    of a library package and its corresponding CLI tool).

    Trusted publishing's design accommodates this use case: maintainers
    can use the same `release.yml` workflow for all of their packages,
    rather than having to split it up by packages.

1. *One project, many publishers*: PyPI contains a large number of built
    distributions ("wheels"), some of which are "binary wheels" that contain
    processor, operating system, or platform-specific binaries.

    Because these binaries are specific to individual platforms, they frequently
    must be built on separate platforms, often on dedicated builder
    configurations for each platform.

    From there, it is common to have each individual platform builder also
    perform releases for that platform: Linux-specific wheels are uploaded
    by the Linux builder, etc.

    This is arguably **not best practice**, in terms of reliability and isolation
    of concerns: the best practice would be to *collect* all platform-specific
    builds in a final platform-agnostic publishing step, which could then
    be a single publisher.

    However, in the interest of getting trusted publishers into users' hands
    without requiring them to make significant unrelated changes to their builds,
    the trusted publishing feature allows users to register multiple
    publishers against a single project. Consequently, `sampleproject`
    can be published from both `release-linux.yml` and `release-macos.yml`
    without needing to be refactored into a single `release.yml`.
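
A many-many relationship like the one described above can be modeled with two indexes kept in sync. This is a toy sketch, not PyPI's data model: the `PublisherRegistry` class, the representation of a publisher as a `(repository, workflow)` pair, and the `examplelib` project names are assumptions for illustration.

```python
from collections import defaultdict


class PublisherRegistry:
    """Toy many-to-many registry between projects and publishers."""

    def __init__(self):
        self._by_project = defaultdict(set)
        self._by_publisher = defaultdict(set)

    def register(self, project: str, publisher: tuple) -> None:
        # Record the link in both directions so either side can be queried.
        self._by_project[project].add(publisher)
        self._by_publisher[publisher].add(project)

    def publishers_for(self, project: str) -> list:
        return sorted(self._by_project.get(project, ()))

    def projects_for(self, publisher: tuple) -> list:
        return sorted(self._by_publisher.get(publisher, ()))


registry = PublisherRegistry()

# One project, many publishers: per-platform release workflows.
registry.register("sampleproject", ("octo-org/example", "release-linux.yml"))
registry.register("sampleproject", ("octo-org/example", "release-macos.yml"))

# One publisher, many projects: a tandem release from a shared workflow.
registry.register("examplelib", ("octo-org/tandem", "release.yml"))
registry.register("examplelib-cli", ("octo-org/tandem", "release.yml"))

sample_publishers = registry.publishers_for("sampleproject")
tandem_projects = registry.projects_for(("octo-org/tandem", "release.yml"))
```

Both patterns fall out of the same structure: `sample_publishers` lists two workflows for one project, and `tandem_projects` lists two projects for one workflow.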

[OpenID Connect]: https://openid.net/connect/

[Macaroons]: https://en.wikipedia.org/wiki/Macaroons_(computer_science)

[JSON Web Tokens]: https://en.wikipedia.org/wiki/JSON_Web_Token

[GitHub's secret scanning system]: https://docs.github.com/en/code-security/secret-scanning/about-secret-scanning
