Fix `find_publisher_by_issuer` environment filter #13566
warehouse/oidc/utils.py
Outdated
repository_name=repository_name,
repository_owner=repository_owner,
repository_owner_id=signed_claims["repository_owner_id"],
environment=signed_claims.get("environment"),
question: It looks like `signed_claims` could come in without an `environment` key.
By my reading, the first query in the `try` block would run a `filter_by(..., environment=None)` and try to get `one()`, and if nothing comes back, issue a second identical query with `one_or_none()`.
My question is:
- If there are 2 claim records, one with an environment and one without, the first query will get the env-specific one. Yay, that's the intent.
- If there's 1 claim record, with no environment, why would the second query get run? Wouldn't the `filter_by` condition apply just the same?
> question: It looks like `signed_claims` could come in without an `environment` key.

Yep, that's correct: GitHub's OIDC JWTs don't contain an `environment` claim if one isn't explicitly configured.
I think the confusing bit here is the possible states:
1. OIDC token with an `environment`
2. OIDC token without an `environment`
3. Trusted publisher with an `environment` configured
4. Trusted publisher without an `environment` configured

State (3) must only match state (1), while state (4) can match either (1) or (2). So we need to explicitly carve out an `environment=None` case.
(This would have been nicer if we'd added the `environment` claim without broadening the uniqueness constraint to include it, but that would have made it impossible to register separate trusted publishers for different environments under the same workflow...)
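To make the four states concrete, here is a minimal sketch of the matching rule as a plain predicate. The function name `publisher_matches` and the exact string comparison are assumptions for illustration; this is not Warehouse code.

```python
from typing import Optional


def publisher_matches(token_env: Optional[str], publisher_env: Optional[str]) -> bool:
    """Return True if a publisher's configured environment accepts the token's."""
    if publisher_env is None:
        # State (4): no environment configured, so any token matches.
        return True
    # State (3): an environment is configured, so the token must carry the same one.
    # (Exact string equality is an assumption of this sketch.)
    return token_env == publisher_env


# The combinations of states (1)-(4) described above.
assert publisher_matches("release", "release")      # (1) with (3), same environment
assert not publisher_matches("staging", "release")  # (1) with (3), different environment
assert not publisher_matches(None, "release")       # (2) with (3)
assert publisher_matches("release", None)           # (1) with (4)
assert publisher_matches(None, None)                # (2) with (4)
```

The predicate only says which publishers are acceptable. When both a specific and an unconstrained publisher are registered, the lookup still has to prefer the specific one.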
Oh, but I see what you mean -- I think the queries are slightly off here: the first should ensure that `signed_claims["environment"]` is actually present, while the second should remain explicitly `environment=None`. That would make the states clearer.
> If there's 1 claim record, with no environment, why would the second query get run? Wouldn't the `filter_by` condition apply just the same?

The second query is to capture the case where the signed claims have an `environment` key, but a publisher is configured with `environment=None` -- essentially, we can always fall back on this "wildcard" publisher, if it's present, whenever there isn't a publisher configured with a matching `environment`.
I think we need two queries regardless: one to check if there's a publisher that matches the environment, and one to check for a "wildcard" publisher. Using `signed_claims.get("environment")` in the first query here allows us to satisfy the first case and also short-circuit and only run one query when there is no environment in the signed claims at all. If we didn't do that, we'd have to add some more branching here, like:
I agree this is a little confusing though, let me see if I can make this more clear with some conditionals instead.
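As an illustration of the conditional approach discussed in this thread, here is a minimal in-memory sketch: look for an environment-specific publisher only when the token carries an environment, then fall back to the "wildcard" publisher. The `Publisher` class and `find_publisher` function are hypothetical stand-ins, not the code from this PR, and a dictionary lookup stands in for the database queries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Publisher:
    # Hypothetical stand-in for a trusted publisher row; only the
    # environment matters for this sketch.
    name: str
    environment: Optional[str]


def find_publisher(publishers, signed_claims):
    """Prefer an environment-specific publisher, then fall back to the wildcard."""
    # Assumes at most one publisher per environment value (the uniqueness
    # constraint mentioned above), so a dict keyed on environment is safe.
    by_env = {p.environment: p for p in publishers}

    environment = signed_claims.get("environment")
    if environment is not None:
        # Token has an environment: try the publisher registered for exactly it.
        specific = by_env.get(environment)
        if specific is not None:
            return specific

    # No environment in the token, or no specific match: try the wildcard.
    return by_env.get(None)


publishers = [Publisher("wildcard", None), Publisher("release-only", "release")]
assert find_publisher(publishers, {"environment": "release"}).name == "release-only"
assert find_publisher(publishers, {"environment": "staging"}).name == "wildcard"
assert find_publisher(publishers, {}).name == "wildcard"
```

With only an environment-specific publisher registered, a token without an `environment` claim matches nothing and the function returns `None`.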
Co-authored-by: Mike Fiedler <[email protected]>
LGTM! I think this is more comprehensible, even if we miss a short-circuiting optimization 🙂
Thanks for making the logic simpler, based on the environment existing.
Now we should only ever make a single DB call for the publisher.
I'm betting this query could be refactored a little more to construct the query body, and conditionally add the specifics to switch the `environment` flag, but that's fine to defer to another time.
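One possible reading of that refactor, sketched with the standard-library `sqlite3` module rather than SQLAlchemy: build the shared query body once, then conditionally append the environment clause. The `publishers` table, its columns, and `find_publisher_id` are assumptions made for this example and do not reflect Warehouse's schema.

```python
import sqlite3


def find_publisher_id(conn, repository_name, repository_owner, signed_claims):
    """Look up a publisher with one query, preferring an environment-specific row."""
    # Shared query body: the filters that apply regardless of environment.
    sql = (
        "SELECT id FROM publishers "
        "WHERE repository_name = ? AND repository_owner = ? "
    )
    params = [repository_name, repository_owner]

    environment = signed_claims.get("environment")
    if environment is not None:
        # Accept the matching environment or the wildcard (NULL) row, and sort
        # so that the environment-specific row comes first if both exist.
        sql += "AND (environment = ? OR environment IS NULL) ORDER BY environment IS NULL"
        params.append(environment)
    else:
        # No environment claim: only the wildcard row is acceptable.
        sql += "AND environment IS NULL"

    row = conn.execute(sql + " LIMIT 1", params).fetchone()
    return row[0] if row else None


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publishers ("
    "id INTEGER PRIMARY KEY, repository_name TEXT, "
    "repository_owner TEXT, environment TEXT)"
)
conn.executemany(
    "INSERT INTO publishers (repository_name, repository_owner, environment) "
    "VALUES (?, ?, ?)",
    [("pkg", "org", None), ("pkg", "org", "release")],
)
wildcard_id, release_id = 1, 2  # ids assigned in insertion order

assert find_publisher_id(conn, "pkg", "org", {"environment": "release"}) == release_id
assert find_publisher_id(conn, "pkg", "org", {"environment": "staging"}) == wildcard_id
assert find_publisher_id(conn, "pkg", "org", {}) == wildcard_id
```

Folding the fallback into an `ORDER BY` is a design choice of this sketch. It keeps the lookup to one round trip but hides the preference for the environment-specific row inside the SQL, where it is easier to miss than an explicit conditional.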
Previously, we weren't adequately filtering on `environment` in `find_publisher_by_issuer`, which resulted in this function raising a `MultipleResultsFound` exception if more than one publisher was configured that matched the claimset.

Instead, the behavior should be as follows:
- If no `environment` claim is present:
  - If a publisher exists with no `environment`, return it
  - Otherwise, return `None`
- If an `environment` claim is present:
  - If a publisher exists with a matching `environment`, return it
  - If a publisher exists with no `environment`, return it
  - Otherwise, return `None`

Fixes https://python-software-foundation.sentry.io/issues/4150013748/.