Skip to content

Conversation

di
Copy link
Member

@di di commented May 3, 2023

Previously, we weren't adequately filtering on environment in find_publisher_by_issuer, which resulted in this function raising a MultipleResultsFound exception if more than one publisher was configured that matched the claimset.

Instead, the behavior should be as follows:

  • If no environment claim is present
    • If there's a publisher that doesn't restrict on environment, return it
    • or return None
  • If an environment claim is present
    • If there's a publisher that restricts on this environment, return it
    • If there's a publisher that doesn't restrict on environment, return it
    • or return None

Fixes https://python-software-foundation.sentry.io/issues/4150013748/.

@di di requested a review from a team as a code owner May 3, 2023 04:41
repository_name=repository_name,
repository_owner=repository_owner,
repository_owner_id=signed_claims["repository_owner_id"],
environment=signed_claims.get("environment"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: It looks like signed_claims could come in without an environment key.

By my reading, the first query in the try block would run a filter_by(..., environment=None and try to get one(), and if nothing comes back, issue a second identical query with one_or_none().

My question is:

  • If there's 2 claim records, one with an environment, and one without, the first query will get the env-specific one. Yay, that's the intent.
  • If there's 1 claim record, with no environment, why would the second query get run? Wouldn't the filter_by condition apply just the same?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: It looks like signed_claims could come in without an environment key.

Yep, that's correct: GitHub's OIDC JWTs don't contain an environment claim if one isn't explicitly configured.

I think the confusing bit here is the possible states:

  1. OIDC token with an environment
  2. OIDC token without an environment
  3. Trusted publisher with an environment configured
  4. Trusted publisher without an environment configured

State (3) must only match state (1), while state (4) can match either (1) or (2). So we need to explicitly carve out an environment=None case.

(This would have been nicer if we'd added the environment claim without broadening the uniqueness constraint to include it, but that would have made it impossible to register separate trusted publishers for different environments under the same workflow...)

Copy link
Member

@woodruffw woodruffw May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but I see what you mean -- I think the queries are slightly off here: the first should ensure that signed_claims["environment"] is actually present, while the second should remain explicitly environment=None. That would make the states clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's 1 claim record, with no environment, why would the second query get run? Wouldn't the filter_by condition apply just the same?

The second query is to capture the case where the signed claims have an environment key, but a publisher is configured with environment=None -- essentially, we can always fall back on this "wildcard" publisher no matter what, if it's present, if there isn't a publisher configured with a matching environment.

I think we need two queries regardless: one to check if there's a publisher that matches the environment, and one to check for a "wildcard" publisher. Using signed_claims.get("environment") in the first query here allows us to satisfy the first case and also short-circuit and only run one query when there is no environment in the signed claims at all. If we didn't do that, we'd have to add some more branching here, like:

I agree this is a little confusing though, let me see if I can make this more clear with some conditionals instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@woodruffw woodruffw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I think this is more comprehensible, even if we miss a short-circuiting optimization 🙂

Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the logic simpler, based on the environment existing.
Not we should only ever make a single DB call for the publisher.

I'm betting this query could be refactored a little more to construct the query body, and conditionally add the specifics to switch the environment flag, but that's fine to defer to another time.

@di di merged commit 4e91963 into pypi:main May 3, 2023
@di di deleted the fix-oidc-environment-filter branch May 3, 2023 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants