Notgithub: a service to simulate GitHub token scanning #9269

Merged 3 commits on May 19, 2021
2 changes: 2 additions & 0 deletions dev/environment
Expand Up @@ -43,3 +43,5 @@ WAREHOUSE_LEGACY_DOMAIN=pypi.python.org

VAULT_URL="http://vault:8200"
VAULT_TOKEN="an insecure vault access token"

GITHUB_TOKEN_SCANNING_META_API_URL="http://notgithub:8000/meta/public_keys/token_scanning"
7 changes: 7 additions & 0 deletions docker-compose.yml
Expand Up @@ -142,3 +142,10 @@ services:
      - "8125:8125/udp"
    volumes:
      - ./dev/notdatadog.py:/opt/warehouse/dev/notdatadog.py

  notgithub:
    image: ewjoachim/notgithub-token-scanning
    environment:
      NOTGITHUB_DEFAULT_URL: "http://web:8000/_/github/disclose-token"
    ports:
      - "8964:8000"
Review comment (Contributor Author): Picked a random port to avoid clashing.

1 change: 1 addition & 0 deletions docs/development/index.rst
Expand Up @@ -32,6 +32,7 @@ or the `distutils-sig mailing list`_, to ask questions or get involved.
development-database
cloud
malware-checks
token-scanning

.. _`GitHub`: https://github.com/pypa/warehouse
.. _`"What to put in your bug report"`: http://www.contribution-guide.org/#what-to-put-in-your-bug-report
100 changes: 100 additions & 0 deletions docs/development/token-scanning.rst
@@ -0,0 +1,100 @@
Token Scanning
==============

People make mistakes. Sometimes, they post their PyPI tokens publicly. Some
code hosting platforms scan published content with regular expressions to
identify leaked secrets and, ideally, have them deactivated. PyPI has started
integrating with such systems in order to help secure packages.

.. contents::
    :local:

How to recognize a PyPI secret
------------------------------

A PyPI API token is a string consisting of a prefix (``pypi``), a separator
(``-``) and a macaroon serialized with PyMacaroonv2, which means it's the
``base64`` of a byte string starting with::

    \x02\x01\x08pypi.org\x02\x01b

Thanks to this, we know that a PyPI token is bound to match the following
regular expression::

    pypi-AgEIcHlwaS5vcmc[A-Za-z0-9-_]{70,}

A token can be arbitrarily long because arbitrarily many caveats may be added
to it. For more details on the token format, see `pypitoken
<https://pypitoken.readthedocs.io>`_.
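As a rough illustration of where the regex prefix comes from, the sketch below
base64-encodes the leading bytes quoted above (up to and including the
``pypi.org`` location) and builds the detection pattern from the result. The
names ``MACAROON_HEADER``, ``TOKEN_RE`` and ``find_tokens`` are made up for this
example and are not part of Warehouse.

```python
import base64
import re

# Leading bytes of the serialized macaroon, as quoted above, up to and
# including the location "pypi.org".
MACAROON_HEADER = b"\x02\x01\x08pypi.org"

# URL-safe base64 of those bytes. The padding "=" is stripped because a real
# token continues past this point; the last character stays the same since
# the byte that follows (\x02) begins with two zero bits.
encoded_prefix = base64.urlsafe_b64encode(MACAROON_HEADER).decode().rstrip("=")

# Same pattern as in the text, with "-" moved to the end of the character
# class so it can never be read as a range.
TOKEN_RE = re.compile(
    r"pypi-" + re.escape(encoded_prefix) + r"[A-Za-z0-9_-]{70,}"
)


def find_tokens(text: str) -> list[str]:
    """Return every substring of ``text`` that looks like a PyPI API token."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    # A fabricated string with the right shape; it is not a valid token.
    fake_token = "pypi-" + encoded_prefix + "A" * 70
    print(encoded_prefix)
    print(find_tokens(f"password = {fake_token}"))
```

The computed prefix is ``AgEIcHlwaS5vcmc``, which is the fixed part of the
regex above.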

GitHub Secret Scanning
----------------------

GitHub's feature used to be called "Token Scanning" and is now called "Secret
Scanning"; you may come across both names. GitHub scans public commits with
the regex above (in practice, they only report matches at least 130 characters
long). For all tokens identified within a "push" event, they send us reports
in bulk. The format is explained thoroughly in `their documentation
<https://docs.github.com/en/developers/overview/secret-scanning>`_ as well as
in the `warehouse implementation ticket
<https://github.com/pypa/warehouse/issues/6051>`_.

In short: they send us a cryptographically signed payload describing each
leaked token, along with a public URL pointing to it.
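The receiving side of such a report can be sketched as follows. This is only
an outline under stated assumptions: the header names and the shape of the
meta API response (a ``public_keys`` list whose entries carry
``key_identifier``, ``key`` and ``is_current``) follow GitHub's public
documentation, while ``select_public_key`` and ``parse_report`` are
hypothetical helpers, not Warehouse functions. The signature check itself
needs a cryptography library and is deliberately left out.

```python
import json

# Request headers GitHub uses to identify the signing key and carry the
# signature of the raw request body.
KEY_ID_HEADER = "GITHUB-PUBLIC-KEY-IDENTIFIER"
SIGNATURE_HEADER = "GITHUB-PUBLIC-KEY-SIGNATURE"


def select_public_key(meta_payload: dict, key_id: str) -> str:
    """Return the PEM public key whose identifier matches ``key_id``.

    ``meta_payload`` is assumed to be the decoded JSON body of the meta API:
    {"public_keys": [{"key_identifier": ..., "key": ..., "is_current": ...}]}
    """
    for entry in meta_payload.get("public_keys", []):
        if entry.get("key_identifier") == key_id:
            return entry["key"]
    raise LookupError(f"no public key with identifier {key_id!r}")


def parse_report(body: bytes) -> list[dict]:
    """Decode a report body into a list of disclosures (type, token, url)."""
    disclosures = json.loads(body)
    if not isinstance(disclosures, list):
        raise ValueError("expected a JSON list of disclosures")
    return disclosures
```

A real handler would read the two headers, pick the key with
``select_public_key``, verify the signature over the raw body, and only then
call ``parse_report``.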

How to test it manually
^^^^^^^^^^^^^^^^^^^^^^^

Docker Compose launches a fake GitHub service. Point your browser to
``http://localhost:8964``. Create, reorder, etc. one or more public keys, make
sure one key is marked as current, then write your payload using the following
format:

.. code-block:: json

    [{
        "type": "pypi_api_token",
        "token": "pypi-...",
        "url": "https://example.com"
    }]

Send your payload. The fake service forwards it to your local Warehouse. If a
match is found, you should find that:

- the token you sent has disappeared from the user account page,
- two new security events have been recorded: one for the token deletion, one
  for the notification email.

After you send the token, the page will reload, and you'll find the details of
the request at the bottom. If all went well, you should see a ``204`` ('No
Content').

Whether it worked or not, a number of metrics have been emitted. You can see
them in the ``notdatadog`` container log.

GitLab Secret Detection
-----------------------

GitLab has an equivalent mechanism named "Secret Detection", which is not
implemented in Warehouse yet (see `#9280
<https://github.com/pypa/warehouse/issues/9280>`_).

PyPI token disclosure infrastructure
------------------------------------

The code is mainly in ``warehouse/integration/github``.
There are 3 main parts in handling a token disclosure report:

- The Web view, which is the top-level glue but does not implement the logic.
- Vendor-specific authenticity check and loading. In the case of GitHub, we
  check that the payload and the associated signature match the public keys
  available in their meta API.
- (Supposedly) vendor-independent disclosure analysis:

  - Each token is processed individually in its own Celery task.
  - The token is analyzed: we check that its format is correct and that it
    corresponds to a macaroon we have in the DB.
  - We don't check the signature. This could change in the future, but for
    now we consider that a leaked token identifier, even without a valid
    signature, is enough to warrant deleting the token.
  - If it's valid, we delete it, log a security event and send an email
    (which will spawn a second Celery task).
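The vendor-independent analysis steps listed above can be sketched roughly as
follows. Everything here is illustrative: ``FakeTokenStore`` and
``analyze_disclosure`` are hypothetical names, the store is an in-memory
stand-in for the database, and the lookup uses the whole token string as a key,
whereas the real code matches on the macaroon identifier.

```python
import re
from dataclasses import dataclass, field

# Detection pattern from the documentation, with "-" placed last in the class.
TOKEN_RE = re.compile(r"pypi-AgEIcHlwaS5vcmc[A-Za-z0-9_-]{70,}")


@dataclass
class FakeTokenStore:
    """Hypothetical stand-in for the macaroon table and its side effects."""

    tokens: dict[str, str]  # token string -> owner name
    events: list[str] = field(default_factory=list)

    def delete(self, token: str, url: str) -> None:
        # Remove the token, then record the two follow-up actions.
        owner = self.tokens.pop(token)
        self.events.append(f"deleted token of {owner}, leaked at {url}")
        self.events.append(f"emailed {owner}")


def analyze_disclosure(disclosure: dict, store: FakeTokenStore) -> str:
    """Handle a single disclosure and return a short outcome label."""
    if disclosure.get("type") != "pypi_api_token":
        return "wrong_type"
    token = disclosure.get("token")
    # Format check only: no macaroon signature is verified here.
    if not isinstance(token, str) or not TOKEN_RE.fullmatch(token):
        return "invalid_format"
    if token not in store.tokens:
        return "not_found"
    store.delete(token, disclosure.get("url", ""))
    return "deleted"
```

In this sketch each disclosure is handled by one function call; in Warehouse,
per the list above, each one runs in its own Celery task.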
56 changes: 47 additions & 9 deletions tests/unit/integration/github/test_utils.py
Expand Up @@ -114,15 +114,21 @@ def test_init(self):
metrics = pretend.stub()
session = pretend.stub()
token = "api_token"
url = "http://foo"
cache = utils.PublicKeysCache(cache_time=12)

verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=session, metrics=metrics, api_token=token, public_keys_cache=cache
+ api_url=url,
+ session=session,
+ metrics=metrics,
+ api_token=token,
+ public_keys_cache=cache,
)

assert verifier._session is session
assert verifier._metrics is metrics
assert verifier._api_token == token
assert verifier._api_url == url
assert verifier._public_keys_cache is cache

def test_verify_cache_miss(self):
Expand All @@ -148,6 +154,7 @@ def test_verify_cache_miss(self):
metrics = pretend.stub(increment=pretend.call_recorder(lambda str: None))
cache = utils.PublicKeysCache(cache_time=12)
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=session,
metrics=metrics,
api_token="api-token",
Expand Down Expand Up @@ -189,6 +196,7 @@ def test_verify_cache_hit(self):
}
]
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=session,
metrics=metrics,
api_token="api-token",
Expand Down Expand Up @@ -219,6 +227,7 @@ def test_verify_error(self):
metrics = pretend.stub(increment=pretend.call_recorder(lambda str: None))
cache = utils.PublicKeysCache(cache_time=12)
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=metrics,
api_token="api-token",
Expand All @@ -237,6 +246,7 @@ def test_verify_error(self):

def test_headers_auth_no_token(self):
headers = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
api_token=None,
Expand All @@ -246,6 +256,7 @@ def test_headers_auth_no_token(self):

def test_headers_auth_token(self):
headers = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
api_token="api-token",
Expand Down Expand Up @@ -274,6 +285,7 @@ def test_retrieve_public_key_payload(self):
metrics = pretend.stub(increment=pretend.call_recorder(lambda str: None))

verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=session,
metrics=metrics,
api_token="api-token",
Expand All @@ -282,7 +294,7 @@ def test_retrieve_public_key_payload(self):
assert verifier._retrieve_public_key_payload() == meta_payload
assert session.get.calls == [
pretend.call(
- "https://api.github.com/meta/public_keys/token_scanning",
+ "http://foo",
headers={"Authorization": "token api-token"},
)
]
Expand All @@ -295,7 +307,10 @@ def test_get_cached_public_key_cache_hit(self):
cache.set(now=time.time(), value=cache_value)

verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=session, metrics=metrics, public_keys_cache=cache
+ api_url="http://foo",
+ session=session,
+ metrics=metrics,
+ public_keys_cache=cache,
)

assert verifier._get_cached_public_keys() is cache_value
Expand All @@ -306,7 +321,10 @@ def test_get_cached_public_key_cache_miss_no_cache(self):
cache = utils.PublicKeysCache(cache_time=12)

verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=session, metrics=metrics, public_keys_cache=cache
+ api_url="http://foo",
+ session=session,
+ metrics=metrics,
+ public_keys_cache=cache,
)

with pytest.raises(utils.CacheMiss):
Expand All @@ -322,7 +340,10 @@ def test_retrieve_public_key_payload_http_error(self):
get=lambda *a, **k: response,
)
verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=session, metrics=pretend.stub(), public_keys_cache=pretend.stub()
+ api_url="http://foo",
+ session=session,
+ metrics=pretend.stub(),
+ public_keys_cache=pretend.stub(),
)
with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
verifier._retrieve_public_key_payload()
Expand All @@ -338,7 +359,10 @@ def test_retrieve_public_key_payload_json_error(self):
)
session = pretend.stub(get=lambda *a, **k: response)
verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=session, metrics=pretend.stub(), public_keys_cache=pretend.stub()
+ api_url="http://foo",
+ session=session,
+ metrics=pretend.stub(),
+ public_keys_cache=pretend.stub(),
)
with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
verifier._retrieve_public_key_payload()
Expand All @@ -350,7 +374,10 @@ def test_retrieve_public_key_payload_connection_error(self):
session = pretend.stub(get=pretend.raiser(requests.ConnectionError))

verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=session, metrics=pretend.stub(), public_keys_cache=pretend.stub()
+ api_url="http://foo",
+ session=session,
+ metrics=pretend.stub(),
+ public_keys_cache=pretend.stub(),
)

with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
Expand All @@ -375,7 +402,10 @@ def test_extract_public_keys(self):
}
cache = utils.PublicKeysCache(cache_time=12)
verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=pretend.stub(), metrics=pretend.stub(), public_keys_cache=cache
+ api_url="http://foo",
+ session=pretend.stub(),
+ metrics=pretend.stub(),
+ public_keys_cache=cache,
)

keys = verifier._extract_public_keys(pubkey_api_data=meta_payload)
Expand Down Expand Up @@ -415,7 +445,10 @@ def test_extract_public_keys(self):
def test_extract_public_keys_error(self, payload, expected):
cache = utils.PublicKeysCache(cache_time=12)
verifier = utils.GitHubTokenScanningPayloadVerifier(
- session=pretend.stub(), metrics=pretend.stub(), public_keys_cache=cache
+ api_url="http://foo",
+ session=pretend.stub(),
+ metrics=pretend.stub(),
+ public_keys_cache=cache,
)

with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
Expand All @@ -427,6 +460,7 @@ def test_extract_public_keys_error(self, payload, expected):

def test_check_public_key(self):
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
public_keys_cache=pretend.stub(),
Expand All @@ -440,6 +474,7 @@ def test_check_public_key(self):

def test_check_public_key_error(self):
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
public_keys_cache=pretend.stub(),
Expand All @@ -453,6 +488,7 @@ def test_check_public_key_error(self):

def test_check_signature(self):
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
public_keys_cache=pretend.stub(),
Expand Down Expand Up @@ -482,6 +518,7 @@ def test_check_signature(self):

def test_check_signature_invalid_signature(self):
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
public_keys_cache=pretend.stub(),
Expand Down Expand Up @@ -513,6 +550,7 @@ def test_check_signature_invalid_signature(self):

def test_check_signature_invalid_crypto(self):
verifier = utils.GitHubTokenScanningPayloadVerifier(
api_url="http://foo",
session=pretend.stub(),
metrics=pretend.stub(),
public_keys_cache=pretend.stub(),