
Commit b12fb3c

ewjoachim and di authored
Notgithub: a service to simulate GitHub token scanning (#9269)
* Add notgithub service to simulate github token scanning
* Add corresponding documentation

Co-authored-by: Dustin Ingram <[email protected]>
1 parent 155c97f commit b12fb3c

10 files changed: +194 -23 lines changed

dev/environment

Lines changed: 2 additions & 0 deletions
@@ -43,3 +43,5 @@ WAREHOUSE_LEGACY_DOMAIN=pypi.python.org
 
 VAULT_URL="http://vault:8200"
 VAULT_TOKEN="an insecure vault access token"
+
+GITHUB_TOKEN_SCANNING_META_API_URL="http://notgithub:8000/meta/public_keys/token_scanning"
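
A minimal sketch of how an application could pick up this setting, assuming it is read from the process environment. The helper name `meta_api_url` is hypothetical and this is not Warehouse's actual configuration code. The fallback is the production URL that the tests in this commit previously hard-coded.

```python
import os

# Production endpoint: the value the unit tests used before this commit.
DEFAULT_META_API_URL = "https://api.github.com/meta/public_keys/token_scanning"


def meta_api_url(environ=None):
    """Return the public-keys endpoint, preferring the environment override.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    return env.get("GITHUB_TOKEN_SCANNING_META_API_URL", DEFAULT_META_API_URL)
```

With the development environment above, the override points at the `notgithub` container, so no request leaves the local Docker network.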

docker-compose.yml

Lines changed: 7 additions & 0 deletions
@@ -142,3 +142,10 @@ services:
       - "8125:8125/udp"
     volumes:
       - ./dev/notdatadog.py:/opt/warehouse/dev/notdatadog.py
+
+  notgithub:
+    image: ewjoachim/notgithub-token-scanning
+    environment:
+      NOTGITHUB_DEFAULT_URL: "http://web:8000/_/github/disclose-token"
+    ports:
+      - "8964:8000"

docs/development/index.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ or the `distutils-sig mailing list`_, to ask questions or get involved.
    development-database
    cloud
    malware-checks
+   token-scanning
 
 .. _`GitHub`: https://github.com/pypa/warehouse
 .. _`"What to put in your bug report"`: http://www.contribution-guide.org/#what-to-put-in-your-bug-report

docs/development/token-scanning.rst

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+Token Scanning
+==============
+
+People make mistakes. Sometimes, they post their PyPI tokens publicly. Some
+content managers run regexes to try and identify published secrets, and ideally
+have them deactivated. PyPI has started integrating with such systems in order
+to help secure packages.
+
+.. contents::
+   :local:
+
+How to recognize a PyPI secret
+------------------------------
+
+A PyPI API token is a string consisting of a prefix (``pypi``), a separator
+(``-``) and a macaroon serialized with PyMacaroonv2, which means it's the
+``base64`` of::
+
+    \x02\x01\x08pypi.org\x02\x01b
+
+Thanks to this, we know that a PyPI token is bound to match::
+
+    pypi-AgEIcHlwaS5vcmc[A-Za-z0-9-_]{70,}
+
+A token can be arbitrarily long because we may add arbitrarily many caveats. For
+more details on the token format, see `pypitoken
+<https://pypitoken.readthedocs.io>`_.
+
+GitHub Secret Scanning
+----------------------
+
+GitHub's feature used to be called "Token Scanning" and is now called "Secret
+Scanning". You may encounter both names. GitHub scans public commits with the
+regex above (in practice, limited to matches at least 130 characters long).
+For all tokens identified within a "push" event, they send us reports in bulk.
+The format is explained thoroughly in `their documentation
+<https://docs.github.com/en/developers/overview/secret-scanning>`_ as well as
+in the `warehouse implementation ticket
+<https://github.com/pypa/warehouse/issues/6051>`_.
+
+In short: they send us a cryptographically signed payload describing each
+leaked token along with a public URL pointing to it.
+
+How to test it manually
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A fake GitHub service is launched by Docker Compose. Point your browser to
+``http://localhost:8964``. Create/reorder/... one or more public keys, make
+sure one key is marked as current, then write your payload, using the following
+format:
+
+.. code-block:: json
+
+    [{
+        "type": "pypi_api_token",
+        "token": "pypi-...",
+        "url": "https://example.com"
+    }]
+
+Send your payload. The service forwards it to your local Warehouse. If a match
+is found, you should find that:
+
+- the token you sent has disappeared from the user account page,
+- 2 new security events have been recorded: one for the token deletion, one for
+  the notification email.
+
+After you send the token, the page will reload, and you'll find the details of
+the request at the bottom. If all went well, you should see a ``204`` ('No
+Content').
+
+Whether it worked or not, a bunch of metrics have been issued; you can see them
+in the `notdatadog` container log.
+
+GitLab Secret Detection
+-----------------------
+
+GitLab has an equivalent mechanism, named "Secret Detection", which is not
+implemented in Warehouse yet (see `#9280
+<https://github.com/pypa/warehouse/issues/9280>`_).
+
+PyPI token disclosure infrastructure
+------------------------------------
+
+The code is mainly in ``warehouse/integration/github``.
+There are 3 main parts in handling a token disclosure report:
+
+- The Web view, which is the top-level glue but does not implement the logic
+- Vendor-specific authenticity check & loading. In the case of GitHub, we check
+  that the payload and the associated signature match the public keys
+  available in their meta-API
+- (Supposedly-)Vendor-independent disclosure analysis:
+
+  - Each token is processed individually in its own Celery task
+  - The token is analyzed: we check if its format is correct and if it
+    corresponds to a macaroon we have in the DB
+  - We don't check the macaroon's signature. This could change in the
+    future but for now, we consider that if a token identifier leaked, even
+    without a valid signature, it's enough to warrant deleting it.
+  - If it's valid, we delete it, log a security event and send an email
+    (which will spawn a second Celery task)
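
The token format and report payload described in `token-scanning.rst` can be sketched in a few lines of Python. This is an illustration only, not Warehouse's implementation: `find_candidate_tokens` is a hypothetical helper, the character class is written `[A-Za-z0-9_-]` (equivalent to the documented one), and the sample token is made up.

```python
import base64
import json
import re

# Same pattern as in the documentation; "-" is moved to the end of the
# character class, which is equivalent and avoids any range ambiguity.
PYPI_TOKEN_RE = re.compile(r"pypi-AgEIcHlwaS5vcmc[A-Za-z0-9_-]{70,}")

# The fixed part of the pattern comes from base64-encoding the header bytes
# quoted in the documentation.
HEADER = base64.urlsafe_b64encode(b"\x02\x01\x08pypi.org\x02\x01b").decode()


def find_candidate_tokens(raw_payload):
    """Return (token, url) pairs from a report that look like PyPI tokens.

    Hypothetical helper: it only checks the type and the format, not whether
    the token exists in any database.
    """
    candidates = []
    for entry in json.loads(raw_payload):
        if entry.get("type") != "pypi_api_token":
            continue
        if PYPI_TOKEN_RE.fullmatch(entry.get("token", "")):
            candidates.append((entry["token"], entry.get("url")))
    return candidates


# A made-up token with the right shape, and one that is too short to match.
fake_token = "pypi-AgEIcHlwaS5vcmc" + "A" * 70
report = json.dumps(
    [
        {"type": "pypi_api_token", "token": fake_token, "url": "https://example.com"},
        {"type": "pypi_api_token", "token": "pypi-too-short", "url": "https://example.com"},
    ]
)
candidates = find_candidate_tokens(report)
```

Only the first entry of `report` survives the filter; the second fails the format check before any lookup would be needed.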

tests/unit/integration/github/test_utils.py

Lines changed: 47 additions & 9 deletions
@@ -114,15 +114,21 @@ def test_init(self):
         metrics = pretend.stub()
         session = pretend.stub()
         token = "api_token"
+        url = "http://foo"
         cache = utils.PublicKeysCache(cache_time=12)
 
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=session, metrics=metrics, api_token=token, public_keys_cache=cache
+            api_url=url,
+            session=session,
+            metrics=metrics,
+            api_token=token,
+            public_keys_cache=cache,
         )
 
         assert verifier._session is session
         assert verifier._metrics is metrics
         assert verifier._api_token == token
+        assert verifier._api_url == url
         assert verifier._public_keys_cache is cache
 
     def test_verify_cache_miss(self):
@@ -148,6 +154,7 @@ def test_verify_cache_miss(self):
         metrics = pretend.stub(increment=pretend.call_recorder(lambda str: None))
         cache = utils.PublicKeysCache(cache_time=12)
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=session,
             metrics=metrics,
             api_token="api-token",
@@ -189,6 +196,7 @@ def test_verify_cache_hit(self):
             }
         ]
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=session,
             metrics=metrics,
             api_token="api-token",
@@ -219,6 +227,7 @@ def test_verify_error(self):
         metrics = pretend.stub(increment=pretend.call_recorder(lambda str: None))
         cache = utils.PublicKeysCache(cache_time=12)
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=metrics,
             api_token="api-token",
@@ -237,6 +246,7 @@ def test_verify_error(self):
 
     def test_headers_auth_no_token(self):
         headers = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             api_token=None,
@@ -246,6 +256,7 @@ def test_headers_auth_no_token(self):
 
     def test_headers_auth_token(self):
         headers = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             api_token="api-token",
@@ -274,6 +285,7 @@ def test_retrieve_public_key_payload(self):
         metrics = pretend.stub(increment=pretend.call_recorder(lambda str: None))
 
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=session,
             metrics=metrics,
             api_token="api-token",
@@ -282,7 +294,7 @@ def test_retrieve_public_key_payload(self):
         assert verifier._retrieve_public_key_payload() == meta_payload
         assert session.get.calls == [
             pretend.call(
-                "https://api.github.com/meta/public_keys/token_scanning",
+                "http://foo",
                 headers={"Authorization": "token api-token"},
             )
         ]
@@ -295,7 +307,10 @@ def test_get_cached_public_key_cache_hit(self):
         cache.set(now=time.time(), value=cache_value)
 
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=session, metrics=metrics, public_keys_cache=cache
+            api_url="http://foo",
+            session=session,
+            metrics=metrics,
+            public_keys_cache=cache,
         )
 
         assert verifier._get_cached_public_keys() is cache_value
@@ -306,7 +321,10 @@ def test_get_cached_public_key_cache_miss_no_cache(self):
         cache = utils.PublicKeysCache(cache_time=12)
 
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=session, metrics=metrics, public_keys_cache=cache
+            api_url="http://foo",
+            session=session,
+            metrics=metrics,
+            public_keys_cache=cache,
         )
 
         with pytest.raises(utils.CacheMiss):
@@ -322,7 +340,10 @@ def test_retrieve_public_key_payload_http_error(self):
             get=lambda *a, **k: response,
         )
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=session, metrics=pretend.stub(), public_keys_cache=pretend.stub()
+            api_url="http://foo",
+            session=session,
+            metrics=pretend.stub(),
+            public_keys_cache=pretend.stub(),
         )
         with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
             verifier._retrieve_public_key_payload()
@@ -338,7 +359,10 @@ def test_retrieve_public_key_payload_json_error(self):
         )
         session = pretend.stub(get=lambda *a, **k: response)
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=session, metrics=pretend.stub(), public_keys_cache=pretend.stub()
+            api_url="http://foo",
+            session=session,
+            metrics=pretend.stub(),
+            public_keys_cache=pretend.stub(),
         )
         with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
             verifier._retrieve_public_key_payload()
@@ -350,7 +374,10 @@ def test_retrieve_public_key_payload_connection_error(self):
         session = pretend.stub(get=pretend.raiser(requests.ConnectionError))
 
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=session, metrics=pretend.stub(), public_keys_cache=pretend.stub()
+            api_url="http://foo",
+            session=session,
+            metrics=pretend.stub(),
+            public_keys_cache=pretend.stub(),
         )
 
         with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
@@ -375,7 +402,10 @@ def test_extract_public_keys(self):
         }
         cache = utils.PublicKeysCache(cache_time=12)
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=pretend.stub(), metrics=pretend.stub(), public_keys_cache=cache
+            api_url="http://foo",
+            session=pretend.stub(),
+            metrics=pretend.stub(),
+            public_keys_cache=cache,
         )
 
         keys = verifier._extract_public_keys(pubkey_api_data=meta_payload)
@@ -415,7 +445,10 @@ def test_extract_public_keys(self):
     def test_extract_public_keys_error(self, payload, expected):
         cache = utils.PublicKeysCache(cache_time=12)
         verifier = utils.GitHubTokenScanningPayloadVerifier(
-            session=pretend.stub(), metrics=pretend.stub(), public_keys_cache=cache
+            api_url="http://foo",
+            session=pretend.stub(),
+            metrics=pretend.stub(),
+            public_keys_cache=cache,
         )
 
         with pytest.raises(utils.GitHubPublicKeyMetaAPIError) as exc:
@@ -427,6 +460,7 @@ def test_extract_public_keys_error(self, payload, expected):
 
     def test_check_public_key(self):
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             public_keys_cache=pretend.stub(),
@@ -440,6 +474,7 @@ def test_check_public_key(self):
 
     def test_check_public_key_error(self):
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             public_keys_cache=pretend.stub(),
@@ -453,6 +488,7 @@ def test_check_public_key_error(self):
 
     def test_check_signature(self):
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             public_keys_cache=pretend.stub(),
@@ -482,6 +518,7 @@ def test_check_signature(self):
 
     def test_check_signature_invalid_signature(self):
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             public_keys_cache=pretend.stub(),
@@ -513,6 +550,7 @@ def test_check_signature_invalid_signature(self):
 
     def test_check_signature_invalid_crypto(self):
         verifier = utils.GitHubTokenScanningPayloadVerifier(
+            api_url="http://foo",
             session=pretend.stub(),
             metrics=pretend.stub(),
             public_keys_cache=pretend.stub(),
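
The pattern these tests exercise, a verifier whose meta-API URL is injected rather than hard-coded, can be sketched as follows. `VerifierSketch` and `RecordingSession` are hypothetical stand-ins written for this illustration, not the classes in `warehouse/integration/github`. Returning an empty header dict when no token is set is an assumption, since the body of that test is not shown in the diff.

```python
class RecordingSession:
    """Tiny stand-in for an HTTP session that records calls to get()."""

    def __init__(self):
        self.calls = []

    def get(self, url, headers=None):
        self.calls.append((url, headers))
        return {"public_keys": []}


class VerifierSketch:
    """Hypothetical, trimmed-down verifier with an injectable meta-API URL."""

    def __init__(self, *, api_url, session, api_token=None):
        self._api_url = api_url
        self._session = session
        self._api_token = api_token

    def _headers_auth(self):
        # Assumption: no Authorization header at all when no token is set.
        if not self._api_token:
            return {}
        return {"Authorization": f"token {self._api_token}"}

    def retrieve_public_key_payload(self):
        # The URL comes from the constructor, so a local fake service can
        # replace the real endpoint without touching this method.
        return self._session.get(self._api_url, headers=self._headers_auth())


session = RecordingSession()
verifier = VerifierSketch(api_url="http://foo", session=session, api_token="api-token")
payload = verifier.retrieve_public_key_payload()
```

Because the URL is a constructor argument, the same code path can be pointed at `notgithub` in development and at GitHub's meta-API in production.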
