[DRAFT] PyPI Observation Reporting Payload #14503
Comments
Seems really reasonable to me @ewdurbin! Will
That's one of the questions we're trying to answer here. If there are particular IOC collections folks are used to and happy with, we'd like to make those easy for y'all to report.
Echoing @louislang here: this looks good from our (Vipyr Security) end, at least as far as what we were anticipating this system to look like. The IOC discussion is a little complicated. If this is something that's going to be extended in the future, I'm not sure ATT&CK is sustainable in that regard. (Can novice reporters reliably generate ATT&CK mappings, or are we going to use the email system forever?) Like Phylum, we've also been looking at ATT&CK behavior mapping internally, but chaining automated detection rules to produce comprehensive summaries of the capabilities and behaviors specific to ATT&CK can be quite the task.

Ultimately, I think IOCs are going to be whatever PyPI administrators want to make of them. If you (PyPI) want to see direct IOC mappings to some standard for data aggregation reasons, then that seems reasonable. However, I'm not sure it does a whole lot for us (the researchers/reporters). It seems reasonable to me to make this a soft requirement, and to allow individuals within the trusted reporting sphere to classify reports after the fact if they so choose, as well as to review and update those reports as necessary, if possible.

I can think of a fair few situations where our organization can discern malicious intent from a few IOCs, only to peel back the layers and discover that this isn't a discrete "hey, it runs some obfuscated code" case. And honestly, there are situations where we (collectively) simply don't have the time or the bandwidth to classify these accurately. So I can see a hard requirement degrading that internal warehousing of malicious packages and, subsequently, making the system less effective. That was a little wordy; in summary:
Hi! Thanks for the proposal - this is definitely on the right track! Basing the reporting payload on OSV is a good idea. We (OpenSSF) are using OSV for tracking malware (see https://github.com/ossf/malicious-packages), which seems like a good application of OSV. That said, I do have some comments on the specification above.
I have some vested interest in the IOCs being defined well, as it would be convenient to share the same definition for IOCs and other metadata with our use in https://github.com/ossf/malicious-packages.
@import-pandas-as-numpy Thanks for your detailed response!
I think we'll likely always keep the email system active, since we use it for more than malware reports, but are looking for ways to automate and speed up the handling. Re: IOCs - I totally hear you! We wanted to start the conversation, and see what folks are using.
Indeed! In our initial proposal they are not required; rather, we'd like to get clear on the format that folks might want to use. It looks like there's definitely interest in ATT&CK and others, so what if we left that flexibility open, and y'all provided whatever IOCs from those sets you have, when you have them?
@calebbrown Thanks for taking the time!
I tried to make this apparent earlier; maybe I missed the mark? We intend to use this payload as a transaction payload between Reporters and PyPI, and we can then re-materialize the OSV-compliant payload for a database to store (likely https://github.com/pypa/advisory-database).
I considered using EVIDENCE since its description kinda matched what we want, but it was also overly broad for validation. If there's some invocation I can layer on to jsonschema to get that, that'd be awesome! It might mean some more concrete Definitions for more types of things, so that they can be used as a
This is why I love these conversations: I'm not entirely certain, so let's figure that out together! It's apparent that different shops use this term, as well as classifications and categorizations, differently. Our intent is to have reporters supply as much structured detail as they can, fully recognizing that there are a ton of issues that are hard to categorize in time. If I'm misusing the term
Nothing specifically, other than that I probably didn't think of it that way.
That makes sense! That said, though, I'm also concerned about the Alternatively, would it work at all to place all necessary new fields/types (e.g. INSPECTOR_URL) into The other differences you listed are around different requirements for fields (e.g. requiring more fields to exist, and the ecosystem be The one exception is the non-enforcement of requiring
Hey gang! It's been a while; thanks for your responses and patience! There have been a lot of work streams leading back to this conversation. An update: I've since gotten a lot more clarity on this general topic, and am reconsidering the initial desire for our inbound API reporting payload to be very similar to the OSV schema "documentation" payload. As I've gone down the path of API design, since we're looking at using structural API segments, there's less of a need to duplicate the same information in the payload.
And since we'd want specific keys for our inbound reports, like an inspector URL, we won't have to conflict with the OSV spec at all on keys like I've got a draft PR underway; there's still some more work to do. The general inbound reporting payload won't require any specific IOC mappings just yet, only our required keys for submitting, but we're very much open to watching the space unfold and learning more as we go. If you have any thoughts that conflict with this approach, please let me know!
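To illustrate what "structural API segments" could mean for the payload, here is a minimal sketch in which the project is identified by the URL path, so the JSON body only carries report-specific keys. Everything named here is hypothetical and not taken from the draft PR: the base URL (https://pypi.example/api), the /projects/<name>/observations path, the function name, and the body keys summary and inspector_url. Authentication is left out, and the request is built but not sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL; not a published PyPI endpoint.
API_BASE = "https://pypi.example/api"


def build_observation_request(project: str, summary: str, inspector_url: str) -> Request:
    """Build (but do not send) a POST request for a hypothetical Observation endpoint."""
    # The project is identified by the URL path, so the body doesn't repeat it.
    url = f"{API_BASE}/projects/{quote(project, safe='')}/observations"
    # Only inbound-specific keys go in the body (key names are assumptions).
    body = {"summary": summary, "inspector_url": inspector_url}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_observation_request(
    "example-pkg",
    "Runs obfuscated code at install time",
    "https://inspector.pypi.io/project/example-pkg/",
)
print(req.get_method(), req.full_url)
# → POST https://pypi.example/api/projects/example-pkg/observations
```

The design point is only that data already expressed by the path (which project) need not be duplicated in the body.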
I'm going to close this issue, as we've made progress on our API design and have at least a single API payload for Projects in preview now.
Background
As part of a broader malware handling project, we’re looking at creating the first implementation by which Reporters may submit Observations about packages on PyPI.
(Specific terminology may change as we learn more.)
We’re still working out other parts of the API infrastructure, but wanted to get a conversation started on what the API transaction payload between Reporters and PyPI might look like.
As a refresher, our current security reporting process requests the following via email:
We have engaged a variety of security researchers and reporters who have been reporting malicious packages to PyPI Security to understand what aspects they prefer. We’ve learned more about their processes and use cases, and believe we’ve come up with something that is more automation-friendly, and leverages an evolving standard.
Proposal
We’ve learned that there’s a general desire for more standards in the overall security ecosystem, and the OSV project has defined a machine-friendly format for collecting published advisories.
The OSV Schema 1.6.0 is used for advisory databases.
While PyPI isn’t an advisory database, we thought using a format similar to the OSV schema for an inbound payload would be more sustainable long term, since we wouldn’t invent our own standard, but rather layer some extras on top of the existing one.
Minimal Example
A terse, minimal example that expresses only the absolutely required keys:
Changes from OSV schema
- id, as required top-level key
- schema_version, summary, affected, references as required top-level keys
- ecosystem is PyPI
- references.[INSPECTOR_URL] starts with inspector.pypi.io
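The constraints listed above are simple enough to check without a full schema engine. Below is a minimal sketch of such a check, not the validation PyPI actually implements. It assumes the payload is a parsed JSON dict in OSV's shape (affected[].package.ecosystem, references[].type and .url), and it reads "starts with inspector.pypi.io" as "the URL's host is inspector.pypi.io", so a scheme such as https:// is expected. The function name, package name, and summary text are hypothetical.

```python
from urllib.parse import urlparse

REQUIRED_TOP_LEVEL = ("schema_version", "summary", "affected", "references")
INSPECTOR_HOST = "inspector.pypi.io"


def validate_observation(payload: dict) -> list[str]:
    """Hypothetical check of an inbound payload; returns a list of problems."""
    errors = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in payload:
            errors.append(f"missing required top-level key: {key}")

    # Every affected entry must name a PyPI package.
    for i, entry in enumerate(payload.get("affected", [])):
        ecosystem = entry.get("package", {}).get("ecosystem")
        if ecosystem != "PyPI":
            errors.append(f"affected[{i}].package.ecosystem must be 'PyPI', got {ecosystem!r}")

    # Any INSPECTOR_URL reference must point at inspector.pypi.io.
    for i, ref in enumerate(payload.get("references", [])):
        if ref.get("type") == "INSPECTOR_URL":
            host = urlparse(ref.get("url", "")).netloc
            if host != INSPECTOR_HOST:
                errors.append(f"references[{i}] INSPECTOR_URL must be on {INSPECTOR_HOST}")
    return errors


payload = {
    "schema_version": "1.6.0",
    "summary": "Runs obfuscated code at install time",
    "affected": [{"package": {"ecosystem": "PyPI", "name": "example-pkg"}}],
    "references": [
        {"type": "INSPECTOR_URL", "url": "https://inspector.pypi.io/project/example-pkg/"}
    ],
}
print(validate_observation(payload))  # → []
```

Collecting every problem instead of stopping at the first one lets a Reporter fix a rejected submission in one pass.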
The only extension to the OSV Schema here (beyond validations) is adding INSPECTOR_URL to references, to be explicit vs. WEB, EVIDENCE, or any other reference types in the base schema.

A lot of potential extras are in database_specific, and we can extend the schema to decide what’s required or not. However, database_specific seems more geared towards advisory databases and long-term storage than towards interaction payloads. The verbose example below shows what that might look like.
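Because INSPECTOR_URL is not an OSV reference type, a record destined for an advisory database would need it mapped back onto the base schema, and OSV's required id and modified fields would have to be filled in. A minimal sketch of that re-materialization step follows. Mapping INSPECTOR_URL to EVIDENCE is an assumption (WEB would be another option), the function name is hypothetical, and the record id (EXAMPLE-0001) is a placeholder since the id scheme is left to the caller.

```python
import copy
from datetime import datetime, timezone


def to_osv_record(observation: dict, record_id: str) -> dict:
    """Hypothetical: derive an OSV-shaped record from an inbound Observation."""
    record = copy.deepcopy(observation)  # leave the inbound payload untouched
    # OSV requires "id" and "modified"; the inbound payload doesn't carry them.
    record["id"] = record_id
    record["modified"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # INSPECTOR_URL isn't an OSV reference type; re-tag it as EVIDENCE (an assumption).
    for ref in record.get("references", []):
        if ref.get("type") == "INSPECTOR_URL":
            ref["type"] = "EVIDENCE"
    return record


observation = {
    "schema_version": "1.6.0",
    "summary": "Runs obfuscated code at install time",
    "affected": [{"package": {"ecosystem": "PyPI", "name": "example-pkg"}}],
    "references": [
        {"type": "INSPECTOR_URL", "url": "https://inspector.pypi.io/project/example-pkg/"}
    ],
}
record = to_osv_record(observation, "EXAMPLE-0001")
print(record["id"], record["references"][0]["type"])  # → EXAMPLE-0001 EVIDENCE
```

Deep-copying keeps the submitted payload exactly as received, which is useful if it is retained alongside the published record.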
Verbose Example
iocs is shorthand for Indicators Of Compromise. I pulled the iocs values from an index curated by Backstabber's Knife Collection (private). There's also ATT&CK, CAPEC, CWE & CVE from MITRE, which seem pretty useful to me.
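If IOCs stay optional and reporters may draw on whichever taxonomy they use, a light sanity check on identifier formats is still possible. The sketch below assumes iocs is a mapping from a taxonomy name to a list of identifiers; that shape, the taxonomy keys, and the function name are hypothetical. It checks only the identifier format of the MITRE taxonomies mentioned above (for example T1059.006, CWE-506) and passes unknown taxonomies, such as free-form labels, through untouched.

```python
import re

# Identifier formats for the MITRE taxonomies mentioned above.
IOC_PATTERNS = {
    "ATT&CK": re.compile(r"T\d{4}(\.\d{3})?"),  # technique or sub-technique, e.g. T1059.006
    "CAPEC": re.compile(r"CAPEC-\d+"),
    "CWE": re.compile(r"CWE-\d+"),
    "CVE": re.compile(r"CVE-\d{4}-\d{4,}"),
}


def check_iocs(iocs: dict) -> list[str]:
    """Hypothetical: report malformed ids; unknown taxonomies are not rejected."""
    problems = []
    for taxonomy, ids in iocs.items():
        pattern = IOC_PATTERNS.get(taxonomy)
        if pattern is None:
            continue  # e.g. free-form labels from a private index pass through
        for ioc_id in ids:
            if not pattern.fullmatch(ioc_id):
                problems.append(f"{taxonomy}: malformed id {ioc_id!r}")
    return problems


print(check_iocs({"ATT&CK": ["T1059.006"], "CWE": ["CWE-506"], "BKC": ["free-form label"]}))
# → []
print(check_iocs({"CVE": ["CVE-23-1"]}))
# → ["CVE: malformed id 'CVE-23-1'"]
```

This only confirms that an id is well-formed, not that it exists in the taxonomy or fits the package.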
Cool, so what?
We need your input on what other fields we should consider required, and what names for categories we'd allow.
Also: does this format make sense, or would something else entirely be better?
Please feel free to comment here, or if you'd prefer to converse privately, email me at mike at python dot org.
Thanks for your feedback and engagement!