# Reporting Affected Artifacts in CVE

| Field | Value |
|:-----------------|:-------|
| RFD Submitter | Andrew Lilley Brinker |
| RFD Pull Request | [RFD #0000](https://github.com/CVEProject/cve-schema/pull/440) |

## Summary
[summary]: #summary

Today, CVE supports identifying affected products or packages using three
"identifier-like" constructs, with one more proposed in RFD #2, "Supporting
Package URLs in CVE". They are:

- CPE, Common Platform Enumeration
- Vendor and product names, provided as a pair
- Collection URL and package names, provided as a pair
- (If accepted) Package URLs, also called "purls"

While these coarse-grained identifiers work well for identifying affected
products or packages, they are not granular enough to identify _affected
artifacts_. This makes it difficult for CNAs to report fine-grained
applicability information, even when they have that information.

For example, a CNA may know that specific binaries they build and ship to users
are affected by a vulnerability. Today, there is no clear, structured
mechanism for reporting identifiers for these binaries to CVE consumers.

This RFD proposes supporting the reporting of affected artifacts by adding a
new optional `affectedArtifacts` field to `containers.cna`. This field would
contain an array of objects specifying identifiers for artifacts affected by a
vulnerability.

## Problem Statement
[problem-statement]: #problem-statement

While CVE Records today can contain substantial information about affected
products or packages, there is no clear and structured way to report
information about specific artifacts that are or are not affected by a
vulnerability.

This deficiency means that CNAs who publish artifacts (such as prebuilt
binaries, archive files like `.zip` or `.tar.gz` files, script files, or
configuration files) have no means to communicate when those artifacts are
known to be vulnerable or known not to be vulnerable.

For vulnerability managers, reacting to vulnerability disclosures that use
coarse-grained identifiers for affected software requires maintaining accurate
software inventories. These might be Software Bills of Materials, package
manifests (such as `package.json` or `Cargo.toml`), lockfiles (such as
`package-lock.json` or `Cargo.lock`), or other means. Without some method for
tracking what software is deployed in a production system, vulnerability
managers may struggle to turn the identifiers in a CVE Record into a clear
determination of applicability. As a result, they may also struggle to respond
quickly to vulnerabilities when they're disclosed. Reducing vulnerability
managers' time-to-react is clearly in the interest of the CVE program.

## Proposed Solution
[proposed-solution]: #proposed-solution

Artifact identifiers in CVE Records would give vulnerability managers an
additional mechanism for identifying applicable vulnerabilities. For example,
a hash of a known-vulnerable binary could be searched for on production
systems, complementing any software inventories already in use.

Artifact identifiers also have the benefit of a low false-positive rate.
Coarse-grained identifiers for products or packages may be decomposed further
with additional fields for objects in the `affected` array, such as `platforms`,
`versions`, `programFiles`, `programModules`, and more. Many of these fields can
be ambiguous or complex to check. As a result, applicability decisions based on
coarse-grained identifiers can easily require human intervention to assess, and
can remain uncertain _despite_ that intervention.
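The hash search mentioned at the start of this section can be sketched in a few lines. This is a minimal, illustrative Python sketch, not part of the proposal. The helper names `file_identifiers` and `find_known_vulnerable` are hypothetical.

The sketch also computes an OmniBOR Artifact ID. It assumes that the ID is a Git-style SHA-256 blob ID: the hash of the header `blob <size>` and a NUL byte, followed by the file contents, with no newline normalization. The OmniBOR specification is authoritative on this point.

```python
import hashlib
import os
from pathlib import Path

# Read files in 64 KiB chunks so large binaries never need to fit in memory.
CHUNK_SIZE = 64 * 1024


def file_identifiers(path: Path) -> tuple[str, str]:
    """Return (sha256_hex, omnibor_artifact_id) for the file at `path`.

    ASSUMPTION: the OmniBOR Artifact ID is SHA-256 over the ASCII header
    "blob <size>" plus a NUL byte, followed by the raw file contents, with
    no newline normalization. Check the OmniBOR specification before use.
    """
    size = path.stat().st_size  # assumes the file does not change mid-hash
    plain = hashlib.sha256()
    gitoid = hashlib.sha256(b"blob %d\x00" % size)
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            plain.update(chunk)
            gitoid.update(chunk)
    return plain.hexdigest(), "gitoid:blob:sha256:" + gitoid.hexdigest()


def find_known_vulnerable(root: Path, known_sha256: set[str]) -> list[Path]:
    """Walk `root` and return the files whose SHA-256 is in `known_sha256`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_file():
                continue  # skip sockets, FIFOs, broken symlinks, etc.
            try:
                sha256_hex, _artifact_id = file_identifiers(path)
            except OSError:
                continue  # unreadable file; a real scanner should report it
            if sha256_hex in known_sha256:
                hits.append(path)
    return sorted(hits)
```

In practice, `known_sha256` would be the set of `sha256` values collected from CVE Records. Exact set membership is what keeps false positives low compared with matching products and version ranges.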

By comparison, identifiers for affected artifacts, which are often based on
hashing file contents, are unlikely to produce false positives. Cryptographic
hash functions are designed to make engineering a matching input
computationally infeasible. They do this through properties such as collision
resistance, preimage resistance, and second-preimage resistance. As a result,
suppose a vulnerability manager finds a file on their system whose artifact
identifier matches one provided in a CVE Record. That manager can act quickly,
with high confidence that the match is correct.

Because of their low false-positive rate and content-based construction,
artifact identifiers are also easy to check automatically and at scale.

The proposed change to the Record Format follows.

### Add an `affectedArtifacts` field

Add an `affectedArtifacts` field to the `cnaPublishedContainer` object, found
at the path `containers.cna` within a CVE Record. This new field would be an
array containing `affectedArtifact` objects.
The specific edits to the schema would be as follows.

First, the introduction of the `affectedArtifacts` field within the
`cnaPublishedContainer` object:

```json
"affectedArtifacts": {
  "type": "array",
  "description": "List of affected artifacts.",
  "minItems": 1,
  "items": {"$ref": "#/definitions/affectedArtifact"}
}
```

Second, the definition of the `affectedArtifact` type within the "definitions"
portion of the schema:

```json
"affectedArtifact": {
  "type": "object",
  "description": "Provides information about a specific artifact affected by a vulnerability.",
  "allOf": [
    {
      "description": "An identifier-like field, to identify the artifact.",
      "anyOf": [
        {"required": ["omniborArtifactID", "omniborArtifactType"]},
        {"required": ["sha256"]}
      ]
    },
    {
      "description": "The status of the artifact.",
      "anyOf": [
        {"required": ["status"]}
      ]
    }
  ],
  "properties": {
    "omniborArtifactID": {
      "type": "string",
      "pattern": "^gitoid:blob:sha256:[0-9a-f]{64}$",
      "description": "The OmniBOR Artifact ID of the artifact to be matched against.",
      "examples": [
        "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
        "gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772",
        "gitoid:blob:sha256:230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61"
      ]
    },
    "omniborArtifactType": {
      "type": "string",
      "enum": ["artifact", "buildInput"],
      "description": "Specifies how consumers of the Artifact ID should search for matches. If the value is 'artifact', then the Artifact ID identifies an artifact which should be searched for directly (for example, within a file system by matching against Artifact IDs for files). If the value is 'buildInput', then the Artifact ID identifies a build input, and consumers should match the Artifact ID against IDs found in OmniBOR Input Manifests for their software."
    },
    "sha256": {
      "type": "string",
      "pattern": "^[a-f0-9]{64}$",
      "description": "The SHA-256 hash of the artifact.",
      "examples": [
        "68e656b251e67e8358bef8483ab0d51c6619f3e7a1a9f0e75838d41ff368f728",
        "2cc620f8a156b986806bc2757c0572d978d8cbfc4d25f0dfa7c552291bf68279",
        "97272dc1b6ac7ca84735b797b4a04233b17fd55707f9c728fc3747e3f935f02c"
      ]
    },
    "status": {
      "description": "The vulnerability status of the identified artifact.",
      "$ref": "#/definitions/status"
    },
    "version": {
      "description": "The version applicable to the identified artifact, if relevant.",
      "$ref": "#/definitions/version"
    },
    "versionType": {
      "type": "string",
      "description": "The version numbering system used by the 'version' field, which defines how that version should be interpreted. 'Custom' indicates that the version type is unspecified and should be avoided whenever possible. It is included primarily for use in conversion of older data files.",
      "minLength": 1,
      "maxLength": 128,
      "examples": [
        "custom",
        "git",
        "maven",
        "python",
        "rpm",
        "semver"
      ]
    },
    "platforms": {
      "description": "List of specific platforms if the vulnerability is only relevant in the context of these platforms (optional). Platforms may include execution environments, operating systems, virtualization technologies, hardware models, or computing architectures. The lack of this field implies that the other fields are applicable to all relevant platforms.",
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "items": {
        "type": "string",
        "examples": ["iOS", "Android", "Windows", "macOS", "x86", "ARM", "64 bit", "Big Endian", "iPad", "Chromebook", "Docker", "Model T"],
        "maxLength": 1024
      }
    }
  }
}
```

The fields of `affectedArtifact` objects are as follows:

- `omniborArtifactID`: An OmniBOR Artifact ID. It identifies either an
  artifact itself, such as a binary file, or a build input used to produce the
  artifact.
- `omniborArtifactType`: The type associated with the `omniborArtifactID`
  field, either `"artifact"` or `"buildInput"`. With `"artifact"`,
  `omniborArtifactID` is the Artifact ID of an artifact itself, such as a
  binary file. With `"buildInput"`, it is the Artifact ID of a build input.
  This field tells CVE consumers how to match the `omniborArtifactID` value.
  For artifacts, they should search their systems and/or inventories for files
  with a matching Artifact ID. For build inputs, they should search their
  OmniBOR Input Manifests for matching IDs.
- `sha256`: The SHA-256 hash of the artifact.
- `status`: Whether the identified artifact is affected, unaffected, or has an
  unknown status.
- `version`: The version applicable to the identified artifact, if relevant.
- `versionType`: If `version` is used, the versioning scheme it follows. CVE
  consumers should use this to validate and interpret the `version` field.
- `platforms`: A list of the specific platforms the identified artifact is
  intended for.

Additionally, the data constraints on the `affectedArtifact` object have two
effects. Every object contains at least one identifier: either
`omniborArtifactID` together with `omniborArtifactType`, or `sha256`. Every
object also includes a `status` field.
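The constraints just described can be approximated by hand. The following is a minimal Python sketch, offered only as an illustration. A real consumer would validate records against the published JSON schema with a JSON Schema validator.

The names `artifact_errors` and `_matches` are hypothetical. The allowed `status` values are assumed to be those of the existing CVE `status` definition (`affected`, `unaffected`, `unknown`).

```python
import re

# Patterns copied from the proposed schema. fullmatch() plays the role of the
# ^...$ anchors without letting a trailing newline slip through.
OMNIBOR_ID_RE = re.compile(r"gitoid:blob:sha256:[0-9a-f]{64}")
SHA256_RE = re.compile(r"[a-f0-9]{64}")
OMNIBOR_TYPES = ("artifact", "buildInput")
# ASSUMPTION: the values allowed by the existing CVE "status" definition.
STATUS_VALUES = ("affected", "unaffected", "unknown")


def _matches(pattern: re.Pattern, value: object) -> bool:
    """True if `value` is a string that matches `pattern` in full."""
    return isinstance(value, str) and pattern.fullmatch(value) is not None


def artifact_errors(obj: dict) -> list[str]:
    """List the problems with one affectedArtifact object (empty if none).

    Hypothetical hand-written approximation of the proposed schema rules.
    """
    errors = []
    # Identifier rule: an OmniBOR ID/type pair, or a SHA-256 hash.
    has_omnibor = "omniborArtifactID" in obj and "omniborArtifactType" in obj
    if not (has_omnibor or "sha256" in obj):
        errors.append("needs omniborArtifactID + omniborArtifactType, or sha256")
    if "omniborArtifactID" in obj and not _matches(OMNIBOR_ID_RE, obj["omniborArtifactID"]):
        errors.append("malformed omniborArtifactID")
    if "omniborArtifactType" in obj and obj["omniborArtifactType"] not in OMNIBOR_TYPES:
        errors.append("unknown omniborArtifactType")
    if "sha256" in obj and not _matches(SHA256_RE, obj["sha256"]):
        errors.append("malformed sha256")
    # Status rule: every object must carry a valid status.
    if obj.get("status") not in STATUS_VALUES:
        errors.append("missing or invalid status")
    return errors
```

Both identifier forms satisfy the first rule on their own. An object with only `omniborArtifactID` and no `omniborArtifactType` (and no `sha256`) is rejected.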

Note that identifiers in the same `affectedArtifact` object should be
interpreted as synonyms, identifying the same artifact. For example, an entry
in the `affectedArtifacts` array that contains both an `omniborArtifactID` and
a `sha256` value identifies only _one artifact_, for which either identifier is
valid. Multiple identifiers are provided only to make matching easier for CVE
consumers. Some may be more convenient than others, depending on the
identifiers or tooling a consumer has available in their systems.

### Use of this as a template for future identifiers

This proposal is intended as a template for introducing more fine-grained
artifact identifier types in the future. Specifically, future identifiers
should be added as new fields within the `affectedArtifact` object inside the
`affectedArtifacts` array.

### Vendoring of the relevant specifications

To keep the meaning of each identifier type consistent over time, the CVE
project should "vendor" (that is, maintain its own public copy of) any relevant
specification that is not versioned upstream.
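The synonym rule and the two `omniborArtifactType` values together determine how a consumer matches an entry. The following is a minimal Python sketch of that logic, not a prescribed algorithm. `LocalInventory`, `entry_matches`, and `affected_hits` are hypothetical names. The sketch assumes the consumer has already gathered local identifiers and validated the entries.

```python
from dataclasses import dataclass, field


@dataclass
class LocalInventory:
    """Hypothetical record of identifiers observed on a consumer's systems."""

    file_sha256: set[str] = field(default_factory=set)
    file_artifact_ids: set[str] = field(default_factory=set)
    # Artifact IDs listed in OmniBOR Input Manifests for deployed software.
    manifest_input_ids: set[str] = field(default_factory=set)


def entry_matches(entry: dict, inv: LocalInventory) -> bool:
    """True if any identifier in one (already validated) affectedArtifact
    entry is present in the inventory.

    Identifiers in the same entry are synonyms for a single artifact, so a
    match on any one of them counts as a match for the whole entry.
    """
    if entry.get("sha256") in inv.file_sha256:
        return True
    artifact_id = entry.get("omniborArtifactID")
    kind = entry.get("omniborArtifactType")
    if kind == "artifact":
        # Look for files whose own Artifact ID matches.
        return artifact_id in inv.file_artifact_ids
    if kind == "buildInput":
        # Look in OmniBOR Input Manifests rather than at files on disk.
        return artifact_id in inv.manifest_input_ids
    return False


def affected_hits(entries: list[dict], inv: LocalInventory) -> list[dict]:
    """Return the entries with status "affected" that match the inventory."""
    return [e for e in entries if e.get("status") == "affected" and entry_matches(e, inv)]
```

A `buildInput` ID is never compared against files on disk here, which mirrors the distinction drawn in the field descriptions.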

## Examples
[examples]: #examples

The following is an example `affectedArtifacts` field identifying three
binaries, one each for Windows, macOS, and Linux systems on x86:

```json
"affectedArtifacts": [
  {
    "omniborArtifactID": "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
    "omniborArtifactType": "artifact",
    "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "status": "affected",
    "version": "0.18.1",
    "versionType": "semver",
    "platforms": ["macOS", "x86"]
  },
  {
    "omniborArtifactID": "gitoid:blob:sha256:4043df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
    "omniborArtifactType": "artifact",
    "sha256": "40414dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "status": "affected",
    "version": "0.18.1",
    "versionType": "semver",
    "platforms": ["Windows", "x86"]
  },
  {
    "omniborArtifactID": "gitoid:blob:sha256:ccc4df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
    "omniborArtifactType": "artifact",
    "sha256": "ddd24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "status": "affected",
    "version": "0.18.1",
    "versionType": "semver",
    "platforms": ["Linux", "x86"]
  }
]
```

## Impact Assessment
[impact-assessment]: #impact-assessment

This new field would enable CNAs to report affected artifacts, such as
known-vulnerable prebuilt binaries shipped for versions of software affected by
a vulnerability. It would complement the Record Format's existing ability to
identify affected products and packages.

For CVE consumers, this field would make it possible to search their systems
for known-vulnerable artifacts reported by CNAs.

## Compatibility and Migration
[compatibility-and-migration]: #compatibility-and-migration

This would be a minor change, as adding new optional fields is considered
non-breaking.
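A consumer that wants the new data only needs to tolerate the field's absence. The following is a minimal Python sketch of that pattern. `affected_artifacts` and `known_vulnerable_sha256` are hypothetical helper names, and records are assumed to be parsed CVE JSON loaded as dicts.

```python
from collections.abc import Iterable


def affected_artifacts(record: dict) -> list[dict]:
    """Return containers.cna.affectedArtifacts from a CVE Record, or [].

    The field is optional, so its absence (in older records, or from CNAs
    that don't use it) means "no artifact data", not a malformed record.
    """
    cna = record.get("containers", {}).get("cna", {})
    return cna.get("affectedArtifacts", [])


def known_vulnerable_sha256(records: Iterable[dict]) -> set[str]:
    """Collect the SHA-256 values of artifacts marked "affected"."""
    return {
        artifact["sha256"]
        for record in records
        for artifact in affected_artifacts(record)
        if artifact.get("status") == "affected" and "sha256" in artifact
    }
```

Records without the field simply contribute nothing, so the same code path handles old and new records alike.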

CVE consumers could, if they wanted, gain the benefit of the new field by
updating their consumption logic to recognize the field and use its contents.
CVE consumers would only be broken if their consumption logic incorrectly
assumes that no new optional fields will ever be added to the
`cnaPublishedContainer` object.

## Success Metrics
[success-metrics]: #success-metrics

The success of this proposal will depend on the adoption of the new field and
the degree to which it provides value for CVE consumers.

CNA adoption can be measured in published CVE Records. Six months after the
publication of the first schema version that includes the new field, the QWG
must assess the prevalence of the new field in CVE Records published during
those six months. If the new field is present in at least 5% of new CVE
Records, this RFD will be considered successful and the new field will not be
rolled back.

CVE may consider making inclusion of affected artifacts a requirement for CNA
recognition on the Enrichment Recognition List.

Measuring use by CVE consumers is a significantly larger challenge. One
potential path would be to interview vulnerability management tool vendors,
since many of them ingest and process the CVE List. Asking about the role
affected artifacts play in their processes would provide a strong indication of
the value these identifiers provide. Vendors will need time to adjust their
processes, so the measure might be to look for at least two vendors using
affected artifact identifiers within a year of the new field's adoption.

## Supporting Data or Research
[supporting-data-or-research]: #supporting-data-or-research

Demand for OmniBOR was identified specifically in the most recent CVE user
survey. Question 16 showed positive demand, with the strongest demand coming
from self-identified data aggregators and integrators.

More generally, demand for identifying affected artifacts in CVE is unclear.
Apart from the question on future priorities, which included OmniBOR, the survey
contained no specific questions about demand for identifying affected
artifacts.

That said, the lack of support for affected artifacts has been identified as a
gap in QWG discussions. There is interest in addressing it, whether through
this proposal or a future alternative proposal.

## Related Issues or Proposals
[related-issues-or-proposals]: #related-issues-or-proposals

None identified.

## Recommended Priority
[recommended-priority]: #recommended-priority

Medium

## Unresolved Questions
[unresolved-questions]: #unresolved-questions

There are no remaining unresolved questions.

## Future Possibilities
[future-possibilities]: #future-possibilities

Additional identifier types may be worth adding in the future. Which types
those might be, and how they would be represented in the CVE Record Format, is
out of scope for this RFD.