RFD: Support CNAs reporting affected artifacts #440

Open
wants to merge 2 commits into
base: develop
366 changes: 366 additions & 0 deletions rfds/0000-reporting-affected-artifacts.md
@@ -0,0 +1,366 @@
# Reporting Affected Artifacts in CVE

| Field | Value |
|:-----------------|:-------|
| RFD Submitter | Andrew Lilley Brinker |
| RFD Pull Request | [RFD #0000](https://github.com/CVEProject/cve-schema/pull/440) |

## Summary
[summary]: #summary

Today, CVE supports identifying affected products or packages using three
"identifier-like" constructs, with one more proposed in RFD #2, "Supporting
Package URLs in CVE". They are:

- CPE, Common Platform Enumeration
- Vendor and product names, provided as a pair
- Collection URL and package names, provided as a pair
- (If accepted) Package URLs, also called "purls"

While these coarse-grained identifiers are great for identifying affected
products or packages, they are insufficiently granular for identifying
_affected artifacts_. This makes it difficult for CNAs to report fine-grained
applicability information when they otherwise could.

For example, a CNA may know that specific binaries they build and ship to users
are affected by a vulnerability. Today, there is not a clear, structured
mechanism for reporting identifiers for these affected binaries to CVE
consumers.

This RFD proposes introducing support for reporting affected artifacts, by
adding a new optional `affectedArtifacts` field to `containers.cna`, which
would contain an array of objects specifying identifiers for artifacts affected
by a vulnerability.

## Problem Statement
[problem-statement]: #problem-statement

While CVE records today can contain substantial information about affected
products or packages, there isn't a clear and structured way to report
information about specific artifacts affected or not affected by a
vulnerability.

This deficiency means CNAs who publish artifacts—such as prebuilt binaries,
archive files such as `.zip`s or `.tar.gz`s, script files, or configuration
files—lack a means to communicate when those artifacts are known to be
vulnerable or to not be vulnerable.

For vulnerability managers, reacting to vulnerability disclosures with
coarse-grained identifiers for affected software requires maintaining accurate
software inventories, whether through Software Bills of Material, package
manifests (such as `package.json` or `Cargo.toml`), lockfiles (such as
`package-lock.json` or `Cargo.lock`), or other means. Without some method for
tracking what software is deployed in a production system, vulnerability
managers may struggle to turn identifiers provided in a CVE record into a clear
determination of applicability, and therefore also to respond quickly to
vulnerabilities when they're disclosed. Reducing the time-to-react for
vulnerability managers is clearly in the interest of the CVE Program.

## Proposed Solution
[proposed-solution]: #proposed-solution

The presence of artifact identifiers in CVE Records would give vulnerability
managers an additional mechanism for identifying applicable vulnerabilities.
For example, a manager could search production systems for the hash of a
known-vulnerable binary, in addition to consulting any software inventories.

Artifact identifiers also have the benefit of a low false-positive rate.
Coarse-grained identifiers for products or packages may be decomposed further
with additional fields for objects in the `affected` array, such as `platforms`,
`versions`, `programFiles`, `programModules`, and more. These fields, and the
potential for ambiguity or complexity for checking in many of them, mean that
coarse-grained identifiers' applicability decisions can easily become complex
and require human intervention to assess, and even remain uncertain _despite_
human intervention.

By comparison, identifiers for affected artifacts, which are often based on
hashing file contents, are unlikely to produce false positives. Cryptographic
hash functions are designed to provide collision resistance, preimage
resistance, and second-preimage resistance, so two different files are
extremely unlikely to share an identifier, whether by accident or by design.
As a result, if a vulnerability manager finds a file in their system whose
artifact identifier matches one provided in a CVE Record, that manager can act
quickly with high confidence that the match is correct.
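As an illustration of this kind of matching, the following is a minimal Python sketch, not part of the proposal, that walks a directory tree and reports files whose SHA-256 hash appears in a set of known-affected hashes. The function names `sha256_of_file` and `find_affected_files` are hypothetical, and the set of hashes is assumed to have been extracted from CVE Records beforehand.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_affected_files(root: Path, affected_hashes: set[str]) -> list[Path]:
    """Return files under `root` whose SHA-256 appears in `affected_hashes`.

    `affected_hashes` is assumed to hold lowercase hex strings, matching the
    pattern used for the proposed `sha256` field.
    """
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and sha256_of_file(path) in affected_hashes
    )
```

Because the check is a set lookup on file contents, it needs no version parsing or platform reasoning, which is what makes it straightforward to run across many hosts.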

Because artifact identifiers are content-based and have a low false-positive
rate, matching against them is also easy to automate and to run at scale.

The following is the actual proposed change for the Record Format:

### Add an `affectedArtifacts` field

Add an `affectedArtifacts` field to the `cnaPublishedContainer` object, found
at the path `containers.cna` within a CVE Record. This new field would be an
array containing `affectedArtifact` objects. The specific edits to the schema
would be as follows:

First, the introduction of the `affectedArtifacts` field within the
`cnaPublishedContainer` object:

```json
"affectedArtifacts": {
  "type": "array",
  "description": "List of affected artifacts.",
  "minItems": 1,
  "items": {"$ref": "#/definitions/affectedArtifact"}
}
```

Second, the definition of the `affectedArtifact` type within the "definitions"
portion of the schema:

```json
"affectedArtifact": {
  "type": "object",
  "description": "Provides information about a specific artifact affected by a vulnerability.",
  "allOf": [
    {
      "description": "An identifier-like field, to identify the artifact.",
      "anyOf": [
        {"required": ["omniborArtifactID", "omniborArtifactType"]},
        {"required": ["sha256"]}
      ]
    },
    {
      "description": "The status of the artifact.",
      "anyOf": [
        {"required": ["status"]}
      ]
    }
  ],
  "properties": {
    "omniborArtifactID": {
      "type": "string",
      "pattern": "^gitoid:blob:sha256:[0-9a-f]{64}$",
      "description": "The OmniBOR Artifact ID of the artifact to be matched against.",
      "examples": [
        "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
        "gitoid:blob:sha256:09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772",
        "gitoid:blob:sha256:230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61"
      ]
    },
    "omniborArtifactType": {
      "type": "string",
      "enum": ["artifact", "buildInput"],
      "description": "Specifies how consumers of the Artifact ID should search for matches. If the type is 'artifact', then the Artifact ID is identifying an artifact which should be searched for directly (for example, within a file system by matching against Artifact IDs for files). If the type is 'buildInput', then the Artifact ID is identifying a build input, and consumers should match the Artifact ID against IDs found in OmniBOR Input Manifests for their software."
    },
    "sha256": {
      "type": "string",
      "pattern": "^[a-f0-9]{64}$",
      "description": "The SHA-256 hash of the artifact.",
      "examples": [
        "68e656b251e67e8358bef8483ab0d51c6619f3e7a1a9f0e75838d41ff368f728",
        "2cc620f8a156b986806bc2757c0572d978d8cbfc4d25f0dfa7c552291bf68279",
        "97272dc1b6ac7ca84735b797b4a04233b17fd55707f9c728fc3747e3f935f02c"
      ]
    },
    "status": {
      "description": "The vulnerability status of the identified artifact.",
      "$ref": "#/definitions/status"
    },
    "version": {
      "description": "The single version being described, or the version at the start of the range. By convention, typically 0 denotes the earliest possible version.",
      "$ref": "#/definitions/version"
    },
    "versionType": {
      "type": "string",
      "description": "The version numbering system used for specifying the range. This defines the exact semantics of the comparison (less-than) operation on versions, which is required to understand the range itself. 'Custom' indicates that the version type is unspecified and should be avoided whenever possible. It is included primarily for use in conversion of older data files.",
      "minLength": 1,
      "maxLength": 128,
      "examples": [
        "custom",
        "git",
        "maven",
        "python",
        "rpm",
        "semver"
      ]
    },
    "platforms": {
      "description": "List of specific platforms if the vulnerability is only relevant in the context of these platforms (optional). Platforms may include execution environments, operating systems, virtualization technologies, hardware models, or computing architectures. The lack of this field implies that the other fields are applicable to all relevant platforms.",
      "type": "array",
      "minItems": 1,
      "uniqueItems": true,
      "items": {
        "type": "string",
        "examples": ["iOS", "Android", "Windows", "macOS", "x86", "ARM", "64 bit", "Big Endian", "iPad", "Chromebook", "Docker", "Model T"],
        "maxLength": 1024
      }
    }
  }
}
```

The fields of `affectedArtifact` objects are as follows:

- `omniborArtifactID`: An OmniBOR Artifact Identifier, used to identify either
  an artifact itself, such as a binary file, or a build input used to produce
  an artifact.
- `omniborArtifactType`: The type associated with the `omniborArtifactID` field;
  either `"artifact"` or `"buildInput"`. If `"artifact"` is used, then
  `omniborArtifactID` is the Artifact ID of an artifact itself, such as a binary
  file. If `"buildInput"` is used, then `omniborArtifactID` is the Artifact ID
  of a build input. This field tells CVE consumers how to use the
  `omniborArtifactID` value. For artifacts, they should search their systems
  and/or inventories for files with a matching Artifact ID. For build inputs,
  they should search their OmniBOR Input Manifests for IDs which match.
- `sha256`: The SHA-256 hash of the artifact in question.
- `status`: Indicates whether the identified artifact is affected, not affected,
  or has an unknown affected status.
- `version`: The version applicable to the identified artifact, if relevant.
- `versionType`: If `"version"` is used, this indicates what type of version
  is present, and should be used by CVE consumers to validate and interpret the
  `"version"` field.
- `platforms`: A list of platforms, describing the specific platform the
  identified artifact is intended for.
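For readers unfamiliar with the `gitoid:blob:sha256:` form, the following Python sketch shows how such an identifier could be computed for a file's bytes. It assumes the identifier is built the way Git hashes a blob object (a `blob <length>\0` header followed by the content, hashed with SHA-256); that construction is my understanding rather than something stated in this RFD, and the OmniBOR specification remains the authority. The function name `omnibor_artifact_id` is hypothetical.

```python
import hashlib


def omnibor_artifact_id(content: bytes) -> str:
    """Compute a gitoid-style identifier for `content`.

    Assumption: the hash input is the Git blob encoding, i.e. the ASCII
    header "blob <decimal length>" plus a NUL byte, followed by the content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    digest = hashlib.sha256(header + content).hexdigest()
    return "gitoid:blob:sha256:" + digest
```

Under this assumption, the `omniborArtifactID` and `sha256` values for the same file differ, because the former hashes a header in addition to the file contents.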

Additionally, the data constraints on the `affectedArtifact` object ensure that
at least one set of identifier-like fields is present per object, and that each
object always includes a `"status"` field.
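These constraints can be illustrated without a JSON Schema library. The sketch below is a hand-written approximation of the checks, under the assumption that an entry has already been parsed into a Python `dict`; it is not a substitute for validating against the schema itself, and the name `validation_errors` is hypothetical.

```python
import re

# Patterns copied from the proposed schema; fullmatch() supplies the anchoring.
OMNIBOR_ID = re.compile(r"gitoid:blob:sha256:[0-9a-f]{64}")
SHA256_HEX = re.compile(r"[a-f0-9]{64}")
ARTIFACT_TYPES = ("artifact", "buildInput")


def validation_errors(entry: dict) -> list[str]:
    """Return a list of problems found in one `affectedArtifact` entry."""
    errors = []

    # At least one complete set of identifier-like fields must be present.
    has_omnibor = "omniborArtifactID" in entry and "omniborArtifactType" in entry
    if not (has_omnibor or "sha256" in entry):
        errors.append(
            "no identifier: need sha256, or omniborArtifactID with omniborArtifactType"
        )

    # A status is always required.
    if "status" not in entry:
        errors.append("missing status")

    # Identifier values, when present, must match the schema patterns.
    for field, pattern in (("omniborArtifactID", OMNIBOR_ID), ("sha256", SHA256_HEX)):
        value = entry.get(field)
        if field in entry and not (isinstance(value, str) and pattern.fullmatch(value)):
            errors.append(f"{field} does not match the schema pattern")

    if "omniborArtifactType" in entry and entry["omniborArtifactType"] not in ARTIFACT_TYPES:
        errors.append("omniborArtifactType must be 'artifact' or 'buildInput'")

    return errors
```

An entry with only `sha256` and `status` passes, while an entry with an `omniborArtifactID` but no `omniborArtifactType` and no `sha256` is reported as lacking an identifier.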

Note that identifiers found in the same `affectedArtifact` object should be
interpreted as synonyms, identifying the same artifact. For example, an entry
in the `affectedArtifacts` array which contains both an `omniborArtifactID`
and `sha256` value should be interpreted as identifying only _one artifact_,
for which either identifier is valid. The presence of multiple identifiers is
intended only to make matching easier for CVE consumers by providing them with
options which may be more convenient depending on what identifiers or tooling
the consumer has available in their systems to support matching.
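The synonym semantics can be sketched as follows, assuming a consumer has already computed both identifiers for a local file. The function name `entry_matches` and its parameters are hypothetical. An OmniBOR identifier is only compared against the file directly when its type is `"artifact"`; a `"buildInput"` identifier would instead be looked up in OmniBOR Input Manifests, which this sketch does not attempt.

```python
def entry_matches(entry: dict, file_sha256: str, file_artifact_id: str) -> bool:
    """Return True if any identifier in `entry` matches the local file.

    Identifiers in one entry are treated as synonyms for a single artifact,
    so a match on either one is sufficient.
    """
    if entry.get("sha256") == file_sha256:
        return True
    return (
        entry.get("omniborArtifactType") == "artifact"
        and entry.get("omniborArtifactID") == file_artifact_id
    )
```

A consumer that only has SHA-256 tooling can pass any placeholder for `file_artifact_id` that cannot match, and still benefit from entries that carry both identifiers.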

### Use of this as a template for future identifiers

This proposal is intended as a template for the introduction of more
fine-grained identifier types intended for identifying artifacts in the future.
Specifically, future identifiers should be added as new fields within the
`affectedArtifact` object inside the `affectedArtifacts` array.

### Vendoring of the relevant specifications

To ensure that identifier types added to the Record Format are interpreted
consistently, the CVE project should "vendor," meaning maintain its own public
copy of, any relevant specifications that are not versioned upstream.

## Examples
[examples]: #examples

The following is an example `affectedArtifacts` field, identifying three
binaries, one for each of Windows, macOS, and Linux systems on x86:

```json
"affectedArtifacts": [
  {
    "omniborArtifactID": "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
    "omniborArtifactType": "artifact",
    "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "status": "affected",
    "version": "0.18.1",
    "versionType": "semver",
    "platforms": ["macOS", "x86"]
  },
  {
    "omniborArtifactID": "gitoid:blob:sha256:4043df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
    "omniborArtifactType": "artifact",
    "sha256": "40414dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "status": "affected",
    "version": "0.18.1",
    "versionType": "semver",
    "platforms": ["Windows", "x86"]
  },
  {
    "omniborArtifactID": "gitoid:blob:sha256:ccc4df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
    "omniborArtifactType": "artifact",
    "sha256": "ddd24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    "status": "affected",
    "version": "0.18.1",
    "versionType": "semver",
    "platforms": ["Linux", "x86"]
  }
]
```

## Impact Assessment
[impact-assessment]: #impact-assessment

The addition of this new field would enable CNAs to report affected artifacts,
such as known-vulnerable prebuilt binaries shipped for versions of software
affected by a vulnerability, and would be complementary to the existing ability
in the Record Format to identify affected products and packages.

For CVE consumers, the addition of this field would provide the ability to
search for the presence of known-vulnerable artifacts in their systems when
reported by CNAs.

## Compatibility and Migration
[compatibility-and-migration]: #compatibility-and-migration

This would be a minor change, as the addition of new optional fields is
considered non-breaking.

CVE consumers could, if they wanted, gain the benefit of the new field by
updating their consumption logic to recognize the field and make use of its
contents. CVE consumers would only be broken if they incorrectly assume in their
consumption logic that no new optional fields will ever be added to the
`cnaPublishedContainer` object.
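As a sketch of consumption logic that tolerates both old and new records, the following Python function reads the proposed field when present and returns an empty list otherwise. It assumes a CVE Record has been parsed into a `dict` and that the field lives at `containers.cna.affectedArtifacts` as proposed above; the function name `affected_artifacts` is hypothetical.

```python
def affected_artifacts(record: dict) -> list[dict]:
    """Return the `affectedArtifacts` entries of a parsed CVE Record.

    Records published without the optional field yield an empty list, so
    callers need no special handling for older records.
    """
    cna = record.get("containers", {}).get("cna", {})
    return cna.get("affectedArtifacts", [])
```

A consumer written this way ignores any other fields it does not recognize, which is the behavior the non-breaking classification relies on.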

## Success Metrics
[success-metrics]: #success-metrics

The success of this proposal will depend on the adoption of the new field,
and the degree to which the new field provides value for CVE consumers.

CNA adoption can be measured in published CVE Records. Six months after the
publication of the first Record Format version to include the new field, the QWG
must assess the prevalence of the new field in CVE Records published during
those six months. If the new field is present in at least 5% of those records,
this RFD will be considered successful and the new field will not be rolled
back.
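The adoption measurement could be computed along the following lines. This is only a sketch: it assumes the records published in the assessment window have been parsed into `dict`s, it treats the 5% figure as a minimum, and the function names `adoption_rate` and `meets_adoption_target` are hypothetical.

```python
def adoption_rate(records: list[dict]) -> float:
    """Return the fraction of records with a non-empty `affectedArtifacts` field."""
    if not records:
        return 0.0
    with_field = sum(
        1
        for record in records
        if record.get("containers", {}).get("cna", {}).get("affectedArtifacts")
    )
    return with_field / len(records)


def meets_adoption_target(records: list[dict], threshold: float = 0.05) -> bool:
    """Return True if adoption is at or above `threshold` (assumed to be a minimum)."""
    return adoption_rate(records) >= threshold
```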

CVE may consider making inclusion of affected artifacts a requirement for CNA
recognition on the Enrichment Recognition List.

Measuring use by CVE consumers is a significantly larger challenge. A potential
path would be to interview vulnerability management tool vendors, since many of
these ingest and process the CVE list. Enquiring as to the role affected
artifacts play in their processes would provide a strong indication of the value
these identifiers provide. Of course, it will take vendors some time to adjust
their processes. As such, the measure might be to look for at least two vendors
using the new software identifier formats within a year of the adoption of the
new formats.

## Supporting Data or Research
[supporting-data-or-research]: #supporting-data-or-research

Demand for OmniBOR was identified in Question 16 of the most recent CVE user
survey, with the strongest demand coming from self-identified data aggregators
and integrators.

More generally, demand for identifying affected artifacts in CVE is unclear.
Beyond the question on future priorities, which included OmniBOR, the survey
contained no specific questions about demand for identifying affected
artifacts.

That said, this lack of support has been identified as a gap in discussions
among the QWG, and there is interest in addressing it, whether through this
proposal or a future alternative proposal.

## Related Issues or Proposals
[related-issues-or-proposals]: #related-issues-or-proposals

None identified.

## Recommended Priority
[recommended-priority]: #recommended-priority

Medium

## Unresolved Questions
[unresolved-questions]: #unresolved-questions

There are no remaining unresolved questions.

## Future Possibilities
[future-possibilities]: #future-possibilities

More identifier types may be desirable to add in the future. Any question of
what those types may be, or what they may look like within the CVE Record
Format, is not addressed here.