
Encourage longer descriptions #439


Open · wants to merge 1 commit into main


@todb commented Aug 1, 2025

Call me crazy, but I think a single character for a description isn't particularly useful. Let's say 5 characters. CWE-1 is five characters.

This PR is a little bit of a troll, but if we're going to be serious about encouraging quality records, a minimum standard for descriptions is just as good a place as any to start.

```diff
@@ -823,7 +823,7 @@
 "value": {
 "type": "string",
 "description": "Supporting media content, up to 16K. If base64 is true, this field stores base64 encoded data.",
-"minLength": 1,
+"minLength": 5,
```
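For reference, JSON Schema defines a string's length for `minLength` as its number of characters (Unicode code points), which matches what Python's `len()` returns for a `str`. A minimal sketch, using a hypothetical helper `meets_min_length`, of which strings the current floor of 1 and the proposed floor of 5 would accept:

```python
def meets_min_length(value: str, min_length: int) -> bool:
    """Mimic JSON Schema minLength: count Unicode code points, not bytes."""
    return len(value) >= min_length


samples = ["x", "n/a", "CWE-1", "n/a n/a", "This vulnerability is CWE-200."]
for text in samples:
    # Show how each sample fares under the old and the proposed floor.
    print(f"{text!r:35} min 1: {meets_min_length(text, 1)}  min 5: {meets_min_length(text, 5)}")
```

A floor of 5 rejects "n/a" but still accepts "n/a n/a", which is the gap the rest of this thread keeps circling.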
Author

Technically, the minimum length of a non-empty base64 string is 4 characters. A five-character ASCII string becomes an eight-character base64 string (including padding), so 8 would be the equivalent floor here if you want to be consistent with the other proposed minimum.
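Standard base64 turns every 3 input bytes into 4 output characters and pads the last group, so n bytes encode to 4 × ⌈n/3⌉ characters. A quick check with Python's standard `base64` module, assuming the schema's `base64` flag means standard padded encoding (the helper name `padded_b64_len` is illustrative):

```python
import base64


def padded_b64_len(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


for raw in [b"a", b"CWE-1"]:
    encoded = base64.b64encode(raw)
    # Compare the real encoder's output length with the closed-form formula.
    print(raw, encoded, len(encoded), padded_b64_len(len(raw)))
```

One input byte already yields 4 characters, and the five bytes of "CWE-1" yield 8 (`Q1dFLTE=`).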

@kernelsmith
Contributor

I think it should be at least 4 so MITRE can't publish CVEs with "n/a" for everything, but I'd prefer it as long as possible while still being truly minimal. Going by our data and my opinions, the description should probably mention the vendor, the product, a problem type, and, ya know, the word "vulnerability". Across our 15,000 or so vulnerability disclosures, our shortest vendor name is 2 characters, our shortest product name is 1, the word "vulnerability" is 13, and the shortest reasonable problem type is the shortest CWE name in our database, "Deadlock", at 8. Using that info, the shortest a description could reasonably be is 24.

How does that compare to our actual shortest description? Our descriptions are fully version controlled, so finding the shortest one is more work; instead I'll just use our shortest TITLE, "7-Zip Mark-of-the-Web Bypass Vulnerability", which is 42, so 24 isn't out of the question.

That reminds me: we allow our analysts to give only the vendor or only the product when the two are the same, so you don't get something like "7-zip 7-zip". Therefore, one could argue that I should only count the shorter of the vendor and product, which would be 1, so we drop the length-2 vendor name, which takes us to 22. However, none of this makes sense without spaces, and adding them gives "1prod/vend Deadlock Vulnerability" (where "1prod/vend" stands for a one-character product or vendor name), so we're back to 24.
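The arithmetic above, reproduced as a quick sketch. The component lengths are the figures quoted in the comment; "X" is a hypothetical stand-in for a one-character product or vendor name.

```python
# Component lengths quoted in the comment.
shortest_vendor = 2
shortest_product = 1
problem_type = "Deadlock"
keyword = "Vulnerability"

# Vendor + product + shortest CWE name + keyword, with no spaces.
no_spaces = shortest_vendor + shortest_product + len(problem_type) + len(keyword)
print(no_spaces)

# Only one of vendor/product (they are the same), with spaces between the words.
example = " ".join(["X", problem_type, keyword])
print(example, len(example))

# Shortest title in the comment's data set, for comparison.
title = "7-Zip Mark-of-the-Web Bypass Vulnerability"
print(len(title))
```

Both routes land on 24, well under the 42-character shortest title.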

Now, one could argue that since MITRE uses "n/a" for many of these fields (such as this one), which of course has a length of 3, the shortest description should be "n/a n/a Vulnerability", which is 21, so we would need to accommodate ultra-short descriptions of length 21.

So, given how many CVEs MITRE has published with "n/a" everywhere, my recommendation would be 21. Unless, of course, we're also going to allow "na" or "NA", in which case it would have to be 19. However, the schema also allows these fields to be of length 1, so " " would work, so I guess 17 is the real minimum. Unless we don't even want to require the most fundamental of words, "vulnerability", cuz in that case...
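The placeholder-based floors work out the same way. A sketch that builds each candidate from two placeholder fields plus the keyword, joined by single spaces (the helper `placeholder_description` is illustrative):

```python
def placeholder_description(placeholder: str) -> str:
    """Join two placeholder fields and the keyword with single spaces."""
    return " ".join([placeholder, placeholder, "Vulnerability"])


for placeholder in ["n/a", "na", " "]:
    text = placeholder_description(placeholder)
    print(repr(text), len(text))
```

The lengths come out to 21, 19, and 17, so every one of these would clear any of the floors discussed so far.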

@kernelsmith
Contributor

Coincidentally, the description of the vuln with the shortest title [above] has a length of 579, not counting the " Was ZDI-CAN-12345." suffix, since that's specific to our style. I believe there are also 4 newlines in there, so let's call it 575. Clearly, however, that's not going to work for descriptions published by MITRE, or most CNAs honestly, but hey, at least they use CVSS 3.1. Well, ya know, when they actually include CVSS at all, which they don't do. See this aforementioned CVE.

Maybe the folks from VulnCheck could weigh in on any other lengths we could increase?

@dwelch2344

Appreciate the sentiment, as CVEs without any effective description are super fun and useful 😅

That said, I'd rather this be approached with a tangible solution in mind. Character limits are arbitrary and useless, though I understand the minimum of 1 as a way to psychologically encourage good behavior.

If all we require is “at least one character,” we’re effectively saying “just put something,” not “put something meaningful.” It's probably worth revisiting the actual rules and starting to move us toward something that both produces value and is enforceable.

Rather than wrestling over arbitrary character counts, let’s lean on CNAs to actually write descriptions that convey who, what, and how. It’s CNA behavior—not JSON-schema tweaks—that will move the needle.

It's also something that could probably be instrumented at the tooling level, for those using Vulnogram, or if the CVE project spun up its own frontend. An upfront simple check + "hey, your description looks lame. Want AI to take a stab at it for you, or can you do it on your own like a big kid?" + a public shame campaign (i.e., an informal/playful leaderboard of offenders) for those who don't do the minimum is far less arbitrary & far more effective IME.
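A minimal sketch of the kind of upfront check a submission tool could run. Everything here is hypothetical: the function `description_warnings`, the placeholder list, and the 8-word threshold are illustrative assumptions, not anything Vulnogram or the CVE schema defines.

```python
# Hypothetical placeholder values that carry no information.
PLACEHOLDERS = {"n/a", "na", "none", "tbd", "-"}


def description_warnings(text: str, min_words: int = 8) -> list[str]:
    """Return human-readable warnings for a weak description (illustrative only)."""
    warnings = []
    stripped = text.strip()
    if not stripped or stripped.lower() in PLACEHOLDERS:
        warnings.append("description is empty or a placeholder")
    words = stripped.split()
    if len(words) < min_words:
        warnings.append(f"only {len(words)} words; aim for who, what, and how")
    if any(word.lower() == "n/a" for word in words):
        warnings.append("contains 'n/a'; name the vendor and product instead")
    return warnings


print(description_warnings("n/a n/a Vulnerability"))
```

A check like this nudges the submitter before publication instead of rejecting the record outright, which is closer to the "hey, your description looks lame" flow than to a hard schema limit.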

@kernelsmith
Contributor

As complex as JSON Schema can be (I consider it to be its own DSL), it can't do anything beyond RegEx in this situation. It's about structure, not quality. For once, I vote for AI here LOL

@todb
Author

todb commented Aug 4, 2025

All I'm saying is that you could enforce just a little more rigor in basic JSON. For example, you could have a regex `pattern` of `([\S]+[\s]){3}` to require at least three words, each followed by whitespace, which in practice means something like a four-word sentence such as `This vulnerability is CWE-200`. This rapidly gets silly, of course, because as @kernelsmith and @dwelch2344 have indicated, this is far from sufficient.
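JSON Schema's `pattern` keyword uses ECMA-262 regular expressions and is not implicitly anchored, so it succeeds if the regex matches anywhere in the string. Python's `re.search` is also unanchored, and for a pattern this simple the two dialects should behave the same. The sketch compares the regex from the comment with a stricter variant, `(\S+\s+){3}\S+`, which is an illustrative assumption (not part of the PR) that also demands a fourth word:

```python
import re

# From the comment: three words, each followed by one whitespace character.
PROPOSED = re.compile(r"([\S]+[\s]){3}")
# Hypothetical stricter variant: three words, then a fourth word after them.
STRICTER = re.compile(r"(\S+\s+){3}\S+")

samples = [
    "This vulnerability is CWE-200.",
    "n/a n/a Vulnerability",
    "n/a n/a n/a ",  # trailing space: only three tokens
]
for s in samples:
    print(repr(s), bool(PROPOSED.search(s)), bool(STRICTER.search(s)))
```

The trailing-space case passes the original pattern with only three tokens, which is exactly the kind of loophole the thread expects people to find.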

But it's an easy schema change that at least sets an absolute floor of verbosity in text descriptions. If you wanted to automate for quality, you'd need at least something like this, but should aim for more than this.

@darakian

I'll link in this oldie to merge the contexts.
#232

It's trivially easy to work/troll around a character limit, so let's dig into the actual concern here. Is it just MITRE publishing n/a that's the problem, or are there other CNAs?

@kernelsmith
Contributor

kernelsmith commented Aug 11, 2025

> I'll link in this oldie to merge the contexts. #232
>
> It's trivially easy to work/troll around a character limit, so let's dig into the actual concern here. Is it just MITRE publishing n/a that's the problem, or are there other CNAs?

I don't know if they're the only ones, but they have a metric butt-ton of them: https://github.com/raw/jgamblin/cvelint-action/refs/heads/main/CNAReports/mitre.csv
Now, some are conversions from JSONv4, so I'd give any of the v4-converted ones a pass, if for no other reason than to narrow the scope. I can't tell how many of those are for the title vs. description, etc., since the file is 61 MB and even a simple find-in-page search is brutal. Certainly the "affected" (as in version) field is...affected.

Edit: this report doesn't look at title or description, but the string "n/a" shows up 327,511 times, just as a rough order of magnitude, and all of those are for the "affected" product/version.

@darakian

darakian commented Aug 11, 2025

There's a lot going on in that CSV, and I see Invalid version string: "n/a" right off the bat. For example:

CVE-2025-51726,mitre,/home/runner/.cache/cvelint/cvelistV5/cves/2025/51xxx/CVE-2025-51726.json,check-invalid-version-string,E007,containers.cna.affected.#.versions.#.version,Invalid version string: "n/a"

In a purely technical sense, the version string "n/a" is not invalid, and I've been trying to move the needle on that one myself. More in #362 if you're interested.

Also, perhaps my grep is failing me, but I don't actually see anything about descriptions in there.
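Rather than grepping or using find-in-page on a 61 MB file, the report can be streamed and tallied by field. A minimal sketch, assuming every row has the seven columns of the sample row quoted above (CVE ID, CNA, file path, check name, error code, field path, message); `tally_na_by_field` is a hypothetical helper, and the column layout is inferred from that one row rather than from any documented format.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally_na_by_field(lines: Iterable[str]) -> Counter:
    """Count report rows whose message mentions "n/a", grouped by field path.

    Assumes the 7-column layout of the sample row: CVE ID, CNA, file path,
    check name, error code, field path, message.
    """
    counts: Counter = Counter()
    for row in csv.reader(lines):
        if len(row) < 7:
            continue  # skip blank or malformed rows
        field_path, message = row[5], row[6]
        if "n/a" in message:
            counts[field_path] += 1
    return counts


# Demo on the sample row quoted above. For the full report, pass an open file,
# e.g. open("mitre.csv", newline="", encoding="utf-8") (hypothetical local path).
sample = io.StringIO(
    "CVE-2025-51726,mitre,/home/runner/.cache/cvelint/cvelistV5/cves/2025/51xxx/"
    "CVE-2025-51726.json,check-invalid-version-string,E007,"
    'containers.cna.affected.#.versions.#.version,Invalid version string: "n/a"\n'
)
for field, n in tally_na_by_field(sample).most_common(10):
    print(n, field)
```

Run against the full report, the top field paths should show at a glance whether any "n/a" findings touch titles or descriptions at all, or only the `affected` versions.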

Edit: Aaaaaaand I commented at the same time as your edit.
I think what you're getting at is that data fields should have meanings and rules, and I wholeheartedly agree. You should really throw a thumbs up (or other positive emoji) on this:
#423 (comment)
Maybe also add your own thoughts on what record format rules we should have in that discussion thread, as it's an active conversation.


4 participants