Skip to content

Consider using an alternative for ruamel.yaml due to critical CVE #4134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stefanvangastel opened this issue Jun 30, 2020 · 25 comments
Closed
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments

@stefanvangastel
Copy link

As a dependency of DVC, ruamel.yaml up to 0.16.10 (latest) contains a critical CVE (https://nvd.nist.gov/vuln/detail/CVE-2019-20478). This prevents us (and probably others) to use DVC in production systems.

@ghost ghost added the triage Needs to be triaged label Jun 30, 2020
@skshetry
Copy link
Collaborator

skshetry commented Jun 30, 2020

@stefanvangastel, sadly, there's no good alternative of ruamel.yaml, as others do not preserve comments. But, I think we can make an optional opt-in config that uses (C)SafeLoader that can be used in production.

@skshetry
Copy link
Collaborator

Looks like safe_load is not affected.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Jun 30, 2020
@ghost ghost removed the triage Needs to be triaged label Jun 30, 2020
@skshetry
Copy link
Collaborator

skshetry commented Jun 30, 2020

Also, another thing to note from the documentation:

The default loader (typ='rt') is a direct derivative of the safe loader, without the methods to construct arbitrary Python
objects that make the unsafe loader unsafe, but with the changes needed for round-trip preservation of comments, etc.

So, I think we are safe here as we only use round-trip loader once when trying to dump, and looks like it is safe. All other uses PyYAML's safe_load.

@honnibal
Copy link

We had a lot of trouble with serialization libraries for spaCy: the JSON library we were using, ujson, was unmaintained, and the msgpack library we were using needed a monkeypatch to support numpy arrays and then made several breaking changes.

What we ended up doing was making our own package that vendored these dependencies, srsly, so that we could maintain them. We've recently added YAML support, vendoring our own fork of Ruamel.

The current YAML support attempts to set the safe loading flags, but I want to simply remove this terrible feature from the fork. I don't think anyone should ever want that behaviour, and if they do they can use a different YAML library.

So, if it suits your needs, perhaps DVC can use srsly? We have similar concerns and together we can make sure it stays secure.

@efiop
Copy link
Contributor

efiop commented Jun 30, 2020

Discussed with @skshetry and it looks like dvc is safe and doesn't look like we will get any value from moving to srsly (but will gain 300KB more in dep size) at this point, but will will keep an eye on srsly if we find ourselves needing its other features. Closing for now.

@efiop efiop closed this as completed Jun 30, 2020
@efiop
Copy link
Contributor

efiop commented Jun 30, 2020

@stefanvangastel btw, was it reported by some static analyzator? Veracode? If so, are you able to mark it as mitigated?

@stefanvangastel
Copy link
Author

stefanvangastel commented Jul 1, 2020 via email

@efiop
Copy link
Contributor

efiop commented Jul 1, 2020

@stefanvangastel Are you able to mark things as mitigated? How do you usually deal with such issues? Is it not possible to review particular warnings?

@stefanvangastel
Copy link
Author

We can provide waivers to whitelist specific packages, so that issue is solved (not being able to install DVC due to usage of ruamel.yaml).

@efiop
Copy link
Contributor

efiop commented Jul 2, 2020

@stefanvangastel Just to clarify, will you be able to do that in this case to install dvc?

@stefanvangastel
Copy link
Author

Ok, so to paint the entire picture;

We are in an offline / on-premise netwerk. We use Sonatype Nexus (on-prem) to proxy / cache PyPi packages. So using pip install we provide a the Nexus server als proxy.

Additionally Sonatype IQ (also on-prem) will scan all packages traveling through the proxy before allowing them in our network.

If we install DVC, pip returns an 403 error stating ruamel.yaml is placed in quarantine due to a CVE score >= 9 (in this case 9.8 out of 10). So DVC on itself is not the issue.

After providing your explanation our SEC dept allowed the specific version of ruamel.yaml package to be installed, making it possible to install DVC.

The whitelisting and allowing is just local, so we don't provide any upstream feedback to Sonatype or NIST since the CVE is valid and undisputed.

That's why my original question was; Can DVC use another package since using ruamel.yaml makes you (DVC) inherit its CVE score as it is a dependency.

@efiop
Copy link
Contributor

efiop commented Jul 2, 2020

@stefanvangastel Thank you for the explanation! Makes sense now!

Btw, other than on-premise Sonatype, are you aware of any tool that we could use to check our cve score during development? Without it, we won't ever be sure if the new package that we introduce is actually better or worse in CVE score. Regarding the ruamel itself, we could consider switching to srsly as suggested by @honnibal , but again I'm not sure what is the CVE score for it (seems like there is no score at all right now) and if that would really help us.

@skshetry
Copy link
Collaborator

skshetry commented Jul 2, 2020

@efiop, srsly right now has only vendored the ruamel.yaml, so it's the same.

are you aware of any tool that we could use to check our cve score during development?

Doesn't dependabot catch those? I think, it's having trouble reading from our setup.py, it should work with requirements.txt files.

@stefanvangastel
Copy link
Author

@efiop I can reproduce the CVE findings locally (macOS) by:

Installing a OWASP dependency checker CLI tool

brew install dependency-check

Create venv and install DVC

python3 -m venv dvc
source dvc/bin/activate
pip3 install dvc

Run a scan

dependency-check --project "dvc" --enableExperimental --scan dvc

(--enableExperimental has to be on to use the Python scanner)

This results in a dependency-check-report.html that looks like this (confirming the ruamel.yaml severity):

Screenshot 2020-07-02 at 16 01 39

@efiop
Copy link
Contributor

efiop commented Jul 2, 2020

@stefanvangastel Oh, I didn't know that dependency-check supports cve scanning. Nice! Created #4153 for that.

@shcheklein
Copy link
Member

@skshetry @efiop do you know if this issue was fixed in 0.17? I'm still hitting the same issue with Nexus with one of our customers.

@pmrowla
Copy link
Contributor

pmrowla commented Oct 22, 2023

The CVE is for ruamel.yaml <= 16.7. As discussed in this issue, it was also only specific to using the default load() method. If the application or library using ruamel.yaml is properly using safe_load() then the CVE is irrelevant.

@shcheklein
Copy link
Member

The CVE is for ruamel.yaml <= 16.7.

https://security-tracker.debian.org/tracker/CVE-2019-20478

It's not clear tbh

If the application or library using ruamel.yaml is properly using safe_load() then the CVE is irrelevant.

I think it is still relevant since there is no way to prevent someone else to install and use in it in a non-safe way?

@pmrowla
Copy link
Contributor

pmrowla commented Oct 23, 2023

I think it is still relevant since there is no way to prevent someone else to install and use in it in a non-safe way?

I'm not following here? Are you talking about someone pip installing DVC and then using ruamel.yaml from the same venv?

In DVC our yaml deserialization does not use the default load, we explicitly use safe_load so we are not affected by the CVE.

@shcheklein
Copy link
Member

I'm not following here?

If quarantine is lifted from the package, anyone could just use it in their apps, independently from DVC, right. And there will be no easy way to guarantee that it is being used in a safe way, even if DVC is fine.

@pmrowla
Copy link
Contributor

pmrowla commented Oct 23, 2023

Yes but I don't think that's relevant here at all. The original CVE is misleading because it's classified as severity 9.8 since it's technically a remote code execution issue (where deserializing untrusted input can execute a script in the yaml).

But it's essentially the same level of issue as "using Python exec on untrusted input or unpickling untrusted input is inherently unsafe". pip installing DVC also requires installing Python. Someone could then misuse exec or pickle, but I would not consider that to be a DVC security issue.

@shcheklein
Copy link
Member

Okay, I see. I wonder why it is kept open? Is there a way to close it then? (it still obviously creates some friction that could be avoided).

@pmrowla
Copy link
Contributor

pmrowla commented Oct 23, 2023

There is a formal dispute process for most of the CVE databases but it's unlikely anything would come of it since everything in the report is still technically accurate.

To clarify, the underlying issue here is that is that is that the "unsafe default" behavior in ruamel.yaml is actually just supporting the official yaml spec. The yaml spec formally supports defining custom tags and defining how to execute them, and this is enabled by default in ruamel.yaml. Using safe_load() disables this support.

@shcheklein
Copy link
Member

shcheklein commented Oct 23, 2023

Hmm, to your point it's not different from eval. The package could be considered as eval - people could use or not use it, the same as with eval itself. May be the difference here is that it's harder to detect it (vs eval in Python)?

Why is there no report for Python then?

@pmrowla
Copy link
Contributor

pmrowla commented Oct 23, 2023

CVE reports can basically be opened by anyone. You could absolutely open a technically valid CVE regarding Python eval/exec. But given the visibility of Python and the fact that a CVE open against it would block entire OS releases, it would probably be heavily disputed and then taken down almost immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting response we are waiting for your reply, please respond! :)
Projects
None yet
Development

No branches or pull requests

6 participants