Skip to content

Download and validate root TUF data automatically if needed #18

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

phenaproxima
Copy link
Collaborator

This is an experiment, implementing @davidstrauss's idea in #7.

The idea is basically: to get around the fact that we can't easily resolve the path of the starting root metadata, we could instead expect consumers of this plugin to have a set of hashes for the root metadata in their composer.json. The plugin could then try to download the root metadata and validate it against those hashes very early, before any other data is downloaded.

@davidstrauss
Copy link

This looks great. Does anyone else have thoughts?

@pwolanin
Copy link

Maybe add the file size as well as the hash?

Copy link
Contributor

@TravisCarden TravisCarden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks functionally correct to me. I have one question about key rotation and one request to rename some variables.

"root": "root.json"
"root": {
"hashes": {
"sha256": "6cf95b77cedc832c980b81560704bd2fb9ee32ec4c1a73395a029b76715705cc"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly, the TUF spec locates the root key offline (e.g., literally in a physical safe or something) and suggests rotating it roughly annually. Wouldn't this sha hash change when that happens? And in that case, would this "just stop working" one day for our users until they update the hash from the webpage or whatever they originally got it from? cc @davidstrauss @tedbow

Copy link

@davidstrauss davidstrauss Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to review the code again to understand how it's implemented, but what it should do is only use this pinning to bootstrap new clients. The code should probably ship with the version and size for that, too.

When we rotate root, it should have the following effects with this design:

  • No special effect on bootstrapped clients, which have already pulled and stored local root data and no longer use the bootstrapping information.
  • If someone installs using a download from before the rotation, their client will need to bootstrap to the old root and update to the new one by standard means. This would also be the case if someone installed from a stale tarball that included a full but stale root.json. We're required to keep the chain of previous root data published, at least as far back as we expect clients to need to bootstrap.

Here's how my thought process went around working with this root bootstrapping problem:

  1. We could use a design where we ship the initial root data in a way that both populates the client's repo metadata store and retains its writability, but that's kind of awkward. We usually like to split things into (a) shipped and nearly immutable vs. (b) mutable but initially empty (e.g. /files).
  2. We could ship root.json and copy it into the client repo metadata. It's kind of unwieldly to embed in other JSON, though. We'd probably want to ship it separately and reference it somehow in the repo config. This was the prior design, and my suggestion was to simplify it by going to the next level of abstraction by "pinning" the expected metadata rather than providing it directly.
  3. Since old repo metadata must remain published, we ought to be able to bootstrap with just a hash (and maybe size) for the data. If it can't fetch the current (or fresher) root data, then it probably can't pull anything else from the repo. Having a full root.json doesn't seem to help the client survive any attacks above and beyond providing it the version + hash + size.

Still, I'll run this past people at the Secure Systems Lab.

Copy link

@davidstrauss davidstrauss Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Marina Moore at the Secure Systems Lab on TUF Slack:

For the purpose of bootstrapping trust, the secure hash should be functionally equivalent to the root metadata, so that seems like a reasonable space-saving mechanism. It also gives more paranoid users an opportunity to look for external verification that they have the correct root metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unclear of the motivation for this PR. @phenaproxima in the first comment describes the problem as "we can't easily resolve the path of the starting root metadata,"

but then @davidstrauss in TUF slack mentions "include it in Composer's repository configuration in a compact way" and Marina Moore mentions "reasonable space-saving mechanism".

Is there not an actual problem of resolving? Are the initial root.json files so big we are worried about the space?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just awkward to have the Composer config either (1) reference a file elsewhere on disk or (2) embed the root data into an otherwise sensible JSON file.

Another option could be some sort of naming convention for discovering the root data for a given repo. For example, we could use the FQDN or hash of the base URL. This would allow populating the root on disk without awkwardly referencing it from the JSON.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that would solve the problem. The issue is, if we expect to find the root JSON in the file system, we need to know its path. And as you know, there are two kinds of paths: relative and absolute. Both present problems:

If we used a relative path, then what is it relative to? The current working directory? The plugin's src directory? The composer.json file itself? Some package's path in the vendor directory? (If it's relative to a package, what's the path of the package? We cannot know this until the package info is loaded, but TUF needs to be fully bootstrapped and updated before we can even begin to figure that out.)

If it's an absolute URL, then how can we guarantee that path be valid in all environments the composer.json file might live in?

Downloading the root JSON is not my favorite idea, but it does get around these questions, because the root file lives at a stable, known location (i.e., a URL on the server). But I'm absolutely open to other ideas.

Copy link
Collaborator Author

@phenaproxima phenaproxima Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option that @davidstrauss suggested later in this issue is to use magic naming and a fixed directory structure. For example, if we want people to add TUF protection to the packages.drupal.org repository, we document that we expect the project to have a directory structure like this:

.pki/
    tuf/
        packages.drupal.org.json
composer.json

@phenaproxima
Copy link
Collaborator Author

@TravisCarden I think that, to avoid getting too verbose, the best thing to do might just be a comment. What do you think?

@davidstrauss
Copy link

The Secure Systems Lab has suggested we review/collaborate with this PIP issue.

@jku
Copy link

jku commented Apr 18, 2021

Some drive-by comments (from python client experience, without any knowledge of the php version):

  • if you download a root.json and expect it to match a hash, you should download specific version (like 1.root.json)
  • root.json getting rotated should be fine if you follow the first point: expired root metadata should be accepted while it is "intermediate" (while the client is in the process of updating root metadata). Only the final root metadata version should be checked for expiry: this is 5.3.6 in the specification update process.

That said, I don't understand why you are doing this: if you have place to store the hash of a root.json, I don't see why you would not have a place to store the root.json itself? I would definitely choose embedding it in the source code if a local file can't be reliably loaded from disk.

Comment on lines +64 to +65
$fetcher = $repoConfig['tuf']['_fileFetcher'] ?? GuzzleFileFetcher::createFromUri($url);
$this->updater = $repoConfig['tuf']['_updater'] ?? new ComposerCompatibleUpdater($fetcher, [], new FileStorage($repoPath));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are already doing this for _fileFetcher but it seems weird to accept config options that should only ever be used in testing. I don't know enough about how Composer plugins work but it makes me wonder if another plugin would ever have a chance to alter this config to say pass malicious file fetcher or updater in these keys.

It seems like if we don't want _updater to be used during a non-test run we should do whatever we can to make that impossible.

Looking at the test

return $this->composer->getRepositoryManager()
            ->createRepository('composer', $config);

Because TufValidatedComposerRepository is not final could we extend it as TestTufValidatedComposerRepository to be able to set the fetcher and updater in some way then use \Composer\Repository\RepositoryManager::setRepositoryClass to make it use our test class. The maybe avoiding having this test only code in our class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proposed an alternate approach in #22, although I don't know if it's really much of an improvement.

A lot of these problems stem from the fact that ComposerRepository::$config is private, and I need to be able to modify the configuration in the constructor before it's given to the parent class. If that property were protected, this would be a non-issue, and I could move all of this logic into a different method, rather than being constrained by the fact that it's all got to happen in the constructor.

@davidstrauss
Copy link

davidstrauss commented Apr 21, 2021

That said, I don't understand why you are doing this: if you have place to store the hash of a root.json, I don't see why you would not have a place to store the root.json itself? I would definitely choose embedding it in the source code if a local file can't be reliably loaded from disk.

I don't have a strong opinion that we should do this. It was simply a way to allow composer.json to serve as the initial trust anchor without reliance on external assets (which don't always follow along in practice as a composer.json gets copied around) or polluting the composer.json with a wall of serialized data representing the root.json content.

So, this approach makes composer.json self-sufficient, which it would already be normally (without TUF integration). You could consider it avoiding a usability regression that we would gain by having composer.json depend on an accompanying root.json. I dislike creating a format that feels like .bin + .cue, where you need a matching pair to use either.

Another approach, which I might actually like more, is where composer.json doesn't define the trust anchors for the configured repositories at all, but our TUF integration has a pattern for finding the right root.json on disk. For example, there could be a naming convention where the TUF repo at http://example.com/repos/current/ would trigger discovery of root.json at $CWD/tuf-anchors/example.com/root.json (which may or may not be where it tracks updated root.json data). This design admittedly retains the annoyance of needing to pair composer.json with the right root.json, but it at least avoids awkwardly including half of the trust anchor configuration in composer.json (in the form of a path to root.json). I'm also always a fan of convention over configuration.

@phenaproxima
Copy link
Collaborator Author

phenaproxima commented Apr 21, 2021

I kinda like that idea, @davidstrauss. This is a case where I think that convention and magic naming are going to be the clearest option for users. And Drupal could certainly modify the core-composer-scaffold plugin to automatically put its root.json into the correct location, which would make it even more painless for people to use.

One small question we'd have to figure out is how to handle repository URLs where the path is significant -- packages.drupal.org being an decent example. For instance, if we needed to have two separate TUF repos for packages.drupal.org/8 and packages.drupal.org/7, how would we name the root files? Maybe just replace / with _, or some other character that's almost never seen in URLs?

@tedbow
Copy link
Contributor

tedbow commented Apr 22, 2021

Should we close this PR or set it to draft since it seems we want to go with #23 instead?

@phenaproxima phenaproxima marked this pull request as draft April 22, 2021 15:22
@phenaproxima
Copy link
Collaborator Author

Closed in favor of #23.

@phenaproxima phenaproxima deleted the fetch-root branch April 22, 2021 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants