Utilize S3 as the backing file store #373


Merged: 8 commits into rust-lang:master on Jul 12, 2019

Conversation

@Mark-Simulacrum (Member) commented Jul 4, 2019

I've included the relevant access keys on the docs.rs server already, so this can be freely deployed. It should not cause any regressions, other than possibly in performance, as all reads fall back to the database. We'll want to work on a script that moves the existing rows in the database to S3, but that can be done later (and possibly not in this repository at all).

r? @QuietMisdreavus
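A minimal sketch of the read path described above, assuming hypothetical `s3_client()` and `get_file_from_database()` helpers and an assumed bucket name; the PR's actual code differs in detail:

```rust
use std::io::Read;

use rusoto_s3::{GetObjectRequest, S3, S3Client};

// Try S3 first; any failure (missing credentials, missing object, network
// error) falls through to the copy stored in the database.
fn get_file(path: &str) -> Option<Vec<u8>> {
    let from_s3 = s3_client().and_then(|client| {
        client
            .get_object(GetObjectRequest {
                bucket: "rust-docs-rs".into(), // assumed bucket name
                key: path.into(),
                ..Default::default()
            })
            .sync()
            .ok()
            .and_then(|output| output.body)
            .map(|body| {
                let mut buf = Vec::new();
                let _ = body.into_blocking_read().read_to_end(&mut buf);
                buf
            })
    });
    from_s3.or_else(|| get_file_from_database(path))
}

// Stubs so the sketch stands alone; the real helpers construct a client and
// read the `content` column from Postgres, respectively.
fn s3_client() -> Option<S3Client> {
    None
}

fn get_file_from_database(_path: &str) -> Option<Vec<u8>> {
    None
}
```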

@pietroalbini (Member) commented:

> We'll probably want some way to handle configuration of a local s3-like bucket, but I'm not sure what options exist in that space

Minio is a mostly feature-complete S3 clone; running it locally with Docker should be painless.

For now, when fetching a file, look on S3 first, but if that request
fails, return the content column in the database.
The fallback endpoint is always AWS, but this allows easy use of minio and the like
for local development.
@Mark-Simulacrum (Member, Author) commented:

The latest commit implements support for setting the S3 endpoint, which makes it possible to use minio (I've tested this).
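For illustration, a sketch of how an endpoint override can be expressed with rusoto's `Region::Custom`; the `S3_ENDPOINT` variable name and the default region are assumptions (minio's default local endpoint is `http://localhost:9000`):

```rust
use rusoto_core::Region;
use rusoto_s3::S3Client;

// Build an S3 client against either AWS or a custom endpoint such as a
// local minio instance.
fn s3_client_with_endpoint() -> S3Client {
    let region = match std::env::var("S3_ENDPOINT") {
        // e.g. S3_ENDPOINT=http://localhost:9000 for minio
        Ok(endpoint) => Region::Custom {
            name: "custom".to_string(),
            endpoint,
        },
        // With no override configured, fall back to a real AWS region.
        Err(_) => Region::UsWest1,
    };
    S3Client::new(region)
}
```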

@Mark-Simulacrum (Member, Author) commented:

I'm going to hold off on working on the upload script until this is merged. My current plan is that we can simply run a query along the lines of `select path, content from files where content != 'in-s3' limit 100;`, upload the results to S3, and in the same transaction update those paths to have `'in-s3'` as the content (see the sketch below). Once all paths are converted we can drop the content column, if desired.

At this point I think this PR is ready to go.
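A sketch of one migration batch as described above, assuming the `postgres` and `rusoto_s3` crates and an assumed bucket name; table and column names follow the query in the comment:

```rust
use postgres::Connection;
use rusoto_s3::{PutObjectRequest, S3, S3Client};

// Move up to 100 files into S3, marking each migrated row inside the same
// transaction. Returns the number of rows handled; zero means we're done.
fn migrate_batch(conn: &Connection, client: &S3Client) -> Result<usize, Box<dyn std::error::Error>> {
    let tx = conn.transaction()?;
    let rows = tx.query(
        "SELECT path, content FROM files WHERE content != 'in-s3' LIMIT 100",
        &[],
    )?;
    for row in rows.iter() {
        let path: String = row.get(0);
        let content: Vec<u8> = row.get(1);
        // Upload first; only mark the row once the object is in S3.
        client
            .put_object(PutObjectRequest {
                bucket: "rust-docs-rs".into(), // assumed bucket name
                key: path.clone(),
                body: Some(content.into()),
                ..Default::default()
            })
            .sync()?;
        tx.execute("UPDATE files SET content = 'in-s3' WHERE path = $1", &[&path])?;
    }
    let migrated = rows.len();
    tx.commit()?;
    Ok(migrated)
}
```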

@QuietMisdreavus (Member) commented:

Thanks for putting this together! This looks great, and i'm glad you were able to get this working.

I'm slightly against forcing the assumption that you'll have either the S3 connection or a minio server available. Even if we bundle it into the Vagrant container for the dev build, i still want a way to be independent of the S3 protocol. The simplest way to do that is to keep the database fallback and allow the use of S3/minio as an optional optimization.

To that end, what happens right now if there are no S3 credentials in the environment? It looks like the "read" code tries to run the rusoto code, which (i assume) returns an error, so it reads out the database value in the unwrap_or branch. If we require S3/minio in the future, would that be replaced with a straight unwrap? And it looks like the "write" code already uses a straight unwrap, so now everyone who has a current development environment is forced to somehow provision a minio instance or set up an S3 bucket to load files into.

Unless/until we reconfigure the virtual machine used with Vagrant to include a minio instance and the proper configuration to make the development instance save to it, i can't accept this. At the very least, i would want to see directions written into the README so that people who want to work on docs.rs can set it up themselves. (I certainly know nothing about minio and vanishingly little about setting up an S3 bucket, and i would need this information to be able to keep working on docs.rs after this change.)

@Mark-Simulacrum (Member, Author) commented:

I can definitely provide some setup instructions for minio and try to get a patch for Vagrant put together, though I don't have Vagrant installed currently (I'm using qemu for docs.rs development instead).

With that said, I'll also implement the database fallback to ease the transition.

This avoids requiring developers to install and configure minio. While the
installation process is somewhat simple, it is still additional overhead
that is too much to ask of developers at this point.
@Mark-Simulacrum (Member, Author) commented:

Okay, installing minio and configuring it in Vagrant without having Vagrant installed locally proved a little too ambitious for me, so I scaled back. We no longer require minio (or any S3-compatible server); all reads and writes continue to go through the file database if AWS access keys are not present in the environment. Note, however, that if we detect AWS_ACCESS_KEY_ID in the environment we will error on writes to the database, though not on reads, to maintain the ability to gradually migrate files to S3.
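A rough sketch of the write-side gating described here; the function name is an assumption and the real code may be structured differently:

```rust
// When AWS credentials are present, S3 is treated as the store of record:
// writes that would put file contents into the database error out instead.
// Reads still fall back to the database so existing rows keep working.
fn s3_active() -> bool {
    std::env::var_os("AWS_ACCESS_KEY_ID").is_some()
}
```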

src/db/file.rs (outdated diff)
} else {
let client = s3_client();
let row = rows.get(0);
let content = client.and_then(|c| c.get_object(GetObjectRequest {
A reviewer (Member) commented:

I wonder if we could use the hard-coded `in-s3` value to gate this check, so that we don't generate erroneous calls to S3.

@Mark-Simulacrum (Member, Author) replied:

That's a great idea, it'll definitely help with the common case for now. I've pushed up a commit that does this.
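A sketch of that gating, with hypothetical helper names (`s3_client`, `fetch_from_s3`): S3 is only consulted when the row's `content` column holds the hard-coded `in-s3` marker.

```rust
// Only rows whose content is the "in-s3" marker trigger an S3 request;
// everything else is served straight from the database.
let content: Vec<u8> = row.get("content");
let content = if content == b"in-s3" {
    s3_client()
        .and_then(|client| fetch_from_s3(&client, &path)) // hypothetical helper
        .unwrap_or(content) // fall back to the marker row on failure
} else {
    content
};
```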

@QuietMisdreavus (Member) left a review:

I think this is good to go! Thanks so much for bearing with the review (and for being much faster at responding to comments than i was at making them >_>). Let's get this merged, so we can start offloading our files listing from the database.

@QuietMisdreavus merged commit 45dc58f into rust-lang:master on Jul 12, 2019.
@Mark-Simulacrum deleted the s3 branch on Jul 12, 2019 at 17:01.