Utilize S3 as the backing file store #373


Merged: 8 commits into rust-lang:master on Jul 12, 2019

Conversation

@Mark-Simulacrum (Member) commented Jul 4, 2019

I've included the relevant access keys on the docs.rs server already, so this can be freely deployed. It should not cause any regressions, other than possibly in performance, as all reads fall back to the database. We'll want to work on a script that moves the existing rows in the database to S3, but that can be done later (and possibly not in this repository at all).

r? @QuietMisdreavus
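A minimal sketch of the read path described above, assuming hypothetical `s3_client()` and `get_file_from_database()` helpers and an assumed bucket name; the PR's actual code differs in detail:

```rust
use std::io::Read;

use rusoto_s3::{GetObjectRequest, S3, S3Client};

// Try S3 first; any failure (missing credentials, missing object, network
// error) falls through to the copy stored in the database.
fn get_file(path: &str) -> Option<Vec<u8>> {
    let from_s3 = s3_client().and_then(|client| {
        client
            .get_object(GetObjectRequest {
                bucket: "rust-docs-rs".into(), // assumed bucket name
                key: path.into(),
                ..Default::default()
            })
            .sync()
            .ok()
            .and_then(|output| output.body)
            .map(|body| {
                let mut buf = Vec::new();
                let _ = body.into_blocking_read().read_to_end(&mut buf);
                buf
            })
    });
    from_s3.or_else(|| get_file_from_database(path))
}

// Stubs so the sketch stands alone; the real helpers construct a client and
// read the `content` column from Postgres, respectively.
fn s3_client() -> Option<S3Client> {
    None
}

fn get_file_from_database(_path: &str) -> Option<Vec<u8>> {
    None
}
```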

@pietroalbini (Member) commented:

> We'll probably want some way to handle configuration of a local s3-like bucket, but I'm not sure what options exist in that space

Minio is a mostly feature-complete S3 clone; running it locally with Docker should be painless.

For now, when fetching a file, look on S3 first, but if that request
fails, return the content column in the database.
The fallback endpoint is always AWS, but this allows easy use of minio and the like
for local development.
@Mark-Simulacrum (Member, Author) commented:

The latest commit implements support for setting the S3 endpoint, which makes it possible to use minio (I've tested this).
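For illustration, a sketch of how an endpoint override can be expressed with rusoto's `Region::Custom`; the `S3_ENDPOINT` variable name and the default region are assumptions (minio's default local endpoint is `http://localhost:9000`):

```rust
use rusoto_core::Region;
use rusoto_s3::S3Client;

// Build an S3 client against either AWS or a custom endpoint such as a
// local minio instance.
fn s3_client_with_endpoint() -> S3Client {
    let region = match std::env::var("S3_ENDPOINT") {
        // e.g. S3_ENDPOINT=http://localhost:9000 for minio
        Ok(endpoint) => Region::Custom {
            name: "custom".to_string(),
            endpoint,
        },
        // With no override configured, fall back to a real AWS region.
        Err(_) => Region::UsWest1,
    };
    S3Client::new(region)
}
```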

@Mark-Simulacrum (Member, Author) commented:

I'm going to hold off on working on the upload script until this is merged. My current plan is that we can simply run a query along the lines of `select path, content from files where content != 'in-s3' limit 100;`, upload the results to S3, and in the same transaction update those paths to have `'in-s3'` as the content (see the sketch below). Once all paths are converted we can drop the content column, if desired.

At this point I think this PR is ready to go.
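A sketch of one migration batch as described above, assuming the `postgres` and `rusoto_s3` crates and an assumed bucket name; table and column names follow the query in the comment:

```rust
use postgres::Connection;
use rusoto_s3::{PutObjectRequest, S3, S3Client};

// Move up to 100 files into S3, marking each migrated row inside the same
// transaction. Returns the number of rows handled; zero means we're done.
fn migrate_batch(conn: &Connection, client: &S3Client) -> Result<usize, Box<dyn std::error::Error>> {
    let tx = conn.transaction()?;
    let rows = tx.query(
        "SELECT path, content FROM files WHERE content != 'in-s3' LIMIT 100",
        &[],
    )?;
    for row in rows.iter() {
        let path: String = row.get(0);
        let content: Vec<u8> = row.get(1);
        // Upload first; only mark the row once the object is in S3.
        client
            .put_object(PutObjectRequest {
                bucket: "rust-docs-rs".into(), // assumed bucket name
                key: path.clone(),
                body: Some(content.into()),
                ..Default::default()
            })
            .sync()?;
        tx.execute("UPDATE files SET content = 'in-s3' WHERE path = $1", &[&path])?;
    }
    let migrated = rows.len();
    tx.commit()?;
    Ok(migrated)
}
```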

@QuietMisdreavus (Member) commented:

Thanks for putting this together! This looks great, and i'm glad you were able to get this working.

I'm slightly against forcing the assumption that you'll have either the S3 connection or a minio server available. Even if we bundle it into the Vagrant container for the dev build, i still want a way to be independent of the S3 protocol. The simplest way to do that is to keep the database fallback and allow the use of S3/minio as an optional optimization.

To that end, what happens right now if there are no S3 credentials in the environment? It looks like the "read" code tries to run the rusoto code, which (i assume) returns an error, so it reads out the database value in the unwrap_or branch. If we require S3/minio in the future, would that be replaced with a straight unwrap? And it looks like the "write" code already uses a straight unwrap, so now everyone who has a current development environment is forced to somehow provision a minio instance or set up an S3 bucket to load files into.

Unless/until we reconfigure the virtual machine used with Vagrant to include a minio instance and the proper configuration to make the development instance save to it, i can't accept this. At the very least, i would want to see directions written into the README so that people who want to work on docs.rs can set it up themselves. (I certainly know nothing about minio and vanishingly little about setting up an S3 bucket, and i would need this information to be able to keep working on docs.rs after this change.)

@Mark-Simulacrum (Member, Author) commented:

I can definitely provide some setup instructions for minio and try to get a patch for Vagrant put together, though I don't have Vagrant installed currently (I'm using qemu for docs.rs development instead).

With that said, I'll also implement the database fallback to ease the transition.

This avoids requiring developers to install and configure minio. While the
installation process is somewhat simple, it is still additional overhead
that is too much to ask of developers at this point.
@Mark-Simulacrum (Member, Author) commented:

Okay, installing minio and configuring it in Vagrant without having Vagrant installed locally proved a little too ambitious for me, so I scaled back. We no longer require minio (or any S3-compatible server); all reads and writes continue to go through the file database if AWS access keys are not present in the environment. Note, however, that if we detect AWS_ACCESS_KEY_ID in the environment we will error on writes to the database, though not on reads, to maintain the ability to gradually migrate files to S3.
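A rough sketch of the write-side gating described here; the function name is an assumption and the real code may be structured differently:

```rust
// When AWS credentials are present, S3 is treated as the store of record:
// writes that would put file contents into the database error out instead.
// Reads still fall back to the database so existing rows keep working.
fn s3_active() -> bool {
    std::env::var_os("AWS_ACCESS_KEY_ID").is_some()
}
```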

src/db/file.rs (outdated diff)
} else {
let client = s3_client();
let row = rows.get(0);
let content = client.and_then(|c| c.get_object(GetObjectRequest {
A reviewer (Member) commented:

I wonder if we could use the hard-coded `in-s3` value to gate this check, so that we don't generate erroneous calls to S3.

@Mark-Simulacrum (Member, Author) replied:

That's a great idea, it'll definitely help with the common case for now. I've pushed up a commit that does this.
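A sketch of that gating, with hypothetical helper names (`s3_client`, `fetch_from_s3`): S3 is only consulted when the row's `content` column holds the hard-coded `in-s3` marker.

```rust
// Only rows whose content is the "in-s3" marker trigger an S3 request;
// everything else is served straight from the database.
let content: Vec<u8> = row.get("content");
let content = if content == b"in-s3" {
    s3_client()
        .and_then(|client| fetch_from_s3(&client, &path)) // hypothetical helper
        .unwrap_or(content) // fall back to the marker row on failure
} else {
    content
};
```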

@QuietMisdreavus (Member) left a review:

I think this is good to go! Thanks so much for bearing with the review (and for being much faster at responding to comments than i was at making them >_>). Let's get this merged, so we can start offloading our files listing from the database.

@QuietMisdreavus merged commit 45dc58f into rust-lang:master on Jul 12, 2019.
@Mark-Simulacrum deleted the s3 branch on Jul 12, 2019 at 17:01.