Compress documentation uploaded to S3 #379

Closed
@pietroalbini

At the moment we don't compress the documentation uploaded to S3, which wastes a lot of space. While money for S3 isn't an issue right now, leaving the docs uncompressed could hurt docs.rs's sustainability in the future.

I ran some very rough benchmarks on reqwest 0.9.3, compressing each .html file separately:

Bad benchmark
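| Algorithm | Size | Compression time | Decompression time | Options |
|-----------|---------|------|------|------------|
| Plaintext | 33.9 MB | - | - | - |
| Gzip | 12.0 MB | 3.2s | 3.4s | -9 (best) |
| Gzip | 12.8 MB | 2.5s | 3.4s | -1 (fast) |
| Zstd | 11.7 MB | 7.8s | 2.4s | -19 (best) |
| Zstd | 12.5 MB | 2.5s | 2.4s | -1 (fast) |
| Brotli | 11.5 MB | 5.5s | 2.3s | -9 (best) |
| Brotli | 13.0 MB | 2.3s | 2.3s | -0 (fast) |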

Looking at the numbers, on average if we compress the uploaded docs we're going to save 63% of storage space, which is great from a sustainability point of view. I think we should compress all the uploaded docs going forward, and try to compress (part of) the initial import as well.
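
A quick way to sanity-check the savings figure is sketched below. It assumes a simple unweighted mean over the six compressed rows of the table (the issue doesn't say how the average was taken), and the names `PLAINTEXT_MB`, `COMPRESSED_MB` and `savings_percent` are made up for illustration. Each row works out to somewhere between roughly 62% and 66% saved, which is consistent with the figure quoted above.

```python
# Sizes in MB copied from the benchmark table (reqwest 0.9.3 docs).
PLAINTEXT_MB = 33.9
COMPRESSED_MB = {
    "gzip -9": 12.0,
    "gzip -1": 12.8,
    "zstd -19": 11.7,
    "zstd -1": 12.5,
    "brotli -9": 11.5,
    "brotli -0": 13.0,
}


def savings_percent(compressed_mb, original_mb=PLAINTEXT_MB):
    """Share of storage saved, as a percentage of the original size."""
    return (1 - compressed_mb / original_mb) * 100


# Per-configuration savings, plus an unweighted mean (an assumption).
savings = {name: savings_percent(size) for name, size in COMPRESSED_MB.items()}
mean_savings = sum(savings.values()) / len(savings)

for name, pct in savings.items():
    print(f"{name:<10} {pct:5.1f}% saved")
print(f"{'mean':<10} {mean_savings:5.1f}% saved")
```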

For the algorithm choice, I'd say we can go with gzip: there isn't much difference between the resulting sizes, and the compression time delta between gzip's fast and best modes is the smallest. We can compress the initial import with -1 to speed it up, and all the new crates with -9.
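
As a rough illustration of the two-level idea, here is a minimal sketch using Python's standard `gzip` module. It is not how docs.rs does or would implement this: `compress_doc`, `decompress_doc` and the `initial_import` flag are hypothetical names, and only the level choice (1 for the bulk import, 9 for new crates) comes from the proposal above.

```python
import gzip

# Level choices mirroring the proposal: -1 for the bulk initial import,
# -9 for newly built crates. The constant names are hypothetical.
IMPORT_LEVEL = 1
NEW_CRATE_LEVEL = 9


def compress_doc(html: bytes, initial_import: bool = False) -> bytes:
    """Gzip a single documentation file at the level suggested above."""
    level = IMPORT_LEVEL if initial_import else NEW_CRATE_LEVEL
    return gzip.compress(html, compresslevel=level)


def decompress_doc(blob: bytes) -> bytes:
    """Inverse of compress_doc; the reader does not need to know the level."""
    return gzip.decompress(blob)


if __name__ == "__main__":
    # Synthetic, highly repetitive page, only to exercise the functions.
    page = b"<html><body>" + b"<p>reqwest docs</p>" * 500 + b"</body></html>"
    fast = compress_doc(page, initial_import=True)
    best = compress_doc(page)
    print(len(page), len(fast), len(best))
```

Since a gzip stream decompresses the same way regardless of the level used to produce it, files from the initial import and files from new builds could be read back through the same code path.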

cc @Mark-Simulacrum @QuietMisdreavus

Benchmark method

Installed compression tools on Ubuntu 18.04 LTS:

$ sudo apt install gzip brotli zstd

Downloaded the reqwest documentation locally:

$ aws s3 cp --recursive s3://rust-docs-rs/rustdoc/reqwest/0.9.3/ .

Compressed every .html file with find:

$ time find <dir> -name "*.html" -exec <command> {} \;
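
For anyone who wants to repeat the measurement without shelling out, the same per-file approach can be sketched in Python for the gzip case. This is an approximation, not the method used for the table: `benchmark_gzip` is a hypothetical helper, it compresses in memory instead of spawning one process per file as `find -exec` does, and so its timings should not be expected to match the numbers above.

```python
import gzip
import os
import time


def benchmark_gzip(root: str, level: int = 9):
    """Gzip every .html file under root in memory, one file at a time.

    Returns (original_bytes, compressed_bytes, elapsed_seconds).
    The elapsed time includes reading the files from disk.
    """
    original = 0
    compressed = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue  # only .html files, as in the find command
            with open(os.path.join(dirpath, name), "rb") as f:
                data = f.read()
            original += len(data)
            compressed += len(gzip.compress(data, compresslevel=level))
    elapsed = time.perf_counter() - start
    return original, compressed, elapsed
```

Calling `benchmark_gzip(".", level=1)` and `benchmark_gzip(".", level=9)` in the directory holding the downloaded docs would give a size and time pair for each of the two gzip rows.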
