
cargo publish "memory allocation of .. bytes failed" with large, ignored test assets #7543

Closed
@FauxFaux

Description


Problem

cargo publish fails with:

memory allocation of 1099511627782 bytes failed
zsh: abort (core dumped) cargo publish --dry-run

...if you have large files in your build directory, even if they are excluded.

I expect this to complete in all cases, i.e. a crate should be publishable regardless of how much memory I have (within reason!). But I especially expect it to pass in this case, where the file causing the error is ignored and will not be published.

Steps

  1. Have a test data directory which is ignored.
mkdir -p tests/generated
echo '*' > tests/generated/.gitignore
git add -f tests/generated/.gitignore
  2. Have a build.rs which creates a large file in this test data directory.
use std::io::{Seek, SeekFrom, Write};
use std::fs;

fn main() {
    let mut f = fs::File::create("tests/generated/large.txt").unwrap();
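    // Seek ~1 TiB past the start before writing, so the file ends up sparse:
    // a few KiB on disk, but an apparent size of ~1.1 TB.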
    f.seek(SeekFrom::Start(1024 * 1024 * 1024 * 1024)).unwrap();
    f.write_all(b"hello").unwrap();
}
% du -h tests/generated/large.txt    
4.0K	tests/generated/large.txt

% du --apparent-size -h tests/generated/large.txt
1.1T	tests/generated/large.txt
  3. Run cargo package or cargo publish.
% cargo +stable publish --dry-run --allow-dirty
    Updating crates.io index
warning: manifest has no description, license, license-file, documentation, homepage or repository.
See https://doc.rust-lang.org/cargo/reference/manifest.html#package-metadata for more info.
   Packaging dozen v0.1.0 (foo/dozen)
   Verifying dozen v0.1.0 (foo/dozen)
   Compiling dozen v0.1.0 (foo/dozen/target/package/dozen-0.1.0)
    Finished dev [unoptimized + debuginfo] target(s) in 0.53s
memory allocation of 1099511627782 bytes failed
zsh: abort (core dumped)  cargo +stable publish --dry-run --allow-dirty

Possible Solution(s)

I believe this was introduced by 78a60bc, which intentionally hashes all of build.rs's output to detect changes it has made. That change may well be correct, even in this case: if build.rs changed the test data, then the tests might perform differently, so it's effectively a different test?

The allocation is failing here, which you can see with gdb, but not with logging or backtraces:

let contents = fs::read(entry.path())?;
let hash = util::hex::hash_u64(&contents);

i.e. it reads the entire file into memory, then hashes the contents. Chunked hashing should not require unbounded memory?

Maybe:

diff --git a/src/cargo/ops/cargo_package.rs b/src/cargo/ops/cargo_package.rs
index a1b9a5f6..804ddb26 100644
--- a/src/cargo/ops/cargo_package.rs
+++ b/src/cargo/ops/cargo_package.rs
@@ -690,6 +690,7 @@ fn hash_all(path: &Path) -> CargoResult<HashMap<PathBuf, u64>> {
             let entry = entry?;
             let file_type = entry.file_type();
             if file_type.is_file() {
+                debug!("hashing {:?}", entry.path());
                 let contents = fs::read(entry.path())?;
                 let hash = util::hex::hash_u64(&contents);
                 result.insert(entry.path().to_path_buf(), hash);
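
And, for illustration, a minimal sketch of what chunked hashing could look like. DefaultHasher below is only a stand-in (it is not the hasher util::hex::hash_u64 actually uses, and the value it produces would differ); the point is just that peak memory stays at the buffer size rather than the file's apparent size:

use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::Path;

// Hash a file 64 KiB at a time instead of fs::read()-ing it whole.
fn hash_file_chunked(path: &Path) -> io::Result<u64> {
    let mut hasher = DefaultHasher::new();
    let mut file = File::open(path)?;
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}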

I saw this in a real project, https://github.com/FauxFaux/ext4-rs, which extracts some sparse (i.e. tiny but apparently large) disc images from a tiny tar file during its tests. It publishes fine on e.g. 1.34.2, but not on stable.

Notes

Output of cargo version:

% cargo +stable --version                      
cargo 1.38.0 (23ef9a4ef 2019-08-20)

I'm on amd64 Ubuntu with ~30 GB of memory available. If you have over 1 TB of virtual memory available, then the above test case might pass (lucky you).
