Skip to content

Race condition over build directories can cause the build queue reader to panic #984

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jyn514 opened this issue Aug 20, 2020 · 2 comments
Closed
Labels
A-admin Area: Administration of the production docs.rs server A-builds Area: Building the documentation for a crate C-bug Category: This is a bug P-high High priority

Comments

@jyn514
Copy link
Member

jyn514 commented Aug 20, 2020

This is the reason there are currently 86 crates in the queue.

2020/08/19 20:58:02 [INFO] rustwide::cmd: [stderr]     Updating crates.io index
[INFO] rustwide::cmd: [stderr]      Ignored package `git-credential-null v1.0.1` is already installed, use --force to override
thread 'build queue reader' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 39, kind: Other, message: "Directory not empty" }
   0: failure::backtrace::internal::InternalBacktrace::new
   1: failure::backtrace::Backtrace::new
   2: rustwide::workspace::Workspace::purge_all_build_dirs
   3: cratesfyi::docbuilder::rustwide_builder::RustwideBuilder::init
   4: cratesfyi::utils::queue_builder::queue_builder
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
   6: core::ops::function::FnOnce::call_once{{vtable.shim}}
   7: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/liballoc/boxed.rs:1008
      <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/liballoc/boxed.rs:1008
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/c7087fe00d2ba919df1d813c040a5d47e43b0fe7/src/libstd/sys/unix/thread.rs:87
   8: start_thread
   9: __clone
', src/utils/daemon.rs:73:13

The error went away after a re-deploy so it's an intermittent failure of some sort. I suspect that the new build queue reader is trying to start a build before the old deploy of the queue finishes its build, so it tries to purge the directory for an ongoing build and panics.

Do we really not have any synchronization for this?

@jyn514 jyn514 added C-bug Category: This is a bug P-high High priority A-builds Area: Building the documentation for a crate A-admin Area: Administration of the production docs.rs server labels Aug 20, 2020
@pietroalbini
Copy link
Member

The old deploy of the queue shouldn't be running after the new deploy is... well... deployed.

Caches purging is best-effort though, we could just let _ = the error.

@jyn514
Copy link
Member Author

jyn514 commented Aug 22, 2020

I'm a little confused what 'deploy' means in this context. The systemd script looks like

[Service]
User=cratesfyi
Group=cratesfyi
EnvironmentFile=/home/cratesfyi/.docs-rs-env
ExecStart=/home/ubuntu/docs.rs/target/release/cratesfyi daemon --foreground
WorkingDirectory=/home/ubuntu/docs.rs
LimitNOFILE=20000
Restart=on-failure

[Install]
WantedBy=multi-user.target

and the deploy script just does systemctl restart docs.rs. We don't have any signal handlers, so this immediately kills the old daemon, right? And the new daemon starts handling new requests? So then why is rustwide saying the directory isn't empty?

@syphar syphar closed this as not planned Won't fix, can't repro, duplicate, stale Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admin Area: Administration of the production docs.rs server A-builds Area: Building the documentation for a crate C-bug Category: This is a bug P-high High priority
Projects
None yet
Development

No branches or pull requests

3 participants