Skip to content

Add lock option to File.OpenFlags and File.CreateFlags #4711

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Apr 10, 2020

Conversation

leroycep
Copy link
Contributor

@leroycep leroycep commented Mar 11, 2020

This pull request makes it possible to gain exclusive access to a file by passing in a lock option when opening a file.

For example, this program:

const std = @import("std");

pub fn main() !void {
    const cwd = std.fs.cwd();
    const childpid = try std.os.fork();
    const pid = std.os.linux.getpid();

    std.debug.warn("[{}] Open file\n", .{pid});
    const first_open = try cwd.createFile("test.txt", .{ .lock = true });

    std.debug.warn("[{}] Get lock\n", .{pid});
    std.time.sleep(10 * std.time.ns_per_s);

    std.debug.warn("[{}] Release lock\n", .{pid});
    first_open.close();

    if (childpid != 0) {
        var status: u32 = 0;
        _ = std.os.linux.waitpid(childpid, &status, 0);
    }
}

Produces this output:

$ ./zig build-exe test_deadlock.zig; and ./test_deadlock
[31286] Open file
[31287] Open file
[31286] Get lock
[31286] Release lock
[31287] Get lock
[31287] Release lock

Todo

  • Implement blocking version
    • On Linux/POSIX systems
    • On Windows
  • [ ] Integrate with async IO
  • Write unit tests
  • [ ] Make every file open on Linux check for locks?

Some notes

  • On windows, the blocking API will catch SHARING_VIOLATION errors and try to open the file again after a small delay
  • Locks are acquired differently on Windows and Linux:
    • Linux calls fcntl after the file has been opened
    • NtCreateFile takes a list of what other file handles are allowed to do, like a lock with an inverted API
  • The behavior of locking is slightly different between windows and linux
    • On Windows
      • Each file handle acquires a lock
      • Locks are enforced by Windows
      • Opening a file twice within a program can cause it to deadlock
    • On Linux
      • Locks are advisory only; meaning that any program that doesn't check for a lock will ignore it
        • Mandatory locking methods exist, but these methods have issues
      • Locks are tracked in the kernel using a tuple of pid and inode
        • This means that a process opening a file twice will not block, but may cause data races

@leroycep
Copy link
Contributor Author

As an example of the differences between windows and linux:

const std = @import("std");

pub fn main() !void {
    const filename = "text.txt";
    const threads = [_]*std.Thread{
        try std.Thread.spawn(Context{ .thread_name = "one", .filename = filename }, lock_file),
        try std.Thread.spawn(Context{ .thread_name = "two", .filename = filename }, lock_file),
    };
    for (threads[0..]) |thread| {
        thread.wait();
    }
}

const Context = struct {
    thread_name: []const u8,
    filename: []const u8,
};

pub fn lock_file(ctx: Context) void {
    std.debug.warn("[{}] Open file\n", .{ctx.thread_name});
    const file = std.fs.cwd().createFile(ctx.filename, .{ .lock = true }) catch unreachable;

    std.debug.warn("[{}] Get lock\n", .{ctx.thread_name});
    std.time.sleep(1 * std.time.ns_per_s);

    std.debug.warn("[{}] Release lock\n", .{ctx.thread_name});
    file.close();
}

With WINE it produces the following:

$ ./zig build-exe test_deadlock.zig -target x86_64-windows-gnu; and ./test_deadlock.exe
[one] Open file
[one] Get lock
[two] Open file
[one] Release lock
[two] Get lock
[two] Release lock

On Linux it produces:

$ ./zig build-exe test_deadlock.zig; and ./test_deadlock
[one] Open file
[two] Open file
[one] Get lock
[two] Get lock
[two] Release lock
[one] Release lock

@LemonBoy
Copy link
Contributor

File locks on Linux may be mandatory as well [1] if the file system where the file resides is mounted with the mand option.

[1] https://www.kernel.org/doc/Documentation/filesystems/mandatory-locking.txt

@leroycep
Copy link
Contributor Author

@LemonBoy Yes? I'm not sure what you're trying to say. That's not something that can be controlled by the std library. The user has to make sure their filesystem is mounted with the correct flag, and that any files that need mandatory locking have the correct bits set. And then we don't have to change anything about the locking API; advisory locks are simply upgraded to mandatory locks.

Are you saying that I should add that to the documentation?

@LemonBoy
Copy link
Contributor

Are you saying that I should add that to the documentation?

Yeah, just a small note to say that your locks may become exclusive if certain conditions are met.

@andrewrk
Copy link
Member

Feel free to omit the async I/O integration in this PR. That's an important piece to this but it can be done as a separate change, and there may be some other prerequisite tasks

@andrewrk
Copy link
Member

Unit tests would be glorious

@leroycep
Copy link
Contributor Author

Okay, I added a unit test that checks if the two file openings overlapped; if they did, that means that the lock didn't block the other processes. There are a few more things that could be done:

  • Add unit test that checks if read locks allow other processes to acquire a read lock as well
  • Make every file check for locks (on Linux) before opening it
  • Return an error instead of blocking if the user does not intend to lock the file, but the file is locked by some other process?

On the topic of blocking vs returning an error: Windows only returns an error, so to make it blocking I simply check the file in a loop with a delay. Linux supports a blocking API natively. Should we make both nonblocking, or just leave as it is right now? With async we'll obviously want to use the nonblocking API, but I'm not sure in the blocking case.

@leroycep leroycep force-pushed the feature-file-locks branch from 569310c to 43ccc2d Compare March 14, 2020 16:14
@leroycep
Copy link
Contributor Author

I added a test that checks that two read locks are able to run at the same time.

I've also decided against checking for locks on before opening a file on Linux, because as I was implementing it, I got an error from openSelfExe. I suspect that the system locks the file while it's running, but I don't really know. Perhaps in some other pull request.

@leroycep leroycep marked this pull request as ready for review March 15, 2020 06:05
@LemonBoy
Copy link
Contributor

LemonBoy commented Apr 3, 2020

Okay, I'll try reimplmenting using flock.

Awesome, now the behavior is consistent across all platforms!

The tests I wrote don't care about order, they care about the locks not overlapping.

The access is serialized, that's what I meant. The PR is good as-is, we can always refine the tests at a later time if they become flaky.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I have some requested adjustments.

I'll try to pay close attention to this if you push any more commits, so we can try to get this merged in before the release on Monday (if you want to, of course. No pressure).

leroycep added 7 commits April 6, 2020 21:51
For some reason, this breaks file locking on windows. Not sure if this is
a problem with wine.
The share_access bitfield was being ORed with what was supposed to be
parts of the default value, meaning that the share_access would be
more permissive than expected.
Also, make windows share delete access. Rationale: this is how it works
on Unix systems, mostly because locks are (usually) advisory on Unix.
@leroycep
Copy link
Contributor Author

leroycep commented Apr 8, 2020

@andrewrk The requested changes have been implemented, and there is one more test for nonblocking locks.

Users can now choose whether the lock will be exclusive or shared by passing .lock = .Exclusive or .lock = .Shared to dir.openFile() or dir.createFile(). There is also a new flag named lock_nonblocking that will make the function return with error.WouldBlock when it is set true.

Note that BSD systems that are using the O_SHLOCK/O_EXLOCK flags will pass O_NONBLOCK | O_SYNC. O_NONBLOCK will make all I/O on a file nonblocking, and O_SYNC will make it so only opening the file is nonblocking, and not all the I/O operations.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the requested changes! I think this is ready to merge (pending CI tests passing).

leroycep added 3 commits April 7, 2020 21:26
That removes the other switch cases (`error.WouldBlock` here) from the
error set, I think.
@andrewrk
Copy link
Member

andrewrk commented Apr 8, 2020

It hung on:

24/1357 fs.test "std-native-Debug--multi open file with exclusive nonblocking lock twice"...

Might need to examine that test case for race conditions.

@leroycep
Copy link
Contributor Author

leroycep commented Apr 8, 2020

That test is supposed to be testing the lock_nonblocking flag, so it tries to open the same file twice. Apparently on MacOS the 'nonblocking' part isn't working.

I'm trying to find a way to develop/test for MacOS . Do you know of any good services that provide a MacOS environment? The best thing I've found so far is Macincloud.

Looking at the manual page for open on MacOS, it appears that, while it supports the O_SHLOCK and O_EXLOCK flags, it does not support the O_SYNC flag. I'll try forcing MacOS to use flock instead, and see if that stops the deadlock from occurring.

@andrewrk
Copy link
Member

andrewrk commented Apr 8, 2020

I'm sorry I don't know of any good options. Personally, I have a macos laptop for testing. If you need me to run any specific tests I can do that for you and report back.

The tests were put into a deadlock, and it seems that darwin doesn't
support `O_SYNC`, though it supports `O_NONBLOCK`. It shouldn't block
even with that, but I'm not sure why else it would fail.
@leroycep leroycep force-pushed the feature-file-locks branch from 6341a3c to 772bb1a Compare April 8, 2020 22:39
@leroycep
Copy link
Contributor Author

leroycep commented Apr 8, 2020

Could you try running the following test? https://github.com/ziglang/zig/pull/4711/files#diff-76cb704531736c8a6ea118029a935865R1689-R1703

I've forced the darwin systems to use the flock API instead of the O_SHLOCK/O_EXLOCK. Hopefully, that fixes the nonblocking flag on darwin.

@andrewrk
Copy link
Member

andrewrk commented Apr 8, 2020

The tests all pass on my macOS laptop, with this PR merged. However, I noticed that some of the fs lock tests take a few seconds to complete. Do you think we could reduce the sleep() times to just be a few milliseconds instead of seconds?

@andrewrk
Copy link
Member

andrewrk commented Apr 8, 2020

Even better would be: no sleeping at all. Instead of sleeping, a thread would wait on a std.ResetEvent and get awoken at the right time by the other thread.

@leroycep
Copy link
Contributor Author

leroycep commented Apr 9, 2020

Reducing the time is reasonable, so I've gone ahead and done that.

I don't think std.ResetEvent will work for this test though. The sleeping isn't to synchronize the threads, it's to make sure that the threads are trying to to access the file at the same time. Using std.ResetEvent would replace the functionality we are trying to test; that two process will not get a lock on the file if some other process (or thread) has a lock on it.

With a different type of test, one that doesn't try to make the locks overlap, I could get rid of the call to sleep. But I'm sure if it's worth the effort.

@leroycep
Copy link
Contributor Author

leroycep commented Apr 9, 2020

Actually, another option is to remove threads entirely, and only run tests with lock_nonblocking set.

@andrewrk
Copy link
Member

andrewrk commented Apr 9, 2020

I think that is reasonable test coverage. It would be nice to have a test that tests the blocking behavior, with no race conditions possible and no unconditional sleep(), but I don't consider that to be a merge blocker. I'll probably end up doing that as part of integration with async I/O.

Also I think the tests that use sleep() (with a low number) are fine for now. If the test ends up being flaky, the path forward will be to disable it and file an issue to track it.

@andrewrk andrewrk merged commit a6e288d into ziglang:master Apr 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants