
copy_file_range: use FICLONERANGE when possible #12489


Closed (wanted to merge 1 commit)
lib/std/fs.zig (2 changes: 1 addition, 1 deletion)
@@ -2805,7 +2805,7 @@ fn copy_file(fd_in: os.fd_t, fd_out: os.fd_t) CopyFileRawError!void {
 // The kernel checks the u64 value `offset+count` for overflow, use
 // a 32 bit value so that the syscall won't return EINVAL except for
 // impossibly large files (> 2^64-1 - 2^32-1).
-const amt = try os.copy_file_range(fd_in, offset, fd_out, offset, math.maxInt(u32), 0);
+const amt = try os.copy_file_range(fd_in, offset, fd_out, offset, math.maxInt(u32));
 // Terminate when no data was copied
 if (amt == 0) break :cfr_loop;
 offset += amt;
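The bound in the comment above follows from plain u64 arithmetic: `offset + count` overflows only when `offset` exceeds `2^64-1 - count`. A minimal Zig sketch that checks the bound for a `math.maxInt(u32)` chunk; `rangeFits` is a hypothetical helper that models only that addition, not the kernel's full argument validation:

```zig
const std = @import("std");
const math = std.math;

/// Hypothetical helper: true when `offset + count` fits in a u64, i.e. when
/// the overflow check described in the comment would not reject the range.
fn rangeFits(offset: u64, count: u64) bool {
    _ = math.add(u64, offset, count) catch return false;
    return true;
}

test "32-bit chunks overflow only past 2^64-1 - (2^32-1)" {
    const chunk: u64 = math.maxInt(u32);
    const limit: u64 = math.maxInt(u64) - chunk;

    // The largest offset whose chunk still ends exactly at maxInt(u64).
    try std.testing.expect(rangeFits(limit, chunk));
    // One byte further and the sum no longer fits.
    try std.testing.expect(!rangeFits(limit + 1, chunk));

    // Offsets a Linux file can actually reach stay far below the limit.
    try std.testing.expect(rangeFits(math.maxInt(i64), chunk));
}
```

Because Linux caps file offsets at `math.maxInt(i64)`, a 32-bit chunk cannot reach the overflow in practice, so the loop can keep advancing `offset` by `amt` until a call copies 0 bytes.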
lib/std/fs/file.zig (2 changes: 1 addition, 1 deletion)
@@ -1219,7 +1219,7 @@ pub const File = struct {

 pub fn copyRange(in: File, in_offset: u64, out: File, out_offset: u64, len: u64) CopyRangeError!u64 {
 const adjusted_len = math.cast(usize, len) orelse math.maxInt(usize);
-const result = try os.copy_file_range(in.handle, in_offset, out.handle, out_offset, adjusted_len, 0);
+const result = try os.copy_file_range(in.handle, in_offset, out.handle, out_offset, adjusted_len);
 return result;
 }

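`copyRange` narrows the caller's `u64` length with `math.cast(usize, len) orelse math.maxInt(usize)`, which saturates rather than failing. This only matters where `usize` is narrower than 64 bits. The sketch below reproduces the expression for an arbitrary unsigned type so the 32-bit case can be exercised on any host; `clampLen` is a hypothetical name, not a standard library function.

```zig
const std = @import("std");
const math = std.math;

/// Hypothetical helper: narrow a 64-bit length to `T`, saturating at
/// `maxInt(T)` like the `adjusted_len` expression in `copyRange`.
fn clampLen(comptime T: type, len: u64) T {
    return math.cast(T, len) orelse math.maxInt(T);
}

test "lengths that do not fit are clamped, not rejected" {
    // Simulates a 32-bit target, where usize is u32.
    try std.testing.expectEqual(@as(u32, 4096), clampLen(u32, 4096));
    try std.testing.expectEqual(@as(u32, math.maxInt(u32)), clampLen(u32, 1 << 40));
    // On a 64-bit target every u64 length fits unchanged.
    try std.testing.expectEqual(@as(u64, math.maxInt(u64)), clampLen(u64, math.maxInt(u64)));
}
```

Since the length can be clamped and the underlying call may also copy less than asked for, a caller of `copyRange` should expect a short count and loop, as `copy_file` does with its 32-bit chunks.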
lib/std/os.zig (74 changes: 53 additions, 21 deletions)
@@ -6260,32 +6260,64 @@ pub const CopyFileRangeError = error{

 var has_copy_file_range_syscall = std.atomic.Atomic(bool).init(true);

-/// Transfer data between file descriptors at specified offsets.
-/// Returns the number of bytes written, which can less than requested.
+/// Transfer data between file descriptors at specified offsets. Returns the
+/// number of bytes written, which can be less than requested.
+/// The `copy_file_range` call copies `len` bytes from one file descriptor to
+/// another. When possible, this is done within the operating system kernel,
+/// which can provide better performance characteristics than transferring data
+/// from kernel to user space and back.
 ///
-/// The `copy_file_range` call copies `len` bytes from one file descriptor to another. When possible,
-/// this is done within the operating system kernel, which can provide better performance
-/// characteristics than transferring data from kernel to user space and back, such as with
-/// `pread` and `pwrite` calls.
-///
-/// `fd_in` must be a file descriptor opened for reading, and `fd_out` must be a file descriptor
-/// opened for writing. They may be any kind of file descriptor; however, if `fd_in` is not a regular
-/// file system file, it may cause this function to fall back to calling `pread` and `pwrite`, in which case
-/// atomicity guarantees no longer apply.
+/// `fd_in` must be a file descriptor opened for reading, and `fd_out` must be
+/// a file descriptor opened for writing. They may be any kind of file
+/// descriptor; however, if `fd_in` is not a regular file system file, it may
+/// cause this function to fall back to calling `pread` and `pwrite`, in which
+/// case atomicity guarantees no longer apply.
 ///
-/// If `fd_in` and `fd_out` are the same, source and target ranges must not overlap.
-/// The file descriptor seek positions are ignored and not updated.
-/// When `off_in` is past the end of the input file, it successfully reads 0 bytes.
+/// If `fd_in` and `fd_out` are the same, source and target ranges must not
+/// overlap. The file descriptor seek positions are ignored and not updated.
+/// When `off_in` is past the end of the input file, it successfully reads 0
+/// bytes.
 ///
-/// `flags` has different meanings per operating system; refer to the respective man pages.
-///
-/// These systems support in-kernel data copying:
-/// * Linux 4.5 (cross-filesystem 5.3)
+/// Depending on the system, a few mechanisms are tried:
 ///
-/// Other systems fall back to calling `pread` / `pwrite`.
+/// * Linux 4.5+: ioctl.FICLONERANGE: atomic, O(1), the fastest method. Uses
+/// copy-on-write, therefore saves disk space and time. As of Linux 5.19,
+/// available on btrfs, cifs, nfs, ocfs2, overlayfs and xfs. The source and
+/// destination must be on the same file system.
+/// * Linux 4.5+: `copy_file_range(2)` via a libc wrapper (if libc is linked)
+/// or a syscall. This works at the block layer, so cross-filesystem
+/// in-kernel copying (Linux 5.3+) is possible.
+/// * Everything else: `pread`/`pwrite`.
 ///
 /// Maximum offsets on Linux are `math.maxInt(i64)`.
-pub fn copy_file_range(fd_in: fd_t, off_in: u64, fd_out: fd_t, off_out: u64, len: usize, flags: u32) CopyFileRangeError!usize {
+pub fn copy_file_range(fd_in: fd_t, off_in: u64, fd_out: fd_t, off_out: u64, len: usize) CopyFileRangeError!usize {
+const ficlone_range = comptime builtin.os.isAtLeast(.linux, .{ .major = 4, .minor = 5 }) orelse true;
+
+if (ficlone_range) {
+const arg = linux.FICLONERANGE_arg{
+.src_fd = fd_in,
+.src_offset = off_in,
+.src_length = len,
+.dest_offset = off_out,
+};
+while (true) {
+const rc = system.ioctl(fd_out, linux.T.FICLONERANGE, @ptrToInt(&arg));

Review comment (Contributor):
Did you benchmark the impact when used on filesystems which do not support CoW clones? In that case, this adds an extra system call for each run.

As of at least Ubuntu 20.04, the default filesystem is ext4, which does not support CoW clones.

Reply (Contributor Author):
I should have put this in the commit message: I am not overly concerned with the extra syscall, since coreutils already reflinks by default for `cp`, so I punted on the costs.

I see some other issues with the PR; marking as draft.

Reply (Contributor Author):
Thanks for pointing out the number of syscalls. We may get rid of one in #12491.

+switch (system.getErrno(rc)) {
+.SUCCESS => return @intCast(usize, rc),
+.INTR => continue,
+.BADF => return error.FilesOpenedWithWrongFlags,
+// may not be regular files, try fallback
+.INVAL => break,
+.ISDIR => return error.IsDir,
+// not regular files or FS does not support reflinking; fallback worthy
+.OPNOTSUPP => break,
+.PERM => return error.PermissionDenied,
+.TXTBSY => return error.FileBusy,
+else => |err| return unexpectedErrno(err),
+}
+}
+}
+
 const call_cfr = comptime if (builtin.os.tag == .wasi)
 // WASI-libc doesn't have copy_file_range.
 false
@@ -6298,7 +6330,7 @@ pub fn copy_file_range(fd_in: fd_t, off_in: u64, fd_out: fd_t, off_out: u64, len
 var off_in_copy = @bitCast(i64, off_in);
 var off_out_copy = @bitCast(i64, off_out);

-const rc = system.copy_file_range(fd_in, &off_in_copy, fd_out, &off_out_copy, len, flags);
+const rc = system.copy_file_range(fd_in, &off_in_copy, fd_out, &off_out_copy, len, 0);
 switch (system.getErrno(rc)) {
 .SUCCESS => return @intCast(usize, rc),
 .BADF => return error.FilesOpenedWithWrongFlags,
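The FICLONERANGE loop in `copy_file_range` boils down to classifying each errno as "retry", "fall back to the next mechanism", or "fail". A minimal sketch of that decision, using a hypothetical `CloneErrno` enum and `afterFicloneRange` helper instead of the standard library's errno type. Unlike the diff, it also treats `EXDEV` as fallback-worthy, because ioctl_ficlonerange(2) documents that error for source and destination on different filesystems, which is the case the doc comment says `copy_file_range(2)` can still handle in-kernel on Linux 5.3+.

```zig
const std = @import("std");

/// Hypothetical, simplified stand-in for the errno values that
/// ioctl(FICLONERANGE) can report; a real implementation would switch on
/// the standard library's errno enum instead.
const CloneErrno = enum { SUCCESS, INTR, BADF, INVAL, ISDIR, OPNOTSUPP, XDEV, PERM, TXTBSY };

/// What the caller should do after one FICLONERANGE attempt.
const Next = enum {
    /// The range was cloned. The ioctl returns 0 here rather than a byte
    /// count, so the caller has to report the requested length itself.
    done,
    /// Interrupted by a signal: issue the same ioctl again.
    retry,
    /// Reflinking is impossible for this pair: try copy_file_range(2),
    /// then pread/pwrite.
    fallback,
    /// A real error to surface to the caller.
    fail,
};

fn afterFicloneRange(errno: CloneErrno) Next {
    return switch (errno) {
        .SUCCESS => Next.done,
        .INTR => Next.retry,
        // Not regular files, a filesystem without reflink support, or
        // source and destination on different filesystems.
        .INVAL, .OPNOTSUPP, .XDEV => Next.fallback,
        .BADF, .ISDIR, .PERM, .TXTBSY => Next.fail,
    };
}

test "only reflink-specific errors fall through to the next mechanism" {
    try std.testing.expectEqual(Next.fallback, afterFicloneRange(.OPNOTSUPP));
    try std.testing.expectEqual(Next.fallback, afterFicloneRange(.XDEV));
    try std.testing.expectEqual(Next.retry, afterFicloneRange(.INTR));
    try std.testing.expectEqual(Next.fail, afterFicloneRange(.PERM));
}
```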
lib/std/os/linux.zig (8 changes: 8 additions, 0 deletions)
@@ -2760,6 +2760,13 @@ pub const DT = struct {
 pub const WHT = 14;
 };

+pub const FICLONERANGE_arg = extern struct {
+src_fd: i64,
+src_offset: u64,
+src_length: u64,
+dest_offset: u64,
+};
+
 pub const T = struct {
 pub const CGETS = if (is_mips) 0x540D else 0x5401;
 pub const CSETS = 0x5402;
@@ -2794,6 +2801,7 @@ pub const T = struct {
 pub const IOCGSERIAL = 0x541E;
 pub const IOCSSERIAL = 0x541F;
 pub const IOCPKT = 0x5420;
+pub const FICLONERANGE = IOCTL.IOW(0x94, 13, FICLONERANGE_arg);
 pub const FIONBIO = 0x5421;
 pub const IOCNOTTY = 0x5422;
 pub const IOCSETD = 0x5423;
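`IOCTL.IOW(0x94, 13, FICLONERANGE_arg)` packs the transfer direction, the argument size, the ioctl type and the command number into a single request code, so the struct has to be exactly 32 bytes for the kernel to recognize the request. A minimal sketch of that packing, assuming the generic `_IOC` layout used on x86, ARM and RISC-V (architectures such as MIPS and PowerPC encode the direction bits differently); `FileCloneRange` and `iow` are illustrative names, not the standard library's:

```zig
const std = @import("std");

/// Layout of the kernel's `struct file_clone_range` (linux/fs.h), mirroring
/// the diff's `FICLONERANGE_arg`.
const FileCloneRange = extern struct {
    src_fd: i64,
    src_offset: u64,
    src_length: u64,
    dest_offset: u64,
};

// Generic _IOC field layout: 8-bit number, 8-bit type, 14-bit size,
// 2-bit direction.
const ioc_nr_shift = 0;
const ioc_type_shift = 8;
const ioc_size_shift = 16;
const ioc_dir_shift = 30;
const ioc_write: u32 = 1;

/// Illustrative equivalent of the C `_IOW` macro.
fn iow(comptime io_type: u8, comptime nr: u8, comptime T: type) u32 {
    return (ioc_write << ioc_dir_shift) |
        (@as(u32, @sizeOf(T)) << ioc_size_shift) |
        (@as(u32, io_type) << ioc_type_shift) |
        (@as(u32, nr) << ioc_nr_shift);
}

test "FICLONERANGE request code on generic-_IOC architectures" {
    try std.testing.expectEqual(@as(usize, 32), @sizeOf(FileCloneRange));
    try std.testing.expectEqual(@as(u32, 0x4020940d), iow(0x94, 13, FileCloneRange));
}
```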