Skip to content

ACP Implement fmt::Write wrapper for &mut [u8] (and maybe &mut str?) #278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jmillikin opened this issue Oct 13, 2023 · 10 comments
Closed
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@jmillikin
Copy link

jmillikin commented Oct 13, 2023

Proposal

Problem statement

The ability to place formatted text into a buffer seems like a useful primitive to have, so I propose adding types to core::fmt that implement Write on top of a user-provided &mut [u8]. These types would provide a safe way to access the portion of the buffer that has been written to.

Additional variants that seem useful to add at the same time:

  • A wrapper of &mut str that uses str::as_bytes_mut(), with careful attention toward handling the EOF case. This would let the user avoid UTF8 revalidation.
  • Writers to &mut [MaybeUnint<u8>] would involve a bit more setup boilerplate but enable bypassing the cost of buffer initialization.

Motivating examples or use cases

I frequently find myself writing a BufWriter { &mut [u8] } wrapper struct in no_std code when I want to do the Rust equivalent of snprintf(), for example when constructing Linux mount() data options in rust-fuse: https://github.com/jmillikin/rust-fuse/blob/47ca3bc5cda263543cf6a71155115a85dd1d094e/fuse/os/linux.rs#L539-L560.

Solution sketch

Something roughly like this -- I'm not attached to the names or the exact API, and the real implementation would leverage unsafe-unchecked methods to avoid bounds checks.

pub struct BufWriter<'a> {
	buf: &'a mut [u8],
	count: usize,
}

impl<'a> BufWriter<'a> {
	pub fn new(buf: &'a mut [u8]) -> BufWriter<'a> {
		BufWriter { buf, count: 0 }
	}

	pub fn bytes(&self) -> &[u8] {
		&self.buf[..self.count]
	}

	pub fn into_bytes(self) -> &'a [u8] {
		&self.buf[..self.count]
	}

	// maybe? I find this method useful when glomming together chunks of &CStr, but
	// I'd be fine leaving it out if libs-team dislikes it.
	//
	// pub fn write_bytes(&mut self, b: &[u8]) -> core::fmt::Result { ... }
}

impl core::fmt::Write for BufWriter<'_> {
	fn write_str(&mut self, s: &str) -> core::fmt::Result {
		let b = s.as_bytes();
		let avail = &mut self.buf[self.count..];
		if b.len() > avail.len() {
			return Err(core::fmt::Error);
		}
		avail[..b.len()].copy_from_slice(b);
		self.count += b.len();
		Ok(())
	}
}

Alternatives

  1. Add a function like write_to(buf: &mut [u8], w: &impl Write) -> Result<usize, fmt::Error> to core::fmt.

    • This API feels unergonomic to me, because the user must keep track of write offsets themselves.
  2. Add a type modeled on std::io::Cursor, which implements additional methods for reading and seeking.

    • Unclear where this new type would go -- core::io::Cursor isn't possible because that would cause name conflicts with std::io::Cursor, and core::fmt would be a weird place to put a type that does more than formatting.
    • More logic to implement than writer-only wrappers, but maybe code could be relocated from std::io::Cursor ?

Links and related work

This pattern is a popular StackOverflow question:

There's another open proposal for fmt::Write that seems related:

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@jmillikin jmillikin added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Oct 13, 2023
@m-ou-se
Copy link
Member

m-ou-se commented Oct 17, 2023

Does arrayvec::ArrayString work for your use cases?

I'd be interested in getting something like ArrayVec and ArrayString into core.

@jmillikin
Copy link
Author

I think ArrayString might be solving a different problem, although I've not used it before so it could be that I've missed something in the docs.

First, an ArrayString seems to use a constant-sized underlying buffer (const CAP: usize). I don't see a way to construct one from a &mut [u8], so there's no way to (for example) format a value into a fixed-size slice of a larger buffer. The user would have to use an intermediate ArrayString, format into it, then do a slice-to-slice copy into the target buffer.

Second, this feature would be closer to ArrayVec<u8>, since otherwise there's no way to mix UTF8 and non-UTF8 data in the buffer. For example when working with older C APIs, sometimes the text encoding is raw byte values (ISO-8859-1) or a legacy encoding (e.g. Shift_JIS) so the flexibility of &mut [u8] is helpful.

@jmillikin
Copy link
Author

I put together a draft implementation: rust-lang/rust#117931

It's not large (~120 lines including docs), and enables formatting data into a &mut [u8] or &mut [MaybeUninit<u8>]. It works well as a replacement for format!() in no-alloc code.

@ChayimFriedman2
Copy link

Can we re-use core::io::BorrowedBuf and core::io::BorrowedCursor for this?

@jmillikin
Copy link
Author

Can we re-use core::io::BorrowedBuf and core::io::BorrowedCursor for this?

Implementing fmt::Write for those types also seems like a reasonable idea, but they're not a substitute for this ACP because their API doesn't provide the same affordances -- in particular there's no way to obtain the written portion of their buffer via deconstruction.

You could use a BorrowedBuf in the implementation of fmt::WriteCursor, but the implementation is already so trivial that it doesn't seem worth the effort.

@ChayimFriedman2
Copy link

in particular there's no way to obtain the written portion of their buffer via deconstruction

What do you mean? We can get the written portion via BorrowedBuf::filled(). The only problem is that we cannot return a &str using from_utf8_unchecked(), since the user might write invalid UTF-8 bytes manually.

@jmillikin
Copy link
Author

in particular there's no way to obtain the written portion of their buffer via deconstruction

What do you mean? We can get the written portion via BorrowedBuf::filled(). The only problem is that we cannot return a &str using from_utf8_unchecked(), since the user might write invalid UTF-8 bytes manually.

That's not deconstruction, it's a reborrow.

@ChayimFriedman2
Copy link

We can add a method for deconstruction if it's a common need (if not, it can be emulated by taking the len() of the BorrowedBuf, dropping it then slicing the original data according to it, but it will require unsafe code if the data is MaybeUninit).

@jmillikin
Copy link
Author

jmillikin commented Nov 30, 2023

We can add a method for deconstruction if it's a common need (if not, it can be emulated by taking the len() of the BorrowedBuf, dropping it then slicing the original data according to it, but it will require unsafe code if the data is MaybeUninit).

I think the ACP covers what I'm about to say, but just to clarify: I know a BorrowedBuf (or any &mut [u8]) can be used to store the result of write!() by writing a dozen lines of glue code. The goal of this ACP is to have that glue code be part of the standard library, not ad-hoc reimplemented in each crate that needs to do alloc-free formatting.

Consider C, which has the sprintf() family of functions. It is very easy to use sprintf() to format into a buffer:

#include <assert.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

void fmt_u16(char *buf, size_t buf_len, uint16_t v) {
    assert(buf_len >= 6); 
    sprintf(buf, "%hu", v); 
}

int main() {
    char buf[6];
    fmt_u16(buf, sizeof buf, 12345);
    printf("formatted: %s\n", buf);
    return 0;
}

The equivalent in Rust would be written like this in current stable (lifetimes explicit for clarity):

use core::fmt;
use core::mem::MaybeUninit;

fn fmt_u16<'a>(buf: &'a mut [MaybeUninit<u8>], v: u16) -> &'a str {
    struct W<'a> { buf: &'a mut [MaybeUninit<u8>], idx: usize }
    impl fmt::Write for W<'_> {
        fn write_str(&mut self, s: &str) -> fmt::Result {
            self.buf[self.idx..self.idx+s.len()].copy_from_slice(unsafe {
                core::mem::transmute(s.as_bytes())
            });
            self.idx += s.len();
            Ok(())
        }
    }
    let mut w = W { buf, idx: 0 };
    let _ = fmt::Write::write_fmt(&mut w, format_args!("{v}"));
    unsafe {
        let utf8: &[u8] = core::mem::transmute(&w.buf[..w.idx]);
        core::str::from_utf8_unchecked(utf8)
    }
}

fn main() {
    let mut buf = [MaybeUninit::uninit(); 6];
    println!("formatted: {:?}", fmt_u16(&mut buf, 12345));
}

And it would be really nice if all that extra stuff wasn't needed -- if non-allocating format look like this:

fn fmt_u16<'a>(buf: &'a mut [MaybeUninit<u8>], v: u16) -> &'a str {
    let mut cursor = fmt::WriteCursor::new_uninit(buf);
    write!(cursor, "{v}");
    cursor.into_str()
}

fn main() {
    let mut buf = [MaybeUninit::uninit(); 6];
    println!("formatted: {:?}", fmt_u16(&mut buf, 12345));
}

@SUPERCILEX
Copy link

I just put out a crate that tries to solve the composition problem in a generalized fashion: https://github.com/SUPERCILEX/io-adapters

Example usage that (I believe) solves this proposal's use case:

//! ```cargo
//! [dependencies]
//! io-adapters = "0.1"
//! ```
#![feature(core_io_borrowed_buf)] // Or just use a normal array if you want to keep things simple

use core::fmt;
use std::{io::BorrowedBuf, mem::MaybeUninit, str::from_utf8};

use io_adapters::WriteExtension;

fn main() {
    let mut buf = [MaybeUninit::uninit(); 128];
    let mut buf = BorrowedBuf::from(buf.as_mut_slice());

    foo(buf.unfilled().write_adapter());

    println!("{:?}", from_utf8(buf.filled()));
}

fn foo(mut output: impl fmt::Write) {
    writeln!(output, "Hello, World!").unwrap();
}

@jmillikin jmillikin closed this as not planned Won't fix, can't repro, duplicate, stale Jun 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

4 participants