Make std::fs::initial_buffer_size a method, rename it and make it public #85084

Closed
@wooster0

Description

There are cases where using std::fs::read_to_string or std::fs::read isn't the best choice, or is really cumbersome, because your code is already shaped to create its own String and read a file into it. For instance, imagine you opened a file a while ago, did some processing on it such as reading the metadata, and now decide to read it all into a String. std::fs::read_to_string would be a very bad choice because it would open the file again. Instead it would be better to use std::io::Read::read_to_string, which lets me read the contents of an already open File into a String. But then I wouldn't have the benefit that std::fs::read_to_string provides: an optimally allocated String.
In that case I would really like to use the private std::fs::initial_buffer_size function, which std already uses internally in std::fs::read_to_string and std::fs::read, so that I can preallocate my very own String and be as efficient as std::fs::read_to_string would be.
The biggest reason that ultimately drove me to open this issue is that std::fs::read_to_string doesn't let me keep the file that it opened. Some people might even open the file twice to work around this, which is obviously very bad and leads to inefficiencies.

So, in summary, there are definitely cases where you want to do this:

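use std::io::Read;
use std::fs;

fn main() -> std::io::Result<()> {
    // Open the file once and keep the handle around for later use.
    let mut file = fs::File::open("hello")?;
    // The String starts empty, so it has to grow while the file is read.
    let mut string = String::new();
    file.read_to_string(&mut string)?;

    Ok(())
}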

The problem with this is that it's not efficient, because the String is not preallocated with the file's size. But as I mentioned, std already has a function that computes just that, the optimal size, and we can't use it because it's private.
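For illustration, here is a minimal sketch of the workaround available today: computing a capacity from the file's metadata by hand. The helper name `size_hint_for` and the function `read_open_file` are hypothetical, and the sketch simply assumes the file length reported by the metadata is a good capacity; the private std function may compute its value differently.

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical helper, not part of std's public API: use the length
// reported by the file's metadata as the capacity, falling back to 0
// if the metadata cannot be read.
fn size_hint_for(file: &File) -> usize {
    file.metadata().map(|m| m.len() as usize).unwrap_or(0)
}

// Read an already open file into a String that is preallocated up front.
fn read_open_file(file: &mut File) -> io::Result<String> {
    let mut string = String::with_capacity(size_hint_for(file));
    file.read_to_string(&mut string)?;
    Ok(string)
}

fn main() -> io::Result<()> {
    // Create a small file so the sketch can run on its own.
    let path = std::env::temp_dir().join("prealloc_sketch.txt");
    std::fs::write(&path, "hello world")?;

    // Open once, do some earlier processing, then read the same handle.
    let mut file = File::open(&path)?;
    let _metadata = file.metadata()?;
    let string = read_open_file(&mut file)?;
    println!("read {} bytes", string.len());

    drop(file);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

This is exactly the kind of copy-pasted helper that a public method would make unnecessary.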

All in all what I'm saying is that initial_buffer_size being private leads to inefficiency, inconvenience, restrictiveness and possible boilerplate code because you might just copy-paste initial_buffer_size into your code just to be able to preallocate efficiently. And with that, it can lead to error-proneness as well.

I propose to

  • Make initial_buffer_size a method on fs::File taking &self because currently it takes &fs::File as its argument and I think it's just nicer like that.
  • Rename it to possibly optimal_buffer_size because I feel like that's going to be a bit clearer. Or get_optimal_buffer_size or get_optimal_buffer_capacity or optimal_buffer_capacity? I'm not 100% sure on which would be the most correct or appropriate here so I appreciate suggestions.
    In those names buffer refers to whatever structure you want to optimally allocate, not just limited to String of course.
  • Make it public.
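As a rough sketch of how the proposed API might feel in use, the method can be approximated today with an extension trait. The trait name `OptimalBufferSize` is hypothetical, `optimal_buffer_size` is only one of the candidate names above, and the body assumes the size comes from the file's metadata; a real implementation in std could differ.

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical extension trait standing in for the proposed method;
// an inherent method on File could only be added inside std itself.
trait OptimalBufferSize {
    fn optimal_buffer_size(&self) -> usize;
}

impl OptimalBufferSize for File {
    // Assumption: the file length from the metadata is a good capacity.
    // Falls back to 0 if the metadata is unavailable.
    fn optimal_buffer_size(&self) -> usize {
        self.metadata().map(|m| m.len() as usize).unwrap_or(0)
    }
}

fn main() -> io::Result<()> {
    // Create a small file so the sketch can run on its own.
    let path = std::env::temp_dir().join("optimal_buffer_sketch.bin");
    std::fs::write(&path, [1u8, 2, 3, 4])?;

    let mut file = File::open(&path)?;
    // The buffer does not have to be a String; a Vec<u8> works the same way.
    let mut bytes = Vec::with_capacity(file.optimal_buffer_size());
    file.read_to_end(&mut bytes)?;
    println!("read {} bytes", bytes.len());

    drop(file);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Taking `&self` keeps the file handle usable afterwards, which is the point of the proposal.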

If this is accepted, I would be willing to work on it.

Labels: C-feature-request (Category: A feature request, i.e: not implemented / a PR.)
