String Guide suggestions #15994

Closed

@steveklabnik
Member

From reddit: http://www.reddit.com/r/rust/comments/2bpenl/confused_by_the_purpose_of_str_and_string/cj7rt1u

  • add a section on indexing, e.g. how do I compare just the first 3 characters of two strings? Or fetch the character at position 3 in a string? Or iterate through the characters in a string?
  • comparing, e.g. how do I know whether one string is greater than another, and on what basis is the ordering done (the binary value of the strings, or the character set)?
  • applying regular expressions to strings: is &str preferred over String for this?

I like all of these, and they should be in the guide.
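As a rough sketch of the three indexing questions above, written against current stable Rust rather than the 2014 API: the helper names `same_prefix` and `char_at` are made up for illustration, and "character" here means a Unicode scalar value (`char`), not a user-perceived grapheme.

```rust
/// Compare the first `n` chars (Unicode scalar values) of two strings.
/// Hypothetical helper, not part of the standard library.
fn same_prefix(a: &str, b: &str, n: usize) -> bool {
    a.chars().take(n).eq(b.chars().take(n))
}

/// Fetch the char at zero-based position `i`, if the string is long enough.
/// This walks the string from the start, so it costs O(i), not O(1).
fn char_at(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

fn main() {
    println!("{}", same_prefix("string", "strong", 3));
    println!("{:?}", char_at("héllo", 3));

    // Iterating over the chars of a string.
    for (i, c) in "héllo".chars().enumerate() {
        println!("{}: {}", i, c);
    }
}
```

Both helpers iterate instead of indexing by byte offset, because a byte offset can land in the middle of a multi-byte UTF-8 sequence.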

Activity

changed the title [-]String Guide suggetsions[/-] [+]String Guide suggestions[/+] on Jul 26, 2014

chris-morgan commented on Jul 26, 2014

@chris-morgan
Member

Anything about indexing should be minimal and should be very strongly suggesting that you shouldn’t be doing this in the first place. Almost always, a string should be opaque data. Iteration is just about the only way you should ever do such things, and even iteration should very seldom be done.

Alternatives that may serve certain purposes are starts_with and ends_with, and graphemes() will need to be mentioned.
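A small sketch of the prefix and suffix checks mentioned above, which are spelled `starts_with` and `ends_with` in current Rust. The function name and the file-name convention are invented for the example. `graphemes()` is left out because, as far as I know, it is no longer part of the standard library and now lives in an external crate.

```rust
/// Hypothetical check: does this file name look like "report-*.txt"?
fn is_text_report(name: &str) -> bool {
    name.starts_with("report-") && name.ends_with(".txt")
}

fn main() {
    for name in ["report-2014.txt", "report-2014.pdf", "notes.txt"] {
        println!("{} -> {}", name, is_text_report(name));
    }
}
```

Neither check requires knowing where character boundaries fall, which is why they are safer than slicing by position.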


As for comparison, UTF-8 has the convenient property that bytewise comparison yields the same answers as code point comparison. Of course there is still the question of composed versus decomposed characters and so on and so forth… the simple summary is that you really shouldn’t be doing comparisons, either.
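To make both halves of that point concrete, here is a minimal sketch in current Rust. The helper names are illustrative. It compares strings once by raw bytes and once by `char`, then shows that a precomposed "é" and its decomposed form are different strings even though they render the same.

```rust
use std::cmp::Ordering;

/// Order two strings by their raw UTF-8 bytes.
fn byte_order(a: &str, b: &str) -> Ordering {
    a.as_bytes().cmp(b.as_bytes())
}

/// Order two strings by their sequence of chars (Unicode scalar values).
fn code_point_order(a: &str, b: &str) -> Ordering {
    a.chars().cmp(b.chars())
}

fn main() {
    // The two orderings agree, including for non-ASCII text.
    for (a, b) in [("Z", "a"), ("z", "é"), ("abc", "abd")] {
        assert_eq!(byte_order(a, b), code_point_order(a, b));
        assert_eq!(a.cmp(b), byte_order(a, b));
    }

    // Composed vs decomposed: same appearance, different strings.
    let composed = "\u{e9}"; // é as one code point
    let decomposed = "e\u{301}"; // e followed by a combining acute accent
    println!("equal? {}", composed == decomposed);
}
```

Note that this ordering says nothing about how a human would sort the text in a given language; that needs a collation library.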

People want to do all these operations on strings, but it seems to me that the more experienced you get, the more you realise that these sorts of operations are all unsound and should never really be done.

steveklabnik commented on Jul 26, 2014

@steveklabnik
MemberAuthor

Agreed. This is a good place to explain that.

nielsle commented on Jul 26, 2014

@nielsle
Contributor

In #15997 I tried to rearrange the sections to introduce String before &str. That makes it easier to introduce &str as a view into String. (I agree that indexing should be discouraged, but indexing makes it easy to explain how &str is different from String.)

This PR is mostly meant as an experiment. Feel free to close it if you are already editing the chapter or if you are heading in a different direction.

pcn commented on Jul 27, 2014

@pcn
Contributor

The string guide should have some common use cases and the rust-ish (rusty? oxidi-shous?) way described. My case is this:

I want to take a string (e.g. a URL) and use it to do something not too complex (e.g. authenticate to the AWS S3 API). This involves taking the URL, deciding from it which of the two available formats will be used, and returning the string that will be used to determine the signature of the request.

This means some slicing and dicing. Coming from python/ruby/go/clojure (even C), the easiest answer is to split the string and compare to known values (e.g. does the hostname bit of the URL start with "s3.amazonaws.com"), which lends itself naturally to a match. The odd part is that I pass in one kind of string (an &str) and get another kind out (a String), where I need to be familiar with a whole different set of traits vs. &str. My understanding is that I should prefer String types, and I can see this being a common idiom, so much so that there should probably be some agreement on how something like this could be made more obvious:

fn bucket_name_from_path <'a>(path: &'a str) -> String {
    let parts: Vec<&str> = path.split_str("/").collect();
    return match parts.get(0).slice_from(0) {
        "s3.amazonaws.com" =>  parts.get(1).to_string(),
        _ => name_from_vhost_style(*parts.get(0))
    }
}
fn name_from_vhost_style <'a>(vhostname: &'a str) -> String {
    let hostname_parts: Vec<&str> = vhostname.split_str(".").collect::<Vec<&str>>();
    let bucket_parts = hostname_parts.slice(0, hostname_parts.len() - 2);
    return bucket_parts.connect("");
}

I would like the guide to document where, by convention, type conversions via to_string(), collect() and so on should go. It would be nice to be able to say, for example, that I should just convert &str values to String, and to have documented which String operations correspond to common string operations in other languages (comparison, splitting, joining, tokenizing, etc.). The guide should also explain the slice types, how to work with them, and why they exist, so that there is a clear path to doing common things the easy way.
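For readers on current Rust, the following is one possible sketch of the same idea. It is not a translation of the code above: `split_str`, `slice_from` and `connect` come from the pre-1.0 API, and this version strips an assumed `.s3.amazonaws.com` suffix instead of reproducing the original label-dropping logic. The sample URLs are invented.

```rust
/// Return the bucket name for a path such as "s3.amazonaws.com/bucket/key"
/// (path style) or "bucket.s3.amazonaws.com/key" (virtual-host style).
fn bucket_name_from_path(path: &str) -> String {
    let mut parts = path.split('/');
    // `split` always yields at least one item, so the fallback is defensive.
    let host = parts.next().unwrap_or("");
    match host {
        // Path style: the bucket is the first path segment.
        "s3.amazonaws.com" => parts.next().unwrap_or("").to_string(),
        // Otherwise assume virtual-host style.
        _ => name_from_vhost_style(host),
    }
}

/// Assumption: virtual-host names end in ".s3.amazonaws.com".
/// If the suffix is missing, the host is returned unchanged.
fn name_from_vhost_style(vhostname: &str) -> String {
    vhostname
        .strip_suffix(".s3.amazonaws.com")
        .unwrap_or(vhostname)
        .to_string()
}

fn main() {
    println!("{}", bucket_name_from_path("s3.amazonaws.com/photos/cat.png"));
    println!("{}", bucket_name_from_path("photos.s3.amazonaws.com/cat.png"));
}
```

The functions borrow their input as `&str` and allocate a `String` only at the point where a new owned value has to be returned, which is the conversion boundary the comment above asks about.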

samdoshi commented on Jul 28, 2014

@samdoshi

Would it be a good idea to discuss std::str::MaybeOwned here? When it's appropriate to use it and when it isn't.
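For context, and as an assumption about later history rather than something stated in this thread: to my knowledge `std::str::MaybeOwned` was eventually superseded by `std::borrow::Cow<str>`. A minimal sketch of the "borrow when possible, own when necessary" pattern with `Cow`, using a made-up `normalize` function:

```rust
use std::borrow::Cow;

/// Replace spaces with underscores, allocating only when a change is needed.
/// Hypothetical helper for illustration.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // A new string has to be built, so return an owned value.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // Nothing to change: hand back the borrowed input, no allocation.
        Cow::Borrowed(input)
    }
}

fn main() {
    for s in ["no_spaces", "has some spaces"] {
        let out = normalize(s);
        let kind = match out {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{} ({})", out, kind);
    }
}
```

This kind of type fits when most inputs pass through unchanged; if every call allocates anyway, returning a plain `String` is simpler.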

steveklabnik commented on Jul 28, 2014

@steveklabnik
MemberAuthor

@samdoshi it might. I know nothing about it.

lee-b commented on Jul 30, 2014

@lee-b

How come the strings guide doesn't mention the char type (utf32) at all? ;)

steveklabnik commented on Jul 30, 2014

@steveklabnik
MemberAuthor

Strings are UTF-8, not UTF-32.

lee-b commented on Jul 30, 2014

@lee-b

I know, but that makes it even more confusing and in need of explanation. Why is an str a u8 slice, rather than chars, and why IS there a char that's 32-bit, but not part of string, etc.? ;)

I get it, at a low level: char is a 32-bit value, capable of representing all (most?) Unicode code points as a fixed-length binary value. But it's not clear why they called that char, why there's no "byte" type, why string is essentially a vector of bytes, but already converted from bytes to Unicode (rather than using stronger typing, and calling it char8, for example). The low-level stuff is understandable, but the high-level design / reasoning, and how to use char along with all this... that stuff's not so clear.
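A short sketch, in current Rust, of how the pieces discussed above relate: `char` is a fixed-size 4-byte value, a string stores the UTF-8 encoding of its chars, and `u8` serves as the byte type. The helper names are illustrative only.

```rust
use std::mem::size_of;

/// Length of the string in bytes (its UTF-8 encoding).
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of chars (Unicode scalar values), found by decoding the bytes.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    println!("size_of::<char>() = {}", size_of::<char>());

    let s = "héllo";
    println!("bytes: {}, chars: {}", byte_len(s), char_count(s));

    // The raw storage is a slice of u8 values.
    let bytes: &[u8] = s.as_bytes();
    println!("{:?}", bytes);

    // One char may need several bytes once encoded as UTF-8.
    let mut buf = [0u8; 4];
    let encoded = 'é'.encode_utf8(&mut buf);
    println!("'é' takes {} bytes in UTF-8", encoded.len());
}
```

The byte count and the char count differ as soon as the text contains anything outside ASCII, which is the practical reason the two types are kept separate.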

reem commented on Jul 30, 2014

@reem
Contributor

It might be a good idea to bring up Str, which makes writing APIs that are agnostic to the type of string they receive better.

pcn commented on Aug 5, 2014

@pcn
Contributor

From the current state of the guide:

In general, you should prefer String when you need ownership, and &str when you just need to borrow a string.

Insight into examples of both would be helpful.

This means starting off with this:

fn foo(s: &str) {

and only moving to this:

fn foo(s: String) {

I'd like to know what you'd think about this language:

This means starting off with a string slice like this:

fn foo(s: &str) {

and only moving to this:

fn foo(s: String) {

If you have good reason such as <some examples in code somewhere and a description of why those examples are good uses of using String?>. 

Just below that it says:

Furthermore, you can pass either kind of string into foo by using .as_slice() on any String you need to pass in, so the &str version is more flexible.

That reads as a bit confusing to me. If I understand it, would this preserve the meaning and provide some more clarity?

Furthermore, the version of foo that accepts a &str argument is more flexible because it can be passed either a &str or a String. How is that possible? String has an .as_slice() method, which presents it to the function as a string slice, so you can call foo(some_string.as_slice()) if foo accepts a &str.
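A sketch of both signatures in current Rust, where (to my knowledge) `.as_slice()` on `String` is spelled `.as_str()` and a plain `&some_string` also coerces to `&str`. The names `shout` and `Greeter` are hypothetical; they stand in for "a function that only reads the text" and "a type that needs to keep it".

```rust
/// Borrowing version: only reads the text, so `&str` is enough.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

/// Owning version: the struct stores the text, so taking `String`
/// lets the caller hand over a value it already owns.
struct Greeter {
    name: String,
}

impl Greeter {
    fn new(name: String) -> Greeter {
        Greeter { name }
    }

    fn greet(&self) -> String {
        format!("Hello, {}!", self.name)
    }
}

fn main() {
    let owned = String::from("ferris");

    // The &str version accepts a literal and a borrowed String alike.
    println!("{}", shout("ferris"));
    println!("{}", shout(&owned));
    println!("{}", shout(owned.as_str()));

    // Ownership moves into the Greeter; `owned` cannot be used afterwards.
    let greeter = Greeter::new(owned);
    println!("{}", greeter.greet());
}
```

One reasonable rule of thumb that follows: take `String` when the function stores or returns the value, and `&str` otherwise.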

steveklabnik commented on Aug 5, 2014

@steveklabnik
MemberAuthor

Yes, that means the same thing. I feel they're about equally clear, but if you feel that it's more...

pcn commented on Aug 5, 2014

@pcn
Contributor

Maybe there's a better phrasing? I feel like from the perspective of the un-initiated reader, the extra information helps by describing the mechanism and the context.

steveklabnik commented on Aug 11, 2014

@steveklabnik
MemberAuthor

Adding a section on c_str and FFI would be good as well.

http://doc.rust-lang.org/std/c_str/index.html
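As a hedged pointer for current Rust: I believe the `c_str` module linked above has since been replaced by `std::ffi`, with `CString` for owned and `CStr` for borrowed C strings. The sketch below only round-trips a string through a C-style pointer and does not call any real C function; `to_c_and_back` is a made-up name.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Convert a Rust string to a NUL-terminated C string and read it back.
/// Returns None if the input contains an interior NUL byte.
fn to_c_and_back(s: &str) -> Option<String> {
    // CString::new fails when the input has a NUL byte in the middle.
    let owned = CString::new(s).ok()?;

    // This is the pointer that would be passed to a C function.
    let ptr: *const c_char = owned.as_ptr();

    // SAFETY: `ptr` comes from `owned`, which is alive for this whole
    // function and is guaranteed to be NUL-terminated.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };

    // C strings are not guaranteed to be UTF-8, hence the fallible to_str.
    borrowed.to_str().ok().map(str::to_owned)
}

fn main() {
    println!("{:?}", to_c_and_back("hello"));
    println!("{:?}", to_c_and_back("he\0llo"));
}
```

The two fallible steps are the points worth documenting: Rust strings may contain NUL bytes that C cannot represent, and C strings may contain bytes that are not valid UTF-8.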

l0kod commented on Aug 18, 2014

@l0kod
Contributor

The guide should maybe add a note highlighting the Str trait, which can be used as a generic bound when the function doesn't care whether it owns the string. This way, it's possible to pass either &str or String, which might be convenient:

fn foo<T: Str>(msg: T) {
    std::io::stdio::print(msg.as_slice());
}
foo("hello");
foo(" world".to_string());
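As far as I know, the `Str` trait no longer exists in current Rust; `AsRef<str>` plays a similar role. A minimal sketch of the same pattern under that assumption, with a hypothetical `describe` function in place of `foo`:

```rust
/// Accept anything that can be viewed as a string slice.
fn describe<T: AsRef<str>>(msg: T) -> String {
    let text: &str = msg.as_ref();
    format!("{} ({} bytes)", text, text.len())
}

fn main() {
    // Works with a string literal (&str)...
    println!("{}", describe("hello"));
    // ...and with an owned String.
    println!("{}", describe(" world".to_string()));
}
```

The trade-off is that a generic function is compiled once per argument type, so a plain `&str` parameter is often the simpler choice when callers can just write `&s`.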

l0kod commented on Aug 18, 2014

@l0kod
Contributor

In general, the guide should encourage using traits for function parameters instead of concrete types.

steveklabnik commented on Jan 12, 2015

@steveklabnik
MemberAuthor

I think that most of this has been tackled. If there are specific improvements to make, please open new issues, one improvement per issue.

added a commit that references this issue on Dec 4, 2023

Auto merge of rust-lang#15994 - ChayimFriedman2:err-comma-after-fus, …

          String Guide suggestions · Issue #15994 · rust-lang/rust