Skip to content

error for too many trailing #s in raw string literals could be improved #60762

@fbstj

Description

@fbstj
Contributor

for example (play)

let foo = r##"bar"###;

results in

error: expected one of `.`, `;`, `?`, or an operator, found `#`
 --> src/main.rs:2:25
  |
2 |     let foo = r##"bar"###;
  |                         ^ expected one of `.`, `;`, `?`, or an operator here

while too few has a much nicer error:

error: unterminated raw string
 --> src/main.rs:3:15
  |
3 |     let baz = r##"quxx"#;
  |               ^ unterminated raw string
  |
  = note: this raw string should be terminated with `"##`

This issue has been assigned to @rcoh via this comment.

Activity

added
A-diagnosticsArea: Messages for errors, warnings, and lints
C-enhancementCategory: An issue proposing an enhancement or a PR with one.
E-easyCall for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
on May 12, 2019
hellow554

hellow554 commented on May 13, 2019

@hellow554
Contributor

Gonna work on it (my first work on the compiler internals, so please bear with me :) )

estebank

estebank commented on May 13, 2019

@estebank
Contributor

Feel free to ask for help!


I feel both of these errors could be tweaked slightly:

error: expected one of `.`, `;`, `?`, or an operator, found `#`
 --> src/main.rs:2:25
  |
2 |     let foo = r##"bar"###;
  |                --       ^
  |                |        |
  |                |        the raw string needs 2 trailing `#`, but found 3
  |                |        help: remove this `#` (this should be a hidden suggestion for rustfix' sake)
  |                the raw string has 2 leading `#`

and

error: unterminated raw string
 --> src/main.rs:3:15
  |
3 |     let baz = r##"quxx"#;
  |               ^        -
  |               |        |
  |               |        the raw string needs two trailing `#`, but found 1
  |               |        help: close the raw string: `##`
  |               unterminated raw string

The later might be harder to accomplish, needing to keep track of possible but incorrect closing spans (probably just looking for "# in the string being processed). This seems like a different enough task, and harder enough to be worthy of a separate PR.

hellow554

hellow554 commented on May 14, 2019

@hellow554
Contributor

I indeed have a question @estebank. The fatal error itself is emitted at

pub fn expect_one_of(
but my logic is currently somewhere around
'r' => {
let start_bpos = self.pos;
self.bump();
let mut hash_count: u16 = 0;
while self.ch_is('#') {
if hash_count == 65535 {
let bpos = self.next_pos;
self.fatal_span_(start_bpos,
bpos,
"too many `#` symbols: raw strings may be \
delimited by up to 65535 `#` symbols").raise();
}
self.bump();
hash_count += 1;
}
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
} else if !self.ch_is('"') {
let last_bpos = self.pos;
let curr_char = self.ch.unwrap();
self.fatal_span_char(start_bpos,
last_bpos,
"found invalid character; only `#` is allowed \
in raw string delimitation",
curr_char).raise();
}
self.bump();
let content_start_bpos = self.pos;
let mut content_end_bpos;
let mut valid = true;
'outer: loop {
if self.is_eof() {
self.fail_unterminated_raw_string(start_bpos, hash_count);
}
// if self.ch_is('"') {
// content_end_bpos = self.pos;
// for _ in 0..hash_count {
// self.bump();
// if !self.ch_is('#') {
// continue 'outer;
let c = self.ch.unwrap();
match c {
'"' => {
content_end_bpos = self.pos;
for _ in 0..hash_count {
self.bump();
if !self.ch_is('#') {
continue 'outer;
}
}
break;
}
'\r' => {
if !self.nextch_is('\n') {
let last_bpos = self.pos;
self.err_span_(start_bpos,
last_bpos,
"bare CR not allowed in raw string, use \\r \
instead");
valid = false;
}
}
_ => (),
}
self.bump();
}
self.bump();
let id = if valid {
self.name_from_to(content_start_bpos, content_end_bpos)
} else {
Symbol::intern("??")
};
let suffix = self.scan_optional_raw_name();
Ok(token::Literal(token::StrRaw(id, hash_count), suffix))
}

I don't see how I can add labels to that error message without actually creating my own fatal which would most likely will end up in code duplication.

Solutions I see:

  1. Emit my own fatal which is very similar to the one already emitted
    * downside would be code duplication and increased complexity
    * plus would be, same error message as before
  2. Emit a warning with the suggested solution
    * downside, more text,
    * plus, very little additional code needed
  3. Emit a different error than we do now
    * downside, another error, which might ne not suitable, because the current one is already very good
    * plus, very little additional code needed

This is what my solution currently looks like.

error: The raw string needs 2 trailing `#`, but found 3
  --> /home/op/me/rust2/src/test/ui/parser/raw/raw-literal-too-many.rs:2:26
   |
LL |     let _foo = r##"bar"###;
   |                          ^ remove this `#`

error: expected one of `.`, `;`, `?`, or an operator, found `#`
  --> /home/op/me/rust2/src/test/ui/parser/raw/raw-literal-too-many.rs:2:26
   |
LL |     let _foo = r##"bar"###;
   |                          ^ expected one of `.`, `;`, `?`, or an operator here

error: aborting due to 2 previous errors
hellow554

hellow554 commented on May 14, 2019

@hellow554
Contributor

In addition to too many #, I was looking at the case where there are less # than expected, but not in the same line, but over a lot of lines (in my newly added testcase there are 18 lines now).
I would like to improve that error message as well, something like:

error: unterminated raw string
  --> $DIR/raw-string-long.rs:2:13
   |
LL |     let a = r##"This
   |             ^
   |             started here with 2 `#`
...
   |
LL |        ends"#;
   |             ^ ended here with only 1 `#`
   |             help: close the raw string with `"##`

error: aborting due to previous error

I'm not sure about how to do that, if I just can use a span from the very beginning to the end and it will leave out the intermediate lines automatically or do I have to do something about that?

On the other hand, is that even possible to detect? I don't know if one can find the end of this, except there an EOF. What do you think about this?

hellow554

hellow554 commented on May 14, 2019

@hellow554
Contributor

I looked at the first implementation of this cc #48546 @GuillaumeGomez . What do you think about this?

GuillaumeGomez

GuillaumeGomez commented on May 14, 2019

@GuillaumeGomez
Member

That might make things more clear, indeed. 👍

estebank

estebank commented on May 14, 2019

@estebank
Contributor

I'm not sure about how to do that, if I just can use a span from the very beginning to the end and it will leave out the intermediate lines automatically or do I have to do something about that?

@hellow554, you should be able to synthesize a span that covers the entire closing sigil ("####) where we look for it and break if we don't find it. We should hold a vec of spans that we add to every time we find them, because it is possible that we indeed have a raw string that contains "# multiple times and we can't assume that the first we find is correct. We could have some extra smarts in case we were writing something like r###"r##"r#""#"##"### or r###"r##"r#""#"#"# to avoid suggesting all three closing spans, but for now the naïve way should be fine.

When supplying a span that covers multiple lines, the cli output will show the first 5(?) and last two lines of the covered span and draw the ascii art pointing at the start and end. The output you envision would happen by providing independent spans, and it's what I would prefer.

For the too many # at the end, you could proactively look for them before the break, by following case 1 and using something like

let lo = self.span;
let mut too_many = false;
while self.ch_is('#') {
    too_many = true;
    self.bump();
}
if too_many {
    let sp = lo.to(self.span);
    let mut err = self.sess.span_diagnostic.struct_span_err(sp, "too many `#` when terminating raw string");
    err.span_label(sp, "too many `#` for raw string");
    err.hidden_span_suggestion(
        sp,
        "remove the unneeded `#`"
        String::new(),
        Applicability::MachineApplicable,
    );
    err.emit();
}

By doing this, the parser (the lexer, rather) will continue on its merry way and parse the rest of the file as if nothing had happened.

hellow554

hellow554 commented on May 15, 2019

@hellow554
Contributor

Thanks estebank. That was very helpful indeed. I haven't thought of consuming the bad characters so that the lexer can do the rest for me. Very nice.
I'm a bit undecided where the journey goes. Currently I have this:

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |               ^^^^^^^^^^--
  |               |         |
  |               |         help: remove the unneeded `#`
  |               The raw string needs 2 trailing `#`, but found 4

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs 
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |  _____________^
3  | |     is
4  | |     a
5  | |     very
...  |
19 | |     lines
20 | |     "###;
   | |        ^
   | |        |
   | |________help: remove the unneeded `#`
   |          The raw string needs 2 trailing `#`, but found 3

This is nice and looks very neat :) This is not a hidden suggestion as you see, is that okay for you as well? I mean, yes, it's kind of obvious which ones to remove, but I also like the expressiveness here.

hellow554

hellow554 commented on May 15, 2019

@hellow554
Contributor

I also tried to add the text "The raw string has {} leading #", but it looks very crowded to me, when the raw string only is one line.

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |               ^-- The raw string has 2 leading `#`
   |  _____________|
   | |
3  | |     is
4  | |     a
5  | |     very
...  |
19 | |     lines
20 | |     "###;
   | |        ^
   | |        |
   | |________help: remove the unneeded `#`
   |          The raw string needs 2 trailing `#`, but found 3

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |               ^--^^^^^^^--
  |               ||        |
  |               ||        help: remove the unneeded `#`
  |               |The raw string has 2 leading `#`
  |               The raw string needs 2 trailing `#`, but found 4

Should I add an check if the raw string only spans over one line? If yes, how to do that? ^^

estebank

estebank commented on May 15, 2019

@estebank
Contributor

There are two options, either we don't use the full string's span (showing only the start and the end), or use is_multiline to provide different output depending on the visual result.

I think a good output using the former approach would be:

error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |                --     ^^^^
  |                |      |
  |                |      ...but it's closed with 4
  |                the raw string has 2 leading `#`...

With a hidden suggestion pointing at the last two # to avoid clutter.

This is not a hidden suggestion as you see, is that okay for you as well? I mean, yes, it's kind of obvious which ones to remove, but I also like the expressiveness here.

I'm ok with using visible suggestions when possible, but for borderline cases where the course of action is obvious and the output would be too cluttered, I'd lean towards hiding the suggestion. They will still be visible in VSCode and actionable by rustfix.

hellow554

hellow554 commented on May 16, 2019

@hellow554
Contributor

That looks very promising!

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:2:13
   |
2  |       let a = r##"This
   |               ^-- The raw string has 2 leading `#`...
   |  _____________|
   | |
3  | |     is
4  | |     a
5  | |     very
...  |
19 | |     lines
20 | |     "###;
   | |______--^
   |        |
   |        ...but is closed with 3.
   = help: remove the unneeded `#`

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:15
  |
2 |     let foo = r##"bar"####;
  |               ^--^^^^^----
  |                |      |
  |                |      ...but is closed with 4.
  |                The raw string has 2 leading `#`...
  = help: remove the unneeded `#`

The only thing is (that bugs me a little bit) are the ^^^ between the ----.

grafik
but I haven't found a way to get rid of them. I think it's okay, but they are not needed IMHO. Time to work on the other suggestion!

Reducing the error span to only the ending `# symbols does not look nice for multi line string literals.

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many.rs
error: too many `#` when terminating raw string
 --> src/test/ui/parser/raw/raw-literal-too-many.rs:2:23
  |
2 |     let foo = r##"bar"####; //~ERROR too many `#` when terminating raw string
  |                --     ^^^^ ...but is closed with 4.
  |                |
  |                The raw string has 2 leading `#`...
  = help: remove the unneeded `#`

error: aborting due to previous error

op@VBOX ~/m/rust2> build/x86_64-unknown-linux-gnu/stage1/bin/rustc src/test/ui/parser/raw/raw-literal-too-many-long.rs 
error: too many `#` when terminating raw string
  --> src/test/ui/parser/raw/raw-literal-too-many-long.rs:20:6
   |
2  |     let a = r##"This
   |              -- The raw string has 2 leading `#`...
...
20 |     "###; //~ERROR too many `#` when terminating raw string"
   |      ^^^ ...but is closed with 3.
   = help: remove the unneeded `#`

error: aborting due to previous error

Therefore I'm afraid that's not an option. Any other way to get a nice result?

16 remaining items

added a commit that references this issue on Mar 31, 2020
5e7e5ec
added a commit that references this issue on Apr 1, 2020
88ff077
added a commit that references this issue on Apr 1, 2020
c739465
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

A-diagnosticsArea: Messages for errors, warnings, and lintsC-enhancementCategory: An issue proposing an enhancement or a PR with one.E-easyCall for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Participants

    @fbstj@rcoh@hellow554@estebank@pietroalbini

    Issue actions

      error for too many trailing #s in raw string literals could be improved · Issue #60762 · rust-lang/rust