rustfmt panics on a codebase with multibyte characters in certain situations. Here is the backtrace:
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1: std::panicking::default_hook::{{closure}}
2: std::panicking::default_hook
3: std::panicking::rust_panic_with_hook
4: std::panicking::begin_panic
5: std::panicking::begin_panic_fmt
6: rust_begin_unwind
7: core::panicking::panic_fmt
8: core::str::slice_error_fail
9: rustfmt::utils::trim_newlines
10: rustfmt::visitor::FmtVisitor::visit_item
11: rustfmt::visitor::FmtVisitor::walk_mod_items
12: rustfmt::visitor::FmtVisitor::format_separate_mod
13: rustfmt::format_ast
14: rustfmt::run
15: rustfmt::execute
16: rustfmt::main
17: __rust_maybe_catch_panic
18: std::rt::lang_start
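It appears that the panic happens in the utils::trim_newlines function, at the &input[start..end] line, because end is calculated incorrectly:
let end = input.rfind(|c| c != '\n' && c != '\r').unwrap_or(0) + 1;
rfind returns the byte offset at which the last matching character starts, so the + 1 part only works when that character is one byte wide. For a Unicode character that spans several bytes, end lands inside the character and the slice panics.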
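To illustrate the failure mode in isolation, here is a minimal sketch. The helper name naive_end and the sample strings are made up for this example and are not taken from rustfmt; only the end calculation mirrors the line quoted above.

```rust
/// Hypothetical helper that mirrors the `end` calculation quoted above.
fn naive_end(input: &str) -> usize {
    input.rfind(|c: char| c != '\n' && c != '\r').unwrap_or(0) + 1
}

fn main() {
    // ASCII only: the last non-newline character is one byte wide,
    // so `+ 1` lands exactly on a character boundary.
    let ascii = "abc\n";
    assert!(ascii.is_char_boundary(naive_end(ascii)));

    // 'é' occupies two bytes in UTF-8, so `+ 1` lands in the middle of it.
    let multibyte = "abé\n";
    let end = naive_end(multibyte);
    assert!(!multibyte.is_char_boundary(end));

    // `&multibyte[..end]` would panic here; the checked `get` returns None instead.
    assert!(multibyte.get(..end).is_none());
}
```

Slicing with an index that is not a character boundary is what ends up in core::str::slice_error_fail in the backtrace.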
I've tried to fix the problem, but I don't know how to get the width of the character at a given offset in a &str using stable Rust features. Once unicode-rs/unicode-segmentation#21 is resolved, it should be possible to use GraphemeCursor for that purpose.
To test whether a proper addend resolves the issue, I used this hacky solution:
let end = input.rfind(|c| c != '\n' && c != '\r').unwrap_or(0);
let rest = ::std::str::from_utf8(&input.as_bytes()[end..]).unwrap();
let char_len = rest.chars().next().unwrap().len_utf8();
let end = end + char_len;
and it did help.
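For reference, the same idea can be written without the round trip through as_bytes and from_utf8, using only stable standard library calls. This is a sketch under assumptions, not rustfmt's actual code: the names is_content and trim_newlines_sketch are hypothetical, and the overall shape of the function (returning the input without leading and trailing '\n' and '\r') is inferred from the snippet above.

```rust
/// Hypothetical predicate: true for anything that is not a newline character.
fn is_content(c: char) -> bool {
    c != '\n' && c != '\r'
}

/// Hypothetical stand-in for a newline-trimming function; not rustfmt's code.
fn trim_newlines_sketch(input: &str) -> &str {
    match input.find(is_content) {
        Some(start) => {
            // Byte offset at which the last non-newline character starts.
            let last = input.rfind(is_content).unwrap_or(start);
            // Width of that character in bytes, instead of assuming 1.
            let width = input[last..].chars().next().map_or(0, char::len_utf8);
            &input[start..last + width]
        }
        // Nothing but newlines (or an empty string).
        None => "",
    }
}

fn main() {
    assert_eq!(trim_newlines_sketch("\n\nabé\r\n"), "abé");
    assert_eq!(trim_newlines_sketch("\r\n\n"), "");

    // str::trim_matches gives the same result for these inputs and
    // handles the character boundaries internally.
    let s = "\nfn f() { \"日本語\" }\n";
    assert_eq!(
        trim_newlines_sketch(s),
        s.trim_matches(|c: char| c == '\n' || c == '\r')
    );
}
```

Whether trim_matches is an acceptable replacement depends on what the callers expect; the sketch only shows that the character width is obtainable on stable Rust.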