-
-
Notifications
You must be signed in to change notification settings - Fork 4
Fix the fast path for nameprep #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/lib.rs
Outdated
@@ -126,7 +126,7 @@ fn is_prohibited_bidirectional_text(s: &str) -> bool { | |||
pub fn nameprep(s: &str) -> Result<Cow<'_, str>, Error> { | |||
// fast path for ascii text | |||
if s.chars() | |||
.all(|c| c.is_ascii_lowercase() && !tables::ascii_control_character(c)) | |||
.all(|c| c.is_ascii() && !c.is_ascii_uppercase() && !tables::ascii_control_character(c)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we also need to reject ascii spaces, which are prohibited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have simplified this expression, as ASCII domains only allow letters, digits, '-' and '.' actually.
The nodeprep fast path seems like it could be cleaned up as well - is_ascii_lowercase and ascii_control_character should be disjoint. |
char::is_ascii_lowercase() only returns true for alphabetical characters which are lowercase, so also add digits, '.' and '-' which are the only characters allowed in a non-IDN domain name. I have also added a test to not have that regress. This was found with this merge request in the jid crate: https://gitlab.com/xmpp-rs/xmpp-rs/-/merge_requests/205
Not every character is included, it’s missing '!', ';', '=' and '?' which each add one branch, for almost no usage in the wild. From a very unscientific dataset formed by my personal roster + my bookmarks, this reduces the time to parse all JIDs once from 128 µs to 35 µs.
The two expressions are equivalent, but the new one decreases the time spent parsing full JIDs by 1.2%..11.6% depending on the size of their resource.
9504a5d
to
914d6ff
Compare
I have also cleaned and optimised both nodeprep and resourceprep, the time spent parsing my entire roster and bookmarks has now been reduced from 127.9 µs to 35.2 µs on my i7-8700K, with only two occurrences using characters from higher codepoints than ASCII. This will obviously differ from dataset to dataset, but it should already be a pretty nice improvement in almost all cases. |
Thanks! |
char::is_ascii_lowercase()
only returnstrue
for alphabetical characters which are lowercase, which makes very common domain characters like'.'
miss out on this optimisation. Instead we usechar::is_ascii() && !char::is_ascii_uppercase()
to reach the expected outcome.I have also added a test to not have that regress.
This was found with this commit in the jid crate:
https://gitlab.com/xmpp-rs/xmpp-rs/-/merge_requests/205