Fix the fast path for nameprep #5

linkmauve · 2023-07-15T18:03:46Z

char::is_ascii_lowercase() only returns true for lowercase alphabetic characters, so very common domain characters like '.' miss out on this optimisation. Instead, we use char::is_ascii() && !char::is_ascii_uppercase() to get the expected outcome.

I have also added a test so that this does not regress.

This was found with this merge request in the jid crate:
https://gitlab.com/xmpp-rs/xmpp-rs/-/merge_requests/205
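To illustrate the difference described above, here is a minimal sketch comparing the two predicates on a typical domain. The function names `old_fast_path` and `new_fast_path` are hypothetical and only exist for this example; the control-character check from the real code is left out to keep the comparison focused.

```rust
/// Old predicate (hypothetical name): only 'a'..='z' pass,
/// so any domain containing '.', '-' or a digit falls off the fast path.
fn old_fast_path(s: &str) -> bool {
    s.chars().all(|c| c.is_ascii_lowercase())
}

/// Predicate from this description (hypothetical name): any ASCII
/// character passes as long as it is not an uppercase letter.
fn new_fast_path(s: &str) -> bool {
    s.chars().all(|c| c.is_ascii() && !c.is_ascii_uppercase())
}
```

With these definitions, "example.com" is rejected by the old predicate because of the '.', and accepted by the new one.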

sfackler · 2023-07-15T19:12:23Z

src/lib.rs

@@ -126,7 +126,7 @@ fn is_prohibited_bidirectional_text(s: &str) -> bool {
 pub fn nameprep(s: &str) -> Result<Cow<'_, str>, Error> {
    // fast path for ascii text
    if s.chars()
-        .all(|c| c.is_ascii_lowercase() && !tables::ascii_control_character(c))
+        .all(|c| c.is_ascii() && !c.is_ascii_uppercase() && !tables::ascii_control_character(c))


It looks like we also need to reject ASCII spaces, which are prohibited.

I have simplified this expression, since ASCII domains actually only allow letters, digits, '-' and '.'.
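A rough sketch of what such a simplified check could look like, assuming the fast path only needs to recognise already-lowercase letters, digits, '-' and '.'. The names `is_plain_domain_char` and `nameprep_fast_path` are made up for this example and are not the crate's actual API; anything that does not match would go through the full nameprep processing, which is not shown here.

```rust
use std::borrow::Cow;

/// Hypothetical helper: characters of a non-IDN domain that need no
/// case folding (lowercase letters, digits, '-' and '.').
fn is_plain_domain_char(c: char) -> bool {
    matches!(c, 'a'..='z' | '0'..='9' | '-' | '.')
}

/// Hypothetical fast path: borrow the input unchanged when every
/// character is a plain domain character, otherwise return None so
/// the caller can fall back to the slow path.
fn nameprep_fast_path(s: &str) -> Option<Cow<'_, str>> {
    if s.chars().all(is_plain_domain_char) {
        Some(Cow::Borrowed(s))
    } else {
        None
    }
}
```

Because the allowed set is an explicit allow-list, spaces, control characters and uppercase letters are all rejected without needing separate checks.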

sfackler · 2023-07-15T19:13:24Z

The nodeprep fast path seems like it could be cleaned up as well - is_ascii_lowercase and ascii_control_character should be disjoint.
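The disjointness claim can be checked exhaustively over the ASCII range. The sketch below uses the standard library's `char::is_ascii_control` as a stand-in for `tables::ascii_control_character`; that the two agree is an assumption of this example, and both function names are hypothetical.

```rust
/// True if no ASCII code point is both a lowercase letter and a
/// control character (using std's definition of "control").
fn lowercase_and_control_are_disjoint() -> bool {
    (0u8..=127)
        .map(char::from)
        .all(|c| !(c.is_ascii_lowercase() && c.is_ascii_control()))
}

/// True if adding `&& !control` to `is_ascii_lowercase` never changes
/// the result, i.e. the extra check is redundant.
fn control_check_is_redundant() -> bool {
    (0u8..=127).map(char::from).all(|c| {
        let with_check = c.is_ascii_lowercase() && !c.is_ascii_control();
        with_check == c.is_ascii_lowercase()
    })
}
```

If both functions return true, dropping the control-character check from a predicate that already requires `is_ascii_lowercase` does not change which inputs take the fast path.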

char::is_ascii_lowercase() only returns true for lowercase alphabetic characters, so also add digits, '.' and '-', which are the only other characters allowed in a non-IDN domain name. I have also added a test so that this does not regress. This was found with this merge request in the jid crate: https://gitlab.com/xmpp-rs/xmpp-rs/-/merge_requests/205

Not every character is included: '!', ';', '=' and '?' are missing, since each would add one branch for almost no usage in the wild. On a very unscientific dataset formed from my personal roster and my bookmarks, this reduces the time to parse all JIDs once from 128 µs to 35 µs.

The two expressions are equivalent, but the new one decreases the time spent parsing full JIDs by 1.2%..11.6% depending on the size of their resource.

linkmauve · 2023-07-15T23:35:23Z

I have also cleaned up and optimised both nodeprep and resourceprep. The time spent parsing my entire roster and bookmarks has now been reduced from 127.9 µs to 35.2 µs on my i7-8700K, with only two entries using characters from code points above ASCII. This will obviously differ from dataset to dataset, but it should already be a pretty nice improvement in almost all cases.

sfackler · 2023-07-16T13:27:57Z

Thanks!

linkmauve mentioned this pull request Jul 15, 2023

Add a fast path returning Cow::Borrowed for ASCII-only prepping #4

Merged

sfackler reviewed Jul 15, 2023

linkmauve added 3 commits July 16, 2023 01:08

Improve resourceprep ASCII matching

914d6ff

linkmauve force-pushed the fix-nameprep-ascii-check branch from 9504a5d to 914d6ff July 15, 2023 23:30

sfackler merged commit 1da0b55 into sfackler:master Jul 16, 2023

linkmauve deleted the fix-nameprep-ascii-check branch July 16, 2023 17:44
