Update tests to Unicode 16.0 #1045

Merged
merged 3 commits into from
May 8, 2025

hsivonen
Collaborator

@hsivonen hsivonen commented May 8, 2025

This updates the tests to Unicode 16.0. The test harness needs changes because the earlier test suite had a bug concerning trailing dots. The new test suite matches the spec text, but the deprecated idna API retains the behavior that was written to match the old test suite bug.
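To make the trailing-dot point concrete, here is a minimal sketch. It assumes the issue concerns how a trailing dot (the empty root label) interacts with DNS length verification, which this comment does not spell out. It follows the UTS 46 wording that the domain length is checked "excluding the root label and its dot". The helper `dns_length_ok` is hypothetical, is not part of the idna crate, and assumes the input is already ASCII (that is, already converted to Punycode).

```rust
/// Hypothetical helper (not an idna crate API): checks DNS length limits
/// on an already-ASCII domain, excluding the root label and its dot.
fn dns_length_ok(domain: &str) -> bool {
    // Drop at most one trailing dot, so "example.com." is treated
    // like "example.com" for the purpose of the length checks.
    let without_root = domain.strip_suffix('.').unwrap_or(domain);

    // The whole name, without the root label, must be 1 to 253 bytes long.
    if without_root.is_empty() || without_root.len() > 253 {
        return false;
    }

    // Every remaining label must be 1 to 63 bytes long, so empty labels
    // in the middle (or a second trailing dot) are rejected.
    without_root
        .split('.')
        .all(|label| (1..=63).contains(&label.len()))
}

fn main() {
    for domain in ["example.com", "example.com.", "example..com", "example.com..", "."] {
        println!("{domain:?}: {}", dns_length_ok(domain));
    }
}
```

Under this reading, a single trailing dot is not an error by itself, while an empty label anywhere else still is.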

It is somewhat unfortunate that the test suite lives in this repo while whether the code performs Unicode 16.0 behavior is up to the dependencies. Therefore, the expected landing sequence is this:

  1. This PR (hopefully!) gets approved.
  2. Publish idna_adapter 1.2.1 from its main branch.
  3. Publish idna_mapping 1.1.0 from its main branch.
  4. Land this PR.

@hsivonen hsivonen requested a review from valenting May 8, 2025 07:22
@hsivonen
Collaborator Author

hsivonen commented May 8, 2025

And the tests here are, of course, failing, because the new versions of the dependencies haven't been published yet.

Also, once idna_adapter 1.2.1 is published, the rust-url CI Rust version with default idna_adapter needs to be raised to 1.82.

@hsivonen
Collaborator Author

hsivonen commented May 8, 2025

And, of course, ICU4X 1.x doesn't work with Unicode 16.0 test data, so that can't be tested.

@hsivonen hsivonen marked this pull request as draft May 8, 2025 08:00
@hsivonen
Collaborator Author

hsivonen commented May 8, 2025

And, indeed, there are enough changes that the old test suite does not pass with Unicode 16.0 implementation internals.

@hsivonen hsivonen marked this pull request as ready for review May 8, 2025 08:29
@hsivonen
Collaborator Author

hsivonen commented May 8, 2025

Timings building reqwest trunk on M3 Pro:

| Back end   | Debug | Release |
|------------|-------|---------|
| ICU4X 2.0  | 6.0 s | 7.7 s   |
| ICU4X 1.5  | 6.2 s | 8.1 s   |
| unicode-rs | 5.2 s | 7.1 s   |
| no-unicode | 5.0 s | 6.8 s   |

@hsivonen
Collaborator Author

hsivonen commented May 8, 2025

Updating from ICU4X 1.5 (Unicode 15.1) to 2.0 (Unicode 16.0) increases the Brotli-compressed wasm-opt-optimized wasm footprint of rust-url by 2273 bytes.

@hsivonen
Collaborator Author

hsivonen commented May 8, 2025

ICU4X 1.5:

test to_ascii_already_puny_label ... bench:         114 ns/iter (+/- 1)
test to_ascii_cow_hyphen         ... bench:          30 ns/iter (+/- 1)
test to_ascii_cow_leading_digit  ... bench:          57 ns/iter (+/- 0)
test to_ascii_cow_plain          ... bench:          11 ns/iter (+/- 0)
test to_ascii_cow_punycode_ltr   ... bench:         253 ns/iter (+/- 7)
test to_ascii_cow_punycode_mixed ... bench:         155 ns/iter (+/- 6)
test to_ascii_cow_punycode_rtl   ... bench:         242 ns/iter (+/- 4)
test to_ascii_cow_unicode_ltr    ... bench:         293 ns/iter (+/- 14)
test to_ascii_cow_unicode_mixed  ... bench:         208 ns/iter (+/- 9)
test to_ascii_cow_unicode_rtl    ... bench:         283 ns/iter (+/- 3)
test to_ascii_merged             ... bench:         227 ns/iter (+/- 4)
test to_ascii_puny_label         ... bench:         133 ns/iter (+/- 1)
test to_ascii_simple             ... bench:          28 ns/iter (+/- 1)
test to_unicode_ascii            ... bench:          26 ns/iter (+/- 0)
test to_unicode_merged_label     ... bench:         257 ns/iter (+/- 1)
test to_unicode_puny_label       ... bench:         120 ns/iter (+/- 1)

ICU4X 2.0:

test to_ascii_already_puny_label ... bench:         104 ns/iter (+/- 3)
test to_ascii_cow_hyphen         ... bench:          26 ns/iter (+/- 0)
test to_ascii_cow_leading_digit  ... bench:          52 ns/iter (+/- 1)
test to_ascii_cow_plain          ... bench:           8 ns/iter (+/- 0)
test to_ascii_cow_punycode_ltr   ... bench:         223 ns/iter (+/- 12)
test to_ascii_cow_punycode_mixed ... bench:         141 ns/iter (+/- 1)
test to_ascii_cow_punycode_rtl   ... bench:         225 ns/iter (+/- 13)
test to_ascii_cow_unicode_ltr    ... bench:         261 ns/iter (+/- 3)
test to_ascii_cow_unicode_mixed  ... bench:         197 ns/iter (+/- 4)
test to_ascii_cow_unicode_rtl    ... bench:         261 ns/iter (+/- 4)
test to_ascii_merged             ... bench:         199 ns/iter (+/- 4)
test to_ascii_puny_label         ... bench:         122 ns/iter (+/- 0)
test to_ascii_simple             ... bench:          23 ns/iter (+/- 0)
test to_unicode_ascii            ... bench:          22 ns/iter (+/- 0)
test to_unicode_merged_label     ... bench:         202 ns/iter (+/- 2)
test to_unicode_puny_label       ... bench:         112 ns/iter (+/- 1)
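For readers who want relative numbers, the two runs above can be compared with a small calculation. This is only a sketch: the `percent_change` helper is made up for illustration, and the four rows are a sample copied from the listings above, not the full set.

```rust
/// Percentage change from `old_ns` to `new_ns`; negative means faster.
fn percent_change(old_ns: f64, new_ns: f64) -> f64 {
    (new_ns - old_ns) / old_ns * 100.0
}

fn main() {
    // (benchmark, ICU4X 1.5 ns/iter, ICU4X 2.0 ns/iter),
    // copied from the two benchmark listings above.
    let rows = [
        ("to_ascii_cow_plain", 11.0, 8.0),
        ("to_ascii_cow_unicode_ltr", 293.0, 261.0),
        ("to_ascii_merged", 227.0, 199.0),
        ("to_unicode_merged_label", 257.0, 202.0),
    ];

    for (name, old_ns, new_ns) in rows {
        println!("{name:<26} {:+.1}%", percent_change(old_ns, new_ns));
    }
}
```

In these listings, every one of the 16 benchmarks reports a lower ns/iter figure with ICU4X 2.0 than with ICU4X 1.5.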

codecov bot commented May 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@7cff874). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1045   +/-   ##
=======================================
  Coverage        ?   80.11%           
=======================================
  Files           ?       24           
  Lines           ?     4355           
  Branches        ?        0           
=======================================
  Hits            ?     3489           
  Misses          ?      866           
  Partials        ?        0           

@hsivonen hsivonen added this pull request to the merge queue May 8, 2025
Merged via the queue into servo:main with commit 68f151c May 8, 2025
22 of 33 checks passed
@hsivonen hsivonen deleted the unicode16 branch May 8, 2025 15:57
@hsivonen hsivonen mentioned this pull request May 9, 2025