Add Neon mld_polyvecl_pointwise_acc_montgomery_l{4,5,7}_native #281

Draft: wants to merge 2 commits into main

mkannwischer
Contributor

These are basically written from scratch, inspired by the same functions in mlkem-native.
Resolves #257
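
For readers unfamiliar with the routine, here is a minimal C sketch of what a pointwise multiply-accumulate with Montgomery reduction computes. It is not the code from this PR: the names `sketch_montgomery_reduce` and `sketch_pointwise_acc`, the flat array layout, and the choice to accumulate the 64-bit products before a single reduction are all assumptions made for illustration. The constants are the usual ML-DSA values, q = 8380417 and q^-1 mod 2^32 = 58728449.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_N 256         /* coefficients per polynomial */
#define SKETCH_Q 8380417     /* ML-DSA modulus q */
#define SKETCH_QINV 58728449 /* q^-1 mod 2^32 */

/* Montgomery reduction: returns r with r * 2^32 == a (mod q).
 * Relies on two's-complement narrowing and an arithmetic right shift of
 * negative values, which mainstream compilers provide. */
static int32_t sketch_montgomery_reduce(int64_t a)
{
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)SKETCH_QINV);
  return (int32_t)((a - (int64_t)t * SKETCH_Q) >> 32);
}

/* r[i] = (sum over j < l of u[j][i] * v[j][i]) * 2^-32 mod q.
 * u and v hold l polynomials back to back (hypothetical flat layout).
 * Assumes the inputs are small enough that |sum| < 2^31 * q. */
static void sketch_pointwise_acc(int32_t r[SKETCH_N], unsigned l,
                                 const int32_t *u, const int32_t *v)
{
  unsigned i, j;
  for (i = 0; i < SKETCH_N; i++)
  {
    int64_t acc = 0;
    for (j = 0; j < l; j++)
    {
      acc += (int64_t)u[j * SKETCH_N + i] * v[j * SKETCH_N + i];
    }
    r[i] = sketch_montgomery_reduce(acc);
  }
}
```

A native backend would be expected to produce the same values as such a reference for each supported `l` (4, 5, 7), which is the kind of equivalence a unit test can check.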

@mkannwischer mkannwischer changed the title Add Neon mld_polyvecl_pointwise_acc_montgomery_l{4,5,7}_native Add Neon mld_polyvecl_pointwise_acc_montgomery_l{4,5,7}_native May 23, 2025
@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 98795 cycles | 100259 cycles | 0.99 |
| ML-DSA-44 sign | 220838 cycles | 225382 cycles | 0.98 |
| ML-DSA-44 verify | 100723 cycles | 102348 cycles | 0.98 |
| ML-DSA-65 keypair | 187512 cycles | 181582 cycles | 1.03 |
| ML-DSA-65 sign | 355226 cycles | 365151 cycles | 0.97 |
| ML-DSA-65 verify | 165873 cycles | 168247 cycles | 0.99 |
| ML-DSA-87 keypair | 293129 cycles | 296190 cycles | 0.99 |
| ML-DSA-87 sign | 495585 cycles | 504189 cycles | 0.98 |
| ML-DSA-87 verify | 290593 cycles | 293610 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Mac Mini (M1, 2020) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-65 keypair | 187512 cycles | 181582 cycles | 1.03 |

This comment was automatically generated by workflow using github-action-benchmark.
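
The ratio column is rounded to two decimals, which would explain why a row displayed as 1.03 can still trip a 1.03 threshold. A minimal sketch of the arithmetic, assuming the alert compares the unrounded current/previous ratio against the threshold (the helper names `bench_ratio` and `bench_alert` are hypothetical and not part of github-action-benchmark):

```c
#include <assert.h>
#include <stdint.h>

/* Ratio of current to previous cycle counts (fewer cycles is better). */
static double bench_ratio(uint64_t current, uint64_t previous)
{
  return (double)current / (double)previous;
}

/* Assumed rule: alert when the unrounded ratio is strictly above the
 * configured threshold. */
static int bench_alert(uint64_t current, uint64_t previous, double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

With the ML-DSA-65 keypair numbers above, 187512 / 181582 is roughly 1.0327, which is above 1.03 even though it is displayed as 1.03.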

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115144 cycles | 115133 cycles | 1.00 |
| ML-DSA-44 sign | 354603 cycles | 354678 cycles | 1.00 |
| ML-DSA-44 verify | 124702 cycles | 124700 cycles | 1.00 |
| ML-DSA-65 keypair | 202234 cycles | 202233 cycles | 1.00 |
| ML-DSA-65 sign | 563400 cycles | 563367 cycles | 1.00 |
| ML-DSA-65 verify | 199679 cycles | 199698 cycles | 1.00 |
| ML-DSA-87 keypair | 324059 cycles | 324064 cycles | 1.00 |
| ML-DSA-87 sign | 727103 cycles | 727079 cycles | 1.00 |
| ML-DSA-87 verify | 332139 cycles | 332161 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 98819 cycles | 98683 cycles | 1.00 |
| ML-DSA-44 sign | 282436 cycles | 282233 cycles | 1.00 |
| ML-DSA-44 verify | 102909 cycles | 103404 cycles | 1.00 |
| ML-DSA-65 keypair | 165585 cycles | 164971 cycles | 1.00 |
| ML-DSA-65 sign | 449412 cycles | 446621 cycles | 1.01 |
| ML-DSA-65 verify | 163626 cycles | 163228 cycles | 1.00 |
| ML-DSA-87 keypair | 274172 cycles | 274227 cycles | 1.00 |
| ML-DSA-87 sign | 587370 cycles | 588128 cycles | 1.00 |
| ML-DSA-87 verify | 272295 cycles | 272530 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 152098 cycles | 152095 cycles | 1.00 |
| ML-DSA-44 sign | 444995 cycles | 444105 cycles | 1.00 |
| ML-DSA-44 verify | 161410 cycles | 161258 cycles | 1.00 |
| ML-DSA-65 keypair | 254856 cycles | 254909 cycles | 1.00 |
| ML-DSA-65 sign | 692231 cycles | 691112 cycles | 1.00 |
| ML-DSA-65 verify | 254858 cycles | 254797 cycles | 1.00 |
| ML-DSA-87 keypair | 424938 cycles | 424453 cycles | 1.00 |
| ML-DSA-87 sign | 916489 cycles | 917430 cycles | 1.00 |
| ML-DSA-87 verify | 427940 cycles | 427318 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 98528 cycles | 98671 cycles | 1.00 |
| ML-DSA-44 sign | 282918 cycles | 282749 cycles | 1.00 |
| ML-DSA-44 verify | 102905 cycles | 103271 cycles | 1.00 |
| ML-DSA-65 keypair | 165420 cycles | 165680 cycles | 1.00 |
| ML-DSA-65 sign | 448721 cycles | 449512 cycles | 1.00 |
| ML-DSA-65 verify | 163567 cycles | 163709 cycles | 1.00 |
| ML-DSA-87 keypair | 273790 cycles | 274502 cycles | 1.00 |
| ML-DSA-87 sign | 588743 cycles | 588030 cycles | 1.00 |
| ML-DSA-87 verify | 272629 cycles | 272042 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 152111 cycles | 152325 cycles | 1.00 |
| ML-DSA-44 sign | 444140 cycles | 444936 cycles | 1.00 |
| ML-DSA-44 verify | 161220 cycles | 161297 cycles | 1.00 |
| ML-DSA-65 keypair | 254909 cycles | 254742 cycles | 1.00 |
| ML-DSA-65 sign | 693678 cycles | 691709 cycles | 1.00 |
| ML-DSA-65 verify | 254684 cycles | 254653 cycles | 1.00 |
| ML-DSA-87 keypair | 424795 cycles | 424479 cycles | 1.00 |
| ML-DSA-87 sign | 916121 cycles | 917945 cycles | 1.00 |
| ML-DSA-87 verify | 427729 cycles | 427515 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 136196 cycles | 136220 cycles | 1.00 |
| ML-DSA-44 sign | 437198 cycles | 437480 cycles | 1.00 |
| ML-DSA-44 verify | 147615 cycles | 147439 cycles | 1.00 |
| ML-DSA-65 keypair | 224159 cycles | 224408 cycles | 1.00 |
| ML-DSA-65 sign | 673932 cycles | 675932 cycles | 1.00 |
| ML-DSA-65 verify | 227650 cycles | 228096 cycles | 1.00 |
| ML-DSA-87 keypair | 374913 cycles | 374879 cycles | 1.00 |
| ML-DSA-87 sign | 886192 cycles | 886615 cycles | 1.00 |
| ML-DSA-87 verify | 382383 cycles | 382821 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 119944 cycles | 119848 cycles | 1.00 |
| ML-DSA-44 sign | 369702 cycles | 372574 cycles | 0.99 |
| ML-DSA-44 verify | 128077 cycles | 128111 cycles | 1.00 |
| ML-DSA-65 keypair | 199526 cycles | 199629 cycles | 1.00 |
| ML-DSA-65 sign | 562421 cycles | 563339 cycles | 1.00 |
| ML-DSA-65 verify | 200935 cycles | 201022 cycles | 1.00 |
| ML-DSA-87 keypair | 332568 cycles | 332393 cycles | 1.00 |
| ML-DSA-87 sign | 735105 cycles | 740390 cycles | 0.99 |
| ML-DSA-87 verify | 335144 cycles | 335127 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 136105 cycles | 136234 cycles | 1.00 |
| ML-DSA-44 sign | 437119 cycles | 436841 cycles | 1.00 |
| ML-DSA-44 verify | 147784 cycles | 147385 cycles | 1.00 |
| ML-DSA-65 keypair | 224174 cycles | 224115 cycles | 1.00 |
| ML-DSA-65 sign | 673368 cycles | 674617 cycles | 1.00 |
| ML-DSA-65 verify | 227647 cycles | 227473 cycles | 1.00 |
| ML-DSA-87 keypair | 374787 cycles | 374645 cycles | 1.00 |
| ML-DSA-87 sign | 885681 cycles | 885689 cycles | 1.00 |
| ML-DSA-87 verify | 382192 cycles | 382513 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 195813 cycles | 195538 cycles | 1.00 |
| ML-DSA-44 sign | 467780 cycles | 468288 cycles | 1.00 |
| ML-DSA-44 verify | 198346 cycles | 198500 cycles | 1.00 |
| ML-DSA-65 keypair | 349532 cycles | 349924 cycles | 1.00 |
| ML-DSA-65 sign | 766754 cycles | 769221 cycles | 1.00 |
| ML-DSA-65 verify | 327977 cycles | 328753 cycles | 1.00 |
| ML-DSA-87 keypair | 574513 cycles | 574288 cycles | 1.00 |
| ML-DSA-87 sign | 1039442 cycles | 1041264 cycles | 1.00 |
| ML-DSA-87 verify | 559160 cycles | 561305 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120139 cycles | 119894 cycles | 1.00 |
| ML-DSA-44 sign | 371046 cycles | 370018 cycles | 1.00 |
| ML-DSA-44 verify | 127996 cycles | 128095 cycles | 1.00 |
| ML-DSA-65 keypair | 199546 cycles | 199529 cycles | 1.00 |
| ML-DSA-65 sign | 568638 cycles | 562876 cycles | 1.01 |
| ML-DSA-65 verify | 200784 cycles | 201061 cycles | 1.00 |
| ML-DSA-87 keypair | 332459 cycles | 332423 cycles | 1.00 |
| ML-DSA-87 sign | 736425 cycles | 734871 cycles | 1.00 |
| ML-DSA-87 verify | 335152 cycles | 334513 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 126643 cycles | 126462 cycles | 1.00 |
| ML-DSA-44 sign | 284818 cycles | 285843 cycles | 1.00 |
| ML-DSA-44 verify | 127276 cycles | 127639 cycles | 1.00 |
| ML-DSA-65 keypair | 219410 cycles | 219551 cycles | 1.00 |
| ML-DSA-65 sign | 464942 cycles | 466644 cycles | 1.00 |
| ML-DSA-65 verify | 209992 cycles | 210189 cycles | 1.00 |
| ML-DSA-87 keypair | 373933 cycles | 373515 cycles | 1.00 |
| ML-DSA-87 sign | 642871 cycles | 644031 cycles | 1.00 |
| ML-DSA-87 verify | 360443 cycles | 360543 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120124 cycles | 119481 cycles | 1.01 |
| ML-DSA-44 sign | 268999 cycles | 270117 cycles | 1.00 |
| ML-DSA-44 verify | 120179 cycles | 120339 cycles | 1.00 |
| ML-DSA-65 keypair | 206894 cycles | 207331 cycles | 1.00 |
| ML-DSA-65 sign | 432500 cycles | 433348 cycles | 1.00 |
| ML-DSA-65 verify | 198203 cycles | 198268 cycles | 1.00 |
| ML-DSA-87 keypair | 351026 cycles | 351426 cycles | 1.00 |
| ML-DSA-87 sign | 595565 cycles | 595083 cycles | 1.00 |
| ML-DSA-87 verify | 337999 cycles | 338116 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214446 cycles | 214463 cycles | 1.00 |
| ML-DSA-44 sign | 628958 cycles | 628572 cycles | 1.00 |
| ML-DSA-44 verify | 228777 cycles | 228771 cycles | 1.00 |
| ML-DSA-65 keypair | 376155 cycles | 376467 cycles | 1.00 |
| ML-DSA-65 sign | 1011977 cycles | 1011699 cycles | 1.00 |
| ML-DSA-65 verify | 370499 cycles | 370770 cycles | 1.00 |
| ML-DSA-87 keypair | 615981 cycles | 615520 cycles | 1.00 |
| ML-DSA-87 sign | 1356920 cycles | 1355773 cycles | 1.00 |
| ML-DSA-87 verify | 628956 cycles | 629617 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 196062 cycles | 195490 cycles | 1.00 |
| ML-DSA-44 sign | 469150 cycles | 468458 cycles | 1.00 |
| ML-DSA-44 verify | 198553 cycles | 198578 cycles | 1.00 |
| ML-DSA-65 keypair | 349372 cycles | 350052 cycles | 1.00 |
| ML-DSA-65 sign | 766723 cycles | 769817 cycles | 1.00 |
| ML-DSA-65 verify | 328014 cycles | 328702 cycles | 1.00 |
| ML-DSA-87 keypair | 574007 cycles | 573635 cycles | 1.00 |
| ML-DSA-87 sign | 1038975 cycles | 1040820 cycles | 1.00 |
| ML-DSA-87 verify | 562328 cycles | 562639 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138804 cycles | 138763 cycles | 1.00 |
| ML-DSA-44 sign | 392926 cycles | 393228 cycles | 1.00 |
| ML-DSA-44 verify | 146696 cycles | 146558 cycles | 1.00 |
| ML-DSA-65 keypair | 236587 cycles | 236680 cycles | 1.00 |
| ML-DSA-65 sign | 627492 cycles | 628105 cycles | 1.00 |
| ML-DSA-65 verify | 237075 cycles | 236790 cycles | 1.00 |
| ML-DSA-87 keypair | 398147 cycles | 397573 cycles | 1.00 |
| ML-DSA-87 sign | 828379 cycles | 829008 cycles | 1.00 |
| ML-DSA-87 verify | 396970 cycles | 397656 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 132260 cycles | 132229 cycles | 1.00 |
| ML-DSA-44 sign | 386511 cycles | 386680 cycles | 1.00 |
| ML-DSA-44 verify | 140942 cycles | 140904 cycles | 1.00 |
| ML-DSA-65 keypair | 226509 cycles | 226499 cycles | 1.00 |
| ML-DSA-65 sign | 624613 cycles | 625373 cycles | 1.00 |
| ML-DSA-65 verify | 227469 cycles | 227445 cycles | 1.00 |
| ML-DSA-87 keypair | 375517 cycles | 375635 cycles | 1.00 |
| ML-DSA-87 sign | 811770 cycles | 812807 cycles | 1.00 |
| ML-DSA-87 verify | 375381 cycles | 378722 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 214565 cycles | 215403 cycles | 1.00 |
| ML-DSA-44 sign | 629195 cycles | 629588 cycles | 1.00 |
| ML-DSA-44 verify | 228932 cycles | 229804 cycles | 1.00 |
| ML-DSA-65 keypair | 376226 cycles | 376318 cycles | 1.00 |
| ML-DSA-65 sign | 1011213 cycles | 1012301 cycles | 1.00 |
| ML-DSA-65 verify | 370222 cycles | 370692 cycles | 1.00 |
| ML-DSA-87 keypair | 616464 cycles | 615892 cycles | 1.00 |
| ML-DSA-87 sign | 1358962 cycles | 1357262 cycles | 1.00 |
| ML-DSA-87 verify | 630201 cycles | 632135 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 380460 cycles | 379624 cycles | 1.00 |
| ML-DSA-44 sign | 1001498 cycles | 1004065 cycles | 1.00 |
| ML-DSA-44 verify | 397993 cycles | 398120 cycles | 1.00 |
| ML-DSA-65 keypair | 658524 cycles | 659069 cycles | 1.00 |
| ML-DSA-65 sign | 1634484 cycles | 1625403 cycles | 1.01 |
| ML-DSA-65 verify | 637527 cycles | 637846 cycles | 1.00 |
| ML-DSA-87 keypair | 1104235 cycles | 1098780 cycles | 1.00 |
| ML-DSA-87 sign | 2188006 cycles | 2189409 cycles | 1.00 |
| ML-DSA-87 verify | 1086286 cycles | 1086531 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 264475 cycles | 263857 cycles | 1.00 |
| ML-DSA-44 sign | 734976 cycles | 687983 cycles | 1.07 |
| ML-DSA-44 verify | 265297 cycles | 266253 cycles | 1.00 |
| ML-DSA-65 keypair | 492277 cycles | 489525 cycles | 1.01 |
| ML-DSA-65 sign | 1174015 cycles | 1066611 cycles | 1.10 |
| ML-DSA-65 verify | 443088 cycles | 440971 cycles | 1.00 |
| ML-DSA-87 keypair | 773427 cycles | 766135 cycles | 1.01 |
| ML-DSA-87 sign | 1464879 cycles | 1446596 cycles | 1.01 |
| ML-DSA-87 verify | 760362 cycles | 748425 cycles | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 sign | 734976 cycles | 687983 cycles | 1.07 |
| ML-DSA-65 sign | 1174015 cycles | 1066611 cycles | 1.10 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 469899 cycles | 470052 cycles | 1.00 |
| ML-DSA-44 sign | 1778220 cycles | 1777818 cycles | 1.00 |
| ML-DSA-44 verify | 538870 cycles | 539568 cycles | 1.00 |
| ML-DSA-65 keypair | 784129 cycles | 783732 cycles | 1.00 |
| ML-DSA-65 sign | 2814346 cycles | 2812726 cycles | 1.00 |
| ML-DSA-65 verify | 834144 cycles | 833767 cycles | 1.00 |
| ML-DSA-87 keypair | 1270161 cycles | 1278540 cycles | 0.99 |
| ML-DSA-87 sign | 3545617 cycles | 3554561 cycles | 1.00 |
| ML-DSA-87 verify | 1346236 cycles | 1355321 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 297306 cycles | 296045 cycles | 1.00 |
| ML-DSA-44 sign | 1039798 cycles | 945323 cycles | 1.10 |
| ML-DSA-44 verify | 317739 cycles | 318588 cycles | 1.00 |
| ML-DSA-65 keypair | 536213 cycles | 537722 cycles | 1.00 |
| ML-DSA-65 sign | 1510993 cycles | 1509477 cycles | 1.00 |
| ML-DSA-65 verify | 514003 cycles | 514381 cycles | 1.00 |
| ML-DSA-87 keypair | 832452 cycles | 832883 cycles | 1.00 |
| ML-DSA-87 sign | 1957313 cycles | 1950150 cycles | 1.00 |
| ML-DSA-87 verify | 847156 cycles | 848568 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 sign | 1039798 cycles | 945323 cycles | 1.10 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

| Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 952104 cycles | 952452 cycles | 1.00 |
| ML-DSA-44 sign | 3693692 cycles | 3679288 cycles | 1.00 |
| ML-DSA-44 verify | 1079215 cycles | 1079440 cycles | 1.00 |
| ML-DSA-65 keypair | 1574608 cycles | 1573598 cycles | 1.00 |
| ML-DSA-65 sign | 5847661 cycles | 5851330 cycles | 1.00 |
| ML-DSA-65 verify | 1698671 cycles | 1699344 cycles | 1.00 |
| ML-DSA-87 keypair | 2546671 cycles | 2546316 cycles | 1.00 |
| ML-DSA-87 sign | 7303532 cycles | 7248273 cycles | 1.01 |
| ML-DSA-87 verify | 2711419 cycles | 2710426 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

These are basically written from scratch, inspired by the same functions
in mlkem-native.
Resolves #257

Signed-off-by: Matthias J. Kannwischer <[email protected]>
@@ -44,15 +47,20 @@ static int cmp_uint64_t(const void *a, const void *b)

 static int bench(void)
 {
-  int32_t data0[256];
+  MLD_ALIGN int32_t data0[256];
+  MLD_ALIGN int32_t data1[MLDSA_K * 256];
Contributor

This should be MLDSA_L? I'm surprised that no valgrind test is failing here.

Contributor Author

K is never smaller than L (K = L for ML-DSA-44, K > L for ML-DSA-65 and ML-DSA-87), so it is not surprising that this does not fail.

Contributor

Ha, true.
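
The exchange above can be checked with a small sketch. The (K, L) pairs are the FIPS 204 parameters; the struct, table, and helper names are hypothetical and only illustrate why a buffer sized with `MLDSA_K * 256` never under-allocates for a vector of L polynomials.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_N 256 /* coefficients per polynomial */

/* (K, L) per ML-DSA parameter set, as specified in FIPS 204. */
typedef struct
{
  const char *name;
  unsigned k;
  unsigned l;
} sketch_params;

static const sketch_params SKETCH_PARAMS[3] = {
    {"ML-DSA-44", 4, 4},
    {"ML-DSA-65", 6, 5},
    {"ML-DSA-87", 8, 7},
};

/* Bytes occupied by a vector of `len` polynomials of int32_t coefficients. */
static size_t sketch_polyvec_bytes(unsigned len)
{
  return (size_t)len * SKETCH_N * sizeof(int32_t);
}

/* Returns 1 if a K-sized buffer can hold an L-sized vector for every
 * parameter set, 0 otherwise. */
static int sketch_k_buffer_covers_l(void)
{
  size_t i;
  for (i = 0; i < sizeof(SKETCH_PARAMS) / sizeof(SKETCH_PARAMS[0]); i++)
  {
    if (sketch_polyvec_bytes(SKETCH_PARAMS[i].k) <
        sketch_polyvec_bytes(SKETCH_PARAMS[i].l))
    {
      return 0;
    }
  }
  return 1;
}
```

Since the buffer is only ever too large, never too small, a memory checker has no out-of-bounds access to report; sizing it with `MLDSA_L` would still be the more precise choice.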

add a3_ptr, a0_ptr, #(3 * 1024)
add a4_ptr, a0_ptr, #(4 * 1024)
add a5_ptr, a4_ptr, #(1 * 1024)
add a6_ptr, a5_ptr, #(1 * 1024)
Contributor

Nit: Use a4_ptr as the base here (a6_ptr = a4_ptr + 2 * 1024) to avoid prolonging the dependency chain
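
A short sketch of the offset arithmetic behind this nit. AArch64 `ADD (immediate)` takes a 12-bit unsigned immediate, optionally shifted left by 12, so not every multiple of 1024 can be added to `a0_ptr` in one instruction; that is one plausible reason (an assumption, not stated in the PR) why the later pointers are derived from `a4_ptr`. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_POLY_BYTES 1024u /* 256 coefficients * 4 bytes */

/* 1 if `imm` fits the AArch64 ADD (immediate) encoding: a 12-bit unsigned
 * value, optionally shifted left by 12 bits. */
static int sketch_add_imm_encodable(uint32_t imm)
{
  return imm <= 0xFFFu || ((imm & 0xFFFu) == 0 && (imm >> 12) <= 0xFFFu);
}

/* Byte offsets of polynomials 4, 5 and 6 relative to the base pointer,
 * derived as in the suggested variant: 5 and 6 both hang off 4, so each is
 * two dependent adds away from the base instead of up to three. */
static void sketch_offsets_from_p4(uint32_t off[3])
{
  uint32_t p4 = 4 * SKETCH_POLY_BYTES; /* add a4_ptr, a0_ptr, #(4 * 1024) */
  assert(sketch_add_imm_encodable(4 * SKETCH_POLY_BYTES));
  off[0] = p4;
  off[1] = p4 + 1 * SKETCH_POLY_BYTES; /* add a5_ptr, a4_ptr, #(1 * 1024) */
  off[2] = p4 + 2 * SKETCH_POLY_BYTES; /* add a6_ptr, a4_ptr, #(2 * 1024) */
}
```

Under this encoding rule, `3 * 1024` and `4 * 1024` are valid immediates while `5 * 1024` and `6 * 1024` are not, and `2 * 1024` relative to `a4_ptr` is valid, so the suggested form costs no extra instruction.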

Contributor

@hanno-becker hanno-becker left a comment

Thanks @mkannwischer! I support this change -- there should also be a small speedup from using SLOTHY on the code.

Left some comments

add b3_ptr, b0_ptr, #(3 * 1024)
add b4_ptr, b0_ptr, #(4 * 1024)
add b5_ptr, b4_ptr, #(1 * 1024)
add b6_ptr, b5_ptr, #(1 * 1024)
Contributor

Nit: Use b4_ptr as the base here (b6_ptr = b4_ptr + 2 * 1024) to avoid prolonging the dependency chain

#if defined(MLD_USE_NATIVE_POLYVECL_POINTWISE_ACC_MONTGOMERY)

#if MLDSA_L == 4
/*************************************************
Contributor

Nit: Indentation

Successfully merging this pull request may close these issues.

Add Neon polyvecl_pointwise_acc_montgomery