Vector2.clone() performance hit for x86-64 and ARM64? #319
This is using […]. The benchmarks seem to get scheduled on the performance cores of the machine, judging by the load pattern in the CPU history in Activity Monitor, and those cores should execute 128-bit NEON/SIMD instructions more efficiently than the efficiency cores. Just initializing […]

```dart
import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:vector_math/vector_math_64.dart'; // or vector_math.dart

class Vector2CloneBenchmark extends BenchmarkBase {
  const Vector2CloneBenchmark() : super('CloneVector2');

  @override
  void run() {
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        final _ = Vector2(j, i).clone();
      }
    }
  }
}

void main() => const Vector2CloneBenchmark().report();
```

AOT:
It could be completely unrelated, but the other day I observed another heavy performance hit for the […]

EDIT: I compiled and ran the benchmark (AOT) on x86-64, and it seems to be an issue there too […]
(Title changed from "Vector2 performance hit on ARM64?" to "Vector2.clone() performance hit for x86-64 and ARM64?")
This should probably be filed as a Dart VM bug, since there is nothing we can do about it here.
Guys, you are writing these benchmarks completely incorrectly!

```dart
// ignore_for_file: avoid_print
import 'package:benchmark_harness/benchmark_harness.dart';
import 'package:vector_math/vector_math_64.dart';

// dart run benchmarks/iterator_benchmark.dart
void main() {
  (<BenchmarkBase>[
    const Vector2CloneBenchmark$Bad(),
    const Vector2CloneBenchmark$Good(),
  ].map<({String name, double us})>(_measure).toList(growable: false)
        ..sort((a, b) => a.us.compareTo(b.us)))
      .map<String>((e) => 'Benchmark ${e.name}: ${e.us.toStringAsFixed(2)} us')
      .forEach(print);
}

({String name, double us}) _measure(BenchmarkBase benchmark) =>
    (name: benchmark.name, us: benchmark.measure());

/// The cloned vector is never used, so the loop body has no observable effect.
class Vector2CloneBenchmark$Bad extends BenchmarkBase {
  // Raw string: in a normal string literal, `$Bad` would be parsed as
  // interpolation of an undefined identifier `Bad`.
  const Vector2CloneBenchmark$Bad() : super(r'CloneVector2$Bad');

  @override
  void run() {
    for (var i = -500.0; i <= 500; i += 0.75) {
      for (var j = -500.0; j <= 500; j += 0.75) {
        final _ = Vector2(j, i).clone();
      }
    }
  }
}

/// The last cloned vector is checked after the loop, so the work is observable.
class Vector2CloneBenchmark$Good extends BenchmarkBase {
  const Vector2CloneBenchmark$Good() : super(r'CloneVector2$Good');

  @override
  void run() {
    Vector2? vec;
    for (var i = -500.0; i <= 500; i += 0.75) {
      for (var j = -500.0; j <= 500; j += 0.75) {
        vec = Vector2(j, i).clone();
      }
    }
    if (vec == null) throw StateError('Vector2 is null');
  }
}
```

This is why you get strange results.
This is not about the processor architecture; it is about compiler optimization. The way the benchmark is written tells the compiler that all the code inside the loop can simply be thrown away, since its result is never used. At the very least, always check the result so the compiler cannot optimize the work away. The same applies to the benchmarks in the /benchmark directory.
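The "always check the result" advice can be shown without any packages. In this minimal sketch, a 2-element `Float64List` stands in for `Vector2`, and `makeAndCopy` is a hypothetical helper approximating `Vector2(x, y).clone()`; both are assumptions made only to keep the example dependency-free. `discarding()` mirrors the `$Bad` pattern: nothing observable depends on the loop, so an optimizing compiler is allowed to drop it (whether it actually does depends on the compiler and mode). `consuming()` goes a step further than the null check in `$Good` by folding every result into a checksum that the caller prints.

```dart
import 'dart:typed_data';

/// Hypothetical stand-in for `Vector2(x, y).clone()`: builds a 2-element
/// Float64List, then copies it into a fresh one (the "clone" step).
Float64List makeAndCopy(double x, double y) {
  final original = Float64List(2)
    ..[0] = x
    ..[1] = y;
  return Float64List.fromList(original);
}

/// `$Bad` pattern: results are thrown away, so the loop has no observable
/// effect and the compiler is free to eliminate it.
void discarding() {
  for (var i = -500.0; i <= 500; i += 0.75) {
    for (var j = -500.0; j <= 500; j += 0.75) {
      makeAndCopy(j, i);
    }
  }
}

/// Consuming pattern: every result feeds a checksum that the caller uses.
double consuming() {
  var checksum = 0.0;
  for (var i = -500.0; i <= 500; i += 0.75) {
    for (var j = -500.0; j <= 500; j += 0.75) {
      final v = makeAndCopy(j, i);
      checksum += v[0] + v[1];
    }
  }
  return checksum;
}

void main() {
  final sw = Stopwatch()..start();
  discarding();
  final discardUs = sw.elapsedMicroseconds;
  sw.reset(); // resets the count to zero; the stopwatch keeps running
  final checksum = consuming();
  final consumeUs = sw.elapsedMicroseconds;
  // Printing the checksum is what makes the second loop's work observable.
  print('discarding: $discardUs us, consuming: $consumeUs us, '
      'checksum: $checksum');
}
```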
@PlugFox this doesn't seem related to this issue. I think you should open a separate issue for it.
@spydon I can fix the benchmarks and open a new PR. Is that okay?
Absolutely!
Hello! I'm experiencing a pretty heavy performance hit with additions using `Vector2` on ARM64 (M3) in Dart 3.3.3 on macOS. I haven't tested other arithmetic operations.

For comparison, I have used `math.Point` and a custom `PointDouble` to avoid potential issues with generics and monomorphization (as detailed in dart-lang/sdk#53912):

Where `PointDouble` is:

AOT (`dart compile exe`):

JIT (`dart run`):
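The comparison between `Vector2` and a two-field point class can be sketched with stdlib-only types. `FieldPoint`, `ListPoint`, `sumFields`, and `sumLists` are hypothetical names: `FieldPoint` is not the issue's `PointDouble` (whose definition is not preserved here), and backing `ListPoint` with a `Float64List` is an assumption about how `Vector2` stores its components. Each `+` allocates a new point, and both functions return the final sum so the loops stay observable.

```dart
import 'dart:typed_data';

/// Hypothetical point type with two plain `double` fields.
class FieldPoint {
  final double x, y;
  const FieldPoint(this.x, this.y);

  FieldPoint operator +(FieldPoint other) =>
      FieldPoint(x + other.x, y + other.y);
}

/// Hypothetical point type backed by a `Float64List`; this layout is an
/// assumption about how `Vector2` stores its components.
class ListPoint {
  final Float64List storage;
  ListPoint(double x, double y) : storage = Float64List(2) {
    storage[0] = x;
    storage[1] = y;
  }

  double get x => storage[0];
  double get y => storage[1];

  ListPoint operator +(ListPoint other) => ListPoint(x + other.x, y + other.y);
}

/// Adds a constant step [n] times; each `+` allocates a new FieldPoint.
double sumFields(int n) {
  var acc = const FieldPoint(0, 0);
  const step = FieldPoint(0.5, 0.25);
  for (var k = 0; k < n; k++) {
    acc = acc + step;
  }
  return acc.x + acc.y; // returned so the loop's work is observable
}

/// Same workload, but each `+` allocates a new Float64List-backed point.
double sumLists(int n) {
  var acc = ListPoint(0, 0);
  final step = ListPoint(0.5, 0.25);
  for (var k = 0; k < n; k++) {
    acc = acc + step;
  }
  return acc.x + acc.y;
}

void main() {
  const n = 10000000;
  final cases = <(String, double Function(int))>[
    ('fields', sumFields),
    ('Float64List', sumLists),
  ];
  for (final (name, run) in cases) {
    final sw = Stopwatch()..start();
    final result = run(n);
    sw.stop();
    print('$name: ${sw.elapsedMilliseconds} ms (result $result)');
  }
}
```

Running it both with `dart run` (JIT) and as a `dart compile exe` binary (AOT) would mirror the two modes compared in the issue.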