Skip to content

double.remainder() performance issues? #55479

Closed
@JosefWN

Description

@JosefWN

Hello everyone! Maybe I'm doing something wrong here, but I'm experiencing a pretty heavy performance hit with double.remainder() on ARM64 (M3) in Dart 3.3.3 on macOS.

As per the documentation, a.remainder(b) is mathematically equivalent to:

double remainder(final double a, final num b) => a - (a ~/ b) * b;

But when I compare the two, the custom implementation is somewhere between one and two orders of magnitude faster:

class RemainderBenchmark extends BenchmarkBase {
  const RemainderBenchmark() : super('Remainder');

  @override
  void run() {
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        final _ = i.remainder(j);
      }
    }
  }
}

class RemainderCustomBenchmark extends BenchmarkBase {
  const RemainderCustomBenchmark() : super('RemainderCustom');

  @override
  void run() {
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        final _ = remainder(i, j);
      }
    }
  }
}

AOT (dart compile exe):

RemainderCustom(RunTime): 38108.96153846154 us.
Remainder(RunTime): 478124.0 us. <- About 13x slower!

JIT (dart run):

RemainderCustom(RunTime): 9825.791469194313 us.
Remainder(RunTime): 507507.0 us. <- About 52x slower!

Any ideas why this is the case?

Activity

lrhn

lrhn commented on Apr 16, 2024

@lrhn
Member

Similar results on (..checks...) intel CPU.

An up to 40x slowdown suggests to me that the code is either using a very inefficient assembler instruction, or it's calling into the C runtime for each .remainder call. Even using % is faster, but still much slower than the custom version.

Or maybe remainder on doubles is just amazingly expensive to do correctly, and the approximation won't give the same result in all cases. (And if that's OK, it's a good optimization.)

I do think double.remainder is compiled as a runtime call. The code for it is:

  double remainder(num other) {
    return _remainder(other.toDouble());
  }

  @pragma("vm:external-name", "Double_remainder")
  external double _remainder(double other);

which means calling a function in double.cc. There could be an intrinsified version that replaces this with generated assembler, but I can't find it.

added
area-vmUse area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
type-performanceIssue relates to performance or code size
on Apr 16, 2024
JosefWN

JosefWN commented on Apr 16, 2024

@JosefWN
Author

An up to 40x slowdown suggests to me that the code is either using a very inefficient assembler instruction, or it's calling into the C runtime for each .remainder call. Even using % is faster, but still much slower than the custom version.

Yes, I was also thinking it could be some form of call overhead, it seems unlikely that disastrous assembler is produced across multiple platforms? I actually re-implemented % as well, but it is quite performant on ARM64. I didn't manage to beat it...

Or maybe remainder on doubles is just amazingly expensive to do correctly, and the approximation won't give the same result in all cases. (And if that's OK, it's a good optimization.)

I think it's not a mathematical approximation, but an equivalence. I also considered that the two could be different numerically, but they seem to produce identical results over a pretty large number of inputs, i.e. i.remainder(j) == remainder(i, j) holds, in accordance with the docs (although i.remainder(j) produces a more straight-forward exception when j = 0). Of course, there could be non-trivial edge cases I have overlooked.

a-siva

a-siva commented on Apr 17, 2024

@a-siva
Contributor
added
P2A bug or feature request we're likely to work on
on Apr 17, 2024
self-assigned this
on Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

P2A bug or feature request we're likely to work onarea-vmUse area-vm for VM related issues, including code coverage, and the AOT and JIT backends.triagedIssue has been triaged by sub teamtype-performanceIssue relates to performance or code size

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @lrhn@a-siva@JosefWN@alexmarkov

      Issue actions

        `double.remainder()` performance issues? · Issue #55479 · dart-lang/sdk