Skip to content

What is the correct way of resetting the heap when using "stub" or "minimal" runtime? #2736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
UstymUkhman opened this issue Aug 31, 2023 · 10 comments
Labels

Comments

@UstymUkhman
Copy link

Question

First of all, I'd like to thank you for maintaining this awesome project. I've really enjoyed working with AssemblyScript, great job!

After implementing what I thought was a perfect algorithm for WASM, I realized that the result was a lot slower than it's counter part written in TypeScript. I made some research and tested several tricks aimed to improve performance. As mentioned here and here some things did help. The unchecked() function most of them all, but I do use @inline whenever it makes sense and always cache arrays' length and use StaticArrays (even though it does not make much difference in my tests comparing to Float32Arrays or Uint8ClampedArrays, I decided to stick with it in order to avoid missing some optimization technique). I compile with --optimize -O3 and converge and noAssert are both set to true. Unfortunately none of that helped and the AS version is running ~10 times slower than TS counterpart.

Then I've read this fantastic article and it seamed like one last thing to do is to play with runtime options. Indeed using minimal helped a bit. For instance, TS version takes on average 0.08 seconds to render a frame, incremental runtime takes ~0.55 seconds, minimal takes ~0.31 and with stub it's about 0.17 seconds. Now, I know that with minimal & stub runtimes I'm supposed to clean memory manually and according to this, "free all memory at once" would probably work best for me, but I'm not sure how to do that since there is a heap.reset() method in documentation but it seems to be missing in the APIs. Maybe it's deprecated and I'm supposed to use only heap.free(ptr: usize), but I'm not sure how that works? How can I get the ptr to my StaticArray let's say..?

Another possible optimization I was thinking about is to use a less OO approach (maybe starting from Vector3 class) since it seems to me like WASM doesn't play nice with objects so maybe it's worth testing with raw functions even though the original idea was to mimic as much as possible TS version and I don't even know if that might help, maybe you guys have some tips on that?

Sorry if that was too long, I'll leave some more info here if someone has time to take a look and hopefully suggest some further optimization, it would be much appreciated, thanks a lot!

@MaxGraey
Copy link
Member

Hi! AssemblyScript can't optimize simple structures like Vector3 or Color via scalar replacement pass and always use mem allocations in this scenarios. So in critical parts like loops better use scalars manually. For example here instead

this.color.set(
  unchecked(pixels[p    ]),
  unchecked(pixels[p + 1]),
  unchecked(pixels[p + 2])
);

this.color.add(ray.getColor(ray, this.world.objects, this.depth));

unchecked(pixels[p    ] = this.color.xf32);
unchecked(pixels[p + 1] = this.color.yf32);
unchecked(pixels[p + 2] = this.color.zf32);

better to use:

let objects =  this.world.objects;
let depth = this.depth;

for ... {
  for ... {
    let r = unchecked(pixels[p + 0]);
    let g = unchecked(pixels[p + 1]);
    let b = unchecked(pixels[p + 2]);

    let color = ray.getColor(ray, objects, depth);
    r += color.r; 
    g += color.g;
    b += color.b;
    
    unchecked(pixels[p + 0] = r);
    unchecked(pixels[p + 1] = g);
    unchecked(pixels[p + 2] = b);
  }
}

Also note: you can use global asc --uncheckedBehavior always flag instead series of unchecked(...).

I hope this helps.

@UstymUkhman
Copy link
Author

Thanks for the hints! I will definitely check them out, especially scalars whenever possible. ❤️
So basically, what you're saying is that it's better to stick with default incremental runtime and work towards improving performance in this way? Otherwise, if there's a possibility to squeeze some more loops per second by managing memory manually, I would very much like to know how to do so. Thanks again!

@CountBleck
Copy link
Member

The best way to increase performance is to minimize allocations. You can also try to use SIMD where it makes sense.

Also, in the minimal runtime, you should call __collect() from the JS side at a time when a delay is acceptable, and heap.reset() is only for the stub runtime IIRC.

@UstymUkhman
Copy link
Author

Thanks guys! With (a lot) of refactoring, tricks from @MaxGraey and a __collect() call with minimal runtime as suggested by @CountBleck, I was able to run the whole thing at performance level I expected. Now it takes more or less time then the original implementation and sometimes runs even faster. Probably there's room for other optimizations, but for now I'm happy with the result.

By the way, I've also tried to use stub and call heap.reset(). Didn't worked for me 'cause I have some states that should be preserved between frames, but the point is that, as I mentioned above, this method is missing in the APIs. Weird thing, even with the following error, asc did compile and called this method, so I'm guessing this might be a bug related to available / exported types.

heap_reset

@CountBleck
Copy link
Member

Yeah, sometimes // @ts-ignore is your friend.

@MaxGraey
Copy link
Member

MaxGraey commented Sep 1, 2023

Next simple optimization could be replacing all Math (f64) methods to Mathf (f32). Like Math.abs -> Mathf.abs and etc

@UstymUkhman
Copy link
Author

@CountBleck Yeah, but shouldn't it be exported in that interface in the first place?

@MaxGraey Thanks, I guess I'm already doing it. The only function I need for my f32 numbers is clamp, so I have 2 versions of it: one with Math and another one with Mathf with respective min and max methods. 🙂

Or do you mean getting rid of f64 completely? Interesting.., my guess would be that it will reduce memorie usage, but I didn't know it might help with performance as well.

@MaxGraey
Copy link
Member

MaxGraey commented Sep 1, 2023

Or do you mean getting rid of f64 completely?

Yes. Switching to f32 instead f64 and Mathf instead Math everywhere if it's possible and don't cause visual result to degradation.

I guess I'm already doing it

I didn't see this. For example Ray.ts still uses f64
https://github.com/UstymUkhman/ray-tracing/blob/main/assembly/Ray.ts

@UstymUkhman
Copy link
Author

Yeah, I thought you were talking about using Mathf just for i32 numbers already in the program. If switching all floats to use 32 bits might increase performance (per frame), I will definitely try that out! Thanks again! Really appreciate your help.

@MaxGraey
Copy link
Member

MaxGraey commented Sep 1, 2023

Also, instead StaticArray<f64>(3) better to use x, y, z scalar f32 values:
https://github.com/UstymUkhman/ray-tracing/blob/main/assembly/utils/Vector3.ts#L5

Before:

export default class Vector3
{
  private readonly vec: StaticArray<f64> = new StaticArray<f64>(3);

  public constructor (x: f64 = 0.0, y: f64 = x, z: f64 = x) {
    this.vec[0] = x;
    this.vec[1] = y;
    this.vec[2] = z;
  }
  ...
}

After:

export class Vector3 {
   constructor(
     public x: f32 = 0,
     public y: f32 = 0,
     public z: f32 = 0,
   ) {}
   
   ...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants