Function is 3X slower than regular JS #1184

Closed · nischayv opened this issue Mar 25, 2020 · 10 comments

@nischayv

The following function is roughly 3x slower for me than regular JS for large values. Am I doing something wrong, or is there something I can do to speed it up? Any kind of help is appreciated!

function map_page_rank(pages: Array<Int32Array>, page_ranks: Float64Array, maps: Array<Float64Array>, noutlinks: Int32Array, n: i32): void {
  const t1 = performanceNow();
  for (let i = 0; i < n; ++i) {
    const outbound_rank = unchecked(page_ranks[i]) / unchecked(noutlinks[i]);
    for (let j = 0; j < n; ++j) {
      if (unchecked(pages[i][j]) === 0) {
        unchecked(maps[i][j] = 0);
      } else {
        unchecked(maps[i][j] = unchecked(pages[i][j]) * outbound_rank);
      }
    }
  }
  const t2 = performanceNow();
  consoleLog(((t2 - t1) / 1000).toString());
}
@jtenner
Contributor

jtenner commented Mar 25, 2020

Using unchecked() will get you some of the way, but it does appear that JS is good at doing JS things.

You can probably speed up your code a lot by using StaticArray<T> instead of Array<T>, Int32Array, and Float64Array.

If pages must be resized, you will need a regular Array instead.
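
For reference, a rough sketch of what the signature could look like with that substitution (just a sketch, with the element types carried over from your snippet; the loop body stays the same):

function map_page_rank(
  pages: StaticArray<StaticArray<i32>>, // was Array<Int32Array>
  page_ranks: StaticArray<f64>,         // was Float64Array
  maps: StaticArray<StaticArray<f64>>,  // was Array<Float64Array>
  noutlinks: StaticArray<i32>,          // was Int32Array
  n: i32
): void {
  // same loop body as above, using unchecked() accesses
}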

@jtenner
Contributor

jtenner commented Mar 25, 2020

  for (let i = 0; i < n; ++i) {
    let outbound_rank = unchecked(page_ranks[i]) / unchecked(noutlinks[i]);
    let map = unchecked(maps[i]); // this will speed you up in the inner for loop
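
Spelled out against the full function from the first post (a sketch; the inner-loop body is carried over from the original snippet, with both row lookups hoisted out of the inner loop):

for (let i = 0; i < n; ++i) {
  let outbound_rank = unchecked(page_ranks[i]) / unchecked(noutlinks[i]);
  let page = unchecked(pages[i]); // hoisted row reference
  let map = unchecked(maps[i]);   // hoisted row reference
  for (let j = 0; j < n; ++j) {
    if (unchecked(page[j]) === 0) {
      unchecked(map[j] = 0);
    } else {
      unchecked(map[j] = unchecked(page[j]) * outbound_rank);
    }
  }
}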

@dcodeIO
Member

dcodeIO commented Mar 25, 2020

I'd imagine two reasons for the slowdown there. One is the level of indirection of normal and typed arrays, as mentioned by jtenner (using StaticArray can help there), and the other is reference-counting overhead inside the loop. For instance, let map = unchecked(maps[i]) will retain a reference to the i'th Float64Array within maps, use it, and release it again when continuing or breaking.

In general it's fair to say that AssemblyScript isn't super fast when it comes to idiomatic TypeScript code like in this snippet, and it will require two things to get better at it: Wasm GC, so we can get rid of the reference-counting overhead, and an optimization pass to directize array accesses (for example by representing arrays as multi-value structs, with one value being a direct pointer to the backing buffer).
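
To illustrate the reference-counting point (a simplified picture, not actual compiler output; these calls are inserted automatically by the compiler under the current reference-counting runtime):

for (let i = 0; i < n; ++i) {
  let map = unchecked(maps[i]); // compiler inserts a __retain of the row reference here
  // ... inner loop works on `map` ...
  // when the iteration ends or the loop breaks, the compiler inserts the matching __release
}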

@nischayv
Author

Thanks, this helped a lot! let map = unchecked(maps[i]) had a much bigger impact on performance than StaticArray did, but both improved it!

@nischayv
Author

I had a follow-up question with a similar issue. I'm transposing a matrix, and in the inner loop I need to access the j'th element as matrix[j][i], so I can't hold on to a single row reference for the inner loop like in the situation above. Is there anything else that can be done to speed it up? The exact code is below. I'm working on building some benchmarks to see how AssemblyScript-compiled Wasm performs compared to JS for scientific programming.

class PolarArray {
  constructor(
    public r: Float64Array,
    public i: Float64Array
  ) { }
}

function transpose(m: StaticArray<PolarArray>): void {
  let tempr: f64, tempi: f64;
  const N = m.length;

  for (let i = 0; i < N; ++i) {
    let r_array = unchecked(m[i].r);
    let i_array = unchecked(m[i].i);

    for (let j = 0; j < i; ++j) {
      tempr = unchecked(r_array[j]);
      tempi = unchecked(i_array[j]);

      unchecked(r_array[j] = m[j].r[i]);
      unchecked(i_array[j] = m[j].i[i]);

      unchecked(m[j].r[i] = tempr);
      unchecked(m[j].i[i] = tempi);
    }
  }
}

nischayv reopened this Mar 25, 2020
@dcodeIO
Member

dcodeIO commented Mar 25, 2020

Looks like the ideal thing to do there is to map the multi-dimensional StaticArray<PolarArray> to a one-dimensional StaticArray<Float64Array> with two subsequent elements per entry.
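
A sketch of what that flattening could look like (assumptions: the flat array stores row k's real parts at index 2 * k and its imaginary parts at 2 * k + 1; transposeFlat is just a hypothetical name):

function transposeFlat(flat: StaticArray<Float64Array>): void {
  const N = flat.length >> 1; // two Float64Array entries (r, i) per matrix row
  for (let i = 0; i < N; ++i) {
    let ri = unchecked(flat[2 * i]);     // row i, real parts
    let ii = unchecked(flat[2 * i + 1]); // row i, imaginary parts
    for (let j = 0; j < i; ++j) {
      let rj = unchecked(flat[2 * j]);
      let ij = unchecked(flat[2 * j + 1]);
      let tempr = unchecked(ri[j]);
      let tempi = unchecked(ii[j]);
      unchecked(ri[j] = rj[i]);
      unchecked(ii[j] = ij[i]);
      unchecked(rj[i] = tempr);
      unchecked(ij[i] = tempi);
    }
  }
}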

@MaxGraey
Member

MaxGraey commented Mar 26, 2020

Also try caching all the row references:

function transpose(m: StaticArray<PolarArray>): void {
  let tempr: f64, tempi: f64;
  const N = m.length;
  
  for (let i = 0; i < N; ++i) {
    let mi   = unchecked(m[i]);
    let mi_r = mi.r;
    let mi_i = mi.i;

    for (let j = 0; j < i; ++j) {
      let mj   = unchecked(m[j]);
      let mj_r = mj.r;
      let mj_i = mj.i;

      tempr = unchecked(mi_r[j]);
      tempi = unchecked(mi_i[j]);

      unchecked(mi_r[j] = mj_r[i]);
      unchecked(mi_i[j] = mj_i[i]);

      unchecked(mj_r[i] = tempr);
      unchecked(mj_i[i] = tempi);
    }
  }
}

@nischayv
Author

Caching all the elements helped a bit, but the best solution is probably turning it into a 1D array. Before I close the issue, @dcodeIO, could you explain this statement a bit more: "an optimization pass to directize array accesses (for example by representing arrays as multi-value structs, with one value being a direct pointer to the backing buffer)"? How are arrays represented currently, then?

Also, regarding the reference-counting overhead, just to be clear: is the reference to the i'th element released after each iteration of j? Appreciate the help!

@dcodeIO
Member

dcodeIO commented Apr 13, 2020

How are arrays being represented currently then?

Unless you are using StaticArray, the representation currently involves one level of indirection when accessing an array element. This leads to two loads on each access: Array#buffer first, then the element from that buffer. Lowering the array struct to multiple values (once we have multi-value support) and passing these around instead of just a pointer to the array might help to get rid of the indirection, but whether this has caveats in actual Wasm engines still needs to be investigated.
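
Roughly, the indirection being described looks like this (field names and layout here are illustrative, not the exact std Array layout):

// Illustrative only: accessing arr[i] through an array header costs two loads,
// one for the buffer pointer and one for the element itself.
class ArrayLikeHeader {
  buffer: usize; // pointer to the backing buffer's data
  length: i32;
}
// arr[i] is then roughly:
//   let buf = arr.buffer;                 // load 1: the buffer pointer
//   let val = load<f64>(buf + (i << 3));  // load 2: the element itself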

is the reference to the ith element released after each iteration of j?

I might not be understanding the question fully, but references are not re-retained when entering an inner code block. So, whatever is referenced in the outer block remains retained during execution of the inner block. Perhaps looking at the generated text format will give a few hints about what exactly is happening there. The instructions to look out for are calls to __retain and __release, which modify reference counts. There are usually fewer of those after optimizations, because there is a pass that eliminates unnecessary ones.

@nischayv
Author

I just misunderstood. I looked at the array code and it makes sense now. Thanks for the help! Closing this issue.
