
Parallel indexed iterator #279


Closed · msiglreith opened this issue Mar 21, 2017 · 14 comments

@msiglreith (Contributor)

IndexedIter and IndexedIterMut should have parallel iterator support.

Often it's desirable to have the element coordinates in addition to the element itself, for example to access fields in other arrays. Example use case (part of a fluid simulation code):

div.indexed_iter_mut().for_each(|((y, x), div)| {
    *div = -(vel.x[(y, x+1)] - vel.x[(y, x)] + vel.y[(y+1, x)] - vel.y[(y, x)]);
});

I quickly tried to implement it myself, similar to the AxisIter implementation, but I'm unsure how to properly split the iterator.

@bluss (Member) commented Mar 21, 2017

Right. The indexed iterator is already slow as it is, I fear.

@mokasin commented Mar 23, 2017

Related to that, I'm wondering how to compute something like a gradient on a field efficiently in parallel. That means I need to assign to every array element a value that depends on the values at neighbouring indices. I tried this very naive (silly) approach, simplified here to 1D:

extern crate rayon;
extern crate ndarray;

use rayon::prelude::*;
use ndarray::Array;

fn main() {
    let a = Array::from_elem(10, 1);
    let mut b = Array::zeros(10);

    (1usize..8)
        .into_par_iter()
        .map(|i|
             b[i] = a[i+1] - a[i-1]
        );

    println!("array: {:?}", b);
}

which fails miserably:

error[E0387]: cannot borrow data mutably in a captured outer variable in an `Fn` closure
  --> src/main.rs:14:14
   |
14 |              b[i] = a[i+1] - a[i-1]
   |              ^
   |
help: consider changing this closure to take self by mutable reference
  --> src/main.rs:13:14
   |
13 |           .map(|i|
   |  ______________^ starting here...
14 | |              b[i] = a[i+1] - a[i-1]
   | |___________________________________^ ...ending here

I'd like to do something like this for an n-dimensional array. Setting aside the point that this might split in a very cache-unfriendly way: how can I do that?

@bluss (Member) commented Mar 23, 2017

Right, and I would shoot for vectorization (SIMD parallelization) before thread-based parallelization.

@mokasin We want to add a parallel version of array zips, which should address this on a basic level. For 1D we can of course use collect_into on a Vec and then make an array, since Vec → Array conversion has no allocation/copy cost.
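
For illustration, here is a minimal 1D sketch of that route (a sketch only, assuming rayon and ndarray as in the example above): compute the gradient into a Vec in parallel, then turn the Vec into an Array1, which reuses the allocation.

extern crate rayon;
extern crate ndarray;

use rayon::prelude::*;
use ndarray::Array1;

fn main() {
    let a = Array1::from_elem(10, 1.0_f64);

    // Each output element only reads from `a`, so the closure needs no mutable
    // captures and a plain parallel map works.
    let grad: Vec<f64> = (1..a.len() - 1)
        .into_par_iter()
        .map(|i| a[i + 1] - a[i - 1])
        .collect();

    // Vec -> Array1 conversion reuses the Vec's allocation; no copy is made.
    let b = Array1::from(grad);
    println!("gradient: {:?}", b);
}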

@bluss (Member) commented Mar 23, 2017

The link here is not about threading, but it shows an example that can inspire a solution that computes the gradients using array-wide operations. That will at least achieve some limited degree of SIMD parallelization.

https://bluss.github.io/rust-ndarray/master/ndarray/macro.s.html

@bluss (Member) commented Mar 28, 2017

The n-ary Zip work brings some new perspectives. It's not very simple to split an index iterator, but we can split a producer (think of it like an IntoIterator: it doesn't have the problem of a partially iterated state).
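
As a rough illustration of that difference (a sketch, not the actual NdProducer machinery): an array view carries no iteration state, so it can be split cleanly in two, and each half could then be handed off to a separate task.

extern crate ndarray;

use ndarray::{Array2, Axis};

fn main() {
    let a = Array2::<f64>::zeros((8, 8));
    let view = a.view();

    // A view has no partially iterated state, so splitting it along an axis
    // simply yields two smaller views.
    let mid = view.len_of(Axis(0)) / 2;
    let (top, bottom) = view.split_at(Axis(0), mid);

    println!("top: {:?}, bottom: {:?}", top.dim(), bottom.dim());
}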

@bluss (Member) commented Mar 31, 2017

Ok, Zip now exists.

@msiglreith Let's try the idea that we don't need indices! The mapping x, y → y, x is a transpose, so that's easy. And with some slicing we can get both the current and the next element along an axis. Something like the following code. Maybe one should double check that the computation is correct, and also think hard about how to handle borders.

One can play with the memory layouts here to make it perform well. div and vx, vy need to have opposite layouts since we access vx, vy transposed. I picked F order for vx, vy, so that the sliced, transposed arrays have contiguous rows, which is the case ndarray prefers to optimize for (it will be more open-minded in the future).

Zip parallelization is basically ready, but needs rayon-git: #288 (it's possible I'll merge this soon enough, but I can't publish before rayon has a new version). Using parallelization is as simple as switching from apply to par_apply, once ndarray-parallel has the changes; a sketch of that follows the example below.

extern crate rand;
#[macro_use(s, azip)]
extern crate ndarray;
extern crate ndarray_rand;

use ndarray::prelude::*;
use ndarray_rand::RandomExt;
use rand::distributions::Range;
use ndarray::Zip;

fn main() {
    type Arrayf64<D> = Array<f64, D>;
    const N: usize = 10;
    let vx = Arrayf64::random((N, N).f(), Range::new(0., 100.));
    let vy = Arrayf64::random((N, N).f(), Range::new(0., 100.));
    println!("{:8.3}", vx);
    println!("{:8.3}", vy);
    let mut div = Arrayf64::zeros((N - 1, N - 1));

    // Pair each element of div with the current and shifted elements of vx/vy;
    // .t() transposes the slices to match div's layout.
    Zip::from(&mut div)
        .and(vx.slice(s![..-1, ..-1]).t())  // vx, current position
        .and(vx.slice(s![1.., ..-1]).t())   // vx, shifted by one along axis 0
        .and(vy.slice(s![..-1, ..-1]).t())  // vy, current position
        .and(vy.slice(s![..-1, 1..]).t())   // vy, shifted by one along axis 1
        .apply(|div, &vx1, &vx2, &vy1, &vy2| {
            *div = -(vx1 - vx2 + vy1 - vy2);
        });
    println!("{:8.3}", div);
}
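
For reference, a sketch of the parallel form mentioned above: the same Zip, with apply switched to par_apply, assuming ndarray-parallel with those changes is available and its prelude is imported (the random inputs are replaced by ones just to keep it short).

#[macro_use(s)]
extern crate ndarray;
extern crate ndarray_parallel;

use ndarray::prelude::*;
use ndarray::Zip;
use ndarray_parallel::prelude::*;

fn main() {
    const N: usize = 10;
    let vx = Array2::<f64>::ones((N, N).f());
    let vy = Array2::<f64>::ones((N, N).f());
    let mut div = Array2::<f64>::zeros((N - 1, N - 1));

    Zip::from(&mut div)
        .and(vx.slice(s![..-1, ..-1]).t())
        .and(vx.slice(s![1.., ..-1]).t())
        .and(vy.slice(s![..-1, ..-1]).t())
        .and(vy.slice(s![..-1, 1..]).t())
        .par_apply(|div, &vx1, &vx2, &vy1, &vy2| {
            *div = -(vx1 - vx2 + vy1 - vy2);
        });
    println!("{:8.3}", div);
}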

@bluss (Member) commented Mar 31, 2017

@mokasin We're going to add a window producer/iterable, but I actually think something array-wide is much faster than iterating the array in windows around each element.

Something array wide would indeed be something like:

b += a[2..] - a[..-2]   // pseudocode because Rust slicing is not this nice
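
In actual ndarray code that could look roughly like the following sketch (the concrete input values are only an example):

#[macro_use(s)]
extern crate ndarray;

use ndarray::prelude::*;

fn main() {
    let a = Array1::<f64>::linspace(0., 9., 10);
    let mut b = Array1::<f64>::zeros(8);

    // b += a[2..] - a[..-2]: subtracting the two slices allocates a temporary
    // array, which is then added element-wise into b.
    let diff = &a.slice(s![2..]) - &a.slice(s![..-2]);
    b += &diff;

    println!("{:?}", b);
}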

@msiglreith (Contributor, Author)

Great work!
Tried it in my code and it seems to work fine (no visual difference in the resulting flow).

ndarray::Zip::from(div)
    .and(vel.x.slice(s![.., 1..]))
    .and(vel.x.slice(s![.., ..-1]))
    .and(vel.y.slice(s![1.., ..]))
    .and(vel.y.slice(s![..-1, ..]))
    .apply(|div, &vx1, &vx2, &vy1, &vy2| {
        *div = -(vx1 - vx2 + vy1 - vy2);
    });

I'm using a staggered grid, so boundary conditions are not really an issue, except that I needed to adjust the dimensions of the grids (vel.x and vel.y are larger than div). Therefore I added the additional slice operations .slice(s![.., ..-1]) and .slice(s![..-1, ..]).

A drawback of this approach is slightly reduced readability, imo. I had to read up on the documentation first to grasp the whole concept, and even so, direct indexing is easier to read.
Currently most of my code is based on tuple-based indexing; some computationally heavy parts are optimized by direct array index calculation. The Zip-based approach might serve as an intermediate optimization step.

Divergence calculation isn't a performance-heavy step in my fluid simulation. Even so, is there any overhead to this Zip-based approach compared to direct array indexing?
Direct indexing sample:

let (h, w) = div.dim();

let div = div.as_slice_mut().unwrap();
let vx = vel.x.as_slice().unwrap();
let vy = vel.y.as_slice().unwrap();

let mut idx = 0;

for y in 0..h {
    for x in 0..w {
        div[idx] = -(vx[idx + y + 1] - vx[idx + y] + vy[idx + w] - vy[idx]);
        idx += 1;
    }
}

Thanks again for your work on this issue, I appreciate it!

@bluss (Member) commented Mar 31, 2017

Zip was created so that we have a safe Rust way to do this efficiently. It looks at the memory layouts of the inputs and tries to do its best. (If all inputs are C/F contiguous it uses a linear index through all the arrays' memory; otherwise it “unrolls” the innermost dimension and uses each array's stride × an index to loop over it.)

That means that if you take care with the memory layouts (see the previous message about that), Zip should always win. There are no bounds checks, unlike in your slice-indexing example. I'm pretty sure the bounds checks don't optimize out in your example (but I would be happy to be positively surprised).

Note that this:

Zip::from(&mut div)
    .and(vx.slice(s![..-1, ..-1]).t())
    .and(vx.slice(s![1.., ..-1]).t())
    .and(vy.slice(s![..-1, ..-1]).t())
    .and(vy.slice(s![..-1, 1..]).t())
    .apply(|div, &vx1, &vx2, &vy1, &vy2| {
        *div = -(vx1 - vx2 + vy1 - vy2);
    })

is equivalent to:

div -= &vx.slice(s![..-1, ..-1]).t();
div += &vx.slice(s![1.., ..-1]).t();
div -= &vy.slice(s![..-1, ..-1]).t();
div += &vy.slice(s![..-1, 1..]).t();

But I don't have a parallel version of the latter!

@bluss (Member) commented Apr 6, 2017

Zip::indexed exists and is parallelized in master, just not in a proper (non-alpha) release yet. It enters there, rather than among the iterators. We're expanding and extending the NdProducer concept now. Our current idea is that splitting is something a producer or IntoIterator can implement, not an iterator.
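
To connect this back to the opening example, a sketch of how the indexed form can look with Zip::indexed (the array shapes are made up for illustration; with ndarray-parallel, apply becomes par_apply):

extern crate ndarray;

use ndarray::prelude::*;
use ndarray::Zip;

fn main() {
    let vx = Array2::<f64>::ones((4, 5));
    let vy = Array2::<f64>::ones((5, 4));
    let mut div = Array2::<f64>::zeros((4, 4));

    // Zip::indexed passes each element's (row, column) index to the closure,
    // so neighbouring elements of vx and vy can be looked up directly.
    Zip::indexed(&mut div).apply(|(y, x), div| {
        *div = -(vx[(y, x + 1)] - vx[(y, x)] + vy[(y + 1, x)] - vy[(y, x)]);
    });

    println!("{:8.3}", div);
}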

@bluss (Member) commented Apr 6, 2017

@msiglreith (Contributor, Author)

Amazing work! Thanks for the quick implementation, will play around with it in the next days :)

@Renmusxd

Have there been any updates on IndexedIter and IndexedIterMut getting parallelization support?

@bluss (Member) commented May 25, 2022

rayon support is through Zip; see the previous comment #279 (comment).
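
For anyone landing here today, a sketch of what that looks like with a recent ndarray (assuming the crate's "rayon" feature is enabled): Zip::indexed combined with par_for_each gives a parallel indexed loop.

use ndarray::{Array2, Zip};

fn main() {
    let a = Array2::<f64>::ones((100, 100));
    let mut out = Array2::<f64>::zeros((100, 100));

    // The closure receives the (row, column) index along with the element and
    // runs on rayon's thread pool.
    Zip::indexed(&mut out).par_for_each(|(i, j), out| {
        *out = a[(i, j)] * (i + j) as f64;
    });

    println!("{}", out[(0, 0)]);
}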
