Parallel indexed iterator #279
Comments
Right. I think the indexed iterator is already slow, I fear. |
Related to that, I'm wondering how to compute something like a gradient on a field efficiently in parallel. This means I need to assign a value to every array element that depends on values at neighbouring indices. I tried this very (silly) naive approach, here simplified to 1D:

extern crate rayon;
extern crate ndarray;

use rayon::prelude::*;
use ndarray::Array;

fn main() {
    let a = Array::from_elem(10, 1);
    let mut b = Array::zeros(10);
    (1usize..8)
        .into_par_iter()
        .map(|i|
            b[i] = a[i+1] - a[i-1]
        );
    println!("array: {:?}", b);
}

which fails miserably:
I'd like to do something like this for an n-dimensional array. Besides the point that this might split in a very cache-unfriendly way: how would I do that? |
Right, and I would shoot for vectorization (SIMD parallelization) before thread-based parallelization. @mokasin We want to add a parallel version of array zips, which should address this on a basic level. For 1D we can of course use collect_into on a Vec and then make an array, since the Vec → Array conversion has no allocation/copy cost. |
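As a rough editorial sketch of that 1D route (not from the original thread), one can collect a parallel map into a Vec and then move the Vec into a 1D array; the variable names here are made up, and the exact rayon/ndarray APIs are assumed from their documented behaviour.

```rust
extern crate rayon;
extern crate ndarray;

use rayon::prelude::*;
use ndarray::Array1;

fn main() {
    let a: Array1<f64> = Array1::linspace(0., 9., 10);
    // Compute the interior gradient in parallel, collecting into a plain Vec.
    let grad: Vec<f64> = (1..a.len() - 1)
        .into_par_iter()
        .map(|i| a[i + 1] - a[i - 1])
        .collect();
    // Move the Vec into a 1D array; this is a move, not a copy.
    let grad = Array1::from_vec(grad);
    println!("{:?}", grad);
}
```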
The example here is not about threading, but it can inspire a solution that computes the gradients using array-wide operations. That will achieve some limited degree of SIMD parallelization at least. https://bluss.github.io/rust-ndarray/master/ndarray/macro.s.html |
The n-ary Zip work brings some new perspectives. It's not very simple to split an index iterator, but we can split a producer (think of it like an IntoIterator: it doesn't have the problem of having a partially iterated state). |
Ok, Zip now exists. @msiglreith Let's try the idea that we don't need indices! The mapping x, y → y, x is a transpose, so that's easy. And with some slicing we can get both the current and the next element along an axis. Something like the following code. Maybe one should double-check that the computation is correct, and also think hard about how to handle borders.

One can play with the memory layouts here to make it perform well. div and vx, vy need to have opposite layouts, since we access vx, vy transposed. I picked vx, vy to have F order, so that the sliced, transposed arrays have contiguous rows, which is the case ndarray optimizes for by preference (it will be more open-minded in the future).

Zip parallelization is basically ready, but needs rayon-git: #288 (it's possible I'll merge this soon enough, but I can't publish before rayon has a new version). Using parallelization is as simple as switching from apply to par_apply, once ndarray-parallel has the changes.

extern crate rand;
#[macro_use(s, azip)]
extern crate ndarray;
extern crate ndarray_rand;

use ndarray::prelude::*;
use ndarray_rand::RandomExt;
use rand::distributions::Range;
use ndarray::Zip;

fn main() {
    type Arrayf64<D> = Array<f64, D>;
    const N: usize = 10;

    let vx = Arrayf64::random((N, N).f(), Range::new(0., 100.));
    let vy = Arrayf64::random((N, N).f(), Range::new(0., 100.));
    println!("{:8.3}", vx);
    println!("{:8.3}", vy);

    let mut div = Arrayf64::zeros((N - 1, N - 1));

    Zip::from(&mut div)
        .and(vx.slice(s![..-1, ..-1]).t())
        .and(vx.slice(s![1.., ..-1]).t())
        .and(vy.slice(s![..-1, ..-1]).t())
        .and(vy.slice(s![..-1, 1..]).t())
        .apply(|div, &vx1, &vx2, &vy1, &vy2| {
            *div = -(vx1 - vx2 + vy1 - vy2);
        });

    println!("{:8.3}", div);
}
|
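To make the par_apply remark in the previous comment concrete, here is an editorial sketch (not from the thread) of what the parallel switch could look like once ndarray-parallel ships it; the prelude path and the availability of par_apply on Zip are assumptions based on the comment above and may differ by version.

```rust
#[macro_use(s)]
extern crate ndarray;
extern crate ndarray_parallel;

use ndarray::prelude::*;
use ndarray::Zip;
use ndarray_parallel::prelude::*;

fn main() {
    const N: usize = 512;
    // F-order inputs, as discussed above, so the transposed slices have contiguous rows.
    let vx = Array::from_elem((N, N).f(), 1.0_f64);
    let vy = Array::from_elem((N, N).f(), 2.0_f64);
    let mut div = Array::<f64, _>::zeros((N - 1, N - 1));

    // Same Zip as before; only apply -> par_apply changes, and rayon splits the work.
    Zip::from(&mut div)
        .and(vx.slice(s![..-1, ..-1]).t())
        .and(vx.slice(s![1.., ..-1]).t())
        .and(vy.slice(s![..-1, ..-1]).t())
        .and(vy.slice(s![..-1, 1..]).t())
        .par_apply(|div, &vx1, &vx2, &vy1, &vy2| {
            *div = -(vx1 - vx2 + vy1 - vy2);
        });
}
```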
@mokasin We're going to add a window producer/iterable, but I actually think something array-wide is much faster than iterating the array in windows around each element. Something array-wide would indeed be something like:

b += a[2..] - a[..-2]  // pseudocode, because Rust slicing is not this nice
|
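For reference, an editorial sketch (not from the thread) of how that pseudocode maps onto real ndarray slicing; the names a and b are illustrative.

```rust
#[macro_use(s)]
extern crate ndarray;

use ndarray::Array1;

fn main() {
    let a: Array1<f64> = Array1::linspace(0., 9., 10);
    let mut b = Array1::<f64>::zeros(a.len() - 2);
    // Array-wide central difference: b[i] += a[i + 2] - a[i] over the interior.
    b += &(&a.slice(s![2..]) - &a.slice(s![..-2]));
    println!("{:?}", b);
}
```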
Great work!

ndarray::Zip::from(div)
    .and(vel.x.slice(s![.., 1..]))
    .and(vel.x.slice(s![.., ..-1]))
    .and(vel.y.slice(s![1.., ..]))
    .and(vel.y.slice(s![..-1, ..]))
    .apply(|div, &vx1, &vx2, &vy1, &vy2| {
        *div = -(vx1 - vx2 + vy1 - vy2);
    });

I'm using a staggered grid, so boundary conditions are not really an issue, except that I needed to adjust the dimensions of the grids (vel.x and vel.y are larger than div). That's why I added the additional slice operations.

A drawback of this approach is slightly reduced readability, imo. I had to read up on the documentation first to grasp the whole concept, and even then direct indexing is easier to read. Divergence calculation isn't a performance-heavy step in my fluid simulation, but still: is there an overhead to this Zip-based approach compared to plain array indexing?

let (h, w) = div.dim();
let mut div = div.as_slice_mut().unwrap();
let vx = vel.x.as_slice().unwrap();
let vy = vel.y.as_slice().unwrap();

let mut idx = 0;
for y in 0..h {
    for x in 0..w {
        div[idx] = -(vx[idx + y + 1] - vx[idx + y] + vy[idx + w] - vy[idx]);
        idx += 1;
    }
}

Thanks again for your work on this issue, I appreciate it! |
Zip is created so that we have a safe Rust way to do this efficiently. It looks at the memory layouts of the inputs and tries to do its best. (If all inputs are C/F contiguous, it uses a linear index through all the arrays' memory; otherwise it “unrolls” the innermost dimension and uses each array's stride × an index to loop over it.) That means that if you take care with the memory layouts (see the previous message about that), Zip should always win. There are no bounds checks, as there will be in your slice-indexing example. I'm pretty sure the bounds checks don't optimize out in your example (but I would be happy to be positively surprised).

Note that this:

Zip::from(&mut div)
    .and(vx.slice(s![..-1, ..-1]).t())
    .and(vx.slice(s![1.., ..-1]).t())
    .and(vy.slice(s![..-1, ..-1]).t())
    .and(vy.slice(s![..-1, 1..]).t())
    .apply(|div, &vx1, &vx2, &vy1, &vy2| {
        *div = -(vx1 - vx2 + vy1 - vy2);
    })

is equivalent to:

div -= vx.slice(s![..-1, ..-1]).t();
div += vx.slice(s![1.., ..-1]).t();
div -= vy.slice(s![..-1, ..-1]).t();
div += vy.slice(s![..-1, 1..]).t();

But I don't have a parallel version of the latter! |
Amazing work! Thanks for the quick implementation, I will play around with it over the next few days :) |
Have there been any updates on the parallel indexed iterator? |
Rayon support is through Zip; see the previous comment: #279 (comment) |
IndexedIter and IndexedIterMut should have parallel iterator support. Often it's desirable to have the element coordinates in addition to the element itself, for accessing fields in other arrays. Example use case (part of fluid simulation code):

I quickly tried to implement it myself, similar to the AxisIter implementation, but I'm unsure how to properly split the iterator.
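As an editorial aside (not part of the original issue): in later ndarray versions the coordinates use case is covered by Zip::indexed, which passes the index as the first closure argument and combines with the parallel apply when the rayon feature is enabled. A rough sketch, with made-up array names:

```rust
use ndarray::{Array2, Zip};

fn main() {
    let vx = Array2::<f64>::ones((8, 8));
    let mut div = Array2::<f64>::zeros((8, 8));

    // The (y, x) coordinates come in as the first closure argument, so other
    // arrays can be addressed by index while iterating div in parallel.
    Zip::indexed(&mut div).par_for_each(|(y, x), d| {
        *d = vx[(y, x)] * (y as f64 - x as f64);
    });
}
```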