Skip to content

Remove elements from std.ArrayList while iterating over it #3037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

Tetralux
Copy link
Contributor

@Tetralux Tetralux commented Aug 9, 2019

You can now remove an element from a std.ArrayList, while iterating over one.

By ordered-removal:

var itr = list.iterator(); // this is a `std.ArrayList`.
while (itr.next()) |e| {
    // This visits every element, including the one we remove.
    if (e == 2) {
        var removed = itr.orderedRemove();
        assert(removed == e);
    }
}
// Array now contains: [0, 1, 3, 4].  (the 2 is gone.)

By swap-removal:

var itr = list.iterator(); // this is a `std.ArrayList`.
while (itr.next()) |e| {
    // This visits every element, including the one we remove.
    if (e == 2) {
        var removed = itr.swapRemove();
        assert(removed == e);
    }
}
// Array now contains: [0, 1, 4, 3].  (the 2 was replaced with the 4.)

If you only have a *const std.ArrayList, there's .iteratorConst(), which gives you an iterator with only the old behavior - i.e: which cannot remove elements.

@daurnimator daurnimator added the standard library This issue involves writing Zig code for the standard library. label Aug 10, 2019
@Tetralux
Copy link
Contributor Author

Huh. That's weird.

This should have been a compile error:

https://github.com/Tetralux/zig/blob/24aea24c22627c415d81339749e8a94a7239af1a/std/http/headers.zig#L173

.iteratorConst returns a completely different struct to HttpList.Iterator...

@Tetralux
Copy link
Contributor Author

@DutchGhost
Copy link

Should't the normal iterator be like it currently is, and then add a removable iterator? Changing the current iterator breaks everywhere its used, while adding a new one doesn't.

Also, perhaps one could consider taking a function in the initialization function of this removing iterator, that has a signature like fn(*T) bool. if the function returns true, the element is removed, otherwise nothing happens.

@Tetralux
Copy link
Contributor Author

@DutchGhost
This PR does not break the previous usage of it.
All it does is add swapRemove and orderedRemove to the iterators; it only adds additional functionality which only has any effect if it is actually used.

If you specifically want only the old behavior---you don't want the user to be able to modify the elements---then you'd call iteratorConst which makes that intention explicit.

As to the predicate idea, I think the way it is here is clearer.

  • You might want to process all elements in some way, regardless of whether you remove them or not.
    Would you then call the predicate twice?
    Would you then make a standalone function as the predicate and then call it in the loop body as well?
    Would having to go to that length of boilerplate be worth it when you could just call the removal procedures in the body instead?
  • Passing the predicate to the iterator seems to indicate that you want to remove anything matching that circumstance, when in fact you might only want to remove the first occurrence.
  • Passing the predicate to the iterator means you have to rename .iterator, which would break things, to make it clear that you want to remove things, not just filter them.

I think the way I have it right now is more flexible, and perfectly clear.

I also asked in the IRC while making this PR as to whether I should make the current situation the default or not (i.e: .iterator, .iteratorMut vs. .iterator, .iteratorConst).
It was brought up that the convention seems to be mutable by default: toSlice(), toSliceConst().

@DutchGhost
Copy link

  1. if you want to process the elements without removing them, just call the iterator function as it is now. If you don't need removals, why would you get it?
    It just increases the struct size.
  2. you wouldn't call the predicate in the loop body, as its called by the iterator in each .next call

I expect an Iterator function to just simply return me an iterator without fancy things.
If I wanted the fancy things, I'd call a method that hints it does a fancy thing alongside iterating, like "iterRemove", or "removing", something along those lines

@data-man
Copy link
Contributor

This PR does not break the previous usage of it.

But decreases performance.

@Tetralux
Copy link
Contributor Author

@DutchGhost

just simply return me an iterator without fancy things.

That is the case with PR. It only does "fancy" things (I'd argue that being able to remove an element while iterating over it isn't very fancy) when you call the procedures. It otherwise doesn't do anything different.

If I wanted the fancy things, I'd call a method that hints it does a fancy thing alongside iterating

What you're describing is exactly what this PR does. You literally call the "fancy" procedure alongside iterating.


@data-man

But decreases performance.

I have a patch for that. 😉

@Tetralux
Copy link
Contributor Author

Tetralux commented Aug 12, 2019

Did some rudimentary performance test of [debug mode] using this branch and compared to master.

iteratorConst is the same performance as master.
iterator is ~1.5% slower than master.

Of course, benchmarking is a dark art, so... pinch of salt, and all that.

const std = @import("std");
const warn = std.debug.warn;
const time = std.time;

pub fn main() !void {
    var a = std.heap.direct_allocator;
    var l = std.ArrayList(usize).init(a);

    var r = std.rand.DefaultPrng.init(32); // Xoroshiro128

    var i: usize = 0;
    while (i < 10000000) : (i += 1) {
        var x = r.next();
        try l.append(x);
    }

    var j: usize = 0;
    var average_sum: f64 = 0;
    while (j < 100) : (j += 1) {
        var timer = try time.Timer.start();
        var itr = l.iterator();
        var sum: usize = 0;
        while (itr.next()) |e| {
            sum +%= e;
        }
        var took = timer.read();
        warn("took {} ns\n", took);
        average_sum += @intToFloat(f64, took);
    }

    var average_time = @floatToInt(u64, std.math.round(average_sum / @intToFloat(f64, j+1)));

    warn("average {}\n", average_time);
}

@DutchGhost
Copy link

@Tetralux, what I ment was that if I wanted an iterator that is able to do fancy thing, I'd call a method on the ArrayList that suggests the iterator I'm getting from it can do more than just iterating.

With your change, the default iterator would do more than just iterating, which you pay for, even if its just a little. You might argue that you moved the old one to a new struct and new method, but then already existing code has to opt out of your change, and call something new, while if you only added ".removingIter" (or whatever name you'd like) to the list of methods, that isn't the case.

So,

  • Keep the old Iterator as it was,
  • Add yours as a new method with a good name (like removingIter)

// the next element's index was incremented when we
// called `next` before, so we decrement it here, before we get the item.
// NOTE: volatile code! don't use a ternary if, or stack variable for new index here!
if (it.removed) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid this branch by putting the decrement into the remove function.
Then this can be a non-branching it.removed = false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, in debug mode at least, what you suggest is actually slower than the way it currently is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rudimentary test above seems to indicate that if you remove the branch, you lose around >5%.
I did move the decrement out of next though. Good call on that.

@Tetralux
Copy link
Contributor Author

Tetralux commented Aug 12, 2019

  • .iterator now has the same performance as the old .iterator if you don't remove anything.
  • If you do remove things, in [release-safe],
    it's within 1% of if you did while (i < list.len) : (i += 1) and swapRemoved by index.
    In [release-fast], it matches.
  • orderedRemove appears to be about 1% faster than if you do that kind of loop in both
    [debug mode] and [release-fast] mode. pinch of salt

@Tetralux
Copy link
Contributor Author

(Updated comment.)

@daurnimator daurnimator added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Aug 14, 2019
@Tetralux
Copy link
Contributor Author

Tetralux commented Aug 16, 2019

Had some trouble figuring out how to make the semantics that removal procedures always remove the last element returned from next without suffering slowdown in [release] modes, but got there in the end.

Unfortauntely, adding that behavior gives [debug mode] around a 7% slowdown.
I'm not sure that I'm happy with that honestly.

@Tetralux
Copy link
Contributor Author

Actually scratch that.
I think I just got unlucky with thread scheduling or something.
After running it a few times, it seems like it actually sped up ~1% with the latest commit so....
🤷‍♂ 🤣

@daurnimator
Copy link
Contributor

I'm confused. The change should have just been erroring in remove if next_index == 0. Why is cursor now an optional?

@Tetralux
Copy link
Contributor Author

@daurnimator What you suggest is the first thing I tried.

But as explained above, this is faster for [release-safe] and [debug mode].
This way, the extra check doesn't affect the performance of the routine, as much as I'd like to say that just adding one line is sufficient.

@Tetralux
Copy link
Contributor Author

Rebased to master.

@andrewrk andrewrk removed this from the 0.5.0 milestone Sep 19, 2019
@andrewrk andrewrk added this to the 0.6.0 milestone Sep 19, 2019
@andrewrk andrewrk removed this from the 0.6.0 milestone Oct 1, 2019
@andrewrk andrewrk closed this in c3d8b1f Dec 10, 2019
@andrewrk andrewrk mentioned this pull request Dec 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants