Remove elements from std.ArrayList while iterating over it #3037

Tetralux · 2019-08-09T19:58:33Z

You can now remove an element from a std.ArrayList while iterating over it.

By ordered-removal:

var itr = list.iterator(); // this is a `std.ArrayList`.
while (itr.next()) |e| {
    // This visits every element, including the one we remove.
    if (e == 2) {
        var removed = itr.orderedRemove();
        assert(removed == e);
    }
}
// Assuming the list started as [0, 1, 2, 3, 4], it now contains [0, 1, 3, 4] (the 2 is gone).

By swap-removal:

var itr = list.iterator(); // this is a `std.ArrayList`.
while (itr.next()) |e| {
    // This visits every element, including the one we remove.
    if (e == 2) {
        var removed = itr.swapRemove();
        assert(removed == e);
    }
}
// Assuming the list started as [0, 1, 2, 3, 4], it now contains [0, 1, 4, 3] (the 2 was replaced with the 4).

If you only have a *const std.ArrayList, there's .iteratorConst(), which gives you an iterator with only the old behavior, i.e. one that cannot remove elements.
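
For readers comparing the two strategies, here is a minimal sketch of what ordered removal and swap removal do to the underlying storage. It works on a plain slice rather than std.ArrayList, and the helper names orderedRemoveAt and swapRemoveAt are hypothetical, not part of the PR or of std.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical helper: remove items[i] by shifting the tail left one slot.
// Preserves order; O(n). The caller treats the result as one element shorter.
fn orderedRemoveAt(items: []u32, i: usize) u32 {
    const removed = items[i];
    var j = i;
    while (j + 1 < items.len) : (j += 1) {
        items[j] = items[j + 1];
    }
    return removed;
}

// Hypothetical helper: remove items[i] by overwriting it with the last item.
// O(1), but the last item moves into the hole, so order is not preserved.
fn swapRemoveAt(items: []u32, i: usize) u32 {
    const removed = items[i];
    items[i] = items[items.len - 1];
    return removed;
}

test "ordered vs swap removal" {
    var a = [_]u32{ 0, 1, 2, 3, 4 };
    assert(orderedRemoveAt(a[0..], 2) == 2);
    assert(std.mem.eql(u32, a[0..4], &[_]u32{ 0, 1, 3, 4 }));

    var b = [_]u32{ 0, 1, 2, 3, 4 };
    assert(swapRemoveAt(b[0..], 2) == 2);
    assert(std.mem.eql(u32, b[0..4], &[_]u32{ 0, 1, 4, 3 }));
}
```

The two results match the array contents shown in the comments of the examples above.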

Tetralux · 2019-08-10T15:47:14Z

Huh. That's weird.

This should have been a compile error:

https://github.com/Tetralux/zig/blob/24aea24c22627c415d81339749e8a94a7239af1a/std/http/headers.zig#L173

.iteratorConst returns a completely different struct to HttpList.Iterator...

Tetralux · 2019-08-10T16:05:50Z

Note that iteratorConst is defined here:

https://github.com/Tetralux/zig/blob/24aea24c22627c415d81339749e8a94a7239af1a/std/array_list.zig#L278

DutchGhost · 2019-08-11T14:56:01Z

Shouldn't the normal iterator stay as it currently is, with a removable iterator added alongside it? Changing the current iterator breaks everywhere it's used, while adding a new one doesn't.

Also, perhaps one could consider taking a function in the initialization function of this removing iterator, with a signature like fn(*T) bool. If the function returns true, the element is removed; otherwise nothing happens.
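
A minimal sketch of the predicate idea, assuming a fn(*T) bool predicate with T fixed to u32 and a plain slice instead of std.ArrayList. The names removeIf and isEven are hypothetical, and the predicate is taken as a comptime parameter purely to keep the sketch simple.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Example predicate with the suggested shape: returns true to request removal.
fn isEven(x: *u32) bool {
    return x.* % 2 == 0;
}

// Hypothetical helper: keeps every element for which pred returns false,
// compacting the survivors to the front in their original order.
// Returns the new logical length.
fn removeIf(items: []u32, comptime pred: fn (*u32) bool) usize {
    var kept: usize = 0;
    var i: usize = 0;
    while (i < items.len) : (i += 1) {
        if (!pred(&items[i])) {
            items[kept] = items[i];
            kept += 1;
        }
    }
    return kept;
}

test "predicate-based removal" {
    var a = [_]u32{ 0, 1, 2, 3, 4 };
    const kept = removeIf(a[0..], isEven);
    assert(kept == 2);
    assert(std.mem.eql(u32, a[0..kept], &[_]u32{ 1, 3 }));
}
```

Note that this form removes every match in a single pass; removing only the first occurrence, or doing other per-element work, would need extra state inside the predicate.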

Tetralux · 2019-08-11T20:21:36Z

@DutchGhost
This PR does not break the previous usage of it.
All it does is add swapRemove and orderedRemove to the iterators; it only adds additional functionality which only has any effect if it is actually used.

If you specifically want only the old behavior (you don't want the user to be able to modify the elements), then you'd call iteratorConst, which makes that intention explicit.

As to the predicate idea, I think the way it is here is clearer.

You might want to process all elements in some way, regardless of whether you remove them or not.
Would you then call the predicate twice?
Would you then make a standalone function as the predicate and then call it in the loop body as well?
Would having to go to that length of boilerplate be worth it when you could just call the removal procedures in the body instead?
Passing the predicate to the iterator seems to indicate that you want to remove anything matching that circumstance, when in fact you might only want to remove the first occurrence.
Passing the predicate to the iterator means you have to rename .iterator, which would break things, to make it clear that you want to remove things, not just filter them.

I think the way I have it right now is more flexible, and perfectly clear.

I also asked on IRC while making this PR whether I should make the current situation the default or not (i.e. .iterator, .iteratorMut vs. .iterator, .iteratorConst).
It was brought up that the convention seems to be mutable by default: toSlice(), toSliceConst().

DutchGhost · 2019-08-11T20:33:09Z

If you want to process the elements without removing them, just call the iterator function as it is now. If you don't need removals, why would you get it?
It just increases the struct size.
You wouldn't call the predicate in the loop body, as it's called by the iterator in each .next call.

I expect an Iterator function to just simply return me an iterator without fancy things.
If I wanted the fancy things, I'd call a method that hints it does a fancy thing alongside iterating, like "iterRemove", or "removing", something along those lines

data-man · 2019-08-11T22:15:17Z

> This PR does not break the previous usage of it.

But decreases performance.

Tetralux · 2019-08-12T00:17:13Z

@DutchGhost

> just simply return me an iterator without fancy things.

That is the case with this PR. It only does "fancy" things (I'd argue that being able to remove an element while iterating isn't very fancy) when you call the removal procedures. Otherwise it doesn't do anything different.

> If I wanted the fancy things, I'd call a method that hints it does a fancy thing alongside iterating

What you're describing is exactly what this PR does. You literally call the "fancy" procedure alongside iterating.

@data-man

> But decreases performance.

I have a patch for that. 😉

Tetralux · 2019-08-12T00:30:39Z

I ran a rudimentary performance test in [debug mode] on this branch and compared it to master.

iteratorConst is the same performance as master.
iterator is ~1.5% slower than master.

Of course, benchmarking is a dark art, so... pinch of salt, and all that.

const std = @import("std");
const warn = std.debug.warn;
const time = std.time;

pub fn main() !void {
    var a = std.heap.direct_allocator;
    var l = std.ArrayList(usize).init(a);

    var r = std.rand.DefaultPrng.init(32); // Xoroshiro128

    var i: usize = 0;
    while (i < 10000000) : (i += 1) {
        var x = r.next();
        try l.append(x);
    }

    var j: usize = 0;
    var average_sum: f64 = 0;
    while (j < 100) : (j += 1) {
        var timer = try time.Timer.start();
        var itr = l.iterator();
        var sum: usize = 0;
        while (itr.next()) |e| {
            sum +%= e;
        }
        var took = timer.read();
        warn("took {} ns\n", took);
        average_sum += @intToFloat(f64, took);
    }

    // j equals the number of timed runs (100) once the loop exits, so divide by j, not j+1.
    var average_time = @floatToInt(u64, std.math.round(average_sum / @intToFloat(f64, j)));

    warn("average {}\n", average_time);
}

DutchGhost · 2019-08-12T04:20:35Z

@Tetralux, what I meant was that if I wanted an iterator that is able to do fancy things, I'd call a method on the ArrayList that suggests the iterator I'm getting from it can do more than just iterate.

With your change, the default iterator would do more than just iterating, which you pay for, even if it's just a little. You might argue that you moved the old one to a new struct and a new method, but then existing code has to opt out of your change and call something new, whereas if you only added ".removingIter" (or whatever name you'd like) to the list of methods, that wouldn't be the case.

So,

Keep the old Iterator as it was,
Add yours as a new method with a good name (like removingIter)

daurnimator · 2019-08-12T05:26:59Z

std/array_list.zig

+                // the next element's index was incremented when we
+                // called `next` before, so we decrement it here, before we get the item.
+                // NOTE: volatile code! don't use a ternary if, or stack variable for new index here!
+                if (it.removed) {


Avoid this branch by putting the decrement into the remove function.
Then this can be a non-branching it.removed = false.

Tetralux · 2019-08-12 (reply)

Interestingly, in debug mode at least, what you suggest is actually slower than the way it currently is.

The rudimentary test above seems to indicate that if you remove the branch, you lose more than 5%.
I did move the decrement out of next, though. Good call on that.

Tetralux · 2019-08-12T14:22:54Z

.iterator now has the same performance as the old .iterator if you don't remove anything.
If you do remove things, then in [release-safe]
it's within 1% of a plain while (i < list.len) : (i += 1) loop that calls swapRemove by index.
In [release-fast], it matches.
orderedRemove appears to be about 1% faster than that kind of loop in both
[debug mode] and [release-fast]. (Pinch of salt.)
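
For reference, a minimal sketch of the kind of index-based loop being compared against, written over a plain slice. The name removeAllBySwap is hypothetical, and this is not the benchmark code itself. One detail worth showing: after a swap-removal the index must not advance, because the element swapped into slot i has not been examined yet.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical helper: swap-removes every element equal to `value`
// and returns the new logical length. Order is not preserved.
fn removeAllBySwap(items: []u32, value: u32) usize {
    var len = items.len;
    var i: usize = 0;
    while (i < len) {
        if (items[i] == value) {
            // Overwrite the hole with the last live element and shrink.
            len -= 1;
            items[i] = items[len];
            // Deliberately do not advance i: the swapped-in element
            // still needs to be checked.
        } else {
            i += 1;
        }
    }
    return len;
}

test "index loop with swap removal" {
    var a = [_]u32{ 2, 0, 2, 2, 1 };
    const len = removeAllBySwap(a[0..], 2);
    assert(len == 2);
    assert(std.mem.eql(u32, a[0..len], &[_]u32{ 1, 0 }));
}
```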

Tetralux · 2019-08-12T15:08:26Z

(Updated comment.)

Tetralux · 2019-08-16T16:44:52Z

I had some trouble working out how to give the removal procedures the semantics that they always remove the last element returned from next, without suffering a slowdown in [release] modes, but got there in the end.

Unfortunately, adding that behavior gives [debug mode] around a 7% slowdown.
I'm not sure that I'm happy with that, honestly.

Tetralux · 2019-08-16T16:58:29Z

Actually scratch that.
I think I just got unlucky with thread scheduling or something.
After running it a few times, it seems like it actually sped up ~1% with the latest commit so....
🤷‍♂ 🤣

daurnimator · 2019-08-17T00:46:46Z

I'm confused. The change should have just been erroring in remove if next_index == 0. Why is cursor now an optional?

Tetralux · 2019-08-17T13:56:31Z

@daurnimator What you suggest is the first thing I tried.

But as explained above, this is faster for [release-safe] and [debug mode].
This way, the extra check doesn't affect the performance of the routine, as much as I'd like to say that just adding one line is sufficient.
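
To make the "optional cursor" question concrete, here is a minimal sketch of one way such an iterator could be structured. It is an assumption for illustration only and not the code from this PR: the type RemovingIterator is hypothetical, it wraps a plain slice rather than std.ArrayList, and only the field names cursor and next_index are borrowed from the discussion above.

```zig
const std = @import("std");
const assert = std.debug.assert;

// Hypothetical iterator over a slice-backed list of u32.
const RemovingIterator = struct {
    items: []u32,
    len: usize, // logical length; shrinks on removal
    next_index: usize, // index that the next call to next() will visit
    cursor: ?usize, // index last returned by next(); null if there is none

    fn init(items: []u32) RemovingIterator {
        return RemovingIterator{
            .items = items,
            .len = items.len,
            .next_index = 0,
            .cursor = null,
        };
    }

    fn next(it: *RemovingIterator) ?u32 {
        if (it.next_index >= it.len) return null;
        const i = it.next_index;
        it.cursor = i;
        it.next_index += 1;
        return it.items[i];
    }

    // Removes the element last returned by next(). Unwrapping a null cursor
    // is a checked failure in safe build modes, so calling this before next()
    // (or twice in a row) is caught without a separate index comparison.
    fn swapRemove(it: *RemovingIterator) u32 {
        const i = it.cursor.?;
        const removed = it.items[i];
        it.len -= 1;
        it.items[i] = it.items[it.len];
        it.next_index = i; // revisit the element that was swapped in
        it.cursor = null;
        return removed;
    }
};

test "remove while iterating" {
    var a = [_]u32{ 0, 1, 2, 3, 4 };
    var it = RemovingIterator.init(a[0..]);
    while (it.next()) |e| {
        if (e == 2) {
            const removed = it.swapRemove();
            assert(removed == e);
        }
    }
    assert(it.len == 4);
    assert(std.mem.eql(u32, a[0..it.len], &[_]u32{ 0, 1, 4, 3 }));
}
```

Whether this layout is faster or slower than an explicit next_index == 0 check is exactly the kind of thing the measurements in this thread were trying to settle; the sketch makes no performance claim.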

Tetralux · 2019-08-18T13:51:25Z

Rebased to master.

Tetralux force-pushed the iterator-remove branch from c506046 to 24aea24 on August 9, 2019 20:24

daurnimator added the "standard library" label Aug 10, 2019

Tetralux mentioned this pull request Aug 10, 2019

No compile error when returning incorrect aliased struct #3039

Closed

daurnimator reviewed Aug 12, 2019

Tetralux force-pushed the iterator-remove branch from da552ff to a66810f on August 12, 2019 13:04

daurnimator suggested changes Aug 13, 2019

Tetralux force-pushed the iterator-remove branch from f4d9040 to 63d3bd0 on August 13, 2019 20:08

daurnimator reviewed Aug 14, 2019

daurnimator added the "enhancement" label Aug 14, 2019

Tetralux force-pushed the iterator-remove branch from 74413e3 to e27d1bc on August 18, 2019 13:49

andrewrk added this to the 0.5.0 milestone Aug 19, 2019

Tetralux mentioned this pull request Aug 21, 2019

proposal: Streamline loops, and enhance iteration #3110

Closed

andrewrk requested changes Sep 19, 2019

andrewrk removed this from the 0.5.0 milestone Sep 19, 2019

andrewrk added this to the 0.6.0 milestone Sep 19, 2019

Tetralux and others added 12 commits September 20, 2019 16:42

582f16c  Add ability to remove elements, by swapRemove or orderedRemove, while iterating std.ArrayList.

36e8c7d  Fix typo that should have been a compile error?..

e10379d  _Actually_ fix it this time?

d2369be  Make it ~5% faster

75256ee  Fix a typo, and fix bug with removing the first element
         Add tests for removing the first element

392ec68  Match old performance when not removing; get removal procedures within 1% of normal removal procedures.

29db863  Cleanup

33f17b9  Fix comment; and retry worker job in the process

1acafdd  Make it illegal to call remove without calling next
         This makes removal procedures always remove the last element returned from next. Before, they'd remove the first element if you called for removal before any calls to next.

c740409  Seems reasonable

5195781  Add comment

ddccf82  Refactor

Tetralux force-pushed the iterator-remove branch from e27d1bc to ddccf82 on September 20, 2019 16:46

andrewrk removed this from the 0.6.0 milestone Oct 1, 2019

andrewrk closed this in c3d8b1f Dec 10, 2019

andrewrk mentioned this pull request Dec 16, 2019

Add ArrayList.sort #3917

Closed
