Skip to content

Add new iterator adapator .filter_scan() #14425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

lilyball
Copy link
Contributor

filter_scan() is like scan() but it skips None values. It also has a
method .chain_state() that adds one final iteration value by mapping the
state through a closure.

filter_scan() is like scan() but it skips None values. It also has a
method .chain_state() that adds one final iteration value by mapping the
state through a closure.
@alexcrichton
Copy link
Member

How is this different than foo.iter().filter(...).scan(...)?

The extra value added can't be expressed today, but why was it added for just one iterator and no others?

@lilyball
Copy link
Contributor Author

lilyball commented Jun 2, 2014

@alexcrichton foo.iter().filter().scan() cannot do what filter_scan() does, because it cannot adjust its filtering based on the current state, and conversely, it can't adjust its state based on things it filtered.

I added filter_scan() because I thought this is how scan() was supposed to behave and was quite surprised when it doesn't do this. Note that I had to fix the doc comment for scan() in my commit. I checked and the current behavior of Scan is being explicitly relied-upon, so we can't change it.

filter_scan() allows building custom once-off iterators easily without requiring brand new types. See the example that uses it to build a simple tokenizer to extract words from a string (this example was directly inspired by someone on IRC who needed this, and that's how I discovered the current behavior of scan()).

@lilyball
Copy link
Contributor Author

lilyball commented Jun 2, 2014

As for chain_state(), I added that because I could, and because it turns out the tokenizer example needs it. It could probably be added to scan() as well but I didn't want to deal with that in this commit, because it's not immediately obvious how to behave there. Would it run the chain_state() closure the first time it returns None from the function? Or would it run it once the underlying iterator returns None? The latter seems like the obvious choice, but doing it that way means your scan() iterator can return None without ever chaining the state (if the contained closure returns a None). Because of this issue, I felt it was better to add separately if desired.

Another option is to build a separate iterator adaptor that chains a single result by executing a closure on a &mut reference to the contained iterator, once the contained iterator returns None. This would let you do the same thing as chain_state(), but I don't know if it would really be useful for any other iterators. Also if it did this I'd perhaps want to be able to chain multiple times, or to be able to return an Iterator from the closure and have it continue running that iterator. It's worth thinking about, but again it's not entirely straightforward as to what the correct solution is.

@erickt
Copy link
Contributor

erickt commented Jun 3, 2014

There's a lot of similarity with this approach and my .batch() iterator from #14271. For fun I ported over your example to it, and it results in pretty much the same code: https://gist.github.com/erickt/ae99de71309c3a194830. I would lean towards batch for this instance, as I feel it's a bit more general purpose. It's also a teensy bit faster, but that's probably due to the #11084 bug.

@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to reopen again.

@lilyball
Copy link
Contributor Author

@alexcrichton I still think this is useful. All it's waiting on is a review and/or a comparison with @erickt's .batch(). I agree that .batch() can fill the same role, but I do not know which approach is better.

FWIW, .batch() appears to be completely standalone, despite being part of a PR that includes other functionality. I think either filter_scan() or batch() should be accepted.

@lilyball lilyball reopened this Jun 30, 2014
@nielsle
Copy link
Contributor

nielsle commented Jul 9, 2014

AFAICS you can almost solve your code example using scan and filtermap, but I didn't manage to get the last word, and I don't know how well this will be optimized.

fn main() {
    let s = "this is a sentence";
    let mut it = s.char_indices().scan(0u, |i1, (i2, c)| {
        if c.is_whitespace() {
            let word = s.slice(*i1, i2);
            *i1 = i2+1;
            Some(Some(word))
        } else {
            Some(None)
        }
    }).filter_map(|x| x);
    assert_eq!(it.next(), Some("this"));
    assert_eq!(it.next(), Some("is"));
    assert_eq!(it.next(), Some("a"));
    // assert_eq!(it.next(), Some("sentence")); # NOT WORKING
    assert_eq!(it.next(), None);
}

Perhaps chain_state could be renamed to finally and added as a method to the normal ScanIterator. Then the code example could be changed into something like the following

let mut it = s.char_indices()
              .scan(0u, |i1, (i2, c)| {
                         ....  // Same code as above
              })
              // last word
              .finally(|i1, (i2, _)|  Some(Some( s.slice(*i1, i2+1))))
              .filter_map(|x| x);

Alternatively you can add a whitespace at the end of the string, but that feels like a hack.

EDIT: Ideally a Peekable iterator should allow you to peek ahead inside a for-loop. That would would solve the problem of getting the last word. Perhaps i2 could have a method named peek(), but also coerce to uint.

@brson
Copy link
Contributor

brson commented Jul 14, 2014

Closing. Between this PR and #14271, this subject has been closed several times with little agreement. Please agree on a design and convince others that it is needed.

@brson brson closed this Jul 14, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants