-
Notifications
You must be signed in to change notification settings - Fork 21
foldRight broken for large lists #3295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Imported From: https://issues.scala-lang.org/browse/SI-3295?orig=1 |
@pchiusano said: def foldRight[A,B](z: B)(op: (A,B) => B): B = {
import scala.collection.mutable.ArrayStack
val s = new ArrayStack[A]
foreach(a => s += a)
s.elements.foldLeft(z)((b,a) => op(a,b))
} Here's another implementation that I think could have better gc performance: def foldRight[A,B](z: B)(op: (A,B) => B): B = {
import scala.collection.mutable.ArrayStack
val s = new ArrayStack[A]
foreach(a => s += a)
var r = z
while (!s.isEmpty) { r = op(s.pop, r) }
r
} |
@odersky said: xs.reverse.foldLeft instead of xs.foldRight |
@pchiusano said: Also, can you explain - how does it "confuse the computation model"? Think about what foldRight (the recursive version) is doing - it builds a stack of call frames containing all the elements of the input list, then pops elements from the stack, combining with the current accumulated value. The second version I gave does almost exactly the same thing, except it uses an explicit stack. The suggestion that you should "just use foldLeft + reverse for large sequences" is not a good api decision, IMO. It's much more common for sequence processing code to be generally ignorant of the expected size of input sequences. Incidentally, the standard library could use a version of foldRight where the operator is non-strict (this WOULD actually be a different computation model). Maybe I'll file a separate enhancement for that. :) Note that this was just done in the interactive prompt - I'm not sure if the classes compiled there will get JIT'ed, so more controlled tests should probably be done. scala> def foldRight[A,B](i: Iterable[A], z: B, op: (A,B) => B): B = {
| import scala.collection.mutable.ArrayStack
| val s = new ArrayStack[A]
| i.foreach(a => s += a)
| var r = z
| while (!s.isEmpty) { r = op(s.pop, r) }
| r
| }
foldRight: [A,B](Iterable[A],B,(A, B) => B)B
scala> def time(block: => Any) = {
| for (i <- 1 to 1600) block // trigger JIT on block
| var goodSample = false
| var start = 0L; var stop = 0L;
| var N = 500
| while (!goodSample) {
| start = System.currentTimeMillis
| for (i <- 1 to N) block
| stop = System.currentTimeMillis
| if (stop - start < 500) N = N*4 // require at least half a second for a decent sample
| else goodSample = true
| }
| val perOp = (stop.toDouble - start.toDouble) / N
| perOp * 10000
| }
time: (=> Any)Double
scala> var N = 5000
N: Int = 5000
scala> time(foldRight[Int,Int](0 to N, 0, _ + _))
res8: Double = 2890.0
scala> time((0 to N).foldRight(0)(_ + _))
res9: Double = 5625.0
scala> N = 100
N: Int = 100
scala> time((0 to N).foldRight(0)(_ + _))
res11: Double = 82.96875
scala> time(foldRight[Int,Int](0 to N, 0, _ + _))
res12: Double = 59.84375
scala> N = 50
N: Int = 50
scala> time(foldRight[Int,Int](0 to N, 0, _ + _))
res14: Double = 32.05078125
scala> time((0 to N).foldRight(0)(_ + _))
res15: Double = 41.5625 |
@pchiusano said: Also, in general, what is the policy as far as closing tickets? Closing a ticket without any discussion or agreement from the submitter that the issue has been addressed (or the bug really is invalid, shouldn't be fixed, etc.) is a little discouraging for the submitter. :) |
johnconnor said: Unfortunate Martin did not like that either. As it stands now the function is useless -- using it, is a bug waiting to happen. |
@pchiusano said: |
@odersky said: Generally, I am not super keen on tickets being reopened if there was a reasoned discussion before. Given the number of tickets, and the fact that so far I have fixed personally about one third of all opened tickets, you can do the math, and deduce how much time I have per ticket. So I really prefer using this scarce time fixing real problems than continuing dicussions. I do not mean this to be rude, but it's just the reality of things. |
@pchiusano said: One thing in your latest response that did not make sense to me and makes me think we are not on the same page - you say it won't work for Streams? Why not? Since the operator of foldRight is strict in its second argument, it cannot be computed in constant space even if the collection is non-strict: Stream.from(1).foldRight(Stream[Int]())(Stream.cons(_,_))
java.lang.StackOverflowError
at scala.Stream$$.from(Stream.scala:168)
at scala.Stream$$$$anonfun$$from$$1.apply(Stream.scala:168)
at scala.Stream$$$$anonfun$$from$$1.apply(Stream.scala:168)
at scala.Stream$$cons$$$$anon$$2.tail(Stream.scala:69)
at scala.Stream.foldRight(Stream.scala:433)
at scala.Stream.foldRight(Stream.scala:433)
at scala.Stream.foldRight(Stream.scala:433)
at scala.Stream... So there's really no reason why the default implementation I gave wouldn't work for Streams. As I mentioned earlier, the standard lib could use a version of foldRight where the operator is non-strict in its second argument. This could be computed in constant space for collections with efficient head and tail methods. All right, I'm going to stop trying to convince you. :) If you do change your mind I'd be happy to submit a patch for this and also reduceRight and any other relevant methods. |
@paulp said:
Such a change is among the easiest to do at any time (implementation change without breaking any compat) and for that reason the best thing to do with this and any issues with similar characteristics is wait. The only important thing right now is shipping 2.8. I wish I could get people applying this level of attention to the things in 2.8 which won't be so easy to change later (which is quite a lot of it.) |
@pchiusano said: |
@paulp said: |
@pchiusano said: |
Ben Wing (benwing) said (edited on Aug 13, 2012 8:29:35 PM UTC): I strongly believe we should either (1) preferably, fix this issue as Paul C. proposed; At the very least, clearly document functions that aren't stack-safe, and indicate how to avoid the issue (e.g. through reverse.foldLeft). In reality, although Martin may personally have a clear idea which functions are and aren't tail-recursively-safe, the average Scala programmer has little or no idea. As a simple example, I verified experimentally that appending to the end of a very large list does not cause a stack overflow -- but how could I possibly know that? My intuition tells me that the most straightforward implementation will be recursive in a non tail-recursive way. In general, programmers should not have to be aware of implementation details like this, and on top of this, currently there's no documentation as to which functions are stack-safe and which ones aren't. BTW, I see that others have run into this same issue: http://oldfashionedsoftware.com/2009/07/10/scala-code-review-foldleft-and-foldright/ http://stackoverflow.com/questions/4085118/why-foldright-and-reduceright-are-not-tail-recursive |
@paulp said: |
Ben Wing (benwing) said: |
@SethTisue said: |
Is there a good reason not to implement l.foldRight(z)(f) as l.reverse.foldLeft(z)(flip(f)), or some variation? This would avoid the stack overflow that results when using foldRight with large sequences. As it is implemented, the function is not very useful except for toy examples.
The text was updated successfully, but these errors were encountered: