Add forM #592

hasufell · 2019-01-17T04:17:55Z

I'm not entirely sure if forM s f = fmap fromList . (flip T.forM f) . toList $ s would be faster.

treeowl · 2019-01-17T05:54:20Z

It seems a bit odd to offer the flipped version of mapM and not mapM. And it seems quite odd to offer a version of mapM without offering a version of traverse. The foldr' is extremely surprising; I'd be astonished if that was what you really intended. The toList version should surely tolerate an Applicative constraint.

traverseSet :: (Ord b, Applicative f) => (a -> f b) -> Set a -> f (Set b)
traverseSet f = fmap fromListN . traverse f . toList

I'm not sure whether we can do better than that. The monadic version should probably look like this:

mapMSet :: (Ord b, Monad m) => (a -> m b) -> Set a -> m (Set b)
mapMSet f s0 = foldr go return s0 empty
  where
     go x r s = f x >>= \y -> r $! insert y s

or alternatively perhaps

mapMSet :: (Ord b, Monad m) => (a -> m b) -> Set a -> m (Set b)
mapMSet f = foldM go empty
  where
    go s x = f x >>= \y -> pure $! insert y s

Note: you should definitely email the libraries list with a formal proposal; in particular, naming these things can be a tad tricky. traverse and mapM would probably be "right", but have the potential to break existing code. I don't feel strongly about the names.

hasufell · 2019-02-01T03:51:31Z

I cannot really judge what is more efficient or more performant, since reasoning about those things are not easy in haskell.

treeowl · 2019-02-01T04:21:58Z

The monadic versions I propose insert into the accumulator set as actions are performed, which will likely be considerably more efficient in many cases. Do you understand why the foldr'-based version you proposed does not do so? The toList/fromList version also does not insert incrementally, but it doesn't need a full Monad; do you understand why Applicative isn't sufficient to work incrementally? I don't know much about your Haskell background.

treeowl · 2019-02-01T04:34:54Z

Ah, I see now that you've written some very similar code yourself, in http://hackage.haskell.org/package/hpath-0.9.2/docs/src/System.Posix.Directory.Traversals.html#traverseDirectory

hasufell · 2019-02-08T07:25:37Z

Do you understand why the foldr'-based version you proposed does not do so?

Yes, becauce b in (\a b -> f a >>= (\x -> fmap (insert x) b)) is the recursion, so we are fmapping into an action into an action...

However, if we never care about the Set, but only about the actions, then foldr would be more "efficient" no?

do you understand why Applicative isn't sufficient to work incrementally?

I'm not sure what you mean by that.

hasufell · 2020-01-07T14:01:25Z

ping

treeowl · 2020-01-07T14:07:44Z

You still haven't come up with an implementation I consider reasonable. foldr' still has no business there.

treeowl · 2020-01-07T14:49:09Z

Much better. But you'll still need to propose this on the libraries list to get community input.

hasufell · 2020-01-07T14:51:06Z

Much better. But you'll still need to propose this on the libraries list to get community input.

I did

treeowl · 2020-01-07T14:51:54Z

Can you link to the discussion thread here? Unfortunately I don't remember it.

hasufell · 2020-01-07T14:55:58Z

Can you link to the discussion thread here? Unfortunately I don't remember it.

Actually, not I did, but someone else: https://mail.haskell.org/pipermail/libraries/2019-January/029335.html

hasufell · 2020-01-07T14:58:36Z

I'm also interested in benchmarks and will test a bit I think.

treeowl · 2020-01-07T14:58:59Z

Ah, yes, that "someone else" was actually me. But there wasn't any real discussion, aside from one comment by Henning. Could you try to reignite the conversation?

hasufell · 2020-01-07T15:25:22Z

I compared

forM :: (Ord b, Monad m) => Set a -> (a -> m b) -> m (Set b)
forM s0 f = fmap (GHCExts.fromListN (size s0)) . traverse f . toList $ s0

with

forM :: (Ord b, Monad m) => Set a -> (a -> m b) -> m (Set b)
forM s0 f = foldr go return s0 empty
  where
     go x r s = f x >>= \y -> r $! insert y s

and

forM :: (Ord b, Monad m) => Set a -> (a -> m b) -> m (Set b)
forM s0 f = foldM go empty $ s0
  where
    go s x = f x >>= \y -> pure $! insert y s

test was:

         bench "forM" $ whnf (\s -> runIdentity $ S.forM s (\e -> pure $ e + 1)) s

toList version:

benchmarked forM
time                 125.3 μs   (124.1 μs .. 126.2 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 124.8 μs   (123.9 μs .. 125.7 μs)
std dev              3.237 μs   (2.729 μs .. 4.030 μs)
variance introduced by outliers: 11% (moderately inflated)

foldr version:

benchmarked forM
time                 1.548 ms   (1.544 ms .. 1.553 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.539 ms   (1.535 ms .. 1.543 ms)
std dev              12.80 μs   (10.27 μs .. 16.49 μs)

foldM version:

benchmarked forM
time                 2.591 ms   (2.571 ms .. 2.624 ms)
                     0.998 R²   (0.994 R² .. 0.999 R²)
mean                 2.608 ms   (2.590 ms .. 2.638 ms)
std dev              74.54 μs   (51.70 μs .. 116.5 μs)
variance introduced by outliers: 11% (moderately inflated)

hasufell · 2020-01-07T15:25:43Z

Ah, yes, that "someone else" was actually me. But there wasn't any real discussion, aside from one comment by Henning. Could you try to reignite the conversation?

Given the disinterest, does this justify two threads?

hasufell · 2020-01-07T15:39:58Z

I find the benchmark results interesting and confusing.

treeowl · 2020-01-07T15:46:09Z

What did your actual benchmark look like?

…

On Tue, Jan 7, 2020, 10:40 AM Julian Ospald ***@***.***> wrote: I find the benchmark results interesting and confusing. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#592?email_source=notifications&email_token=AAOOF7IPCECQWP76D6MTI4TQ4SO47A5CNFSM4GQRXBCKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIJIURA#issuecomment-571640388>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOOF7P3K7SY2SJ37B6D4BLQ4SO47ANCNFSM4GQRXBCA> .

hasufell · 2020-01-07T15:47:10Z

         bench "forM" $ whnf (\s -> runIdentity $ S.forM s (\e -> pure $ e + 1)) s

treeowl · 2020-01-07T15:49:32Z

You're probably running into the weird special-case optimization in fromList for (strictly?) increasing lists.

hasufell · 2020-01-07T15:55:50Z

bench "forM" $ whnf (\s -> runIdentity $ S.forM s (\e -> pure $ e * 12 `mod` 4)) s

makes the foldr version go faster:

benchmarked forM
time                 173.3 μs   (171.7 μs .. 175.2 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 173.2 μs   (172.5 μs .. 174.2 μs)
std dev              2.743 μs   (2.188 μs .. 3.780 μs)

fromList one is still the fastest.

andreasabel · 2020-01-10T16:30:16Z

whnf only computes the weak head normal form, you need to use nf, the full normal form. (I fell into this trap before, see agda/agda@8dae71a#commitcomment-36008099 ).

andreasabel · 2020-01-10T16:36:36Z

My opinion on this PR:

I think that we should have mapM = traverse rather than two different implementation. Also, the effects should occur in order from the least set element to the greatest. Spec:

mapM f s = traverse f s = Set.fromList <$> do mapM f $ Set.toList s

I think we don't need for; this is anyway a misnamed function in Applicative because the reasonable analogy of forM = flip mapM is for = flip map and this is violated here.

hasufell · 2020-01-10T16:55:24Z

whnf only computes the weak head normal form, you need to use nf, the full normal form. (I fell into this trap before, see agda/agda@8dae71a#commitcomment-36008099 ).

Thanks, will adjust!

* I think that we should have `mapM = traverse` rather than two different implementation.

I did that, because they have different performance characteristics I think. It appears the applicative version is faster. See above. But I don't know in which circumstances. I guess it makes sense to unify those.

* I think we don't need `for`; this is anyway a misnamed function in `Applicative` because the reasonable analogy of `forM = flip mapM` is `for = flip map` and this is violated here.

I prefer consistency here. for is already well known and I use it a lot. I don't think this PR is the place to get rid of historical decisions.

treeowl · 2020-01-10T17:03:40Z

@andreasabel, could you do me a favor and keep the discussion of what to include and by what name to the libraries list? It's useful to have it all in one place and to get more exposure to comments from other users.

hasufell · 2020-01-10T17:21:14Z

containers/src/Data/Set/Internal.hs

+-- Like 'traverse' from Data.Traversable. This is less generic, since 'Set'
+-- does not have a Traversable instance.
+traverse :: (Ord b, Applicative m) => (a -> m b) -> Set a -> m (Set b)
+traverse f s0 = fmap (GHCExts.fromListN (size s0)) . DT.traverse f . toList $ s0


I know there are some subtle differences on whether to use traverse f s0 vs traverse f, e.g. inlining. What's the verdict here? Do we want it half-pointfree or fully pointfree even?

@massudaw any idea?

treeowl · 2020-01-10T17:47:52Z

@hasufell, you should bring back the previous mapM at least long enough to benchmark it using a monad other than Identity! Try ST s, for example.

hasufell · 2020-01-10T17:48:34Z

* I think that we should have `mapM = traverse` rather than two different implementation.  Also, the effects should occur in order from the least set element to the greatest. Spec:

In fact, this breaks ancient GHC versions where Monad is not a superclass of Applicative:
https://travis-ci.org/haskell/containers/jobs/635377304#L737

hasufell · 2020-01-10T18:02:11Z

@treeowl the results are when keeping the two different implementations. This looks significant.

benchmarked forM (Identity)
time                 1.542 ms   (1.536 ms .. 1.549 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.539 ms   (1.536 ms .. 1.546 ms)
std dev              16.83 μs   (10.46 μs .. 28.81 μs)

benchmarked for (Identity)
time                 163.3 μs   (161.7 μs .. 165.0 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 163.8 μs   (162.7 μs .. 165.1 μs)
std dev              3.992 μs   (3.276 μs .. 5.028 μs)

benchmarked forM (ST)
time                 1.758 ms   (1.750 ms .. 1.764 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.765 ms   (1.760 ms .. 1.774 ms)
std dev              23.95 μs   (15.22 μs .. 38.66 μs)

benchmarked for (ST)
time                 277.3 μs   (274.9 μs .. 279.5 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 279.8 μs   (278.0 μs .. 281.2 μs)
std dev              5.724 μs   (4.727 μs .. 7.641 μs)

treeowl · 2020-01-10T18:39:41Z

Please let me see your ST benchmark code, and please include it in the benchmarks you add to the package.

hasufell · 2020-01-10T18:47:17Z

Please let me see your ST benchmark code, and please include it in the benchmarks you add to the package.

ST is already included.

treeowl · 2020-01-10T18:55:04Z

It looks like your ST benchmark produces elements in strictly increasing order, which bakes in an extra advantage for fromListN. Would you mind tweaking it to scatter elements around more to get something more realistic?

treeowl · 2020-01-11T02:08:38Z

So how did the ordering change affect the benchmarks?

hasufell · 2020-01-11T02:21:29Z

So how did the ordering change affect the benchmarks?

Not at all

treeowl · 2020-01-11T02:22:33Z

Okay, so for now we don't need to have mapM, right?

hasufell · 2020-01-11T02:27:13Z

Actually I was lying. The new benchmarks ran with the unified implementation. Here are the benchmarks for the explicit foldr (forM) and the fromListN (for):

benchmarked forM (Identity)
time                 1.538 ms   (1.509 ms .. 1.561 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 1.494 ms   (1.486 ms .. 1.505 ms)
std dev              31.80 μs   (26.01 μs .. 40.72 μs)

benchmarked for (Identity)
time                 171.3 μs   (167.4 μs .. 176.8 μs)
                     0.996 R²   (0.994 R² .. 0.999 R²)
mean                 171.8 μs   (170.5 μs .. 173.4 μs)
std dev              4.873 μs   (3.835 μs .. 6.653 μs)
variance introduced by outliers: 13% (moderately inflated)

benchmarked forM (ST)
time                 9.340 μs   (9.246 μs .. 9.427 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.413 μs   (9.377 μs .. 9.457 μs)
std dev              132.2 ns   (107.1 ns .. 161.1 ns)

benchmarked for (ST)
time                 5.228 μs   (5.134 μs .. 5.343 μs)
                     0.997 R²   (0.996 R² .. 0.999 R²)
mean                 5.285 μs   (5.237 μs .. 5.355 μs)
std dev              202.2 ns   (151.9 ns .. 294.4 ns)

I can't make any sense out of this. Why the forM with Identity is so slow, but the ST one not... totally escapes me.

this was with:

         bench "forM (Identity)" $ nf (\s -> runIdentity $ S.forM s (\e -> pure $ e + 1)) s
        , bench "for (Identity)" $ nf (\s -> runIdentity $ S.for s (\e -> pure $ e + 1)) s
        , bench "forM (ST)" $ nf (\s -> runST $ do
              ref <- newSTRef 0
              S.forM s (\e -> do
                              modifySTRef ref (+ 1)
                              x <- readSTRef ref
                              pure $ x + e * 36 `mod` 11)) (S.map (`mod` 51) s)
        , bench "for (ST)" $ nf (\s -> runST $ do
              ref <- newSTRef 0
              S.for s (\e -> do
                             modifySTRef ref (+ 1)
                             x <- readSTRef ref
                             pure $ x + e * 36 `mod` 11)) (S.map (`mod` 51) s)

hasufell · 2020-01-11T02:32:18Z

I noticed when I change the test implementation to:

         bench "forM (Identity)" $ nf (\s -> runIdentity $ S.forM s (\e -> pure $ e + 1)) s
        , bench "for (Identity)" $ nf (\s -> runIdentity $ S.for s (\e -> pure $ e + 1)) s
        , bench "forM (ST)" $ nf (\s -> runST $ do
              ref <- newSTRef 0
              S.forM s (\e -> do
                              modifySTRef ref (+ 1)
                              x <- readSTRef ref
                              pure $ x + e * 36 `mod` 11)) s
        , bench "for (ST)" $ nf (\s -> runST $ do
              ref <- newSTRef 0
              S.for s (\e -> do
                             modifySTRef ref (+ 1)
                             x <- readSTRef ref
                             pure $ x + e * 36 `mod` 11)) s

Then the results are:

benchmarked forM (Identity)
time                 1.514 ms   (1.495 ms .. 1.536 ms)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 1.496 ms   (1.490 ms .. 1.504 ms)
std dev              22.49 μs   (17.10 μs .. 31.61 μs)

benchmarked for (Identity)
time                 169.9 μs   (168.5 μs .. 171.5 μs)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 169.1 μs   (168.0 μs .. 170.6 μs)
std dev              4.032 μs   (3.006 μs .. 5.988 μs)

benchmarked forM (ST)
time                 1.744 ms   (1.735 ms .. 1.752 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.740 ms   (1.736 ms .. 1.746 ms)
std dev              16.39 μs   (10.13 μs .. 26.54 μs)

benchmarked for (ST)
time                 1.376 ms   (1.370 ms .. 1.380 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.383 ms   (1.378 ms .. 1.390 ms)
std dev              19.05 μs   (13.22 μs .. 26.90 μs)

It seems small changes to the tests have significant impact.

But the forM and for seem to converge at some point.

hasufell · 2020-01-11T02:33:13Z

Okay, so for now we don't need to have mapM, right?

I think it's better to have all 4 functions. Not my decision though.

treeowl · 2020-01-11T02:45:14Z

If it doesn't add functionality and doesn't add performance, then there's a really high bar to adding it. Generally speaking, this is because it makes something really important significantly more convenient (like (!?) as an infix version of lookup) or because it's rather tricky to figure out how to do it most efficiently (like Data.Sequence.intersperse). I'm a bit skeptical about adding any of these functions, but I'm leaning toward letting you have traverse if others don't speak up, since all the work you've done has lessons to teach about how to make this perform well.

hvr · 2020-01-13T07:12:16Z

containers/src/Data/Set.hs

@@ -136,6 +136,12 @@ module Data.Set (
            -- * Map
            , S.map
            , mapMonotonic
+#if __GLASGOW_HASKELL__ >= 710


This is quite confusing and annoying; please avoid APIs exporting things conditional on GHC or base versions

@hvr, don't worry; I won't let it through like that. Please join the discussion on the libraries list about whether to add some of these.

@hvr there are two problems with that. 7.6 and 7.8 don't have Applicative constraint on Monad class and 7.6 does not have fromListN at all, which would result in two different implementations of the same function having different performance characteristics. How do you propose to solve it?

As there is no response from @hvr can you @treeowl give a definite list of required changes that need to be made in order for this to be merged, including how you want multiple GHC support to happen?

hasufell · 2020-07-04T20:06:53Z

No actionable input. I've lost interest in this PR.

sjakobi · 2020-07-15T13:12:17Z

@treeowl Since you apparently don't want to let this die, I've assigned it to you. :)

hasufell · 2021-06-25T18:39:49Z

This is dead. Please open your own PR if you're interested to continue. I'm cleaning up stale PRs every once in a while.

hasufell force-pushed the forM branch 2 times, most recently from 4b996f0 to ff01a54 Compare January 7, 2020 14:29

hasufell force-pushed the forM branch from ff01a54 to c818a45 Compare January 10, 2020 00:25

hasufell commented Jan 10, 2020

View reviewed changes

hasufell force-pushed the forM branch from e261d78 to c979fd5 Compare January 10, 2020 17:34

hasufell force-pushed the forM branch from c979fd5 to e46edf1 Compare January 10, 2020 18:01

hasufell force-pushed the forM branch from 8700f2a to e44aba1 Compare January 10, 2020 18:46

sjakobi mentioned this pull request Jan 11, 2020

GHC backward compatibility policy / support for GHC 7.6 #699

Closed

Add forM, mapM, for and traverse

febaa4d

hasufell force-pushed the forM branch from 2bacfff to febaa4d Compare January 11, 2020 01:25

hvr suggested changes Jan 13, 2020

View reviewed changes

sjakobi mentioned this pull request May 29, 2020

New release for GHC 8.12 #717

Closed

hasufell closed this Jul 4, 2020

treeowl reopened this Jul 4, 2020

sjakobi assigned treeowl Jul 15, 2020

sjakobi added feature-request Set labels Jul 15, 2020

hasufell closed this Jun 25, 2021

Add forM #592

Add forM #592

Conversation

hasufell commented Jan 17, 2019

treeowl commented Jan 17, 2019

hasufell commented Feb 1, 2019

treeowl commented Feb 1, 2019

treeowl commented Feb 1, 2019

hasufell commented Feb 8, 2019

hasufell commented Jan 7, 2020

treeowl commented Jan 7, 2020

treeowl commented Jan 7, 2020

hasufell commented Jan 7, 2020

treeowl commented Jan 7, 2020

hasufell commented Jan 7, 2020

hasufell commented Jan 7, 2020

treeowl commented Jan 7, 2020

hasufell commented Jan 7, 2020 • edited Loading

hasufell commented Jan 7, 2020

hasufell commented Jan 7, 2020

treeowl commented Jan 7, 2020 via email

hasufell commented Jan 7, 2020

treeowl commented Jan 7, 2020

hasufell commented Jan 7, 2020 • edited Loading

andreasabel commented Jan 10, 2020

andreasabel commented Jan 10, 2020

hasufell commented Jan 10, 2020 • edited Loading

treeowl commented Jan 10, 2020

hasufell Jan 10, 2020 • edited Loading

Choose a reason for hiding this comment

hasufell Jan 10, 2020

Choose a reason for hiding this comment

treeowl commented Jan 10, 2020

hasufell commented Jan 10, 2020 • edited Loading

hasufell commented Jan 10, 2020 • edited Loading

treeowl commented Jan 10, 2020

hasufell commented Jan 10, 2020

treeowl commented Jan 10, 2020

treeowl commented Jan 11, 2020

hasufell commented Jan 11, 2020

treeowl commented Jan 11, 2020

hasufell commented Jan 11, 2020 • edited Loading

hasufell commented Jan 11, 2020

hasufell commented Jan 11, 2020

treeowl commented Jan 11, 2020

hvr Jan 13, 2020

Choose a reason for hiding this comment

treeowl Jan 13, 2020

Choose a reason for hiding this comment

hasufell Jan 13, 2020

Choose a reason for hiding this comment

hasufell Jan 31, 2020

Choose a reason for hiding this comment

hasufell Feb 11, 2020

Choose a reason for hiding this comment

hasufell Feb 16, 2020

Choose a reason for hiding this comment

hasufell commented Jul 4, 2020

sjakobi commented Jul 15, 2020

hasufell commented Jun 25, 2021

hasufell commented Jan 7, 2020 •

edited

Loading

hasufell commented Jan 7, 2020 •

edited

Loading

hasufell commented Jan 10, 2020 •

edited

Loading

hasufell Jan 10, 2020 •

edited

Loading

hasufell commented Jan 10, 2020 •

edited

Loading

hasufell commented Jan 10, 2020 •

edited

Loading

hasufell commented Jan 11, 2020 •

edited

Loading