Skip to content

Add 'fromListWithKey' to HashMap #246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 11, 2020

Conversation

josefs
Copy link
Contributor

@josefs josefs commented Jan 17, 2020

No description provided.

@treeowl
Copy link
Collaborator

treeowl commented Jan 17, 2020

Can we reuse the unsafe function you wrote for unsafeInsertWith, or are there performance regressions?

@josefs
Copy link
Contributor Author

josefs commented Jan 17, 2020

Can we reuse the unsafe function you wrote for unsafeInsertWith, or are there performance regressions?

Yes, I suppose we could do that. I haven't looked in to the performance implications yet though. Will do.

@josefs
Copy link
Contributor Author

josefs commented Jan 17, 2020

Btw, the travis failure seems unrelated to my change.

@josefs
Copy link
Contributor Author

josefs commented Jan 20, 2020

Ok, so I've benchmarked the effect of having both full implementations for unsafeInsertWith and unsafeInsertWithKey compare to defining unsafeInsertWith in terms of unsafeInsertWithKey. Here are the benchmark for the former:
https://pastebin.com/s0BD8Q1G
And here are the benchmark results for defining unsafeInsertWith using unsafeInsertWithKey:
https://pastebin.com/X6FbvpZ3

The numbers change quite a bit, even for unrelated benchmarks. I assume that's due to code layout effects. If anything, the trend seems to be that it's slightly better define unsafeInsertWith using unsafeInsertWithKey rather than having two separate implementations. So I'm going going with that. Yell if you disagree @treeowl .

josefs added a commit to josefs/unordered-containers that referenced this pull request Jan 20, 2020
@sjakobi sjakobi added this to the 0.2.11.0 milestone May 31, 2020
Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josefs Could you address the merge conflicts?

Please also add some examples to the haddocks, ideally in doctest-style, and reference the documentation regarding the order of insertions that was added in #237.

Please also add a few tests. Unit tests will do, I think.

josefs added 4 commits June 2, 2020 08:09
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
It models the test for fromListWith but makes
sure that values are combined in a way that depends
on the key.
@josefs
Copy link
Contributor Author

josefs commented Jun 2, 2020

I've added some more documentation and tests.

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @josefs! :)

I have a few more comments, but I think we're on a good track.

Comment on lines 1904 to 1905
-- > combine Mul = (*)
-- > combine Add = (+)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commutative operations make it harder to infer the behaviour of the function from the example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? I thought that they would make it easier! The example is supposed to use the difference between addition and multiplication as a way to let the key affect combining the keys.

What would you suggest I use instead?

I suppose I could use something like taking the first or the last element, depending on the key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like

combine k x y = [ k ] ++ x ++ y

?

With a commutative operation you can't tell in which order the elements are being inserted which is an important gotcha here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of combining the key with the values in the way you suggest. That's how it's documented in Data.Map and I specifically tried to avoid it because I think it seems very contrived.

The point about the order of operations is made in the last lines of the documentation. This mirrors the documentation of fromListWith which also has examples which uses commutative operators but then explains below the order of operations. I kind of like that structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the thing is that IMHO fromListWith and fromListWithKey really have this rather serious design mistake where

fromListWith op [(k, a), (k, b)]

is not the obvious

fromList [(k, op a b)]

but instead

fromList [(k, op b a)]

I wish we could fix this properly but that's not a task for this PR.

So IMHO we need to ensure that potential users of fromListWith[Key] become aware of this gotcha ASAP so they do not produce a bug, for which I feel we, as the library authors and maintainers, would share some responsibility!

Currently the only way these haddocks hint at the gotcha is in the last two lines, which IMHO are fairly abstract and easy to overlook TLDR-style.

I think the documentation for fromListWith is less problematic because the first example with the commutative operation is very short, and the next example is very clear about the issue.

So maybe consider using David's free magma suggestion (which I feel is slightly abstract) or maybe just use Sub and Div instead of Add and Mul

josefs added 3 commits June 2, 2020 08:55
The old properties used associative operators to combine
values when there were duplicate keys.
With this diff we're using a non-commutative and
non-associative operator which can catch more bugs.
@treeowl
Copy link
Collaborator

treeowl commented Jun 2, 2020

Free magmas are great for arbitrary operations.

data Magma a
  = Leaf a
  | Op (Magma a) (Magma a)

If you pull up an arbitrary HashMap a and fmap Leaf over it, then you can use Op as your combining function. Op has no structure at all, so any mistake will not get cancelled out anywhere else.

@treeowl
Copy link
Collaborator

treeowl commented Jun 2, 2020 via email

@sjakobi
Copy link
Member

sjakobi commented Jun 2, 2020

@treeowl Yeah, we also discussed tests. There, I'm satisfied with the use of (-) as a non-commutative and non-associative operation.

@josefs You could of course consider using the same or a similar function for the example too!

I've opened #264 to dicuss the fromListWith ergonomics.

This was referenced Jun 2, 2020
@sjakobi
Copy link
Member

sjakobi commented Jun 6, 2020

@josefs I would like to include this PR in the upcoming release (see #266). Do you think you could address the outstanding documentation issue within the next 7 days?

@treeowl I assume you have vetted the internals and are happy with them!?

…ciative operators

Since the combining function is applied in a way that can be
counter-intuitive it's more pedagical to to use operators which
better illustrate this behaviour.
@josefs
Copy link
Contributor Author

josefs commented Jun 6, 2020

@sjakobi I've updated the documentation to use division and subtraction instead to better illustrate the non-obvious way in which values are combined.

I really liked the free magma suggestion from @treeowl so I've converted the tests to use that.

I believe that should take care of all the outstanding issues for this PR so it should be good to go for the next release.

Btw, it might be good to squash at least some of these commits.

Copy link
Member

@sjakobi sjakobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @josefs! :)

@sjakobi sjakobi merged commit 17127e0 into haskell-unordered-containers:master Jun 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants