Vulnerability to collision attacks #319


Open

Labels

collision-attacksperformancepitfall

sjakobi

opened

on Sep 13, 2021

Member

@NorfairKing's blog post describes a HashDoS attack on aeson which is enabled by u-c's O(n) handling of hash collisions and use of the same default salt for all hashing operations.

While possible mitigation measures for aeson are discussed in haskell/aeson#864, u-c should also prepare a proper response and possibly implement security features for users who are affected via aeson or in other ways.

In particular, I hope to find answers for the following questions in this thread (and possibly in separate sub-threads):

1. What mitigation measures can affected u-c users enable in the short term?
2. Is security against collision attacks a design goal for u-c?
   2a. If yes, to what extent should we trade performance and API bloat for security features?
3. What mitigation measures should be implemented in u-c?

I'd also like to point out that I have very limited knowledge and experience with security issues, so I'd be very grateful if more experienced people could chime in and share their advice. :)


mentioned this

WIP: Collisionless containers (DO NOT MERGE) #217

sjakobi

MemberAuthor

What mitigation measures can affected u-c users enable in the short term?

A few ideas – maybe some more experienced people can comment whether these are any good:

Switch to a collision-resistant hash function. I'm aware of SipHash, but there may be more suitable hash functions. This should make it harder for an attacker to produce colliding keys.
Build hashable with -frandom-init-seed, to make it slightly harder to produce colliding keys. The hashable maintainer doesn't recommend this, but it should be somewhat useful anyway. See this discussion on r/haskell.
Limit the size of any HashMaps and HashSets. When n is small, O(n) operations are less problematic.
Use Data.Map and Data.Set from containers instead of this package. These use the Ord methods for performing lookups and insertions, and therefore aren't vulnerable to collision attacks.
Experimental: Use the hashmap package which relies on Data.Map for storing any collisions. Note that this package doesn't offer a Strict API that ensures that map values are evaluated to WHNF. Maybe @RyanGlScott, @augustss or other users can comment in which cases this package is a suitable replacement for u-c.

EDIT 2021-10-03:

Build hashable with -frandom-init-seed, to make it slightly harder to produce colliding keys. The hashable maintainer doesn't recommend this, but it should be somewhat useful anyway. See this discussion on r/haskell.

Note that a random hash salt has very limited security benefits as long as a weak hash function like FNV is used. For FNV, it is possible to construct multi-collisions that collide no matter what salt they are hashed with: https://medium.com/@robertgrosse/generating-64-bit-hash-collisions-to-dos-python-5b21404a5306
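Two of the short-term ideas above (capping the size of a map, and using the Ord-based Data.Map from containers) can be combined in a few lines. The following is only a minimal sketch: the names insertBounded and fromListBounded are hypothetical helpers invented for illustration, not part of containers or u-c, and the cap itself would have to be chosen per application.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map

-- | Hypothetical helper (not a containers API): insert into an Ord-based map,
-- but refuse to grow the map beyond the given number of entries.
insertBounded :: Ord k => Int -> k -> v -> Map.Map k v -> Maybe (Map.Map k v)
insertBounded limit k v m
  | Map.member k m      = Just (Map.insert k v m)  -- overwrite: size unchanged
  | Map.size m >= limit = Nothing                  -- would exceed the cap
  | otherwise           = Just (Map.insert k v m)

-- | Hypothetical helper: build a map from untrusted input, failing as soon as
-- the cap would be exceeded.
fromListBounded :: Ord k => Int -> [(k, v)] -> Maybe (Map.Map k v)
fromListBounded limit = foldM (\m (k, v) -> insertBounded limit k v m) Map.empty
```

Because Data.Map never hashes its keys, an attacker cannot force collisions here; the cap only bounds the total work and memory spent on a single untrusted input.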

dhess

Thanks for looking into this!

My only bit of feedback is this: because there are existing, deployed services that are vulnerable to this attack, and because shutting those services down or replacing them with something that doesn't use aeson is not feasible, a timely short-term fix is just as important as whatever long-term, proper fix the various parties come up with.

In other words, please let's not let perfect be the enemy of good-enough-for-now here. Even something that makes producing collisions slightly more difficult is helpful at this stage, IMO.

I'm encouraged that you've jumped right into mitigations in your first 2 posts, so it looks like this discussion is off to a great start!

brandon-leapyear

mentioned this

on Sep 13, 2021

Support configuring default salt hashable#218

NorfairKing

In other words, please let's not let perfect be the enemy of good-enough-for-now here.

-frandom-init-seed is good enough for the very short term (AFAICT)
The collisionless containers approach solves the specific exploit that I've built, but doesn't guarantee that there's no other cheap way to produce an exploit.

treeowl

Collaborator

Is there a way to mitigate the performance degradation of a random seed? How bad does it measure out?

brandon-leapyear

✨ This is an old work account. Please reference @brandonchinn178 for all future communication ✨

I also opened this issue for another possible mitigation: haskell-unordered-containers/hashable#218

Is there a way to mitigate the performance degradation of a random seed? How bad does it measure out?

IIUC it's not that a random seed will reduce performance, it's that one would get different hash values every time one restarts the application

NorfairKing

More ideas:

IMHO we should also seriously consider the collisionless containers idea (in the long term)
A createHashmap :: IO (HashMap k v) api where the hashmap stores its own (randomly generated) salt

brandon-leapyear

+1 to the createHashMap idea, in addition to createHashMapWith :: Int -> HashMap k v to get deterministic seeds (e.g. for getting deterministic results for tests)
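To make the "map stores its own salt" idea concrete, here is a rough sketch of the deterministic variant. Everything in it is an assumption made for illustration: SaltedMap, emptyWithSalt, insertS and lookupS are hypothetical names, keys are restricted to String, and toyHashWithSalt is a toy FNV-1-style function standing in for hashable's hashWithSalt. It is not how u-c is implemented.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.IntMap.Strict as IntMap

-- | Toy FNV-1-style hash (assumption: a stand-in for a real salted hash).
-- The salt is used as the initial state.
toyHashWithSalt :: Int -> String -> Int
toyHashWithSalt = foldl' step
  where
    step h c = (h * 16777619) `xor` ord c

-- | Hypothetical map type that remembers the salt it was created with.
-- Colliding keys share a bucket, which is searched linearly.
data SaltedMap v = SaltedMap !Int !(IntMap.IntMap [(String, v)])

-- | Deterministic constructor, in the spirit of the proposed createHashMapWith.
emptyWithSalt :: Int -> SaltedMap v
emptyWithSalt salt = SaltedMap salt IntMap.empty

insertS :: String -> v -> SaltedMap v -> SaltedMap v
insertS k v (SaltedMap salt buckets) =
  SaltedMap salt (IntMap.alter upd (toyHashWithSalt salt k) buckets)
  where
    upd Nothing    = Just [(k, v)]
    upd (Just kvs) = Just ((k, v) : filter ((/= k) . fst) kvs)

lookupS :: String -> SaltedMap v -> Maybe v
lookupS k (SaltedMap salt buckets) =
  IntMap.lookup (toyHashWithSalt salt k) buckets >>= lookup k
```

An IO constructor in the spirit of createHashMap would simply obtain the salt from a random source and pass it to emptyWithSalt. One design consequence visible even in this sketch: operations that combine two maps (such as a union) would have to deal with the two maps carrying different salts.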

ysangkok

Why should the function be in IO? We already have a class that captures needing random state, it is RandomGen. A PRNG would be good enough to solve this problem, real randomness is not needed.

treeowl

Collaborator

I also opened this issue for another possible mitigation: haskell-unordered-containers/hashable#218

Is there a way to mitigate the performance degradation of a random seed? How bad does it measure out?

IIUC it's not that a random seed will reduce performance, it's that one would get different hash values every time one restarts the application

A random seed will most likely be implemented something like this:

the_random_seed :: Seed
the_random_seed = unsafePerformIO ...
{-# NOINLINE the_random_seed #-}

This means that every (initial) seed access has to check a tag and follow a pointer. The seed will almost always be in L1 cache when heavy HashMap use is happening, but we should check that it's not too bad to access it.
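For readers who want to experiment with the access cost, the pattern above can be filled in as follows. This is only a sketch under loud assumptions: generateSeed and mixWithSeed are hypothetical names, and the monotonic clock is used purely as a placeholder so that the example depends on base alone. A clock reading is not a secure source of randomness and should not be used as a seed in practice.

```haskell
import Data.Bits (xor)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical seed source. ASSUMPTION: placeholder only; a clock value is
-- NOT secure entropy. A real implementation would use a proper random source.
generateSeed :: IO Word64
generateSeed = getMonotonicTimeNSec

-- | Process-wide seed, computed at most once on first use. NOINLINE keeps GHC
-- from duplicating the unsafePerformIO call at use sites.
theRandomSeed :: Word64
theRandomSeed = unsafePerformIO generateSeed
{-# NOINLINE theRandomSeed #-}

-- | Hypothetical use site: every call reads the shared seed, which is the
-- tag check and pointer dereference discussed above.
mixWithSeed :: Word64 -> Word64
mixWithSeed x = (theRandomSeed * 16777619) `xor` x
```

Benchmarking a loop over mixWithSeed against the same loop with a literal seed would give a first impression of the overhead.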

sjakobi

MemberAuthor

Here's some earlier discussion with @tibbe and @infinity0 on mitigating collision attacks: #265 (comment)

The gist is that in order to make it sufficiently hard for an attacker to produce hash collisions, you need both a strong hash function like SipHash and a random seed.

The problem with SipHash is that apparently it's so slow that you might as well switch to Data.Map – at least that's what was mentioned in our internal discussions. Nevertheless, a hashable patch that uses SipHash for the Text instance seems like a reasonable short-term mitigation measure when combined with -frandom-init-seed.

Regarding the proposed fix in #217, @tibbe's assessment was that it would still require a strong hash function and a random seed to be reasonably secure. With many weaker hash functions, it is possible to generate seed-independent collisions, see e.g. this blog post. By this assessment, #217 "adds" little security of its own.

sjakobi

MemberAuthor

A createHashmap :: IO (HashMap k v) api where the hashmap stores its own (randomly generated) salt

Storing the salt within the HashMap was proposed in #45. Maybe we can use that issue to discuss the details of this idea.


jappeace

added 3 commits that reference this issue

on Sep 22, 2021

Add HashMapT salt, which allows creation of salt with Nat.

dc40252

Add HashMapT salt, which allows creation of salt with Nat.

73fa02e

Add HashMapT salt, which allows creation of salt with Nat.

89c4f40

jappeace

For lack of better ideas I implemented this: #321

sjakobi

MemberAuthor

I have rekindled #265 to discuss approaches for making the hash salt less predictable to attackers.

I'm reluctant to invest much time into that debate while I'm not aware of a hash function that would make the whole fuss worthwhile though (see #319 (comment)).

If anyone's interested, I think finding an appropriate hash function would be the best way to make progress on this issue.

I noticed that rust currently uses SipHash 1-3:

The default hashing algorithm is currently SipHash 1-3, though this is subject to change at any point in the future. While its performance is very competitive for medium sized keys, other hashing algorithms will outperform it for small keys such as integers as well as large keys such as long strings, though those algorithms will typically not protect against attacks such as HashDoS.

@tibbe, is that the same SipHash variant that you tried? https://en.wikipedia.org/wiki/SipHash#Overview indicates that SipHash 2-4 might be more common, but also slower.

jappeace

I did some digging: SipHash was still available in hashable in 2012: https://github.com/haskell-unordered-containers/hashable/tree/fea8260b9e0c0596fc7ef0c608364b3960649f26/cbits

It's quite easy to change that code to be SipHash 1-3 (the numbers just indicate the number of rounds used for hashing/finalization).

I don't know how recently the SipHash implementation was tested for performance, but it looks relatively easy to add it back into hashable and run the benchmarks once more. Perhaps it would be possible to speed it up? Maybe we can try?

SipHash reference implementation (the paper explaining it is linked there as well)
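To illustrate what the two numbers mean, here is a small, unoptimized pure-Haskell sketch of SipHash parameterized over c (rounds per message word) and d (finalization rounds). It follows the published algorithm description but is not the code from hashable's cbits; the names sipHash, sipRound, compress, leWord and chunk are chosen here for illustration, and a list of bytes is used as input only to keep the example dependent on base alone.

```haskell
import Data.Bits (rotateL, shiftL, xor, (.|.))
import Data.List (foldl')
import Data.Word (Word64, Word8)

-- | The four 64-bit words of internal state, v0..v3.
data State = State !Word64 !Word64 !Word64 !Word64

-- | One SipRound.
sipRound :: State -> State
sipRound (State v0 v1 v2 v3) =
  let a0  = v0 + v1
      a1  = rotateL v1 13 `xor` a0
      a0' = rotateL a0 32
      a2  = v2 + v3
      a3  = rotateL v3 16 `xor` a2
      b0  = a0' + a3
      b3  = rotateL a3 21 `xor` b0
      b2  = a2 + a1
      b1  = rotateL a1 17 `xor` b2
      b2' = rotateL b2 32
  in State b0 b1 b2' b3

-- | Assemble up to 8 bytes into a little-endian word.
leWord :: [Word8] -> Word64
leWord = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Split the message into full 8-byte words plus the leftover bytes.
chunk :: [Word8] -> ([Word64], [Word8])
chunk bs = case splitAt 8 bs of
  (w, rest)
    | length w == 8 -> let (ws, t) = chunk rest in (leWord w : ws, t)
    | otherwise     -> ([], w)

-- | Absorb one message word using c rounds.
compress :: Int -> State -> Word64 -> State
compress c (State v0 v1 v2 v3) m =
  let State w0 w1 w2 w3 = iterate sipRound (State v0 v1 v2 (v3 `xor` m)) !! c
  in State (w0 `xor` m) w1 w2 w3

-- | SipHash-c-d with a 128-bit key given as two words (k0, k1).
sipHash :: Int -> Int -> (Word64, Word64) -> [Word8] -> Word64
sipHash c d (k0, k1) msg =
  let initial = State (k0 `xor` 0x736f6d6570736575) (k1 `xor` 0x646f72616e646f6d)
                      (k0 `xor` 0x6c7967656e657261) (k1 `xor` 0x7465646279746573)
      (fullWords, tailBytes) = chunk msg
      -- The final word holds the leftover bytes and the length (mod 256) on top.
      lastWord = leWord tailBytes .|. (fromIntegral (length msg) `shiftL` 56)
      State v0 v1 v2 v3 = foldl' (compress c) initial (fullWords ++ [lastWord])
      State f0 f1 f2 f3 = iterate sipRound (State v0 v1 (v2 `xor` 0xff) v3) !! d
  in f0 `xor` f1 `xor` f2 `xor` f3
```

With this shape, sipHash 2 4 and sipHash 1 3 differ only in the two round counts, which is why switching variants in the C code should be a small change. The result can be checked against the test vectors shipped with the reference implementation.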

sjakobi

MemberAuthor

@jappeace Good find! Yeah, it would be interesting to set up benchmarks with that code.

The HighwayHash project also contains a supposedly faster SipHash implementation that we could give a spin.

jberryman

Contributor

A little OT, but in case it comes up and since I haven't written about this anywhere: I started a project a few years ago named hashabler that aimed to do a few things:

1. Make hashable modular by separating the choice of hash function from the byte-stream-supplying code (i.e. the Hashable instance)
2. Document what makes the latter instances principled, and supply such instances
3. Define some efficient implementations of alternative hash functions (SipHash among them)
4. Offer a stable hash for certain types, suitable for serializing and storing and comparing across platforms

Unfortunately I realized I didn't really understand (2) until after I'd made a couple releases and towards the end of a big rewrite, etc. I managed to get the library about 2/3 of the way fixed up but lost steam and haven't had the energy to pick it up since.

(2) is interesting and, I think, not very well understood. The short version is that a composable hashing library like hashable essentially needs to do the same work as a serialization library: the byte-supplying function (the Hashable instance) needs to produce a uniquely decodable code.

The point being: you can get collisions from your choice of hash function, but also from the Hashable instance itself, even in the presence of a perfect hash function. (Well, in theory. The obvious bad instances in hashable itself, like (IIRC) ([a], [a]), were fixed a long time ago, but perhaps in an ad hoc way, and without documenting how users should avoid the same mistake.)
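A tiny sketch of the ([a], [a]) problem, using lists of Int in place of a real byte stream. The function names naiveEncode and prefixedEncode are made up for this illustration and do not come from hashable or hashabler; length-prefixing is just one simple way to obtain a uniquely decodable encoding.

```haskell
-- | Naive encoding: concatenate both lists. The boundary between the two
-- lists is lost, so distinct pairs can feed identical input to the hash.
naiveEncode :: ([Int], [Int]) -> [Int]
naiveEncode (xs, ys) = xs ++ ys

-- | Length-prefixed encoding: each list is preceded by its length, so the
-- original pair can be recovered from the stream.
prefixedEncode :: ([Int], [Int]) -> [Int]
prefixedEncode (xs, ys) = length xs : xs ++ length ys : ys
```

For example, ([1, 2], [3]) and ([1], [2, 3]) both become [1, 2, 3] under naiveEncode, so no hash function, however strong, can tell them apart. Under prefixedEncode they become [2, 1, 2, 1, 3] and [1, 1, 2, 2, 3] respectively.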

FWIW I'd like to brush that work off some day. In the interim I've had the thought that I could use backpack to allow the choice of hash function to be configurable, without it needing to be part of the regular public API (so e.g. containers could use it transparently)

You're welcome to steal my SipHash implementation (with attribution) if helpful: https://github.com/jberryman/hashabler/blob/master/src/Data/Hashabler/SipHash.hs . It looks like in my local version I've got a somewhat faster implementation that uses handwritten asm (only because GHC doesn't have bitwise rotate primops).

jappeace

mentioned this

on Sep 29, 2021

(don't merge) Re-add siphash hashable#222

sjakobi

added a commit that references this issue

on Oct 1, 2021

Add security advisory to package description (#320)

3d9efe3

sjakobi

MemberAuthor

For reference: aeson users can now avoid this vulnerability by enabling aeson's new ordered-keymap flag which makes aeson use Data.Map.Strict for storing JSON objects: https://hackage.haskell.org/package/aeson-2.0.1.0/changelog