Skip to content

Conversation

JanKrivanek
Copy link
Member

Fixes (internal) https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1645155/

Context

Moving away from crypthographicaly weak SHA1 to SHA256 for Hash task

Changes Made

SHA1 - SHA256

Testing

Existing tests

Copy link
Contributor

@Forgind Forgind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted with the test changes to expectedHash, this is a breaking change if someone uses the Hash task and verifies that the output matches some fixed value, but I'm still in favor of moving to the more secure SHA256 implementation. Also, I noticed we have a GetFileHash task competing with this one with fairly small differences, and that one seems to already use SHA256, so it feels weird that this one was overlooked...

@baronfel
Copy link
Member

We probably need a breaking change notice for this one, yeah?

@JanKrivanek
Copy link
Member Author

We probably need a breaking change notice for this one, yeah?

Thanks for bringing this up @baronfel

I haven't find any learn.microsoft.com documentation for this task, so probably only source serves as a documentation and it makes it clear this is subject to change: https://github.com/dotnet/msbuild/blob/main/src/Tasks/Hash.cs#L17

So it might not be necessary - though it certainly would not hurt

@JanKrivanek JanKrivanek added this to the VS 17.8 milestone May 31, 2023
@rainersigwald
Copy link
Member

We should document the task with the explicit "the hash function is an implementation detail and may vary between MSBuild engine versions" caveat. But I'd say we should also call it out as an info-level "breaking" change.

@JanKrivanek
Copy link
Member Author

@rainersigwald - I wasn't able to find documentation for the HashTask - only the GH code comments - those explicitly mention that hash algorithm is subject to change between releases. I agree with the breaking change - @baronfel I believe you planned to mention it, correct?
Chet as well adviced to merge it only into 17.8 (and behind changewave), to prevent mid-cycle (sdk) breaking change - so I'm keeping this open now

@rokonec
Copy link
Member

rokonec commented Jun 12, 2023

SHA256 is 15% - 25% slower than SHA1. Given the intended use case of HashTask which is "to measure only uniqueness between build executions" this task does not required to use "Strong cryptography". I am not sure if this pose real possibility of perf regression as it hashes only itemspec, but even then, I am not convinced it is actually needed.

@rainersigwald, @JanKrivanek Can we please reconsider it?

@JanKrivanek
Copy link
Member Author

SHA256 is 15% - 25% slower than SHA1. Given the intended use case of HashTask which is "to measure only uniqueness between build executions" this task does not required to use "Strong cryptography". I am not sure if this pose real possibility of perf regression as it hashes only itemspec, but even then, I am not convinced it is actually needed.

@rainersigwald, @JanKrivanek Can we please reconsider it?

Staying with SHA1 is not an option

We can pull in System.IO.Hashing - then we'd be able to use some non-cryptographic hash algo for an improved perf. For that I however don't have full context of what's needed to insert new dep(s) into VS (I believe I can get some inspiration in PR like this [internal], but still I would like to fully understand the requirements and consequences).

A dummy search seems to yield some usages within our repos and some in the wild, not much though - so priority might be bit questionable.

That being said - if anybody is willing to give me quick crash course (or point me to some PR (set of PRs) containing all required changes) I'm up for attempting to rewrite this with System.IO.Hashing

@JanKrivanek
Copy link
Member Author

In case we'll choose to leverage System.IO.Hashing - than this is related: dotnet/roslyn#68700

@CyrusNajmabadi
Copy link
Member

In case we'll choose to leverage System.IO.Hashing - than this is related: dotnet/roslyn#68700

I'm a little confused here about the roslyn side. The hash algorithm we use it not an implementation detail. It's a contract we have (including on the PDB side, where we use these hashes to match things up). I'm not understanding the idea here in terms of changing things out.

Note: i would have been happy to stay with sha1 (since we don't use this for cryptographic purposes). But my recollectino was that it was a mandatory MS thing (with no exceptions allowed whatsoever) to maintain support for sha1 without sha256 support.

@JanKrivanek
Copy link
Member Author

The Prism measurements of current impact: [internal link] https://prism.vsdata.io/?query=bf%3DMicrosoft.Build.Tasks.Core.dll!Microsoft.Build.Tasks.Hash.Execute&eventType=cpu

0.12% of CPU used by the Hash task in measured case. It sounds like the ~20% degradation would be negligible

@JanKrivanek JanKrivanek force-pushed the proto/hash-task-sha256 branch from d4b6f38 to e9cefe2 Compare July 10, 2023 07:52
@JanKrivanek
Copy link
Member Author

Perf impact measurements

Conclusion - for a OrchardCore build from command line and from VS the SHA256 backed Hash tasks added ~80ms to build time (both full rebuild and incremental rebuild) - this has more weight on no-change-incremental build (which is faster), but still the slowdown is on the edge 0.2% of the entire build time.

Below are samples of measurement for command line build

Full rebuild

Comparison of Orchard Core full rebuild via msbuild.exe orchardcore.sln (the entire build took ~90 seconds for both scenarios):

Hash task backed by SHA1

CPU sampling:
image
MSBuild Task events:
image

Hash task backed by SHA256

CPU sampling:
image
MSBuild Task events:
image

Incremental build

Comparison of Orchard Core incremental rebuild via msbuild.exe orchardcore.sln with no changes (the entire build took ~30 seconds for both scenarios):

Hash task backed by SHA1

CPU sampling:
image
MSBuild Task events:
image

Hash task backed by SHA256

CPU sampling:
image
MSBuild Task events:
image

Copy link
Member

@rokonec rokonec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets go for it - If performance measurements are acceptable .

Minor notes:

  1. We are using crypto-strong hashing algos when three is no need for it. It could be that thanks for HW optimization and stuff it is faster than crypto-weak hashing, or maybe loading another dependency on system.hashing can cancel perf gains, or we strategically do not want to introduce another dependency ... whatever is the reason we shall document it in code for future us.

  2. SUPER NIT: UnitTest HashTaskDifferentInputSizesTest is testing implementation, basically duplicates hashing code which makes it hard to maintain. I would probably just verify that it produces no conflicts by hashset or something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants