fix x86 crash #5081

Merged
merged 1 commit into from
May 4, 2020
@frank-dong-ms-zz frank-dong-ms-zz commented May 2, 2020

fixes #1216.

TreeEnsembleCombiner has a bug that causes an out-of-range write on a byte array, which corrupts the heap.

@frank-dong-ms-zz frank-dong-ms-zz requested a review from a team as a code owner May 2, 2020 02:22
@frank-dong-ms-zz frank-dong-ms-zz requested a review from harishsk May 2, 2020 03:21
codecov bot commented May 2, 2020

Codecov Report

Merging #5081 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #5081   +/-   ##
=======================================
  Coverage   75.66%   75.66%           
=======================================
  Files         993      993           
  Lines      178157   178157           
  Branches    19125    19125           
=======================================
+ Hits       134800   134805    +5     
+ Misses      38136    38134    -2     
+ Partials     5221     5218    -3     
Flag Coverage Δ
#Debug 75.66% <100.00%> (+<0.01%) ⬆️
#production 71.64% <100.00%> (+<0.01%) ⬆️
#test 88.67% <ø> (ø)
Impacted Files Coverage Δ
...est/Microsoft.ML.Predictor.Tests/TestPredictors.cs 70.07% <ø> (ø)
...t.ML.FastTree/TreeEnsemble/TreeEnsembleCombiner.cs 83.09% <100.00%> (ø)
...icrosoft.ML.AutoML/Experiment/SuggestedPipeline.cs 88.65% <0.00%> (-4.13%) ⬇️
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 86.55% <0.00%> (-0.85%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 88.93% <0.00%> (-0.21%) ⬇️
...soft.ML.Transforms/Text/WordEmbeddingsExtractor.cs 87.52% <0.00%> (ø)
src/Microsoft.ML.Sweeper/AsyncSweeper.cs 73.97% <0.00%> (+1.36%) ⬆️
...rosoft.ML.AutoML/ColumnInference/TextFileSample.cs 65.56% <0.00%> (+5.96%) ⬆️

@mstfbl mstfbl left a comment

Clean fix! :shipit:

@@ -65,9 +65,9 @@ IPredictor IModelCombiner.CombineModels(IEnumerable<IPredictor> models)
    foreach (var t in tree.TrainedEnsemble.Trees)
    {
        var bytes = new byte[t.SizeInBytes()];
        int position = -1;
Interesting. Can you please explain this issue a bit more? Why does this happen on x86 but not on x64? This is managed memory. Why is it corrupting the unmanaged heap?

@frank-dong-ms-zz frank-dong-ms-zz May 4, 2020

Yes, this is an index-out-of-range issue that corrupts memory, but in this case the index is out of range at the front of the array (the byte array is used starting from index -1).
The corrupted memory (the byte array) is allocated as managed memory but accessed as unmanaged memory (through a fixed block and a pointer in C#), as in the code below:
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.FastTree/TreeEnsemble/InternalRegressionTree.cs#L101
https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.FastTree/Utils/ToByteArrayExtensions.cs#L113
When position is -1, the pointer ((int*)(pBuffer + position)) accesses memory that it should not.

I'm still not sure why this issue does not reproduce on x64. In theory it can also corrupt memory on x64.
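
To make the point about the unchecked pointer store concrete, here is a minimal sketch of a bounds-checked write. It is not the ML.NET code and not the change made in this PR: `BoundsCheckedWriter` and `WriteInt` are hypothetical names introduced only for illustration, and a starting offset of 0 is assumed to be the correct one. It shows that a range-validated write rejects a position of -1, whereas a raw store through `(int*)(pBuffer + position)` inside a fixed block performs no such check.

```csharp
using System;

// Illustrative only: BoundsCheckedWriter and WriteInt are hypothetical names,
// not the ML.NET ToByteArrayExtensions API.
public static class BoundsCheckedWriter
{
    // Writes a 4-byte int at `position` and advances `position` past it.
    // The range is validated first, which a raw pointer store does not do.
    public static void WriteInt(int value, byte[] buffer, ref int position)
    {
        if (buffer == null)
            throw new ArgumentNullException(nameof(buffer));
        if (position < 0 || position > buffer.Length - sizeof(int))
            throw new ArgumentOutOfRangeException(nameof(position));

        BitConverter.GetBytes(value).CopyTo(buffer, position);
        position += sizeof(int);
    }

    public static void Main()
    {
        var bytes = new byte[sizeof(int)];

        int position = 0;                   // assumed correct starting offset
        WriteInt(42, bytes, ref position);
        Console.WriteLine(position);        // prints 4

        position = -1;                      // the starting offset discussed above
        try
        {
            WriteInt(42, bytes, ref position);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("rejected");  // prints "rejected"
        }
    }
}
```

With a pointer obtained from a fixed block, the same -1 offset would not throw; the write would land one byte before the array's first element, which matches the corruption described in the comment above.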


I suspected this was the issue from reading above, but special thanks to @stephentoub for finding the actual code that proves it 👍

@eerhardt also confirmed this in email.
@sharwell, @stephentoub and @eerhardt Thanks for shedding more light on the issue.

@frank-dong-ms-zz frank-dong-ms-zz merged commit 2a85f3f into dotnet:master May 4, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2022
Successfully merging this pull request may close these issues.

Investigate why two predictor tests fail with "Unknown command: 'train'" on x86
5 participants