Commit c470cf0

Merge branch 'master' of https://github.com/dotnet/machinelearning into addx86prbuild
2 parents: 86b337d + d63e21e

188 files changed, +61453 −2324 lines


README.md

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ Along with these ML capabilities, this first release of ML.NET also brings the f

 [![NuGet Status](https://img.shields.io/nuget/v/Microsoft.ML.svg?style=flat)](https://www.nuget.org/packages/Microsoft.ML/)

-ML.NET runs on Windows, Linux, and macOS - any platform where 64 bit [.NET Core](https://github.com/dotnet/core) or later is available.
+ML.NET runs on Windows, Linux, and macOS - any platform where x64 [.NET Core](https://github.com/dotnet/core) or later is available. In addition, .NET Framework on Windows x64 is also supported.

 The current release is 0.6. Check out the [release notes](docs/release-notes/0.6/release-0.6.md) to see what's new.
build/Dependencies.props

Lines changed: 1 addition & 1 deletion

@@ -38,7 +38,7 @@
 <!-- Test-only Dependencies -->
 <PropertyGroup>
 <BenchmarkDotNetVersion>0.11.1</BenchmarkDotNetVersion>
-<MicrosoftMLTestModelsPackageVersion>0.0.2-test</MicrosoftMLTestModelsPackageVersion>
+<MicrosoftMLTestModelsPackageVersion>0.0.3-test</MicrosoftMLTestModelsPackageVersion>
 </PropertyGroup>

 </Project>

docs/code/MlNetCookBook.md

Lines changed: 48 additions & 14 deletions

@@ -103,7 +103,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 hasHeader: true);

 // Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);
 ```

 If the schema of the data is not known at compile time, or too cumbersome, you can revert to the dynamically-typed API:

@@ -128,9 +128,43 @@ var reader = new TextLoader(mlContext, new TextLoader.Arguments
 });

 // Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);
 ```

+## How do I load data from multiple files?
+
+You can again use the `TextLoader`, and specify an array of files to its Read method.
+The files need to have the same schema (same number and type of columns)
+
+[Example file1](../../test/data/adult.train):
+[Example file2](../../test/data/adult.test):
+```
+Label Workclass education marital-status
+0 Private 11th Never-married
+0 Private HS-grad Married-civ-spouse
+1 Local-gov Assoc-acdm Married-civ-spouse
+1 Private Some-college Married-civ-spouse
+```
+
+This is how you can read this data:
+```csharp
+// Create a new environment for ML.NET operations. It can be used for exception tracking and logging,
+// as well as the source of randomness.
+var env = new LocalEnvironment();
+
+// Create the reader: define the data columns and where to find them in the text file.
+var reader = TextLoader.CreateReader(env, ctx => (
+// A boolean column depicting the 'target label'.
+IsOver50K: ctx.LoadBool(14),
+// Three text columns.
+Workclass: ctx.LoadText(1),
+Education: ctx.LoadText(3),
+MaritalStatus: ctx.LoadText(5)),
+hasHeader: true);
+
+// Now read the files (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
+var data = reader.Read(exampleFile1, exampleFile2);
+```
 ## How do I load data with many columns from a CSV?
 `TextLoader` is used to load data from text files. You will need to specify what are the data columns, what are their types, and where to find them in the text file.

@@ -162,7 +196,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (

 // Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);
 ```

@@ -183,7 +217,7 @@ var reader = mlContext.Data.TextReader(new[] {
 s => s.Separator = ",");

 // Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);
 ```

 ## How do I look at the intermediate data?

@@ -231,7 +265,7 @@ var dataPipeline = reader.MakeNewEstimator()

 // Let's verify that the data has been read correctly.
 // First, we read the data file.
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);

 // Fit our data pipeline and transform data with it.
 var transformedData = dataPipeline.Fit(data).Transform(data);

@@ -305,7 +339,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (

 // Now read the file (remember though, readers are lazy, so the actual reading will happen when the data is accessed).
-var trainData = reader.Read(new MultiFileSource(trainDataPath));
+var trainData = reader.Read(trainDataPath);

 // Step two: define the learning pipeline.

@@ -334,7 +368,7 @@ You can use the corresponding 'context' of the task to evaluate the model.
 Assuming the example above was used to train the model, here's how you calculate the metrics.
 ```csharp
 // Read the test dataset.
-var testData = reader.Read(new MultiFileSource(testDataPath));
+var testData = reader.Read(testDataPath);
 // Calculate metrics of the model on the test data.
 var metrics = mlContext.Regression.Evaluate(model.Transform(testData), label: r => r.Target, score: r => r.Prediction);
 ```

@@ -390,7 +424,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 separator: ',');

 // Retrieve the training data.
-var trainData = reader.Read(new MultiFileSource(irisDataPath));
+var trainData = reader.Read(irisDataPath);

 // Build the training pipeline.
 var learningPipeline = reader.MakeNewEstimator()

@@ -557,7 +591,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 separator: ',');

 // Retrieve the training data.
-var trainData = reader.Read(new MultiFileSource(dataPath));
+var trainData = reader.Read(dataPath);

 // This is the predictor ('weights collection') that we will train.
 MulticlassLogisticRegressionPredictor predictor = null;

@@ -648,7 +682,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 separator: ',');

 // Read the training data.
-var trainData = reader.Read(new MultiFileSource(dataPath));
+var trainData = reader.Read(dataPath);

 // Apply all kinds of standard ML.NET normalization to the raw features.
 var pipeline = reader.MakeNewEstimator()

@@ -707,7 +741,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 ), hasHeader: true);

 // Read the data.
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);

 // Inspect the categorical columns to check that they are correctly read.
 var catColumns = data.GetColumn(r => r.CategoricalFeatures).Take(10).ToArray();

@@ -784,7 +818,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 ), hasHeader: true);

 // Read the data.
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);

 // Inspect the message texts that are read from the file.
 var messageTexts = data.GetColumn(x => x.Message).Take(20).ToArray();

@@ -849,7 +883,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 separator: ',');

 // Read the data.
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);

 // Build the training pipeline.
 var learningPipeline = reader.MakeNewEstimator()

@@ -910,7 +944,7 @@ var reader = TextLoader.CreateReader(mlContext, ctx => (
 separator: ',');

 // Read the data.
-var data = reader.Read(new MultiFileSource(dataPath));
+var data = reader.Read(dataPath);

 // Build the pre-processing pipeline.
 var learningPipeline = reader.MakeNewEstimator()
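
Taken together, these cookbook edits swap every `reader.Read(new MultiFileSource(dataPath))` call for `reader.Read(dataPath)`: the static-API reader now accepts one or more file paths directly. A minimal sketch of the resulting pattern, using the same 0.6-era static `TextLoader` API shown in the snippets above (`mlContext`, the paths, and the column layout are placeholders, not part of the commit):

```csharp
// Define the reader once: column types and their positions in the text file.
var reader = TextLoader.CreateReader(mlContext, ctx => (
    Label: ctx.LoadBool(0),
    Text: ctx.LoadText(1)),
    hasHeader: true);

// Single file: pass the path directly; no MultiFileSource wrapper is needed anymore.
var data = reader.Read(dataPath);

// Several files with the same schema: pass all paths to the same Read call.
var combined = reader.Read(trainPath, validationPath);
```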

docs/samples/Microsoft.ML.Samples/Microsoft.ML.Samples.csproj

Lines changed: 4 additions & 1 deletion

@@ -7,7 +7,10 @@

 <ItemGroup>
 <ProjectReference Include="..\..\..\src\Microsoft.ML.StandardLearners\Microsoft.ML.StandardLearners.csproj" />
-<ProjectReference Include="..\..\..\src\Microsoft.ML.SamplesUtils\Microsoft.ML.SamplesUtils.csproj" />
+<ProjectReference Include="..\..\..\src\Microsoft.ML.SamplesUtils\Microsoft.ML.SamplesUtils.csproj" />
+<ProjectReference Include="..\..\..\src\Microsoft.ML.FastTree\Microsoft.ML.FastTree.csproj" />
+<ProjectReference Include="..\..\..\src\Microsoft.ML.LightGBM\Microsoft.ML.LightGBM.csproj" />
+

 <NativeAssemblyReference Include="CpuMathNative" />
docs/samples/Microsoft.ML.Samples/Trainers.cs

Lines changed: 121 additions & 4 deletions

@@ -1,21 +1,24 @@
-// Licensed to the .NET Foundation under one or more agreements.
+// Licensed to the .NET Foundation under one or more agreements.
 // The .NET Foundation licenses this file to you under the MIT license.
 // See the LICENSE file in the project root for more information.

 // the alignment of the usings with the methods is intentional so they can display on the same level in the docs site.
 using Microsoft.ML.Runtime.Data;
 using Microsoft.ML.Runtime.Learners;
+using Microsoft.ML.Runtime.LightGBM;
+using Microsoft.ML.Runtime.FastTree;
 using Microsoft.ML.StaticPipe;
 using System;
+using System.Linq;

 // NOTE: WHEN ADDING TO THE FILE, ALWAYS APPEND TO THE END OF IT.
 // If you change the existinc content, check that the files referencing it in the XML documentation are still correct, as they reference
 // line by line.
 namespace Microsoft.ML.Samples
 {
 public static class Trainers
-{
-
+{
+
 public static void SdcaRegression()
 {
 // Downloading a regression dataset from github.com/dotnet/machinelearning

@@ -37,7 +40,7 @@ public static void SdcaRegression()
 separator: '\t', hasHeader: true);

 // Read the data, and leave 10% out, so we can use them for testing
-var data = reader.Read(new MultiFileSource(dataFile));
+var data = reader.Read(dataFile);
 var (trainData, testData) = regressionContext.TrainTestSplit(data, testFraction: 0.1);

 // The predictor that gets produced out of training

@@ -74,5 +77,119 @@ public static void SdcaRegression()
 Console.WriteLine($"RMS - {metrics.Rms}"); // 4.924493
 Console.WriteLine($"RSquared - {metrics.RSquared}"); // 0.565467
 }
+
+public static void FastTreeRegression()
+{
+// Downloading a regression dataset from github.com/dotnet/machinelearning
+// this will create a housing.txt file in the filsystem this code will run
+// you can open the file to see the data.
+string dataFile = SamplesUtils.DatasetUtils.DownloadHousingRegressionDataset();
+
+// Creating the ML.Net IHostEnvironment object, needed for the pipeline
+var env = new LocalEnvironment(seed: 0);
+
+// Creating the ML context, based on the task performed.
+var regressionContext = new RegressionContext(env);
+
+// Creating a data reader, based on the format of the data
+var reader = TextLoader.CreateReader(env, c => (
+label: c.LoadFloat(0),
+features: c.LoadFloat(1, 6)
+),
+separator: '\t', hasHeader: true);
+
+// Read the data, and leave 10% out, so we can use them for testing
+var data = reader.Read(new MultiFileSource(dataFile));
+
+// The predictor that gets produced out of training
+FastTreeRegressionPredictor pred = null;
+
+// Create the estimator
+var learningPipeline = reader.MakeNewEstimator()
+.Append(r => (r.label, score: regressionContext.Trainers.FastTree(
+r.label,
+r.features,
+numTrees: 100, // try: (int) 20-2000
+numLeaves: 20, // try: (int) 2-128
+minDatapointsInLeafs: 10, // try: (int) 1-100
+learningRate: 0.2, // try: (float) 0.025-0.4
+onFit: p => pred = p)
+)
+);
+
+var cvResults = regressionContext.CrossValidate(data, learningPipeline, r => r.label, numFolds: 5);
+var averagedMetrics = (
+L1: cvResults.Select(r => r.metrics.L1).Average(),
+L2: cvResults.Select(r => r.metrics.L2).Average(),
+LossFn: cvResults.Select(r => r.metrics.LossFn).Average(),
+Rms: cvResults.Select(r => r.metrics.Rms).Average(),
+RSquared: cvResults.Select(r => r.metrics.RSquared).Average()
+);
+Console.WriteLine($"L1 - {averagedMetrics.L1}");
+Console.WriteLine($"L2 - {averagedMetrics.L2}");
+Console.WriteLine($"LossFunction - {averagedMetrics.LossFn}");
+Console.WriteLine($"RMS - {averagedMetrics.Rms}");
+Console.WriteLine($"RSquared - {averagedMetrics.RSquared}");
+}
+
+public static void LightGbmRegression()
+{
+// Downloading a regression dataset from github.com/dotnet/machinelearning
+// this will create a housing.txt file in the filsystem this code will run
+// you can open the file to see the data.
+string dataFile = SamplesUtils.DatasetUtils.DownloadHousingRegressionDataset();
+
+// Creating the ML.Net IHostEnvironment object, needed for the pipeline
+var env = new LocalEnvironment(seed: 0);
+
+// Creating the ML context, based on the task performed.
+var regressionContext = new RegressionContext(env);
+
+// Creating a data reader, based on the format of the data
+var reader = TextLoader.CreateReader(env, c => (
+label: c.LoadFloat(0),
+features: c.LoadFloat(1, 6)
+),
+separator: '\t', hasHeader: true);
+
+// Read the data, and leave 10% out, so we can use them for testing
+var data = reader.Read(new MultiFileSource(dataFile));
+var (trainData, testData) = regressionContext.TrainTestSplit(data, testFraction: 0.1);
+
+// The predictor that gets produced out of training
+LightGbmRegressionPredictor pred = null;
+
+// Create the estimator
+var learningPipeline = reader.MakeNewEstimator()
+.Append(r => (r.label, score: regressionContext.Trainers.LightGbm(
+r.label,
+r.features,
+numLeaves: 4,
+minDataPerLeaf: 6,
+learningRate: 0.001,
+onFit: p => pred = p)
+)
+);
+
+// Fit this pipeline to the training data
+var model = learningPipeline.Fit(trainData);
+
+// Check the weights that the model learned
+VBuffer<float> weights = default;
+pred.GetFeatureWeights(ref weights);
+
+Console.WriteLine($"weight 0 - {weights.Values[0]}");
+Console.WriteLine($"weight 1 - {weights.Values[1]}");
+
+// Evaluate how the model is doing on the test data
+var dataWithPredictions = model.Transform(testData);
+var metrics = regressionContext.Evaluate(dataWithPredictions, r => r.label, r => r.score);
+
+Console.WriteLine($"L1 - {metrics.L1}");
+Console.WriteLine($"L2 - {metrics.L2}");
+Console.WriteLine($"LossFunction - {metrics.LossFn}");
+Console.WriteLine($"RMS - {metrics.Rms}");
+Console.WriteLine($"RSquared - {metrics.RSquared}");
+}
 }
 }
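
The two new samples, `FastTreeRegression` and `LightGbmRegression`, are static, parameterless methods like the existing `SdcaRegression`, so presumably they are run by calling them from the samples project's entry point. A hypothetical call site (the driver below is an assumption, not part of the commit):

```csharp
// Hypothetical driver; the samples project may wire these up differently.
Microsoft.ML.Samples.Trainers.FastTreeRegression();  // 5-fold cross-validated FastTree metrics
Microsoft.ML.Samples.Trainers.LightGbmRegression();  // train/test split, LightGBM weights and metrics
```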

src/Microsoft.ML.Api/InternalSchemaDefinition.cs

Lines changed: 4 additions & 1 deletion

@@ -217,7 +217,10 @@ public static InternalSchemaDefinition Create(Type userType, Direction direction
 public static InternalSchemaDefinition Create(Type userType, SchemaDefinition userSchemaDefinition)
 {
 Contracts.AssertValue(userType);
-Contracts.AssertValue(userSchemaDefinition);
+Contracts.AssertValueOrNull(userSchemaDefinition);
+
+if (userSchemaDefinition == null)
+userSchemaDefinition = SchemaDefinition.Create(userType);

 Column[] dstCols = new Column[userSchemaDefinition.Count];
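
The practical effect is that callers inside the assembly may now pass a null `SchemaDefinition` and get one inferred from the user type, instead of tripping the assert. A rough sketch of the new equivalence, using a hypothetical `MyRow` class that is not part of the commit:

```csharp
public class MyRow
{
    public float Feature;
    public bool Label;
}

// After this change, passing null...
var fromNull = InternalSchemaDefinition.Create(typeof(MyRow), (SchemaDefinition)null);

// ...behaves like passing the schema definition inferred from the type.
var fromType = InternalSchemaDefinition.Create(typeof(MyRow), SchemaDefinition.Create(typeof(MyRow)));
```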
