Commit bd83d91

Merge pull request #2 from dotnet/master
Update local fork
2 parents a42625c + 3e5b5ed commit bd83d91

12 files changed

+146
-29
lines changed

Documentation/building/unix-instructions.md

Lines changed: 20 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -7,37 +7,44 @@ Building ML.NET on Linux and macOS
77
3. Navigate to the `machinelearning` directory
88
4. Run the build script `./build.sh`
99

10-
Calling the script `build.sh` builds both the native and managed code.
10+
Calling the script `./build.sh` builds both the native and managed code.
1111

12-
For more information about the different options when building, run `build.sh -?` and look at examples in the [developer-guide](../project-docs/developer-guide.md).
12+
For more information about the different options when building, run `./build.sh -?` and look at examples in the [developer-guide](../project-docs/developer-guide.md).
1313

1414
## Minimum Hardware Requirements
1515
- 2GB RAM
16+
- x64
1617

17-
## Prerequisites (native build)
18+
## Prerequisites
1819

1920
### Linux
2021

21-
First, the package lists might need to be updated
22+
The following components are needed:
2223

23-
`sudo apt-get update`
24+
* git
25+
* clang-3.9
26+
* cmake 2.8.12
27+
* libunwind8
28+
* curl
29+
* All the requirements necessary to run .NET Core 2.0 applications: libssl1.0.0 (1.0.2 for Debian 9) and libicu5x (libicu52 for Ubuntu 14.x, libicu55 for Ubuntu 16.x, and libicu57 for Ubuntu 17.x). For more information on prerequisites in different Linux distributions, click [here](https://docs.microsoft.com/en-us/dotnet/core/linux-prerequisites?tabs=netcore2x).
2430

25-
On Linux, the following components are needed
31+
e.g. for Ubuntu 16.x:
2632

27-
* CMake on the PATH
28-
* Clang 3.5+ (same requirements as coreclr/corefx)
29-
* All the requirements necessary to run .NET Core 2.0 applications
30-
* libunwind
31-
* curl
33+
```sh
34+
sudo apt-get update
35+
sudo apt-get install git clang-3.9 cmake libunwind8 curl
36+
sudo apt-get install libssl1.0.0 libicu55
37+
```
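
The release-to-package mapping in the prerequisites list above can be sketched as a small shell function. This is a minimal sketch, not part of the repository: the function name `libicu_for_ubuntu` and the `release` variable are hypothetical, and the mapping covers only the three Ubuntu series named in the list.

```sh
#!/bin/sh
# Hypothetical helper, not part of the repository: map an Ubuntu release
# (e.g. "16.04") to the libicu package named in the prerequisites list.
libicu_for_ubuntu() {
    case "$1" in
        14.*) echo "libicu52" ;;
        16.*) echo "libicu55" ;;
        17.*) echo "libicu57" ;;
        *)    echo "unknown Ubuntu release: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the install command for a given release.
release="16.04"
echo "sudo apt-get install libssl1.0.0 $(libicu_for_ubuntu "$release")"
```

For `16.04` this prints the same `libssl1.0.0 libicu55` install line shown in the Ubuntu 16.x example; releases outside the listed series return a non-zero status instead of guessing a package name.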
3238

3339
### macOS
3440

3541
macOS 10.12 or higher is needed to build dotnet/machinelearning.
3642

3743
On macOS a few components are needed which are not provided by a default developer setup:
38-
* CMake
44+
* cmake 3.10.3
45+
* All the requirements necessary to run .NET Core 2.0 applications. To view macOS prerequisites click [here](https://docs.microsoft.com/en-us/dotnet/core/macos-prerequisites?tabs=netcore2x).
3946

4047
One way of obtaining CMake is via [Homebrew](http://brew.sh):
4148
```sh
4249
$ brew install cmake
43-
```
50+
```
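
Before running `./build.sh`, it may help to confirm that the command-line tools named in the prerequisites are actually on the `PATH`. The following is a minimal sketch assuming a POSIX shell; `check_prereqs` is a hypothetical helper, not something the repository provides, and it only covers executables (libraries such as libunwind8 would need a package-manager query instead).

```sh
#!/bin/sh
# Hypothetical helper, not part of the repository: report which of the
# given command-line tools cannot be found on PATH.
check_prereqs() {
    missing=""
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command resolves.
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all prerequisites found"
}

# Tools named in the Linux section; on macOS the list would be shorter
# (for example just cmake).
check_prereqs git clang-3.9 cmake curl || true
```

The function returns a non-zero status when anything is missing, so it can gate a build step, for example `check_prereqs cmake && ./build.sh`.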

README.md

Lines changed: 8 additions & 5 deletions
@@ -14,23 +14,25 @@ Along with these ML capabilities this first release of ML.NET also brings the fi
1414

1515
## Installation
1616

17+
[![NuGet Status](https://img.shields.io/nuget/v/Microsoft.ML.svg?style=flat)](https://www.nuget.org/packages/Microsoft.ML/)
18+
1719
ML.NET runs on Windows, Linux, and macOS - any platform where 64 bit [.NET Core](https://github.com/dotnet/core) or later is available.
1820

19-
The current release is 0.1. Check out the [release notes](https://github.com/dotnet/machinelearning/blob/master/Documentation/release-notes/0.1/release-0.1.md).
21+
The current release is 0.1. Check out the [release notes](Documentation/release-notes/0.1/release-0.1.md).
2022

2123
First ensure you have installed [.NET Core 2.0](https://www.microsoft.com/net/learn/get-started) or later. ML.NET also works on the .NET Framework. Note that ML.NET currently must run in a 64 bit process.
2224

23-
Once you have an app, you can install ML.NET NuGet from the .NET Core CLI using:
25+
Once you have an app, you can install the ML.NET NuGet package from the .NET Core CLI using:
2426
```
2527
dotnet add package Microsoft.ML
2628
```
2729

28-
or from the package manager:
30+
or from the NuGet package manager:
2931
```
3032
Install-Package Microsoft.ML
3133
```
3234

33-
Or alternatively you can add the Microsoft.ML package from within Visual Studio's NuGet package manager.
35+
Or alternatively you can add the Microsoft.ML package from within Visual Studio's NuGet package manager or via [Paket](https://github.com/fsprojects/Paket).
3436

3537
## Building
3638

@@ -55,7 +57,8 @@ For more information, see the [.NET Foundation Code of Conduct](https://dotnetfo
5557

5658
## Examples
5759

58-
Here's an example of code to train a model to predict sentiment from text samples. (You can see the complete sample [here](https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Scenario3_SentimentPrediction.cs)):
60+
Here's an example of code to train a model to predict sentiment from text samples.
61+
(You can see the complete sample [here](test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs)):
5962

6063
```C#
6164
var pipeline = new LearningPipeline();

build/BranchInfo.props

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
<Project>
22
<PropertyGroup>
33
<MajorVersion>0</MajorVersion>
4-
<MinorVersion>1</MinorVersion>
4+
<MinorVersion>2</MinorVersion>
55
<PatchVersion>0</PatchVersion>
66
<PreReleaseLabel>preview</PreReleaseLabel>
77
</PropertyGroup>

src/Microsoft.ML/LearningPipeline.cs

Lines changed: 88 additions & 0 deletions
@@ -26,25 +26,113 @@ public ScorerPipelineStep(Var<IDataView> data, Var<ITransformModel> model)
2626
public Var<ITransformModel> Model { get; }
2727
}
2828

29+
30+
/// <summary>
31+
/// The <see cref="LearningPipeline"/> class is used to define the steps needed to perform a desired machine learning task.<para/>
32+
/// The steps are defined by adding a data loader (e.g. <see cref="TextLoader"/>) followed by zero or more transforms (e.g. <see cref="Microsoft.ML.Transforms.TextFeaturizer"/>)
33+
/// and at most one trainer/learner (e.g. <see cref="Microsoft.ML.Trainers.FastTreeBinaryClassifier"/>) in the pipeline.
34+
///
35+
/// </summary>
36+
/// <example>
37+
/// <para/>
38+
/// For example,<para/>
39+
/// <code>
40+
/// var pipeline = new LearningPipeline();
41+
/// pipeline.Add(new TextLoader &lt;SentimentData&gt; (dataPath, separator: ","));
42+
/// pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
43+
/// pipeline.Add(new FastTreeBinaryClassifier());
44+
///
45+
/// var model = pipeline.Train&lt;SentimentData, SentimentPrediction&gt;();
46+
/// </code>
47+
/// </example>
2948
[DebuggerTypeProxy(typeof(LearningPipelineDebugProxy))]
3049
public class LearningPipeline : ICollection<ILearningPipelineItem>
3150
{
3251
private List<ILearningPipelineItem> Items { get; } = new List<ILearningPipelineItem>();
3352

53+
/// <summary>
54+
/// Construct an empty <see cref="LearningPipeline"/> object.
55+
/// </summary>
3456
public LearningPipeline()
3557
{
3658
}
3759

60+
/// <summary>
61+
/// Get the count of ML components in the <see cref="LearningPipeline"/> object
62+
/// </summary>
3863
public int Count => Items.Count;
3964
public bool IsReadOnly => false;
65+
66+
/// <summary>
67+
/// Add a data loader, transform or trainer into the pipeline.
68+
/// Possible data loader(s), transforms and trainers options are
69+
/// <para>
70+
/// Data Loader:
71+
/// <see cref="Microsoft.ML.TextLoader{TInput}" />
72+
/// etc.
73+
/// </para>
74+
/// <para>
75+
/// Transforms:
76+
/// <see cref="Microsoft.ML.Transforms.Dictionarizer"/>,
77+
/// <see cref="Microsoft.ML.Transforms.CategoricalOneHotVectorizer"/>
78+
/// <see cref="Microsoft.ML.Transforms.MinMaxNormalizer"/>,
79+
/// <see cref="Microsoft.ML.Transforms.ColumnCopier"/>,
80+
/// <see cref="Microsoft.ML.Transforms.ColumnConcatenator"/>,
81+
/// <see cref="Microsoft.ML.Transforms.TextFeaturizer"/>,
82+
/// etc.
83+
/// </para>
84+
/// <para>
85+
/// Trainers:
86+
/// <see cref="Microsoft.ML.Trainers.AveragedPerceptronBinaryClassifier"/>,
87+
/// <see cref="Microsoft.ML.Trainers.LogisticRegressor"/>,
88+
/// <see cref="Microsoft.ML.Trainers.StochasticDualCoordinateAscentClassifier"/>,
89+
/// <see cref="Microsoft.ML.Trainers.FastTreeRegressor"/>,
90+
/// etc.
91+
/// </para>
92+
/// For a complete list of transforms and trainers, please see "Microsoft.ML.Transforms" and "Microsoft.ML.Trainers" namespaces.
93+
/// </summary>
94+
/// <param name="item">Any ML component (data loader, transform or trainer) defined as <see cref="ILearningPipelineItem"/>.</param>
4095
public void Add(ILearningPipelineItem item) => Items.Add(item);
96+
97+
/// <summary>
98+
/// Remove all the loaders/transforms/trainers from the pipeline.
99+
/// </summary>
41100
public void Clear() => Items.Clear();
101+
102+
/// <summary>
103+
/// Check whether a specific loader/transform/trainer is in the pipeline.
104+
/// </summary>
105+
/// <param name="item">Any ML component (data loader, transform or trainer) defined as <see cref="ILearningPipelineItem"/>.</param>
106+
/// <returns>true if item is found in the pipeline; otherwise, false.</returns>
42107
public bool Contains(ILearningPipelineItem item) => Items.Contains(item);
108+
109+
/// <summary>
110+
/// Copy the pipeline items into an array.
111+
/// </summary>
112+
/// <param name="array">The one-dimensional Array that is the destination of the elements copied from the pipeline.</param>
113+
/// <param name="arrayIndex">The zero-based index in <paramref name="array" /> at which copying begins.</param>
43114
public void CopyTo(ILearningPipelineItem[] array, int arrayIndex) => Items.CopyTo(array, arrayIndex);
44115
public IEnumerator<ILearningPipelineItem> GetEnumerator() => Items.GetEnumerator();
116+
117+
/// <summary>
118+
/// Remove an item from the pipeline.
119+
/// </summary>
120+
/// <param name="item"><see cref="ILearningPipelineItem"/> to remove.</param>
121+
/// <returns>true if item was removed from the pipeline; otherwise, false.</returns>
45122
public bool Remove(ILearningPipelineItem item) => Items.Remove(item);
46123
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
47124

125+
/// <summary>
126+
/// Train the model using the ML components in the pipeline.
127+
/// </summary>
128+
/// <typeparam name="TInput">Type of data instances the model will be trained on. It's a custom type defined by the user according to the structure of data.
129+
/// <para/>
130+
/// Please see https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet/get-started/windows for more details on input type.
131+
/// </typeparam>
132+
/// <typeparam name="TOutput">Output type. The prediction will be returned based on this type.
133+
/// Please see https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet/get-started/windows for more details on output type.
134+
/// </typeparam>
135+
/// <returns>PredictionModel object. This is the model object used for prediction on new instances. </returns>
48136
public PredictionModel<TInput, TOutput> Train<TInput, TOutput>()
49137
where TInput : class
50138
where TOutput : class, new()

src/Microsoft.ML/TextLoader.cs

Lines changed: 1 addition & 1 deletion
@@ -55,7 +55,7 @@ private void SetCustomStringFromType(bool useHeader, string separator,
5555
{
5656
var mappingAttr = field.GetCustomAttribute<ColumnAttribute>();
5757
if(mappingAttr == null)
58-
throw Contracts.ExceptParam(nameof(field.Name), " is missing ColumnAttribute");
58+
throw Contracts.ExceptParam(field.Name, $"{field.Name} is missing ColumnAttribute");
5959

6060
schemaBuilder.AppendFormat("col={0}:{1}:{2} ",
6161
mappingAttr.Name ?? field.Name,

src/Native/FastTreeNative/ExpandFloatType.cpp

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@
1616
#include "SumupOneBit.h"
1717

1818
// Ideally we should expand this using C++ templates.
19-
// However, In order to exporting functions from DLL float and double versions need to have different names (cannot be overloaded on type parameters)// Expanding here with ugly pre-processor macros to get double and float versions (with fucntion mes suffixes _float and _double)
19+
// However, in order to export functions from DLL, float and double versions need to have different names (cannot be overloaded on type parameters)
20+
// Expanding here with ugly pre-processor macros to get double and float versions (with function name suffixes _float and _double)
2021
// --andrzejp, 2010-03-05
2122

2223
#define FloatType float

src/Native/FastTreeNative/getderivatives.cpp

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ EXPORT_API(void) C_GetDerivatives(
106106

107107
bool sameLabel = labelHigh == pLabels[low];
108108

109-
// calculate the lambdaP for this pair by looking it up in the lambdaTable (computed in LambdaMart.FillLambdaTable)
109+
// calculate the lambdaP for this pair by looking it up in the sigmoidTable (e.g. computed in FastTreeRanking.FillSigmoidTable)
110110
double lambdaP;
111111
if (scoreHighMinusLow <= minScore) lambdaP = sigmoidTable[0];
112112
else if (scoreHighMinusLow >= maxScore) lambdaP = sigmoidTable[sigmoidTableLength - 1];

test/Microsoft.ML.Tests/Scenarios/Scenario_HousePricePrediction.cs renamed to test/Microsoft.ML.Tests/Scenarios/HousePricePredictionTests.cs

Lines changed: 5 additions & 2 deletions
@@ -2,14 +2,17 @@
22
// The .NET Foundation licenses this file to you under the MIT license.
33
// See the LICENSE file in the project root for more information.
44

5+
using Microsoft.ML.Models;
56
using Microsoft.ML.Runtime.Api;
67
using Microsoft.ML.TestFramework;
8+
using Microsoft.ML.Trainers;
9+
using Microsoft.ML.Transforms;
710
using Xunit;
811
using Xunit.Abstractions;
912

1013
namespace Microsoft.ML.Scenarios
1114
{
12-
public partial class Top5Scenarios : BaseTestClass
15+
public partial class ScenariosTests : BaseTestClass
1316
{
1417
/*
1518
A real-estate firm Contoso wants to add a house price prediction to their ASP.NET/Xamarin application.
@@ -121,7 +124,7 @@ public class HousePricePrediction
121124
public float Price;
122125
}
123126

124-
public Top5Scenarios(ITestOutputHelper output) : base(output)
127+
public ScenariosTests(ITestOutputHelper output) : base(output)
125128
{
126129
}
127130
}

test/Microsoft.ML.Tests/Scenarios/Scenario_TrainPredictionModel.cs renamed to test/Microsoft.ML.Tests/Scenarios/HousePriceTrainAndPredictionTests.cs

Lines changed: 5 additions & 3 deletions
@@ -1,15 +1,18 @@
1-
// Licensed to the .NET Foundation under one or more agreements.
1+
// Licensed to the .NET Foundation under one or more agreements.
22
// The .NET Foundation licenses this file to you under the MIT license.
33
// See the LICENSE file in the project root for more information.
44

55
using Microsoft.ML.Models;
6+
using Microsoft.ML.Runtime.Api;
7+
using Microsoft.ML.TestFramework;
68
using Microsoft.ML.Trainers;
79
using Microsoft.ML.Transforms;
810
using Xunit;
11+
using Xunit.Abstractions;
912

1013
namespace Microsoft.ML.Scenarios
1114
{
12-
public partial class Top5Scenarios
15+
public partial class ScenariosTests
1316
{
1417
[Fact(Skip = "Missing data set. See https://github.com/dotnet/machinelearning/issues/3")]
1518
public void TrainAndPredictHousePriceModelTest()
@@ -70,4 +73,3 @@ public void TrainAndPredictHousePriceModelTest()
7073
}
7174
}
7275
}
73-

test/Microsoft.ML.Tests/Scenarios/TrainAndPredictIrisModelTest.cs renamed to test/Microsoft.ML.Tests/Scenarios/IrisPlantClassificationTests.cs

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
1010

1111
namespace Microsoft.ML.Scenarios
1212
{
13-
public partial class Top5Scenarios
13+
public partial class ScenariosTests
1414
{
1515
[Fact]
1616
public void TrainAndPredictIrisModelTest()

test/Microsoft.ML.Tests/Scenarios/Scenario3_SentimentPrediction.cs renamed to test/Microsoft.ML.Tests/Scenarios/SentimentPredictionTests.cs

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
1313

1414
namespace Microsoft.ML.Scenarios
1515
{
16-
public partial class Top5Scenarios
16+
public partial class ScenariosTests
1717
{
1818
public const string SentimentDataPath = "wikipedia-detox-250-line-data.tsv";
1919
public const string SentimentTestPath = "wikipedia-detox-250-line-test.tsv";

test/Microsoft.ML.Tests/TextLoaderTests.cs

Lines changed: 13 additions & 0 deletions
@@ -7,6 +7,7 @@
77
using Microsoft.ML.Runtime.Api;
88
using Microsoft.ML.Runtime.Data;
99
using Microsoft.ML.TestFramework;
10+
using System;
1011
using Xunit;
1112
using Xunit.Abstractions;
1213

@@ -219,6 +220,13 @@ public void CanSuccessfullyTrimSpaces()
219220
}
220221
}
221222

223+
[Fact]
224+
public void ThrowsExceptionWithPropertyName()
225+
{
226+
Exception ex = Assert.Throws<ArgumentOutOfRangeException>( () => new TextLoader<ModelWithoutColumnAttribute>("fakefile.txt") );
227+
Assert.StartsWith("String1 is missing ColumnAttribute", ex.Message);
228+
}
229+
222230
public class QuoteInput
223231
{
224232
[Column("0")]
@@ -254,5 +262,10 @@ public class Input
254262
[Column("1")]
255263
public float Number1;
256264
}
265+
266+
public class ModelWithoutColumnAttribute
267+
{
268+
public string String1;
269+
}
257270
}
258271
}
