diff --git a/docs/machine-learning/tutorials/sentiment-analysis.md b/docs/machine-learning/tutorials/sentiment-analysis.md index 18d45ca519700..1159cf8d56610 100644 --- a/docs/machine-learning/tutorials/sentiment-analysis.md +++ b/docs/machine-learning/tutorials/sentiment-analysis.md @@ -1,7 +1,7 @@ --- title: Use ML.NET in a sentiment analysis binary classification scenario description: Discover how to use ML.NET in a binary classification scenario to understand how to use sentiment prediction to take the appropriate action. -ms.date: 03/01/2019 +ms.date: 03/07/2019 ms.topic: tutorial ms.custom: mvc, seodec18 #Customer intent: As a developer, I want to use ML.NET to apply a binary classification task so that I can understand how to use sentiment prediction to take appropriate action. @@ -13,7 +13,7 @@ This sample tutorial illustrates using ML.NET to create a sentiment classifier v > [!NOTE] > This topic refers to ML.NET, which is currently in Preview, and material may be subject to change. For more information, visit [the ML.NET introduction](https://www.microsoft.com/net/learn/apps/machine-learning-and-ai/ml-dotnet). -This tutorial and related sample are currently using **ML.NET version 0.10**. For more information, see the release notes at the [dotnet/machinelearning github repo](https://github.com/dotnet/machinelearning/tree/master/docs/release-notes) +This tutorial and related sample are currently using **ML.NET version 0.11**. For more information, see the release notes at the [dotnet/machinelearning GitHub repo](https://github.com/dotnet/machinelearning/tree/master/docs/release-notes) In this tutorial, you learn how to: > [!div class="checklist"] @@ -28,7 +28,7 @@ In this tutorial, you learn how to: ## Sentiment analysis sample overview -The sample is a console app that uses ML.NET to train a model that classifies and predicts sentiment as either positive or negative. It also evaluates the model with a second dataset for quality analysis. 
The sentiment datasets are from the WikiDetox project. +The sample is a console app that uses ML.NET to train a model that classifies and predicts sentiment as either positive or negative. It uses the Yelp sentiment dataset from the UCI Machine Learning Repository (University of California, Irvine), which the sample splits into a train dataset and a test dataset. The sample evaluates the model with the test dataset for quality analysis. You can find the source code for this tutorial at the [dotnet/samples](https://github.com/dotnet/samples/tree/master/machine-learning/tutorials/SentimentAnalysis) repository. @@ -36,8 +36,7 @@ You can find the source code for this tutorial at the [dotnet/samples](https://g * [Visual Studio 2017 15.6 or later](https://visualstudio.microsoft.com/downloads/?utm_medium=microsoft&utm_source=docs.microsoft.com&utm_campaign=button+cta&utm_content=download+vs2017) with the ".NET Core cross-platform development" workload installed. -* The [Wikipedia detox line data tab separated file (wikiPedia-detox-250-line-data.tsv)](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-data.tsv). -* The [Wikipedia detox line test tab separated file (wikipedia-detox-250-line-test.tsv)](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-test.tsv). +* [The UCI Sentiment Labeled Sentences dataset zip file](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip). ## Machine learning workflow @@ -69,15 +68,16 @@ You then need to **determine** the sentiment, which helps you with the machine l With this problem, you know the following facts: -Training data: website comments can be toxic (1) or not toxic (0) (**sentiment**). -Predict the **sentiment** of a new website comment, either toxic or not toxic, such as in the following examples: +Training data: website comments can be positive (1) or negative (0) (**sentiment**). -* Please refrain from adding nonsense to Wikipedia.
-* He is the best, and the article should say that. +Predict the **sentiment** of a new website comment, either positive or negative, such as in the following examples: + +* I love the wait staff here. They rock. +* This place has the worst soup. The classification machine learning algorithm is best suited for this scenario. -### About the classification task +### About the classification algorithm Classification is a machine learning algorithm that uses data to **determine** the category, type, or class of an item or row of data. For example, you can use classification to: @@ -91,6 +91,8 @@ Classification algorithms are frequently one of the following types: * Binary: either A or B. * Multiclass: multiple categories that can be predicted by using a single model. +Because the website comments need to be classified as either positive or negative, you use a binary classification algorithm. + ## Create a console application 1. Open Visual Studio 2017. Select **File** > **New** > **Project** from the menu bar. In the **New Project** dialog, select the **Visual C#** node followed by the **.NET Core** node. Then select the **Console App (.NET Core)** project template. In the **Name** text box, type "SentimentAnalysis" and then select the **OK** button. @@ -105,26 +107,29 @@ Classification algorithms are frequently one of the following types: ### Prepare your data -1. Download the [Wikipedia detox-250-line-data.tsv](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-data.tsv) and the [wikipedia-detox-250-line-test.tsv](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-test.tsv) data sets and save them to the *Data* folder previously created. The first dataset trains the machine learning model and the second can be used to evaluate how accurate your model is. +1.
Download [The UCI Sentiment Labeled Sentences dataset zip file (see citations in the following note)](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip), and unzip it. + +2. Copy the `yelp_labelled.txt` file into the *Data* directory you created. -2. In Solution Explorer, right-click each of the \*.tsv files and select **Properties**. Under **Advanced**, change the value of **Copy to Output Directory** to **Copy if newer**. +> [!NOTE] +> The dataset this tutorial uses is from 'From Group to Individual Labels using Deep Features', Kotzias et al., KDD 2015, and is hosted at the UCI Machine Learning Repository: Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. + +3. In Solution Explorer, right-click the `yelp_labelled.txt` file and select **Properties**. Under **Advanced**, change the value of **Copy to Output Directory** to **Copy if newer**. ### Create classes and define paths Add the following additional `using` statements to the top of the *Program.cs* file: -[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#1 "Add necessary usings")] +[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#AddUsings "Add necessary usings")] -You need to create three global fields to hold the paths to the recently downloaded files, and a global variable for the `TextLoader`: +You need to create two global fields, one to hold the path to the recently downloaded dataset file and one to hold the path to the saved model file: -* `_trainDataPath` has the path to the dataset used to train the model. -* `_testDataPath` has the path to the dataset used to evaluate the model. +* `_dataPath` has the path to the dataset used to train and evaluate the model. * `_modelPath` has the path where the trained model is saved.
-* `_textLoader` is the used to load and transform the datasets. -Add the following code to the line right above the `Main` method to specify those paths and the `_textLoader` variable: +Add the following code to the line right above the `Main` method to specify those paths: -[!code-csharp[Declare global variables](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#2 "Declare global variables")] +[!code-csharp[Declare global variables](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#DeclareGlobalVariables "Declare global variables")] You need to create some classes for your input data and predictions. Add a new class to your project: @@ -134,38 +139,73 @@ You need to create some classes for your input data and predictions. Add a new c The *SentimentData.cs* file opens in the code editor. Add the following `using` statement to the top of *SentimentData.cs*: -[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#1 "Add necessary usings")] +[!code-csharp[AddUsings](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#AddUsings "Add necessary usings")] Remove the existing class definition and add the following code, which has two classes `SentimentData` and `SentimentPrediction`, to the *SentimentData.cs* file: -[!code-csharp[DeclareTypes](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#2 "Declare data record types")] +[!code-csharp[DeclareTypes](../../../samples/machine-learning/tutorials/SentimentAnalysis/SentimentData.cs#DeclareTypes "Declare data record types")] -`SentimentData` is the input dataset class and has a `float` (`Sentiment`) that has a value for sentiment of either positive or negative, and a string for the comment (`SentimentText`). Both fields have `Column` attributes attached to them. This attribute describes the order of each field in the data file, and which is the `Label` field. 
`SentimentPrediction` is the class used for prediction after the model has been trained. It has a single boolean (`Sentiment`) and a `PredictedLabel` `ColumnName` attribute. The `Label` is used to create and train the model, and it's also used with a second dataset to evaluate the model. The `PredictedLabel` is used during prediction and evaluation. For evaluation, an input with training data, the predicted values, and the model are used. +The input dataset class, `SentimentData`, has a `string` for the comment (`SentimentText`) and a `bool` (`Sentiment`) that has a value for sentiment of either positive or negative. Both fields have attributes attached to them that describe the order of each field in the data file. In addition, the `Sentiment` property has a `ColumnName` attribute to designate it as the `Label` field. `SentimentPrediction` is the class used for prediction after the model has been trained. It has a single boolean (`Sentiment`) with a `ColumnName` attribute of `PredictedLabel`. The `Label` is used to create and train the model, and it's also used with the test dataset split off from the loaded data to evaluate the model. The `PredictedLabel` is used during prediction and evaluation. For evaluation, the labeled test data, the predicted values, and the model are used. -When building a model with ML.NET you start by creating an `MLContext`. This is comparable conceptually to using `DbContext` in Entity Framework. The environment provides a context for your ML job that can be used for exception tracking and logging. +When building a model with ML.NET, you start by creating an `MLContext`. `MLContext` is conceptually comparable to using `DbContext` in Entity Framework. It provides a context for your ML job that can be used for exception tracking and logging. ### Initialize variables in Main Create a variable called `mlContext` and initialize it with a new instance of `MLContext`.
Replace the `Console.WriteLine("Hello World!")` line with the following code in the `Main` method: -[!code-csharp[CreateMLContext](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#3 "Create the ML Context")] +[!code-csharp[CreateMLContext](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CreateMLContext "Create the ML Context")] -Next, to setup for data loading initialize the `_textLoader` global variable in order to reuse it. When you create a `TextLoader` using `MLContext.Data.CreateTextLoader`, you pass in the context needed and the class which enables customization. +Add the following as the next line of code in the `Main` method: - Specify the data schema by passing an array of objects to the loader containing all the column names and their types. You defined the data schema previously when you created our `SentimentData` class. For our schema, the first column (Label) is a (the prediction) and the second column (SentimentText) is the feature of type text/string used for predicting the sentiment. -The `TextLoader` class returns a fully initialized +[!code-csharp[CallLoadData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CallProcessData)] -To initialize the `_textLoader` global variable in order to reuse it for the needed datasets, add the following code after the `mlContext` initialization: +The `LoadData` method executes the following tasks: -[!code-csharp[initTextLoader](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#4 "Initialize the TextLoader")] +* Loads the data. +* Splits the loaded dataset into train and test datasets. +* Returns the split train and test datasets. 
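The **Loads the data** step in the list above is handled by ML.NET for you. To illustrate what lazy loading of this file can look like, here is a minimal plain C# sketch that does not use ML.NET at all. It assumes each line of `yelp_labelled.txt` holds the comment text, a tab, and a `0` or `1` label; `SentimentRecord` and `LazySentimentReader` are hypothetical names, and ML.NET's own text loader works differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record that mirrors the tutorial's SentimentData class:
// column 0 holds the comment text, column 1 holds the 0/1 label.
public sealed class SentimentRecord
{
    public string SentimentText { get; set; } = "";
    public bool Sentiment { get; set; }
}

public static class LazySentimentReader
{
    // Lazily parses "text<TAB>label" lines. Because this is an iterator,
    // no line is parsed until the result is enumerated.
    public static IEnumerable<SentimentRecord> Parse(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines

            // Assumption: the label is the last tab-separated column.
            int tab = line.LastIndexOf('\t');
            if (tab < 0)
                throw new FormatException($"Expected a tab-separated label in: {line}");

            yield return new SentimentRecord
            {
                SentimentText = line.Substring(0, tab).Trim(),
                Sentiment = line.Substring(tab + 1).Trim() == "1"
            };
        }
    }
}
```

Passing `File.ReadLines(_dataPath)` (from `System.IO`) to `Parse` keeps the whole chain lazy, because `File.ReadLines` also reads one line at a time as the sequence is enumerated.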
-Add the following as the next line of code in the `Main` method: +Create the `LoadData` method, just after the `Main` method, using the following code: + +```csharp +public static (IDataView trainSet, IDataView testSet) LoadData(MLContext mlContext) +{ -[!code-csharp[Train](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#5 "Train your model")] +} +``` +## Load the data -The `Train` method executes the following tasks: +Since your previously created `SentimentData` data model type matches the dataset schema, you can combine the initialization, mapping, and dataset loading into one line of code using the `MLContext.Data.ReadFromTextFile` wrapper method. It returns an `IDataView`. + + As the input and output of `Transforms`, an `IDataView` is the fundamental data pipeline type, comparable to `IEnumerable` for `LINQ`. + +In ML.NET, data is similar to a SQL view. It is lazily evaluated, schematized, and heterogeneous. The `IDataView` object is the first part of the pipeline and loads the data. For this tutorial, it loads a dataset with comments and their corresponding positive or negative sentiment. This data is used to create and train the model. + + Add the following code as the first line of the `LoadData` method: + +[!code-csharp[LoadData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#LoadData "loading dataset")] + +### Split the dataset for model training and testing + +Next, you need both a training dataset to train the model and a test dataset to evaluate the model. Use the `MLContext.BinaryClassification.TrainTestSplit` method to split the loaded dataset into train and test datasets. You can specify the fraction of data for the test set with the `testFraction` parameter. The default is 10%, but in this case you use 20% so that more data is available for evaluation.
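To make `testFraction` concrete, here is a hypothetical plain C# sketch of a seeded train/test split. `ToySplitter` is not part of ML.NET, and ML.NET's `TrainTestSplit` may assign rows differently and need not produce exact counts; the sketch only shows the idea of holding out a fraction of the rows for evaluation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToySplitter
{
    // Shuffles a copy of the rows with a seeded Fisher-Yates shuffle, then assigns
    // round(count * testFraction) rows to the test set and the rest to the train set.
    public static (List<T> TrainSet, List<T> TestSet) TrainTestSplit<T>(
        IReadOnlyList<T> rows, double testFraction = 0.1, int seed = 0)
    {
        if (testFraction < 0.0 || testFraction > 1.0)
            throw new ArgumentOutOfRangeException(nameof(testFraction));

        var shuffled = rows.ToList();
        var rng = new Random(seed);
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            T tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }

        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        return (shuffled.Skip(testCount).ToList(), shuffled.Take(testCount).ToList());
    }
}
```

For example, with 1,000 rows and `testFraction: 0.2`, this sketch returns 800 training rows and 200 test rows. Fixing the seed makes the split reproducible between runs.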
+ +To split the loaded data into the needed datasets, add the following code as the next line in the `LoadData` method: + +[!code-csharp[SplitData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#SplitData "Split the Data")] + +Return the `splitDataView` at the end of the `LoadData` method: + +[!code-csharp[ReturnSplitData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#ReturnSplitData)] + +## Build and train the model + +Add the following call to the `BuildAndTrainModel` method as the next line of code in the `Main` method: + +[!code-csharp[CallBuildAndTrainModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CallBuildAndTrainModel)] + +The `BuildAndTrainModel` method executes the following tasks: -* Loads the data. * Extracts and transforms the data. * Trains the model. * Predicts sentiment based on test data. @@ -174,24 +214,13 @@ The `Train` method executes the following tasks: Create the `BuildAndTrainModel` method, just after the `Main` method, using the following code: ```csharp - public static ITransformer Train(MLContext mlContext, string dataPath) +public static ITransformer BuildAndTrainModel(MLContext mlContext, IDataView splitTrainSet) { } ``` -Notice that two parameters are passed into the Train method; a `MLContext` for the context (`mlContext`), and a for the dataset path (`dataPath`). You're going to use this method more than once for training and testing. -## Load the data -You'll load the data using the `_textLoader` global variable with the `dataPath` parameter. It returns a -. As the input and output of `Transforms`, a `DataView` is the fundamental data pipeline type, comparable to `IEnumerable` for `LINQ`. -In ML.NET, data is similar to a SQL view. It is lazily evaluated, schematized, and heterogenous. The object is the first part of the pipeline, and loads the data. For this tutorial, it loads a dataset with comments and corresponding toxic or non toxic sentiment.
This is used to create the model, and train it. - - Add the following code as the first line of the `Train` method: - -[!code-csharp[LoadTrainData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#6 "loading training dataset")] +Notice that two parameters are passed into the `BuildAndTrainModel` method: an `MLContext` for the context (`mlContext`), and an `IDataView` for the training dataset (`splitTrainSet`). ## Extract and transform the data @@ -201,41 +230,41 @@ ML.NET's transform pipelines compose a custom set of transforms that are applied Next, call `mlContext.Transforms.Text.FeaturizeText`, which featurizes the text column (`SentimentText`) into a numeric vector called `Features` used by the machine learning algorithm. This is a wrapper call that returns an estimator that will effectively be a pipeline. Name this `pipeline`, because you will then append the trainer to the `EstimatorChain`. Add this as the next line of code: -[!code-csharp[TextFeaturizingEstimator](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#7 "Add a TextFeaturizingEstimator")] +[!code-csharp[FeaturizeText](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#FeaturizeText "Featurize the text")] >[!WARNING] -> ML.NET Version 0.10 has changed the order of the Transform parameters. This will not error out until you run the application and build the model. Use the parameter names for Transforms as illustrated in the previous code snippet. +> ML.NET version 0.10 changed the order of the Transform parameters. This doesn't cause an error until you run the application and build the model. Use the parameter names for Transforms as illustrated in the previous code snippet. This is the preprocessing/featurization step. Using additional components available in ML.NET can enable better results with your model. ## Choose a learning algorithm -To add the trainer, call the `mlContext.Transforms.Text.FeaturizeText` wrapper method which returns a object.
This is a decision tree learner you'll use in this pipeline. The `FastTreeBinaryClassificationTrainer` is appended to the `pipeline` and accepts the featurized `SentimentText` (`Features`) and the `Label` input parameters to learn from the historic data. +To add the trainer, call the `mlContext.BinaryClassification.Trainers.FastTree` wrapper method, which returns a `FastTreeBinaryClassificationTrainer` object. This is a decision tree learner you'll use in this pipeline. The `FastTreeBinaryClassificationTrainer` is appended to the `pipeline` and accepts the featurized `SentimentText` (`Features`) and the `Label` input parameters to learn from the historic data. -Add the following code to the `Train` method: +Add the following code to the `BuildAndTrainModel` method: -[!code-csharp[FastTreeBinaryClassificationTrainer](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#8 "Add a FastTreeBinaryClassificationTrainer")] +[!code-csharp[FastTreeBinaryClassificationTrainer](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#AddTrainer "Add a FastTreeBinaryClassificationTrainer")] ## Train the model -You train the model, , based on the dataset that has been loaded and transformed. Once the estimator has been defined, you train your model using the while providing the already loaded training data. This returns a model to use for predictions. `pipeline.Fit()` trains the pipeline and returns a `Transformer` based on the `DataView` passed in. The experiment is not executed until this happens. +You train the model based on the dataset that has been loaded and transformed. Once the estimator has been defined, you train your model using the `Fit()` method while providing the already loaded training data. This returns a model to use for predictions. `pipeline.Fit()` trains the pipeline and returns a `Transformer` based on the `IDataView` passed in. The experiment is not executed until the `Fit()` method runs.
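The estimator/transformer pattern described above (define a pipeline, then call `Fit()` to get a transformer) can be illustrated with a toy bag-of-words featurizer in plain C#. All types here (`IToyEstimator`, `ToyTokenizer`, `BagOfWordsEstimator`, `BagOfWordsTransformer`) are hypothetical; ML.NET's `FeaturizeText` does considerably more, such as n-gram extraction and normalization. The point of the sketch is that nothing is learned until `Fit()` runs, and the resulting transformer applies the same featurization at training and prediction time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types illustrating the estimator/transformer pattern (not the ML.NET API).
public interface IToyEstimator<TTransformer>
{
    // Learns from the training data and returns a transformer.
    TTransformer Fit(IEnumerable<string> trainingTexts);
}

public static class ToyTokenizer
{
    // Lowercases the text and splits it on spaces and basic punctuation.
    public static IEnumerable<string> Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', '.', ',', '!', '?' }, StringSplitOptions.RemoveEmptyEntries);
}

public sealed class BagOfWordsTransformer
{
    private readonly Dictionary<string, int> _vocabulary;

    public BagOfWordsTransformer(Dictionary<string, int> vocabulary) => _vocabulary = vocabulary;

    public int VectorLength => _vocabulary.Count;

    // Maps text to a word-count vector; words not seen during Fit are ignored.
    public float[] Transform(string text)
    {
        var features = new float[_vocabulary.Count];
        foreach (string word in ToyTokenizer.Tokenize(text))
        {
            if (_vocabulary.TryGetValue(word, out int index))
                features[index] += 1f;
        }
        return features;
    }
}

public sealed class BagOfWordsEstimator : IToyEstimator<BagOfWordsTransformer>
{
    // Fit is the only step that reads the training data: it assigns each new word an index.
    public BagOfWordsTransformer Fit(IEnumerable<string> trainingTexts)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (string text in trainingTexts)
        {
            foreach (string word in ToyTokenizer.Tokenize(text))
            {
                if (!vocabulary.ContainsKey(word))
                    vocabulary.Add(word, vocabulary.Count);
            }
        }
        return new BagOfWordsTransformer(vocabulary);
    }
}
```

Because `Transform` uses only the vocabulary learned in `Fit`, words that never appeared in the training data are ignored at prediction time.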
-Add the following code to the `Train` method: +Add the following code to the `BuildAndTrainModel` method: -[!code-csharp[TrainModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#9 "Train the model")] +[!code-csharp[TrainModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#TrainModel "Train the model")] ### Save and Return the model trained to use for evaluation -At this point, you have a model of type that can be integrated into any of your existing or new .NET applications. Return the model at the end of the `Train` method. +At this point, you have a model of type `ITransformer` that can be integrated into any of your existing or new .NET applications. Return the model at the end of the `BuildAndTrainModel` method. -[!code-csharp[ReturnModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#10 "Return the model")] +[!code-csharp[ReturnModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#ReturnModel "Return the model")] ## Evaluate the model -Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. In the `Evaluate` method, the model created in `Train` is passed in to be evaluated. Create the `Evaluate` method, just after `Train`, as in the following code: +Now that you've created and trained the model, you need to evaluate it with a different dataset for quality assurance and validation. In the `Evaluate` method, the model created in `BuildAndTrainModel` is passed in to be evaluated.
Create the `Evaluate` method, just after `BuildAndTrainModel`, as in the following code: ```csharp -public static void Evaluate(MLContext mlContext, ITransformer model) +public static void Evaluate(MLContext mlContext, ITransformer model, IDataView splitTestSet) { } @@ -244,35 +273,31 @@ public static void Evaluate(MLContext mlContext, ITransformer model) The `Evaluate` method executes the following tasks: * Uses the model to make predictions on the test dataset. -* Creates the binary evaluator. -* Evaluates the model and create metrics. +* Creates the binary classification evaluator. +* Evaluates the model and creates metrics. * Displays the metrics. Add a call to the new method from the `Main` method, right under the `BuildAndTrainModel` method call, using the following code: -[!code-csharp[CallEvaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#11 "Call the Evaluate method")] - -You'll load the test dataset using the previously initialized `_textLoader` global variable with the `_testDataPath` global field. You can evaluate the model using this dataset as a quality check. Add the following code to the `Evaluate` method: - -[!code-csharp[LoadTestDataset](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#12 "Load the test dataset")] +[!code-csharp[CallEvaluate](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CallEvaluate "Call the Evaluate method")] -Next, you'll use the machine learning `model` parameter (a transformer) to input the features and return predictions. Add the following code to the `Evaluate` method as the next line: +Next, you'll use the machine learning `model` parameter (a transformer) and the `splitTestSet` parameter to input the features and return predictions.
Add the following code to the `Evaluate` method as the next line: -[!code-csharp[PredictWithTransformer](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#13 "Predict using the Transformer")] +[!code-csharp[PredictWithTransformer](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#TransformData "Predict using the Transformer")] -The `BinaryClassificationContext.Evaluate` method computes the quality metrics for the `PredictionModel` using the specified dataset. It returns a `BinaryClassificationEvaluator.CalibratedResult` object contains the overall metrics computed by binary classification evaluators. To display these to determine the quality of the model, you need to get the metrics first. +The `mlContext.BinaryClassification.Evaluate` method computes the quality metrics for the model by comparing its predictions with the labels in the specified dataset. It returns a metrics object that contains the overall metrics computed by binary classification evaluators. To display these metrics and determine the quality of the model, you need to get the metrics first.
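As a rough illustration of what binary classification evaluators compute, here is a hypothetical plain C# sketch of three common metrics: accuracy, F1 score, and AUC (area under the ROC curve). `ToyMetrics` is not an ML.NET type and this is not ML.NET's implementation; AUC is computed here with the simple pairwise-comparison definition, which is quadratic in the size of the test set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers (not ML.NET types) that compute common binary classification metrics.
public static class ToyMetrics
{
    // Fraction of rows whose predicted label equals the true label.
    public static double Accuracy(IReadOnlyList<bool> labels, IReadOnlyList<bool> predicted)
    {
        int correct = 0;
        for (int i = 0; i < labels.Count; i++)
            if (labels[i] == predicted[i]) correct++;
        return (double)correct / labels.Count;
    }

    // Harmonic mean of precision and recall for the positive class.
    public static double F1(IReadOnlyList<bool> labels, IReadOnlyList<bool> predicted)
    {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < labels.Count; i++)
        {
            if (predicted[i] && labels[i]) tp++;
            else if (predicted[i]) fp++; // predicted positive, actually negative
            else if (labels[i]) fn++;    // predicted negative, actually positive
        }
        if (tp == 0) return 0.0;
        double precision = (double)tp / (tp + fp);
        double recall = (double)tp / (tp + fn);
        return 2 * precision * recall / (precision + recall);
    }

    // AUC: probability that a randomly chosen positive row scores higher than a
    // randomly chosen negative row, with ties counted as half. O(P * N) comparisons.
    public static double Auc(IReadOnlyList<bool> labels, IReadOnlyList<double> scores)
    {
        var positives = new List<double>();
        var negatives = new List<double>();
        for (int i = 0; i < labels.Count; i++)
        {
            if (labels[i]) positives.Add(scores[i]);
            else negatives.Add(scores[i]);
        }
        if (positives.Count == 0 || negatives.Count == 0)
            throw new InvalidOperationException("AUC needs at least one positive and one negative row.");

        double wins = 0.0;
        foreach (double p in positives)
            foreach (double n in negatives)
                wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0);
        return wins / ((double)positives.Count * negatives.Count);
    }
}
```

For example, with labels `true, true, false, false`, scores `0.9, 0.4, 0.6, 0.1`, and an assumed threshold of 0.5, accuracy and F1 are both 0.5, and AUC is 0.75 (three of the four positive/negative pairs are ranked correctly).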
Add the following code as the next line in the `Evaluate` method: -[!code-csharp[ComputeMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#14 "Compute Metrics")] +[!code-csharp[ComputeMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#Evaluate "Compute Metrics")] ### Displaying the metrics for model validation Use the following code to display the metrics, share the results, and then act on them: -[!code-csharp[DisplayMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#15 "Display selected metrics")] +[!code-csharp[DisplayMetrics](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#DisplayMetrics "Display selected metrics")] To save your model to a .zip file before returning, add the following code to call the `SaveModelAsFile` method as the next line in `Evaluate`: -[!code-csharp[SaveModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#23 "Save the model")] +[!code-csharp[SaveModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CallSaveModel "Save the model")] ## Save the model as a .zip file @@ -291,8 +316,10 @@ The `SaveModelAsFile` method executes the following tasks: Next, create a method to save the model so that it can be reused and consumed in other applications. The `ITransformer` has a `SaveTo` method that writes the model to a stream. To save the model as a zip file at the `_modelPath` location, you create a `FileStream` immediately before calling the `SaveTo` method.
Add the following code to the `SaveModelAsFile` method as the next line: -[!code-csharp[SaveToMethod](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#24 "Add the SaveTo Method")] -Deploy and Predict with a loaded model +[!code-csharp[SaveToMethod](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#SaveModel "Add the SaveTo Method")] + You could also display where the file was written by writing a console message with the `_modelPath`, using the following code: ```csharp Console.WriteLine("The model is saved to {0}", _modelPath); @@ -301,16 +328,16 @@ Console.WriteLine("The model is saved to {0}", _modelPath); ## Predict the test data outcome with the saved model -Create the `Predict` method, just after the `Evaluate` method, using the following code: +Create the `UseModelWithSingleItem` method, just after the `Evaluate` method, using the following code: ```csharp -private static void Predict(MLContext mlContext, ITransformer model) +private static void UseModelWithSingleItem(MLContext mlContext, ITransformer model) { } ``` -The `Predict` method executes the following tasks: +The `UseModelWithSingleItem` method executes the following tasks: * Creates a single comment of test data. * Predicts sentiment based on test data. @@ -319,73 +346,73 @@ The `Predict` method executes the following tasks: Add a call to the new method from the `Main` method, right under the `Evaluate` method call, using the following code: -[!code-csharp[CallPredict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#16 "Call the Predict method")] +[!code-csharp[CallUseModelWithSingleItem](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CallUseModelWithSingleItem "Call the UseModelWithSingleItem method")] While the `model` is a `transformer` that operates on many rows of data, a very common production scenario is a need for predictions on individual examples.
The `PredictionEngine` is a wrapper that is returned from the `CreatePredictionEngine` method. Add the following code to create the `PredictionEngine` as the first line in the `UseModelWithSingleItem` method: -[!code-csharp[CreatePredictionEngine](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#17 "Create the PredictionEngine")] +[!code-csharp[CreatePredictionEngine](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CreatePredictionEngine1 "Create the PredictionEngine")] Add a comment to test the trained model's prediction in the `UseModelWithSingleItem` method by creating an instance of `SentimentData`: -[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#18 "Create test data for single prediction")] +[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CreateTestIssue1 "Create test data for single prediction")] - You can use that to predict the Toxic or Non Toxic sentiment of a single instance of the comment data. To get a prediction, use on the data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions. + You can use the `PredictionEngine` to predict the positive or negative sentiment of a single instance of the comment data. To get a prediction, use `Predict` on the data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions.
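The idea that one-off predictions reuse the same model as batch scoring can be sketched with a hypothetical generic wrapper in plain C#. `ToyPredictionEngine` is not ML.NET's `PredictionEngine`, which is implemented differently; here the "model" is assumed to be any function that maps a batch of inputs to a batch of outputs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical single-item wrapper over a batch "model". It only illustrates the idea
// that one-off predictions can reuse exactly the same transform as batch scoring.
public sealed class ToyPredictionEngine<TIn, TOut>
{
    private readonly Func<IEnumerable<TIn>, IEnumerable<TOut>> _batchTransform;

    public ToyPredictionEngine(Func<IEnumerable<TIn>, IEnumerable<TOut>> batchTransform) =>
        _batchTransform = batchTransform ?? throw new ArgumentNullException(nameof(batchTransform));

    // Wraps the single example in a one-element batch and unwraps the single result.
    public TOut Predict(TIn example) => _batchTransform(new[] { example }).Single();
}
```

Since `Predict` only wraps the example in a one-element batch, any featurization inside the batch function is applied identically in both cases.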
-[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#19 "Create a prediction of sentiment")] +[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#Predict "Create a prediction of sentiment")] -### Using the model: prediction +### Use the model: prediction Display `SentimentText` and corresponding sentiment prediction in order to share the results and act on them accordingly. This is called operationalization, using the returned data as part of the operational policies. Create a display for the results using the following code: -[!code-csharp[OutputPrediction](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#20 "Display prediction output")] +[!code-csharp[OutputPrediction](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#OutputPrediction "Display prediction output")] ## Deploy and Predict with a loaded model -Create the `PredictWithModelLoadedFromFile` method, just before the `SaveModelAsFile` method, using the following code: +Create the `UseLoadedModelWithBatchItems` method, just before the `SaveModelAsFile` method, using the following code: ```csharp -public static void PredictWithModelLoadedFromFile(MLContext mlContext) +public static void UseLoadedModelWithBatchItems(MLContext mlContext) { } ``` -The `PredictWithModelLoadedFromFile` method executes the following tasks: +The `UseLoadedModelWithBatchItems` method executes the following tasks: * Creates batch test data. * Predicts sentiment based on test data. * Combines test data and predictions for reporting. * Displays the predicted results. 
-Add a call to the new method from the `Main` method, right under the `Predict` method call, using the following code: +Add a call to the new method from the `Main` method, right under the `UseModelWithSingleItem` method call, using the following code: -[!code-csharp[CallPredictModelLoaded](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#25 "Call the PredictWithModelLoadedFromFile method")] +[!code-csharp[CallPredictModelLoaded](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CallUseLoadedModelWithBatchItems "Call the UseLoadedModelWithBatchItems method")] -Add some comments to test the trained model's predictions in the `PredictWithModelLoadedFromFile` method: +Add some comments to test the trained model's predictions in the `UseLoadedModelWithBatchItems` method: -[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#26 "Create test data for predictions")] +[!code-csharp[PredictionData](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#CreateTestIssues "Create test data for predictions")] Load the model -[!code-csharp[LoadTheModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#27 "Load the model")] +[!code-csharp[LoadTheModel](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#LoadModel "Load the model")] -Now that you have a model, you can use that to predict the Toxic or Non Toxic sentiment of the comment data using the method. To get a prediction, use `Predict` on new data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions. 
Add the following code to the `PredictWithModelLoadedFromFile` method for the predictions: +Now that you have a model, you can use it to predict the positive or negative sentiment of the comment data using the method. To get a prediction, use `Predict` on new data. Note that the input data is a string and the model includes the featurization. Your pipeline is in sync during training and prediction. You didn’t have to write preprocessing/featurization code specifically for predictions, and the same API takes care of both batch and one-time predictions. Add the following code to the `UseLoadedModelWithBatchItems` method for the predictions: -[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#28 "Create predictions of sentiments")] +[!code-csharp[Predict](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#Prediction "Create predictions of sentiments")] -### Using the loaded model for prediction +### Use the loaded model for prediction Display `SentimentText` and corresponding sentiment prediction in order to share the results and act on them accordingly. This is called operationalization, using the returned data as part of the operational policies. Create a header for the results using the following code: -[!code-csharp[OutputHeaders](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#29 "Display prediction outputs")] +[!code-csharp[OutputHeaders](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#AddInfoMessage "Display prediction outputs")] Before displaying the predicted results, combine the sentiment and prediction together to see the original comment with its predicted sentiment.
The following code uses the method to make that happen, so add that code next: -[!code-csharp[BuildTuples](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#30 "Build the pairs of sentiment data and predictions")] +[!code-csharp[BuildTuples](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#BuildSentimentPredictionPairs "Build the pairs of sentiment data and predictions")] Now that you've combined the `SentimentText` and `Sentiment` into a class, you can display the results using the method: -[!code-csharp[DisplayPredictions](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#31 "Display the predictions")] +[!code-csharp[DisplayPredictions](../../../samples/machine-learning/tutorials/SentimentAnalysis/Program.cs#DisplayResults "Display the predictions")] Because inferred tuple element names are a new feature in C# 7.1 and the default language version of the project is C# 7.0, you need to change the language version to C# 7.1 or higher. To do that, right-click on the project node in **Solution Explorer** and select **Properties**. Select the **Build** tab and select the **Advanced** button. In the dropdown, select **C# 7.1** (or a higher version). Select the **OK** button. @@ -397,27 +424,27 @@ Your results should be similar to the following. 
As the pipeline processes, it d ```console Model quality metrics evaluation -------------------------------- -Accuracy: 94.44% -Auc: 98.77% -F1Score: 94.74% +Accuracy: 79.14% +Auc: 86.27% +F1Score: 80.60% + =============== End of model evaluation =============== +The model is saved to C:\Tutorials\SentimentAnalysis\bin\Debug\netcoreapp2.1\Data\Model.zip =============== Prediction Test of model with a single sample and test dataset =============== -Sentiment: This is a very rude movie | Prediction: Toxic | Probability: 0.5297049 +Sentiment: This was a very bad steak | Prediction: Negative | Probability: 0.4641322 =============== End of Predictions =============== -=============== New iteration of Model =============== -=============== Create and Train the Model =============== -=============== End of training =============== - -The model is saved to: C:\Tutorial\SentimentAnalysis\bin\Debug\netcoreapp2.1\Data\Model.zip +=============== Prediction Test of loaded model with a multiple samples =============== -=============== Prediction Test of loaded model with a multiple sample =============== +Sentiment: This was a horrible meal | Prediction: Negative | Probability: 0.1391833 +Sentiment: I love this spaghetti. | Prediction: Positive | Probability: 0.9819039 +=============== End of predictions =============== -Sentiment: This is a very rude movie | Prediction: Toxic | Probability: 0.4585565 -Sentiment: I love this article. | Prediction: Not Toxic | Probability: 0.09454837 +=============== End of process =============== +Press any key to continue . . . ```