Skip to content

Modified how data is saved to disk #4424

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 1, 2019
Merged

Modified how data is saved to disk #4424

merged 8 commits into from
Nov 1, 2019

Conversation

bpstark
Copy link
Contributor

@bpstark bpstark commented Oct 31, 2019

pre-trained meta files are now stored in one location always, this
allows multiple runs to re-use the same meta file without having to
redownload.

Additionally added the ability to cleanup the temporary workspace used
to train the model. This should prevent issues of running out of disk
space when running multiple training session sequentially.

@bpstark bpstark requested a review from a team as a code owner October 31, 2019 18:19
options.CleanTmpWorkspace = false;
}

string resourcePath = Path.Combine(Path.GetTempPath(), ResourceDir);
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string resourcePath [](start = 12, length = 19)

make it a readonly property of the class. #Resolved

@@ -78,6 +78,7 @@ public sealed class ImageClassificationTrainer :
internal const string UserName = "Image Classification Trainer";
internal const string ShortName = "IMGCLSS";
internal const string Summary = "Trains a DNN model to classify images.";
internal const string ResourceDir = "MLNET_VISION";
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MLNET_VISION [](start = 45, length = 12)

MLNET #Resolved

/// Indicates if the temporary workspace should be deleted on cleanup
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Indicates if the temporary workspace should be deleted on cleanup", SortOrder = 15)]
public bool CleanTmpWorkspace = true;
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove. #Resolved

else
{
//Since this is not a tmp workspace, but one provided by the user, do not allow deletion.
options.CleanTmpWorkspace = false;
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a readonly private variable and set it to true indicating we need to cleanup workspace in the event user does not provide one. #Resolved

@codemzs
Copy link
Member

codemzs commented Oct 31, 2019

using System.Collections.Generic;

add/modify existing tests to check for below:

  1. meta files exist in temp dir/mlnet
  2. the file that contains number of examples also exists in the workspace directory after the run is completed (this is when user provides a workspace).
  3. we need to think how we can test the scenario when workspace is not passed by the user and we create a temp workspace in temp dir and we later clean it.... we need to verify the clean step but I understand getting hold of this temp directory is challenging. #Resolved

Refers to: src/Microsoft.ML.Vision/ImageClassificationTrainer.cs:6 in 9b1d5b1. [](commit_id = 9b1d5b1, deletion_comment = False)

@@ -1123,11 +1138,19 @@ private int GetNumSamples(string path)

trainSaver.save(_session, _checkpointPath);
UpdateTransferLearningModelOnDisk(_classCount);
CleanUpTmpWorkspace();
}
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline. #Resolved

@@ -1123,11 +1138,19 @@ private int GetNumSamples(string path)

trainSaver.save(_session, _checkpointPath);
UpdateTransferLearningModelOnDisk(_classCount);
CleanUpTmpWorkspace();
}
private void CleanUpTmpWorkspace()
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CleanUpTmpWorkspace() [](start = 21, length = 21)

TryCleanupTemporaryWorkspace #Resolved

private int _classCount;
private Graph Graph => _session.graph;

private static readonly string _resourcePath = Path.Combine(Path.GetTempPath(), "MLNET");
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there a newline between this and the previous field? #Resolved

else
{
_cleanupWorkspace = false;
}
Copy link
Member

@codemzs codemzs Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this, by default in C# boolean variables are initialized to false. #Resolved

@@ -630,8 +638,9 @@ private protected override ImageClassificationModelParameters TrainModelCore(Tra
_inputTensorName, _bottleneckTensor.name, trainSetBottleneckCachedValuesFilePath,
ImageClassificationMetrics.Dataset.Train, _options.MetricsCallback);

string sizeFile = Path.Combine(_options.WorkspacePath, "TrainingSetSize.txt");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) you may want to extract this into a helper method so it isn't duplicated where this file is, and what its name is.

{
if (_cleanupWorkspace && Directory.Exists(_options.WorkspacePath))
{
Directory.Delete(_options.WorkspacePath, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want a try-catch here? Or if this throws an exception, do you want it to kill the training process?

Copy link
Member

@eerhardt eerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Brian Stark added 6 commits October 31, 2019 15:16
pre-trained meta files are now stored in one location always, this
allows multiple runst to re-use the same meta file without having to
redownload.

Additionally added the ability to cleanup the temporary workspace used
to train the model. This should prevent issues of running out of disk
space when running multiple training session sequentially.

private void TryCleanupTemporaryWorkspace()
{
if (_cleanupWorkspace && Directory.Exists(_options.WorkspacePath))
Copy link
Contributor

@justinormont justinormont Nov 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you hold the file pointers open, an elegant method is using the FileOptions.DeleteOnClose when you initially open the temporary files.

This then causes the OS (or filesystem when on macOS/Linux) to cleanup the files when they are closed, or if your program crashes or is killed by the user.

https://docs.microsoft.com/en-us/dotnet/api/system.io.fileoptions?view=netcore-3.0

@bpstark
Copy link
Contributor Author

bpstark commented Nov 1, 2019

Note this change will continue to fail until such time as we update the inception meta file in storage

This will fix #4426

Copy link
Member

@codemzs codemzs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@codemzs codemzs merged commit 7a8f063 into dotnet:master Nov 1, 2019
@bpstark bpstark deleted the cleanup branch November 1, 2019 22:03
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
* Modified how data is saved to disk

pre-trained meta files are now stored in one location always, this
allows multiple runst to re-use the same meta file without having to
redownload.

Additionally added the ability to cleanup the temporary workspace used
to train the model. This should prevent issues of running out of disk
space when running multiple training session sequentially.

* fixed extra line error.

* changes based on comments.

* comment fixes.

* address comments.

* fixed error

* fix for when to download.

* Remove special handling of inception as we have fixed the meta file to not have extras that need to be downloaded.
frank-dong-ms-zz pushed a commit to frank-dong-ms-zz/machinelearning that referenced this pull request Nov 4, 2019
* Modified how data is saved to disk

pre-trained meta files are now stored in one location always, this
allows multiple runst to re-use the same meta file without having to
redownload.

Additionally added the ability to cleanup the temporary workspace used
to train the model. This should prevent issues of running out of disk
space when running multiple training session sequentially.

* fixed extra line error.

* changes based on comments.

* comment fixes.

* address comments.

* fixed error

* fix for when to download.

* Remove special handling of inception as we have fixed the meta file to not have extras that need to be downloaded.
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants