-
Notifications
You must be signed in to change notification settings - Fork 1.9k
switch housing dataset to wine #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
switch housing dataset to wine #17
Conversation
Humble attempt to resolve #3 issue. Still not sure what to do with kc_housing since it's behind Kaggle authorization |
|
||
|
||
<ItemGroup> | ||
<TestFile Include="d" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Can you make the "Include" a better name?
<TestFile Include="winequality" Url="[https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv](https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv)"/>
@@ -738,6 +738,7 @@ public void EntryPointTextToKeyToText() | |||
'Name': 'Data.TextLoader', | |||
'Inputs': {{ | |||
'InputFile': '$file' | |||
{8} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this already existed before you change, but what do you think about using string interpolation here? I think it would make this so much more readable:
'Inputs': {{
'InputFile': '$file'
{string.IsNullOrWhiteSpace(loader) ? "" : string.Format(",'CustomSchema': '{0}'", loader)}
@@ -855,15 +857,16 @@ public void EntryPointEvaluateMultiClass() | |||
Assert.Equal(3, CountRows(loader)); | |||
} | |||
|
|||
[Fact(Skip = "Missing data set. See https://github.com/dotnet/machinelearning/issues/3")] | |||
[Fact()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) no need for the parens. You can remove ()
.
<TestFile Include="d" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"/> | ||
</ItemGroup> | ||
|
||
<Target Name="DownloadExternalTestFiles" Condition="'$(RunTests)'=='true'"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always do this, not just when RunTests
is on. That way I can still run tests from VS Test Explorer even if I just "build.cmd" from command line.
However, we should also have "Inputs and Outputs" hooked up, so that way the target is skipped if the files already exist. check out https://msdn.microsoft.com/en-us/library/ms171483.aspx for how to make incremental builds work. Let me know if you need help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance you can help me with that?
<TestFile Include="winequality-white.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv">
<Result>$(MSBuildThisFileDirectory)test\data\external\winequality-white.csv</Result>
</TestFile>
<TestFile Include="winequality-red.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv">
<Result>$(MSBuildThisFileDirectory)test\data\external\winequality-red.csv</Result>
</TestFile>
</ItemGroup>
<Target Name="DownloadExternalTestFiles" Inputs="@(TestFile)" Outputs="%(TestFile.Result)">
<Message Importance="High" Text="Downloading external test files... %(TestFile.Result)" />
<DownloadFilesFromUrl Items="@(TestFile)"
DestinationDir="test/data/external"
TreatErrorsAsWarnings="true"/>
</Target>
It keeps download files and i'm not sure how to debug it properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eerhardt Can you help me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way Inputs and Outputs work is based on file timestamps. You need an actual Input file, so MSBuild can compare its last write time to the output file's last write time.
Here when you specify:
Inputs="@(TestFile)"
It uses the TestFile
's ItemSpec (what is put into the Include
attribute). I'm assuming we don't have files named "winequality-white.csv", so MSBuild probably decides to always do the task, since it can't find the input file.
Another option is to check if the local file already exists, and if it does, then don't do the download.
Here's that approach we do in a different repo that is doing a similar thing (downloading files from the internet during the build, and caching them so they aren't downloaded for every build invocation):
The above will download any URL in the @(_DownloadAndExtractItem)
Item. And then where those items are defined:
The item is "Condition"d on whether the file exists or not:
<_DownloadAndExtractItem Include="CombinedSharedHostAndFrameworkArchive"
Condition="!Exists('$(CombinedSharedHostAndFrameworkArchive)')">
<Url>$(CoreSetupRootUrl)$(MicrosoftNETCoreAppPackageVersion)/$(CombinedFrameworkHostCompressedFileName)$(CoreSetupBlobAccessTokenParam)</Url>
<DownloadFileName>$(CombinedSharedHostAndFrameworkArchive)</DownloadFileName>
<ExtractDestination>$(SharedFrameworkPublishDirectory)</ExtractDestination>
</_DownloadAndExtractItem>
So if the CombinedSharedHostAndFrameworkArchive
file already exists, then this item isn't included. Thus, when all the files already exist locally, nothing is downloaded.
@@ -33,6 +33,7 @@ | |||
RestoreProjects; | |||
BuildNative; | |||
$(TraversalBuildDependsOn); | |||
DownloadExternalTestFiles; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this thing download a file on every build or only if the file is not present already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to make it download file only once, but struggle for right now, see Eric comment and my reply.
|
||
<Target Name="DownloadExternalTestFiles" Condition="'$(RunTests)'=='true'"> | ||
<Message Importance="High" Text="Downloading external test files..." /> | ||
<DownloadFilesFromUrl Items="@(TestFile)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reasonable way to check the file hash? At some point the UCI site will be broken, and/or we will receive (perhaps) a partial file.
As we expand on the use of this, there'll be additional ways for it to fail.
I guess I delete my repository and recreate it again, and now PR lost track of origin source, so I need to create new PR #170 |
This commit replaces housing dataset to wine dataset which we download during build from external source