Q: How to add a missing column to DataView? #5967
Comments
I might be misunderstanding something, but I found it difficult to find documentation on how to add a column. I think adding columns with custom or static content is important, because there may be cases where a) we don't have all the data in a new dataset but still want to reuse and evaluate an existing model, or b) we want to evaluate the impact of some column. I believe this could be done with a custom mapping, but as I understand it, that requires explicitly declaring classes first, which makes working with CSV files slower. I am trying to automatically add columns of 0's whenever a model requires a column that is missing from the CSV file.
Fix: use DataFrame from the Microsoft.Data.Analysis NuGet package. It could perhaps be featured more prominently in the docs, as it took a bit of effort to find. Below is my first hack; suggestions for improvement are appreciated.
ToDataFrame() can be confusing because its default row limit is 100. It might be better to rename it to ToDataFramePreview, or to make the default value -1. I reopened this issue for review. I should be able to continue with this tomorrow, but I think it is harder than it should be in a future version (or I might be missing something). .ToDataFrame(-1) is also slow, which is probably why the default is 100. Could it be that DataFrame is slower than the default implementation of IDataView as well? My full simulations became roughly 5-10x slower. This might be a bigger problem with large datasets that do not fit in RAM. Transforms for appending columns would be better if there were a way to do it for dynamic feature names. All alternatives are appreciated. This issue can be closed after review.
@pgovind for notification.
You should be able to add a column in much the same way as you add a new key to a dictionary.
Or you can also add to df.Columns
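The snippets that originally accompanied this comment are not preserved here. As a rough sketch of the `df.Columns` route only, assuming the missing columns are Boolean and should be filled with `false`: the class name `MissingColumnSketch` and the helper `AddMissingBooleanColumns` are hypothetical names introduced for illustration, not part of ML.NET.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.Data.Analysis;
using Microsoft.ML;

public static class MissingColumnSketch
{
    // Hypothetical helper: returns a DataFrame that contains every column of
    // 'data' plus an all-false Boolean column for each expected name that is absent.
    public static DataFrame AddMissingBooleanColumns(
        IDataView data, IEnumerable<string> expectedColumns)
    {
        // -1 copies all rows; the default would copy only the first 100.
        DataFrame df = data.ToDataFrame(-1);

        foreach (string name in expectedColumns)
        {
            // Skip columns that the loaded data already provides.
            if (df.Columns.Any(c => c.Name == name))
                continue;

            // Build a constant column with one 'false' per existing row.
            var filler = new BooleanDataFrameColumn(
                name, Enumerable.Repeat(false, (int)df.Rows.Count));
            df.Columns.Add(filler);
        }

        return df;
    }
}
```

Because DataFrame implements IDataView, the returned object can be passed to a model's `Transform` call directly. Note that this copies the whole dataset into memory, which matches the slowdown reported earlier in the thread.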
Thanks. If TextLoader returned a DataFrame, that would make it simpler. Right now it returns an IDataView, which does not seem to expose methods or properties for adding columns. Long term, a transform that appends a column with static values would be nice (similar to dropping columns).
One option might be to use a custom mapping transform. See https://docs.microsoft.com/dotnet/api/microsoft.ml.custommappingcatalog.custommapping and https://www.youtube.com/watch?v=TEnQp5qtopo for how to create one.
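For illustration, a minimal sketch of what the custom mapping approach could look like, assuming the input data has a `Single` column named `Feature1` and the model expects an extra Boolean column named `MissingFlag`. Both column names and all class names below are hypothetical. It also shows the drawback raised earlier in the thread: the input and output classes have to be declared up front.

```csharp
using Microsoft.ML;

// Hypothetical input shape: only the columns the mapping needs to read.
public class InputRow
{
    public float Feature1 { get; set; }
}

// Hypothetical output shape: the columns the mapping adds to the data.
public class AddedColumns
{
    public bool MissingFlag { get; set; }
}

public static class CustomMappingSketch
{
    public static IDataView AddConstantColumn(MLContext mlContext, IDataView data)
    {
        // The mapping ignores its input and writes a constant value.
        // contractName is null here, so the resulting transformer cannot be saved.
        var estimator = mlContext.Transforms.CustomMapping<InputRow, AddedColumns>(
            (src, dst) => { dst.MissingFlag = false; },
            contractName: null);

        // The existing columns pass through; MissingFlag is appended.
        return estimator.Fit(data).Transform(data);
    }
}
```

Unlike the DataFrame route, this keeps the data as a lazily evaluated IDataView, but the set of added columns is fixed at compile time.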
In the |
@torronen I'm going to close this for now, as it seems you have the answer you need. If you have more questions, feel free to reopen as needed.
I am reading data into an IDataView with TextLoader and column inference. My dataset is missing some columns (Boolean, in my case) that are expected by the ML.NET model. I would like to fill these columns with all 0's or false values and then run the prediction.
I can only find how to fill in missing values based on another column. In this case, I don't have any other column to copy and fill in.
I can find a DropColumns method, but no AddColumn method. Is there some way to add columns with constant values to an IDataView, with transformers or otherwise?