Displaying the records that have been loaded using TextLoader #2466

Closed
sdg002 opened this issue Feb 7, 2019 · 6 comments

@sdg002

sdg002 commented Feb 7, 2019

System information

  • OS version/distro: Windows 10
  • .NET Version (eg., dotnet --info): .NET 4.5

Issue

I was trying to get a better understanding of loading records from a flat file. All I wanted to do was to access the records that have been loaded sequentially and display them.

The code in the while block below is my attempt, and it worked well. However, I am wondering whether this is the right way to iterate over a cursor. Is there a simpler way to get the individual column values? It feels a bit onerous.

Source code / logs

    [TestMethod]
    public void TestMethod1()
    {
        string datafile = @"Data\Dummy2.csv";
        string pathFull = System.IO.Path.Combine(Util.GetProjectDir2(), datafile);
        var argsLoader = new Microsoft.ML.Data.TextLoader.Arguments();
        try
        {
            argsLoader.HasHeader = true;
            argsLoader.Separators = new char[] { '|' };
            argsLoader.Column = new Microsoft.ML.Data.TextLoader.Column[]
            {
                new Microsoft.ML.Data.TextLoader.Column("id", Microsoft.ML.Data.DataKind.I4, 0),
                // Names follow the file header (id|wt|ht|overwt): index 1 is wt, index 2 is ht.
                new Microsoft.ML.Data.TextLoader.Column("wt", Microsoft.ML.Data.DataKind.R4, 1),
                new Microsoft.ML.Data.TextLoader.Column("ht", Microsoft.ML.Data.DataKind.R4, 2),
                new Microsoft.ML.Data.TextLoader.Column("overwt", Microsoft.ML.Data.DataKind.Bool, 3)
            };
            var mlContext = new Microsoft.ML.MLContext();
            var loader = mlContext.Data.CreateTextLoader(argsLoader);
            Microsoft.Data.DataView.IDataView dataView = loader.Read(pathFull);
            var schema = dataView.Schema;
            Microsoft.Data.DataView.RowCursor cur = dataView.GetRowCursor(schema);
            while (cur.MoveNext())
            {
                System.Diagnostics.Trace.WriteLine($"got a row, position={cur.Position}");
                // Column 0
                Microsoft.Data.DataView.ValueGetter<int> getter = cur.GetGetter<int>(0);
                int id = 0;
                getter.Invoke(ref id);
                // Column 1
                Microsoft.Data.DataView.ValueGetter<float> getterWt = cur.GetGetter<float>(1);
                float wt = 0;
                getterWt.Invoke(ref wt);
                // Column 3
                Microsoft.Data.DataView.ValueGetter<bool> getterIsOverWt = cur.GetGetter<bool>(3);
                bool isOverWt = false;
                getterIsOverWt.Invoke(ref isOverWt);

                System.Diagnostics.Trace.WriteLine($"id={id}    wt={wt}    isOverWt={isOverWt}");
            }

        }
        catch (Exception ex)
        {
            System.Diagnostics.Trace.WriteLine(ex.ToString());
        }
    }

/*
id|wt|ht|overwt
01|30.0|4.0|False
02|35.0|4.5|False
03|40.0|5.0|False
10|33.0|4.0|True
11|38.0|4.5|True
12|43.0|5.0|True

*/

@Ivanidzo4ka
Contributor

Ivanidzo4ka commented Feb 8, 2019

Would this example satisfy you?
If you want to map the data in your data view onto a specific class, we provide the
mlContext.CreateEnumerable<YourType> method.

If you want to get the values in one column, we have:
dataView.GetColumn<string[]>(mlContext, "columnName")
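
For illustration, here is a minimal sketch of both calls. It assumes the later ML.NET 1.x API, where they are spelled mlContext.Data.CreateEnumerable and dataView.GetColumn<T>(columnName); the preview release used in this thread passes mlContext as an argument instead. PersonRow and the in-memory rows are hypothetical and only there so that no data file is needed.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type; the property names double as column names.
public class PersonRow
{
    public int Id { get; set; }
    public float Weight { get; set; }
    public bool IsOverWeight { get; set; }
}

public static class EnumerableSketch
{
    // Builds a small in-memory data view (an assumption, to avoid a data file).
    public static IDataView BuildView(MLContext mlContext)
    {
        var rows = new[]
        {
            new PersonRow { Id = 1, Weight = 30.0f, IsOverWeight = false },
            new PersonRow { Id = 10, Weight = 33.0f, IsOverWeight = true },
        };
        return mlContext.Data.LoadFromEnumerable(rows);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView dataView = BuildView(mlContext);

        // Whole rows, mapped onto a class.
        var typedRows = mlContext.Data.CreateEnumerable<PersonRow>(dataView, reuseRowObject: false);
        foreach (PersonRow row in typedRows)
        {
            Console.WriteLine($"id={row.Id} wt={row.Weight} isOverWt={row.IsOverWeight}");
        }

        // A single column, read by name.
        float[] weights = dataView.GetColumn<float>("Weight").ToArray();
        Console.WriteLine(string.Join(", ", weights));
    }
}
```

The type argument of GetColumn is the element type of the column, so a float column is read as GetColumn<float>.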

@TomFinley
Contributor

Also, I'd point out that if you just wanted to get some text to see what was going on, you could have used mlContext.Data.SaveAsText or something similar. However, that is not why I am writing. You are not using IDataView in the most efficient way and might benefit from knowing a better one. Consider this line.

Microsoft.Data.DataView.ValueGetter<int> getter = cur.GetGetter<int>(0);

That is done inside the while loop. It is better to get that delegate once, after you create the cursor but before you enter the loop, and then reuse it for every row: creating the "getter" is a fairly costly operation, while calling it is cheap. So the pattern is: create the cursor, get the getters, iterate over the cursor, and in each iteration call the getters to fetch the values. (Note that the variables that receive the values should likewise be declared outside the loop to enable buffer sharing.) For more info on this and its motivations, please see here.

Note also that cursors are disposable. In your particular case I think it is irrelevant, but if the pipeline contained something that allocates native memory that has to be cleaned up (e.g., a TensorFlow transformer), failure to dispose might lead to memory leaks on the unmanaged heap.
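
The pattern can be sketched roughly as follows. This is only an illustration and assumes the later ML.NET 1.x names (DataViewRowCursor, DataViewSchema, getters looked up by schema column) rather than the preview types used earlier in this thread; SampleRow and the in-memory rows are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row type standing in for the loaded file.
public class SampleRow
{
    public int Id { get; set; }
    public float Weight { get; set; }
    public bool IsOverWeight { get; set; }
}

public static class CursorSketch
{
    public static List<string> ReadAll(IDataView dataView)
    {
        var lines = new List<string>();
        DataViewSchema schema = dataView.Schema;

        // The using block disposes the cursor when iteration is done.
        using (DataViewRowCursor cursor = dataView.GetRowCursor(schema))
        {
            // Create the getters once, before the loop (the costly part).
            ValueGetter<int> idGetter = cursor.GetGetter<int>(schema["Id"]);
            ValueGetter<float> weightGetter = cursor.GetGetter<float>(schema["Weight"]);
            ValueGetter<bool> overWeightGetter = cursor.GetGetter<bool>(schema["IsOverWeight"]);

            // Declare the receiving variables once so they are reused for every row.
            int id = default;
            float weight = default;
            bool isOverWeight = default;

            while (cursor.MoveNext())
            {
                // Calling a getter is cheap; it fills the variable for the current row.
                idGetter(ref id);
                weightGetter(ref weight);
                overWeightGetter(ref isOverWeight);
                lines.Add($"id={id} wt={weight} isOverWt={isOverWeight}");
            }
        }
        return lines;
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView dataView = mlContext.Data.LoadFromEnumerable(new[]
        {
            new SampleRow { Id = 1, Weight = 30.0f, IsOverWeight = false },
            new SampleRow { Id = 10, Weight = 33.0f, IsOverWeight = true },
        });

        foreach (string line in ReadAll(dataView))
        {
            Console.WriteLine(line);
        }
    }
}
```

For value types such as int or float, hoisting the variables saves little; the buffer reuse pays off mainly for vector or text columns, where the getter can write into an existing buffer.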

@sdg002
Author

sdg002 commented Feb 8, 2019

@TomFinley

Thank you. I have taken your suggestions:

  • Get the getters only once
  • Dispose the cursor

and amended my block of code:

        using (Microsoft.Data.DataView.RowCursor cur = dataView.GetRowCursor(schema))
        {
            Microsoft.Data.DataView.ValueGetter<int> getter = cur.GetGetter<int>(0);
            Microsoft.Data.DataView.ValueGetter<float> getterWt = cur.GetGetter<float>(1);
            Microsoft.Data.DataView.ValueGetter<bool> getterIsOverWt = cur.GetGetter<bool>(3);
            while (cur.MoveNext())
            {
                int id = 0;
                getter.Invoke(ref id);
                float wt = 0;
                getterWt.Invoke(ref wt);
                bool isOverWt = false;
                getterIsOverWt.Invoke(ref isOverWt);
                System.Diagnostics.Trace.WriteLine($"id={id}    wt={wt}    isOverWt={isOverWt}");
            }
        }

@sdg002
Author

sdg002 commented Feb 8, 2019

Would this example satisfy you?
If you want to map the data in your data view onto a specific class, we provide the
mlContext.CreateEnumerable<YourType> method.

If you want to get the values in one column, we have:
dataView.GetColumn<string[]>(mlContext, "columnName")

Hi @Ivanidzo4ka ,

Thanks for replying. Works like a charm.

    [TestMethod]
    public void Iterate_StronglyTyped()
    {
        string pathFull = System.IO.Path.Combine(Util.GetProjectDir2(), _datafile);
        var mlContext = new Microsoft.ML.MLContext();
        Microsoft.Data.DataView.IDataView dataView = LoadDummy2(mlContext, pathFull);
        var schema = dataView.Schema;
        var someRows = mlContext.CreateEnumerable<entity.Dummy2>(dataView, false);
        foreach(var oRow in someRows)
        {
            System.Diagnostics.Trace.WriteLine($"id={oRow.Id}    wt={oRow.Weight}    isOverWt={oRow.IsOverWeight}");
        }
    }

I observed that when I had mapped the property Weight to a Microsoft.ML.Data.DataKind.R4 column, the code threw an exception. It worked fine when I changed the column to Microsoft.ML.Data.DataKind.R8.
It would be nice if Microsoft.ML.Data.DataKind had accompanying XML comments giving such hints, as System.Data.DbType does, for example.

image
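
A likely explanation, assuming the Weight property of entity.Dummy2 (which is not shown in this thread) is declared as double: DataKind.R4 is a 32-bit float (System.Single) and DataKind.R8 is a 64-bit float (System.Double), and CreateEnumerable expects the member type to match the column type. Under that assumption, keeping R4 in the loader and declaring the property as float should work as well. A hypothetical version of the class:

```csharp
// Hypothetical stand-in for entity.Dummy2; the real class is not shown in the thread.
// The property names are assumed to match the column names produced by the loader.
public class Dummy2
{
    // DataKind.I4 corresponds to System.Int32.
    public int Id { get; set; }

    // DataKind.R4 corresponds to System.Single, so the property is float.
    // A double property would need a DataKind.R8 column instead.
    public float Weight { get; set; }

    // DataKind.Bool corresponds to System.Boolean.
    public bool IsOverWeight { get; set; }
}
```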

@sdg002 sdg002 closed this as completed Feb 8, 2019
@sdg002
Author

sdg002 commented Feb 8, 2019

Would this example satisfy you?
If you want to map the data in your data view onto a specific class, we provide the
mlContext.CreateEnumerable<YourType> method.

If you want to get the values in one column, we have:
dataView.GetColumn<string[]>(mlContext, "columnName")

Hi @Ivanidzo4ka ,
Your link https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetCookBook.md
is exactly what I am after. This is really good.
Please - such valuable information should be surfaced on MSDN.

@sfilipi
Member

sfilipi commented Feb 8, 2019

@sdg002 slowly getting there:)
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/inspect-intermediate-data-ml-net

@ghost ghost locked as resolved and limited conversation to collaborators Mar 24, 2022