Understanding multi-column vectors #9962

Closed
ghost opened this issue Jan 12, 2019 — with docs.microsoft.com · 4 comments

Comments

ghost commented Jan 12, 2019

Is there a difference between using a single vector to represent multiple columns, as opposed to defining each column separately?

For example, as the documentation shows:
new TextLoader.Column("FeatureVector", DataKind.R4, 0, 9),

Would this be different to something like:

new TextLoader.Column("FeatureColumn0", DataKind.R4, 0),
...
new TextLoader.Column("FeatureColumn9", DataKind.R4, 9),

@JRAlexander
Contributor

@asthana86, @CESARDELATORRE, @sfilipi - Any thoughts?

@ghost
Author

ghost commented Jan 17, 2019

If it makes more sense, I can open an issue on the actual machine learning repo, or ask a question on StackOverflow.

Ultimately, though, I feel the documentation should cover this.

@sfilipi
Member

sfilipi commented Apr 19, 2019

Hi @Fedoranimus, thanks for the question. In a typical pipeline, where you have data transformations with a trainer at the end, if the columns all have the same data type, especially if they are all floats (System.Single), and you want to use all of them as Features, you should either load them together as a single vector or concatenate them into one vector before supplying them to the trainer (algorithm).
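
To make the two options concrete, here is a minimal sketch, assuming ML.NET 1.x, where the `DataKind.R4` from the question is spelled `DataKind.Single`. The temp file, its contents, and the class name `VectorVersusConcatenate` are made up for illustration. It loads the same ten text columns both ways and reads back the type of the resulting `Features` column.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class VectorVersusConcatenate
{
    // Returns the type of the "Features" column produced by each loading style.
    public static (VectorDataViewType Loaded, VectorDataViewType Concatenated) Describe()
    {
        var mlContext = new MLContext(seed: 0);

        // Hypothetical sample data: one row with ten comma-separated floats.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "0,1,2,3,4,5,6,7,8,9" + Environment.NewLine);

        // Style 1: load text columns 0..9 directly as one vector column.
        IDataView loaded = mlContext.Data.LoadFromTextFile(
            path,
            new[] { new TextLoader.Column("Features", DataKind.Single, 0, 9) },
            separatorChar: ',');

        // Style 2: load ten scalar columns, then concatenate them into one vector.
        string[] names = Enumerable.Range(0, 10).Select(i => $"FeatureColumn{i}").ToArray();
        TextLoader.Column[] scalarColumns = names
            .Select((name, i) => new TextLoader.Column(name, DataKind.Single, i))
            .ToArray();
        IDataView scalars = mlContext.Data.LoadFromTextFile(
            path, scalarColumns, separatorChar: ',');
        IDataView concatenated = mlContext.Transforms
            .Concatenate("Features", names)
            .Fit(scalars)
            .Transform(scalars);

        // Only the schema is inspected here; no rows are read.
        var result = ((VectorDataViewType)loaded.Schema["Features"].Type,
                      (VectorDataViewType)concatenated.Schema["Features"].Type);
        File.Delete(path);
        return result;
    }

    public static void Main()
    {
        var (loaded, concatenated) = Describe();
        Console.WriteLine($"Loaded as vector: {loaded}");
        Console.WriteLine($"Concatenated:     {concatenated}");
    }
}
```

In this sketch both routes should give the trainer a `Features` column of the same type. The concatenated view also keeps the ten scalar columns alongside it.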

If you want to do different things with your columns, like normalizing some of them but not others, or converting them to different formats, you can load them separately and perform those operations; then you'd still have to concatenate them into a single float vector before passing them to the algorithm.
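
A rough sketch of that second case, again assuming ML.NET 1.x. The column names `Income` and `Ratio`, the sample values, and the classes `FeatureRow` and `SelectiveNormalization` are hypothetical. One column is min-max normalized, the other is left alone, and both are then concatenated into `Features`.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Row shape used only to read the final vector back out of the data view.
public class FeatureRow
{
    public float[] Features { get; set; }
}

public static class SelectiveNormalization
{
    public static float[][] Run()
    {
        var mlContext = new MLContext(seed: 0);

        // Hypothetical data: "Income" has a large range, "Ratio" is already in [0, 1].
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "50000,0.2", "100000,0.8" });

        // Load each column separately so they can be treated differently.
        IDataView data = mlContext.Data.LoadFromTextFile(
            path,
            new[]
            {
                new TextLoader.Column("Income", DataKind.Single, 0),
                new TextLoader.Column("Ratio", DataKind.Single, 1),
            },
            separatorChar: ',');

        // Normalize only "Income" in place, then combine both columns into one vector.
        var pipeline = mlContext.Transforms.NormalizeMinMax("Income")
            .Append(mlContext.Transforms.Concatenate("Features", "Income", "Ratio"));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Materialize the rows before deleting the temp file, since loading is lazy.
        float[][] rows = mlContext.Data
            .CreateEnumerable<FeatureRow>(transformed, reuseRowObject: false)
            .Select(r => r.Features)
            .ToArray();
        File.Delete(path);
        return rows;
    }

    public static void Main()
    {
        foreach (float[] row in Run())
        {
            Console.WriteLine(string.Join(", ", row));
        }
    }
}
```

A trainer appended after the `Concatenate` step would then consume `Features` as its single float vector input.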

@JRAlexander
Contributor

Thanks, @sfilipi!! Closing, as I believe this is resolved.
