CategoricalHashTransform should accept Floats and Doubles #679
Hi @ganik, thanks for looking at this. One detail I am not clear on is what we should do with 0 values. Previously, when operating over keys or text, the default value for either type (missing and empty, respectively) was mapped to the missing key value. In this way the mapping was sparsity-preserving. Yet, in the case of … Other details include: do we want to be sure that negative and positive zeros have the same hash? It seems like they ought to.
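To make the signed-zero question concrete, here is a minimal sketch in Python (not ML.NET's C# code) of hashing a double into a 1-based key, assuming key 0 is reserved for missing values. The name `hash_double_to_key` and the use of `zlib.crc32` as the hash function are illustrative assumptions, not the transform's actual hash. The sketch canonicalizes -0.0 to +0.0 before hashing, because the two zeros compare equal but have different IEEE 754 bit patterns.

```python
import math
import struct
import zlib


def hash_double_to_key(value: float, num_bits: int = 16) -> int:
    """Hypothetical sketch: map a double to a 1-based key in [1, 2**num_bits].

    Key 0 is reserved for "missing", so NaN maps there.
    zlib.crc32 is a stand-in hash, not the one the transform actually uses.
    """
    if math.isnan(value):
        return 0  # missing input -> missing key
    if value == 0.0:
        value = 0.0  # -0.0 == 0.0 is True; replace with +0.0 so both zeros hash identically
    raw = struct.pack("<d", value)  # little-endian IEEE 754 bytes
    h = zlib.crc32(raw)  # unsigned 32-bit hash in Python 3
    return (h & ((1 << num_bits) - 1)) + 1  # keep the low bits, shift to 1-based
```

Without the canonicalization step, `struct.pack("<d", -0.0)` has the sign bit set, so the two zeros would generally land in different buckets.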
I agree with this, but from a strategic perspective I might elect not to include handling of integers until the work in #673 is done. Work on floats and doubles, though, can be done immediately.
Sure, I can postpone the integer work until #673 is done and do only floats and doubles. If we don't map 0.0 to 0, sparsity is not maintained. How about we map 0.0 to 0, missing values to 1, and give the rest hash codes starting from 2?
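The proposed scheme can be sketched as follows. This is a hypothetical Python illustration (the name `proposed_hash` and the `zlib.crc32` stand-in hash are assumptions), showing 0.0 mapped to 0, missing (NaN) mapped to 1, and every other value hashed into the range starting at 2.

```python
import math
import struct
import zlib


def proposed_hash(value: float, num_bits: int = 16) -> int:
    """Hypothetical sketch of the proposal: 0.0 -> 0, NaN -> 1, others -> [2, 2**num_bits + 1]."""
    if math.isnan(value):
        return 1  # missing values get code 1
    if value == 0.0:
        return 0  # true for both +0.0 and -0.0, so sparse zeros stay zero
    h = zlib.crc32(struct.pack("<d", value))  # stand-in hash over the IEEE 754 bytes
    return (h & ((1 << num_bits) - 1)) + 2  # offset past the two reserved codes
```

Under this scheme an all-zero input vector hashes to an all-zero output, so sparsity is kept. Note, however, that 0 is also the value key types use for missing, so hashed 0.0 values end up indistinguishable from missing ones downstream.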
Hi @ganik.
Interesting, but that still has the problem that values of 0.0 will be dropped from further processing (using, say, …). If we consider sparsity preservation a useful property (which I think we do), one possibility, though I'm not sure I like it, is that the behavior could be optional: NA and 0 inputs would map to NA keys for vector inputs. (Sort of like we have the … .) Incidentally, it may be helpful to read about key types, both here for a formal definition and here for a more intuitive discussion, to understand what they are and how they're used.
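A minimal Python sketch of the optional behavior described above, assuming a hypothetical `zero_as_missing` flag (not an existing ML.NET option) and `zlib.crc32` as a stand-in hash. It also shows why mapping 0.0 to key 0 drops it: a key-to-indicator conversion, modeled here by the hypothetical `key_to_indicator`, turns the missing key into an all-zeros vector.

```python
import math
import struct
import zlib
from typing import List


def hash_value(value: float, num_bits: int, zero_as_missing: bool) -> int:
    """Hypothetical sketch: key 0 means missing; real values get 1-based keys."""
    if math.isnan(value) or (zero_as_missing and value == 0.0):
        return 0  # NA (and optionally 0) -> missing key, keeping vectors sparse
    if value == 0.0:
        value = 0.0  # canonicalize -0.0 so both zeros share a key
    h = zlib.crc32(struct.pack("<d", value))  # stand-in hash
    return (h & ((1 << num_bits) - 1)) + 1


def hash_vector(values: List[float], num_bits: int = 8,
                zero_as_missing: bool = True) -> List[int]:
    """Hash each slot of a vector input with the chosen zero policy."""
    return [hash_value(v, num_bits, zero_as_missing) for v in values]


def key_to_indicator(key: int, cardinality: int) -> List[float]:
    """One-hot encode a 1-based key; the missing key 0 yields all zeros (dropped)."""
    vec = [0.0] * cardinality
    if key > 0:
        vec[key - 1] = 1.0
    return vec
```

With `zero_as_missing=True` the 0.0 slot behaves exactly like NA; with `zero_as_missing=False` it gets a real key and survives indicator conversion, at the cost of densifying sparse inputs.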
@ganik could you please look into this issue? Thanks,
Thanks
Currently CategoricalHashTransform accepts only Text or Key types.
If a Double is passed in, for example, the following error message is shown:
Error: *** System.ArgumentOutOfRangeException: 'Source column 'workclass1' has invalid type ('R8'): Expected Text or Key item type.
It would be good if it could accept numbers: ints and floats.