You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The .toArray() javascript method of vectors includes a shortcut to return the underlying typed array; but this doesn't respect null values, and so can return the wrong number.
Solution: Eliminate the fast method, always return Javascript arrays. It might be better to keep the old method in cases where there are guaranteed no nulls.
Dominik Moritz / @domoritz:
I think it's nice to have the fast toArray. We do have toJSON, which always takes the slow path. However, I agree that it could be confusing to accidentally miss nulls. And always using toJSON isn't good since it would always take the slow pass even when we have no nulls.
Btw, I found that NaN is a valid value in typed arrays even though null is not.
I think two three best solutions are
Leave this as they are and ask people to use toJSON if they want to guarantee to have nulls. We could add a way for users to get the null mask (there already is isValid(index) but you need to ask for each value). Or we could have a way for people to define null values (e.g. as -1 or NaN)
Take the slow path when the data type for the vector is nullable. Unfortunately, a user would have no easy option to take the fast pass anymore. Maybe we could have an option to force the fast pass?
Ben Schmidt / @bmschmidt:
Maybe it could be as simple as just checking if array.nullCount > 0 before returning the typed array? That's a lot less common than a column being nullable. If so, happy to take another pass at this. If there are actually nulls there, it seems pretty dicey to return something with a known wrong value. I suppose an escape hatch/exposing null mask couldn't hurt.
(I think I found this problem when accessing the index array of a Uint16 dictionary. In that context null was getting translated from 'nothing' to 'the first key in whatever order they were put in,'
Also... NaN does work on Float32 arrays, but not on Ints.
Not sure there's a correct behavior on the below, but '0' is weird.
arrow.vectorFromArray([1, 2, 3, NaN, 4], new arrow.Int8()).get(3)
The .toArray() javascript method of vectors includes a shortcut to return the underlying typed array; but this doesn't respect null values, and so can return the wrong number.
Solution: Eliminate the fast method, always return Javascript arrays. It might be better to keep the old method in cases where there are guaranteed no nulls.
Reporter: Ben Schmidt / @bmschmidt
Externally tracked issue: apache/arrow#8385
PRs and other links:
Note: This issue was originally created as ARROW-10221. Please see the migration documentation for further details.
The text was updated successfully, but these errors were encountered: