Skip to content

[JS] toArray() method ignores nulls on some types. #105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
asfimport opened this issue Oct 7, 2020 · 2 comments
Open

[JS] toArray() method ignores nulls on some types. #105

asfimport opened this issue Oct 7, 2020 · 2 comments
Labels
Type: bug Something isn't working

Comments

@asfimport
Copy link

The .toArray() javascript method of vectors includes a shortcut to return the underlying typed array; but this doesn't respect null values, and so can return the wrong number.

 


v = arrow.Vector.from(\{values: [1, 2, 3, 4, 5, null, 6],type: new arrow.Int32()})

v.toArray()[5] // Incorrectly returns '0'

v.get(5) // Correctly returns null

 

Solution: Eliminate the fast method, always return Javascript arrays. It might be better to keep the old method in cases where there are guaranteed no nulls.

Reporter: Ben Schmidt / @bmschmidt

Externally tracked issue: apache/arrow#8385

PRs and other links:

Note: This issue was originally created as ARROW-10221. Please see the migration documentation for further details.

@asfimport
Copy link
Author

Dominik Moritz / @domoritz:
I think it's nice to have the fast toArray. We do have toJSON, which always takes the slow path. However, I agree that it could be confusing to accidentally miss nulls. And always using toJSON isn't good since it would always take the slow pass even when we have no nulls.

Btw, I found that NaN is a valid value in typed arrays even though null is not.

I think two three best solutions are

  • Leave this as they are and ask people to use toJSON if they want to guarantee to have nulls. We could add a way for users to get the null mask (there already is isValid(index) but you need to ask for each value). Or we could have a way for people to define null values (e.g. as -1 or NaN)

  • Take the slow path when the data type for the vector is nullable. Unfortunately, a user would have no easy option to take the fast pass anymore. Maybe we could have an option to force the fast pass?

    I'm not happy with either but

@asfimport
Copy link
Author

Ben Schmidt / @bmschmidt:
Maybe it could  be as simple as just checking if array.nullCount > 0 before returning the typed array? That's a lot less common than a column being nullable. If so, happy to take another pass at this. If there are actually nulls there, it seems pretty dicey to return something with a known wrong value. I suppose an escape hatch/exposing null mask couldn't hurt.

(I think I found this problem when accessing the index array of a Uint16 dictionary. In that context null was getting translated from 'nothing' to 'the first key in whatever order they were put in,' 

Also... NaN does work on Float32 arrays, but not on Ints.

Not sure there's a correct behavior on the below, but '0' is weird.


arrow.vectorFromArray([1, 2, 3, NaN, 4], new arrow.Int8()).get(3)

@assignUser assignUser transferred this issue from apache/arrow May 26, 2025
@assignUser assignUser added the Type: bug Something isn't working label May 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants