Support creating tables from cudf dataframes #220

Merged
merged 3 commits into from
Aug 20, 2021

Conversation

Collaborator

@ayushdg ayushdg commented Aug 19, 2021

This PR updates the PandasInputPlugin class to also accept (optionally) cudf dataframes, to support operations like:

import cudf
from dask_sql import Context

context = Context()
df = cudf.DataFrame({'a': [1, 2, 3]})
context.create_table("sql_table", df)

I opted to update the existing pandas class because dd.from_pandas also works on cudf dataframes. Happy to split the implementation if that's preferred.
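
For illustration, the input check this change implies could look roughly like the sketch below. It is not the code from this PR: `is_pandas_like` is a hypothetical helper name, and both imports are treated as optional so the snippet also runs on machines without cudf (or without pandas) installed.

```python
# Hypothetical sketch, not the actual dask-sql implementation.
# Both libraries are optional here; cudf in particular needs a GPU setup.
try:
    import pandas as pd
except ImportError:
    pd = None

try:
    import cudf
except ImportError:
    cudf = None


def is_pandas_like(input_item):
    """Return True for a pandas DataFrame or, if cudf is installed, a cudf DataFrame."""
    # Collect the DataFrame classes of whichever libraries could be imported.
    frame_types = tuple(mod.DataFrame for mod in (pd, cudf) if mod is not None)
    # isinstance() against an empty tuple is simply False.
    return isinstance(input_item, frame_types)
```

An input plugin could run a check like this before passing the frame on to dd.from_pandas, which, as noted above, accepts both kinds of dataframe.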

@nils-braun
Collaborator

Thanks @ayushdg! Looks good to me. I do not have a strong opinion on splitting (unless dd.from_pandas would need special parameters, and I do not see any reason it would). However, I would prefer to rename the class (and file) to something like PandasLikeInputClass, if that is fine with you.

@ayushdg
Collaborator Author

ayushdg commented Aug 19, 2021

Thanks for the suggestion @nils-braun. I've updated the class and module names to indicate that the plugin handles pandas-like dataframes.

@nils-braun
Collaborator

Great! This can be merged.
One general question (not specific to this PR): what is the best strategy for testing the cudf integration in dask-sql with GitHub Actions?

@nils-braun nils-braun merged commit 2a6cf15 into dask-contrib:main Aug 20, 2021
@ayushdg
Collaborator Author

ayushdg commented Aug 24, 2021

> Great! This can be merged.
> One general question (not particularly on this PR): what is the best strategy to test the cudf integration into dask-sql in Github actions?

gpuCI is one option that could be used to test integration with the RAPIDS libraries. It's what the dask repository uses as well (dask/community#138).
I don't think it uses GitHub Actions specifically; instead it is enabled via a webhook to GitHub. If that sounds like an option that would work for dask-sql, we could open a broader issue for discussion where others could chime in as well.

@nils-braun
Collaborator

Hi @ayushdg - that is some good news. I would be perfectly fine with this solution - I will open an issue to discuss this further.

@charlesbluca charlesbluca mentioned this pull request Oct 1, 2021
@ayushdg ayushdg deleted the fea-cudf-input branch April 11, 2022 17:07