Move basic SQL query examples to user guide #11217

Merged: 1 commit merged into apache:main on Jul 3, 2024

@alamb alamb commented Jul 2, 2024

Which issue does this PR close?

Fixes #11210
Part of #11172 and #1813

Rationale for this change

Hopefully this fixes the CI failure we were seeing in #11210 while also improving the documentation

The existing documentation already shows how to run SQL via SessionContext in several places:

  1. https://datafusion.apache.org/user-guide/example-usage.html#run-a-sql-query-against-data-stored-in-a-csv
  2. https://docs.rs/datafusion/latest/datafusion/index.html#sql

Thus we don't need another one in the examples directory

What changes are included in this PR?

  1. Add a SQL section in the library user guide with the basic examples (thanks @tshauck for starting this)
  2. Consolidate parquet_sql, avro_sql, and csv_sql examples into the docs
  3. Make sure the examples run as part of the doctests (cargo test --doc ...)

Are these changes tested?

Yes, they are run via doctests

Are there any user-facing changes?

New docs

@alamb alamb added the documentation Improvements or additions to documentation label Jul 2, 2024
@github-actions github-actions bot added core Core DataFusion crate and removed documentation Improvements or additions to documentation labels Jul 2, 2024
@@ -592,3 +592,9 @@ doc_comment::doctest!(
"../../../docs/source/user-guide/example-usage.md",
user_guid_example_tests
);

#[cfg(doctest)]
Contributor Author

This means these tests are run as part of the doc tests

$ cargo test --doc --features avro,json -- library_user_guide

...
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.16s
   Doc-tests datafusion

running 5 tests
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 764) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 718) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 673) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 775) ... ok
test datafusion/core/src/lib.rs - library_user_guide_example_usage (line 641) ... ok
...
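
For readers less familiar with doctests: rustdoc pulls the fenced code blocks out of the included Markdown and compiles each Rust block as a test. The sketch below is only a rough approximation of that selection rule, not rustdoc's actual implementation. The helper name `doctest_candidates` and the sample guide text are made up for illustration, and real rustdoc understands more fence attributes (for example `ignore` or `should_panic` on their own) than this sketch does.

```rust
/// Returns the info strings of fenced blocks that would likely be picked up
/// as Rust doctests. Assumption: a block counts if its language tag is empty
/// or `rust`; this is a simplification of what rustdoc really does.
fn doctest_candidates(markdown: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut inside = false; // true while we are between an opening and closing fence
    for line in markdown.lines() {
        if let Some(info) = line.trim_start().strip_prefix("```") {
            if !inside {
                let info = info.trim();
                // The first comma-separated token is treated as the language.
                let lang = info.split(',').next().unwrap_or("").trim();
                if lang.is_empty() || lang == "rust" {
                    found.push(info.to_string());
                }
            }
            inside = !inside;
        }
    }
    found
}

fn main() {
    // Build the fence marker at runtime so the sample stays readable here.
    let fence = "`".repeat(3);
    // Hypothetical guide content: two Rust blocks and one shell block.
    let guide = format!(
        "# SQL\n{f}rust\nlet x = 1;\n{f}\n\n{f}shell\ncargo test --doc\n{f}\n\n{f}rust,no_run\nlet y = 2;\n{f}\n",
        f = fence
    );
    let blocks = doctest_candidates(&guide);
    // prints: 2 candidate doctest block(s): ["rust", "rust,no_run"]
    println!("{} candidate doctest block(s): {:?}", blocks.len(), blocks);
}
```

The shell block is skipped, which is why commands shown in the guide do not become tests, while every Rust example does and must therefore compile.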

@alamb alamb force-pushed the alamb/sql_user_guide branch from 07bd95b to f21b576 Compare July 2, 2024 15:22
- [`parquet_sql_multiple_files.rs`](examples/parquet_sql_multiple_files.rs): Build and run a query plan from a SQL statement against multiple local Parquet files
- [`parquet_exec_visitor.rs`](examples/parquet_exec_visitor.rs): Extract statistics by visiting an ExecutionPlan after execution
- [`parse_sql_expr.rs`](examples/parse_sql_expr.rs): Parse SQL text into Datafusion `Expr`.
- [`plan_to_sql.rs`](examples/plan_to_sql.rs): Generate SQL from Datafusion `Expr` and `LogicalPlan`
- [`pruning.rs`](examples/parquet_sql.rs): Use pruning to rule out files based on statistics
- [`pruning.rs`](examples/pruning.rs): Use pruning to rule out files based on statistics
Contributor Author

drive by cleanup

@@ -47,10 +47,8 @@ cargo run --example csv_sql
- [`advanced_udf.rs`](examples/advanced_udf.rs): Define and invoke a more complicated User Defined Scalar Function (UDF)
Contributor Author

The removed examples were just inlined into the guide

It is also possible to read multiple files as a single table. This is done
with the ListingTableProvider which takes a list of file paths and reads them
as a single table, matching schemas as appropriate

Coming Soon
Contributor Author

This would be cool to do, but this PR is already large enough.

@alamb alamb marked this pull request as ready for review July 2, 2024 15:46
Member

@jonahgao jonahgao left a comment

Looks good to me!

@alamb
Contributor Author

alamb commented Jul 3, 2024

Thank you for the review @jonahgao

@alamb alamb merged commit 03848c5 into apache:main Jul 3, 2024
24 checks passed
comphead pushed a commit to comphead/arrow-datafusion that referenced this pull request Jul 8, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
Labels
core Core DataFusion crate
Development

Successfully merging this pull request may close these issues.

datafusion-examples CI run is failing: final link failed: No space left on device
2 participants