feat: add SchemaProvider::table_type(table_name: &str) #16401

Merged
merged 4 commits into from
Jun 18, 2025

Conversation

epgif
Contributor

@epgif epgif commented Jun 13, 2025

InformationSchemaConfig::make_tables only needs the TableType, not the whole TableProvider; the latter may require an expensive catalog operation to construct, while the former may not.

This lets SELECT * FROM information_schema.tables avoid performing one of those potentially expensive operations per table.
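
To make the trade-off concrete, here is a minimal synchronous sketch. Everything in it (`Schema`, `Table`, `PlainCatalog`, `SlowCatalog`, the `loads` counter, and the `String` error type) is a hypothetical stand-in, not the real DataFusion `SchemaProvider`, which is async and returns `Arc<dyn TableProvider>`. Only the shape of the default method mirrors what this PR adds.

```rust
use std::cell::Cell;

/// Simplified stand-in for DataFusion's `TableType` (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TableType {
    Base,
    View,
}

/// Stand-in for a fully constructed table provider (assumption).
struct Table {
    table_type: TableType,
}

/// Local alias with a default error type; `String` is a placeholder error.
type Result<T, E = String> = std::result::Result<T, E>;

trait Schema {
    /// Potentially expensive: builds the whole table.
    fn table(&self, name: &str) -> Result<Option<Table>>;

    /// Default: fall back to `table` and read the type from the result.
    fn table_type(&self, name: &str) -> Result<Option<TableType>> {
        self.table(name).map(|o| o.map(|t| t.table_type))
    }
}

/// Uses the default `table_type`, so every type lookup builds a table.
struct PlainCatalog {
    loads: Cell<usize>,
}

impl Schema for PlainCatalog {
    fn table(&self, _name: &str) -> Result<Option<Table>> {
        self.loads.set(self.loads.get() + 1);
        Ok(Some(Table { table_type: TableType::Base }))
    }
}

/// Overrides `table_type` to answer from cheap metadata instead.
struct SlowCatalog {
    loads: Cell<usize>,
}

impl Schema for SlowCatalog {
    fn table(&self, _name: &str) -> Result<Option<Table>> {
        self.loads.set(self.loads.get() + 1);
        Ok(Some(Table { table_type: TableType::View }))
    }

    fn table_type(&self, _name: &str) -> Result<Option<TableType>> {
        Ok(Some(TableType::View))
    }
}
```

In this sketch, calling `table_type` on `PlainCatalog` bumps its `loads` counter once per call, while the same call on `SlowCatalog` leaves the counter at zero.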

@github-actions github-actions bot added the catalog Related to the catalog crate label Jun 13, 2025
Contributor

@alamb alamb left a comment

This feature makes sense to me -- thank you @epgif

I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)

Maybe an example or something 🤔

@epgif
Contributor Author

epgif commented Jun 13, 2025

@alamb

I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)

I looked around for some tests implementing this interface that I could extend, but didn't find any. I meant to ask earlier but forgot: is there anything like that? It would be a big help if I had some previous tests to take inspiration from.

Failing that, I'll take a stab at it anyway :)

Thanks!

/// returns `None`. Implementations for which this operation is cheap but [Self::table] is
/// expensive can override this to improve operations that only need the type, e.g.
/// `SELECT * FROM information_schema.tables`.
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> {
Contributor

Suggested change
- async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> {
+ async fn table_type(&self, name: &str) -> DataFusionResult<Option<TableType>> {

Contributor Author

I don't see any DataFusionResult in the project, but I do see that E = DataFusionError is the default so I dropped that redundancy. Is that what you meant?

Contributor

Sorry, I mixed this up with another repo. In this file it should be datafusion_common::Result.

In DF we have a type alias in the common crate:

pub type Result<T, E = DataFusionError> = result::Result<T, E>;

Please reuse it.
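
As a side note on why the explicit error parameter is redundant with such an alias, here is a small standalone sketch. `MyError` and the functions are hypothetical; the alias only copies the shape of the one quoted above.

```rust
/// Placeholder error type (hypothetical; stands in for `DataFusionError`).
#[derive(Debug, PartialEq)]
struct MyError(String);

/// Alias with a default error type, same shape as the quoted alias.
type Result<T, E = MyError> = std::result::Result<T, E>;

/// Relies on the default: `E` is `MyError`.
fn short() -> Result<u32> {
    Ok(1)
}

/// Spells out the default explicitly; this is the same type as in `short`.
fn long() -> Result<u32, MyError> {
    Ok(1)
}

/// Compiles only because both spellings name the same type.
fn same(a: Result<u32>) -> Result<u32, MyError> {
    a
}

/// The default can still be overridden where a different error is needed.
fn other() -> Result<u32, String> {
    Err("x".to_string())
}
```

Since `Result<u32>` and `Result<u32, MyError>` are the same type here, dropping the second parameter changes nothing about the signature.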

Contributor Author

All references to Result in this file are already the one from datafusion_common, including what I've added.

Contributor

@comphead comphead Jun 18, 2025

Please also change the one in the test:

    ) -> Result<Option<Arc<dyn TableProvider>>, DataFusionError> {

Contributor Author

All references in the test are also already datafusion_common::Result, thanks to use super::*. However, as before, this one can be simplified because DataFusionError is already the default. The code I added is written that way only because it was copied and pasted from what is already here.

/// expensive can override this to improve operations that only need the type, e.g.
/// `SELECT * FROM information_schema.tables`.
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> {
    self.table(name).await.map(|o| o.map(|t| t.table_type()))
Contributor

Is there any way to avoid the nested map? Perhaps flat_map or and_then? It would be more readable IMHO.

Contributor Author

Changed to and_then.

Contributor Author

Well, clippy suggested what I had to begin with:

warning: using `Result.and_then(|x| Ok(y))`, which is more succinctly expressed as `map(|x| y)`
  --> datafusion/catalog/src/schema.rs:63:9
   |
63 | /         self.table(name)
64 | |             .await
65 | |             .and_then(|o| Ok(o.map(|t| t.table_type())))
   | |________________________________________________________^
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bind_instead_of_map
   = note: `#[warn(clippy::bind_instead_of_map)]` on by default
help: try
   |
63 ~         self.table(name)
64 +             .await.map(|o| o.map(|t| t.table_type()))

So I changed it back.
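
For readers comparing the two forms, a minimal standalone sketch of the equivalence clippy is pointing at follows. The function names and the `&str`/`usize` types are illustrative only and are not taken from the PR.

```rust
/// Nested `map`: the outer call maps over `Result`, the inner over `Option`.
fn with_map(r: Result<Option<&str>, String>) -> Result<Option<usize>, String> {
    r.map(|o| o.map(|s| s.len()))
}

/// Same behavior via `and_then`. The closure always returns `Ok`, which is
/// the pattern that `clippy::bind_instead_of_map` flags.
#[allow(clippy::bind_instead_of_map)]
fn with_and_then(r: Result<Option<&str>, String>) -> Result<Option<usize>, String> {
    r.and_then(|o| Ok(o.map(|s| s.len())))
}
```

Because the closure passed to `and_then` can never produce an `Err` of its own, both functions return the same value for every input, and `map` expresses that more directly.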

@epgif epgif force-pushed the schema-provider-table-type branch from 50e9552 to 6961bcf Compare June 16, 2025 22:59
@epgif
Contributor Author

epgif commented Jun 16, 2025

Please take another look @comphead -- thanks!

@epgif epgif force-pushed the schema-provider-table-type branch from 6961bcf to 9366395 Compare June 17, 2025 23:48
@epgif
Contributor Author

epgif commented Jun 17, 2025

@alamb

I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)

I looked around for some tests implementing this interface that I could extend, but didn't find any.

I tried panic-driven development and identified various unrelated tests, none of which provided a clue.

The trick here is a test needs to check that one function is called rather than another. This seems like a classic low-level unit test.

So, that's what I've done.

It looks noisy if you're not used to this kind of test, but it does work, and does provide the expected failure when the information_schema.rs part is reverted (i.e. wrong function called).
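
The general idea of such a test can be sketched as follows. This is not the test added in the PR: `Schema`, `Fixture`, and `list_table_types` are hypothetical, synchronous stand-ins, and the fixture simply panics if the expensive method is reached.

```rust
/// Simplified stand-in for DataFusion's `TableType` (assumption).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TableType {
    Base,
    View,
}

/// Stand-in for a full table provider; the fixture never builds one.
#[allow(dead_code)]
struct Table {
    table_type: TableType,
}

trait Schema {
    fn table_names(&self) -> Vec<String>;
    fn table(&self, name: &str) -> Option<Table>;

    /// Default falls back to `table`, as in the trait under discussion.
    fn table_type(&self, name: &str) -> Option<TableType> {
        self.table(name).map(|t| t.table_type)
    }
}

/// Test fixture: `table` panics, so any caller that reaches it fails loudly.
struct Fixture;

impl Schema for Fixture {
    fn table_names(&self) -> Vec<String> {
        vec!["atable".to_string()]
    }

    fn table(&self, _name: &str) -> Option<Table> {
        panic!("table() must not be called when only the type is needed")
    }

    fn table_type(&self, _name: &str) -> Option<TableType> {
        Some(TableType::Base)
    }
}

/// Hypothetical code under test: collects (name, type) pairs using only
/// `table_type`. Swapping in `table` here would trip the fixture's panic.
fn list_table_types(schema: &dyn Schema) -> Vec<(String, TableType)> {
    schema
        .table_names()
        .into_iter()
        .filter_map(|name| {
            let ty = schema.table_type(&name)?;
            Some((name, ty))
        })
        .collect()
}
```

A test then only has to call `list_table_types(&Fixture)` and compare the result. If the code under test regresses to calling `table`, the panic makes the test fail.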

Let me know what you think.

Thanks, @alamb!

@epgif epgif force-pushed the schema-provider-table-type branch from 9366395 to 50e2ecb Compare June 18, 2025 18:38
Contributor

@comphead comphead left a comment

Thanks @epgif for your first contribution, nicely done

@@ -0,0 +1,88 @@
use std::sync::Arc;
Contributor

The CI is failing because this file doesn't have the ASF header

I think it is more common in the DataFusion codebase for these types of tests to go at the bottom of the same module (file) anyway, so I'll move them there, which also fixes the CI.

@alamb alamb merged commit 4c3b847 into apache:main Jun 18, 2025
28 checks passed
@alamb
Contributor

alamb commented Jun 18, 2025

Thanks again @epgif and @comphead
