Open
Description
Describe the bug
Im trying to register a listing table using a certain glob string:
s3://somebucket/somepath/*44fad0765ac6-00001.parquet
and receive the error
Error: Custom { kind: Other, error:
"Object Store error: Object at location
somepath/%2A44fad0765ac6-00001.parquet not found:
response error \"No Body\", after 0 retries: HTTP status client error (404 Not Found)
for url (https://s3.eu-west-1.amazonaws.com/somebucket/somepath/%252A44fad0765ac6-00001.parquet)" }
however, I know the file is there because the action succeeds when I just using the full file name
s3://somebucket/somepath/00000-2367-2918fbc9-ea04-4927-a669-44fad0765ac6-00001.parquet
from the error, I have a feeling it's related to how the url and gloc char is being encoded, looks like its being encoded twice for some reason - from * -> %2A -> %252A
To Reproduce
let glob_str = "*44fad0765ac6-00001.parquet";
let table_files = String::from("somepath");
let table_files_path = format!("s3://{bucket_name}/{table_files}/{glob_str}");
let table_files_url = ListingTableUrl::parse(&table_files_path)?;
let schema_file_path = format!("s3://{bucket_name}/{table_files}/00000-2367-2918fbc9-ea04-4927-a669-44fad0765ac6-00001.parquet");
let schema_file_url = ListingTableUrl::parse(&schema_file_path)?;
println!("table_files_url: {}", table_files_path);
let file_format = ParquetFormat::default()
.with_enable_pruning(Some(true));
let listing_options = ListingOptions::new(Arc::new(file_format))
.with_file_extension(".parquet");
let schema = listing_options.infer_schema(&ctx.state(), &schema_file_url).await?;
let table_config =
ListingTableConfig::new(table_files_url) // the path of the files
.with_listing_options(listing_options)
.with_schema(schema);
let table_provider = Arc::new(ListingTable::try_new(table_config).unwrap());
ctx.register_table(TableReference::from("users"),table_provider).unwrap();
let df = ctx.sql("SELECT count(*) FROM users").await?;
df.show().await?;
Expected behavior
the files that correspond to the glob string will be registered as a table.
In the code above if I just switch between the ListingTableUrl with the globe string to that of the schema the code executes successfully (so no permissions issue)/
Additional context
My Cargo.Toml
aws-config = "0.55.3"
aws-sdk-s3 = "0.28.0"
object_store = { version = "0.6.1", features = ["aws"] }
url = "2.4.0"
async-graphql = "6.0.4"
async-graphql-actix-web = { version = "6.0.4"}
actix-web = { version = "4.3.1" }
datafusion = "28.0.0"
tokio = { version = "1.32.0", features = ["full"] }