-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add clickbench parquet based queries to sql_planner benchmark #13103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add clickbench parquet based queries to sql_planner benchmark #13103
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THank you @Omega359 -- I think this is a great addition. I have two suggestions I think are worth considering but I don't think they are required ❤️
BTW here is a preview flamegraph (lots of cool things there)
|
||
let clickbench_ctx = register_clickbench_hits_table(); | ||
|
||
for (i, sql) in clickbench_queries.iter().enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the physical planing benchmark also includes the logical planning and most usecases include both logican and physical planning, I think the logical planning only benchmarks are largely redundant
however, I realize this PR just follows the existing pattern. Maybe we should remove all the "logical planning" benchmarks 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can comment them out and if anyone wants to see just those they can edit the code 💭
use std::sync::Arc; | ||
use test_utils::tpcds::tpcds_schemas; | ||
use test_utils::tpch::tpch_schemas; | ||
use test_utils::TableDef; | ||
use tokio::runtime::Runtime; | ||
|
||
const CLICKBENCH_DATA_PATH: &str = "../../benchmarks/data/hits_partitioned/"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this assumes the script is run from datafusion/core
(what cargo does)
However, that meant when I tried to run the benchmark binary directly it failed like this:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ target/release/deps/sql_planner-d64e21551189f776 --bench physical_plan_clickbench_q1
Gnuplot not found, using plotters backend
thread 'main' panicked at datafusion/core/benches/sql_planner.rs:121:38:
benchmarks/data/hits_partitioned/ could not be loaded. Please run 'benchmarks/bench.sh data clickbench_partitioned' prior to running this benchmark: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Any chance you could make the script test both locations
../../benchmarks/data/hits_partitioned
benchmarks/data/hits_partitioned
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented out most logical_plan tests & updated code to allow for running from either cargo or via target/release/deps/sql_planner-xyz
…ning from either cargo or via target/release/deps/sql_planner-xyz
…ith_clickbench # Conflicts: # datafusion/core/benches/sql_planner.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359 -- this looks good to me
I tested it locally and it worked great
target/release/deps/sql_planner-d64e21551189f776 --bench physical_plan_clickbench_q37
@@ -235,9 +274,15 @@ fn criterion_benchmark(c: &mut Criterion) { | |||
"q16", "q17", "q18", "q19", "q20", "q21", "q22", | |||
]; | |||
|
|||
let benchmarks_path = if PathBuf::from(BENCHMARKS_PATH_1).exists() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
} | ||
}) | ||
}); | ||
// c.bench_function("logical_plan_tpch_all", |b| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could just delete it entirely?
|
||
let queries_file = | ||
File::open(format!("{benchmarks_path}queries/clickbench/queries.sql")).unwrap(); | ||
let extended_file = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused at first what click bench Q48 was (as there are only 42 queries) -- but this now makes sense.
physical_plan_clickbench_q48
time: [2.6437 ms 2.6674 ms 2.6943 ms]
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe
It would probably be less confusing if this was called physical_plan_clickbench_extended_q5
or whatever to align with the naming of suites
I think this is a nice improvement as is -- maybe we can keep improving things as follow on PRs thanks again @Omega359 |
Which issue does this PR close?
Closes #13098
This change uses the
benchmarks/data/hits_partitioned/
data which must be downloaded via the bench.sh script prior to the sql_planner benchmark running. If this is not reasonable we can either switch to the itty bitty clickbench_hits_10.parquet file or create some other clickbench files (perhaps in the testing folder)Rationale for this change
Extending the sql planning benchmark with addition tests that covers planning against an actual table (not memory based) and including sorting.
What changes are included in this PR?
sql planner benchmark only.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.