[FEA] Create NDS-H benchmark for performance analysis

I would like to add another benchmark to the repository to support additional workloads for comparison. Several partners use the TPC-H benchmark for comparison, so we can enable execution of a benchmark with a TPC-H-like workload. The requirements are similar to those we have for NDS:

Data generation
- [x] P0: Support generation of raw data at various scale factors
- [x] P0: Support conversion of raw data to Parquet
- [ ] P1: Support conversion of raw data to ORC
- [ ] P1: Support conversion of raw data to CSV
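
To make the CSV conversion item concrete, here is a minimal sketch using only the Python standard library. It assumes the raw data is pipe-delimited text with a trailing delimiter on every row, which is how the TPC-H `dbgen` tool writes its `.tbl` files; the function name `tbl_to_csv` and the sample rows are hypothetical and not part of any existing script. At larger scale factors the real conversion would presumably go through Spark, as for NDS.

```python
import csv
import io


def tbl_to_csv(tbl_lines, out, delimiter="|"):
    """Convert pipe-delimited raw rows to CSV and return the row count.

    Hypothetical helper: assumes each raw row ends with one trailing
    delimiter, which is dropped before splitting.
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in tbl_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.endswith(delimiter):
            line = line[:-1]  # drop the trailing delimiter
        writer.writerow(line.split(delimiter))
        count += 1
    return count


# Illustrative rows only, not real generated data.
raw = [
    "1|Customer#000000001|BUILDING|\n",
    "2|Customer#000000002|AUTOMOBILE|\n",
]
buf = io.StringIO()
rows_written = tbl_to_csv(raw, buf)
csv_text = buf.getvalue()
```

The sketch streams line by line, so it does not need to hold a whole table in memory.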

Query generation
- [x] P0: Support generation of queries at various scale factors

Power run execution
- [x] P0: Support execution of full query set given a specified input path
- [x] P1: Support execution of individual query given a specific query and input path
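
The two power run items could share one entry point. Below is a rough sketch under stated assumptions: `power_run`, its `only` parameter, and the `execute` callable are hypothetical names for illustration, and `execute` stands in for whatever actually submits the SQL (for example a Spark session). It runs the selected queries sequentially and records wall-clock time per query.

```python
import time


def power_run(queries, execute, only=None):
    """Run queries one after another; return {name: elapsed_seconds}.

    queries: dict mapping query name to SQL text, in execution order.
    execute: callable that runs one SQL string (hypothetical stand-in).
    only:    optional query name; if set, run just that query.
    """
    if only is not None:
        if only not in queries:
            raise KeyError(f"unknown query: {only}")
        selected = {only: queries[only]}
    else:
        selected = queries
    timings = {}
    for name, sql in selected.items():
        start = time.perf_counter()
        execute(sql)
        timings[name] = time.perf_counter() - start
    return timings


# Usage with a fake executor that only records what it was asked to run.
executed = []
queries = {"query1": "SELECT 1", "query2": "SELECT 2"}
full = power_run(queries, executed.append)
single = power_run(queries, executed.append, only="query2")
```

Passing the executor in as a callable keeps the timing loop independent of the engine, so the same loop can be exercised without a cluster.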

We can add further requirements once the initial NDS-H scripts are set up, so that they more closely match how we execute NDS.

Relevant links of other repos that execute TPC-H workloads:
- https://github.com/sql-benchmarks/tpctools
- https://github.com/sql-benchmarks/sqlbench-h
- https://github.com/sql-benchmarks/sqlbench-runners/tree/main/spark

Disclaimers for TPC-H:
- TPC-H is Copyright © 1993-2024 Transaction Processing Performance Council. The full TPC-H specification in PDF format can be found [here](https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf).
- TPC, TPC Benchmark, and TPC-H are trademarks of the Transaction Processing Performance Council.

[FEA] Create NDS-H benchmark for performance analysis #182