CI checks on benchmark histograms and associated benchmark fixups #268

Merged
merged 78 commits into from
Apr 4, 2023

bpkroth
Collaborator

@bpkroth bpkroth commented Mar 23, 2023

As previously noted in #122, the Wikipedia benchmark has a bug that causes most transactions to be aborted.

The fix developed in #173 by @apavlo seems to mostly fix it (though there are still some duplicate key errors generated by the benchmark's AddWatchPage procedure).

This PR pulls in those changes plus some additional ones (below), and adds CI checks on the JSON histogram outputs so that any benchmark that generates errors for more than 1% of its completed transactions is considered a failure. Previously a CI failure was reported only if java exited non-zero, which doesn't happen when many transactions fail, only when an unexpected exception occurs.

The 1% value is just an initial threshold and can be adjusted on both a per-database and a per-benchmark basis.
We adjust it in several places (e.g. the TATP benchmark) to keep the changes required for this PR somewhat smaller, and leave improving those benchmarks for future work.
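
For illustration, the threshold check described above could be sketched roughly as follows. This is not the CI script from this PR: the histogram keys (`completed`, `errors`), the function names, and the override table are all hypothetical, and the real JSON output may be laid out differently.

```python
# Minimal sketch of an error-rate check over parsed histogram output.
# ASSUMPTION: histograms look like {"completed": {txn: count}, "errors": {txn: count}};
# these key names are hypothetical and not taken from the benchmark's real JSON.

DEFAULT_THRESHOLD = 0.01  # the initial 1% value mentioned above

# Hypothetical per-(database, benchmark) overrides; values are made up.
THRESHOLDS = {("sqlite", "tatp"): 0.05}


def threshold_for(database: str, benchmark: str, default: float = DEFAULT_THRESHOLD) -> float:
    """Look up the allowed error ratio for one database/benchmark pair."""
    return THRESHOLDS.get((database, benchmark), default)


def error_ratio(histograms: dict) -> float:
    """Return total errors divided by total completed transactions."""
    completed = sum(histograms.get("completed", {}).values())
    errors = sum(histograms.get("errors", {}).values())
    if completed == 0:
        # Nothing completed: report an infinite ratio so the check fails.
        return float("inf")
    return errors / completed


def check_histograms(histograms: dict, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """True if the run passes, i.e. errors do not exceed the threshold."""
    return error_ratio(histograms) <= threshold
```

A CI step would load each JSON output with `json.load`, call `check_histograms(data, threshold_for(db, bench))`, and exit non-zero on `False`, which covers the case where java itself exits cleanly.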

To address the Wikipedia failures in some databases, we extend auto-increment column support to SqlServer and add an auto-increment DDL schema for hsqldb for unit test purposes.

For others, we introduce improved schemas and/or dialects to fix query errors.

Note: in the case of sqlite, which currently lacks support for things like native SLEEP or TABLOCK, we leave a simplified version of the query for now and mark it for future improvement.

Collaborator Author

@bpkroth bpkroth Mar 29, 2023

Adds a schema file for hsqldb for unit tests so that we:

  1. Get an auto increment column
  2. Use varchar instead of varbinary for columns to avoid invalid character cast exceptions

Note: this doesn't change the ddl-generic.sql file, in part because auto increment column syntax is not well standardized (or at least not well adhered to).
Up for discussion whether these changes should go there instead, since basically any new db system will currently fail to operate without them.
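
To make the standardization point concrete, here is a rough sketch of how the same auto-increment key column is commonly spelled in a few systems. The table and column names (`demo_page`, `page_id`, `page_title`) are hypothetical and are not the schema from this PR; only the sqlite variant is actually executed here.

```python
import sqlite3

# Common spellings of an auto-increment primary key per dialect.
# ASSUMPTION: illustrative only; the real DDL files in the repo may differ.
ID_COLUMN = {
    "mysql": "page_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY",
    "sqlserver": "page_id INT IDENTITY(1,1) PRIMARY KEY",
    "postgres": "page_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY",
    "hsqldb": "page_id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY",
    "sqlite": "page_id INTEGER PRIMARY KEY AUTOINCREMENT",
}


def create_table_sql(dialect: str) -> str:
    """Render CREATE TABLE for the hypothetical demo_page table."""
    # varchar rather than varbinary for the text column, as in the list above
    return (
        f"CREATE TABLE demo_page ({ID_COLUMN[dialect]}, "
        "page_title VARCHAR(255) NOT NULL)"
    )


def demo_sqlite_ids() -> list:
    """Insert two rows without ids into in-memory sqlite; return generated ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(create_table_sql("sqlite"))
        conn.executemany(
            "INSERT INTO demo_page (page_title) VALUES (?)",
            [("Main_Page",), ("Sandbox",)],
        )
        rows = conn.execute("SELECT page_id FROM demo_page ORDER BY page_id")
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Since every dialect needs its own clause, a single generic DDL file cannot express the column portably, which is the trade-off raised above.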

@bpkroth bpkroth changed the title from "Fixups for Wikipedia and CI checks on benchmark histograms" to "CI checks on benchmark histograms and associated benchmark fixups" Apr 2, 2023
@bpkroth bpkroth removed the in-progress label Apr 2, 2023
@bpkroth
Collaborator Author

bpkroth commented Apr 2, 2023

@anjagruenheid @jcamachor if either of you have a moment, could you please give this a once over? Thanks!

@bpkroth bpkroth requested a review from timveil April 2, 2023 23:39
@bpkroth bpkroth requested a review from mbutrovich April 2, 2023 23:49
@bpkroth bpkroth merged commit 82c990d into cmu-db:main Apr 4, 2023
@bpkroth bpkroth deleted the wikipedia-may2022 branch December 11, 2023 16:34
Labels
bug (Something isn't working), testing (Testing Infrastructure)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wikipedia Title Generation is Broken
aborted transactions in Wikipedia
2 participants