-
Notifications
You must be signed in to change notification settings - Fork 26
Make integration tests compatible with the next-gen TiDB cluster #2337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary of ChangesHello @tenfyzhong, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the integration testing capabilities by introducing full support for the next-generation TiDB cluster architecture. This allows TiCDC to be validated against the new components and object storage mechanisms, ensuring compatibility and functionality in modern deployments. The changes primarily involve refactoring existing test utilities and adding new scripts to manage the lifecycle of next-gen TiDB clusters within the testing environment. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
/test all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant infrastructure to enable integration testing for the next-gen TiDB architecture, which is a valuable addition. The refactoring of the cluster startup scripts and the introduction of new health checks are well-structured. However, the primary new script, start_tidb_cluster_nextgen
, contains several critical bugs related to shell syntax, incorrect command arguments, and faulty variable handling that will prevent the test environment from starting correctly. I have provided detailed comments and suggestions to address these critical issues. Additionally, there are opportunities to improve code quality by refactoring duplicated code in the new health check scripts and correcting syntax in environment variable setup. Addressing these points will ensure the new testing infrastructure is robust and maintainable.
…n tests - Introduced `check_pd_health` and `check_tidb_health` scripts for robust component readiness checks. - Refactored `start_tidb_cluster` to dynamically select between classic and next-gen cluster startup based on the `NEXT_GEN` environment variable. - Moved existing cluster startup logic to `start_tidb_cluster_classic`. - Added `start_tidb_cluster_nextgen` to provision a next-gen TiDB cluster, which includes: - Starting MinIO as an S3-compatible object storage for DFS. - Configuring and launching multiple upstream PD instances. - Deploying dedicated TiDB instances for the `SYSTEM` keyspace and user keyspaces. - Configuring TiKV with DFS settings. - Updated `test_prepare` to export all cluster-related environment variables and include new variables for MinIO and next-gen TiDB components. Signed-off-by: tenfyzhong <[email protected]>
- Add `tidb-server-nextgen`, `tikv-server-nextgen`, `pd-server-nextgen` to the process kill list to ensure all TiDB cluster components are stopped during test cleanup. - Add `minio` to the process kill list to stop the MinIO service if running. - Kill processes on `UP_TIDB_SYSTEM_PORT` and `UP_TIDB_SYSTEM_STATUS` to clean up any lingering port bindings. Signed-off-by: tenfyzhong <[email protected]>
- Source `_utils/test_prepare` script to handle common test setup. - Set `CDC_BINARY` environment variable to `cdc.test` to ensure tests use the correct binary. Signed-off-by: tenfyzhong <[email protected]>
- Standardize indentation from spaces to tabs for consistency. Signed-off-by: tenfyzhong <[email protected]>
- Ensure `UP_TIDB_SYSTEM_*` variables default to their specific values, not general `UP_TIDB_*` values. - Update `MINIO_*` environment variable defaults to use standard `:-` bash syntax. - Add `KEYSPACE_NAME` environment variable with a default value. Signed-off-by: tenfyzhong <[email protected]>
… improve configuration - Migrate from `WORK_DIR` to `OUT_DIR` for all output paths. - Remove `--keyspace-name` CLI option; it's now expected as an environment variable. - Introduce new CLI options for component configurations (`--tidb-config`, `--pd-config`, `--tikv-config`) and retry times (`--retry`). - Remove PID tracking by deleting the `env` file generation. - Add explicit `mkdir -p` calls for all component data and log directories. - Introduce a second upstream keyspace TiDB instance for testing. - Adjust TiDB configuration files to include `max-server-connections` under `[instance]`. - Change upstream keyspace TiDB `--path` to connect directly to PD. - Ensure system TiDB health before starting keyspace TiDB. Signed-off-by: tenfyzhong <[email protected]>
- Provides a utility script to conditionally add `-k KEYSPACE_NAME` to `cdc cli changefeed` commands. - This allows integration tests to adapt CLI arguments based on the `NEXT_GEN` environment variable. Signed-off-by: tenfyzhong <[email protected]>
- Replaced direct `cdc cli changefeed create` call with `cdc_cli_changefeed` helper function. - Made the `run.sh` script executable. Signed-off-by: tenfyzhong <[email protected]>
- Changed indentation from 4 spaces to tabs for consistency within the script. Signed-off-by: tenfyzhong <[email protected]>
- Replace direct `cdc cli changefeed` calls with the `cdc_cli_changefeed` helper function. - Standardize command invocation across integration tests for improved consistency and maintainability. - Update file permissions to ensure test scripts are executable. Signed-off-by: tenfyzhong <[email protected]>
6c5b5b2
to
059b140
Compare
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces significant infrastructure changes to support integration testing for the next-generation TiDB cluster architecture. The refactoring of cluster startup/teardown scripts and the addition of new health checks are well-structured. However, I've identified a critical bug in the start_tidb_cluster_nextgen
script where an incorrect configuration file is used, which would likely break the test setup. Additionally, there's a high-severity argument handling bug in the check_tidb_health
script. I've also pointed out several instances of code duplication that could be refactored to improve maintainability and a minor code cleanup suggestion. Overall, the changes are a great step forward, but these issues should be addressed.
*) SINK_URI="mysql://normal:[email protected]:3306/" ;; | ||
esac | ||
cdc cli changefeed create --sink-uri="$SINK_URI" | ||
# cdc_cli_changefeed create --sink-uri="$SINK_URI" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/test all |
- The `check_tidb_health` script no longer accepts or processes a secondary TiDB port. - Calls to `check_tidb_health` were updated to reflect the simplified signature, now performing separate checks for multiple ports if needed. - Fixed a bug where `tidb-server-nextgen` for the secondary upstream TiDB instance was incorrectly using the primary instance's config file. - Removed a redundant commented-out line in `autorandom/run.sh`. Signed-off-by: tenfyzhong <[email protected]>
… tests - This Dockerfile enables running next-gen integration tests in a consistent Docker environment. - It utilizes a multi-stage build to: - Download Go 1.23.0 and necessary third-party binaries. - Build the `cdc` binary for integration testing. - The base image is `rockylinux:8`, requiring a switch from `yum` to `dnf` for package management. - Package installations were adjusted to: - Remove `musl-dev` as it's not available on Rocky 8. - Use the default `mysql` client from Rocky 8 repositories instead of `mysql 5.7`. Signed-off-by: tenfyzhong <[email protected]>
- Upgrades the base image for the local integration test Dockerfile from Rocky Linux 8 to 9.3. - Updates package installations for Rocky Linux 9: - Adds 'dnf update -y' for package refresh. - Includes 'nmap-ncat' as a new dependency. - Removes 'curl' as it's not directly used in the main stage's installations. - Adapts to the default MySQL client available on Rocky Linux 9. - Comments out several build steps related to downloading and building integration test binaries. Signed-off-by: tenfyzhong <[email protected]>
- The MinIO console is not required for integration tests. - Simplifies the MinIO startup command. - Avoids binding an unnecessary port. Signed-off-by: tenfyzhong <[email protected]>
…tion tests - The new function name is more descriptive, specifically for `cdc cli changefeed` commands. - Improves consistency and readability across test scripts. Signed-off-by: tenfyzhong <[email protected]>
- Replaced direct `cdc cli changefeed` calls with `run_cdc_cli changefeed`. - This ensures consistency in how `cdc cli` commands are invoked within integration tests. - Leverages a common utility function for CLI invocation. Signed-off-by: tenfyzhong <[email protected]>
- The keyspace used in API v2 integration tests can now be configured via the `KEYSPACE_NAME` environment variable. - This allows running tests against different keyspaces, useful for isolated testing environments. - If `KEYSPACE_NAME` is not set, it defaults to "default" to maintain existing behavior. Signed-off-by: tenfyzhong <[email protected]>
…ipts - Removed `set -x` and `export PS4` from `cdc_cli_changefeed` and `start_tidb_cluster_nextgen`. - This reduces log verbosity during integration test runs, making output cleaner. - These debugging flags are not needed for normal test execution. Signed-off-by: tenfyzhong <[email protected]>
- Remove `2>&1` from the `cdc_cli_changefeed create` command. - Ensure that only standard output is processed when extracting the changefeed ID. - This prevents potential parsing issues if stderr contains unexpected messages. Signed-off-by: tenfyzhong <[email protected]>
- Disabled `set -x` to reduce verbose output in integration test logs. - `set -x` prints every command executed, which can make logs noisy and harder to parse during normal test runs. - The corresponding `set +x` was removed as it is no longer needed. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-light |
- Replaced `cdc_cli_changefeed query` with `cdc.test cli query` in integration tests. - This change aligns the test utility with the current command structure for interacting with changefeeds. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-light |
This reverts commit 75ab86b.
- Replaced `run_cdc_cli changefeed` with `cdc.test cli changefeed` for consistency. - This change aligns with the newer testing utility for interacting with the CDC CLI. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-light |
1 similar comment
/test pull-cdc-mysql-integration-light |
- The `cdc.test` command is being used in the integration tests for changefeed operations. - This command is an alias or a wrapper that is not necessary for the actual execution of the `cdc cli changefeed` command. - Removing the `.test` suffix simplifies the command and ensures that the actual `cdc cli changefeed` command is being invoked. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-light |
- When the kernel type is classic, the default keyspace value will be returned. - This change ensures that the default keyspace is correctly handled for the classic kernel type in the API. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-light |
- The kill command in the `stop_minio` function was not handling cases where the process might have already exited. - Adding `|| true` ensures that the script continues to run even if the `kill` command fails (e.g., if the PID is no longer valid). - This prevents potential test failures due to non-zero exit codes from the `kill` command. Signed-off-by: tenfyzhong <[email protected]>
- Introduce the `NEXT_GEN` environment variable to control the next-generation features in integration tests. - Set a default value of `0` if `NEXT_GEN` is not explicitly provided. Signed-off-by: tenfyzhong <[email protected]>
/test all |
- Set `KEYSPACE_NAME` to 'keyspace1' if `NEXT_GEN` is 1. - Otherwise, set `KEYSPACE_NAME` to 'default'. Signed-off-by: tenfyzhong <[email protected]>
/test all |
- Previously, errors during storage closure were logged unconditionally. - This change ensures that errors are only logged if an actual error occurs during the `storage.Close()` operation. - This reduces unnecessary log noise when storage closure is successful. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-heavy |
- Removed `export PS4='+$(basename ${BASH_SOURCE}):${LINENO}:'` - Removed `set -x` These lines were added for debugging purposes and are no longer needed. The `set -x` command makes the script overly verbose, and the custom `PS4` setting is also not required for the test's functionality. Signed-off-by: tenfyzhong <[email protected]>
- Removed `set -x` and `export PS4` from multiple test scripts. - These flags were likely left in from previous debugging sessions and are not needed for normal test execution. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-heavy |
- Adjust assertion for the number of running capture servers after owner resignation. - The test now correctly expects only one capture server to remain running, as the second capture server was already running and did not exit. Signed-off-by: tenfyzhong <[email protected]>
/test all |
- The test `test_owner_retryable_error` was checking if there was only one capture server running. - This check is redundant because the test's setup ensures only one capture server is running. - Removing this check simplifies the test and avoids potential false negatives if the test environment changes unexpectedly. Signed-off-by: tenfyzhong <[email protected]>
/test all |
- Added debug output to `owner.sh` to help diagnose potential issues with owner key retrieval. - The `result` variable now captures the output of `cdc cli capture list`, and this output is printed before parsing for the owner ID. This will help in understanding what the `cdc cli capture list` command is returning in case of failures. Signed-off-by: tenfyzhong <[email protected]>
/test pull-cdc-mysql-integration-heavy |
- Capture the full output of `cdc cli capture list` before parsing. - This change ensures that the correct owner PID is retrieved even if there are multiple capture clients listed. - The previous implementation assumed the first capture client was always the owner, which is not always true in all test scenarios. Signed-off-by: tenfyzhong <[email protected]>
@tenfyzhong: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/test pull-cdc-mysql-integration-heavy |
What problem does this PR solve?
This PR introduces the necessary infrastructure to run integration tests against the TiDB "nextgen" architecture. Previously, our integration tests only supported the classic TiDB cluster setup. With the introduction of nextgen components and object storage (MinIO), a dedicated setup is required to validate TiCDC's compatibility and functionality with this new architecture.
Issue Number: close #2336 close #2561 close #1884
What is changed and how it works?
This PR refactors the TiDB cluster startup and teardown scripts to support both classic and nextgen TiDB architectures for integration testing.
New Health Check Utilities:
tests/integration_tests/_utils/check_pd_health
to robustly check the health of a PD server using its/pd/api/v1/health
endpoint.tests/integration_tests/_utils/check_tidb_health
to verify TiDB server readiness by executing a simpleSELECT
query via themysql
client.Refactored TiDB Cluster Startup:
tests/integration_tests/_utils/start_tidb_cluster
script is now a dispatcher, conditionally invoking eitherstart_tidb_cluster_classic
orstart_tidb_cluster_nextgen
based on theNEXT_GEN
environment variable.tests/integration_tests/_utils/start_tidb_cluster_classic
.tests/integration_tests/_utils/start_tidb_cluster_nextgen
has been added to provision a TiDB cluster with nextgen components. This includes:pd-server-nextgen
instances (supporting multiple upstream PDs).tikv-server-nextgen
instances, configured to use MinIO for storage.tidb-server-nextgen
instances: one for theSYSTEM
keyspace and another for a user-definedKEYSPACE_NAME
.pd-server
,tikv-server
, andtidb-server
for the downstream cluster.check_pd_health
andcheck_tidb_health
scripts to ensure component readiness.Updated TiDB Cluster Teardown:
tests/integration_tests/_utils/stop_tidb_cluster
has been updated to correctly terminatetidb-server-nextgen
,tikv-server-nextgen
,pd-server-nextgen
, andminio
processes, along with their respective ports.Enhanced Test Environment Preparation:
tests/integration_tests/_utils/test_prepare
now includes additional environment variables for:UP_TIDB_SYSTEM_PORT
,UP_TIDB_SYSTEM_STATUS
).MINIO_API_PORT
,MINIO_CONSOLE_PORT
,MINIO_MC_ALIAS
,MINIO_ROOT_USER
,MINIO_ROOT_PASSWORD
).Integration Test Runner Modifications:
tests/integration_tests/run.sh
now sources_utils/test_prepare
and setsexport TICDC_NEWARCH=true
andexport CDC_BINARY=cdc.test
to enable nextgen testing by default for relevant tests.Check List
Tests
Questions
Will it cause performance regression or break compatibility?
No, this PR primarily introduces new test infrastructure for a different TiDB architecture. It does not affect the performance or compatibility of TiCDC itself.
Do you need to update user documentation, design documentation or monitoring documentation?
No, these changes are internal to the integration test suite and do not require updates to user, design, or monitoring documentation.
Release note