Skip to content

Conversation

3AceShowHand
Copy link
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Sep 12, 2025
Copy link

ti-chi-bot bot commented Sep 12, 2025

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Copy link

ti-chi-bot bot commented Sep 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 3aceshowhand for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 12, 2025
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @3AceShowHand, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces foundational changes to support keyspace awareness within TiCDC, particularly focusing on how table spans are generated and how raw KV entries are processed. The primary goal appears to be to debug or prepare for a more robust keyspace integration, as indicated by the [DNM] tag and the extensive logging additions. These changes ensure that key ranges and event decoding correctly account for keyspace prefixes, which is crucial for multi-tenancy or keyspace-enabled TiKV clusters.

Highlights

  • Keyspace Integration: Introduced functionality to retrieve keyspace IDs from PD and incorporate them into table span calculations, enabling keyspace-aware operations across various components.
  • Enhanced Debugging Logs: Significantly improved logging by adding detailed information, including startKey, endKey, and key (hex-encoded using spanz.HexKey), to various event handling and request processing logs. This aids in debugging keyspace-related issues.
  • Key Prefix Handling: Added a RemoveKeyspacePrefix utility function and integrated it into the event mounter to correctly decode raw KV entries by stripping keyspace prefixes, ensuring proper data interpretation.
  • PD Client Extension: Extended the PDAPIClient interface and its implementation to include a GetKeyspaceID method, allowing components to query PD for keyspace identifiers.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces keyspace support across various components and adds extensive debug logging, likely to facilitate troubleshooting. The core logic for keyspace-aware span and range generation seems correct. However, there are several areas for improvement. A recurring issue is the use of context.Background() for network calls to PD, which should be replaced with proper context propagation to improve robustness and testability. There are also minor inconsistencies in logging and a potential loss of important information in a fatal error log. My review provides specific suggestions to address these points.

Comment on lines 244 to +247
log.Fatal("The CommitTs must be greater than the resolvedTs",
zap.String("EventType", "COMMITTED"),
zap.Uint64("CommitTs", entry.CommitTs),
zap.Uint64("resolvedTs", resolvedTs),
zap.Uint64("regionID", regionID))
zap.Uint64("resolvedTs", resolvedTs))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The regionID was removed from this log.Fatal call. For a fatal error, having as much context as possible is crucial for debugging. The regionID is available in this scope and should be included in the log message.

                log.Fatal("The CommitTs must be greater than the resolvedTs",
                    zap.String("EventType", "COMMITTED"),
                    zap.Uint64("CommitTs", entry.CommitTs),
                    zap.Uint64("resolvedTs", resolvedTs),
                    zap.Uint64("regionID", regionID))

log.Info("commit entry cached since not initialized",
zap.Uint64("regionID", regionID),
zap.Int64("tableID", span.span.TableID),
zap.String("key", hex.EncodeToString(entry.GetKey())))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's an inconsistent use of key encoding for logging. Here hex.EncodeToString is used, while in other parts of this file (e.g., line 263), spanz.HexKey is used. To maintain consistency, it's better to use spanz.HexKey throughout.

                        zap.String("key", spanz.HexKey(entry.GetKey())))

log.Info("commit entry is staled",
zap.Uint64("regionID", regionID),
zap.Int64("tableID", span.span.TableID),
zap.String("key", hex.EncodeToString(entry.GetKey())),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There's an inconsistent use of key encoding for logging. Here hex.EncodeToString is used, while in other parts of this file (e.g., line 263), spanz.HexKey is used. To maintain consistency, it's better to use spanz.HexKey throughout.

                    zap.String("key", spanz.HexKey(entry.GetKey())),


func getAllDDLSpan() []heartbeatpb.TableSpan {
pdClient := appctx.GetService[pdutil.PDAPIClient](appctx.PDAPIClient)
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using context.Background() for a network call to PD is not ideal. It makes the operation uncancelable and harder to manage. It would be better to pass a context to getAllDDLSpan. The caller, ddlJobFetcher.run, has access to p.ctx, which can be passed down.

// Handle existing table spans
span := common.TableIDToComparableSpan(table.TableID)
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient)
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using context.Background() for a network call to PD can be problematic. While this is part of the bootstrap process, it's still better to propagate a context to allow for cancellation and timeouts, especially if the PD is slow or unavailable. Consider updating FinishBootstrap and its callers to accept and pass down a context.Context.


span := common.TableIDToComparableSpan(tableID)
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient)
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using context.Background() for a network call is discouraged. Since splitTableByRegionCount is used in an HTTP API for testing, the request's context should be passed down and used here to handle cancellations and timeouts gracefully.

covered: atomic.Bool{},
}
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient)
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The constructor NewTableSpanRangeChecker now performs a network call to PD using context.Background(). Constructors should ideally be free of side effects like network requests. It would be better to pass the keyspaceID as an argument to the constructor. If a network call is unavoidable, it should at least accept a context.Context to allow for proper cancellation and timeout handling.

r.groupID = replica.DefaultGroupID
span := heartbeatpb.TableSpan{TableID: r.Span.TableID, StartKey: r.Span.StartKey, EndKey: r.Span.EndKey}
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient)
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The initGroupID method, which is called from constructors, now makes a network call to PD using context.Background(). This is not ideal as it introduces side effects into the object creation process. Consider passing the keyspaceID as an argument to the constructors (NewSpanReplication, NewWorkingSpanReplication) to avoid this network call during initialization.

}
span := common.TableIDToComparableSpan(table.TableID)
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient)
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using context.Background() for a network call to PD is not recommended. The AddNewTable method should be updated to accept a context.Context and pass it down to GetKeyspaceID. This will make the operation more robust by allowing for cancellation and timeouts.

Copy link

ti-chi-bot bot commented Sep 12, 2025

@3AceShowHand: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-error-log-review 02a7ebf link true /test pull-error-log-review
pull-cdc-kafka-integration-light 02a7ebf link true /test pull-cdc-kafka-integration-light
pull-cdc-pulsar-integration-light 02a7ebf link false /test pull-cdc-pulsar-integration-light
pull-cdc-storage-integration-light 02a7ebf link true /test pull-cdc-storage-integration-light
pull-cdc-kafka-integration-heavy 02a7ebf link true /test pull-cdc-kafka-integration-heavy
pull-cdc-mysql-integration-light 02a7ebf link true /test pull-cdc-mysql-integration-light
pull-cdc-storage-integration-heavy 02a7ebf link true /test pull-cdc-storage-integration-heavy
pull-cdc-mysql-integration-heavy 02a7ebf link true /test pull-cdc-mysql-integration-heavy

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@asddongmen asddongmen marked this pull request as draft September 18, 2025 01:58
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 18, 2025
@asddongmen
Copy link
Collaborator

stale PR.

@asddongmen asddongmen closed this Oct 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants