-
Notifications
You must be signed in to change notification settings - Fork 26
[DNM] Keyspace pdclient debug #2084
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DNM] Keyspace pdclient debug #2084
Conversation
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @3AceShowHand, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces foundational changes to support keyspace awareness within TiCDC, particularly focusing on how table spans are generated and how raw KV entries are processed. The primary goal appears to be to debug or prepare for a more robust keyspace integration, as indicated by the [DNM]
tag and the extensive logging additions. These changes ensure that key ranges and event decoding correctly account for keyspace prefixes, which is crucial for multi-tenancy or keyspace-enabled TiKV clusters.
Highlights
- Keyspace Integration: Introduced functionality to retrieve keyspace IDs from PD and incorporate them into table span calculations, enabling keyspace-aware operations across various components.
- Enhanced Debugging Logs: Significantly improved logging by adding detailed information, including
startKey
,endKey
, andkey
(hex-encoded usingspanz.HexKey
), to various event handling and request processing logs. This aids in debugging keyspace-related issues. - Key Prefix Handling: Added a
RemoveKeyspacePrefix
utility function and integrated it into the event mounter to correctly decode raw KV entries by stripping keyspace prefixes, ensuring proper data interpretation. - PD Client Extension: Extended the
PDAPIClient
interface and its implementation to include aGetKeyspaceID
method, allowing components to query PD for keyspace identifiers.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces keyspace support across various components and adds extensive debug logging, likely to facilitate troubleshooting. The core logic for keyspace-aware span and range generation seems correct. However, there are several areas for improvement. A recurring issue is the use of context.Background()
for network calls to PD, which should be replaced with proper context propagation to improve robustness and testability. There are also minor inconsistencies in logging and a potential loss of important information in a fatal error log. My review provides specific suggestions to address these points.
log.Fatal("The CommitTs must be greater than the resolvedTs", | ||
zap.String("EventType", "COMMITTED"), | ||
zap.Uint64("CommitTs", entry.CommitTs), | ||
zap.Uint64("resolvedTs", resolvedTs), | ||
zap.Uint64("regionID", regionID)) | ||
zap.Uint64("resolvedTs", resolvedTs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regionID
was removed from this log.Fatal
call. For a fatal error, having as much context as possible is crucial for debugging. The regionID
is available in this scope and should be included in the log message.
log.Fatal("The CommitTs must be greater than the resolvedTs",
zap.String("EventType", "COMMITTED"),
zap.Uint64("CommitTs", entry.CommitTs),
zap.Uint64("resolvedTs", resolvedTs),
zap.Uint64("regionID", regionID))
log.Info("commit entry cached since not initialized", | ||
zap.Uint64("regionID", regionID), | ||
zap.Int64("tableID", span.span.TableID), | ||
zap.String("key", hex.EncodeToString(entry.GetKey()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Info("commit entry is staled", | ||
zap.Uint64("regionID", regionID), | ||
zap.Int64("tableID", span.span.TableID), | ||
zap.String("key", hex.EncodeToString(entry.GetKey())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
func getAllDDLSpan() []heartbeatpb.TableSpan { | ||
pdClient := appctx.GetService[pdutil.PDAPIClient](appctx.PDAPIClient) | ||
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Handle existing table spans | ||
span := common.TableIDToComparableSpan(table.TableID) | ||
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient) | ||
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using context.Background()
for a network call to PD can be problematic. While this is part of the bootstrap process, it's still better to propagate a context to allow for cancellation and timeouts, especially if the PD is slow or unavailable. Consider updating FinishBootstrap
and its callers to accept and pass down a context.Context
.
|
||
span := common.TableIDToComparableSpan(tableID) | ||
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient) | ||
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
covered: atomic.Bool{}, | ||
} | ||
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient) | ||
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor NewTableSpanRangeChecker
now performs a network call to PD using context.Background()
. Constructors should ideally be free of side effects like network requests. It would be better to pass the keyspaceID
as an argument to the constructor. If a network call is unavoidable, it should at least accept a context.Context
to allow for proper cancellation and timeout handling.
r.groupID = replica.DefaultGroupID | ||
span := heartbeatpb.TableSpan{TableID: r.Span.TableID, StartKey: r.Span.StartKey, EndKey: r.Span.EndKey} | ||
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient) | ||
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initGroupID
method, which is called from constructors, now makes a network call to PD using context.Background()
. This is not ideal as it introduces side effects into the object creation process. Consider passing the keyspaceID
as an argument to the constructors (NewSpanReplication
, NewWorkingSpanReplication
) to avoid this network call during initialization.
} | ||
span := common.TableIDToComparableSpan(table.TableID) | ||
pdClient := appcontext.GetService[pdutil.PDAPIClient](appcontext.PDAPIClient) | ||
keyspaceID, err := pdClient.GetKeyspaceID(context.Background(), "SYSTEM") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@3AceShowHand: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
stale PR. |
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note