Prefer primary replica for the token for LWT queries #24
Cassandra issue: https://issues.apache.org/jira/browse/CASSANDRA-15746
Support for this optimization is already implemented in Java Driver 3.x, via these commits (no PR was made; it was committed directly by @haaawk): 47552b7. Support for this feature is missing in Java Driver 4.x. It seems that the work can be split into a few smaller subtasks (parsing …
To avoid contention, we want to try the primary replica first, then the secondary, and so on. More information: #24
@avelanarius It seems that PR #125 has resolved this issue, so can it be closed now?
The way the Paxos protocol works is that if two queries attempt to update the same key from different coordinators, they start two independent Paxos rounds. Each round is assigned a timestamp, and the coordinator with the highest timestamp wins.
The issue is that the loser queues up behind the winner (using a semaphore) only if both rounds are coordinated by the same node. If the rounds are started at different nodes, the only option for the loser is to sleep for an increasing, randomized interval and retry (this is what our implementation does).
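For reference, a minimal sketch of the "sleep an increasing, randomized interval and retry" behavior described above; the class name, helper name, and constants are arbitrary illustrations, not the driver's actual retry code:

```java
import java.util.concurrent.ThreadLocalRandom;

final class ContentionBackoff {
    /**
     * Sleep before the (attempt + 1)-th retry: exponential backoff, capped,
     * with full jitter so competing coordinators spread out in time.
     */
    static void sleepBeforeRetry(int attempt) throws InterruptedException {
        long capMillis = 1_000;
        long backoffMillis = Math.min(capMillis, 10L << Math.min(attempt, 16));
        Thread.sleep(ThreadLocalRandom.current().nextLong(backoffMillis + 1));
    }
}
```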
If the key is contended and the driver is neither shard nor token aware, this leads to many retries before an update succeeds.
If the key is contended and the driver is token and shard aware, it will send the query to one of the replicas for the partition, but will round-robin between replicas.
This still means at least 50% of queries will lose and must retry before they can commit. Retries against a contended key in a multi-DC setup lead to growing delays and timeouts. If it takes 100 milliseconds to complete a round due to network topology, and the key is contended, we can get as little as 1 query per second for such a key; if all queries queue up on the same coordinator, rounds run back to back (one per 100 ms), and we can get up to 10 QPS per key.
This is why, for LWT queries, the driver should choose replicas in a predefined order, so that in case of contention they queue up at the same replica rather than compete: choose the primary replica first; then, if the primary is known to be down, the first secondary; then the second secondary; and so on.
This will reduce contention over hot keys and thus increase LWT performance.
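A minimal sketch of such a deterministic, primary-first query plan; the Node interface, isUp(), and datacenter() are illustrative assumptions, not the driver's actual API:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Illustrative node abstraction; not the driver's actual API.
interface Node {
    boolean isUp();
    String datacenter();
}

final class PrimaryFirstPlan {
    /**
     * replicasInRingOrder must be the replicas for the partition's token in
     * ring (token) order, so that every client derives the same sequence for
     * the same key: primary first, then first secondary, and so on.
     */
    static Queue<Node> queryPlan(List<Node> replicasInRingOrder) {
        Queue<Node> plan = new ArrayDeque<>();
        for (Node node : replicasInRingOrder) {
            if (node.isUp()) {
                // The primary goes first; if it is down, the first live
                // secondary becomes the node every client queues up at.
                plan.add(node);
            }
        }
        return plan;
    }
}
```

Because every client iterates the same ring order and applies the same liveness filter, all clients converge on the same replica for a given key, which is exactly what funnels contending rounds onto one coordinator.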
If LOCAL_SERIAL serial consistency is used, we should prefer the primary in the local DC, because only the local DC's endpoints will participate in the Paxos round. If SERIAL consistency is specified in a multi-DC setup, we should use any DC's primary, but consistently use the same primary for all queries on all clients. The key to avoiding contention is that all clients consistently choose the same replica for the same key, as long as it is available/alive.
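Continuing the sketch above, a hedged variant for LOCAL_SERIAL that restricts the ring-ordered replica list to the local DC before picking the "primary"; it reuses the illustrative Node interface, and localDc is likewise an assumption:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

final class LocalSerialPlan {
    // Reuses the illustrative Node interface from the sketch above.
    static Queue<Node> queryPlan(List<Node> replicasInRingOrder, String localDc) {
        Queue<Node> plan = new ArrayDeque<>();
        for (Node node : replicasInRingOrder) {
            // The first live local-DC replica in ring order acts as the
            // "local primary"; all clients compute the same one.
            if (node.isUp() && localDc.equals(node.datacenter())) {
                plan.add(node);
            }
        }
        return plan;
    }
}
```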
See also scylladb/gocql#40