Prefer primary replica for the token for LWT queries #24

Closed
kostja opened this issue Apr 20, 2020 · 3 comments
Comments

kostja commented Apr 20, 2020

The way the Paxos protocol works, if two queries attempt to update the same key through different coordinators, they start two independent Paxos rounds. Each round is assigned a timestamp, and the coordinator with the highest timestamp wins.
The issue is that the loser queues up behind the winner (using a semaphore) only if both rounds are coordinated by the same node. If the rounds are started on different nodes, the only option is to sleep for an increasingly long random interval and retry (this is what our implementation does).
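
Purely as an illustration of that fallback (this is not Scylla's actual code, and the names are made up), the "sleep an increasingly long random interval and retry" pattern looks roughly like this:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch only: back off for a random interval whose upper bound
// grows with each failed attempt, before retrying a contended Paxos round.
final class ContentionBackoff {
    static void sleepBeforeRetry(int attempt) throws InterruptedException {
        long capMillis = Math.min(1000L, 10L << Math.min(attempt, 6)); // widen the window, cap at 1 s
        Thread.sleep(ThreadLocalRandom.current().nextLong(1, capMillis + 1));
    }
}
```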

If the key is contended and the driver is neither shard nor token aware, this leads to a lot of retries before an update succeeds.
If the key is contended and the driver is token and shard aware, it will send the query to one of the replicas for the partition, but will round-robin between them.

This still means that at least 50% of the queries will lose and have to retry before they can commit. Retries against a contended key in a multi-DC setup lead to ever-growing delays and timeouts. If a round takes 100 milliseconds due to network topology and the key is contended, we get as little as 1 query per second for such a key; if all queries queue up on the same coordinator, we can get up to 10 QPS per key.

This is why, for LWT queries, the driver should choose replicas in a pre-defined order, so that in case of contention they queue up at the same replica rather than compete: choose the primary replica first, then, if the primary is known to be down, the first secondary, then the second secondary, and so on.

This will reduce contention over hot keys and thus increase LWT performance.
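
A minimal sketch of that ordering, assuming the driver can obtain the token's replicas in ring order from its token metadata (the class and method names below are illustrative, not the driver's actual API):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Illustrative sketch only: order replicas deterministically (primary first,
// then secondaries in ring order) and pick the first live one, so that every
// client sends the LWT for a given key to the same coordinator.
final class PrimaryFirstReplicaSelector {
    static <Node> Optional<Node> pick(List<Node> replicasInRingOrder, Predicate<Node> isUp) {
        // replicasInRingOrder is assumed to start with the primary replica
        // for the partition's token, followed by the secondaries.
        return replicasInRingOrder.stream().filter(isUp).findFirst();
    }
}
```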

If LOCAL_SERIAL serial consistency is used, we should prefer the primary in the local DC, because only the local DC's endpoints will participate in the Paxos round. If SERIAL consistency is specified in a multi-DC setup, we should use any DC's primary, but consistently use the same primary for all queries on all clients. The key to avoiding contention is that all clients consistently choose the same replica for the same key, as long as it is available/alive.
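
For example, with the Java Driver 3.x API (the table and values here are made up), the serial consistency set on the statement determines which endpoints take part in the Paxos round:

```java
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

class LwtSerialConsistencyExample {
    static void update(Session session) {
        SimpleStatement stmt = new SimpleStatement(
                "UPDATE accounts SET balance = ? WHERE id = ? IF balance = ?", 90, 1, 100);
        // LOCAL_SERIAL: the Paxos round involves only the local DC, so the
        // driver should prefer the primary replica in the local DC.
        stmt.setSerialConsistencyLevel(ConsistencyLevel.LOCAL_SERIAL);
        // With SERIAL instead, the round spans all DCs, and all clients should
        // agree on the same primary (in any DC) for the key.
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        session.execute(stmt);
    }
}
```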

See also scylladb/gocql#40

kostja commented May 14, 2020

avelanarius commented:
Support for this optimization is already implemented in Java Driver 3.x, via the following commits (no PR was opened; the changes were committed directly by @haaawk):

47552b7
d2cfc30
f724fdc
674f33f
9aebd64
f91f3b1
cd5faa9
5be42b6
f4aeff6

Support for this feature is missing in Java Driver 4.x. It seems that the work can be split into a few smaller subtasks: parsing LwtInfo, parsing the prepared-statement response to determine whether a query is an LWT, and picking the correct replica when executing a query.
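
A hypothetical sketch of the second subtask (the flag mask and names below are made up for illustration; the real driver types and Scylla protocol extension differ):

```java
// Hypothetical sketch: record at prepare time whether a statement is an LWT,
// based on a flag Scylla is assumed to return in the PREPARED response metadata.
final class PreparedLwtMark {
    // Assumed bit mask; the actual value is defined by the protocol extension.
    private static final int LWT_FLAG_MASK = 0x1;

    static boolean isLwt(int metadataFlags) {
        return (metadataFlags & LWT_FLAG_MASK) != 0;
    }
}
```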

avelanarius assigned Lorak-mmk and unassigned haaawk on Apr 4, 2022
Lorak-mmk added a commit to Lorak-mmk/java-driver that referenced this issue Apr 28, 2022
To avoid contention, we want to try primary replica first, then secondary and so on.

More information: scylladb#24
Lorak-mmk added a commit to Lorak-mmk/java-driver that referenced this issue May 17, 2022
To avoid contention, we want to try primary replica first, then secondary and so on.

More information: scylladb#24
avelanarius pushed a commit that referenced this issue May 26, 2022
To avoid contention, we want to try primary replica first, then secondary and so on.

More information: #24
Gor027 commented Mar 3, 2023

@avelanarius It seems that PR #125 has resolved this issue, so can it be closed now?
