Skip to content

Cannot :ack task after implicit net.box reconnect #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
RunsFor opened this issue Nov 21, 2018 · 2 comments · Fixed by #139
Closed

Cannot :ack task after implicit net.box reconnect #85

RunsFor opened this issue Nov 21, 2018 · 2 comments · Fixed by #139
Assignees
Labels
feature A new functionality prio8

Comments

@RunsFor
Copy link

RunsFor commented Nov 21, 2018

Prerequisite

  1. Server which stores tasks using tarantool/queue with custom driver based on utubettl
  2. Client which takes tasks from the server through net.box connection

Scenario

  1. Client takes a task through net.box connection and start processing it
  2. Network glitch (or another issue) occur and net.box decides to reconnect
  3. Client finishes processing task and decides to ack the task.
  4. Server responds to ack command with error - Task was not taken in the session
  5. After ttr delay task returned to the READY state

Problem description

This "ack error" happen because tarantool/queue locks "taken" task to box.session.id(). That means that not only the same client must take and ack the same task, but it has to be done inside the same net.box connection, since box.session.id() updates after implicit reconnect. That implies that client has no way to handle such error from the server and retry ack.

Possible solutions

There is a possible workaround to eliminate this kind of errors is to implement a simple buffer (fiber.channel) on the server with client ack commands and let another fiber to call an actual queue:ack. But this is not very handy, if you want to handle ack response on the client.

Another way to approach this is to use different id to lock a task. This id should determine the same client even through implicit net.box reconnect calls. This id may be set explicitly through some registration process (api breaking change) or implicitly using some tarantool client information (if exists).

Is this reasoning correct? Are there any different approaches/workarounds?

@Totktonada
Copy link
Member

A note from @printercu: if we'll allow reconnects, then we possibly should not automatically release tasks in on_disconnect trigger.

@printercu
Copy link
Contributor

Maybe this should be an option because it depends on queue type: I use ttr to rerun stale tasks and ping queue continuously from active worker.

The other related bug is that task can not be handled from different fibers within single process: box.session.id returns different values from different fibers.

@Totktonada Totktonada added the feature A new functionality label Jul 8, 2020
@LeonidVas LeonidVas self-assigned this Oct 26, 2020
LeonidVas added a commit that referenced this issue Nov 8, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identificate()" method), and one session can have many
connections. The session will be deleted (all session tasks will
be released) after the last connection is disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

Part of #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
The patch added the ability to set the queue settings(by calling
"queue.set_settings(opts)").
Now only one setting is available - "ttr"(time to release).
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Part of #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Closes #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identificate()" method), and one session can have many
connections. The session will be deleted (all session tasks will
be released) after the last connection is disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

Part of #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
The patch added the ability to set the queue settings(by calling
"queue.set_settings(opts)").
Now only one setting is available - "ttr"(time to release).
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Part of #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Closes #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identificate()" method), and one session can have many
connections. The session will be deleted (all session tasks will
be released) after the last connection is disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

Part of #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
The patch added the ability to set the queue settings(by calling
"queue.set_settings(opts)").
Now only one setting is available - "ttr"(time to release).
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Part of #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Closes #85
LeonidVas added a commit that referenced this issue Nov 8, 2020
LeonidVas added a commit that referenced this issue Nov 25, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identificate()" method), and one session can have many
connections. The session will be deleted (all session tasks will
be released) after the last connection is disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

Part of #85
LeonidVas added a commit that referenced this issue Nov 25, 2020
The patch added the ability to set the queue settings(by calling
"queue.set_settings(opts)").
Now only one setting is available - "ttr"(time to release).
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Part of #85
LeonidVas added a commit that referenced this issue Nov 25, 2020
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Closes #85
LeonidVas added a commit that referenced this issue Nov 25, 2020
LeonidVas added a commit that referenced this issue Nov 25, 2020
LeonidVas added a commit that referenced this issue Nov 30, 2020
The term "queue session" will be added to the queue while working
on #85. One "queue session" will include many connections (box.session).
For clarity, box.session has been renamed to connection.

Needed for #85
LeonidVas added a commit that referenced this issue Nov 30, 2020
In the future, we will refuse to use the "pk" index of the
"_queue_taken" space to improve performance. So, let's minimize
its use.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 17, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identify()" method), and one session can have many
connections.
The session will be deleted (all session tasks will be released) after
"ttr" seconds have passed since the last connection was disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Also, the "_queue_taken" internal space has been replaced with the
"_queue_taken_2" with a change format and used indexes (for better
performance).
The downgrade to previous version works correctly.

Closes of #85
LeonidVas added a commit that referenced this issue Dec 17, 2020
LeonidVas added a commit that referenced this issue Dec 17, 2020
LeonidVas added a commit that referenced this issue Dec 22, 2020
The term "queue session" will be added to the queue while working
on #85. One "queue session" will include many connections (box.session).
For clarity, box.session has been renamed to connection.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 22, 2020
In the future, we will refuse to use the "pk" index of the
"_queue_taken" space to improve performance (the conclusion about
the improvement in performance is based on the benchmarks that
will be added in one of the next commits). So, let's minimize
its use.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 22, 2020
The same time functions are used in several files, so move them
to a separate file.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 22, 2020
The patch adds the ability to set the queue settings(by calling
"queue.cfg(opts)").
Now only one setting is available - "ttr"(time to release).
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Part of #85
LeonidVas added a commit that referenced this issue Dec 22, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identify()" method), and one session can have many
connections.
The session will be deleted (all session tasks will be released) after
"ttr" seconds have passed since the last connection was disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Also, the "_queue_taken" internal space has been replaced with the
"_queue_taken_2" with a change format and used indexes (for better
performance).
The downgrade to previous version works correctly.

Closes of #85
LeonidVas added a commit that referenced this issue Dec 22, 2020
LeonidVas added a commit that referenced this issue Dec 22, 2020
LeonidVas added a commit that referenced this issue Dec 22, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identify()" method), and one session can have many
connections.
The session will be deleted (all session tasks will be released) after
"ttr" seconds have passed since the last connection was disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Also, the "_queue_taken" internal space has been replaced with the
"_queue_taken_2" with a change format and used indexes (for better
performance).
The downgrade to previous version works correctly.

Closes of #85
LeonidVas added a commit that referenced this issue Dec 22, 2020
LeonidVas added a commit that referenced this issue Dec 22, 2020
LeonidVas added a commit that referenced this issue Dec 24, 2020
The term "queue session" will be added to the queue while working
on #85. One "queue session" will include many connections (box.session).
For clarity, box.session has been renamed to connection.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 24, 2020
In the future, we will refuse to use the "pk" index of the
"_queue_taken" space to improve performance (the conclusion about
the improvement in performance is based on the benchmarks that
will be added in one of the next commits). So, let's minimize
its use.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 24, 2020
The same time functions are used in several files, so move them
to a separate file.

Needed for #85
LeonidVas added a commit that referenced this issue Dec 24, 2020
The patch adds the ability to set the queue settings(by calling
"queue.cfg(opts)").
Now only one setting is available - "ttr"(time to release).
"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Part of #85
LeonidVas added a commit that referenced this issue Dec 24, 2020
Before the patch a connection to server was synonym of the queue
session. Now the session has a unique UUID (returned by the
"queue.identify()" method), and one session can have many
connections.
The session will be deleted (all session tasks will be released) after
"ttr" seconds have passed since the last connection was disconnected.
To connect to an existing session, call "queue.identificate(uuid)"
with the previously obtained UUID.

"ttr" in seconds - the time after which, if there is no active
connection in the session, it will be released with all its tasks.

Also, the "_queue_taken" internal space has been replaced with the
"_queue_taken_2" with a change format and used indexes (for better
performance).
The downgrade to previous version works correctly.

Closes of #85
LeonidVas added a commit that referenced this issue Dec 24, 2020
LeonidVas added a commit that referenced this issue Dec 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new functionality prio8
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants