Replace queue v2 part2 #58


Open · wants to merge 9 commits into base: main

Conversation

willmmiles

This PR is the core of #21: it replaces the FreeRTOS queue with a mutex and an intrusive list. This has a number of small benefits:

  • Queue clears for a closing/errored client can be performed atomically and quickly, even on the LwIP task;
  • Poll coalescence can be performed atomically and quickly, without needing to pump and reload the queue;
  • (Future) It permits pre-allocation of error events, which will allow us to guarantee that the dispose callback will be dispatched even under memory pressure.
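The mutex-plus-intrusive-list approach can be illustrated with a minimal sketch. The real implementation lives in `src/AsyncTCPSimpleIntrusiveList.h`; the node type, field names, and `remove_if` semantics below are illustrative assumptions, not the PR's exact code:

```cpp
#include <cstddef>

// Minimal singly-linked intrusive list: each node carries its own `next`
// pointer, so push and removal never allocate.  Illustrative sketch only.
struct Event {
  int client_id;          // stand-in for the owning AsyncClient
  Event *next = nullptr;
};

struct SimpleIntrusiveList {
  Event *head = nullptr;
  Event **tail = &head;

  void push_back(Event *e) {
    *tail = e;
    tail = &e->next;
  }

  // Detach every node matching `pred` in one pass and return the removed
  // chain; this is the one-lock "clear all events for a client" operation
  // the PR description refers to.
  template <typename Pred>
  Event *remove_if(Pred pred) {
    Event *removed = nullptr;
    Event **removed_tail = &removed;
    Event **cur = &head;
    while (*cur) {
      if (pred(**cur)) {
        Event *victim = *cur;
        *cur = victim->next;        // unlink from the live list
        victim->next = nullptr;
        *removed_tail = victim;     // append to the removed chain
        removed_tail = &victim->next;
      } else {
        cur = &(*cur)->next;
      }
    }
    tail = cur;                     // cur ends at the last live `next` slot
    return removed;
  }
};
```

In the PR, an operation like this would run while holding the queue mutex, so a closing client's events can be purged atomically even from the LwIP task.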

Included is a small correctness patch for non-CONFIG_LWIP_TCPIP_CORE_LOCKING systems (e.g. Arduino core 2), which have a potential race in AsyncClient::_close() where the tcp callbacks are unbound on the client task instead of the LwIP task.

Use a simple intrusive list for the event queue.  The ultimate goal here
is to arrange that certain kinds of events (errors) can be guaranteed
to be queued, as client objects will leak if they are discarded.  As a
secondary improvement, there are some operations (peeking, remove_if)
that can be more efficient as we can hold the queue lock for longer.

This commit is a straight replacement and does not attempt any logic
changes.
This eliminates a round-trip through the LwIP lock and allows
_tcp_close_api to specialize for AsyncClient.
Ensure that _tcp_close completely disconnects a pcb from an AsyncClient
- All callbacks are unhooked
- All events are purged
- abort() called on close() failure
This fixes some race conditions with closing, particularly without
CONFIG_LWIP_TCPIP_CORE_LOCKING, where an event might be processed for a
now-dead client if it arrived after arg was cleared but before the
callbacks were disconnected.
@mathieucarbou mathieucarbou requested a review from Copilot May 7, 2025 08:46

@Copilot Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


@Copilot Copilot AI left a comment


Pull Request Overview

This PR replaces the FreeRTOS queue mechanism with a mutex-guarded intrusive list for managing asynchronous TCP event packets, while also refactoring related callback and event handling logic.

  • Introduces SimpleIntrusiveList for event packet management.
  • Replaces FreeRTOS queue APIs with intrusive list operations guarded by a mutex.
  • Adjusts TCP callback binding/teardown and event processing logic to work with the new data structure.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/AsyncTCPSimpleIntrusiveList.h Introduces an intrusive list implementation for event packet management.
src/AsyncTCP.cpp Replaces queue operations with the newly implemented intrusive list; updates TCP callback binding and event handling logic.
Comments suppressed due to low confidence (1)

src/AsyncTCP.cpp:260

  • [nitpick] Consider explicitly capturing `client` in the lambda (using `[client]` instead of `[=]`) to improve the clarity of the intended capture in `_remove_events_for_client`.

```cpp
removed_event_chain = _async_queue.remove_if([=](lwip_tcp_event_packet_t &pkt) {
```
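Copilot's suggestion, sketched out of context (the `AsyncClient` and `lwip_tcp_event_packet_t` shapes below are simplified stand-ins, not the library's real definitions): an explicit capture list documents the lambda's one dependency and turns any accidental extra capture into a compile error.

```cpp
struct AsyncClient {};
struct lwip_tcp_event_packet_t {
  AsyncClient *client;
};

// With [=] every in-scope variable is capturable by value; with [client]
// the predicate states explicitly that it depends only on `client`.
bool event_belongs_to(const lwip_tcp_event_packet_t &pkt, const AsyncClient *client) {
  auto pred = [client](const lwip_tcp_event_packet_t &p) {
    return p.client == client;
  };
  return pred(pkt);
}
```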

@mathieucarbou
Member

@willmmiles : I will first release a version with the pbuf_free fix (and update espasyncws). Then will go through this PR.

src/AsyncTCP.cpp Outdated
```cpp
bool holds_mutex;

public:
  inline queue_mutex_guard() : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY)){};
```


That's a great idea, using a singleton pattern 👌
One comment here: it is better like this: `holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}`
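The reasoning behind the `== pdTRUE` suggestion can be sketched on a host build. `xSemaphoreTake` returns a `BaseType_t`, not a `bool`, so comparing against `pdTRUE` makes the conversion explicit. The FreeRTOS names below are stubbed so the sketch compiles off-target; on a real ESP32 they come from the SDK:

```cpp
// --- FreeRTOS stubs for a host build (assumptions, not the real SDK) ---
using BaseType_t = long;
constexpr BaseType_t pdTRUE = 1;
using SemaphoreHandle_t = void *;
constexpr long portMAX_DELAY = -1;
static BaseType_t xSemaphoreTake(SemaphoreHandle_t, long) { return pdTRUE; }  // stub: always succeeds
static SemaphoreHandle_t _async_queue_mutex() { return nullptr; }             // stub
// -----------------------------------------------------------------------

class queue_mutex_guard {
  bool holds_mutex;

public:
  // Comparing against pdTRUE yields a bool directly, instead of relying on
  // an implicit BaseType_t -> bool narrowing in the member initializer.
  queue_mutex_guard()
    : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}

  explicit operator bool() const { return holds_mutex; }
};
```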

@mathieucarbou
Member

@willmmiles : I got the time to review and test the PR, and the performance is worse (testing with the ESPAsyncWS PerfTest example).

autocannon -c 10 -w 10 -d 20 http://192.168.4.1 can barely reach 10-11 req/sec sometimes

For SSE:

  • with 16 clients I am reaching ~350 msg/sec instead of ~400 before
  • with 10 clients, I am reaching ~430 msg/sec instead of ~530 before

```cpp
}

inline size_t size() const {
  return list_size(_head);
```
Member

@mathieucarbou mathieucarbou May 24, 2025


Maybe better to hold a size_t for the element count instead of looping through all the elements? There is not really a big memory impact in adding it, and I think it would improve speed - size() being called in 2 places.
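The suggested trade-off, sketched (illustrative names; not the PR's code): maintain the count in the mutating operations so `size()` becomes O(1) at the cost of one `size_t` per list.

```cpp
#include <cstddef>

struct Node {
  Node *next = nullptr;
};

// Cached-count variant of an intrusive list: every push/pop keeps `count_`
// in sync, so size() is a field read instead of an O(n) traversal.
class CountedList {
  Node *head_ = nullptr;
  std::size_t count_ = 0;

public:
  void push_front(Node *n) {
    n->next = head_;
    head_ = n;
    ++count_;
  }

  Node *pop_front() {
    Node *n = head_;
    if (n) {
      head_ = n->next;
      n->next = nullptr;
      --count_;
    }
    return n;
  }

  std::size_t size() const { return count_; }  // no traversal
};
```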

Author


I can give it a try! I had omitted this particular tradeoff since I had ultimately intended to replace the stochastic poll coalescence logic with a per-client counter that did not depend on queue length. I agree the memory cost is negligible.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Queue size cache implemented in 9262a9e, though I found it didn't have any measurable impact on the performance tests. I think polls aren't frequent enough for it to matter much. Later, if we do replace the poll coalescence, we can consider reverting that patch if there are no users of size().

@willmmiles
Author

@willmmiles : I got the time to review and test the PR, and the performance is worse (testing with the ESPAsyncWS PerfTest example).

Thanks for taking the time to look at the code again! I'll run some tests and see if I can isolate the performance regression. IIRC the original all-the-things-together PR had comparable or better performance, so I don't think it's something fundamental to the architecture.

Cache the list length to avoid performing expensive lookups during
stochastic poll coalescence.  This can be removed later when the
size() function is no longer necessary outside of a debug context.
The creation check in the hot path comes at a
measurable performance cost.
@willmmiles
Author

@mathieucarbou I've undone the singleton pattern on the queue mutex, so it doesn't need to be checked for creation in the hot path. I'm not measuring any significant performance difference from the main branch anymore on arduino-3 with my immediately available test board (an ESP32-WROVER); I haven't tested other platforms. Would you mind giving it a quick re-check?
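The hot-path cost being avoided can be sketched in portable C++ (using `std::mutex` as a host-side stand-in for the FreeRTOS semaphore; the function names are illustrative):

```cpp
#include <mutex>

// Lazy singleton: every call pays the C++11 "magic static" initialization
// guard (typically an atomic load plus a branch) before returning the
// handle -- a small but repeated cost on a hot path.
std::mutex &lazy_mutex() {
  static std::mutex m;  // init guard checked on each call
  return m;
}

// Eager alternative: constructed once during static initialization, so the
// hot path is a plain reference to a known address with no guard check.
std::mutex eager_mutex;

int increment_under_lock(int &counter) {
  std::lock_guard<std::mutex> lock(eager_mutex);
  return ++counter;
}
```

The trade-off is that an eagerly constructed object must be safe to create before `main()`, which holds for a plain mutex but is why lazy singletons are sometimes used in the first place.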

@mathieucarbou
Member

Yes of course, with pleasure! As soon as I get some free time.
