Replace queue v2 part2 #58
Conversation
Use a simple intrusive list for the event queue. The ultimate goal is to guarantee that certain kinds of events (errors) can always be queued, since client objects will leak if those events are discarded. As a secondary improvement, some operations (peeking, remove_if) become more efficient because the queue lock can be held for longer. This commit is a straight replacement and does not attempt any logic changes.

This eliminates a round-trip through the LwIP lock and allows _tcp_close_api to specialize for AsyncClient.

Ensure that _tcp_close completely disconnects a pcb from an AsyncClient:
- All callbacks are unhooked
- All events are purged
- abort() is called on close() failure

This fixes some race conditions with closing, particularly without CONFIG_LWIP_TCPIP_CORE_LOCKING, where an event might be processed for a now-dead client if it arrived after arg was cleared but before the callbacks were disconnected.
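For context, here is a minimal sketch of the kind of intrusive list described above. Only `remove_if` and the `lwip_tcp_event_packet_t` element type appear verbatim later in this conversation; every other name here is an illustrative assumption, not the actual contents of src/AsyncTCPSimpleIntrusiveList.h.

```cpp
// Sketch of a simple intrusive singly-linked list. Requires T to expose an
// intrusive `T *next` link member.
template <typename T>
class SimpleIntrusiveList {
  T *_head = nullptr;
  T **_tail = &_head;  // points at the last `next` slot for O(1) append

public:
  void push_back(T *node) {
    node->next = nullptr;
    *_tail = node;
    _tail = &node->next;
  }

  T *pop_front() {
    T *node = _head;
    if (node) {
      _head = node->next;
      if (_head == nullptr) {
        _tail = &_head;
      }
      node->next = nullptr;
    }
    return node;
  }

  // Unlinks every node matching `pred` and returns them as a chain.
  template <typename Pred>
  T *remove_if(Pred pred) {
    T *removed = nullptr;
    T **removed_tail = &removed;
    T **cur = &_head;
    while (*cur) {
      if (pred(**cur)) {
        T *victim = *cur;
        *cur = victim->next;     // unlink from the queue
        victim->next = nullptr;
        *removed_tail = victim;  // append to the removed chain
        removed_tail = &victim->next;
      } else {
        cur = &(*cur)->next;
      }
    }
    _tail = cur;  // the loop ends with cur at the last `next` slot
    return removed;
  }
};
```

Returning the unlinked nodes as a chain lets the caller delete them after dropping the queue mutex, which appears to match how remove_if is used in the diff below.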
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Pull Request Overview
This PR replaces the FreeRTOS queue mechanism with a mutex-guarded intrusive list for managing asynchronous TCP event packets, while also refactoring related callback and event handling logic.
- Introduces SimpleIntrusiveList for event packet management.
- Replaces FreeRTOS queue APIs with intrusive list operations guarded by a mutex.
- Adjusts TCP callback binding/teardown and event processing logic to work with the new data structure.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description
---|---
src/AsyncTCPSimpleIntrusiveList.h | Introduces an intrusive list implementation for event packet management.
src/AsyncTCP.cpp | Replaces queue operations with the newly implemented intrusive list; updates TCP callback binding and event handling logic.
Comments suppressed due to low confidence (1)
src/AsyncTCP.cpp:260
- [nitpick] Consider explicitly capturing 'client' in the lambda (using [client] instead of [=]) to improve the clarity of the intended capture in _remove_events_for_client.
```cpp
removed_event_chain = _async_queue.remove_if([=](lwip_tcp_event_packet_t &pkt) {
```
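The explicitly-captured form Copilot suggests would look roughly like this; the predicate body is an assumption, since the excerpt above cuts off after the lambda's opening brace:

```cpp
// Capturing `client` by name documents exactly what the lambda depends on:
removed_event_chain = _async_queue.remove_if([client](lwip_tcp_event_packet_t &pkt) {
  return pkt.client == client;  // assumed predicate; not shown in the excerpt
});
```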
@willmmiles : I will first release a version with the pbuf_free fix (and update espasyncws), then I will go through this PR.
src/AsyncTCP.cpp (outdated)

```cpp
bool holds_mutex;

public:
  inline queue_mutex_guard() : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY)){};
```
That's a great idea, using a singleton pattern 👌
I have a comment here: it would be better written as `holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}`, so the BaseType_t result is explicitly converted to bool.
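Put together, the guard with that suggestion applied might look like this; the destructor and bool accessor are assumptions, since only the constructor appears in the excerpt above:

```cpp
class queue_mutex_guard {
  bool holds_mutex;

public:
  // `== pdTRUE` makes the BaseType_t-to-bool conversion explicit.
  inline queue_mutex_guard() : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}

  // Assumed: release the mutex on scope exit if it was acquired.
  inline ~queue_mutex_guard() {
    if (holds_mutex) {
      xSemaphoreGive(_async_queue_mutex());
    }
  }

  inline explicit operator bool() const {
    return holds_mutex;
  }
};
```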
@willmmiles : I got the time to review and test the PR, and the perf is worse (testing with the ESPAsyncWS PerfTest example).
For SSE:
src/AsyncTCPSimpleIntrusiveList.h (outdated)

```cpp
}

inline size_t size() const {
  return list_size(_head);
```
Maybe better to hold a size_t for the element count instead of looping through all elements? There is not really a big impact on memory from adding it, and I think it would improve speed - size() is being called in 2 places. A self-contained sketch of the idea follows.
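This sketch uses illustrative names only, not the PR's code:

```cpp
#include <cstddef>

struct CountedList {
  struct Node {
    Node *next = nullptr;
  };

  Node *head = nullptr;
  size_t count = 0;  // cached element count, updated at every mutation

  void push_front(Node *n) {
    n->next = head;
    head = n;
    ++count;
  }

  Node *pop_front() {
    Node *n = head;
    if (n) {
      head = n->next;
      n->next = nullptr;
      --count;
    }
    return n;
  }

  // O(1) instead of walking the whole chain as list_size(_head) does.
  size_t size() const { return count; }
};
```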
I can give it a try! I had omitted this particular tradeoff since I had ultimately intended to replace the stochastic poll coalescence logic with a per-client counter that did not depend on queue length. I agree the memory cost is negligible.
Queue size cache implemented in 9262a9e, though I found it didn't have any measurable impact on the performance tests. I think polls aren't frequent enough for it to matter much. Later, if we do replace the poll coalescence, we can consider reverting that patch if there are no users of size().

Thanks for taking the time to look at the code again! I'll run some tests and see if I can isolate the performance regression. IIRC the original all-the-things-together PR had comparable or better performance, so I don't think it's something fundamental to the architecture.
Cache the list length to avoid performing expensive lookups during stochastic poll coalescence. This can be removed later when the size() function is no longer necessary outside of a debug context.
The creation check in the hot path comes at a measurable performance cost.
@mathieucarbou I've undone the singleton pattern on the queue mutex, so it doesn't need to be checked for creation in the hot path. I'm not measuring any significant performance difference from the main branch anymore on arduino-3 with my immediately available test board (an ESP32-WROVER); I haven't tested other platforms. Would you mind giving it a quick re-check?
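For illustration only, a hypothetical sketch of the tradeoff being described here, with made-up names; the actual change is in the PR diff:

```cpp
#include "freertos/FreeRTOS.h"
#include "freertos/semphr.h"

// Lazy singleton: a C++ function-local static is constructed on first use,
// so every later call still pays for an "already initialized?" guard check
// in the hot path.
static SemaphoreHandle_t lazy_queue_mutex() {
  static SemaphoreHandle_t mutex = xSemaphoreCreateMutex();
  return mutex;
}

// Eager creation: the handle is initialized once at startup, so acquiring it
// later is a plain load with no creation check. (Assumes static initializers
// run before any queue access, which holds for ESP32 Arduino sketches.)
static SemaphoreHandle_t eager_queue_mutex = xSemaphoreCreateMutex();
```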
Yes of course, with pleasure! As soon as I get some free time.
This PR is the core of #21: it replaces the FreeRTOS queue with a mutex and an intrusive list. This has a number of small benefits, described in the commit messages above.

Included is a small correctness patch for non-CONFIG_LWIP_TCPIP_CORE_LOCKING systems (e.g. Arduino core 2), which have a potential race in AsyncClient::_close(), where the tcp callbacks are unbound on the client task instead of the LwIP task.
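For context, the usual LwIP pattern for running such teardown on the tcpip thread is tcpip_api_call, which marshals the call onto the LwIP task when core locking is unavailable. A hedged sketch of how the unbind-then-close sequence might be expressed that way (illustrative names, not the PR's actual _tcp_close_api):

```cpp
#include "lwip/tcpip.h"
#include "lwip/tcp.h"

struct close_api_call_t {
  struct tcpip_api_call_data call;  // must be the first member
  tcp_pcb *pcb;
  err_t err;
};

// Runs on the LwIP/tcpip thread, so no callback can fire for a
// half-torn-down client while this executes.
static err_t _close_on_tcpip_thread(struct tcpip_api_call_data *api_call) {
  close_api_call_t *msg = (close_api_call_t *)api_call;
  // Unhook every callback first, so nothing is delivered mid-teardown.
  tcp_arg(msg->pcb, nullptr);
  tcp_recv(msg->pcb, nullptr);
  tcp_sent(msg->pcb, nullptr);
  tcp_err(msg->pcb, nullptr);
  tcp_poll(msg->pcb, nullptr, 0);
  msg->err = tcp_close(msg->pcb);
  if (msg->err != ERR_OK) {
    tcp_abort(msg->pcb);  // abort on close() failure, as the commit describes
    msg->err = ERR_OK;    // the pcb is released either way
  }
  return ERR_OK;
}

static err_t close_pcb_safely(tcp_pcb *pcb) {
  close_api_call_t msg{};
  msg.pcb = pcb;
  tcpip_api_call(_close_on_tcpip_thread, &msg.call);
  return msg.err;
}
```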