Replace queue v2 part2 #58


Open · wants to merge 9 commits into base: main

Conversation

willmmiles

This PR is the core of #21: it replaces the FreeRTOS queue with a mutex and an intrusive list. This has a number of small benefits:

  • Queue clears for a closing/errored client can be performed atomically and quickly, even on the LwIP task;
  • Poll coalescence can be performed atomically and quickly, without needing to pump and reload the queue;
  • (Future) It permits pre-allocation of error events, which will allow us to guarantee that the dispose callback will be dispatched even under memory pressure.
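The mutex-plus-intrusive-list approach can be illustrated with a minimal sketch. The real implementation lives in `src/AsyncTCPSimpleIntrusiveList.h`; the node type, field names, and `remove_if` semantics below are illustrative assumptions, not the PR's exact code:

```cpp
#include <cstddef>

// Minimal singly-linked intrusive list: each node carries its own `next`
// pointer, so push and removal never allocate.  Illustrative sketch only.
struct Event {
  int client_id;          // stand-in for the owning AsyncClient
  Event *next = nullptr;
};

struct SimpleIntrusiveList {
  Event *head = nullptr;
  Event **tail = &head;

  void push_back(Event *e) {
    *tail = e;
    tail = &e->next;
  }

  // Detach every node matching `pred` in one pass and return the removed
  // chain; this is the one-lock "clear all events for a client" operation
  // the PR description refers to.
  template <typename Pred>
  Event *remove_if(Pred pred) {
    Event *removed = nullptr;
    Event **removed_tail = &removed;
    Event **cur = &head;
    while (*cur) {
      if (pred(**cur)) {
        Event *victim = *cur;
        *cur = victim->next;        // unlink from the live list
        victim->next = nullptr;
        *removed_tail = victim;     // append to the removed chain
        removed_tail = &victim->next;
      } else {
        cur = &(*cur)->next;
      }
    }
    tail = cur;                     // cur ends at the last live `next` slot
    return removed;
  }
};
```

In the PR, an operation like this would run while holding the queue mutex, so a closing client's events can be purged atomically even from the LwIP task.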

Included is a small correctness patch for non-CONFIG_LWIP_TCPIP_CORE_LOCKING systems (e.g. Arduino core 2), which have a potential race in AsyncClient::_close() where the tcp callbacks are unbound on the client task instead of the LwIP task.

Use a simple intrusive list for the event queue.  The ultimate goal here
is to arrange that certain kinds of events (errors) can be guaranteed
to be queued, as client objects will leak if they are discarded.  As a
secondary improvement, there are some operations (peeking, remove_if)
that can be more efficient as we can hold the queue lock for longer.

This commit is a straight replacement and does not attempt any logic
changes.
This eliminates a round-trip through the LwIP lock and allows
_tcp_close_api to specialize for AsyncClient.
Ensure that _tcp_close completely disconnects a pcb from an AsyncClient
- All callbacks are unhooked
- All events are purged
- abort() called on close() failure
This fixes some race conditions with closing, particularly without
CONFIG_LWIP_TCPIP_CORE_LOCKING, where an event might be processed for a
now-dead client if it arrived after arg was cleared but before the
callbacks were disconnected.
@mathieucarbou mathieucarbou requested a review from Copilot May 7, 2025 08:46

@Copilot Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.


@Copilot Copilot AI left a comment


Pull Request Overview

This PR replaces the FreeRTOS queue mechanism with a mutex-guarded intrusive list for managing asynchronous TCP event packets, while also refactoring related callback and event handling logic.

  • Introduces SimpleIntrusiveList for event packet management.
  • Replaces FreeRTOS queue APIs with intrusive list operations guarded by a mutex.
  • Adjusts TCP callback binding/teardown and event processing logic to work with the new data structure.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/AsyncTCPSimpleIntrusiveList.h Introduces an intrusive list implementation for event packet management.
src/AsyncTCP.cpp Replaces queue operations with the newly implemented intrusive list; updates TCP callback binding and event handling logic.
Comments suppressed due to low confidence (1)

src/AsyncTCP.cpp:260

  • [nitpick] Consider explicitly capturing `client` in the lambda (using `[client]` instead of `[=]`) to improve the clarity of the intended capture in `_remove_events_for_client`.

```cpp
removed_event_chain = _async_queue.remove_if([=](lwip_tcp_event_packet_t &pkt) {
```
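Copilot's suggestion, sketched out of context (the `AsyncClient` and `lwip_tcp_event_packet_t` shapes below are simplified stand-ins, not the library's real definitions): an explicit capture list documents the lambda's one dependency and turns any accidental extra capture into a compile error.

```cpp
struct AsyncClient {};
struct lwip_tcp_event_packet_t {
  AsyncClient *client;
};

// With [=] every in-scope variable is capturable by value; with [client]
// the predicate states explicitly that it depends only on `client`.
bool event_belongs_to(const lwip_tcp_event_packet_t &pkt, const AsyncClient *client) {
  auto pred = [client](const lwip_tcp_event_packet_t &p) {
    return p.client == client;
  };
  return pred(pkt);
}
```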

@mathieucarbou
Member

@willmmiles : I will first release a version with the pbuf_free fix (and update espasyncws). Then will go through this PR.

src/AsyncTCP.cpp Outdated
```cpp
bool holds_mutex;

public:
  inline queue_mutex_guard() : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY)){};
```


That's a great idea, using a singleton pattern 👌
One comment here: it is better like this: `holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}`
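The reasoning behind the `== pdTRUE` suggestion can be sketched on a host build. `xSemaphoreTake` returns a `BaseType_t`, not a `bool`, so comparing against `pdTRUE` makes the conversion explicit. The FreeRTOS names below are stubbed so the sketch compiles off-target; on a real ESP32 they come from the SDK:

```cpp
// --- FreeRTOS stubs for a host build (assumptions, not the real SDK) ---
using BaseType_t = long;
constexpr BaseType_t pdTRUE = 1;
using SemaphoreHandle_t = void *;
constexpr long portMAX_DELAY = -1;
static BaseType_t xSemaphoreTake(SemaphoreHandle_t, long) { return pdTRUE; }  // stub: always succeeds
static SemaphoreHandle_t _async_queue_mutex() { return nullptr; }             // stub
// -----------------------------------------------------------------------

class queue_mutex_guard {
  bool holds_mutex;

public:
  // Comparing against pdTRUE yields a bool directly, instead of relying on
  // an implicit BaseType_t -> bool narrowing in the member initializer.
  queue_mutex_guard()
    : holds_mutex(xSemaphoreTake(_async_queue_mutex(), portMAX_DELAY) == pdTRUE) {}

  explicit operator bool() const { return holds_mutex; }
};
```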

@mathieucarbou
Member

@willmmiles : I got the time to review and test the PR, and the performance is worse (testing with the ESPAsyncWS PerfTest example).

autocannon -c 10 -w 10 -d 20 http://192.168.4.1 can barely reach 10-11 req/sec sometimes

For SSE:

  • with 16 clients I am reaching ~350 msg/sec instead of ~400 before
  • with 10 clients, I am reaching ~430 msg/sec instead of ~530 before

```cpp
}

inline size_t size() const {
  return list_size(_head);
```
Member

@mathieucarbou mathieucarbou May 24, 2025


Maybe better to hold a size_t for the element count instead of looping through all the elements? There is not really a big memory impact in adding it, and I think it would improve speed - size() being called in 2 places.
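The suggested trade-off, sketched (illustrative names; not the PR's code): maintain the count in the mutating operations so `size()` becomes O(1) at the cost of one `size_t` per list.

```cpp
#include <cstddef>

struct Node {
  Node *next = nullptr;
};

// Cached-count variant of an intrusive list: every push/pop keeps `count_`
// in sync, so size() is a field read instead of an O(n) traversal.
class CountedList {
  Node *head_ = nullptr;
  std::size_t count_ = 0;

public:
  void push_front(Node *n) {
    n->next = head_;
    head_ = n;
    ++count_;
  }

  Node *pop_front() {
    Node *n = head_;
    if (n) {
      head_ = n->next;
      n->next = nullptr;
      --count_;
    }
    return n;
  }

  std::size_t size() const { return count_; }  // no traversal
};
```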

Author


I can give it a try! I had omitted this particular tradeoff since I had ultimately intended to replace the stochastic poll coalescence logic with a per-client counter that did not depend on queue length. I agree the memory cost is negligible.

Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Queue size cache implemented in 9262a9e, though I found it didn't have any measurable impact on the performance tests. I think polls aren't frequent enough for it to matter much. Later, if we do replace the poll coalescence, we can consider reverting that patch if there are no users of size().

@willmmiles
Author

@willmmiles : I got the time to review and test the PR, and the performance is worse (testing with the ESPAsyncWS PerfTest example).

Thanks for taking the time to look at the code again! I'll run some tests and see if I can isolate the performance regression. IIRC the original all-the-things-together PR had comparable or better performance, so I don't think it's something fundamental to the architecture.

Cache the list length to avoid performing expensive lookups during
stochastic poll coalescence.  This can be removed later when the
size() function is no longer necessary outside of a debug context.
The creation check in the hot path comes at a
measurable performance cost.
@willmmiles
Author

@mathieucarbou I've undone the singleton pattern on the queue mutex, so it doesn't need to be checked for creation in the hot path. I'm not measuring any significant performance difference from the main branch anymore on arduino-3 with my immediately available test board (an ESP32-WROVER); I haven't tested other platforms. Would you mind giving it a quick re-check?
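The hot-path cost being avoided can be sketched in portable C++ (using `std::mutex` as a host-side stand-in for the FreeRTOS semaphore; the function names are illustrative):

```cpp
#include <mutex>

// Lazy singleton: every call pays the C++11 "magic static" initialization
// guard (typically an atomic load plus a branch) before returning the
// handle -- a small but repeated cost on a hot path.
std::mutex &lazy_mutex() {
  static std::mutex m;  // init guard checked on each call
  return m;
}

// Eager alternative: constructed once during static initialization, so the
// hot path is a plain reference to a known address with no guard check.
std::mutex eager_mutex;

int increment_under_lock(int &counter) {
  std::lock_guard<std::mutex> lock(eager_mutex);
  return ++counter;
}
```

The trade-off is that an eagerly constructed object must be safe to create before `main()`, which holds for a plain mutex but is why lazy singletons are sometimes used in the first place.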

@mathieucarbou
Member

Yes of course, with pleasure! As soon as I get some free time.
