[✨ Triage] dotnet/runtime#118064 by amanasifkhalid - `System.Net.WebSockets.Client.Tests.CancelTest_SharedHandler_Lo ... #1281

@MihuBot

Triage for dotnet/runtime#118064.
Repo filter: All networking issues.
MihuBot version: 7cce3b.
Ping MihaZupan for any issues.

This is a test triage report generated by AI, aimed at helping the triage team quickly identify past issues/PRs that may be related.
Take any conclusions with a large grain of salt.

dotnet/runtime#118064: `System.Net.WebSockets.Client.Tests.CancelTest_SharedHandler_Loopback.ConnectAsync_Cancel_ThrowsCancellationException` failing with NRE on CoreCLR android-arm64 by amanasifkhalid
[Tool] Searching for SafeDeleteSslContext.WriteToConnection NullReferenceException, System.Net.WebSockets.Client.Tests.CancelTest_SharedHandler_Loopback.ConnectAsync_Cancel_ThrowsCancellationException android-arm64, WebSockets NullReferenceException android, SafeDeleteSslContext NullReferenceException (IncludeOpen=True, IncludeClosed=True, IncludeIssues=True, IncludePullRequests=True, Repository=dotnet/runtime)
[Tool] Found 49 issues, 63 comments, 49 returned results (7217 ms)

The most relevant issues and PRs for #118064, in which System.Net.WebSockets.Client.Tests.CancelTest_SharedHandler_Loopback.ConnectAsync_Cancel_ThrowsCancellationException fails on CoreCLR android-arm64 with a System.NullReferenceException in System.Net.SafeDeleteSslContext.WriteToConnection:


1. Issue #25734 (April 2018) - "SafeDeleteSslContext.WriteToConnection throws System.ArgumentNullException"

  • Summary: This issue describes sporadic CI failures with exceptions in SafeDeleteSslContext.WriteToConnection, including ArgumentNullException and process crashes. The failures occur when a client opens an SSL connection asynchronously and then closes it, possibly while the handshake is still in progress.
  • Key Discussion Points:
    • The root cause was identified as locking on an object that gets nulled out in Dispose.
    • Suggestions included guarding null-outs behind SafeHandle.IsClosed or never nulling out those fields and letting GC handle them.
    • The consensus was to avoid nulling out fields in Dispose to prevent such exceptions.
  • Relevance: The stack trace and context are very similar to the current issue, indicating a long-standing race/disposal problem in this area of the code.

2. PR #117982 (July 2025) - "[Apple] Lock over the same object in SafeDeleteSslContext to serialize access to the buffers"

  • Summary: This PR addresses synchronization issues in SafeDeleteSslContext by introducing a dedicated lock object to ensure all access to shared buffers is serialized.
  • Key Discussion Points:
    • The PR was motivated by issues found while working on another PR and aims to prevent race conditions in buffer access.
    • Although the PR targets Apple platforms, a similar synchronization problem could cause NREs on other platforms.
  • Relevance: Shows recent, active work in this area, and the kind of race that could lead to NREs in WriteToConnection.

3. Issue #49600 (March 2021) - "Use after dispose bug in SafeDeleteSslContext.WriteToConnection on OSX"

  • Summary: Reports a use-after-dispose bug in SafeDeleteSslContext.WriteToConnection, exposed by changes that started nulling out a buffer in Dispose.
  • Key Discussion Points:
    • The issue is not a regression but was exposed by recent changes.
    • The root cause is similar: fields being nulled out while native code may still reference them.
  • Relevance: Directly related to the type of NRE seen in the current issue.

4. PR #49945 (March 2021) - "catch exceptions in callbacks from native code on OSX in SafeDeleteSslContext"

  • Summary: This PR changes the handling of exceptions thrown from managed callbacks invoked by native code, returning error codes instead of letting exceptions propagate.
  • Key Discussion Points:
    • The PR is OSX-specific but addresses the general problem of managed exceptions (including NRE) in native callbacks.
    • The author notes that a longer-term fix is needed to avoid using SafeHandle after dispose.
  • Relevance: Shows a mitigation for the symptoms, but not a root-cause fix.

5. Issue #55736 (July 2021) - "Mono: AOT compiler can't compile System.Net.SafeDeleteSslContext:ReadFromConnection (Mac Catalyst)"

  • Summary: Reports an AOT compilation failure for ReadFromConnection, a sibling to WriteToConnection, due to missing attributes.
  • Key Discussion Points:
    • Fixed by adding UnmanagedCallersOnly attributes (see PR #55947).
  • Relevance: Not directly about NRE, but shows platform-specific issues in this code.

6. Issue #52645 (May 2021) - "WebSockets test failures on Android"

  • Summary: Tracks various WebSockets test failures on Android, including resource handling and configuration issues.
  • Key Discussion Points:
    • Some failures were due to resource files and trimming, not NREs.
  • Relevance: Shows Android-specific test instability in WebSockets, but not directly about NRE in WriteToConnection.

7. Issue #47582 (January 2021) - "Undocumented NullReferenceException is Raised by ClientWebSocket.ReceiveAsync"

  • Summary: Reports that ReceiveAsync and SendAsync can throw NREs if called after a failed OpenAsync.
  • Key Discussion Points:
    • Suggests that a more appropriate exception should be thrown.
  • Relevance: Shows that NREs can surface in WebSocket code paths after failed or canceled operations.

8. PR #69527 (May 2022) - "avoid allocation of SafeFreeSslCredentials and SafeDeleteSslContext on Linux"

  • Summary: Refactors how SafeDeleteSslContext is used on Linux, merging it with SafeSslHandle.
  • Key Discussion Points:
    • Some discussion about null checks and disposal.
  • Relevance: Shows ongoing refactoring in this area, but not directly about NREs.

9. Issue #104599 (July 2024) - "WebSocket exception with LLVM and full AOT Enabled in .NET Android App"

  • Summary: Reports WebSocket failures on Android with LLVM and full AOT enabled.
  • Key Discussion Points:
    • The stack trace shows connection failures, not NREs in WriteToConnection.
  • Relevance: Shows Android-specific WebSocket issues, but not the same failure.
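
The failure pattern described in items 1 and 2 (locking on a field that Dispose nulls out, versus locking on a dedicated object that is never nulled) can be illustrated with a minimal sketch. This is an analogy written in Java, not the C# code in dotnet/runtime: Java's `synchronized` on a null reference throws NullPointerException, much as a C# `lock` on null throws ArgumentNullException. The class names, the `closed` flag, and the `CLOSED` status code are hypothetical and exist only for illustration.

```java
// Illustrative analogy only: this is Java, not the C# in dotnet/runtime.
final class DisposeLockSketch {

    /** Hypothetical context that locks on a field its own close() nulls out. */
    static final class RacyContext {
        private Object buffer = new Object();

        void close() {
            buffer = null; // the "null out in Dispose" step
        }

        int write(byte[] data) {
            // After close(), buffer is null, so entering the block throws
            // NullPointerException before any work is done.
            synchronized (buffer) {
                return data.length;
            }
        }
    }

    /** Hypothetical alternative: a dedicated lock that is never nulled. */
    static final class SaferContext {
        static final int CLOSED = -1; // made-up status code

        private final Object gate = new Object();
        private boolean closed;

        void close() {
            synchronized (gate) {
                closed = true;
            }
        }

        int write(byte[] data) {
            // The lock object always exists, so a late caller gets a status
            // code instead of an exception.
            synchronized (gate) {
                return closed ? CLOSED : data.length;
            }
        }
    }

    public static void main(String[] args) {
        RacyContext racy = new RacyContext();
        racy.close();
        try {
            racy.write(new byte[4]);
            System.out.println("racy: no exception");
        } catch (NullPointerException e) {
            System.out.println("racy: NullPointerException after close");
        }

        SaferContext safer = new SaferContext();
        System.out.println("safer before close: " + safer.write(new byte[4]));
        safer.close();
        System.out.println("safer after close: " + safer.write(new byte[4]));
    }
}
```

The sketch is single-threaded for clarity; in the reported failures the close and the write would race on different threads, which is why the same lock must guard both.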

Conclusions

  • The NRE in SafeDeleteSslContext.WriteToConnection is a long-standing, cross-platform race/disposal issue, especially when native code can call back into managed code after disposal.
  • There have been several attempts to mitigate or fix these issues, especially on OSX, but the underlying pattern is present on other platforms (including Android).
  • Recent PRs (like #117982) show ongoing work to improve synchronization and prevent these races, but the problem is not fully solved for all platforms.
  • Android-specific WebSocket test failures are common, but this particular NRE is less frequently reported on Android, suggesting a possible gap in platform-specific handling or test coverage.
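
The symptom-level mitigation mentioned for PR #49945 (catch exceptions inside a callback invoked by native code and return an error code instead of letting them propagate) can be sketched roughly as follows. This is a Java analogy under stated assumptions, not the runtime's implementation: `WriteCallback`, `guarded`, and `STATUS_INTERNAL_ERROR` are hypothetical names, and a plain Java caller stands in for the native caller.

```java
// Illustrative analogy only: this is Java, not the C# in dotnet/runtime.
final class CallbackGuardSketch {
    static final int STATUS_INTERNAL_ERROR = -2; // made-up error code

    /** Hypothetical shape of a write callback handed to a native library. */
    interface WriteCallback {
        int write(byte[] data);
    }

    /** Wraps a callback so that no exception escapes to the caller. */
    static WriteCallback guarded(WriteCallback inner) {
        return data -> {
            try {
                return inner.write(data);
            } catch (RuntimeException e) {
                // Translate the failure into a status the caller can handle.
                return STATUS_INTERNAL_ERROR;
            }
        };
    }

    public static void main(String[] args) {
        WriteCallback ok = data -> data.length;
        WriteCallback broken = data -> {
            throw new NullPointerException("state torn down");
        };
        System.out.println(guarded(ok).write(new byte[3]));
        System.out.println(guarded(broken).write(new byte[3]));
    }
}
```

As the report notes for that PR, this only contains the symptom; the callback can still run against torn-down state, so it does not replace a root-cause fix.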

Actionable Summary

  • This is a known class of issue (race/disposal in SafeDeleteSslContext) with multiple historical and recent reports, especially on OSX but also relevant to Android.
  • Recent work (PR #117982) may help, but platform-specific investigation for Android is warranted.
  • No direct duplicate for this exact NRE on Android, but the root cause and mitigation strategies are well-documented in the referenced issues and PRs.

If you need more details on any specific issue or PR, let me know!
