Skip to content

fallback to GetAddrInfoW if GetAddrInfoExW fails #45816

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 12, 2021
Merged

Conversation

wfurt
Copy link
Member

@wfurt wfurt commented Dec 9, 2020

While root cause why GetAddrInfoExW() is failing from impersonificated context is still investigated this change should make Async DNS working again on Windows when running from impersonificated context.

We do not have infrastructure for automated authenticatiobn tests but I run the repro code manually on my Windows 10 and it works now eg. both explicit DNS lookup and HttpClient Async fetch works.

fixes #29935

@ghost
Copy link

ghost commented Dec 9, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

While root cause why GetAddrInfoExW() is failing from impersonificated context is still investigated this change should make Async DNS working again on Windows when running from impersonificated context.

We do not have infrastructure for automated authenticatiobn tests but I run the repro code manually on my Windows 10 and it works now eg. both explicit DNS lookup and HttpClient Async fetch works.

Author: wfurt
Assignees: -
Labels:

area-System.Net

Milestone: -

@CarnaViire
Copy link
Member

Also, is it possible to add a test for these changes?

Copy link
Contributor

@scalablecory scalablecory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we talked to Windows team to see if there's a way we can keep using async here?

@wfurt wfurt self-assigned this Dec 9, 2020
@wfurt
Copy link
Member Author

wfurt commented Dec 9, 2020

Also, is it possible to add a test for these changes?

no, not really. This was part of the Enterprise-auth project but it got de-prioritized.
In order to test this we need secondary Windows account can credentials for it so the test can impersonififcate - that is hard to maintain within current test framework.

@wfurt
Copy link
Member Author

wfurt commented Dec 11, 2020

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

return result;
if (NameResolutionPal.TryGetAddrInfoAsync(hostName, justAddresses, addressFamily, cancellationToken, out t))
{
t.ContinueWith((resultTask) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be returning the Task returned by ContinueWith, or the ActivityId following the Dns resolution wouldn't reflect the Stop event.
This was the issue with the original Telemetry instrumentation of Dns, pointed out by @ManickaP in #41670 (comment) (and fixed in #42188).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this will break the telemetry once again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had different change where telemetry test was failing. With this on, telemetry tests are still passing. The log still be edited but I'm not sire about the ID. If we return the ContinueWith, we will need to copy result (or failure). I will look into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests will only validate that the Start/Stop are logged with the same ActivityId.

The issue here appears when you have another Start/Stop pair following the name resolution:

  • HttpRequestStart starts with ID /1/
  • NameResolutionStart starts with nested ID /1/1/
  • NameResolutionStop correctly sees /1/1/ and updates it to /1/.
  • SocketConnectStart starts on the same async context as NameResolutionStart, but it never observes the change of the ID back to /1/, so it starts with a nested ID /1/1/1

It will appear as if the Socket connect is an operation nested under the NameResolution Start/Stop, when in reality it's only following it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-factor the code and the activity ID is right as far as I can tell as well as running impersonificated still works.

I went back to the await path as the ContinueWith was getting complicated more than I wanted. The only downside is that we will miss time of the call it self.
I feel it probably does not matter comparing duration of the actual resolution.
We could probably start the timer upfront but I'm not sure if that is worth of the complication.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after chatting with @MihaZupan , I decided to fix the stopwatch as well.
This should be ready for another review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Telemetry here LGTM

// instead of calling the synchronous version in the ThreadPool.
// If the OS supports it and 'hostName' is not an IP Address, resolve the name asynchronously
// instead of calling the synchronous version in the ThreadPool.
// If it fails, we will fall-back to ThreadPool as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If it fails, we will fall-back to ThreadPool as well.
// If it fails, we will fall back to ThreadPool as well.

{
// WSATRY_AGAIN indicates possible problem with reachability according to docs.
// However, if servers are really unreachable, we would still get IOPending here
// and final result would be posted via overlapped IO.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not understanding the above comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WSATRY_AGAIN is valued error code for failures other than we see in the impersonification case. What I'm trying to say here is based on my testing and understanding, the other WSATRY_AGAIN would not be synchronous failure.
In such cases, we should return transient failure to caller rather than re-trying on thread-pool IMHO.

{
result = await ((Task<T>)NameResolutionPal.GetAddrInfoAsync(hostName, justAddresses, addressFamily, cancellationToken)).ConfigureAwait(false);
return result;
//return new Tuple<bool, Task>(true, (Task)InternalGetAddrInfoWithTelemetryAsync(task, hostName));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@wfurt
Copy link
Member Author

wfurt commented Jan 12, 2021

System.Speech.Tests test failure seems unrelated.

@wfurt wfurt merged commit 1655959 into dotnet:master Jan 12, 2021
@wfurt
Copy link
Member Author

wfurt commented Jan 12, 2021

/backport to release/5.0

@github-actions
Copy link
Contributor

Started backporting to release/5.0: https://github.com/dotnet/runtime/actions/runs/479577114

@github-actions
Copy link
Contributor

@wfurt backporting to release/5.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: fallback to GetAddrInfoW if GetAddrInfoExW fails
error: sha1 information is lacking or useless (src/libraries/System.Net.NameResolution/src/System/Net/Dns.cs).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 fallback to GetAddrInfoW if GetAddrInfoExW fails
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dns.GetHostAddressesAsync fails with SocketException when called during impersonation
10 participants