Closed
Description
I was testing some new code for the Go build system and found that a simple TCP dial doesn't work on Mac anymore, at least when the binary is cross-compiled.
Code is just:
var coordDialer = &net.Dialer{
	Timeout:   10 * time.Second,
	KeepAlive: 15 * time.Second,
}

// dialCoordinatorTCP returns a TCP connection to the coordinator, making
// a CONNECT request to a proxy as a fallback.
func dialCoordinatorTCP(ctx context.Context, addr string) (net.Conn, error) {
	tcpConn, err := coordDialer.DialContext(ctx, "tcp", addr)
... with a context.Background() for ctx.
It always times out after 10 seconds.
But if I redeploy the same code but built with Go 1.12.x, it works fine.
bradfitz commented on Apr 26, 2019
My best guess is f6b42a5 ("net: use libSystem bindings for DNS resolution on macos if cgo is unavailable").
We might need some more test coverage. Or a no-cgo darwin builder.
/cc @ianlancetaylor @grantseltzer
Title changed from "net: buildlet doesn't work on darwin-amd64 with Go master" to "net: cross-compiled cgo-less buildlet doesn't work on darwin-amd64 with Go master"
groob commented on Apr 28, 2019
FWIW I can't seem to reproduce with a binary compiled on a 10.14.4 Mac. I tested building with cgo disabled and setting GODEBUG=netdns=go.
bradfitz commented on Apr 29, 2019
I built on Linux and ran on a Mac, without setting any special environment variables.
grantseltzer commented on Apr 29, 2019
Could this be because when you cross-compile on Linux the linker doesn't have access to libSystem?
Not sure how this is done for every other binding when cross-compiling.
Also, is this with CGO enabled, netcgo not specified, cross-compiled for darwin on Linux?
bradfitz commented on Apr 29, 2019
@randall77?
bradfitz commented on Apr 30, 2019
We just disabled cgo support for darwin/386 (per #31751) so we now have a CGO_ENABLED=0 Mac builder, which now hits this issue. Which is good in that we can reproduce it.
Looks like it's stuck in DNS queries, so f6b42a5 looks implicated.
https://build.golang.org/log/289a154e730768cccbc64dd0ea2af16b4b48db88
randall77 commented on Apr 30, 2019
I don't think this should matter. We don't actually need access to libSystem to build a binary which dynamically links to it. Building on Linux and running on a Mac should work fine with regard to this feature.
gopherbot commented on Apr 30, 2019
Change https://golang.org/cl/174637 mentions this issue:
dashboard: add darwin-amd64-nocgo config, remove nacl-386 trybot
randall77 commented on Jun 5, 2019
I don't think the res_search in /usr/lib/system/libsystem_info.dylib ever makes it to res_9_search in /usr/lib/libresolv.dylib. The following files all just reference each other; there's no path to libresolv.dylib from our root (libSystem.B.dylib):
randall77 commented on Jun 5, 2019
It wouldn't be hard to try adding libresolv.dylib to our imports and renaming our resolver from res_search to res_9_search.
gopherbot commented on Jun 5, 2019
Change https://golang.org/cl/180838 mentions this issue:
net: skip questions before parsing answers
rsc commented on Jun 5, 2019
I can't figure out what CL 166297 intended to fix or why it was important to improve anything in non-cgo-only mode. I have a bunch of fixes for it that make it resolve names successfully, which I will send out, but then I will send a CL deleting it entirely. If at some later point someone wants to bring it back, that's fine, provided they explain why.
gopherbot commented on Jun 6, 2019
Change https://golang.org/cl/180843 mentions this issue:
net: remove non-cgo macOS resolver code
gopherbot commented on Jun 6, 2019
Change https://golang.org/cl/180842 mentions this issue:
net: fix non-cgo macOS resolver code
grantseltzer commented on Jun 6, 2019
@rsc the idea is that Darwin's DNS logic can be used instead of the netgo library when cgo is disabled. There's a lot of overlap, but /etc/resolver files are the particular example.
The problem I see is that when distributing binaries to darwin hosts, it's likely most common to disable cgo. This means that before CL 166297, tools like the ones by HashiCorp would not support /etc/resolver files. At my company and those of others I talked to in person and on Slack, this was an issue.
In terms of testing, shamefully I was just doing it manually, but I will gladly work to write up tests. I'm taking a look at your CLs now, thanks for the help and feedback!
rsc commented on Jun 6, 2019
@grantseltzer, I don't believe non-cgo builds of such tools are working at all today; literally all DNS lookups seem to fail, not just ones involving /etc/resolver. Even once Go-side bugs are fixed, libsystem_info's res_search fails at PTR and CNAME queries; it may also not be thread safe; and it appears not to pay any attention to /etc/resolver. Perhaps switching to libresolv will help; perhaps not. If you'd like to reintroduce a revised version of the code for Go 1.14, that's fine. For Go 1.13, though, we'll revert things to the way they were in Go 1.12. Thanks.
runtime: use default system stack size, not 64 kB, on non-cgo macOS
grantseltzer commented on Jun 6, 2019
@rsc although your changes in 180842 make a lot of sense to me, I'm quite sure that they at least work enough to honor the /etc/resolver files. The thing I don't understand is that they're only working in non-test files. That behavior is worth exploring.
With that in mind, it is a lot of added complexity for a corner case. I'll investigate to see how else I can get this to work, even if you are removing the code for now. Thanks to you as well.
net: fix non-cgo macOS resolver code