Don't use localhost


I recently ran into a bug where the compiled version of Simple IoT would not work on one of my computers. Interestingly, go run ... worked fine. What was the difference?

At first, it seemed a race condition was at play: the compiled version was faster and would likely have different timing characteristics, and SIOT is a highly concurrent application with many clients running in parallel. The symptom was that the NATS client in SIOT could not get data from the embedded NATS server. However, after spending some time debugging, I could not find any issues with startup. Still suspecting a race condition, I tried an external NATS server instead of the embedded one, with no difference.

I finally captured the network traffic with Wireshark and saw no NATS traffic at all in the failing version: the NATS client was not even sending any requests. I then looked at /etc/hosts and found no localhost entry. After adding one, everything worked properly.

So why did the compiled binary fail when go run ... worked? My best guess is that the compiled binary had CGO disabled while go run did not. That would make Go's net package use a different DNS resolver, and apparently the pure Go resolver does not resolve localhost if it is not in your /etc/hosts. (This is not verified, just my best guess at the moment.)
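One way to check the CGO guess directly is to ask the binary how it was built. Since Go 1.18, `runtime/debug.ReadBuildInfo` exposes the build settings recorded in the binary, including a `CGO_ENABLED` key. The sketch below prints that setting; the helper name `cgoSetting` is made up for this example, and it assumes a binary built with Go 1.18 or newer.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// cgoSetting returns the CGO_ENABLED value recorded in the running
// binary's build info. The bool is false if the binary has no build
// info or the setting was not recorded.
func cgoSetting() (string, bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "", false
	}
	for _, s := range info.Settings {
		if s.Key == "CGO_ENABLED" {
			return s.Value, true
		}
	}
	return "", false
}

func main() {
	if v, ok := cgoSetting(); ok {
		fmt.Println("CGO_ENABLED =", v)
	} else {
		fmt.Println("CGO_ENABLED not recorded in build info")
	}
}
```

Without changing any code, `go version -m <binary>` prints the same build settings for an existing executable; look for the `CGO_ENABLED` line.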

On Unix systems, the resolver has two options for resolving names. It can use a pure Go resolver that sends DNS requests directly to the servers listed in /etc/resolv.conf, or it can use a cgo-based resolver that calls C library routines such as getaddrinfo and getnameinfo.

By default the pure Go resolver is used, because a blocked DNS request consumes only a goroutine, while a blocked C call consumes an operating system thread. When cgo is available, the cgo-based resolver is used instead under a variety of conditions: on systems that do not let programs make direct DNS requests (OS X), when the LOCALDOMAIN environment variable is present (even if empty), when the RES_OPTIONS or HOSTALIASES environment variable is non-empty, when the ASR_CONFIG environment variable is non-empty (OpenBSD only), when /etc/resolv.conf or /etc/nsswitch.conf specify the use of features that the Go resolver does not implement, and when the name being looked up ends in .local or is an mDNS name.
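The rules quoted above can be condensed into a rough decision function. This is a simplified model for illustration only, not the code in Go's net package: the name `preferCgoResolver` is hypothetical, the /etc/resolv.conf and /etc/nsswitch.conf checks are omitted, and real behavior varies between Go versions. The check that matters for this bug is the first one: a binary built with CGO_ENABLED=0 has no cgo resolver to switch to.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// preferCgoResolver is a simplified, hypothetical model of the quoted rules.
// It is NOT Go's implementation: resolv.conf/nsswitch.conf checks are omitted.
// lookupEnv has the same shape as os.LookupEnv so tests can inject values.
func preferCgoResolver(cgoAvailable bool, goos, name string, lookupEnv func(string) (string, bool)) bool {
	if !cgoAvailable {
		return false // the quoted rules only apply when cgo is available
	}
	if goos == "darwin" {
		return true // OS X: programs may not make direct DNS requests
	}
	if _, ok := lookupEnv("LOCALDOMAIN"); ok {
		return true // LOCALDOMAIN counts even when empty
	}
	for _, key := range []string{"RES_OPTIONS", "HOSTALIASES"} {
		if v, _ := lookupEnv(key); v != "" {
			return true
		}
	}
	if v, _ := lookupEnv("ASR_CONFIG"); v != "" && goos == "openbsd" {
		return true // ASR_CONFIG is only consulted on OpenBSD
	}
	// Names ending in .local (mDNS) are handed to the C library.
	if strings.HasSuffix(strings.TrimSuffix(name, "."), ".local") {
		return true
	}
	return false
}

func main() {
	// A CGO_ENABLED=0 build on Linux: always the pure Go resolver.
	fmt.Println(preferCgoResolver(false, "linux", "localhost", os.LookupEnv))
}
```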

After discussing with @khem, we decided that instead of using localhost, 127.0.0.1 was probably more reliable. Interestingly, others have reached the same conclusion (see videos below). The reasons include:

  • a name lookup takes extra time
  • localhost can resolve to two addresses: IPv4 (127.0.0.1) and IPv6 (::1)
  • IPv6 can cause problems (for example, when a server listens only on the IPv4 loopback)
  • some apps/machines don't support localhost or don't have it configured correctly (like mine)
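To illustrate the two-address point: many Linux distributions ship an /etc/hosts that maps localhost to both 127.0.0.1 and ::1, so a lookup can return one address per family and the client has to choose or try each one. The hypothetical helper `splitByFamily` below simply sorts address strings by family using the standard net package.

```go
package main

import (
	"fmt"
	"net"
)

// splitByFamily separates address strings into IPv4 and IPv6 lists.
// Strings that do not parse as IP addresses are skipped.
func splitByFamily(addrs []string) (v4, v6 []string) {
	for _, a := range addrs {
		ip := net.ParseIP(a)
		switch {
		case ip == nil:
			continue // not an IP address
		case ip.To4() != nil:
			v4 = append(v4, a)
		default:
			v6 = append(v6, a)
		}
	}
	return v4, v6
}

func main() {
	// Example input: what a lookup of "localhost" may return when
	// /etc/hosts lists both loopback addresses.
	v4, v6 := splitByFamily([]string{"127.0.0.1", "::1"})
	fmt.Println("IPv4:", v4, "IPv6:", v6) // IPv4: [127.0.0.1] IPv6: [::1]
}
```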

So to keep things simple and more likely to work in most cases, Simple IoT now sets the default NATS server to 127.0.0.1.
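The effect of that choice can be sketched with the standard library alone. This is not SIOT's code, and `echoOnce` is a hypothetical helper: the server binds to the IPv4 loopback literal and the client dials that literal, so neither /etc/hosts nor any resolver is consulted. Port 0 lets the OS pick a free port, whereas a NATS server normally listens on 4222.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// echoOnce binds a TCP listener to 127.0.0.1 on an OS-assigned port,
// dials that literal address, sends msg, and returns the line the server
// echoes back. No hostname is resolved anywhere.
func echoOnce(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// Server side: accept a single connection and echo one line back.
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		line, err := bufio.NewReader(conn).ReadString('\n')
		if err != nil {
			return
		}
		_, _ = conn.Write([]byte(line)) // a failed write surfaces as a client read error
	}()

	// Client side: ln.Addr() is already "127.0.0.1:<port>", so no lookup happens.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	_ = conn.SetDeadline(time.Now().Add(5 * time.Second)) // avoid hanging forever

	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSuffix(reply, "\n"), nil
}

func main() {
	reply, err := echoOnce("ping")
	fmt.Println(reply, err) // prints "ping <nil>" when loopback networking works
}
```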

The pure Go resolver is pretty robust, especially compared to what other programming languages offer. Node.js, for example, has dns.lookup(), which calls getaddrinfo on the Node.js thread pool, and dns.resolve(), which uses the c-ares library under the hood to perform an actual DNS request over the network asynchronously. By default, Node.js uses the OS resolver (i.e. getaddrinfo via dns.lookup()), even though this can be more expensive and can fill the Node.js thread pool if many requests are issued simultaneously. My guess is that c-ares doesn't have enough features to guarantee that DNS requests are fulfilled the same way the OS would fulfill them.

Anyway, the fact that the pure Go resolver is the default is amazing and unusual.

In any case, a DNS request for localhost should never work if it’s not present in /etc/hosts.

If you’re interested in seeing which resolver is being used, maybe try this:

A numeric netdns setting, as in GODEBUG=netdns=1, causes the resolver to print debugging information about its decisions. To force a particular resolver while also printing debugging information, join the two settings by a plus sign, as in GODEBUG=netdns=go+1.

… or just bind to 127.0.0.1 instead. :smile:
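To try the netdns debug setting on the exact case from this thread, a small program that looks up localhost with the default resolver is enough. Run it as `GODEBUG=netdns=1 go run .` (or `GODEBUG=netdns=go+1` to force the pure Go resolver), and build it once with `CGO_ENABLED=0` and once with `CGO_ENABLED=1` to compare. The `lookup` helper is illustrative only. Note that Go returns an IP literal such as 127.0.0.1 as-is, without asking any resolver.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookup resolves host with Go's default resolver and a short timeout.
func lookup(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, host)
}

func main() {
	// "localhost" goes through the resolver; "127.0.0.1" is returned as-is.
	for _, host := range []string{"localhost", "127.0.0.1"} {
		addrs, err := lookup(host)
		fmt.Printf("%s -> %v (err: %v)\n", host, addrs, err)
	}
}
```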

Thanks for the additional notes and insight.

I ran both the compiled version and the siot_watch version (which uses air):

Compiled

[cbrake@quark go]$ siot_build

✓ Build successful! (2943ms)

go package net: cgo resolver not supported; using Go's DNS resolver

Notice that it prints the message at build time.

Go run, air

When SIOT is run with these methods, nothing is printed about the resolver.