[Dnsmasq-discuss] refused responses for simple hostnames, domain-needed, and no upstream servers

Simon Kelley simon at thekelleys.org.uk
Tue Jan 19 21:30:07 GMT 2016


On 17/01/16 23:18, Legacy, Allain wrote:
> Hi, We have noticed an inconsistency in how dnsmasq responds to
> queries for simple hostnames (no dots) depending on whether there are
> any configured upstream servers or not.   I am unsure if this is
> because we have misconfigured something, whether we are trying to do
> something that is not supported (or shouldn't be attempted), or if
> there is a bug in dnsmasq.
> 
> The scenario we are trying to implement is as follows.
> 
> +  We have a system with several nodes on the same private network.
> Most of the nodes have addresses assigned by dnsmasq via DHCP while a
> select few of those nodes have addresses in /etc/hosts on the node
> running dnsmasq.
> 
> +  The hostnames of the nodes are simple hostnames with no domain
> (e.g., "server1", "server2", etc.).
> 
> +  Some of the nodes have an IPv4 or IPv6 address while others have
> both IPv4 and IPv6.
> 
> +  Clients running on each node will attempt to resolve their peer
> node names with commands such as "curl http://server1/foobar.txt",
> "ping6 server10",  "dig server2 any", and so on.
> 
> +  Clients have a simple /etc/resolv.conf file with only the IP
> address of the server running dnsmasq.  The resolv.conf has no
> default search domain.
> 
> +  We support allowing the dnsmasq server to be configured with
> additional upstream servers if the situation requires accessing DNS
> over the system's public network interface.
> 
> +  The dnsmasq server is configured with the "domain-needed" option
> so that requests for nodes that have not been configured yet do not
> get forwarded to upstream servers (if configured).
> 
> 
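For readers trying to reproduce the setup described above, a hypothetical configuration might look like the following. The addresses, ranges and host entries are illustrative assumptions (documentation prefixes), not values taken from the report; only domain-needed is named in the report itself, and no-resolv is one assumed way of running with no upstream servers.

```
# /etc/dnsmasq.conf on the server (hypothetical values)
# Never forward names without a dot.
domain-needed
# No upstream servers: do not read /etc/resolv.conf.
no-resolv
dhcp-range=192.0.2.50,192.0.2.150,12h
dhcp-range=2001:db8::100,2001:db8::1ff,64,12h

# /etc/hosts on the server (statically addressed nodes)
192.0.2.10     server1
2001:db8::20   server2

# /etc/resolv.conf on each client (no search domain)
nameserver 192.0.2.1
```

In this sketch "server2" has only an IPv6 address, which is the kind of node the report says fails to resolve.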
> Here is the issue.
> 
> When we test with only IPv4 addresses throughout the system everything
> works as expected and we do not see any obvious issues or errors.
> 
> When we test with a mixture of IPv4, IPv6 or both IPv4 and IPv6
> addresses on the nodes we see failures to resolve our simple
> hostnames.  The failures manifest themselves as typical "cannot
> resolve hostname... " errors from whatever client is being run at the
> time.   The failures don't happen on all nodes but we have been able
> to correlate the failures to those nodes that have an IPv6 address
> but have no IPv4 address.   ...and this only happens when we have no
> upstream servers configured; if we configure some upstream servers
> then there are no failures.
> 
> Running tcpdump and strace on commands such as "curl
> http://server1/foobar.txt" we noticed that the client DNS resolver
> sends out both an A query and an AAAA query.  This is normal as we do
> not want to force a "-4" or "-6" option on any clients; we want
> either IPv4 or IPv6 addresses to be returned without needing to know
> ahead of time what to ask for.  The tcpdump traces show that a
> response is returned for both the A and AAAA queries.  The A response
> has a status of REFUSED while the AAAA has a valid response with the
> expected IPv6 address.  Looking at the client DNS resolver code
> (glibc getaddrinfo()) we have noted that if the first response
> returned has a "REFUSED" status then the operation is aborted
> without considering the AAAA response.
> 
> Running this same test while we have upstream servers configured in
> dnsmasq we have noted that the A query returns successfully with no
> data (instead of REFUSED as in the first test), and the AAAA returns
> successfully with an IPv6 address as it did before.  Under these
> circumstances the client DNS resolver returns with the IPv6 address
> instead of an error since it didn't get a REFUSED on the first
> response received.
> 
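The client-side effect described in the two paragraphs above can be modelled with a small sketch. This is not glibc code: resolve_model() and struct dns_reply are hypothetical names, and the logic only encodes the behaviour as the report describes it (a REFUSED in the first reply aborts the lookup).

```c
#include <assert.h>
#include <stddef.h>

/* DNS response codes used in this model (values from RFC 1035). */
enum { RCODE_NOERROR = 0, RCODE_REFUSED = 5 };

/* Hypothetical, simplified view of one reply (A or AAAA). */
struct dns_reply {
    int rcode;    /* response code of the reply */
    int n_addrs;  /* number of addresses it carries */
};

/* Model of the reported client behaviour: returns the number of
   usable addresses, or -1 when the lookup fails.  A REFUSED status
   in the first reply aborts without looking at later replies. */
int resolve_model(const struct dns_reply *replies, size_t n)
{
    int total = 0;

    assert(replies != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i == 0 && replies[i].rcode == RCODE_REFUSED)
            return -1;
        if (replies[i].rcode == RCODE_NOERROR)
            total += replies[i].n_addrs;
    }
    return total > 0 ? total : -1;
}
```

With replies {REFUSED, 0} then {NOERROR, 1} the model fails, matching the case with no upstream servers; with {NOERROR, 0} then {NOERROR, 1} it yields one address, matching the case with upstream servers.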
> Looking through the dnsmasq code we think we have identified a bug
> but are looking for an opinion about whether we are doing something
> wrong or whether this is a legitimate issue.
> 
> What we think is a bug is that OPT_NODOTS_LOCAL (domain-needed)
> is only checked when there is at least one upstream server
> (forward.c::search_servers()).  When there are servers and
> OPT_NODOTS_LOCAL is set then an empty response is returned for an A
> query that does not resolve to an IPv4 address.   Unfortunately, when
> there are no servers configured this code is not reached and instead
> a REFUSED is returned for an A query that has no IPv4 address.  It is
> this REFUSED response that is causing grief at the client resolver.
> 
> 
> It is my opinion that the check for OPT_NODOTS_LOCAL should be
> performed in forward.c::receive_query() when an answer is not found
> by forward.c::answer_query() instead of calling forward_query().  I
> have attached a patch file which adds an additional IF statement at
> the top of forward_query() to illustrate what I mean.  Note: as I
> said, I believe the proper way to fix this is in receive_query()
> before calling forward_query() at all, but it was easier to prototype
> this directly inside of forward_query() since the reply code already
> existed there.
> 
> Can you comment on whether this is a configuration/usecase issue or
> whether the behavior described requires a code change?
> 
> Regards, Allain
> 
> 

Well done for coming to terms with the most gnarly, old and horrible
code in dnsmasq. I just bottled-out of totally rewriting this. It needs
to be done, but just capturing all the existing behaviour is a nightmare.

I can't disagree with the bug report or diagnosis at all. My fix is a
bit simpler: it just moves the test for daemon->servers being NULL to
after the call to search_servers(). Whilst looking at the code, I noticed
that the response when out of memory is wrong too, so the commit also
fixes that.
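The effect of reordering the two tests can be illustrated with a simplified model. This is only a sketch, not the code in forward.c: the function names, the enum and the dot check are hypothetical stand-ins for the real logic, which handles many more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified outcomes for a query that could not be answered locally. */
enum outcome { REPLY_EMPTY, REPLY_REFUSED, FORWARD_UPSTREAM };

/* Hypothetical stand-in for the domain-needed check: true when the
   name has no dots and domain-needed is set, so the query must be
   answered locally and never forwarded. */
bool handled_locally(const char *name, bool domain_needed)
{
    assert(name != NULL);
    return domain_needed && strchr(name, '.') == NULL;
}

/* Old ordering: the "no upstream servers" test runs first, so the
   domain-needed check is never reached when there are no servers. */
enum outcome decide_before_fix(const char *name, bool domain_needed,
                               int n_servers)
{
    if (n_servers == 0)
        return REPLY_REFUSED;
    if (handled_locally(name, domain_needed))
        return REPLY_EMPTY;
    return FORWARD_UPSTREAM;
}

/* New ordering: the domain-needed check runs first; the "no upstream
   servers" test only applies to names that would be forwarded. */
enum outcome decide_after_fix(const char *name, bool domain_needed,
                              int n_servers)
{
    if (handled_locally(name, domain_needed))
        return REPLY_EMPTY;
    if (n_servers == 0)
        return REPLY_REFUSED;
    return FORWARD_UPSTREAM;
}
```

In this model, a query for "server1" with domain-needed set and no upstream servers is refused under the old ordering but gets an empty reply under the new one, which is the change the report asked for.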

Code in the git repo now. Please could you check that it behaves as you
expect?


Cheers,

Simon.


