[Dnsmasq-discuss] refused responses for simple hostnames, domain-needed, and no upstream servers

Legacy, Allain Allain.Legacy at windriver.com
Tue Jan 19 21:51:27 GMT 2016


> -----Original Message-----
> From: Dnsmasq-discuss [mailto:dnsmasq-discuss-
> bounces at thekelleys.org.uk] On Behalf Of Simon Kelley
> Sent: Tuesday, January 19, 2016 4:30 PM
> To: dnsmasq-discuss at thekelleys.org.uk
> Subject: Re: [Dnsmasq-discuss] refused responses for simple hostnames,
> domain-needed, and no upstream servers
> 
> On 17/01/16 23:18, Legacy, Allain wrote:
> > Hi, We have noticed an inconsistency in how dnsmasq responds to
> > queries for simple hostnames (no dots) depending on whether there are
> > any configured upstream servers or not.   I am unsure if this is
> > because we have misconfigured something, whether we are trying to do
> > something that is not supported (or shouldn't be attempted), or if
> > there is a bug in dnsmasq.
> >
> > The scenario we are trying to implement is as follows.
> >
> > +  We have a system with several nodes on the same private network.
> > Most of the nodes have addresses assigned by dnsmasq via DHCP while a
> > select few of those nodes have addresses in /etc/hosts on the node
> > running dnsmasq.
> >
> > +  The hostnames of the nodes are simple hostnames with no domain
> > (e.g.,  "server1", "server2", etc. ).
> >
> > +  Some of the nodes have an IPv4 or IPv6 address while others have
> > both IPv4 and IPv6.
> >
> > +  Clients running on each node will attempt to resolve their peer
> > node names with commands such as "curl http://server1/foobar.txt",
> > "ping6 server10",  "dig server2 any", and so on.
> >
> > +  Clients have a simple /etc/resolv.conf file with only the IP
> > address of the server running dnsmasq.  The resolv.conf has no default
> > search domain.
> >
> > +  We support allowing the dnsmasq server to be configured with
> > additional upstream servers if the situation requires accessing DNS
> > over the system's public network interface.
> >
> > +  The dnsmasq server is configured with the "domain-needed" option
> > so that requests for nodes that have not been configured yet do not
> > get forwarded to upstream servers (if configured).
> >
> >
> > Here is the issue.
> >
> > When we test with only IPv4 addresses throughout the system everything
> > works as expected and we do not see any obvious issues or errors.
> >
> > When we test with a mixture of IPv4, IPv6 or both IPv4 and IPv6
> > addresses on the nodes we see failures to resolve our simple
> > hostnames.  The failures manifest themselves as typical "cannot
> > resolve hostname... " errors from whatever client is being run at the
> > time.   The failures don't happen on all nodes but we have been able
> > to correlate the failures to those nodes that have an IPv6 address
> > but have no IPv4 address.   ...and this only happens when we have no
> > upstream servers configured; if we configure some upstream servers
> > then there are no failures.
> >
> > Running tcpdump and strace on commands such as "curl
> > http://server1/foobar.txt" we noticed that the client DNS resolver
> > sends out both an A query and AAAA query.  This is normal as we do not
> > want to force a "-4" or "-6" option on any clients as we want either
> > IPv4 or IPv6 addresses to be returned without needing to know
> > ahead of time what to ask for.    The tcpdump traces show that a
> > response is returned for both the A and AAAA query.  The A has a
> > status of REFUSED while the AAAA has a valid response with the
> > expected IPv6 address.   Looking at the client DNS resolver code
> > (glibc getaddrinfo()) we have noted that if the first response
> > returned has a "REFUSED" response then the operation is aborted
> > without considering the AAAA response.
> >
> > Running this same test while we have upstream servers configured in
> > dnsmasq we have noted that the A query returns successfully with no
> > data (instead of REFUSED as in the first test), and the AAAA returns
> > successfully with an IPv6 address as it did before.  Under these
> > circumstances the client DNS resolver returns with the IPv6 address
> > instead of an error since it didn't get a REFUSED on the first
> > response received.
> >
> > Looking through the dnsmasq code we think we have identified a bug but
> > are looking for an opinion about whether we are doing something wrong
> > or whether this is a legitimate issue.
> >
> > What we think is a bug is that the OPT_NODOTS_LOCAL (domain-needed)
> > is only checked where there is at least 1 upstream server
> > (forward.c::search_servers()).    When there are servers and
> > OPT_NODOTS_LOCAL is set then an empty response is returned for an A
> > query that does not resolve to an IPv4 address.   Unfortunately, when
> > there are no servers configured this code is not reached and instead a
> > REFUSED is returned for an A query that has no IPv4 address.  It is
> > this REFUSED response that is causing grief at the client resolver.
> >
> >
> > It is my opinion that the check for OPT_NODOTS_LOCAL should be
> > performed in forward.c::receive_query() when an answer is not found by
> > forward.c::answer_query() instead of calling forward_query().  I have
> > attached a patch file which adds an additional IF statement at
> > the top of forward_query() to illustrate what I mean.   Note:  as I
> > said, I believe the proper way to fix this is in receive_query()
> > before calling forward_query() at all, but it was easier to prototype
> > this directly inside of forward_query() since the reply code already
> > existed there.
> >
> > Can you comment on whether this is a configuration/usecase issue or
> > whether the behavior described requires a code change?
> >
> > Regards, Allain
> >
> >
> 
> Well done for coming to terms with the most gnarly, old and horrible code in
> dnsmasq. I just bottled-out of totally rewriting this. It needs to be done, but
> just capturing all the existing behaviour is a nightmare.
> 
> I can't disagree with the bug report or diagnosis at all. My fix is a bit simpler: it
> just moves the test for daemon->servers being NULL to after the call to
> search_servers. Whilst looking at the code, I noticed that the response when
> out of memory is wrong too, so the commit also fixes that.
> 
> Code in the git repo now. Please could you check that it behaves as you
> expect?
> 

[AL]  Thanks.   I'll take a look in the next couple of days and get back to you.
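
To pin down what needs checking, the reordering discussed above can be pictured with a toy model. This is a sketch in Python, not dnsmasq's C code: the function name reply() and its parameters are made up for illustration, and it assumes the name has already failed to resolve locally for the requested record type. All it encodes is the ordering described in this thread: before the fix the "no upstream servers" test came first, and after it that test follows the domain-needed check.

```python
def reply(name, servers, domain_needed, fixed):
    """Toy model of the reply chosen once local lookup has found no record.

    Hypothetical helper for illustration only; the real logic lives in
    dnsmasq's forward.c and is considerably more involved.
    """
    dotless = "." not in name.rstrip(".")

    # Old ordering: with no upstream servers we bail out immediately,
    # so the domain-needed check below is never reached.
    if not fixed and not servers:
        return "REFUSED"

    # domain-needed: dotless names are answered locally with an empty reply.
    if domain_needed and dotless:
        return "NOERROR, no data"

    # New ordering: the "no servers" test only applies after the check above.
    if not servers:
        return "REFUSED"

    return "forward upstream"


if __name__ == "__main__":
    for fixed in (False, True):
        for servers in ([], ["192.0.2.53"]):  # placeholder upstream address
            print(fixed, bool(servers), reply("server1", servers, True, fixed))
```

In this model the only case that changes is a dotless name with domain-needed set and no upstream servers, which moves from REFUSED to an empty NOERROR reply. That is the difference to look for when testing the commit.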

FWIW, I found that using "local=//" in my /etc/dnsmasq.conf file also improved the behavior without including my code change at all. 
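
A quick way to compare response codes with and without that setting, without going through getaddrinfo(), is to send the A and AAAA queries by hand and print the RCODE of each reply. The following is a minimal sketch using only the Python standard library. The function names are made up for this example, the server address is whatever the dnsmasq host is on your network, and it only reads the 12-byte reply header rather than parsing the answer records.

```python
import socket
import struct
import sys

QTYPE = {"A": 1, "AAAA": 28}
RCODE = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype, query_id=0x1234):
    """Build a minimal DNS query (one question, class IN, RD flag set)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", QTYPE[qtype], 1)
    return header + question


def parse_reply(packet):
    """Return (rcode name, answer count) taken from the reply header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    rcode = flags & 0x000F
    return RCODE.get(rcode, str(rcode)), ancount


def probe(server, name, qtype, timeout=2.0):
    """Send one UDP query to server:53 and summarise the reply."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        packet, _ = sock.recvfrom(512)
    return parse_reply(packet)


# Usage: python probe.py <dnsmasq-address> <hostname>
if __name__ == "__main__" and len(sys.argv) == 3:
    for qtype in ("A", "AAAA"):
        print(qtype, probe(sys.argv[1], sys.argv[2], qtype))
```

Run against a node that has only an IPv6 address, the case described earlier in the thread would show REFUSED for the A query when no upstream servers are configured, and NOERROR with zero answers once the behaviour is corrected.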

Regards,
Allain


