[Dnsmasq-discuss] NXDOMAIN on existing A record

Wed Jul 10 10:31:54 BST 2019

Hello Alex,

I would try removing the all-servers and clear-on-reload statements. I
would use just one server for testing, then retest each of them for the
same behaviour. When you do not know which server is being used, it is
hard to debug.
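A toy Python model (not dnsmasq's code) of why all-servers hides the culprit. The dnsmasq man page says that with all-servers the query goes to all available servers and the reply from the server that answers first is returned. The latencies and RCODEs below are made up for illustration: if one upstream happens to answer NXDOMAIN fastest, that is what the client sees, while asking each server separately shows which one disagrees.

```python
from dataclasses import dataclass


@dataclass
class UpstreamReply:
    server: str        # upstream address
    latency_ms: float  # hypothetical time until this server's reply arrives
    rcode: str         # e.g. "NOERROR" or "NXDOMAIN"


def all_servers_answer(replies):
    """Toy model of all-servers: every upstream is asked, the fastest reply wins."""
    if not replies:
        raise ValueError("no upstream replies")
    return min(replies, key=lambda r: r.latency_ms)


def per_server_verdicts(replies):
    """Testing one server at a time: each server's own answer, side by side."""
    return {r.server: r.rcode for r in replies}


if __name__ == "__main__":
    # Hypothetical replies from the three .consul upstreams in Alex's config.
    replies = [
        UpstreamReply("10.0.73.43", 12.0, "NOERROR"),
        UpstreamReply("10.0.73.40", 3.0, "NXDOMAIN"),
        UpstreamReply("10.0.73.28", 8.0, "NOERROR"),
    ]
    print(all_servers_answer(replies))
    print(per_server_verdicts(replies))
```

The model ignores retries and timeouts on purpose; it only shows that "first reply wins" makes the visible answer depend on timing rather than on any one server.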

I think the dots in server=/.X/ are not necessary and maybe even
misleading. Try it without them: just server=/X/ip
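A small Python model of how the server= lines route a query, written against the dot-free form. It assumes the rule from the dnsmasq man page that more specific domains take precedence over less specific ones. parse_server_lines and pick_upstreams are hypothetical helpers, not part of dnsmasq; the parser handles only one domain per server= line, and CONFIG is an abbreviated, dot-free version of Alex's server lines.

```python
def parse_server_lines(lines):
    """Build {domain: [upstreams]} from dnsmasq server= lines.

    Handles only the two forms used in this thread: server=IP (default
    upstreams, stored under "") and server=/domain/IP (one domain per line).
    """
    rules = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("server="):
            continue
        value = line[len("server="):]
        if value.startswith("/"):
            _, domain, addr = value.split("/", 2)
        else:
            domain, addr = "", value
        rules.setdefault(domain.lower(), []).append(addr)
    return rules


def pick_upstreams(qname, rules):
    """Return the upstreams of the most specific matching domain, else the defaults."""
    labels = qname.lower().rstrip(".").split(".")
    # Longest suffix first: xyz.service.consul, service.consul, consul.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules:
            return rules[suffix]
    return rules.get("", [])


# Abbreviated, dot-free version of the server lines from Alex's config.
CONFIG = """
server=10.0.48.12
server=10.0.48.11
server=/la.consul/10.0.73.43
server=/chi-pbx.consul/10.1.73.1
server=/consul/10.0.73.43
server=/consul/10.0.73.40
""".splitlines()

if __name__ == "__main__":
    rules = parse_server_lines(CONFIG)
    for name in ("xyz.service.consul", "web.la.consul", "example.com"):
        print(name, pick_upstreams(name, rules))
```

Under this model xyz.service.consul is routed only to the /consul/ upstreams, so those are the servers to test one at a time.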

I think a one-second timeout is too short. Use only localhost in
/etc/resolv.conf and debug what happens in dnsmasq. Record which
queries are sent to dnsmasq and what dnsmasq forwards to the configured
servers.
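Once log-queries and log-facility are uncommented in Alex's config, a minimal Python sketch like this one can pull the history of a single name out of the log. The line shapes in the comment are an assumption based on typical log-queries output; check them against your own log before trusting the counts. trace and nxdomain_sources are hypothetical helpers. The point is to tell an NXDOMAIN freshly returned by an upstream ("reply") apart from one served out of dnsmasq's cache ("cached").

```python
import re
import sys

# Assumed shape of dnsmasq log-queries entries (check against your own log):
#   ... query[A] xyz.service.consul from 127.0.0.1
#   ... forwarded xyz.service.consul to 10.0.73.43
#   ... reply xyz.service.consul is NXDOMAIN
#   ... cached xyz.service.consul is NXDOMAIN
EVENT = re.compile(r"\b(query\[\w+\]|forwarded|reply|cached) (\S+) (?:from|to|is) (.+)$")


def trace(lines, name):
    """Return (event, detail) pairs for one query name, in log order."""
    events = []
    for line in lines:
        m = EVENT.search(line)
        if m and m.group(2) == name:
            events.append((m.group(1), m.group(3).strip()))
    return events


def nxdomain_sources(events):
    """Count NXDOMAIN answers that came from upstream replies vs. cache hits."""
    upstream = sum(1 for kind, detail in events if kind == "reply" and detail == "NXDOMAIN")
    cached = sum(1 for kind, detail in events if kind == "cached" and detail == "NXDOMAIN")
    return {"upstream": upstream, "cached": cached}


if __name__ == "__main__":
    # Usage: python trace.py /var/log/dnsmasq.log xyz.service.consul
    path, name = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as log:
        events = trace(log, name)
    for kind, detail in events:
        print(kind, detail)
    print(nxdomain_sources(events))
```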

Note that I have already discovered that requests without the recursion
desired (RD) bit set are always forwarded and are never answered from
local records. That should not be the issue here, but try dig +rec and
dig +norec to rule it out.
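If dig is not at hand, here is a minimal Python sketch of the same +rec/+norec comparison. It builds a raw DNS query by hand (header layout and the RD bit as defined in RFC 1035), sends it over UDP to one server, and prints the RCODE. The server address and query name come from this thread; build_query, rcode_name and ask are hypothetical helpers, the 3-second timeout is an arbitrary choice, and only the response header is decoded, not the answer records.

```python
import random
import socket
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the DNS header flags word
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, rd=True, qtype=1):
    """Build a DNS query for `name` (qtype 1 = A), with or without the RD bit."""
    qid = random.randint(0, 0xFFFF)
    flags = RD_FLAG if rd else 0
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def rcode_name(response):
    """Decode the RCODE from the low 4 bits of the response flags."""
    (flags,) = struct.unpack("!H", response[2:4])
    code = flags & 0x000F
    return RCODES.get(code, str(code))


def ask(server, name, rd, timeout=3.0):
    """Send one UDP query to `server` port 53 and return the RCODE name."""
    query = build_query(name, rd=rd)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        response, _ = sock.recvfrom(4096)
    if response[:2] != query[:2]:
        raise RuntimeError("reply ID does not match query ID")
    return rcode_name(response)


if __name__ == "__main__":
    for rd in (True, False):
        print("+rec" if rd else "+norec", ask("127.0.0.1", "xyz.service.consul", rd))
```

Pointing ask at each upstream address in turn, instead of 127.0.0.1, also gives a per-server answer.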

Regards,
Petr

On 7/7/19 10:28 PM, Alex Litvak wrote:
> (lack of sleep, fixing some mistakes in the text)
> 
> Hello everyone,
> 
> I run Consul services on my network; services are registered as
> <xyz>.service.consul when they start. All containers and bare-metal
> hosts run dnsmasq 2.80.
> I noticed that if I restart one of the containers, one of the hosts
> keeps failing to resolve the service name. I assume that dnsmasq is
> the culprit because:
> 
> 1. I can resolve xyz.service.consul against the standard DNS servers
> with dig.
> 2. dnsmasq, listening on 127.0.0.1, is the first nameserver in
> /etc/resolv.conf, and when I run tcpdump on port 53 of interface lo I
> see it return NXDOMAIN for every A record query for the service in
> question.
> 3. If I restart dnsmasq, everything is back to normal. Likewise, if I
> send SIGHUP to dnsmasq, which clears its cache and rereads the
> /etc/hosts file, service resolution is back to normal.
> 
> This problem happens only on some hosts, with no pattern I can
> recognize. For example, I have two nodes with the same config, OS,
> kernel version, dnsmasq version, etc., and one of them has the problem
> 100% of the time after xyz.service.consul restarts while the other
> does not.
> 
> Where do I start troubleshooting? Any ideas are welcome.
> 
> Here is the standard dnsmasq configuration:
> 
> port=53
> domain-needed
> bogus-priv
> interface=lo
> listen-address=127.0.0.1
> no-dhcp-interface=127.0.0.1
> #bind-interfaces
> no-resolv
> all-servers
> dns-forward-max=500
> 
> # If you don't want dnsmasq to read /etc/hosts, uncomment the
> # following line.
> #no-hosts
> # or if you want it to read another file, as well as /etc/hosts, use
> # this.
> #addn-hosts=/etc/banner_add_hosts
> 
> #log-queries=extra
> #log-facility=/var/log/dnsmasq.log
> log-async=25
> 
> # Set the cachesize here.
> cache-size=10000
> min-cache-ttl=5
> #neg-ttl=3600
> 
> # If you want to disable negative caching, uncomment this.
> #no-negcache
> 
> # For debugging purposes, log each DNS query as it passes through
> # dnsmasq.
> #log-queries
> clear-on-reload
> 
> server=10.0.48.12
> server=10.0.48.11
> server=10.0.21.63
> server=10.0.21.61
> 
> server=/.la.consul/10.0.73.43
> server=/.la.consul/10.0.73.40
> server=/.la.consul/10.0.73.28
> server=/.chi-pbx.consul/10.1.73.1
> server=/.chi-pbx.consul/10.1.73.2
> server=/.chi-pbx.consul/10.1.73.3
> server=/.consul/10.0.73.43
> server=/.consul/10.0.73.40
> server=/.consul/10.0.73.28
> 
> Resolver config (/etc/resolv.conf):
> 
> search ''
> options  timeout:1 attempts:1
> nameserver 127.0.0.1
> nameserver 10.0.48.11
> nameserver 10.0.48.12
> nameserver 10.0.21.63
> 
> 
> 
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss at lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss

-- 
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemensik at redhat.com  PGP: 65C6C973