Hello Alex,

I would try removing the all-servers and clear-on-reload statements. I
would use just one server for testing, then retest with each of the
others in turn to check for the same behaviour. When you do not know
which server was used, it is hard to debug.

I think the leading dots in server=/.X/ are not necessary and may even
be misleading. Try it without them, just server=/X/ip
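
As a minimal sketch of such a test configuration (192.0.2.10 is only a
placeholder documentation address, not one of your real servers;
substitute a single one of your consul DNS servers):

```
# Test setup: one upstream only, no all-servers, no clear-on-reload.
no-resolv
# Domain written without the leading dot; 192.0.2.10 is a placeholder.
server=/consul/192.0.2.10
```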

I think a one-second timeout is too short. Use only localhost in
/etc/resolv.conf and debug what happens with dnsmasq. Record which
queries are sent to dnsmasq and what dnsmasq forwards to the configured
servers.
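
To capture that, something along these lines should work. The logging
options are the ones already commented out in your config, and the log
path is taken from there too; 127.0.0.1 is my assumption for the address
dnsmasq listens on:

```
# /etc/resolv.conf: only the local dnsmasq, default timeout and attempts
nameserver 127.0.0.1

# dnsmasq.conf: log each query; "extra" adds a serial number so a query
# can be matched to its "forwarded" and "reply" lines
log-queries=extra
log-facility=/var/log/dnsmasq.log
```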

Note that I have already found that requests without the recursion
desired bit set are always forwarded and are not answered from any local
records. That should not be the issue here, but try dig +rec and
dig +norec to rule it out.
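
For example, assuming dnsmasq listens on 127.0.0.1 and using
xyz.service.consul as a stand-in for the failing name, a small
hypothetical helper (check_rd is my own name, not a standard tool) could
compare the two cases:

```shell
# Hypothetical helper: query the same name with and without the RD bit.
# Usage: check_rd NAME [SERVER]   (SERVER defaults to 127.0.0.1)
check_rd() {
    name="$1"
    server="${2:-127.0.0.1}"
    for flag in +rec +norec; do
        echo "== dig $flag $name @$server =="
        # +noall +comments +answer prints only the header (status, flags)
        # and the answer section
        dig "@$server" "$name" A "$flag" +noall +comments +answer
    done
}
```

Run it as "check_rd xyz.service.consul" while the problem is present. If
the status or the flags line differs between the two runs, the recursion
bit matters after all.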


On 7/7/19 10:28 PM, Alex Litvak wrote:
> (lack of sleep, fixing some mistakes in text)
> Hello everyone,
> I run consul services on my network where services are registered with
> <xyz>.service.consul when they start.  All containers and bare metal
> hosts are running dnsmasq 2.80.
> I noticed that if I restart one of the containers, one of the hosts
> continues to fail to resolve the service name.  I assume that dnsmasq
> is the culprit because:
> 1. I can resolve service xyz.service.consul against standard dns servers
> with dig.
> 2. Dnsmasq listening on is the first line in the resolve.conf
> and when I run tcpdump against port 53 on interface lo I see it returns
> NXDOMAIN on each A record query for service in question.
> 3. If I restart dnsmasq everything is back to normal again.  Even more
> weird, if I send SIGHUP to dnsmasq, which only causes a reread of
> /etc/hosts file, everything is back to normal as far as service
> resolution goes.
> I have this problem only happening  on some hosts without the pattern I
> can recognize.  For example I have two nodes with the same config, os,
> kernel version, dnsmasq version, etc ... and one of them has the problem
> 100% after service xyz.service.consul restart and the other is not.
> Where do I start troubleshooting? Any ideas are welcome.
> Here is a standard dnsmasq configuration.
> port=53
> domain-needed
> bogus-priv
> interface=lo
> listen-address=
> no-dhcp-interface=
> #bind-interfaces
> no-resolv
> all-servers
> dns-forward-max=500
> # If you don't want dnsmasq to read /etc/hosts, uncomment the
> # following line.
> #no-hosts
> # or if you want it to read another file, as well as /etc/hosts, use
> # this.
> #addn-hosts=/etc/banner_add_hosts
> #log-queries=extra
> #log-facility=/var/log/dnsmasq.log
> log-async=25
> # Set the cachesize here.
> cache-size=10000
> min-cache-ttl=5
> #neg-ttl=3600
> # If you want to disable negative caching, uncomment this.
> #no-negcache
> # For debugging purposes, log each DNS query as it passes through
> # dnsmasq.
> #log-queries
> clear-on-reload
> server=
> server=
> server=
> server=
> server=/.la.consul/
> server=/.la.consul/
> server=/.la.consul/
> server=/.chi-pbx.consul/
> server=/.chi-pbx.consul/
> server=/.chi-pbx.consul/
> server=/.consul/
> server=/.consul/
> server=/.consul/
> Resolver config
> search ''
> options  timeout:1 attempts:1
> nameserver
> nameserver
> nameserver
> nameserver
