[Dnsmasq-discuss] dnsmasq socket file disappearing

Simon Kelley simon at thekelleys.org.uk
Sun Feb 4 15:13:26 GMT 2018


You're using --bind-dynamic, so dnsmasq creates a set of listener
sockets, one for each of the addresses on the machine's interfaces. When
such an address is removed or created, dnsmasq gets an event via
netlink, and it does two things:

1) Enumerates the current set of addresses on the machine's interfaces.
2) Updates the set of listening sockets to reflect that set: creating
new sockets for new addresses, and deleting sockets for addresses which
are no longer in use.
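
As a rough illustration of those two steps, here is a minimal Python sketch
of the reconciliation logic. It is not dnsmasq's actual code (dnsmasq is
written in C); the function name `reconcile` and the use of plain address
strings are assumptions made for the example.

```python
def reconcile(listening_addrs, current_addrs):
    """Compare the addresses we currently listen on with the addresses
    reported for the machine's interfaces.

    Returns (to_create, to_close): addresses that need a new listener
    socket, and addresses whose listener socket should be deleted.
    """
    listening = set(listening_addrs)
    current = set(current_addrs)
    to_create = sorted(current - listening)  # new addresses
    to_close = sorted(listening - current)   # addresses no longer in use
    return to_create, to_close


# Hypothetical example: the enumeration no longer reports 127.0.0.1.
create, close = reconcile({"127.0.0.1", "172.17.0.1"},
                          {"172.17.0.1", "::1"})
print("create:", create, "close:", close)
```

If an enumeration ever fails to report an address, even briefly, logic of
this shape would delete that address's listener, which is why this path is
worth examining.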

That process is an obvious potential source of the behaviour you're seeing.

I'm slightly confused that your description talks about THE socket file:
there should be one for each address possessed by the machine. To try
and get a handle on what's happening, we need to see what's happening
to all the members of that set.

"DNS resolutions sent to system[dnsmasq] (127.0.0.1:53) time out"

implies that the socket listening on 127.0.0.1:53 is going, but
are there still sockets listening on port 53 for other addresses, or are
all the UDP sockets going?

There is a second set of sockets listening on the same addresses/ports
for TCP connections. It would be interesting to see if the TCP sockets
go as well, or whether it's only the UDP socket that disappears.
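
One way to capture the state of the whole set on a broken server is to read
the kernel's socket tables directly. The sketch below is only a suggestion,
assuming Linux and the usual /proc/net/udp and /proc/net/tcp layout; the
function names are made up for the example, and it covers IPv4 only.

```python
import socket
import struct


def ipv4_sockets_on_port(proc_text, port=53, state=None):
    """Return IPv4 local addresses bound to `port`, parsed from the text
    of /proc/net/udp or /proc/net/tcp.

    `state` optionally filters on the hex "st" column; for TCP,
    "0A" means LISTEN.
    """
    addrs = []
    for line in proc_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        hex_ip, hex_port = fields[1].split(":")
        if int(hex_port, 16) != port:
            continue
        if state is not None and fields[3] != state:
            continue
        # The kernel prints the address as a host-byte-order integer.
        addrs.append(socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16))))
    return addrs


def report(port=53):
    """Print which IPv4 addresses have UDP and TCP sockets on `port`."""
    with open("/proc/net/udp") as f:
        print("UDP:", ipv4_sockets_on_port(f.read(), port))
    with open("/proc/net/tcp") as f:
        print("TCP:", ipv4_sockets_on_port(f.read(), port, state="0A"))
```

Calling `report()` periodically (from cron, say) and logging the output
would show whether 127.0.0.1 vanishes alone or together with other
addresses, and whether the TCP listener goes at the same moment.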

Is the network config of docker containers changed at any time? If so,
forcing that is an obvious way of trying to reproduce this problem.
Given that 127.0.0.1 is an address which disappears, anything which
fiddles with the lo interface is of particular suspicion.



Cheers,


Simon.




On 31/01/18 21:12, Zi Dvbelju wrote:
> 
> I’m experiencing an issue where all DNS resolutions sent to dnsmasq
> time out, but only after the dnsmasq service has been successfully running
> for a period of time (anecdotally, after a few weeks). After a
> lot of digging, I’ve discovered that dnsmasq’s UDP socket file will
> eventually “disappear”. The issue can be resolved by restarting the
> dnsmasq service.
> 
> 
> I haven’t been able to reproduce it yet, but it has happened numerous
> times on servers which are running dozens of docker containers. From
> what I know, nothing should be removing this socket file and I can’t
> find anything relevant in the dnsmasq logs. Is anyone aware of any
> situations that can cause socket files to disappear?
> 
> 
> Environment
> 
> Ubuntu 16.04.3 LTS
> 
> 8 Cores, 16GB of RAM
> 
> Dnsmasq 2.75-1ubuntu0.16.04.4
> 
> 
> Background
> 
> I’m using dnsmasq to forward requests to Consul
> <https://www.consul.io/docs/guides/forwarding.html>, which is used for
> service discovery. The Consul agent listens on port 8600 and is
> configured to bind to all interfaces (the relevant interface here is
> 172.17.0.1, which docker creates).
> 
> 
> Resolv.conf
> 
> ```
> 
> # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
> 
> nameserver 127.0.0.1
> 
> ```
> 
> 
> Dnsmasq.conf
> 
> ```
> 
> server=/consul/172.17.0.1#8600
> 
> server=/10.in-addr.arpa/172.17.0.1#8600
> 
> bind-dynamic
> 
> ```
> 
> 
> Systemd config for Docker
> 
> ```
> 
> ExecStart=/usr/bin/dockerd --bip=172.17.0.1/24 --dns=172.17.0.1 -H fd://
> 
> ```
> 
> While investigating the servers in the broken state, I observed the
> following:
> 
> 
>   * nslookup / dig DNS resolutions are timing out
> 
>   * Docker logs show containers are also timing out on DNS resolutions
> 
>   * Systemd reports that dnsmasq is still running, pid still exists
> 
>   * DNS resolutions sent directly to the consul agent (127.0.0.1:8600)
>     succeed
> 
>   * DNS resolutions sent to system[dnsmasq] (127.0.0.1:53) time out
> 
>   * IPV6 UDP (::1) resolutions sent to dnsmasq succeeded
> 
>   * Netstat shows that the IPV4 UDP socket file for dnsmasq is missing
> 
>   * No relevant messages in kernel log (specifically, no dnsmasq OOM
>     kill events)
> 
>   * File descriptor usage for the entire server was normal
> 
>   * File descriptor usage for the individual dnsmasq process was normal
> 
>   * CPU, RAM, and storage all look good
> 
> 
> Thanks in advance for any discussion at all - I've been really
> struggling with this one for a while now.
> 
> 
> Zach
> 
> 
> 
> 



