<div dir="ltr">Hey Simon,<br><br>Thanks for looking into this. It just happened again, and I have more data.<div><br></div><div>To clarify a few things: Docker is configured to send *all* DNS requests to 172.17.0.1:53 (Docker itself creates the 172.17.0.1 address on the docker0 interface). I run dnsmasq with bind-dynamic because Docker typically starts <i>after</i> dnsmasq, so dnsmasq would otherwise never get a chance to bind to 172.17.0.1. The IPv6 sockets that dnsmasq binds to (shown below) are not used by anything on my system; they are just a side effect of the current configuration.</div>
<div><br></div><div>Sample ifconfig</div><div>```</div><div><div>docker0 Link encap:Ethernet HWaddr 02:42:ea:b9:37:ce</div><div> inet addr:172.17.0.1 Bcast:172.17.0.255 Mask:255.255.255.0</div><div> inet6 addr: fe80::42:eaff:feb9:37ce/64 Scope:Link</div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> RX packets:1137922568 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:1183858585 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:0</div><div> RX bytes:694264821785 (694.2 GB) TX bytes:731462199754 (731.4 GB)</div><div><br></div><div>ens5 Link encap:Ethernet HWaddr 02:f3:39:28:91:08</div><div> inet addr:10.3.4.228 Bcast:10.3.4.255 Mask:255.255.255.0</div><div> inet6 addr: fe80::f3:39ff:fe28:9108/64 Scope:Link</div><div> UP BROADCAST RUNNING MULTICAST MTU:9001 Metric:1</div><div> RX packets:1414749050 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:1409286954 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:1000</div><div> RX bytes:746355077483 (746.3 GB) TX bytes:716453625946 (716.4 GB)</div><div><br></div><div>lo Link encap:Local Loopback</div><div> inet addr:127.0.0.1 Mask:255.0.0.0</div><div> inet6 addr: ::1/128 Scope:Host</div><div> UP LOOPBACK RUNNING MTU:65536 Metric:1</div><div> RX packets:39175242 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:39175242 errors:0 dropped:0 overruns:0 
carrier:0</div><div> collisions:0 txqueuelen:1</div><div> RX bytes:15795890457 (15.7 GB) TX bytes:15795890457 (15.7 GB)</div></div><div><br></div><div>```</div><div><br>Failed Node (netstat)<div>```</div><div><div>udp6 0 0 fe80::888a:23ff:fe7a:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::d817:59ff:feb4:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::7c83:e5ff:fe73:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::387c:fff:fe8b::53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::844c:15ff:fe94:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::8402:6bff:fefa:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::740e:c1ff:feef:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::f841:8fff:fefb:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::d0b3:42ff:fe20:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::984b:45ff:fe2b:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::e4cc:c4ff:fe87:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::704d:afff:fec4:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::707c:7eff:fea2:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::c8c3:70ff:fe8c:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::506b:10ff:fe1a:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::88c:2eff:fe38::53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::4478:d5ff:fea0:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::1c34:96ff:fe2c:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::1079:21ff:fe36:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::4dc:53ff:fe8a::53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::c851:e0ff:fe4c:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::4c0:e3ff:fe15::53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::fcca:afff:fe23:53 :::* 
1359/dnsmasq</div><div>udp6 0 0 fe80::a067:8cff:fefb:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::1081:54ff:feca:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::fc67:89ff:fe1a:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::a892:65ff:fe5a:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::689e:e5ff:fe4d:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::9cc5:59ff:fe94:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::f031:faff:feb8:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::ccf2:e4ff:fe3f:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::5c66:efff:fe89:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::2caf:fcff:fe07:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::ac43:e1ff:fe44:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::1c3f:ccff:fe94:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::2ca6:27ff:feac:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::dc16:3bff:fe2b:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::58e8:ddff:fe94:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::9c60:f7ff:fe12:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::d02d:a9ff:fe52:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::42:eaff:feb9:3:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::c840:a8ff:fe6d:53 :::* 1359/dnsmasq</div><div>udp6 0 0 ::1:53 :::* 1359/dnsmasq</div><div>udp6 0 0 fe80::f3:39ff:fe28:9:53 :::* 1359/dnsmasq</div></div><div>```</div><div><br></div><div>Healthy Node (netstat)</div><div>```</div><div><div><b>udp 0 0 127.0.0.1:53 0.0.0.0:* 1400/dnsmasq</b></div><div><b>udp 0 0 10.3.1.79:53 0.0.0.0:* 1400/dnsmasq</b></div><div><b>udp 0 0 172.17.0.1:53 0.0.0.0:* 1400/dnsmasq</b></div><div>udp6 0 0 fe80::54d7:eff:fe7a::53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::6ccd:80ff:fe39:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::b870:1dff:fe4f:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::b8f9:20ff:fe9b:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::7416:9bff:fed7:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::94d7:29ff:fe2a:53 
:::* 1400/dnsmasq</div><div>udp6 0 0 fe80::b4e1:a0ff:fe7e:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::84b8:20ff:feb2:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::603f:dbff:feec:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::284f:5fff:fe4f:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::100b:7aff:fe23:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::b088:7eff:fe20:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::4434:10ff:fe9e:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::f052:16ff:fe10:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::789b:19ff:fefd:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::c66:baff:fe98::53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::f08c:f7ff:fe7d:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::9898:dff:fe21::53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::3847:85ff:fe30:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::98cd:23ff:fe7f:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::c9b:6bff:fed6::53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::b002:f6ff:fe8a:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::8cf8:63ff:fe9a:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::4095:1eff:fe09:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::e8cf:10ff:fe32:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::fca4:12ff:fed7:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::d4e6:d4ff:fe7e:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::8c74:57ff:fe24:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::c0bb:f3ff:fe49:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::b462:beff:fe75:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::c816:f6ff:fec8:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::409f:6dff:feb8:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::30b8:caff:fec8:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::42:12ff:fe82:5:53 :::* 1400/dnsmasq</div><div>udp6 0 0 ::1:53 :::* 1400/dnsmasq</div><div>udp6 0 0 fe80::468:88ff:fe00::53 :::* 1400/dnsmasq</div></div><div><br></div><div>```</div></div><div><br></div><div>The 'udp' sockets above (not udp6) are the only ones I care about; they are the ones that mysteriously disappear.</div>
<div><br></div><div>Failed Node Logs</div><div>```</div><div><div>/var/log/syslog.48:2018-02-03T13:36:23.540231+00:00 dnsmasq[1359]: failed to create listening socket for fe80::fc42:b2ff:fe85:7702: No such device</div><div>/var/log/syslog.48:2018-02-03T13:36:23.540442+00:00 dnsmasq[1359]: failed to create listening socket for fe80::fc42:b2ff:fe85:7702: No such device</div><div>/var/log/syslog.69:2018-02-02T16:56:16.907906+00:00 dnsmasq[1359]: failed to create listening socket for fe80::dcb4:e5ff:fe94:9b31: No such device</div><div>/var/log/syslog.69:2018-02-02T16:56:16.908069+00:00 dnsmasq[1359]: failed to create listening socket for fe80::dcb4:e5ff:fe94:9b31: No such device</div><div>/var/log/syslog.7:2018-02-05T06:05:31.066648+00:00 dnsmasq[1359]: failed to create listening socket for 172.17.0.1: Address already in use</div><div>/var/log/syslog.7:2018-02-05T06:05:31.066813+00:00 dnsmasq[1359]: failed to create listening socket for 10.3.4.228: Address already in use</div><div>/var/log/syslog.7:2018-02-05T06:05:31.066917+00:00 dnsmasq[1359]: failed to create listening socket for 127.0.0.1: Address already in use</div></div><div>```</div><div><br></div><div>Hopefully this helps clarify what I'm seeing. Please let me know if you need any additional information.<br><br>Thanks,<br>Zach</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Feb 4, 2018 at 10:13 AM, Simon Kelley <span dir="ltr"><<a href="mailto:simon@thekelleys.org.uk" target="_blank">simon@thekelleys.org.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">You're using --bind-dynamic, so dnsmasq creates a set of listener<br>
sockets, one for each of the addresses on the machine's interfaces. When<br>
such an address is removed or created, dnsmasq gets an event via<br>
netlink and does two things:<br>
<br>
1) Enumerates the current set of addresses on the machine's interfaces.<br>
2) Updates the set of listening sockets to reflect that set: creating<br>
new sockets for new addresses, and deleting sockets for addresses which<br>
are no longer in use.<br>
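<br>
A simplified model of that reconciliation, written as a Python sketch, shows how a single address can be left without a listener while the process keeps running. It is not dnsmasq's C code: the function name and the try_bind callback are hypothetical, and the assumption that a failed bind (such as the "Address already in use" errors in the failed node's log) is only reported and then skipped is unconfirmed.<br>
<pre>
```python
from typing import Callable, Set, Tuple


def reconcile(listeners: Set[str],
              current_addrs: Set[str],
              try_bind: Callable[[str], bool]) -> Tuple[Set[str], Set[str]]:
    """Model one bind-dynamic update triggered by a netlink address event.

    listeners     -- addresses that currently have a listening socket
    current_addrs -- addresses found by re-enumerating the interfaces (step 1)
    try_bind      -- hypothetical stand-in for socket()/bind(); False = failed
    Returns (addresses with a listener afterwards, addresses whose bind failed).
    """
    # Step 2a: keep sockets only for addresses that still exist; the rest
    # are treated as closed.
    kept = set(listeners).intersection(current_addrs)
    failed = set()
    # Step 2b: try to create a socket for every address that has none yet.
    for addr in sorted(current_addrs - listeners):
        if try_bind(addr):
            kept.add(addr)
        else:
            # Assumption: a failed bind is only reported, so the address stays
            # without a listener until a later event retries it.
            failed.add(addr)
    return kept, failed


if __name__ == "__main__":
    # Hypothetical sequence: one enumeration misses 127.0.0.1, so its socket
    # is dropped; on the next event the address is back but the bind fails.
    addrs = {"127.0.0.1", "172.17.0.1", "10.3.4.228"}
    sockets, _ = reconcile(addrs, addrs - {"127.0.0.1"}, lambda a: True)
    sockets, failed = reconcile(sockets, addrs, lambda a: False)
    print(sorted(sockets), sorted(failed))
    # prints ['10.3.4.228', '172.17.0.1'] ['127.0.0.1']
```
</pre>
In this model the process and its remaining sockets stay healthy, which would fit systemd still reporting dnsmasq as running.<br>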
<br>
That process is an obvious potential source of the behaviour you're seeing.<br>
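<br>
One way to line that process up against the outages is to group dnsmasq's "failed to create listening socket" messages by process and error. The failed node's logs above show "No such device" for IPv6 link-local addresses and "Address already in use" for 172.17.0.1, 10.3.4.228 and 127.0.0.1, all from PID 1359. A minimal Python sketch, assuming exactly that message format (the script and function names are hypothetical):<br>
<pre>
```python
import re
import sys
from collections import defaultdict

# Assumed to match the message format in the failed node's syslog, e.g.
# "dnsmasq[1359]: failed to create listening socket for 127.0.0.1: Address already in use"
FAILED_BIND = re.compile(
    r"dnsmasq\[(\d+)\]: failed to create listening socket for (\S+): (.+)$"
)


def summarize(lines):
    """Map (pid, error text) to the set of addresses that hit that error."""
    summary = defaultdict(set)
    for line in lines:
        match = FAILED_BIND.search(line.rstrip("\n"))
        if match:
            pid, addr, error = match.groups()
            summary[(int(pid), error.strip())].add(addr)
    return dict(summary)


if __name__ == "__main__":
    # Hypothetical usage: python3 scan_dnsmasq_log.py /var/log/syslog*
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for (pid, error), addrs in sorted(summarize(log).items()):
                print(path, pid, error, sorted(addrs))
```
</pre>
Keying on the PID makes it easy to tell whether the failures all come from the long-running process that later lost its IPv4 sockets.<br>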
<br>
I'm slightly confused that your description talks about THE socket file:<br>
there should be one for each address possessed by the machine. To get a<br>
handle on what's happening, we need to see what happens to all the<br>
members of that set.<br>
<span class=""><br>
"DNS resolutions sent to system[dnsmasq] (127.0.0.1:53) time out"<br>
<br>
</span>implies that the socket listening on 127.0.0.1:53 is going, but<br>
are there still sockets listening on port 53 for other addresses, or are<br>
all the UDP sockets going?<br>
<br>
There is a second set of sockets listening on the same addresses/ports<br>
for TCP connections. It would be interesting to know whether the TCP<br>
sockets go as well, or whether it's only the UDP socket that disappears.<br>
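<br>
Both questions can be answered on a broken node by reading /proc/net/udp and /proc/net/tcp, the kernel tables that netstat itself reads. A minimal Python sketch that lists the IPv4 addresses holding a port-53 socket (TCP sockets only when in the LISTEN state), assuming a Linux host; the function names are hypothetical:<br>
<pre>
```python
import socket
import struct

DNS_PORT = 53
TCP_LISTEN = "0A"  # state code the kernel prints for LISTEN sockets


def decode_ipv4(hex_addr: str) -> str:
    """Turn a /proc/net address field such as '0100007F' into '127.0.0.1'.

    The kernel prints the 32-bit address as a hex number in host byte order,
    so packing it back in native ('=') order restores the original bytes.
    """
    return socket.inet_ntoa(struct.pack("=I", int(hex_addr, 16)))


def port53_listeners(proc_text: str, tcp: bool = False) -> set:
    """Return the IPv4 addresses that have a socket bound to port 53.

    proc_text is the content of /proc/net/udp (tcp=False) or /proc/net/tcp
    (tcp=True); for TCP only sockets in the LISTEN state are counted.
    """
    found = set()
    for line in proc_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            addr_hex, port_hex = fields[1].split(":")
            wanted_state = (not tcp) or fields[3] == TCP_LISTEN
            if int(port_hex, 16) == DNS_PORT and wanted_state:
                found.add(decode_ipv4(addr_hex))
    return found


if __name__ == "__main__":
    for path, is_tcp in (("/proc/net/udp", False), ("/proc/net/tcp", True)):
        try:
            with open(path) as f:
                print(path, sorted(port53_listeners(f.read(), tcp=is_tcp)))
        except OSError as exc:  # e.g. not running on Linux
            print(path, "unreadable:", exc)
```
</pre>
If the failed-node netstat output above is representative, the UDP result there should be missing 127.0.0.1, 172.17.0.1 and the node's own address; the TCP result then shows whether the TCP listeners went too.<br>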
<br>
Is the network config of docker containers changed at any time? If so,<br>
forcing that is an obvious way of trying to reproduce this problem.<br>
Given that 127.0.0.1 is an address which disappears, anything which<br>
fiddles with the lo interface is of particular suspicion.<br>
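<br>
While trying to reproduce it, a small probe run in a loop can record the moment 127.0.0.1:53 stops answering while Consul on 127.0.0.1:8600 still does, which is the symptom pattern in the original report. A minimal Python sketch with a hand-built RFC 1035 query; the function names are hypothetical, and any reply, even NXDOMAIN, counts as an answer:<br>
<pre>
```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal RFC 1035 query for name (qtype 1 = A, class IN)."""
    query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags 0x0100 (recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def answers(server: str, port: int, name: str, timeout: float = 2.0) -> bool:
    """Return True if server:port sends back a UDP reply with our query ID.

    Any reply counts (NXDOMAIN included); a timeout or socket error is
    treated as "not answering", the state reported for 127.0.0.1:53.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers socket.timeout and ConnectionRefusedError
            return False
    return reply[:2] == query[:2]
```
</pre>
For example, answers("127.0.0.1", 53, "consul.service.consul") returning False while answers("127.0.0.1", 8600, "consul.service.consul") returns True would match the reported broken state; the queried name is only an illustration.<br>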
<br>
<br>
<br>
Cheers,<br>
<br>
<br>
Simon.<br>
<br>
<br>
<br>
<br>
On 31/01/18 21:12, Zi Dvbelju wrote:<br>
><br>
> I’m experiencing an issue where all DNS resolutions sent to dnsmasq<br>
<span class="">> time out, but only after the dnsmasq service has been successfully running<br>
> for a period of time (anecdotally, after a few weeks). After a<br>
> lot of digging, I’ve discovered that dnsmasq’s UDP socket file will<br>
> eventually “disappear”. The issue can be resolved by restarting the<br>
> dnsmasq service.<br>
><br>
><br>
> I haven’t been able to reproduce it yet, but it has happened numerous<br>
> times on servers which are running dozens of docker containers. From<br>
> what I know, nothing should be removing this socket file and I can’t<br>
> find anything relevant in the dnsmasq logs. Is anyone aware of any<br>
> situations that can cause socket files to disappear?<br>
><br>
><br>
> Environment<br>
><br>
> Ubuntu 16.04.3 LTS<br>
><br>
> 8 Cores, 16GB of RAM<br>
><br>
> Dnsmasq 2.75-1ubuntu0.16.04.4<br>
><br>
><br>
> Background<br>
><br>
> I’m using dnsmasq to forward requests to Consul<br>
</span>> <<a href="https://www.consul.io/docs/guides/forwarding.html" rel="noreferrer" target="_blank">https://www.consul.io/docs/<wbr>guides/forwarding.html</a>>, which is used for<br>
<span class="">> service discovery. The Consul agent listens on port 8600 and is<br>
</span>> configured to bind to all interfaces (the relevant interface here is<br>
<span class="">> 172.17.0.1, which docker creates). <br>
><br>
><br>
> Resolv.conf<br>
><br>
> ```<br>
><br>
> # Dynamic resolv.conf(5) file for glibc resolver(3) generated by<br>
> resolvconf(8)<br>
><br>
> nameserver 127.0.0.1<br>
><br>
> ```<br>
><br>
><br>
> Dnsmasq.conf<br>
><br>
> ```<br>
><br>
</span>> server=/consul/172.17.0.1#8600<br>
><br>
> server=/10.in-addr.arpa/172.17.0.1#8600<br>
<span class="">><br>
> bind-dynamic<br>
><br>
> ```<br>
><br>
><br>
> Systemd config for Docker<br>
><br>
> ```<br>
><br>
</span>> ExecStart=/usr/bin/dockerd --bip=172.17.0.1/24<br>
<span class="">> --dns=172.17.0.1 -H fd://<br>
><br>
> ```<br>
><br>
> While investigating the servers in the broken state, I observed the<br>
> following:<br>
><br>
><br>
</span>> *<br>
<span class="">><br>
> nslookup / dig DNS resolutions are timing out<br>
><br>
</span>> *<br>
<span class="">><br>
> Docker logs show containers are also timing out on DNS resolutions<br>
><br>
</span>> *<br>
<span class="">><br>
> Systemd reports that dnsmasq is still running, pid still exists<br>
><br>
</span>> *<br>
<span class="">><br>
> DNS resolutions sent directly to the Consul agent (127.0.0.1:8600) succeed<br></span>
><br>
> *<br>
<span class="">><br>
> DNS resolutions sent to system[dnsmasq] (127.0.0.1:53) time out<br></span>
><br>
> *<br>
<span class="">><br>
> IPv6 UDP (::1) resolutions sent to dnsmasq succeed<br>
><br>
</span>> *<br>
<span class="">><br>
> Netstat shows that the IPv4 UDP socket file for dnsmasq is missing<br>
><br>
</span>> *<br>
<span class="">><br>
> No relevant messages in kernel log (specifically, no dnsmasq OOM<br>
> kill events)<br>
><br>
</span>> *<br>
<span class="">><br>
> File descriptor usage for the entire server was normal<br>
><br>
</span>> *<br>
<span class="">><br>
> File descriptor usage for the individual dnsmasq process was normal<br>
><br>
</span>> *<br>
<span class="">><br>
> CPU, RAM, and storage all look good<br>
><br>
><br>
> Thanks in advance for any discussion at all - I've been really<br>
> struggling with this one for a while now.<br>
><br>
><br>
> Zach<br>
><br>
</span>><br>
<div class="HOEnZb"><div class="h5">><br>
><br>
> ______________________________<wbr>_________________<br>
> Dnsmasq-discuss mailing list<br>
> <a href="mailto:Dnsmasq-discuss@lists.thekelleys.org.uk">Dnsmasq-discuss@lists.<wbr>thekelleys.org.uk</a><br>
> <a href="http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss" rel="noreferrer" target="_blank">http://lists.thekelleys.org.<wbr>uk/mailman/listinfo/dnsmasq-<wbr>discuss</a><br>
><br>
<br>
<br>
</div></div></blockquote></div><br></div>