[Dnsmasq-discuss] dnsmasq socket file disappearing

Simon Kelley simon at thekelleys.org.uk
Tue Feb 6 18:26:47 GMT 2018


That's useful, thank you. I think we're on the right lines that
dnsmasq's response to a change in the interface configuration is where
it's all going wrong.


Am I correct that this event is happening long after dnsmasq starts?

Do you have anything in the log just before this happens to indicate
that some other part of the system may be reconfiguring the networking,
ie adding or removing addresses from interfaces?


Cheers,

Simon.


On 05/02/18 15:43, Zi Dvbelju wrote:
> Hey Simon,
> 
> Thanks for looking into this. It just happened again and I have more data. 
> 
> To clarify a few things, docker is configured to send *all* dns requests
> to 172.17.0.1:53 <http://172.17.0.1:53> (docker itself creates the
> 172.17.0.1 interface)*.* I implemented dnsmasq with bind-dynamic because
> docker would typically start /after/dnsmasq; dnsmasq would never get a
> chance to bind to 172.17.0.1. The ipv6 sockets that dnsmasq binds to
> (shown below) are never actually utilized by anything in my system, just
> a side effect of the current configuration. 
> 
> Sample ifconfig
> ```
> docker0   Link encap:Ethernet  HWaddr 02:42:ea:b9:37:ce
>           inet addr:172.17.0.1  Bcast:172.17.0.255  Mask:255.255.255.0
>           inet6 addr: fe80::42:eaff:feb9:37ce/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1137922568 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1183858585 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:694264821785 (694.2 GB)  TX bytes:731462199754 (731.4 GB)
> 
> ens5      Link encap:Ethernet  HWaddr 02:f3:39:28:91:08
>           inet addr:10.3.4.228  Bcast:10.3.4.255  Mask:255.255.255.0
>           inet6 addr: fe80::f3:39ff:fe28:9108/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
>           RX packets:1414749050 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1409286954 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:746355077483 (746.3 GB)  TX bytes:716453625946 (716.4 GB)
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:65536  Metric:1
>           RX packets:39175242 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:39175242 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1
>           RX bytes:15795890457 (15.7 GB)  TX bytes:15795890457 (15.7 GB)
> 
> ```
> 
> Failed Node (netstat)
> ```
> udp6       0      0 fe80::888a:23ff:fe7a:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::d817:59ff:feb4:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::7c83:e5ff:fe73:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::387c:fff:fe8b::53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::844c:15ff:fe94:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::8402:6bff:fefa:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::740e:c1ff:feef:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::f841:8fff:fefb:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::d0b3:42ff:fe20:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::984b:45ff:fe2b:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::e4cc:c4ff:fe87:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::704d:afff:fec4:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::707c:7eff:fea2:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::c8c3:70ff:fe8c:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::506b:10ff:fe1a:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::88c:2eff:fe38::53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::4478:d5ff:fea0:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::1c34:96ff:fe2c:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::1079:21ff:fe36:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::4dc:53ff:fe8a::53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::c851:e0ff:fe4c:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::4c0:e3ff:fe15::53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::fcca:afff:fe23:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::a067:8cff:fefb:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::1081:54ff:feca:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::fc67:89ff:fe1a:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::a892:65ff:fe5a:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::689e:e5ff:fe4d:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::9cc5:59ff:fe94:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::f031:faff:feb8:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::ccf2:e4ff:fe3f:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::5c66:efff:fe89:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::2caf:fcff:fe07:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::ac43:e1ff:fe44:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::1c3f:ccff:fe94:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::2ca6:27ff:feac:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::dc16:3bff:fe2b:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::58e8:ddff:fe94:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::9c60:f7ff:fe12:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::d02d:a9ff:fe52:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::42:eaff:feb9:3:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 fe80::c840:a8ff:fe6d:53 :::*                       
>         1359/dnsmasq
> udp6       0      0 ::1:53                  :::*                       
>                   1359/dnsmasq
> udp6       0      0 fe80::f3:39ff:fe28:9:53 :::*                       
>         1359/dnsmasq
> ```
> 
> Healthy Node (netstat)
> ```
> *udp        0      0 127.0.0.1:53 <http://127.0.0.1:53>           
> 0.0.0.0:*                           1400/dnsmasq*
> *udp        0      0 10.3.1.79:53 <http://10.3.1.79:53>           
> 0.0.0.0:*                           1400/dnsmasq*
> *udp        0      0 172.17.0.1:53 <http://172.17.0.1:53>         
>  0.0.0.0:*                           1400/dnsmasq*
> udp6       0      0 fe80::54d7:eff:fe7a::53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::6ccd:80ff:fe39:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::b870:1dff:fe4f:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::b8f9:20ff:fe9b:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::7416:9bff:fed7:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::94d7:29ff:fe2a:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::b4e1:a0ff:fe7e:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::84b8:20ff:feb2:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::603f:dbff:feec:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::284f:5fff:fe4f:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::100b:7aff:fe23:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::b088:7eff:fe20:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::4434:10ff:fe9e:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::f052:16ff:fe10:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::789b:19ff:fefd:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::c66:baff:fe98::53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::f08c:f7ff:fe7d:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::9898:dff:fe21::53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::3847:85ff:fe30:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::98cd:23ff:fe7f:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::c9b:6bff:fed6::53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::b002:f6ff:fe8a:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::8cf8:63ff:fe9a:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::4095:1eff:fe09:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::e8cf:10ff:fe32:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::fca4:12ff:fed7:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::d4e6:d4ff:fe7e:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::8c74:57ff:fe24:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::c0bb:f3ff:fe49:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::b462:beff:fe75:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::c816:f6ff:fec8:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::409f:6dff:feb8:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::30b8:caff:fec8:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 fe80::42:12ff:fe82:5:53 :::*                       
>         1400/dnsmasq
> udp6       0      0 ::1:53                  :::*                       
>                  1400/dnsmasq
> udp6       0      0 fe80::468:88ff:fe00::53 :::*                       
>         1400/dnsmasq
> 
> ```
> 
> The above 'udp' interfaces (not udp6) are actually the only ones I care
> about - these are the ones that are mysteriously disappearing.
> 
> Failed Node Logs
> ```
> /var/log/syslog.48:2018-02-03T13:36:23.540231+00:00 dnsmasq[1359]:
> failed to create listening socket for fe80::fc42:b2ff:fe85:7702: No such
> device
> /var/log/syslog.48:2018-02-03T13:36:23.540442+00:00 dnsmasq[1359]:
> failed to create listening socket for fe80::fc42:b2ff:fe85:7702: No such
> device
> /var/log/syslog.69:2018-02-02T16:56:16.907906+00:00 dnsmasq[1359]:
> failed to create listening socket for fe80::dcb4:e5ff:fe94:9b31: No such
> device
> /var/log/syslog.69:2018-02-02T16:56:16.908069+00:00 dnsmasq[1359]:
> failed to create listening socket for fe80::dcb4:e5ff:fe94:9b31: No such
> device
> /var/log/syslog.7:2018-02-05T06:05:31.066648+00:00 dnsmasq[1359]: failed
> to create listening socket for 172.17.0.1 <http://172.17.0.1>: Address
> already in use
> /var/log/syslog.7:2018-02-05T06:05:31.066813+00:00 dnsmasq[1359]: failed
> to create listening socket for 10.3.4.228 <http://10.3.4.228>: Address
> already in use
> /var/log/syslog.7:2018-02-05T06:05:31.066917+00:00 dnsmasq[1359]: failed
> to create listening socket for 127.0.0.1 <http://127.0.0.1>: Address
> already in use
> ```
> 
> Hopefully this helps clarify what I'm seeing. Please let me know if you
> need any additional information.
> 
> Thanks,
> Zach
> 
> 
> On Sun, Feb 4, 2018 at 10:13 AM, Simon Kelley <simon at thekelleys.org.uk
> <mailto:simon at thekelleys.org.uk>> wrote:
> 
>     You're using --bind-dynamic, so dnsmasq creates a set of listener
>     sockets, one for each if the addresses on the machine's interfaces. When
>     such an address is removed or created, dnsmasq gets an event via
>     netlink, and it does two things,
> 
>     1) Enumerates the current set of addresses on the machines interfaces.
>     2) Updates the set of listening sockets to reflect that set: creating
>     new sockets for new addresses, and deleting sockets for addresses which
>     are no longer in use.
> 
>     That process is an obvious potential source of the behaviour you're
>     seeing.
> 
>     I'm slightly confused that your description talks about THE socket file:
>     there should be one for each address possessed by the machine. To try
>     and get a handle on what's happening, we need  to see what's happening
>     to all the members of that set.
> 
>     "DNS resolutions sent to system[dnsmasq] (127.0.0.1:53
>     <http://127.0.0.1:53>) time out"
> 
>     implies that 127.0.1:53 the socket listening on 127.0.0.1 is going, but
>     are there still sockets listening on port 53 for other addresses, or are
>     all the UDP sockets going?
> 
>     There is a second set of sockets listening on the same addresses/ports
>     for TCP connections. It would be interesting to see if the TCP sockets
>     go as well, or is it's only the UDP socket that disappears?
> 
>     Is the network config of docker containers changed at any time? If so,
>     forcing that is an obvious way of trying to reproduce this problem.
>     Given that 127.0.0.1 is an address which dispappears, anything which
>     fiddles with the lo interface is of particular suspicion.
> 
> 
> 
>     Cheers,
> 
> 
>     Simon.
> 
> 
> 
> 
>     On 31/01/18 21:12, Zi Dvbelju wrote:
>     > *
>     >
>     > I’m experiencing an issue where allDNS resolutions sent to dnsmasq
>     > timeout, but only after the dnsmasq service has been successfullyrunning
>     > for a period of time (anecdotally, after a few weeks of time). After a
>     > lot of digging, I’ve discovered that dnsmasq’s UDP socket file will
>     > eventually “disappear”. The issue can be resolved by restarting the
>     > dnsmasq service.
>     >
>     >
>     > I haven’t been able to reproduce it yet, but it has happened numerous
>     > times on servers which are running dozens of docker containers. From
>     > what I know, nothing should be removing this socket file and I can’t
>     > find anything relevant in the dnsmasq logs. Is anyone aware of any
>     > situations that can cause socket files to disappear?
>     >
>     >
>     > Environment
>     >
>     > Ubuntu 16.04.3 LTS
>     >
>     > 8 Cores, 16GB of RAM
>     >
>     > Dnsmasq 2.75-1ubuntu0.16.04.4
>     >
>     >
>     > Background
>     >
>     > I’m using dnsmasq to forward requests to Consul
>     > <https://www.consul.io/docs/guides/forwarding.html
>     <https://www.consul.io/docs/guides/forwarding.html>>, which is used for
>     > service discovery. The Consul agent listens on port 8600 and is
>     > configured to bind to allinterfaces (the relevant interface here is
>     > 172.17.0.1, which docker creates).  
>     >
>     >
>     > Resolv.conf
>     >
>     > ```
>     >
>     > # Dynamic resolv.conf(5) file for glibc resolver(3) generated by
>     > resolvconf(8)
>     >
>     > nameserver 127.0.0.1
>     >
>     > ```
>     >
>     >
>     > Dnsmasq.conf
>     >
>     > ```
>     >
>     > server=/consul/172.17.0.1#8600 <http://172.17.0.1#8600>
>     <http://172.17.0.1/#8600>
>     >
>     > server=/10.in-addr.arpa/172.17.0.1#8600 <http://172.17.0.1#8600>
>     <http://172.17.0.1/#8600>
>     >
>     > bind-dynamic
>     >
>     > ```
>     >
>     >
>     > Systemd config for Docker
>     >
>     > ```
>     >
>     > ExecStart=/usr/bin/dockerd --bip=172.17.0.1/24
>     <http://172.17.0.1/24> <http://172.17.0.1/24>
>     > --dns=172.17.0.1 -H fd://
>     >
>     > ```
>     >
>     > While investigating the servers in the broken state, I observed the
>     > following:
>     >
>     >
>     >   *
>     >
>     >     nslookup / dig DNS resolutions are timing out
>     >
>     >   *
>     >
>     >     Docker logs show containers are also timing out on DNS resolutions
>     >
>     >   *
>     >
>     >     Systemd reports that dnsmasq is still running, pid still exists
>     >
>     >   *
>     >
>     >     DNS resolutions sent directly to the consul agent (127.0.0.1:8600 <http://127.0.0.1:8600>
>     >     <http://127.0.0.1:8600/>) succeed
>     >
>     >   *
>     >
>     >     DNS resolutions sent to system[dnsmasq] (127.0.0.1:53 <http://127.0.0.1:53>
>     >     <http://127.0.0.1:53/>) time out
>     >
>     >   *
>     >
>     >     IPV6 UDP (::1) resolutions sent to dnsmasq succeeded
>     >
>     >   *
>     >
>     >     Netstat shows that the IPV4 UDP socket file for dnsmasq is missing
>     >
>     >   *
>     >
>     >     No relevant messages in kernel log (specifically, no dnsmasq OOM
>     >     kill events)
>     >
>     >   *
>     >
>     >     File descriptor usage for the entire server was normal
>     >
>     >   *
>     >
>     >     File descriptor usage for the individual dnsmasq process was normal
>     >
>     >   *
>     >
>     >     CPU, RAM, and storage all look good
>     >
>     >
>     > Thanks in advance for any discussion at all - I've been really
>     > struggling with this one for a while now.
>     >
>     >
>     > Zach
>     >
>     > *
>     >
>     >
>     > _______________________________________________
>     > Dnsmasq-discuss mailing list
>     > Dnsmasq-discuss at lists.thekelleys.org.uk
>     <mailto:Dnsmasq-discuss at lists.thekelleys.org.uk>
>     > http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>     <http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss>
>     >
> 
> 
>     _______________________________________________
>     Dnsmasq-discuss mailing list
>     Dnsmasq-discuss at lists.thekelleys.org.uk
>     <mailto:Dnsmasq-discuss at lists.thekelleys.org.uk>
>     http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>     <http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss>
> 
> 




More information about the Dnsmasq-discuss mailing list