[Dnsmasq-discuss] [PATCH] fix for netlink ENOBUF problem

Ivan Kokshaysky ink at jurassic.park.msu.ru
Mon Jul 4 16:29:22 BST 2016


Hi Simon and all,

Recently we enabled IPv6 on our network. We have several rather large
Linux routers for student dormitories, each serving 2-3 thousand
clients. We use a VLAN per user, so the number of interfaces on each
machine is correspondingly large, and DHCP is quite problematic. We tried
several DHCPv6 servers (isc-dhcp, dhcp6s, dibbler), but all of them scale
quite poorly. The only combination we found that works reasonably
well is an isc-dhcp server in a separate network namespace with a single
interface, and dnsmasq running in DHCPv6 relay mode.

However, we noticed that dnsmasq gets stuck from time to time (about
once a day on average), using 100% CPU, and needs to be restarted.
I've found that the problem is triggered by an occasional ENOBUFS error
on the netlink read in netlink.c:iface_enumerate(). In this case dnsmasq
simply starts a new request, but the netlink buffer is still full of
messages that the kernel had put there in response to the previous
request before the "out of memory" condition occurred, up to several
hundred messages in our case. These messages are picked up after the next
request; dnsmasq sees an incorrect sequence number, puts them on the
async queue, and then gets stuck enumerating interfaces forever. The
problem is the same with kernels 3.16 and 4.1.
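
To illustrate the sequence-number mismatch described above, here is a
minimal sketch. It is not dnsmasq's code: count_stale() and demo_stale()
are hypothetical helpers, and the only real definition used is
struct nlmsghdr (with its nlmsg_seq field) from <linux/netlink.h>. It
assumes replies to an earlier request are still queued when a request
with a new sequence number is sent.

```c
#include <stddef.h>
#include <linux/netlink.h>

/* Hypothetical helper: count queued messages whose sequence number
   does not match the request currently in flight. */
size_t count_stale(const struct nlmsghdr *msgs, size_t n,
                   unsigned int expected_seq)
{
  size_t i, stale = 0;

  for (i = 0; i < n; i++)
    if (msgs[i].nlmsg_seq != expected_seq)
      stale++;

  return stale;
}

/* Hypothetical scenario: three replies to request 1 are still in the
   buffer when request 2 is sent, followed by one reply to request 2. */
size_t demo_stale(void)
{
  struct nlmsghdr queued[4] = {
    { .nlmsg_seq = 1 },
    { .nlmsg_seq = 1 },
    { .nlmsg_seq = 1 },
    { .nlmsg_seq = 2 },
  };

  return count_stale(queued, 4, 2);
}
```

In this scenario the reader sees three leftover messages before it
reaches the first reply that actually belongs to its own request.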

To fix that we need to purge the netlink buffer on an ENOBUFS error.
With the appended patch dnsmasq has been running flawlessly for about
a month.

Ivan.

diff --git a/src/netlink.c b/src/netlink.c
index 049247b..9ccbbf2 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -181,7 +181,15 @@ int iface_enumerate(int family, void *parm, int (*callback)())
 	{
 	  if (errno == ENOBUFS)
 	    {
+	      int rcnt = -1;
+
 	      sleep(1);
+	      do {
+		len = netlink_recv();
+		rcnt++;
+	      } while (len != -1);
+	      my_syslog(LOG_INFO, _("%d messages cleared after ENOBUFS, last errno = %d"),
+			rcnt, errno);
 	      goto again;
 	    }
 	  return 0;
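
For readers who want to try the purge idea outside dnsmasq, the
following is a minimal standalone sketch, not the patch itself. It
assumes the reads are non-blocking (here forced with MSG_DONTWAIT), so
that the loop ends with EAGAIN once the buffer is empty; the post does
not say how dnsmasq's netlink socket is configured. drain_socket() and
demo_drain() are hypothetical names, and an AF_UNIX datagram socketpair
stands in for the netlink socket.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Discard every datagram currently queued on fd.
   Returns the number discarded, or -1 on an unexpected error. */
int drain_socket(int fd)
{
  char buf[4096];
  int count = 0;

  for (;;)
    {
      ssize_t len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

      if (len >= 0)
        {
          count++;              /* one stale message thrown away */
          continue;
        }
      if (errno == EINTR)
        continue;               /* interrupted by a signal, retry */
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        return count;           /* buffer is empty now */
      return -1;                /* some other error */
    }
}

/* Hypothetical demo: queue n one-byte datagrams on a socketpair
   (standing in for the netlink socket), then drain them.
   Keep n small; the kernel limits how many datagrams can be queued. */
int demo_drain(int n)
{
  int sv[2];
  int i, cleared;

  if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
    return -1;

  for (i = 0; i < n; i++)
    if (send(sv[0], "x", 1, MSG_DONTWAIT) != 1)
      {
        close(sv[0]);
        close(sv[1]);
        return -1;
      }

  cleared = drain_socket(sv[1]);
  close(sv[0]);
  close(sv[1]);
  return cleared;
}
```

Calling demo_drain(5) queues five datagrams and reports how many were
discarded. The point of the design is that the caller only issues a
fresh request once the read reports an empty buffer.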



