[Dnsmasq-discuss] [PATCH] fix for netlink ENOBUF problem

Thu Jul 7 21:05:49 BST 2016

On Thu, Jul 07, 2016 at 08:16:52PM +0100, Simon Kelley wrote:
> Great, many thanks. Is this patch on top of the original one, or an
> alternative?

It's alternative, I've dropped the old one.

> Once it's all resolved, I'm happy to take the final patch.

Excellent, thanks a lot! I hope this one is final and we'll give it
the green light on Monday :)

Ivan.

> Cheers,
> 
> Simon.
> 
> On 07/07/16 19:23, Ivan Kokshaysky wrote:
> > On Thu, Jul 07, 2016 at 03:32:11PM +0100, Simon Kelley wrote:
> >> On 06/07/16 14:55, Ivan Kokshaysky wrote:
> >>> On Mon, Jul 04, 2016 at 01:58:43PM -0400, wkitty42 at gmail.com wrote:
> >>>> On 07/04/2016 11:29 AM, Ivan Kokshaysky wrote:
> >>>>> To fix that we need to purge the netlink buffer on ENOBUF error. With the
> >>>>> appended patch dnsmasq is running flawlessly for about a month.
> >>>>
> >>>> why are the messages not removed from the buffer when they are processed? or are 
> >>>> they and there's simply too many messages coming in to handle?
> >>>
> >>> Good questions, thanks.
> >>>
> >>> It's certainly possible to drop these messages during normal processing -
> >>> if the message process ID is correct, but sequence number is wrong,
> >>> the message must be dropped (instead of putting it on async queue
> >>> like it happens now, and it's a real bug, BTW). 
> >>
> >> So that real bug relates to this:
> >>
> >>    if (h->nlmsg_seq != seq ||
> >>        h->nlmsg_pid != netlink_pid ||
> >>        h->nlmsg_type == NLMSG_ERROR)
> >> 	  {
> >> 	    /* May be multicast arriving async */
> >> 	    nl_async(h);
> >> 	  }
> >>
> >> Which should be something like
> >>
> >> if (h->nlmsg_pid != netlink_pid ||
> >>     h->nlmsg_type == NLMSG_ERROR)
> >>      {
> >>         /* May be multicast arriving async */
> >>         nl_async(h);
> >>      }
> >> else if (h->nlmsg_seq != seq)
> >>      {
> >>         /* drop message */
> >>      }
> >> else
> >>      /* process message */
> > 
> > Yes, precisely.
> > 
> >> Also, I wonder if the loop which drops messages should check for async
> >> stuff and handle it - that won't get re-tried by the kernel, will it?
> > 
> > Indeed that won't, I guess...
> > 
> >> Also, also, you still have the sleep(1) in there. Is that actually
> >> necessary? Handling things this way looks better than just sleeping and
> >> hoping?
> > 
> > I tend to agree. Since I don't need debugging anymore, this way looks simpler
> > and cleaner.
> > 
> > Well, I've just patched dnsmasq like this, and restarted it on one of
> > our routers. I'm pretty certain it works, but I'd like to let it run, say,
> > until monday, just to be absolutely sure. If something goes wrong,
> > I'll know immediately - we've set a cron job that checks dnsmasq CPU load
> > every ten minutes, and if it exceeds 80% I'll get email with a backtrace.
> > In either case I'll let you know.
> > 
> > Thanks,
> > Ivan
> > 
> > diff --git a/src/netlink.c b/src/netlink.c
> > index 049247b..8cd51af 100644
> > --- a/src/netlink.c
> > +++ b/src/netlink.c
> > @@ -188,11 +188,17 @@ int iface_enumerate(int family, void *parm, int (*callback)())
> >  	}
> >  
> >        for (h = (struct nlmsghdr *)iov.iov_base; NLMSG_OK(h, (size_t)len); h = NLMSG_NEXT(h, len))
> > -	if (h->nlmsg_seq != seq || h->nlmsg_pid != netlink_pid || h->nlmsg_type == NLMSG_ERROR)
> > +	if (h->nlmsg_pid != netlink_pid || h->nlmsg_type == NLMSG_ERROR)
> >  	  {
> >  	    /* May be multicast arriving async */
> >  	    nl_async(h);
> >  	  }
> > +	else if (h->nlmsg_seq != seq)
> > +	  {
> > +	    /* May be part of incomplete response to previous request after
> > +	       ENOBUFS. Drop it. */
> > +	    continue;
> > +	  }
> >  	else if (h->nlmsg_type == NLMSG_DONE)
> >  	  return callback_ok;
> >  	else if (h->nlmsg_type == RTM_NEWADDR && family != AF_UNSPEC && family != AF_LOCAL)
> > 
> 
> 
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss at lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss