[Dnsmasq-discuss] [PATCH] fix for netlink ENOBUF problem
Ivan Kokshaysky
ink at jurassic.park.msu.ru
Thu Jul 7 21:05:49 BST 2016
On Thu, Jul 07, 2016 at 08:16:52PM +0100, Simon Kelley wrote:
> Great, many thanks. Is this patch on top of the original one, or an
> alternative?
It's alternative, I've dropped the old one.
> Once it's all resolved, I'm happy to take the final patch.
Excellent, thanks a lot! I hope this one is final and we'll give it
the green light on Monday :)
Ivan.
> Cheers,
>
> Simon.
>
> On 07/07/16 19:23, Ivan Kokshaysky wrote:
> > On Thu, Jul 07, 2016 at 03:32:11PM +0100, Simon Kelley wrote:
> >> On 06/07/16 14:55, Ivan Kokshaysky wrote:
> >>> On Mon, Jul 04, 2016 at 01:58:43PM -0400, wkitty42 at gmail.com wrote:
> >>>> On 07/04/2016 11:29 AM, Ivan Kokshaysky wrote:
> >>>>> To fix that we need to purge the netlink buffer on ENOBUF error. With the
> >>>>> appended patch dnsmasq is running flawlessly for about a month.
> >>>>
> >>>> why are the messages not removed from the buffer when they are processed? or are
> >>>> they and there's simply too many messages coming in to handle?
> >>>
> >>> Good questions, thanks.
> >>>
> >>> It's certainly possible to drop these messages during normal processing -
> >>> if the message process ID is correct, but sequence number is wrong,
> >>> the message must be dropped (instead of putting it on async queue
> >>> like it happens now, and it's a real bug, BTW).
> >>
> >> So that real bug relates to this:
> >>
> >> if (h->nlmsg_seq != seq ||
> >> h->nlmsg_pid != netlink_pid ||
> >> h->nlmsg_type == NLMSG_ERROR)
> >> {
> >> /* May be multicast arriving async */
> >> nl_async(h);
> >> }
> >>
> >> Which should be something like
> >>
> >> if (h->nlmsg_pid != netlink_pid ||
> >> h->nlmsg_type == NLMSG_ERROR)
> >> {
> >> /* May be multicast arriving async */
> >> nl_async(h);
> >> }
> >> else if (h->nlmsg_seq != seq)
> >> {
> >> /* drop message */
> >> }
> >> else
> >> /* process message */
> >
> > Yes, precisely.
> >
> >> Also, I wonder if the loop which drops messages should check for async
> >> stuff and handle it - that won't get re-tried by the kernel, will it?
> >
> > Indeed that won't, I guess...
> >
> >> Also, also, you still have the sleep(1) in there. Is that actually
> >> necessary? Handling things this way looks better than just sleeping and
> >> hoping?
> >
> > I tend to agree. Since I don't need debugging anymore, this way looks simpler
> > and cleaner.
> >
> > Well, I've just patched dnsmasq like this, and restarted it on one of
> > our routers. I'm pretty certain it works, but I'd like to let it run, say,
> > until monday, just to be absolutely sure. If something goes wrong,
> > I'll know immediately - we've set a cron job that checks dnsmasq CPU load
> > every ten minutes, and if it exceeds 80% I'll get email with a backtrace.
> > In either case I'll let you know.
> >
> > Thanks,
> > Ivan
> >
> > diff --git a/src/netlink.c b/src/netlink.c
> > index 049247b..8cd51af 100644
> > --- a/src/netlink.c
> > +++ b/src/netlink.c
> > @@ -188,11 +188,17 @@ int iface_enumerate(int family, void *parm, int (*callback)())
> > }
> >
> > for (h = (struct nlmsghdr *)iov.iov_base; NLMSG_OK(h, (size_t)len); h = NLMSG_NEXT(h, len))
> > - if (h->nlmsg_seq != seq || h->nlmsg_pid != netlink_pid || h->nlmsg_type == NLMSG_ERROR)
> > + if (h->nlmsg_pid != netlink_pid || h->nlmsg_type == NLMSG_ERROR)
> > {
> > /* May be multicast arriving async */
> > nl_async(h);
> > }
> > + else if (h->nlmsg_seq != seq)
> > + {
> > + /* May be part of incomplete response to previous request after
> > + ENOBUFS. Drop it. */
> > + continue;
> > + }
> > else if (h->nlmsg_type == NLMSG_DONE)
> > return callback_ok;
> > else if (h->nlmsg_type == RTM_NEWADDR && family != AF_UNSPEC && family != AF_LOCAL)
> >
>
>
> _______________________________________________
> Dnsmasq-discuss mailing list
> Dnsmasq-discuss at lists.thekelleys.org.uk
> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
More information about the Dnsmasq-discuss
mailing list