[Dnsmasq-discuss] following RFC6106 triggers bug in network-manager

Dan Williams dcbw at redhat.com
Tue Nov 5 18:05:19 GMT 2013


On Tue, 2013-11-05 at 09:21 +0100, Gui Iribarren wrote:
> Hello,
> so, we started suffering frequent, periodic disconnects on clients since 
> upgrading dnsmasq 2.62 -> 2.66
> 
> tracking down the issue, it came down to a network-manager bug while 
> maintaining the RDNSS list, where an unhandled expiring RDNSS lifetime 
> results in a full reconnection

We've fixed a number of NM bugs in this area, specifically (a) adding
some elasticity before deciding the DNS servers have expired, and (b)
sending Router Solicitations before they have expired, to get updated
lifetimes.
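
As a rough illustration of (a) and (b), here is a minimal Python sketch of how a client might derive two timers from an RDNSS lifetime: one for sending a Router Solicitation early, one for finally dropping the server. The function name `rdnss_timers`, the half-lifetime solicitation point, and the 60-second grace period are assumptions made for this example, not NetworkManager's actual code or values.

```python
INFINITE_LIFETIME = 0xFFFFFFFF  # RFC 6106: all one bits means infinity


def rdnss_timers(received_at, lifetime, grace_seconds=60):
    """Return (solicit_at, drop_at) in seconds for one RDNSS entry.

    Hypothetical helper: the half-lifetime solicitation point and the
    grace period are illustrative assumptions, not NetworkManager values.
    """
    if lifetime == 0:
        # RFC 6106: the server must no longer be used.
        return None, received_at
    if lifetime == INFINITE_LIFETIME:
        # Never expires, so no refresh and no drop are scheduled.
        return None, None
    # (b) ask for a fresh RA well before the lifetime runs out
    solicit_at = received_at + lifetime // 2
    # (a) tolerate a late RA for a little while before dropping the server
    drop_at = received_at + lifetime + grace_seconds
    return solicit_at, drop_at


print(rdnss_timers(0, 1200))
```

For an RA received at t=0 with a 1200-second RDNSS lifetime, this sketch solicits at 600 seconds and only drops the server at 1260 seconds.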

> problem is, the kernel only understands the *router* lifetime, but 
> ignores everything about the RDNSS lifetime; and if the latter is 
> shorter than the former, then the RDNSS expires before the kernel sends 
> a RS to handle the *router* expiring lifetime.

RFC6106 says the lifetime SHOULD be bounded by "MaxRtrAdvInterval <=
Lifetime <= 2*MaxRtrAdvInterval", which allows at least one dropped RA,
and which NM should compensate for with the above-mentioned fixes.
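
To make the bound concrete, the following Python sketch checks the two RDNSS lifetimes quoted in this thread (1800 and 1200 seconds) against it. It assumes MaxRtrAdvInterval is 600 seconds, the RFC 4861 default; the thread does not state the value dnsmasq actually uses, and the function name is made up for illustration.

```python
# Assumption: RFC 4861 default; the thread does not give dnsmasq's value.
MAX_RTR_ADV_INTERVAL = 600


def rdnss_lifetime_in_bounds(lifetime, max_interval=MAX_RTR_ADV_INTERVAL):
    """True if MaxRtrAdvInterval <= lifetime <= 2*MaxRtrAdvInterval."""
    return max_interval <= lifetime <= 2 * max_interval


# RDNSS lifetimes reported in the thread for dnsmasq 2.62 and 2.66
for version, lifetime in (("2.62", 1800), ("2.66", 1200)):
    verdict = "within" if rdnss_lifetime_in_bounds(lifetime) else "outside"
    print(f"dnsmasq {version}: {lifetime}s is {verdict} the RFC6106 bound")
```

Under that assumption, 1200 seconds sits exactly at the upper bound, while 1800 seconds exceeds it.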

As you say below, the bug was fixed in NM 0.9.6 (released 2012-08-07, so
over a year ago) and I'd recommend that the distro just upgrade to get
the fixes instead of hacking around the issue in dnsmasq, which is
following the RFC.  It was clearly a NetworkManager bug.

Dan

> in dnsmasq 2.62, router lifetime was equal to RDNSS lifetime, as shown 
> below:
> 
> # rdisc6 wlan0
> Soliciting ff02::2 (ff02::2) on wlan0...
> 
> Hop limit                 :           64 (      0x40)
> Stateful address conf.    :           No
> Stateful other conf.      :           No
> Mobile home agent         :           No
> Router preference         :       medium
> Neighbor discovery proxy  :           No
> Router lifetime           :         1800 (0x00000708) seconds
> [...]
>   Recursive DNS server     : fe80::fad1:11ff:fe54:3381
>    DNS server lifetime     :         1800 (0x00000708) seconds
>   from fe80::fad1:11ff:fe54:3381
> 
> this prevented the situation where the network-manager bug would happen: 
> as the kernel would issue a RS to renew the router lifetime, the RDNSS 
> was renewed as well, just in time
> 
> in network-manager 0.9.6 the bug is fixed (NM sends a RS by itself, 
> before RDNSS expires, independent of RtrAdvLifetime)
> but notably debian squeeze still ships 0.9.4, which reconnects to the 
> network every 20 minutes when talking to a dnsmasq v2.66 (worked well 
> against v2.62)
> 
> Router lifetime           :         1800 (0x00000708) seconds
>    DNS server lifetime     :         1200 (0x000004b0) seconds
> 
> then, even though it's the debian/etc maintainers who should fix their 
> packages...
> 
>    https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/993571
> 
> can we anyway consider going back to the old behaviour in dnsmasq, to 
> help mitigation?
> (RtrAdvLifetime = RDNSSLifetime)
> 
> i understand v2.66 follows RFC6106
> 
>       Lifetime      32-bit unsigned integer.  The maximum time, in
>                     seconds (relative to the time the packet is sent),
>                     over which this RDNSS address MAY be used for name
>                     resolution.  Hosts MAY send a Router Solicitation to
>                     ensure the RDNSS information is fresh before the
>                     interval expires.  In order to provide fixed hosts
>                     with stable DNS service and allow mobile hosts to
>                     prefer local RDNSSes to remote RDNSSes, the value of
>                     Lifetime SHOULD be bounded as
>                     MaxRtrAdvInterval <= Lifetime <= 2*MaxRtrAdvInterval
>                     where MaxRtrAdvInterval is the Maximum RA Interval
>                     defined in [RFC4861].  A value of all one bits
>                     (0xffffffff) represents infinity.  A value of zero
>                     means that the RDNSS address MUST no longer be used.
> 
> but this RFC has been criticised already[1] (since it creates a fragile 
> situation, where one or two lost RA packets - common in wifi 
> scenarios - are enough to lose the race)
> 
>      [1]: https://bugzilla.redhat.com/show_bug.cgi?id=753482#c38
> 
> and using RtrAdvLifetime = RDNSSLifetime only defies the "SHOULD" 
> keyword used in the RFC, strictly speaking.
> in addition, dnsmasq (contrary to radvd) actually provides the RDNSS 
> service itself, so it shouldn't be much of an issue to announce a 
> longer lifetime for that?
> 
> just a thought :)
> 
> Cheers!
> 
> gui
> 




