[Dnsmasq-discuss] [BUG] [PATCH] RA are sent too fast and slows down the machine

Maarten de Vries maarten+dnsmasq at m.de-vri.es
Thu Sep 26 18:59:00 BST 2019


It's perfectly valid to have multiple distinct prefixes configured on an 
interface, so just remembering one subnet isn't good enough in the 
general case, although it's certainly an improvement over remembering a 
single address.

I think a complete fix would be to remember all (interface, prefix) 
pairs that we're doing RAs on, and only (re)start fast RAs for the 
interface if the subnet isn't already being served RAs. I imagine this 
list already exists somewhere, since the RAs are being sent there. But 
it's been a while since I looked through the code.
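
A rough sketch of the idea, not taken from the dnsmasq source. The names 
ra_prefix, served, mask_prefix and ra_prefix_is_new are made up for 
illustration, and the fixed-size table is only an assumption to keep the 
example short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

#define MAX_RA_PREFIXES 64   /* arbitrary bound for this sketch */

/* Hypothetical record: one prefix being advertised on one interface. */
struct ra_prefix {
  int if_index;
  int prefix_len;
  struct in6_addr prefix;    /* network part only, host bits zeroed */
};

static struct ra_prefix served[MAX_RA_PREFIXES];
static size_t served_count;

/* Copy addr into out with everything past prefix_len bits cleared. */
static void mask_prefix(struct in6_addr *out, const struct in6_addr *addr,
                        int prefix_len)
{
  int i;

  assert(prefix_len >= 0 && prefix_len <= 128);
  for (i = 0; i < 16; i++)
    {
      int bits = prefix_len - i * 8;   /* prefix bits left in this byte */
      unsigned char mask = bits >= 8 ? 0xff
                         : bits <= 0 ? 0x00
                         : (unsigned char)(0xff << (8 - bits));
      out->s6_addr[i] = addr->s6_addr[i] & mask;
    }
}

/* Return true and remember the pair if (if_index, prefix) is not yet
   being served; return false if it is already known. */
static bool ra_prefix_is_new(int if_index, const struct in6_addr *addr,
                             int prefix_len)
{
  struct in6_addr masked;
  size_t i;

  mask_prefix(&masked, addr, prefix_len);

  for (i = 0; i < served_count; i++)
    if (served[i].if_index == if_index &&
        served[i].prefix_len == prefix_len &&
        memcmp(&served[i].prefix, &masked, sizeof(masked)) == 0)
      return false;

  /* If the table is full the pair is reported as new but not stored,
     so it would be reported as new again next time. */
  if (served_count < MAX_RA_PREFIXES)
    {
      served[served_count].if_index = if_index;
      served[served_count].prefix_len = prefix_len;
      served[served_count].prefix = masked;
      served_count++;
    }
  return true;
}
```

The "new address" handler would then only (re)start fast RAs when 
ra_prefix_is_new() returns true, so a second address inside an already 
served /64 (such as the autoconfigured one) would not trigger another 
burst. Entries would also have to be dropped when a prefix or interface 
goes away, which the sketch leaves out.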

-- Maarten

On 11-09-2019 23:40, Simon Kelley wrote:
> That's nasty.
>
> I'm not sure how to properly solve this. I'm inclined to apply your
> patch, on the grounds that it at least works better.....
>
>
>
> Simon.
>
>
>
> On 02/09/2019 18:45, Petr Mensik wrote:
>> Yes, it seems the originating system is autoconfiguring the interface
>> based on its own RA. I have modified the test to include ip monitor
>> output. It receives the autoconfiguration a few seconds after the bridge
>> interface comes up.
>>
>> I don't know how much the network namespace used on the bridge is
>> involved; it should not matter. A bit suspicious is the STALE router
>> entry just before autoconfiguration. I doubt it is related, but Avahi is
>> trying mDNS on that interface. Of course, NetworkManager is touching it
>> too.
>>
>> Since it is a custom interface created in a namespace, no other host can
>> send an RA to it. So I am positive it autoconfigures itself, at least on
>> my Fedora 29. The results are the same when only the bridge is used and
>> when loopback is also used.
>>
>> 14:32:22.711> 2: simbr    inet6 fc58:a22:180d:7800::1/64 scope global
>> ...
>> 14:32:25.289> fe80::6887:6dff:fe07:6f54 dev simbr lladdr 6a:87:6d:07:6f:54 router STALE
>> 14:32:25.293> prefix fc58:a22:180d:7800::/64 dev simbr onlink autoconf valid 1800 preferred 1800
>> 14:32:27.317> 2: simbr    inet6 fc58:a22:180d:7800:6887:6dff:fe07:6f54/64 scope global dynamic mngtmpaddr
>> 14:32:27.318> valid_lft 1798sec preferred_lft 1798sec
>>
>> Cheers,
>> Petr
>>
>> On 8/30/19 11:26 PM, Simon Kelley wrote:
>>> This is useful information, but what I don't understand is where the
>>> flooding comes from. Sure, this confusion means that unsolicited RAs will
>>> run every time there's a "new address" event, even if the new address
>>> isn't on the expected interface, but I can't see how it generates more
>>> "new address" events and therefore a flood of packets.
>>>
>>>
>>> Unless, the originating system receives _its_own_ RA and that generates
>>> a "new address" event?
>>>
>>> Simon.
>>>
>>>
>>>
>>> On 28/08/2019 20:38, Petr Mensik wrote:
>>>> Hi,
>>>>
>>>> I have found what is going on.
>>>>
>>>> The RA seems to be switching between the dynamically assigned address
>>>> and the manually assigned address. It is just wrong to assume there is
>>>> one address on a physical interface, especially in the IPv6 world.
>>>>
>>>> It seems my patch (attached), which checks just the subnet and does not
>>>> care about the exact address inside it, fixes the advertisement floods.
>>>> But I am not sure whether it also stops announcements for new dynamic
>>>> addresses that should still be sent. It might help to use the valid
>>>> parameter to distinguish between static and dynamic addresses. I am
>>>> unsure whether it is required for both or just the dynamic one.
>>>>
>>>> I am sure it would send once for a newly created interface. I think that
>>>> should be enough, right?
>>>>
>>>> Some notes from debugging:
>>>>
>>>> Breakpoint 1, construct_worker (scope=<optimized out>, flags=<optimized
>>>> out>, preferred=<optimized out>, valid=1800,
>>>>      vparam=0x7ffc9afc2b60, if_index=2, prefix=64, local=0xa6dda4) at
>>>> dhcp6.c:685
>>>> 2: /x *local = {__in6_u = {__u6_addr8 = {0xfc, 0x58, 0xa, 0x22, 0x18,
>>>> 0xd, 0x78, 0x0, 0x8, 0x21, 0xd1, 0xff, 0xfe, 0x74, 0xec,
>>>>        0x2a}, __u6_addr16 = {0x58fc, 0x220a, 0xd18, 0x78, 0x2108, 0xffd1,
>>>> 0x74fe, 0x2aec}, __u6_addr32 = {0x220a58fc, 0x780d18,
>>>>        0xffd12108, 0x2aec74fe}}}
>>>>
>>>> Breakpoint 1, construct_worker (scope=<optimized out>, flags=<optimized
>>>> out>, preferred=<optimized out>, valid=-1,
>>>>      vparam=0x7ffc9afc2b60, if_index=2, prefix=64, local=0xa6ddec) at
>>>> dhcp6.c:685
>>>> 685			ra_start_unsolicited(param->now, template);
>>>> 2: /x *local = {__in6_u = {__u6_addr8 = {0xfc, 0x58, 0xa, 0x22, 0x18,
>>>> 0xd, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1},
>>>>      __u6_addr16 = {0x58fc, 0x220a, 0xd18, 0x78, 0x0, 0x0, 0x0, 0x100},
>>>> __u6_addr32 = {0x220a58fc, 0x780d18, 0x0, 0x1000000}}}
>>>>
>>>> Cooperative ip link:
>>>> 2: simbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
>>>> UP group default qlen 1000
>>>>      link/ether 0a:21:d1:74:ec:2a brd ff:ff:ff:ff:ff:ff
>>>>      inet 172.30.16.1/24 scope global simbr
>>>>         valid_lft forever preferred_lft forever
>>>>      inet6 fc58:a22:180d:7800:821:d1ff:fe74:ec2a/64 scope global dynamic
>>>> mngtmpaddr
>>>>         valid_lft 1699sec preferred_lft 1699sec
>>>>      inet6 fc58:a22:180d:7800::1/64 scope global
>>>>         valid_lft forever preferred_lft forever
>>>>      inet6 fe80::821:d1ff:fe74:ec2a/64 scope link
>>>>         valid_lft forever preferred_lft forever
>>>>
>>>>
>>>> Regards,
>>>> Petr
>>>>
>>>> On 8/27/19 10:42 PM, Maarten de Vries wrote:
>>>>> Hey,
>>>>>
>>>>> I haven't dug very deep yet, but I can comment on the intent of the
>>>>> particular commit: without it, dnsmasq didn't do any unsolicited RAs on
>>>>> interfaces that are created after dnsmasq was started. It definitely
>>>>> should do unsolicited RAs on those interfaces too, although obviously
>>>>> not quite so many so often. I'm not sure why that happens. Note that the
>>>>> commit didn't introduce the fast RAs, it only enabled unsolicited RAs
>>>>> (including fast) for newly created interfaces too.
>>>>>
>>>>> I wonder why this happens in those test cases and at least one Raspberry
>>>>> Pi, but not on my server. Is there any information you could provide to
>>>>> pinpoint when exactly this bug triggers and when not? For example: what
>>>>> happens if the virtual interface is created before dnsmasq starts? Does
>>>>> it also trigger on bridge interfaces (which is what I personally tested
>>>>> the commit with) for you?
>>>>>
>>>>> I will attempt to investigate too, but I'm somewhat swamped for time so
>>>>> I can't promise fast results.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Maarten
>>>>>
>>>>>
>>>>> On 27-08-2019 10:45, Iain Lane wrote:
>>>>>> On Wed, Aug 21, 2019 at 08:59:07PM +0200, Petr Mensik wrote:
>>>>>>> Hi Simon and Maarten,
>>>>>>>
>>>>>>> we discovered when playing with NetworkManager-ci [1] that the latest
>>>>>>> release is somehow broken. Tests running dnsmasq are quite slow on the
>>>>>>> latest release.
>>>>>>>
>>>>>>> I have created a script that reproduces it repeatably, then used
>>>>>>> git bisect to find when it was broken. It seems the fast sending was
>>>>>>> intentional in commit 0a496f059c1e9 [2], but maybe the way it affects
>>>>>>> the system was underestimated. It is significant for systems that hit
>>>>>>> this issue. I think it has to be fixed so that fast sending is limited
>>>>>>> to a short time interval, not an endless loop. Reported as Fedora bug [3].
>>>>>> Thanks for this Petr. Would you be able to share the script you've used,
>>>>>> so that perhaps an upstream developer could recreate the bug?
>>>>>>
>>>>>> Mainly I wanted to chime in and say that (in addition to the other
>>>>>> instance referenced), we found this in the NetworkManager testsuite in
>>>>>> Ubuntu. I didn't come up with a nice reproducer at the time, but we did
>>>>>> identify the same commit and we've reverted it in Ubuntu. I posted on
>>>>>> the ML back then but we didn't get much traction and I didn't follow up
>>>>>> very aggressively.
>>>>>>
>>>>>>    
>>>>>> http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2018q4/012709.html
>>>>>>
>>>>>>
>>>>>>    
>>>>>> https://launchpadlibrarian.net/405377161/dnsmasq_2.80-1_2.80-1ubuntu1.diff.gz
>>>>>>
>>>>>>     (the commit ID referenced in the changelog there seems to be from
>>>>>>     somewhere else, but it's the same patch)
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Dnsmasq-discuss mailing list
>>>>>> Dnsmasq-discuss at lists.thekelleys.org.uk
>>>>>> http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss