[Dnsmasq-discuss] More RA testing

Simon Kelley simon at thekelleys.org.uk
Mon Dec 3 14:11:15 GMT 2012


On 03/12/12 12:46, Gene Czarcinski wrote:
> On 12/03/2012 04:39 AM, Simon Kelley wrote:
>> On 02/12/12 22:18, Simon Kelley wrote:
>>> On 02/12/12 16:32, Gene Czarcinski wrote:
>>>> All tests run with dnsmasq 2.64rc2
>>>>
>>>> Rebuilt libvirt so that I was sure what was running.
>>>>
>>>> 1. libvirt gc2.35: bind-dynamic interface=virbr__
>>>>
>>>> 2. libvirt gc2.36: bind-interfaces except-interface=lo
>>>> interface=virbr__
>>>>
>>>> Wireshark run on p33p1 for each test.
>>>>
>>>> After updating libvirt, waiting for stable situation, stopping net6 &
>>>> net8, then
>>>>
>>>> -- start wireshark
>>>> -- service restart libvirtd.service
>>>>
>>>> Check to see what happened. For both tests, the results were the same:
>>>>
>>>> -- net6 and net8 RAs issued from the dnsmasq on a virtual network (see
>>>> ICMPv6 Option Source link layer address)
>>>>
>>>> -- In each case, the Ethernet II, Src is from p33p1
>>>>
>>>> I continue to work on this.
>>>>
>>> Great work.
>>>
>>> A couple of questions spring to mind:
>>>
>>> 1) Is this a transient effect? Do you see subsequent RAs from/to the
>>> correct places?
>>>
>>> 2) Look in src/forward.c at the function send_from(), which is used to
>>> send the ICMP: there's a nasty hack in there:
>>>
>>> /* certain Linux kernels seem to object to setting the source */
>>> /* address in the IPv6 stack by returning EINVAL from sendmsg. */
>>> /* In that case, try again without setting the source address, */
>>> since it will nearly always be correct anyway.  IPv6 stinks. */
>>>        if (errno == EINVAL && msg.msg_controllen)
>>>          {
>>>            msg.msg_controllen = 0;
>>>            goto retry;
>>>          }
>>>
>>> it would be very interesting to know whether the "goto retry" code path
>>> is being taken in this case.
>>>
>> OK, further thoughts about this. Here is my hypothesis about what's
>> going on.
>>
>> 1) New virbr interface created, still doing Duplicate Address Detection
>> (DAD) of the link-local address.
>>
>> 2) Creation of new interface triggers dnsmasq to send RA and the code
>> path reaches the above, with the link-local address and interface index
>> of the new interface as the source address and sending interface, stored
>> in "msg".
>>
>> 3) call to sendmsg() fails with EINVAL, because the source address has
>> not yet passed DAD.
>>
>> 4) sendmsg() called again with unspecified interface and source address:
>> kernel picks p33p1 and the address of p33p1......
>>
>> 5) ... which causes exactly the problem we have seen.
>>
>>
>> This should be easy to duplicate here: I'll try this evening. If I'm
>> right, the fix is just to remove the above hack, which is very old, and
>> came about because I didn't understand DAD on IPv6 when I wrote it.
>>
>>
>>
> I do believe we have got it!!!
> 
> When I was examining the send_from() code and saw the above code
> fragment, I thought it looked a bit suspicious, and I/we were correct!
> I have attached the patch I used for my testing.
> 
> I did not want to either keep looping until success or give up without
> any retries, so I put in a little counter that allows 5 retries.
> Testing was the usual after installing the updated dnsmasq: stop all
> autostarted networks and then restart libvirtd.
> 
> The first test with two autostarted networks resulted in one "failed to
> send packet" message but p33p1 was clean.
> 
> The second test with four autostarted networks resulted in about 6
> messages but p33p1 was clean and when things settled there was
> RTR-ADVERT on all four networks.
> 
> I must say that this is a relief!  Not only is it fixed but I understand
> why it is fixed.
> 
> I naturally assume that this will be in 2.64 final.
> 

It will. My take on the problem is now in 2.64rc3 / git master. I'm
suppressing RAs until the target interface finishes doing DAD. It also
handles transmits through interfaces in DAD to upstream DNS servers better.

Cheers,

Simon.
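
The idea of holding back RAs while an interface is still doing DAD can be
sketched as below. This is only an illustration, not the change that went
into 2.64rc3: the thread does not show that code. The sketch assumes Linux,
where /proc/net/if_inet6 lists each IPv6 address with a hex flags field and
bit 0x40 (IFA_F_TENTATIVE) marks an address that has not yet passed DAD.
The names line_is_tentative(), iface_in_dad() and SKETCH_IFA_F_TENTATIVE
are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Same value as IFA_F_TENTATIVE in <linux/if_addr.h>; repeated here
   (under a hypothetical name) so the sketch builds without kernel headers. */
#define SKETCH_IFA_F_TENTATIVE 0x40

/* Parse one line of /proc/net/if_inet6:
     <32 hex digit address> <ifindex> <prefix len> <scope> <flags> <name>
   Returns 1 if the line is a tentative address on ifname,
           0 if it is not (or belongs to another interface),
          -1 if the line cannot be parsed. */
static int line_is_tentative(const char *line, const char *ifname)
{
  char addr[33], dev[32];
  unsigned int idx, plen, scope, flags;

  if (sscanf(line, "%32s %x %x %x %x %31s",
             addr, &idx, &plen, &scope, &flags, dev) != 6)
    return -1;

  if (strcmp(dev, ifname) != 0)
    return 0;

  return (flags & SKETCH_IFA_F_TENTATIVE) ? 1 : 0;
}

/* Returns 1 if any address on ifname is still tentative (DAD running),
           0 if none is,
          -1 if the address list cannot be read. */
static int iface_in_dad(const char *ifname)
{
  FILE *f = fopen("/proc/net/if_inet6", "r");
  char line[256];
  int found = 0;

  if (!f)
    return -1;

  while (fgets(line, sizeof(line), f))
    if (line_is_tentative(line, ifname) == 1)
      {
        found = 1;
        break;
      }

  fclose(f);
  return found;
}
```

A caller would skip the RA while iface_in_dad("virbr0") returns 1 and try
again later, rather than letting the kernel choose another source address.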
