[Dnsmasq-discuss] RA support in dnsmasq

Sat Dec 1 15:08:29 GMT 2012

On 01/12/12 14:54, Gene Czarcinski wrote:
> On 11/30/2012 05:23 PM, Gene Czarcinski wrote:
>> On 11/30/2012 04:18 PM, Simon Kelley wrote:
>>> On 30/11/12 21:03, Gene Czarcinski wrote:
>>>> On 11/30/2012 12:45 PM, Simon Kelley wrote:
>>>>> On 30/11/12 17:20, Gene Czarcinski wrote:
>>>>>> On 11/30/2012 11:32 AM, Simon Kelley wrote:
>>>>>>> On 30/11/12 15:54, Gene Czarcinski wrote:
>>>>>>>> On 11/29/2012 04:18 PM, Simon Kelley wrote:
>>>>>>>>> On 29/11/12 20:31, Gene Czarcinski wrote:
>>>>>>>>>
>>>>>>>>>> I spoke too quickly.
>>>>>>>>>>
>>>>>>>>>> The cause of the problem is libvirt related but I am not sure what
>>>>>>>>>> just yet.
>>>>>>>>>>
>>>>>>>>>> I was running a libvirt that had a lot of "stuff" on it but seemed to
>>>>>>>>>> work OK. Then, earlier today I updated to a point that appears to be
>>>>>>>>>> somewhat beyond the leading edge and, although I was not getting any
>>>>>>>>>> RTR-ADVERT messages, it turned out that there were/are big-time problems
>>>>>>>>>> running qemu-kvm. So, back off/downgrade to the previous version.
>>>>>>>>>> Qemu-kvm now works but the RTR-ADVERT messages are back.
>>>>>>>>>>
>>>>>>>>>> This may be a bit time-consuming to debug!
>>>>>>>>>>
>>>>>>>>> Are you seeing the new log message in netlink.c?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> The good news is that libvirt is working again (I must have done a
>>>>>>>> git-pull in the middle of an update).  Thus, I am not seeing the large
>>>>>>>> numbers of RTR-ADVERT.
>>>>>>>>
>>>>>>>> Yes, I am seeing the new log message and I have a question about that.
>>>>>>>> Every time a new virtual network interface is started, something must be
>>>>>>>> doing some type of broadcast because all of the dnsmasq instances (the
>>>>>>>> new one and all the "old" ones) suddenly wake up and issue a flurry of
>>>>>>>> RA packets and related syslog messages.  To kick the flurry off, there
>>>>>>>> is one of the new "unsolicited" syslog messages from each dnsmasq
>>>>>>>> instance.
>>>>>>>>
>>>>>>>> Is this something you would expect?  Is this "normal"? The libvirt
>>>>>>>> folks say they are not doing it.
>>>>>>> I'd expect it. The code you instrumented gets run whenever a "new
>>>>>>> address" event happens, which is whenever an address is added to an
>>>>>>> interface. "Every time a new virtual network interface is started" is a
>>>>>>> good proxy for that.
>>>>>>>
>>>>>>> The dnsmasq code isn't very discriminating: it updates its idea of
>>>>>>> which interfaces have which addresses, and then does a minute of fast
>>>>>>> advertisements on all of them. It might be possible to only do the fast
>>>>>>> advertisements on new interfaces, but implementing that isn't totally
>>>>>>> trivial.
>>>>>>>
>>>>>>>
>>>>>> Yes, I doubt very much if it would be trivial.  However, I do not
>>>>>> believe that this is the basic problem.
>>>>>>
>>>>>> When the problem occurs, one of the networks "suddenly" attempts to work
>>>>>> with the real NIC rather than the virtual one defined in its config
>>>>>> file.  I slightly changed the IPv4 and IPv6 addresses defined for this
>>>>>> network and the problem went away.  I have also "just" seen the problem
>>>>>> happen on another system which also had that virtual address defined.
>>>>>>
>>>>>> BTW, these configurations all use interface= and bind-dynamic rather
>>>>>> than the "old" bind-interfaces with listen-address= specified for each
>>>>>> IPv4 and IPv6 address.  I had not noticed the problem
>>>>>> previously.  Why it occurs at all with just this specific address is
>>>>>> puzzling.
>>>>>>
>>>>>> The configuration which causes problems is:
>>>>>> ------------------------------------------
>>>>>> # dnsmasq conf file created by libvirt
>>>>>> strict-order
>>>>>> domain-needed
>>>>>> domain=net6
>>>>>> expand-hosts
>>>>>> local=/net6/
>>>>>> pid-file=/var/run/libvirt/network/net6.pid
>>>>>> bind-dynamic
>>>>>> interface=virbr11
>>>>>> dhcp-range=192.168.6.128,192.168.6.254
>>>>>> dhcp-no-override
>>>>>> dhcp-leasefile=/var/lib/libvirt/dnsmasq/net6.leases
>>>>>> dhcp-lease-max=127
>>>>>> dhcp-hostsfile=/var/lib/libvirt/dnsmasq/net6.hostsfile
>>>>>> addn-hosts=/var/lib/libvirt/dnsmasq/net6.addnhosts
>>>>>> dhcp-range=fd00:beef:10:6::1,ra-only
>>>>>> -------------------------------------------------
>>>>>>
>>>>>> When I changed all the "6" to "160", the problem disappeared. And
>>>>>> there is another network defined almost the same with "8" instead of "6"
>>>>>> and I have had no problems with it.
>>>>>>
>>>>>> The real NIC is configured as a DHCP client for both IPv4 and IPv6. It
>>>>>> is assigned "nailed" addresses of 192.168.17.2/24 and
>>>>>> fd00:dead:beef:17::2.
>>>>>>
>>>>>> And I just discovered why crazy stuff is happening (but I do not know
>>>>>> what causes it) ... the p33p1 NIC has:
>>>>>>    inet6 fd00:beef:10:6:3285:a9ff:fe8f:e982/64 scope global dynamic
>>>>>
>>>>> Is that the "real NIC"?
>>>>>
>>>> Yes, p33p1 is the real NIC.  This is going to be a real PITA to debug
>>>> because I believe part of the problem is a race condition.
>>>> NetworkManager has this really long dance it goes through to bring up
>>>> the IPv6 interface.
>>>>
>>>> But I do not have any proof of that and, as I just proved to myself,
>>>> getting things to repeat is going to be difficult.
>>>>
>>>> At this point I am not sure that bind-dynamic was related.  I went
>>>> through the syslogs I still have and the first occurrence was on  8
>>>> November.  That is well before bind-dynamic was integrated in.
>>>>
>>>> Attached are some limited copies of syslogs that I thought you might
>>>> find of interest.  It seems like the "strangeness" happens right
>>>> after I update libvirt and libvirtd is restarted, which then gets dnsmasq
>>>> started.
>>>>
>>>> If I cannot get this figured out and "fixed", I will need to disable use
>>>> of dnsmasq for RA service and fall back on radvd.
>>>>
>>>> Frustrating .. so close and yet so far!
>>>>
>>>
>>> I wonder if the virbr* interfaces are bridged to the "real" NICs,
>>> such that when a prefix is advertised on the virbr interface, it
>>> causes the real interface to add an address for that prefix. Because
>>> dnsmasq is configured to advertise the prefix, that then causes the
>>> advertisements via the real NIC.
>>>
>>> Just a thought.
>>>
>> If I had not done the ip addr to get the above, I would still be
>> scratching my head.
>>
>> Anyway, here is ip addr:
>> -----------------------------------------------
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>     inet 127.0.0.1/8 scope host lo
>>     inet6 ::1/128 scope host
>>        valid_lft forever preferred_lft forever
>> 2: p33p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>>     link/ether 30:85:a9:8f:e9:82 brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.17.2/24 brd 192.168.17.255 scope global p33p1
>>     inet6 fd00:dead:beef:17:1::2/128 scope global
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::3285:a9ff:fe8f:e982/64 scope link
>>        valid_lft forever preferred_lft forever
>> 10: virbr11: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
>>     link/ether 52:54:00:0b:84:5c brd ff:ff:ff:ff:ff:ff
>>     inet 192.168.6.1/24 brd 192.168.6.255 scope global virbr11
>>     inet6 fd00:beef:10:6::1/64 scope global
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::5054:ff:fe0b:845c/64 scope link
>>        valid_lft forever preferred_lft forever
>> 11: virbr11-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr11 state DOWN qlen 500
>>     link/ether 52:54:00:0b:84:5c brd ff:ff:ff:ff:ff:ff
>> ------------------------------------
>>
>> And here is brctl show:
>> -----------------------------------------
>> bridge name     bridge id               STP enabled     interfaces
>> virbr11         8000.5254000b845c       yes             virbr11-nic
>> ---------------------------------------
>>
>> I think I will give it a rest until tomorrow!
>>
> I ran yet another test and this time it happened.  I am getting a little
> more info such as:
> Dec  1 05:52:36 falcon dnsmasq-dhcp[23358]: ra_start_unsolicted(), len=60, type=14, flags=0, pid=5b5e
>
> where most look like:
> Dec  1 05:52:37 falcon dnsmasq-dhcp[23394]: ra_start_unsolicted(), len=64, type=14, flags=0, pid=0
>
>
> 1. Is there other information I can/should print out?

No, it's clear what's happening, I think.
>
> 2. Is there anything I can do to identify why dnsmasq is "suddenly"
> using interface p33p1 when it was specifically configured to use
> interface virbr11?  I thought that bind-interfaces and bind-dynamic
> were supposed to lock that dnsmasq instance into only servicing the
> specified interface.
>

I think that's known: it's because p33p1 gets an address on the
fd00:beef:10:6:: network, which dnsmasq is configured to advertise.
The difficult question is why it's getting that address.

(The fd00:beef:10:6:: address on p33p1 is not shown in your latest
dump, but it is shown in earlier ones. The existence of that address on
p33p1 should be a big red flag when you're trying to diagnose this.)
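
One way to narrow down where that address came from is to check whether its interface identifier is simply the modified EUI-64 form of the NIC's MAC address, which would point at autoconfiguration from a router advertisement. The following is a minimal Python sketch under that assumption. The helper name eui64_interface_id is made up for illustration, the /64 prefix length is assumed from the virbr11 address, and the MAC and addresses are copied from the dumps in this thread.

```python
import ipaddress

def eui64_interface_id(mac: str) -> int:
    """Return the modified EUI-64 interface identifier for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC.
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return int.from_bytes(eui64, "big")

# Values copied from the thread; the /64 length is an assumption.
prefix = ipaddress.IPv6Network("fd00:beef:10:6::/64")
stray = ipaddress.IPv6Address("fd00:beef:10:6:3285:a9ff:fe8f:e982")
mac = "30:85:a9:8f:e9:82"  # link/ether of p33p1

# Is the stray address inside the prefix dnsmasq advertises?
in_prefix = stray in prefix

# Does it equal prefix + EUI-64(MAC), as SLAAC would produce?
expected = ipaddress.IPv6Address(
    int(prefix.network_address) | eui64_interface_id(mac)
)
slaac_match = stray == expected

print(in_prefix, slaac_match)
```

If both values come out true, the address was most likely formed by the kernel from an advertisement for that prefix seen on p33p1, rather than assigned by DHCPv6 or by hand.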

> Previously, libvirt's parameters to dnsmasq were bind-interfaces,
> listen-address=.  This is now replaced with bind-dynamic, interface= to
> fix a serious problem.  So, my question is whether the "correct"
> configuration should be bind-interfaces, interface= ?
>
> The goal is that, no matter how it is specified,
> dnsmasq ONLY services the networks defined on a specific interface and
> ignores anything from other interfaces, whether they have the same network
> defined or not.

The RA code uses address matching to decide where to advertise. This 
could arguably be augmented by filtering on --interface, but it isn't at 
the moment.
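
To make the difference concrete, here is a toy model in Python. It is not the dnsmasq source, only a sketch of the selection rule described above: the function name ra_interfaces and its allowed parameter are hypothetical, the /64 prefix length is assumed, and the addresses are taken from the dumps in this thread.

```python
import ipaddress

# Prefix implied by dhcp-range=fd00:beef:10:6::1,ra-only (the /64 is assumed).
ADVERTISED = ipaddress.IPv6Network("fd00:beef:10:6::/64")

# Interface -> global IPv6 addresses, as seen in the dumps in this thread.
ADDRESSES = {
    "p33p1": ["fd00:dead:beef:17:1::2", "fd00:beef:10:6:3285:a9ff:fe8f:e982"],
    "virbr11": ["fd00:beef:10:6::1"],
}

def ra_interfaces(addresses, prefix, allowed=None):
    """Pick the interfaces that hold an address inside prefix.

    allowed=None models address matching alone. Passing a set of
    interface names models an additional --interface filter, which is
    hypothetical and not what the code does at the moment.
    """
    chosen = []
    for name, addrs in addresses.items():
        if allowed is not None and name not in allowed:
            continue  # skipped by the hypothetical interface filter
        if any(ipaddress.IPv6Address(a) in prefix for a in addrs):
            chosen.append(name)
    return sorted(chosen)

by_address = ra_interfaces(ADDRESSES, ADVERTISED)
with_filter = ra_interfaces(ADDRESSES, ADVERTISED, allowed={"virbr11"})

print(by_address)   # address matching alone
print(with_filter)  # address matching plus interface filter
```

With address matching alone, both p33p1 and virbr11 are selected as soon as p33p1 holds an address in the advertised prefix; with the extra filter, only virbr11 is.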

>
> Question: what do you think would happen if two different
> processes on two different hardware platforms which share a common
> hardware-network fabric were to run stateful RA on one system and
> stateless RA on the other system?  Could that be happening here and, if
> so, why?

I don't understand this: what is stateful RA? Do you mean stateful DHCP?
>
>
> Gene