[Dnsmasq-discuss] RA support in dnsmasq

Sat Dec 1 14:54:52 GMT 2012

On 11/30/2012 05:23 PM, Gene Czarcinski wrote:
> On 11/30/2012 04:18 PM, Simon Kelley wrote:
>> On 30/11/12 21:03, Gene Czarcinski wrote:
>>> On 11/30/2012 12:45 PM, Simon Kelley wrote:
>>>> On 30/11/12 17:20, Gene Czarcinski wrote:
>>>>> On 11/30/2012 11:32 AM, Simon Kelley wrote:
>>>>>> On 30/11/12 15:54, Gene Czarcinski wrote:
>>>>>>> On 11/29/2012 04:18 PM, Simon Kelley wrote:
>>>>>>>> On 29/11/12 20:31, Gene Czarcinski wrote:
>>>>>>>>
>>>>>>>>> I spoke too quickly.
>>>>>>>>>
>>>>>>>>> The cause of the problem is libvirt related but I am not sure what
>>>>>>>>> just yet.
>>>>>>>>>
>>>>>>>>> I was running a libvirt that had a lot of "stuff" on it but seemed to
>>>>>>>>> work OK. Then, earlier today I updated to a point that appears to be
>>>>>>>>> somewhat beyond the leading edge and, although I was not getting any
>>>>>>>>> RTR-ADVERT messages, it turned out that there were/are big-time
>>>>>>>>> problems running qemu-kvm. So, back off/downgrade to the previous
>>>>>>>>> version. Qemu-kvm now works but the RTR-ADVERT messages are back.
>>>>>>>>>
>>>>>>>>> This may be a bit time-consuming to debug!
>>>>>>>>>
>>>>>>>> Are you seeing the new log message in netlink.c?
>>>>>>>>
>>>>>>>>
>>>>>>> The good news is that libvirt is working again (I must have done a
>>>>>>> git-pull in the middle of an update).  Thus, I am not seeing the
>>>>>>> large numbers of RTR-ADVERT.
>>>>>>>
>>>>>>> Yes, I am seeing the new log message and I have a question about
>>>>>>> that. Every time a new virtual network interface is started,
>>>>>>> something must be doing some type of broadcast because all of the
>>>>>>> dnsmasq instances (the new one and all the "old" ones) suddenly wake
>>>>>>> up and issue a flurry of RA packets and related syslog messages.  To
>>>>>>> kick the flurry off, there is one of the new "unsolicited" syslog
>>>>>>> messages from each dnsmasq instance.
>>>>>>>
>>>>>>> Is this something you would expect?  Is this "normal"? The libvirt
>>>>>>> folks say they are not doing it.
>>>>>> I'd expect it. The code you instrumented gets run whenever a "new
>>>>>> address" event happens, which is whenever an address is added to an
>>>>>> interface. "Every time a new virtual network interface is started"
>>>>>> is a good proxy for that.
>>>>>>
>>>>>> The dnsmasq code isn't very discriminating: it updates its idea of
>>>>>> which interfaces have which addresses, and then does a minute of
>>>>>> fast advertisements on all of them. It might be possible to only do
>>>>>> the fast advertisements on new interfaces, but implementing that
>>>>>> isn't totally trivial.
>>>>>>
>>>>>>
>>>>> Yes, I doubt very much if it would be trivial.  However, I do not
>>>>> believe that this is the basic problem.
>>>>>
>>>>> When the problem occurs, one of the networks "suddenly" attempts to
>>>>> work with the real NIC rather than the virtual one defined in its
>>>>> config file.  I slightly changed the IPv4 and IPv6 addresses defined
>>>>> for this network and the problem went away.  I have also "just" seen
>>>>> the problem happen on another system which also had that virtual
>>>>> address defined.
>>>>>
>>>>> BTW, these configurations all use interface= and bind-dynamic rather
>>>>> than the "old" bind-interfaces with listen-address= specified for
>>>>> each IPv4 and IPv6 address.  I had not noticed the problem
>>>>> previously.  Why it occurs at all with just this specific address is
>>>>> puzzling.
>>>>>
>>>>> The configuration which causes problems is:
>>>>> ------------------------------------------
>>>>> # dnsmasq conf file created by libvirt
>>>>> strict-order
>>>>> domain-needed
>>>>> domain=net6
>>>>> expand-hosts
>>>>> local=/net6/
>>>>> pid-file=/var/run/libvirt/network/net6.pid
>>>>> bind-dynamic
>>>>> interface=virbr11
>>>>> dhcp-range=192.168.6.128,192.168.6.254
>>>>> dhcp-no-override
>>>>> dhcp-leasefile=/var/lib/libvirt/dnsmasq/net6.leases
>>>>> dhcp-lease-max=127
>>>>> dhcp-hostsfile=/var/lib/libvirt/dnsmasq/net6.hostsfile
>>>>> addn-hosts=/var/lib/libvirt/dnsmasq/net6.addnhosts
>>>>> dhcp-range=fd00:beef:10:6::1,ra-only
>>>>> -------------------------------------------------
>>>>>
>>>>> When I changed all the "6" to "160", the problem disappeared. And
>>>>> there is another network defined almost the same with "8" instead of
>>>>> "6" and I have had no problems with it.
>>>>>
>>>>> The real NIC is configured as a DHCP client for both IPv4 and IPv6.
>>>>> It is assigned "nailed" addresses of 192.168.17.2/24 and
>>>>> fd00:dead:beef:17::2.
>>>>>
>>>>> And I just discovered why crazy stuff is happening (but I do not know
>>>>> what causes it) ... the p33p1 NIC has:
>>>>>    inet6 fd00:beef:10:6:3285:a9ff:fe8f:e982/64 scope global dynamic
>>>>
>>>> Is that the "real NIC"?
>>>>
>>> Yes, p33p1 is the real NIC.  This is going to be a real PITA to debug
>>> because I believe part of the problem is a race condition.
>>> NetworkManager has this really long dance it goes through to bring up
>>> the IPv6 interface.
>>>
>>> But, I do not have any proof of that and as I just proved to myself,
>>> getting things to repeat is going to be difficult.
>>>
>>> At this point I am not sure that bind-dynamic was related.  I went
>>> through the syslogs I still have and the first occurrence was on  8
>>> November.  That is well before bind-dynamic was integrated in.
>>>
>>> Attached are some limited copies of syslogs that I thought you might
>>> find of interest.  It seems like the "strangeness" happens right
>>> after I update libvirt and libvirtd is restarted, which then gets
>>> dnsmasq started.
>>>
>>> If I cannot get this figured out and "fixed", I will need to disable
>>> use of dnsmasq for RA service and fall back on radvd.
>>>
>>> Frustrating .. so close and yet so far!
>>>
>>
>> I wonder if the virbr* interfaces are bridged to the "real" NICs, 
>> such that when a prefix is advertised on the virbr interface, it 
>> causes the real interface to add an address for that prefix. Because 
>> dnsmasq is configured to advertise the prefix, that then causes the 
>> advertisements via the real NIC.
>>
>> Just a thought.
>>
> If I had not done the ip addr to get the above, I would still be 
> scratching my head.
>
> Anyway, here is ip addr:
> -----------------------------------------------
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 scope host lo
>     inet6 ::1/128 scope host
>        valid_lft forever preferred_lft forever
> 2: p33p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
>     link/ether 30:85:a9:8f:e9:82 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.17.2/24 brd 192.168.17.255 scope global p33p1
>     inet6 fd00:dead:beef:17:1::2/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::3285:a9ff:fe8f:e982/64 scope link
>        valid_lft forever preferred_lft forever
> 10: virbr11: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
>     link/ether 52:54:00:0b:84:5c brd ff:ff:ff:ff:ff:ff
>     inet 192.168.6.1/24 brd 192.168.6.255 scope global virbr11
>     inet6 fd00:beef:10:6::1/64 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::5054:ff:fe0b:845c/64 scope link
>        valid_lft forever preferred_lft forever
> 11: virbr11-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr11 state DOWN qlen 500
>     link/ether 52:54:00:0b:84:5c brd ff:ff:ff:ff:ff:ff
> ------------------------------------
>
> And here is brctl show:
> -----------------------------------------
> bridge name    bridge id        STP enabled    interfaces
> virbr11        8000.5254000b845c    yes        virbr11-nic
> ---------------------------------------
>
> I think I will give it a rest until tomorrow!
>
I ran yet another test and this time it happened.  I am getting a little
more info such as:
Dec  1 05:52:36 falcon dnsmasq-dhcp[23358]: ra_start_unsolicted(), len=60, type=14, flags=0, pid=5b5e

where most look like:
Dec  1 05:52:37 falcon dnsmasq-dhcp[23394]: ra_start_unsolicted(), len=64, type=14, flags=0, pid=0
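
The fields in these two log lines can be decoded a little further. The
sketch below is illustrative only: it assumes the instrumentation prints
type and pid in hexadecimal (suggested by pid=5b5e, but not confirmed
here), and decode_ra_log is a hypothetical helper name. Under that
assumption type=14 is 0x14 = 20, which is RTM_NEWADDR in
<linux/rtnetlink.h>, and pid=0 is the netlink port ID of messages that
originate in the kernel. The len and flags fields are left undecoded
because their number base is not known.

```python
import re

# Selected rtnetlink message types from <linux/rtnetlink.h>.
RTM_TYPES = {
    16: "RTM_NEWLINK",
    17: "RTM_DELLINK",
    20: "RTM_NEWADDR",
    21: "RTM_DELADDR",
    24: "RTM_NEWROUTE",
    25: "RTM_DELROUTE",
}

FIELD_RE = re.compile(r"len=(\w+), type=(\w+), flags=(\w+), pid=(\w+)")


def decode_ra_log(line):
    """Decode the type and pid fields of one instrumented log line.

    ASSUMPTION: type and pid are printed in hexadecimal.
    """
    match = FIELD_RE.search(line)
    if match is None:
        raise ValueError("no netlink fields found in line")
    raw_len, raw_type, raw_flags, raw_pid = match.groups()
    msg_type = int(raw_type, 16)
    port_id = int(raw_pid, 16)
    return {
        "len": raw_len,      # base unknown, kept as printed
        "flags": raw_flags,  # base unknown, kept as printed
        "type": RTM_TYPES.get(msg_type, "unknown(%d)" % msg_type),
        "pid": port_id,
        "from_kernel": port_id == 0,
    }


first = decode_ra_log("ra_start_unsolicted(), len=60, type=14, flags=0, pid=5b5e")
most = decode_ra_log("ra_start_unsolicted(), len=64, type=14, flags=0, pid=0")
print(first["type"], first["pid"], first["from_kernel"])
print(most["type"], most["pid"], most["from_kernel"])
```

A non-zero port ID often, though not always, matches the process ID of
whoever is associated with the message, so if the hex assumption holds
it may be worth checking which process had ID 23390 (0x5b5e) at that
time.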

1. Is there other information I can/should print out?

2. Is there anything I can do to identify why dnsmasq is "suddenly"
using interface p33p1 when it was specifically configured to use
interface virbr11?  I thought that bind-interfaces and bind-dynamic
were supposed to lock that dnsmasq instance into only servicing the
specified interface.
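
One observation that may help with question 2: the address quoted
earlier on p33p1, fd00:beef:10:6:3285:a9ff:fe8f:e982/64, has the shape
of a stateless autoconfiguration (SLAAC) address. It is the
fd00:beef:10:6::/64 prefix from the virbr11 dhcp-range combined with a
modified EUI-64 interface identifier built from p33p1's MAC address,
30:85:a9:8f:e9:82. The sketch below reproduces that derivation;
slaac_address is a hypothetical helper name. It shows only that p33p1
appears to have accepted an RA for that prefix, not where the RA came
from.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with the modified EUI-64 identifier of a MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe between the two halves of the MAC.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this derivation needs a /64 prefix")
    combined = int(network.network_address) | int.from_bytes(interface_id, "big")
    return ipaddress.IPv6Address(combined)


# Prefix from the virbr11 configuration, MAC from "ip addr" for p33p1.
print(slaac_address("fd00:beef:10:6::/64", "30:85:a9:8f:e9:82"))
# prints fd00:beef:10:6:3285:a9ff:fe8f:e982
```

The same interface identifier appears in p33p1's link-local address,
fe80::3285:a9ff:fe8f:e982, in the ip addr output.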

Previously, libvirt's parameters to dnsmasq were bind-interfaces,
listen-address=.  These are now replaced with bind-dynamic, interface= to
fix a serious problem.  So, my question is whether the "correct"
configuration should be bind-interfaces, interface= ?

No matter how it is specified, the goal is that dnsmasq ONLY service
the networks defined on a specific interface and ignore anything from
other interfaces, whether they have the same network defined or not.

Question: what do you think would happen if two different processes on
two different hardware platforms which share a common hardware-network
fabric were to run stateful RA on one system and stateless RA on the
other system?  Could that be happening here and, if so, why?

Gene
-------------- next part --------------
A non-text attachment was scrubbed...
Name: RTR-ADVERT-falcon-p33p1-2.log
Type: text/x-log
Size: 13267 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20121201/ee3ddfa7/attachment.bin>