[Dnsmasq-discuss] Why is dnsmasq handing out the same IP to different MACs?

Paul Smith psmith at gnu.org
Mon Apr 12 20:16:03 BST 2010


Hi guys.  I have a strange problem.  I'm running Red Hat EL 5.3 with
dnsmasq 2.45 (Red Hat's package dnsmasq-2.45-1.el5_2.1 to be precise) on
a server to which a lot of blades are attached: there are 96 blades with
2 NICs per blade, for a total of 192 different IP addresses.  I've got a
dnsmasq config like:

dhcp-lease-max=255
dhcp-range=10.0.0.17,10.0.15.254,infinite
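
As a rough sanity check on the pool size, the range can be counted with
Python's standard ipaddress module.  This is only a sketch: it uses the
dhcp-range values exactly as written above (the startup log further down
reports a different start address, which is ignored here), and the
variable names are purely illustrative.

```python
import ipaddress

# Values copied from the dhcp-range and dhcp-lease-max lines above.
start = ipaddress.IPv4Address("10.0.0.17")
end = ipaddress.IPv4Address("10.0.15.254")
lease_max = 255
clients = 96 * 2  # 96 blades, 2 NICs per blade

# Both endpoints are usable, so the range is inclusive.
pool_size = int(end) - int(start) + 1

print("addresses in range:", pool_size)
print("clients fit under dhcp-lease-max:", clients <= lease_max)
print("clients fit in range:", clients <= pool_size)
```

So neither the range nor the lease limit should be forcing addresses to
be shared among 192 clients.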


There's a very odd thing happening: when I stop dnsmasq, remove my
leases file, restart dnsmasq, and then restart all the blades at once,
I see dnsmasq hand out the same IP address to more than one MAC address.
For example:

Apr 12 12:18:18 NZ80123-H1 dnsmasq[14036]: started, version 2.45 cachesize 150
Apr 12 12:18:18 NZ80123-H1 dnsmasq[14036]: compile time options: IPv6 GNU-getopt no-ISC-leasefile no-DBus no-I18N TFTP
Apr 12 12:18:18 NZ80123-H1 dnsmasq[14036]: DHCP, IP range 10.0.2.0 -- 10.0.15.254, lease time infinite
    ...
Apr 12 12:20:05 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:08:05 
Apr 12 12:20:05 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:08:05 
    ...
Apr 12 12:20:11 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b 
Apr 12 12:20:11 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:02:0b
    ...
Apr 12 12:20:14 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:06:07
    ...
Apr 12 12:20:15 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:0a:03
    ...
Apr 12 12:20:15 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:0c:01
    ...
Apr 12 12:20:15 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:04:09


Note that it's offered 10.0.5.15 to six different MAC addresses... there
are plenty of IPs in the range so why do we overload them like this?
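
A minimal sketch of how the log could be scanned for this condition,
assuming the syslog line format shown above.  The regular expression and
the function name duplicate_offers are illustrative, not anything that
ships with dnsmasq, and the sample lines are taken from this message.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:08:05
OFFER_RE = re.compile(
    r"DHCPOFFER\(([^)]+)\)\s+(\d+(?:\.\d+){3})\s+([0-9a-fA-F:]{17})"
)

def duplicate_offers(lines):
    """Return {ip: sorted MACs} for every IP offered to more than one MAC."""
    offered = defaultdict(set)
    for line in lines:
        match = OFFER_RE.search(line)
        if match:
            _iface, ip, mac = match.groups()
            offered[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in offered.items() if len(macs) > 1}

sample = [
    "Apr 12 12:20:05 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:08:05",
    "Apr 12 12:20:11 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:02:0b",
    "Apr 12 12:20:51 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.16 00:06:72:00:02:0b",
]

for ip, macs in duplicate_offers(sample).items():
    print(ip, "offered to", len(macs), "MACs:", ", ".join(macs))
```

This counts offers over the whole input, so it is only a meaningful
signal for a run like this one, where the leases file starts out empty.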

Then, later on, the first one to send its DHCPREQUEST gets an ACK, which
is fine, but all the others get a NAK:

Apr 12 12:20:22 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:08:05
Apr 12 12:20:22 NZ80123-H1 dnsmasq[14036]: DHCPACK(bond2) 10.0.5.15 00:06:72:00:08:05
    ...
Apr 12 12:20:22 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:02:0b
Apr 12 12:20:22 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.15 00:06:72:00:02:0b address in use
    ...
Apr 12 12:20:23 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:06:07
Apr 12 12:20:23 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.15 00:06:72:00:06:07 address in use
    ...
Apr 12 12:20:23 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:0a:03
Apr 12 12:20:23 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.15 00:06:72:00:0a:03 address in use
    ...
Apr 12 12:20:24 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:0c:01
Apr 12 12:20:24 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.15 00:06:72:00:0c:01 address in use
    ...
Apr 12 12:20:24 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:04:09
Apr 12 12:20:24 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.15 00:06:72:00:04:09 address in use

After this, each of the other interfaces tries for a new IP address, but
those also turn out to be in use already, until finally it gets one that
works; for example here's the 02:0b MAC:

Apr 12 12:20:11 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:20:11 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:02:0b
Apr 12 12:20:16 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:20:16 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:02:0b
Apr 12 12:20:18 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:20:18 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.15 00:06:72:00:02:0b
Apr 12 12:20:22 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.15 00:06:72:00:02:0b
Apr 12 12:20:22 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.15 00:06:72:00:02:0b address in use
    ...
Apr 12 12:20:51 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:20:51 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.16 00:06:72:00:02:0b
Apr 12 12:20:55 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.16 00:06:72:00:02:0b
Apr 12 12:20:55 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.16 00:06:72:00:02:0b address in use
    ...
Apr 12 12:21:39 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:21:39 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.17 00:06:72:00:02:0b
Apr 12 12:22:36 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:22:36 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.17 00:06:72:00:02:0b
Apr 12 12:22:51 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.17 00:06:72:00:02:0b
Apr 12 12:22:51 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.17 00:06:72:00:02:0b address in use
    ...
Apr 12 12:22:51 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:22:51 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.18 00:06:72:00:02:0b
Apr 12 12:22:55 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.18 00:06:72:00:02:0b
Apr 12 12:22:55 NZ80123-H1 dnsmasq[14036]: DHCPNAK(bond2) 10.0.5.18 00:06:72:00:02:0b address in use
    ...
Apr 12 12:23:03 NZ80123-H1 dnsmasq[14036]: DHCPDISCOVER(bond2) 00:06:72:00:02:0b
Apr 12 12:23:03 NZ80123-H1 dnsmasq[14036]: DHCPOFFER(bond2) 10.0.5.19 00:06:72:00:02:0b
Apr 12 12:23:07 NZ80123-H1 dnsmasq[14036]: DHCPREQUEST(bond2) 10.0.5.19 00:06:72:00:02:0b
Apr 12 12:23:07 NZ80123-H1 dnsmasq[14036]: DHCPACK(bond2) 10.0.5.19 00:06:72:00:02:0b

Finally we find a free one... but note that this takes us 3 minutes!!
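
For reference, the delay for the 02:0b MAC can be computed from the first
DHCPDISCOVER and the final DHCPACK in the log above.  This is a small
sketch: syslog timestamps carry no year, so 2010 is assumed from the date
of this message, and month-name parsing assumes an English locale.  The
helper name parse_stamp is illustrative.

```python
from datetime import datetime

def parse_stamp(line, year=2010):
    """Parse the leading 'Apr 12 12:20:11' timestamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

first_discover = (
    "Apr 12 12:20:11 NZ80123-H1 dnsmasq[14036]: "
    "DHCPDISCOVER(bond2) 00:06:72:00:02:0b"
)
final_ack = (
    "Apr 12 12:23:07 NZ80123-H1 dnsmasq[14036]: "
    "DHCPACK(bond2) 10.0.5.19 00:06:72:00:02:0b"
)

elapsed = parse_stamp(final_ack) - parse_stamp(first_discover)
print("seconds from first DISCOVER to final ACK:", elapsed.total_seconds())
```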

By that time the monitor programs that I use to verify that all 96
blades are up have timed out: they're not expecting to have to wait all
this extra time for DHCP to complete.

Note that once the system is up, if I stop and restart it, it works
fine, since the leases file gives a valid set of IP addresses.  But if I
delete my leases file and start from scratch, the problem recurs.

Is this expected behavior?  I suppose it does not violate the standard,
since until both sides accept, either side can change its mind (IIRC
from the DHCP handshake spec), but it seems... sub-optimal :-).  Is
this a known issue with this version of dnsmasq?  Is it resolved in
newer versions?  Is there something I can do to work around it without
rolling my own newer version of dnsmasq?

Unfortunately for reasons that are too complicated to go into, I run
into the above situation a good bit during my internal testing and it's
causing heartburn.


Thanks all!



