[Dnsmasq-discuss] Why is dnsmasq handing out the same IP to different MACs?
psmith at gnu.org
Mon Apr 12 22:47:10 BST 2010
On Mon, 2010-04-12 at 20:51 +0100, Simon Kelley wrote:
> You've hit an unfortunate set of circumstances. What happens is that,
> for the first phase of the DHCP transaction (DHCPDISCOVER/DHCPOFFER)
> dnsmasq picks an address to offer based on a hash of the MAC address.
> That means it doesn't have to record any information about what
> addresses it is offering to which host. This makes the database
Yes, that does sound like a nice simplification.
> You are seeing problems because you are running lots of hosts through
> the address-acquisition process simultaneously and their MAC addresses
> are all very similar because they have the same manufacturer. This is
> causing the rather unsophisticated hash function to generate lots of
Would it be better to give the lower octets in the MAC more impact on
the hash, on the assumption they will be "more random" in general than
the higher octets which are vendor-based?
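
To illustrate the idea, here is a small Python sketch. It is not dnsmasq's
code: the byte-sum hash is only a stand-in for an "unsophisticated" hash, and
the MAC layout (a shared vendor prefix, with the last two octets encoding a
hypothetical chassis and slot number) and the pool size of 200 are made-up
assumptions.

```python
def byte_sum_hash(mac: bytes, pool_size: int) -> int:
    """Stand-in for a simple hash: every octet carries equal weight."""
    return sum(mac) % pool_size


def low_octet_hash(mac: bytes, pool_size: int) -> int:
    """Hash only the three device-specific low octets, read as one integer."""
    return int.from_bytes(mac[3:], "big") % pool_size


def distinct_slots(hash_fn, macs, pool_size):
    """Number of different pool slots the given MACs are spread over."""
    return len({hash_fn(mac, pool_size) for mac in macs})


# Hypothetical blades: vendor prefix 00:11:22, then 00:<chassis>:<slot>.
VENDOR = bytes([0x00, 0x11, 0x22])
macs = [VENDOR + bytes([0x00, chassis, slot])
        for chassis in range(8) for slot in range(8)]

POOL_SIZE = 200  # assumed number of addresses in the DHCP range
for fn in (byte_sum_hash, low_octet_hash):
    slots = distinct_slots(fn, macs, POOL_SIZE)
    print(f"{fn.__name__}: {slots} slots for {len(macs)} MACs")
```

On this contrived input the byte-sum hash lands on far fewer slots than there
are MACs, because moving a value from one octet to another leaves the sum
unchanged. Whether real MAC populations behave the same way would need
measuring.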
The other problem is our bucket count is not well-chosen for hashing:
more than likely it's a nice even number of potential IP addresses. I
don't know what we can do about that, though.
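
The effect of the bucket count can be seen with a toy experiment. The sketch
below reduces a structured key (a hypothetical chassis/slot pair read as a
16-bit number) modulo two pool sizes. The key layout and the sizes are
illustrative assumptions, not anything taken from dnsmasq.

```python
def distinct_slots(keys, pool_size):
    """Number of different slots the keys occupy after 'key % pool_size'."""
    return len({key % pool_size for key in keys})


# Hypothetical keys: high byte = chassis (0..7), low byte = slot (0..7).
keys = [chassis * 256 + slot for chassis in range(8) for slot in range(8)]

# 256 is a power of two, 251 is the nearest smaller prime.
for pool_size in (256, 251):
    print(f"pool of {pool_size}: {distinct_slots(keys, pool_size)} "
          f"slots for {len(keys)} keys")
```

With a pool of 256 the modulo throws away the chassis byte entirely, so only
the slot number decides the result. How well any given size does depends on
how it lines up with the structure of the keys, so other sizes may do better
or worse than these two.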
> All non-ancient versions of dnsmasq use the same hash function so you
> can't improve things by changing versions. I'm not clear why you need
> to delete the lease database, the problem would be fixed by leaving it
> in place and using long leases. You would only need to take the pain
> of address-allocation once, and if you batched the blades you could
> probably avoid the collisions.
I can't really batch the blade boot: the reboot happens as the result of
a multicast UDP operation to the entire subnet (individually contacting
so many blades one by one is a pain). Plus I don't WANT my boots to
take any longer than they already do: they're already too slow. The
BIOS POST/PXE/etc. takes a long time so in order to delay the boot
enough to make a difference I'd need to wait a pretty long time, before
the IP was really allocated. And who knows if my systems are batched in
the right way to avoid the collisions?
I delete the lease database when the system is reallocated or
reconfigured, to be sure that there aren't left-over leases. Currently
I have an infinite lease timeout, mainly because my blades are running a
small embedded system and there's no DHCP client daemon on them to renew
(they do a one-off request using BusyBox's udhcpc which exits once they
have a lease). This does mean (as I understand it) that I need to
delete the lease database periodically because once an IP is assigned it
will never be reassigned if it has an infinite lease (is that right?):
if so over time as blades come and go I'll end up running out of
leases... I can bump the max leases up from 255 to make that less
likely, but still.
Hm. Maybe I can do some cleanup by removing any lease without a
hostname assigned. All my blades get a hostname based on their position
in the system so new blades will "steal" hostnames from the blades they
replace. Any lease without a hostname can be removed as not needed.
I'll think about that.
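
A minimal sketch of that cleanup, assuming the lease file uses a
whitespace-separated layout of expiry time, MAC, IP address, hostname, and
client ID, with "*" standing in for a missing hostname. The function name and
the sample lines are made up for illustration. Since dnsmasq also holds lease
state in memory, it would presumably need to be stopped while the file is
rewritten.

```python
def drop_unnamed_leases(lines):
    """Return only the lease lines that carry a hostname.

    Assumes the hostname is the fourth field and that "*" means "none".
    Lines with fewer than four fields are kept untouched.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "*":
            continue  # no hostname: treat as a left-over lease
        kept.append(line)
    return kept


# Made-up sample data; an expiry of 0 is assumed to mean an infinite lease.
sample = [
    "0 00:11:22:00:00:01 192.168.0.10 blade-1-1 *",
    "0 00:11:22:00:00:02 192.168.0.11 * *",
    "0 00:11:22:00:01:01 192.168.0.12 blade-2-1 *",
]

for line in drop_unnamed_leases(sample):
    print(line)
```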
> If you can fiddle with that, and get something which works better, I'd
> be interested to see the patch.
I'll poke at it although obviously what works for my environment might
not be so useful to others, with different MAC distributions.
I do wonder whether we can improve things on the NAK end, though: it
seems like it would be nice to improve the failure case so that it
wasn't pathological: first round we get N collisions with 1 success and
N-1 failures; the failures all add one to their IP addresses and we get
N-1 collisions on the second round, etc., which basically means we'll
have to run N rounds of NAK resolution before everyone has their own IP.
Couldn't we add a random number to this on NAK, or maybe add the least
significant octet in the MAC (+1 in case the value is 0 :-)) instead of
adding 1, or something like that, so we are less likely to collide?
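
The difference between the two back-off rules can be simulated. The sketch
below is a toy model, not dnsmasq's logic: it assumes all clients initially
collide on one slot, that one client per contested slot wins each round, and
that every loser moves forward by a fixed step. The function name, the pool
size of 200, and the 16 clients with last octets 0 to 15 are made up for
illustration.

```python
def rounds_to_settle(last_octets, pool_size, step_for):
    """Count rounds until every client holds a unique slot.

    Clients are identified by the (distinct) last octet of their MAC and
    all start on slot 0, mimicking a hash collision. In each round the
    first client asking for a free slot wins it; anyone else asking for
    that slot, or for a slot already taken, is NAKed and moves forward
    by step_for(octet).
    """
    wanted = {octet: 0 for octet in last_octets}  # client -> requested slot
    taken = set()
    rounds = 0
    while wanted:
        rounds += 1
        if rounds > 10 * pool_size:
            raise RuntimeError("back-off rule did not settle")
        claimed = set()
        losers = {}
        for octet, slot in wanted.items():
            if slot not in taken and slot not in claimed:
                claimed.add(slot)  # this client gets the address
            else:
                losers[octet] = (slot + step_for(octet)) % pool_size
        taken |= claimed
        wanted = losers
    return rounds


clients = list(range(16))  # hypothetical last octets 0x00..0x0f
POOL_SIZE = 200            # assumed number of addresses in the range

plus_one = rounds_to_settle(clients, POOL_SIZE, lambda octet: 1)
plus_octet = rounds_to_settle(clients, POOL_SIZE, lambda octet: octet + 1)
print("add 1:            ", plus_one, "rounds")
print("add last octet + 1:", plus_octet, "rounds")
```

In this model the "+1" rule needs one round per colliding client, while the
octet-based step scatters the losers after the first round. Real clients that
collide need not have such conveniently distinct last octets, so the gain in
practice would likely be smaller.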