[Dnsmasq-discuss] [PATCH] Offered IPv4 DHCP multiple times

Petr Menšík pemensik at redhat.com
Mon Dec 20 19:45:15 UTC 2021


On 12/16/21 14:51, Simon Kelley wrote:
> On 13/12/2021 23:04, Petr Menšík wrote:
>> Hello Simon and others.
>>
>> In certain situations, dnsmasq DHCP will offer a single IP address to
>> multiple different clients. Later it ACKs the first client and NACKs
>> the second. It relies on the ability of those clients to retry, but it
>> seems netbooting software often cannot recover from such behaviour.
> The netbooting software is broken. This is not a surprise.
No, I do not think the netbooting software should be blamed. A DHCP
server should track different clients and should not rely on clients
being able to recover from a NACK. dnsmasq should NOT offer the same
address to multiple clients IF enough unused addresses are available. It
seems other implementations do not require the client to handle such an
unusual situation. Using NACKs slows down client interface
configuration, and I think we want to avoid that. I expect more Internet
of Things implementations would have issues with the current approach,
but I haven't tested that.
>> I am attaching the script I use to reproduce this issue. Just create a
>> local bridge and give it a limited IP pool in its dhcp-range option,
>> with only a few more addresses than the actual number of instances.
>> The instances start at roughly the same time.
>>
>> There is also a pcap file in the linked bug, along with some other
>> reports. A good summary is in comment 85 [3].
>>
>> In the attached patches, I introduce a thing I call temporary leases.
>> Those leases are never saved into the lease file. They have a short
>> duration, set to 30 s, the same as the ping timeout. This ensures that,
>> even with dhcp-sequential-ip, different clients have reservations for
>> different addresses. It helps especially when --no-ping is used.
>> Without this change it takes quite long to retry the
>> discover-offer-request cycle multiple times. The pings contain a sort
>> of workaround for this deficiency, but it covers only 6 different
>> addresses in the default configuration. Then it switches to overload,
>> similar to no-ping: it offers multiple clients the same address, but
>> when the 2nd client requests the lease, it denies it again.
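
To illustrate the temporary-lease idea described above, here is a
minimal Python sketch. It is not the code from the patches: the class
name OfferTable, its methods and the simple sequential pool scan are all
assumptions made for illustration. Only the 30 s hold time comes from
the description above.

```python
import time
from ipaddress import IPv4Address

OFFER_HOLD_SECONDS = 30  # hold time taken from the description above


class OfferTable:
    """Hypothetical table of committed leases plus short-lived offers."""

    def __init__(self, first, last, clock=time.monotonic):
        lo, hi = int(IPv4Address(first)), int(IPv4Address(last))
        self.pool = [IPv4Address(n) for n in range(lo, hi + 1)]
        self.clock = clock
        self.leases = {}  # address -> client id (would go to the lease file)
        self.offers = {}  # address -> (client id, expiry); never persisted

    def _expire(self):
        """Drop temporary reservations whose hold time has passed."""
        now = self.clock()
        self.offers = {a: (c, t) for a, (c, t) in self.offers.items() if t > now}

    def offer(self, client):
        """Pick an address for a DISCOVER; None means the pool is exhausted."""
        self._expire()
        expiry = self.clock() + OFFER_HOLD_SECONDS
        # A retrying client gets the address it already holds.
        for addr, owner in self.leases.items():
            if owner == client:
                return addr
        for addr, (owner, _) in self.offers.items():
            if owner == client:
                self.offers[addr] = (client, expiry)
                return addr
        # Otherwise take the first address that is neither leased nor
        # temporarily reserved for somebody else.
        for addr in self.pool:
            if addr not in self.leases and addr not in self.offers:
                self.offers[addr] = (client, expiry)
                return addr
        return None

    def request(self, client, addr):
        """Handle a REQUEST; True stands for ACK, False for NACK."""
        self._expire()
        addr = IPv4Address(addr)
        if addr not in self.pool:
            return False
        if self.leases.get(addr, client) != client:
            return False  # leased to another client
        held = self.offers.get(addr)
        if held is not None and held[0] != client:
            return False  # temporarily reserved for another client
        self.offers.pop(addr, None)
        self.leases[addr] = client
        return True
```

With such a table, two clients sending DISCOVER at the same moment get
different addresses even when the pool is scanned sequentially, and an
offer that is never followed by a REQUEST becomes available again once
the hold time expires.
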
> The ping-cache was never intended to fix this problem, not least because
> it can be disabled with --no-ping. The way it's intended to work with a
> busy server is that clients are offered addresses based on a hash of
> their hardware address, and the pool of addresses is large enough to
> avoid most collisions. This clearly doesn't work when sequential
> addresses are enabled, since the offered addresses are not randomised,
> so the ping-cache is used as a band-aid. It doesn't make things work at
> high loads.
>
> If dnsmasq is configured suitably for the requirements, ie --no-ping
> set, --dhcp-sequential-ip NOT set and an address pool significantly
> larger than the expected number of clients, is there still a problem?
I think it is still there, just less likely to be visible. With fewer
IPv4 addresses available, server deployments cannot rely on a large
unused address pool. It is not a problem for small home deployments
using private addresses, but it is visible when working with public
addresses on cloud infrastructure. Such issues are much less likely, but
still possible and hard to detect. Because dnsmasq is also used in cloud
technologies like lxc or openstack, I do not think we can just wave it
away and recommend configuring twice as many addresses as are actually
used. Public IPv4 addresses are expensive.
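
To put a rough number on "less likely, but still possible", here is a
small Python sketch. It assumes that each client's first offer is picked
uniformly and independently from the pool, before any lease is
committed. That is an idealisation: dnsmasq's real hash and its handling
of already leased addresses are not modelled, and the function name is
hypothetical. Under that assumption the chance of at least two clients
being offered the same address is the classic birthday-problem
probability.

```python
def collision_probability(clients: int, pool_size: int) -> float:
    """Chance that at least two clients get the same first offer.

    Assumes each client independently picks one of pool_size addresses
    with equal probability (an idealised model, not dnsmasq's code).
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    if clients > pool_size:
        return 1.0  # pigeonhole: some address must repeat
    p_all_distinct = 1.0
    for already_placed in range(clients):
        # The next client must avoid every address picked so far.
        p_all_distinct *= (pool_size - already_placed) / pool_size
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for pool in (20, 100, 1000):
        print(pool, round(collision_probability(10, pool), 3))
```

In this model, 10 clients starting together with a pool of 20 addresses
collide with a probability of about 0.93, so doubling the pool does not
make simultaneous first offers safe by itself.
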
>> I think I was able to find a relatively simple algorithm. I think IPv6
>> should receive a similar approach. We side-stepped this by offering a
>> different address in the ACK in thread [4]. While that seems to work, I
>> think it would be better not to offer an address that dnsmasq itself
>> later rejects. We test DHCP clients' abilities for no good reason.
>>
>> With these patches, even multiple clients without ping boot fast
>> enough, even when they start at a similar time. Starting at a similar
>> time is a common thing on boot of cloud hosting instances, which may
>> use dnsmasq for local caching. OpenStack is an example where it was
>> recorded, but it can happen even with normal machines, for example in
>> a classroom with 10 computers.
>>
>> Would you look at it or test it, to see whether any issues with those
>> changes can be found?
>>
> I'll take a look.
>
>
> Cheers,
>
> Simon.


Thank you. Merry Christmas Simon.

>
>> Cheers,
>> Petr
>>
>> 3. https://bugzilla.redhat.com/show_bug.cgi?id=2028704#c85
>> 4.
>> https://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2021q3/015585.html
>>
>> On 12/8/21 01:18, Petr Menšík wrote:
>>> Hi Simon and others,
>>>
>>> I am debugging a strange issue, which happens inside OpenStack in
>>> certain situations. It seems that, under not precisely defined
>>> conditions, dnsmasq returns a "no address available" error even when
>>> not all leases are used yet.
>>>
>>> It seems do_icmp_ping is responsible for ruling out recently tried IP
>>> addresses. It seems a bit weird that address allocation happens only
>>> for addresses not recently pinged. I have found another place which
>>> calls do_icmp_ping but does not use the hash value computed from the
>>> hardware address, even though it is already known at that time. The
>>> first patch attached adds the hash to the second place as well. That
>>> should mean a single address would use a shared ping. The second
>>> patch simplifies do_icmp_ping and its return value a bit. Instead of
>>> artificially ensuring the hash would match, it just returns the
>>> correct value when the hash matches. The second change is just an
>>> optional optimization.
>>>
>>> A few details are at RH bug #2028704 [1]. The originally tested
>>> version 2.79 did not contain commit 0669ee7a69a
>>> <http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=0669ee7a69a004ce34fed41e50aa575f8e04427b>
>>> [2], which improves the situation. But I think there remain cases
>>> where a ping is not accepted when it should be. Testing with the
>>> latest release did not work according to the report. I think the
>>> first patch may fix the still missing part.
>>>
>>> Cheers,
>>> Petr
>>>
>>> 1. https://bugzilla.redhat.com/show_bug.cgi?id=2028704
>>> 2.
>>> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=0669ee7a69a004ce34fed41e50aa575f8e04427b
>>>
-- 
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemensik at redhat.com
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB



