[Dnsmasq-discuss] [PATCH] --bind-dynamic and fast netlink changes

Simon Kelley simon at thekelleys.org.uk
Mon Mar 1 00:02:33 UTC 2021


That looks sensible except for one thing. I wasn't sure about the
logging in the first place, and having to add Yet Another Config Option
to control it is the last straw; I think it's better just removed.


Will apply the others and test tomorrow.


Cheers,
Simon.



On 25/02/2021 18:48, Petr Menšík wrote:
> I have found important issue in first patch, sending a new set.
> This time, it should really fix all issues I reproduced using my test.
> I dropped NETLINK_NO_ENOBUFS setting. It only prevents kernel sending
> userspace notice some messages were dropped. Which I think is important
> information. Instead I made restarting interface reading, including
> resetting not found interfaces. If ENOBUFS arrives, it would force
> re-reading both IPv6 and IPv4 interfaces. Should reduce attempts to
> listen on interfaces removed in the mean time.
> 
> It reduces number of resynchronizations, because it always enqueue only
> one event per all netlinks events. It reads events in loop inside
> netlink_multicast() until EAGAIN is received. It should help draining
> asynchronous messages much more quickly.
> 
> I made just one big patch for core fix, patch 0001. I am leaving broken
> down commits at github [1], if a better work steps are visible. I
> rebased several times, it does not show my exactl workflow.
> 
> Following patches are supposed to be improvements.
> - Patch 0002 removes fnctl in netlink_multicast, reduces number of
> kernel syscalls per message.
> - Patch 0003 adds a new --log-listen option and removes debug logging by
> default. It was me who added that and I still find it quite useful. But
> I think logging can be quite slow, slowing significantly on many address
> changes. Might be better with --log-async. User would have to use
> --log-listen if interested in watching listeners changes.
> - Patch 0004 - Obtain MTU only in case it would be used. Attempt to
> reduce innecessary syscall inside iface_enumerate loop in some cases.
> 
> 1. https://github.com/InfrastructureServices/dnsmasq/tree/netlink-tests
> 
> On 2/25/21 1:19 AM, Simon Kelley wrote:
>> On 18/02/2021 11:56, Petr Menšík wrote:
>>> Hi Simon and others,
>>>
>>> I have started checking behaviour of dnsmasq on fast netlink changes,
>>> reported originally on RHEL7 bug[1]. Found commit 1627d577[2] helps a
>>> lot on RHEL 7, which is already in current version. But for some reason,
>>> even latest dnsmasq 2.84 does not pass my test[3] on RHEL8 (bug #1927973
>>> [4]).
>>>
>>> The core of the test is repeating few times this:
>>> one_retry() {
>>> 	local MAX=254
>>> 	local DOWN_DELAY=0
>>> 	for X in `seq 1 $MAX`; do
>>> 		ifconfig eth0:${X} 100.123.1.${X} netmask 255.255.255.0
>>> 		#ip address add 100.123.1.${X} dev eth0 label eth0:${X}
>>> 	done && sleep $DOWN_DELAY && for X in `seq 1 $MAX`; do
>>> 		ifconfig eth0:${X} down
>>> 		#ip address del 100.123.1.${X} dev eth0 label eth0:${X}
>>> 	done
>>> }
>>>
>>> Then checking dnsmasq stops listening on them again.
>>>
>>> I found by analysing strace of test, it re-reads current addresses
>>> innecessary often. It enqueues re-reading every time multicast netlink
>>> arrives. That means a lot of CPU cycles, when it just sends newaddress()
>>> event request every time. It seems to me, just once per multicast
>>> messages row is enough. It needs to know just current state to run
>>> newaddress() just last time.
>>>
>>> I have tested my patches with my test and it helps, works also on RHEL8.
>>> I think it takes a lot less netlink reading without affecting accuracy.
>>> Patches 1 and 2 are fixing things. Patches 3 and 4 are just attempt to
>>> optimize speed of netlink handling a bit. Changed netlink_multicast to
>>> read in a row, until EAGAIN appears. Should help to drain netlink socket
>>> faster, avoiding ENOBUFFS.
>>
>> I need to stare at this for a bit longer, but it looks as if it's on the
>> right lines.
>>
>> One thing to consider: the actions following a NEWROUTE event are to
>> resend a DNS packet when the first DNS packet triggers a modem
>> dial-on-demand. Modems doing dial-on-demand is pretty quaint, and
>> something that could probably be removed without inconveniencing anyone
>> in the 2020s.
> 
> Depends whether it isn't used also in other circumstances, like
> automatic restart after lost link on uplink interface. Number of route
> messages is not small, maybe manual option for requesting route watching
> should be used. I think it could be removed, unless it has hidden benefits.
>>>
>>> I think it should handle errors better in create_listeners. If tcp fails
>>> to bind on address, just UDP received would listen. Problem is
>>> create_bound_listeners does not know it is incomplete, it would be fixed
>>> only by deletion of interface address and re-adding it again. But that
>>> is related but different problem, I have no fix candidate yet.
>>
>> In newaddress, check for this error after calling
>> create_bound_listeners() and loop back to enumerate_interfaces() until
>> it gets a clean run? It that in danger of looping forever?
> It could be repeated only few time, let's say 5 times.
>>
>> The nasty logic that inhibits interface enumeration more than once per
>> select loop in enumerate_interfaces() would have to be finessed.
> But I modified enumerate_interfaces() to restart always both ipv4 and
> ipv6 on ENOBUFS. It improved a lot and at least my test haven't hit this
> issue again.
>>
>>>
>>> When it fails, lines similar to this appear in log:
>>> failed to create listening socket for 100.123.1.250: Cannot assign
>>> requested address
>>>
>>> What do you think? Have you ever got dnsmasq --bind-dynamic into state,
>>> where it required restart to continue serving correct addresses?
>>
>> I'm not aware of that happening, but I suspect I don't do the sort of
>> things which make it happen.
> I admit it is rare and requires frequent change in addresses together
> with limited computing power when reading netlink.
>>
>>
>>
>> Cheers,
>>
>> Simon.
>>
>>>
>>> Also opinions welcome.
>>>
>>> Thanks,
>>> Petr
>>>
>>> 1. https://bugzilla.redhat.com/show_bug.cgi?id=1887649
>>> 2.
>>> http://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=1627d577af03cdf747285e79fa747b6aaae8033f
>>> 3.
>>> https://github.com/InfrastructureServices/dnsmasq-tests/tree/master/Regression/netlink-interface-fast-changes
>>> 4. https://bugzilla.redhat.com/show_bug.cgi?id=1927973
>>>




More information about the Dnsmasq-discuss mailing list