[Dnsmasq-discuss] [PATCH] Refuse to start with EADDRINUSE in --bind-dynamic mode

Simon Kelley simon at thekelleys.org.uk
Mon Nov 27 23:18:03 UTC 2023



On 25/11/2023 16:51, Petr Menšík wrote:
> Yes, the problem is 3) has a condition we wait until it changes then 
> retry. But for a lot (most?) of errors we lack any indication from the 
> system it has changed.
> 
> For example insufficient memory or insufficient file descriptors. It may 
> change, but unlike watching up and down interfaces, there is no hook 
> which would retry listener creation. It fails once and then just maybe 
> retries on explicit reload. That is why I think it is absolutely 
> necessary to log any failure we pass somewhere, unless we know we will 
> retry later.

You're right, the only error from bind() that should be ignored is 
EADDRNOTAVAIL. everything else should be a fatal error during startup or 
logged once  the daemon is running.

I've just pushed a patch to that effect.

Cheers,

Simon.

> 
> More below...
> 
> On 11/23/23 13:47, Simon Kelley wrote:
>> That's a good point, but I don't think there needs to be any non-fatal 
>> error logging. There are three situations during startup.
>>
>> 1) bind() succeeds.
>> 2) bind fails for a reason which won't change - fatal error.
>> 3) bind fails for a reason which may change - startup and wait until 
>> it does change and try again.
>>
>> The canonical example of 3) is the one I gave before, 
>> --listen-address=1.2.3.4 but not local interface has address 1.2.3.4. 
>> The intention is that when a new interface comes up with address 
>> 1.2.3.4 then a new socket will be created and bound. This is long 
>> after startup, so the only option if it fails then is to log the event.
> Of course, this is very special case somehow well handled. I agree there 
> is not much else to do. We could only make the error fatal, but I don't 
> think that is desired.
>>
>> If the only situation where we want to wait is the one above, then the 
>> solution to to make EADDRNOTAVAIL at startup the only one where we 
>> keep waiting, and all the others are fatal. I think when I originally 
>> wrote this I wasn't sure if that was the only non-fatal error which is 
>> why the code is as it is.
>>
>> This is not a complete solution to your original problem of enforcing 
>> only one dnsmasq daemon process in any case. For example if you 
>> configure a single listen-address which doesn't exist on the machine, 
>> then you can start as many dnsmasq processes as you like and they'll 
>> all start up and be waiting for the interface with that address to be 
>> created. Once it is, all will try and bind it, and all but one will 
>> fail, but they'll all still exist. Managing daemon processes is really 
>> the job of sysvinit or systemd, but the authors of the bug seem to 
>> sant protection from just running the binary from the command line.
> 
> We at Fedora support only services managed by systemd. But even for 
> that, it needs to get some feedback of failure. If the process 
> terminates with non-zero status code, unit will be marked failed. We 
> *need* that. Alternative might be support for libsystemd with notify 
> socket, which would work with Type=notify services. Now it will report 
> failed startup only with Type=forking. Later failure is logged only as a 
> warning regardless of type of the error. I think we want unexpected 
> error types to be logged as errors, especially for insufficient 
> resources errors like ENOMEM. Or made them fatal. With systemd unit 
> Restart=on-failure, it might be able to recover from memory leaks if 
> such errors were fatal. Not sure we want that, might break a lot of 
> deployments, but also fix some.
> 
>>
>> TLDR;
>>
>> We either pick a set of errors which are Ok to continue 
>> (EADDRNOTAVAIL, what others?) and fail fatally at startup for all 
>> others, or we pick a set of errors to fail fatally at startup 
>> (EADDRINUSE, EACCESS, what others?) and continue for all others.
>>
>>
>> Cheers,
>>
>> Simon.
> 
> I would say safer would be to fatal error everything except explicitly 
> waived, for now just EADDRNOTAVAIL and EINTR? I think most of these 
> errors means incomplete degraded service anyway, without reliable 
> self-repair code present. If it had repeat timer with exponentially 
> increasing time of retry (with some upper bound), then we might want it 
> to start anyway. But I think it is safer to prevent half-initialized 
> service. Systemd can provide autorecovery with smart settings. Do we 
> have a way to specify I do not require TCP listening socket for DNS? It 
> should be clearly discouraged, but for some kinds of tests it might be 
> acceptable.
> 
> Cheers,
> Petr
> 
>>
>>
>> On 23/11/2023 11:13, Petr Menšík wrote:
>>> To fix problem with multiple instances correctly refusing running on 
>>> the same machine and namespaces, yes, it would be sufficient.
>>>
>>> But I think part of the problem is hiding all problems during startup 
>>> and not showing them at all, in any source. I think that is okay for 
>>> EADDRNOTAVAIL to not be printed. But I think in other cases we want 
>>> at least warning somewhere. This way you also get exact error message 
>>> printed. For example selinux policy hardening may prevent your 
>>> process to listen on port 53, even though it has NET_BIND_SERVICE.
>>>
>>> With my modification it will print errors for listeners used. Note 
>>> 10.1.2.3 is hidden at that phase. You would not know it without 
>>> strace analysis. I expect there can be different errors, for example 
>>> running out of file descriptors or memory. Hiding something 
>>> non-standard happened during startup is quite a bad design. Only some 
>>> kinds of errors are okay during startup.
>>>
>>> $ sudo -u nobody fedora-like/dnsmasq -d --bind-dynamic 
>>> --listen-address=10.1.2.3
>>> dnsmasq: failed to create listening socket for 127.0.0.1: Permission 
>>> denied
>>> dnsmasq: failed to create listening socket for 127.0.0.1: Permission 
>>> denied
>>> dnsmasq: failed to create listening socket for ::1: Permission denied
>>> dnsmasq: failed to create listening socket for ::1: Permission denied
>>>
>>> dnsmasq: process is missing required capability NET_BIND_SERVICE
>>>
>>> # Compare this with:
>>>
>>> $ sudo -u nobody fedora-like/dnsmasq -d --bind-interfaces 
>>> --listen-address=10.1.2.3
>>> dnsmasq: failed to create listening socket for 127.0.0.1: Permission 
>>> denied
>>>
>>> I think we want any errors printed, even if they are not made fatal. 
>>> Except carefully chosen type of errors, which are expected and would 
>>> raise just false alarms. Not sure how to trigger other types of 
>>> errors, but I am sure I would like to see them, even if they did not 
>>> cause the process to die. That is why I have used more complicated 
>>> approach, which should print everything unexpected, even when dnsmasq 
>>> is not stopped. In order to investigate you first have to know 
>>> something unusual has happened.
>>>
>>> On 23. 11. 23 0:29, Simon Kelley wrote:
>>>> Isn't this sufficient to fix the problem?
>>>>
>>>> Not calling die() when bind-dynamic is set is intended to handle the 
>>>> case that  bind returns EADDRNOTAVAIL because you've configured 
>>>> --listen-address=1.2.3.4 but there's not a local interface with that 
>>>> address. dnsmasq runs anyway in the expectation that such an 
>>>> interface will appear in the future and a socket will be bound then.
>>>>
>>>> I don't think there's a die()/syslog() conflict at all.
>>>>
>>>>
>>>> diff --git a/src/network.c b/src/network.c
>>>> index ca9fada..db1d528 100644
>>>> --- a/src/network.c
>>>> +++ b/src/network.c
>>>> @@ -924,7 +924,7 @@ static int make_sock(union mysockaddr *addr, int 
>>>> type, int dienow)
>>>>         {
>>>>           /* failure to bind addresses given by --listen-address at 
>>>> this point
>>>>              is OK if we're doing bind-dynamic */
>>>> -         if (!option_bool(OPT_CLEVERBIND))
>>>> +         if (!option_bool(OPT_CLEVERBIND) || errno == EADDRINUSE)
>>>>             die(s, daemon->addrbuff, EC_BADNET);
>>>>         }
>>>>        else
>>>>
>>>> Cheers,
>>>>
>>>> Simon.
>>>>
>>>> On 22/11/2023 19:27, Petr Menšík wrote:
>>>>> Hello everyone,
>>>>>
>>>>> I have received error report RHEL-16398 [1], which I think makes 
>>>>> sense to fix even in the lastest version. I believe it allows 
>>>>> non-intentional another instance running without error. What is 
>>>>> worse, it does not even show any warning that initialization is 
>>>>> incomplete.
>>>>>
>>>>> Of course the problem at start is those errors happen in time when 
>>>>> no log is available. I think that can be fixed easily by using 
>>>>> stderr at that time. That is patch #1.
>>>>>
>>>>> Second makes EADDRNOTAVAIL bind errors still hidden, but prints all 
>>>>> other errors at least to stderr. On a system with systemd that 
>>>>> should make it present in journalctl -u dnsmasq anyway. EADDRINUSE 
>>>>> is made fatal, because that would not be usually handled by new 
>>>>> addresses added later. If there is a need to start another dnsmasq 
>>>>> instance without TCP listeners, I think that should be specified 
>>>>> more explicitly. Makes EADDRINUSE fatal the same way as with 
>>>>> --bind-interfaces.
>>>>>
>>>>> Would you find any other errors, which should be hidden or made 
>>>>> fatal? What would you think of those changes?
>>>>>
>>>>> 1. https://issues.redhat.com/browse/RHEL-16398
>>>



More information about the Dnsmasq-discuss mailing list