[Dnsmasq-discuss] [PATCH] Do not crash when Ubus connection fails.
simon at thekelleys.org.uk
Sun Jun 27 17:48:49 UTC 2021
On 27/06/2021 00:22, Etan Kissling wrote:
> When using multiple dnsmasq instances Ubus only connects on one of them.
> Since 3c93e8eb41952a9c91699386132d6fe83050e9be dnsmasq crashes instead.
> This change avoids the crash, leading to a graceful retry + error log.
> Signed-off-by: Etan Kissling <etan.kissling at gmail.com>
> src/dnsmasq.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
> diff --git a/src/dnsmasq.c b/src/dnsmasq.c
> index 04582da..2b4291b 100644
> --- a/src/dnsmasq.c
> +++ b/src/dnsmasq.c
> @@ -449,10 +449,8 @@ int main (int argc, char **argv)
> if (option_bool(OPT_UBUS))
> #ifdef HAVE_UBUS
> - char *err;
> daemon->ubus = NULL;
> - if ((err = ubus_init()))
> - die(_("UBus error: %s"), err, EC_MISC);
> + (void) ubus_init(); /* Logging not set up yet. */
> die(_("UBus not available: set HAVE_UBUS in src/config.h"), NULL, EC_BADCONF);
The gen on my_syslog and die() are:
1) During startup, before logging is set up, it's not allowed to call
my_syslog(). It may appear to work, but won't in all cases. The only
errors allowed are fatal errors with die().
2) The crucial points are at lines 627 and 700 in dnsmasq.c, before 627
you can die(). After 627 you can't die(), dnsmasq has returned success
return code and forked. After 700 you can log stuff. Between 627 and 700
is in the "you are not expected to understand this" class. If something
fails before 627 that you want to log, save the information in variables
local to main() and call my_syslog() after line 700.
My change made the ubus code work in the same way as DBus. It expects
that ubus_init() will return a non-NULL error report if something
unexpected and nasty happened. (maybe a configuration that can never
work.) If the Ubus connection cannot be made, but that's expected to
change then ubus_init() should return NULL, and leave daemon->ubus set
to NULL. In that case ubus_init() will be called again, and can either
succeed, leave daemon->ubus still as NULL (in which case it will be
called again and again....) or return a fatal error, which can by now
only be logged. ubus_init() will continue to be called each time through
the event loop.
Since the only way ubus_init() can return a non-NULL is if
ubus_add_object() fails, are we saying that that is not a fatal error
and a retry may work? Or is it the case that if it fails once, it will
always fail, but you want dnsmasq to run without ubus anyway? If the
former, then just don't return the error string when ubus_add_object()
fails. If the later, we may need to extend the error-handling mechanism.
In any case, I'm minded, for both ubus and dbus, to treat an error
return from [ub]bus_init() _after_ dnsmasq has started as a reason to
turn off the subsystem, and discontinue all calls to *_init(). The
current code looks like an invitation to an endlessly spinning CPU and
Looking in src/ubus.c, there seems to be a mechanism to reconnect to the
ubus, and if that fails, daemon->ubus can end up as NULL again, having
been set up correctly beforehand, so that ubus_init() will start to get
called again. That code path feels dodgy, and it would be nice to see
what actually happens when it's run.
Looking again, there's also a code path in check_ubus_listeners() that
can delete the ubus connection and set daemon->ubus back to NULL. That
will start calls to ubus_init() again. I wonder if that is correct?
TL;DR If ubus_add_object() failures are soft, then don't return an error
string from ubus_init() when they happen. If ubus_add_object() failures
are hard, then the call to die() looks good to me, but if you want
dnsmasq to continue without a ubus connection then we need to arrange
that ubus_init() is not called repeatedly in that case. There are a
couple of paths in the ubus code that set daemon->ubus back to NULL, and
I'm not sure things will work sensibly if they are ever taken.
More information about the Dnsmasq-discuss