[Dnsmasq-discuss] Dnsmasq not resolving addresses for an hour
Albert ARIBAUD
albert.aribaud at free.fr
Wed Oct 19 06:53:04 BST 2016
Hi John,
On Tue, 18 Oct 2016 22:36:07 +0000,
John Knight <John.Knight at belkin.com> wrote:
> Hi All,
> The main while(1) loop uses select() to determine if it has work to
> do. In most cases, it appears to use timeout of 0, which I believe
> means just wait indefinitely for work on the file descriptors. Other
> times, it appears that the timeout is set to a quarter second when
> doing a tftp transfer or polling the dbus.
>
> Now what concerns me is that when a "retry later" condition occurs,
> we may get stuck on the select() for a long period of time. Alas, I
> do not know how frequent one might expect to see work arrive on the
> file descriptors that select is watching, so I don't really know if
> this is a long time or not. It seems though that in this failure
> scenario, the poll_resolv() function does NOT get called very often
> at all.
Actually, if dnsmasq does not receive any requests from clients, it does
not need to poll servers, so I would ask: does the select() include
descriptors for client requests (either UDP datagrams received, or TCP
connections opened)? If so, I think select() will return just when
necessary and no timeout is needed; otherwise, you are right that a
timeout is required.
Also, it may be improbable that select() does not return for a whole
hour; but then, is every return from select() followed by a resolv file
poll, or can select() return and then be entered again without polling
the resolv files? I am thinking, for instance, about cached answers
which do not need servers if their TTL is long enough.
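To make the timeout question concrete, here is a minimal sketch of a
select() wait with an optional timeout. It is not dnsmasq code:
wait_readable() and loop_once() are hypothetical names, and the counter
merely stands in for a periodic check such as poll_resolv(). Note that
in POSIX select(), a NULL timeout pointer blocks indefinitely, whereas a
zeroed struct timeval makes select() return immediately.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper, not dnsmasq code: wait for fd to become readable.
 * timeout_ms < 0  -> pass NULL to select(), i.e. block indefinitely
 * timeout_ms >= 0 -> block at most that long (0 means return at once)
 * Returns 1 if fd is readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv, *tvp = NULL;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    int rc = select(fd + 1, &rset, NULL, NULL, tvp);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &rset)) ? 1 : 0;
}

/* One iteration of a main loop: wait, then run the periodic check no
 * matter why select() returned. The counter is an assumed stand-in for
 * a call such as poll_resolv(). */
int loop_once(int fd, int timeout_ms, int *periodic_calls)
{
    int ready = wait_readable(fd, timeout_ms);
    (*periodic_calls)++;
    return ready;
}
```

With a finite timeout, loop_once() reaches the periodic check even when
no descriptor ever becomes readable; with a negative timeout it would
sit in select() until traffic arrives.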
> My gut feeling is that there always needs to be a timeout on the
> select call as the poll_resolv() should be called fairly frequently.
> The code that exists today where poll_resolv() normally is called
> from this loop suggests a poll rate of about once a second. This
> definitely does not happen today. By just adding a my_syslog()
> message to the top of poll_resolv(), it is very clear from the
> logfile that it is not called often, and far too infrequently to
> resolve the "retry later" condition in a timely manner.
Can you compare when poll_resolv() is called with when select()
returns -- and for what reason it returns?
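One way to gather that data without altering behavior is to log a
timestamped line each time select() returns, along with the reason, and
another at the top of poll_resolv(). The following is only a sketch:
classify_select_exit() and log_event() are hypothetical helpers, and
fprintf(stderr, ...) stands in for my_syslog().

```c
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical classification of why select() returned. */
enum select_exit { SEL_FD_READY, SEL_TIMEOUT, SEL_SIGNAL, SEL_ERROR };

/* rc is select()'s return value; err is errno captured right after it. */
enum select_exit classify_select_exit(int rc, int err)
{
    if (rc > 0)
        return SEL_FD_READY;   /* at least one descriptor is ready */
    if (rc == 0)
        return SEL_TIMEOUT;    /* the timeout expired with no activity */
    if (err == EINTR)
        return SEL_SIGNAL;     /* interrupted by a signal */
    return SEL_ERROR;
}

const char *select_exit_name(enum select_exit why)
{
    switch (why) {
    case SEL_FD_READY: return "fd ready";
    case SEL_TIMEOUT:  return "timeout";
    case SEL_SIGNAL:   return "signal";
    default:           return "error";
    }
}

/* Emit one timestamped line per event so that select() returns and
 * poll_resolv() calls can be lined up in the logfile afterwards. */
void log_event(const char *event, const char *detail)
{
    fprintf(stderr, "%ld %s %s\n", (long)time(NULL), event, detail);
}
```

Calling log_event("select", select_exit_name(classify_select_exit(rc, errno)))
right after select() and log_event("poll_resolv", "enter") at the top of
poll_resolv() would show whether select() returns without a poll
following it.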
> Going forward, as the next thing for me to try, I am going to add a
> timeout for the select... perhaps a modest once a second or two.
Personally, when acting on a gut feeling, I would investigate further
without changing the code's behavior, because my changes might have
unwanted effects that hide the root cause I am looking for -- but
to each his/her own.
> But I would like to know what you all think of this... does this
> make sense to do? Is there ever a case where we might not get any
> work on the files select is monitoring for nearly an hour? I am
> trying to make sense of this issue.
Not entirely sure what you mean by "Is there ever a case where we
might not get any work on the files select is monitoring for nearly an
hour"; I will assume you mean "Is there a normal case where dnsmasq
would not poll for changes in resolv files for an hour". If so, then I
would say it depends on how much traffic dnsmasq receives and how much
of it can be answered from cache.
> Thanks,
>
> John Knight
Best regards,
--
Albert.