[Dnsmasq-discuss] Dnsmasq not resolving addresses for an hour

Vladislav Grishenko themiron.ru at gmail.com
Fri Oct 14 07:52:33 BST 2016


Hi, Albert,

> 1. HAVE_BROKEN_RTC should be used for, well, broken RTCs. Here, we are
> not dealing with broken RTC.

The root issue from the original mail:
> One of which acknowledges potential problem if the clock goes backwards...
To me, that is indeed broken RTC behavior, isn't it?
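
To make the quoted issue concrete, here is a minimal sketch of the two comparisons discussed in the original mail. The helper names are hypothetical and this is not the dnsmasq source; it only illustrates, under that assumption, why a one-sided difftime() test misses a file whose modification time lands before the remembered one after the clock has stepped backwards.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helpers for illustration only, not dnsmasq code. */

/* One-sided test: reports a change only when the file looks newer. */
static bool changed_if_newer(time_t mtime, time_t last_change)
{
    return difftime(mtime, last_change) > 0.0;
}

/* Two-sided test: reports a change whenever the timestamps differ,
   so a backwards clock step cannot hide a rewritten file. */
static bool changed_if_different(time_t mtime, time_t last_change)
{
    return difftime(mtime, last_change) != 0.0;
}
```

With a file written ten seconds "before" last_change, only the second helper reports a change.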

> 2. The man page for times() states that "a portable application would be
>    wise to avoid using [the] value [returned by times()]. To measure
>    changes in elapsed time, use clock_gettime(2) instead".

That is because the starting point of the value returned by POSIX times() may vary across kernel versions and UNIX implementations, and because the value can overflow the range of clock_t.
Since we care neither about the initial boot value nor about time spent asleep or suspended (files can't be modified while suspended, right?), the only possible issue is overflow.
Since times() does not count ticks in sleep/suspend mode, the suggestion to use clock_gettime() here amounts to using CLOCK_MONOTONIC, which is almost the same but has an API that does not overflow.

> - otherwise, if CLOCK_MONOTONIC is defined (it should always) and if
>   clock_gettime(CLOCK_MONOTONIC,...) succeeds at run time, use that;

Even when CLOCK_MONOTONIC is defined, whether that clock source is really present can only be determined from the kernel at run time.
Yes, there are old kernels still running that have no CLOCK_MONOTONIC; on those, clock_gettime() returns EINVAL. The same run-time check applies to CLOCK_BOOTTIME.
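
A minimal sketch of such a run-time probe, following the fallback order proposed in the quoted mail. The names elapsed_now, clock_candidate and candidates are hypothetical, and nothing here is taken from the dnsmasq source. It assumes a POSIX system where CLOCK_REALTIME is always defined.

```c
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() under strict -std modes */

#include <assert.h>
#include <stddef.h>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper types and names, for illustration only. */
struct clock_candidate {
    clockid_t id;
    const char *name;
};

/* Candidates in order of preference; an entry exists only if its
   constant is defined at compile time. */
static const struct clock_candidate candidates[] = {
#ifdef CLOCK_BOOTTIME
    { CLOCK_BOOTTIME, "CLOCK_BOOTTIME" },
#endif
#ifdef CLOCK_MONOTONIC
    { CLOCK_MONOTONIC, "CLOCK_MONOTONIC" },
#endif
    { CLOCK_REALTIME, "CLOCK_REALTIME" },
};

/* Stores the current reading, in seconds, in *out and the name of the
   source used in *source. Returns 0 on success, -1 if nothing worked. */
static int elapsed_now(time_t *out, const char **source)
{
    struct timespec ts;
    struct tms unused;
    clock_t ticks;
    long hz;
    size_t i;

    assert(out != NULL && source != NULL);

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        /* A defined constant is no guarantee: the kernel may still
           reject the clock at run time, so fall through to the next. */
        if (clock_gettime(candidates[i].id, &ts) == 0) {
            *out = ts.tv_sec;
            *source = candidates[i].name;
            return 0;
        }
    }

    /* Last resort: times(), scaled from clock ticks to seconds. */
    ticks = times(&unused);
    hz = sysconf(_SC_CLK_TCK);
    if (ticks == (clock_t)-1 || hz <= 0)
        return -1;
    *out = (time_t)(ticks / hz);
    *source = "times()";
    return 0;
}
```

Because these sources count from different origins, a real caller would presumably probe once at startup and stay with the clock that worked, rather than fall through on every call as this sketch does.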

Best Regards, Vladislav Grishenko

> -----Original Message-----
> From: Albert ARIBAUD [mailto:albert.aribaud at free.fr]
> Sent: Friday, October 14, 2016 2:40 AM
> To: Vladislav Grishenko <themiron.ru at gmail.com>
> Cc: John Knight <john.knight at belkin.com>;
> dnsmasq-discuss at lists.thekelleys.org.uk; Simon Kelley <simon at thekelleys.org.uk>
> Subject: Re: [Dnsmasq-discuss] Dnsmasq not resolving addresses for an hour
> 
> Hi,
> 
> I think it is preferable not to use HAVE_BROKEN_RTC for at least two
> reasons, in increasing order of importance:
> 
> 1. HAVE_BROKEN_RTC should be used for, well, broken RTCs. Here, we are
> not dealing with broken RTC.
> 
> 2. The man page for times() states that "a portable application would be
>    wise to avoid using [the] value [returned by times()]. To measure
>    changes in elapsed time, use clock_gettime(2) instead".
> 
> But you are right that CLOCK_BOOTTIME is Linux specific (I did mention that,
> in fact).
> 
> So my proposal would become:
> 
> - if CLOCK_BOOTTIME is defined at compile time, and if
>   clock_gettime(CLOCK_BOOTTIME,...) succeeds at run time, use that;
> 
> - otherwise, if CLOCK_MONOTONIC is defined (it should always) and if
>   clock_gettime(CLOCK_MONOTONIC,...) succeeds at run time, use that;
> 
> - otherwise, if CLOCK_REALTIME is defined (it should always) and if
>   clock_gettime(CLOCK_REALTIME,...) succeeds at run time, use that;
> 
> - otherwise, as a last resort, use times().
> 
> Amicalement,
> Albert.
> 
> Le Thu, 13 Oct 2016 20:15:15 +0000 (UTC) Vladislav Grishenko
> <themiron.ru at gmail.com> a écrit:
> 
> > Hi,
> > Why not just use the existing HAVE_BROKEN_RTC? CLOCK_BOOTTIME is
> > Linux-specific, non-portable, absent in older (but still running)
> > kernels and logically the same as CLOCK_MONOTONIC except that it counts
> > suspended/sleep time. In turn, CLOCK_MONOTONIC is already effectively
> > there, in times() form, when HAVE_BROKEN_RTC is enabled.
> >
> > Best Regards, Vladislav Grishenko
> >
> > 		_____________________________
> > From: John Knight <john.knight at belkin.com>
> > Sent: Thursday, October 13, 2016 11:00 PM
> > Subject: Re: [Dnsmasq-discuss] Dnsmasq not resolving addresses for an hour
> > To: Albert ARIBAUD <albert.aribaud at free.fr>
> > Cc: <dnsmasq-discuss at lists.thekelleys.org.uk>
> >
> >
> > Hi Albert,
> >
> > That sounds like a very good idea to use CLOCK_BOOTTIME. Good
> > suggestion.
> >
> > When I did a search for difftime in the source code... there are quite
> > a few calls... each one is a potential issue with respect to time
> > going backwards.  I only see one instance that actually considers the
> > case if time goes backwards and that is in dnsmasq.c where it does
> > difftime(now, daemon->last_resolv) and compares the
> > result to both > 1.0 and < -1.0.   So in general, I am somewhat
> > concerned about possible effects of changing time on dnsmasq.  We have
> > seen some issues in the past which we suspected were probably caused
> > by changing the time, so your suggested change could potentially fix
> > some other issues.
> >
> > Thanks!
> >
> > John
> >
> >
> >
> >
> > -----Original Message-----
> > From: Albert ARIBAUD [mailto:albert.aribaud at free.fr]
> > Sent: Thursday, October 13, 2016 2:16 AM
> > To: John Knight
> > Cc: dnsmasq-discuss at lists.thekelleys.org.uk
> > Subject: Re: [Dnsmasq-discuss] Dnsmasq not resolving addresses for an
> > hour
> >
> > Hi,
> >
> > Just a generic comment: from what I can see, all absolute times in
> > dnsmasq are returned by dnsmasq_time() which calls either times() or
> > time(). This, IIUC, corresponds to CLOCK_REALTIME in clock_gettime(),
> > which is indeed affected when (re)setting the time.
> >
> > Maybe a fix to time jump issues would be (in Linux at least) to
> > replace time() with clock_gettime(CLOCK_BOOTTIME,...) which seems to
> > have been designed to get around discontinuities caused by
> > settimeofday().
> >
> > Note: maybe dates used for logging purposes should still use time() or
> > clock_gettime(CLOCK_REALTIME) in order to remain comparable to other
> > logs in the same system -- or maybe not.
> >
> > Sources: man times, man time, man clock_gettime.
> >
> > HTH,
> >
> > Amicalement,
> > Albert.
> >
> > Le Wed, 12 Oct 2016 23:50:11 +0000
> > John Knight <John.Knight at belkin.com> a écrit:
> >
> > > Hi,
> > >
> > > I think I may know what the issue is... it appears that the time may
> > > be changed by ntp in my failure scenario as suggested by the URLs
> > > referencing ntp in the dnsmasq.log file.  There are numerous
> > > references to difftime in dnsmasq code.  One of which acknowledges
> > > potential problem if the clock goes backwards... and is handled by
> > > comparing last_resolv >1.0 and < -1.0 to accommodate such a
> > > possibility.  However, in function poll_resolv(), the difftime()
> > > call checks for > 0.0, assuming the modification time of the file is
> > > greater than the last_change time.  If the time had changed on the
> > > router, then it's possible that the modification time of the
> > > /etc/resolv.conf could be less than that of the last_change.  I
> > > think this needs to be a check for != 0.  If the time is changed
> > > negatively, then the existing code will not work properly, methinks.
> > > It's imperative that latest gets set in order for the
> > > reload_servers() code to run... and if the time is not right, then
> > > the reload_servers() won't get called.  This specific code
> > > (poll_resolv) hasn't changed, and if I am right, it is also broken
> > > in 2.76.
> > >
> > > What do you think?  I am going to make the change locally and
> > > re-test and see if I can make it fail again.  Unfortunately, it
> > > doesn't always fail, but I have reproduced it twice now, hopefully
> > > it will happen again if my fix is not right.
> > >
> > > Best Regards,
> > >
> > > John Knight
> > >
> > >
> > > __________________________________________________________________
> > > Confidential This e-mail and any files transmitted with it are the
> > > property of Belkin International, Inc. and/or its affiliates, are
> > > confidential, and are intended solely for the use of the individual
> > > or entity to whom this e-mail is addressed. If you are not one of
> > > the named recipients or otherwise have reason to believe that you
> > > have received this e-mail in error, please notify the sender and
> > > delete this message immediately from your computer. Any other use,
> > > retention, dissemination, forwarding, printing or copying of this
> > > e-mail is strictly prohibited. Pour la version française:
> > > http://www.belkin.com/email-notice/French.html Für die deutsche
> > > Übersetzung: http://www.belkin.com/email-notice/German.html
> > > __________________________________________________________________
> >
> >
> >
> > Amicalement,
> > --
> > Albert.
> >
> >
> >
> >
> >
> >
> 
> 
> 
> Amicalement,
> --
> Albert.



