<div id="compose" contenteditable="true" style="padding-left: 16px; padding-right: 16px; padding-bottom: 8px;"><div>Hi, all</div><div><br></div><div>Just FYI, a common way to control resolv.conf updates reliably, without races and even when symlinks or hard links are involved, is to send dnsmasq a signal once the resolv.conf update has fully finished, however it was done.</div><div>This also triggers a reread of the hosts, ethers, etc. files and the corresponding reconfiguration.</div><div><br><div class="acompli_signature">Best Regards, Vladislav Grishenko</div><br></div></div>
<div class="gmail_quote">_____________________________<br>From: Albert ARIBAUD <<a dir="ltr" href="mailto:albert.aribaud@free.fr" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="0">albert.aribaud@free.fr</a>><br>Sent: Tuesday, October 25, 2016 2:51 AM<br>Subject: Re: [Dnsmasq-discuss] Dnsmasq not resolving addresses for an hour<br>To: John Knight <<a dir="ltr" href="mailto:john.knight@belkin.com" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="1">john.knight@belkin.com</a>><br>Cc: <<a dir="ltr" href="mailto:dnsmasq-discuss@lists.thekelleys.org.uk" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="2">dnsmasq-discuss@lists.thekelleys.org.uk</a>><br><br><br>Hi John,<br><br>Yes, you can submit patches to the list.<br><br>However, 2.55 is quite old with respect to the current release of<br>dnsmasq, which is 2.76 IIRC.<br><br>Kind regards,<br>Albert.<br><br>On Mon, 24 Oct 2016 17:57:03 +0000<br>John Knight <<a dir="ltr" href="mailto:John.Knight@belkin.com" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="3">John.Knight@belkin.com</a>> wrote:<br><br>> Hi Albert,<br>> <br>> I have finished making my changes to dnsmasq 2.55 and I have a patch<br>> file. However, I am not sure how to submit it... 
do I send it to the<br>> discussion list?<br>> <br>> Thanks,<br>> <br>> John Knight<br>> <br>> <br>> -----Original Message-----<br>> From: John Knight<br>> Sent: Wednesday, October 19, 2016 12:57 PM<br>> To: 'Albert ARIBAUD'<br>> Cc: <a dir="ltr" href="mailto:dnsmasq-discuss@lists.thekelleys.org.uk" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="4">dnsmasq-discuss@lists.thekelleys.org.uk</a><br>> Subject: RE: [Dnsmasq-discuss] Dnsmasq not resolving addresses for an<br>> hour<br>> <br>> Hi Albert,<br>> <br>> My comments inline.<br>> <br>> John<br>> <br>> > Hi All, <br>> <br>> > The main while(1) loop uses select() to determine if it has work to<br>> > do. In most cases, it appears to use a timeout of 0, which I believe<br>> > means just wait indefinitely for work on the file descriptors.<br>> > Other times, it appears that the timeout is set to a quarter second<br>> > when a TFTP transfer is in progress or when polling D-Bus.<br>> ><br>> > Now what concerns me is that when a "retry later" condition occurs,<br>> > we may get stuck in select() for a long period of time. Alas,<br>> > I do not know how frequently one might expect work to arrive on<br>> > the file descriptors that select() is watching, so I don't really<br>> > know whether this is a long time or not. It seems, though, that in this<br>> > failure scenario the poll_resolv() function does NOT get called<br>> > very often at all. <br>> <br>> Albert: Actually, if dnsmasq does not receive any requests from<br>> clients, it does not need to poll servers, so I would ask: does the<br>> select() include descriptors for client requests (either UDP<br>> datagrams received, or TCP connections opened)? 
If so, I think it<br>> will exit just when necessary and no timeout is needed; otherwise,<br>> you are right that a timeout is required.<br>> <br>> Albert: Also, it may seem improbable that select() would not return for<br>> a whole hour; but then, is every return from select() followed by a<br>> resolv file poll, or can select() return and then be entered again<br>> without polling the resolv files? I am thinking, for instance, about<br>> cached answers which do not need servers if their TTL is long enough.<br>> <br>> John: I have made a simple change that provides a one-second timeout<br>> for select(). I have found that dnsmasq is now much more responsive to<br>> changes made to /etc/resolv.conf. The code that calls poll_resolv()<br>> rate-limits the calls to once every two seconds, which I believe<br>> is fine and responsive enough.<br>> <br>> John: Given that I am testing this in a lab, with just me on the<br>> console and one idle PC connected to the router, there is little use<br>> of DNS. Since the initial failure, I believe I<br>> saw poll_resolv() called, in one case, at an interval of about 20<br>> minutes. I don’t think this poll interval should be driven by how<br>> active the users are and how much they use DNS; that is just my personal<br>> feeling.<br>> <br>> John: It should be noted that if I had been doing a TFTP transfer,<br>> the code would have set the select() timeout to 250ms. I am not sure why<br>> an active TFTP transfer would warrant the much shorter<br>> timeout. Anyhow, what I did was add an else branch: if a TFTP<br>> transfer is active, set the timeout to 250ms, otherwise set it to 1 second.<br>> <br>> John: I don't know dnsmasq well enough to answer your other questions<br>> about select() and what all of the file descriptors are associated<br>> with. Perhaps someone more knowledgeable can chime in. 
My change<br>> was made in response to the situation where a "retry later" condition<br>> was pending and poll_resolv() was not being called again<br>> within a reasonable time to do the retry.<br>> <br>> John: I believe on our router, DHCP entries have a one-hour TTL, and we<br>> do use dnsmasq for DHCP. Would an idle PC have any reason to<br>> initiate a dnsmasq query? Occasionally, if the browser is up and<br>> running, I do see it query the address of its update server,<br>> but generally speaking I haven't had my browser running on the PC<br>> while doing my dnsmasq testing. So it seems to me that the two<br>> possible sources of dnsmasq activity (i.e. the browser and DHCP) may<br>> be idle for at least an hour... so it seems possible that<br>> poll_resolv() is not being called for a long time in this<br>> scenario.<br>> <br>> > My gut feeling is that there always needs to be a timeout on the<br>> > select() call, as poll_resolv() should be called fairly frequently.<br>> > The existing code that normally calls poll_resolv()<br>> > from this loop suggests a poll rate of about once a second. This<br>> > definitely does not happen today. By just adding a my_syslog()<br>> > message at the top of poll_resolv(), it is very clear from the<br>> > log file that it is not called often, and far too infrequently to<br>> > resolve the "retry later" condition in a timely manner. <br>> <br>> Albert: Can you compare when poll_resolv() is called with when<br>> select() is exited -- and for what reason?<br>> <br>> John: To see the relative times between select() and calls to<br>> poll_resolv(), I added calls to my_syslog() before the select() and at<br>> the top of poll_resolv(). The timestamps in the dnsmasq log file<br>> showed how much time elapsed between calls. I don't know what the<br>> reason for exiting select() is... indeed, for what I was doing, I<br>> really didn't care... 
I just needed to know when poll_resolv() was<br>> getting called and how often.<br>> <br>> > Going forward, as the next thing for me to try, I am going to add a<br>> > timeout to the select()... perhaps a modest one or two seconds. <br>> <br>> Albert: When I have a gut feeling, I would personally investigate further<br>> without changing the code's behavior, because my changes might have<br>> unwanted effects which can actually hide the root cause I am looking<br>> for -- but to each his/her own.<br>> <br>> John: My boss is on my case to get this resolved ASAP. Based on<br>> my trial of the select() timeout, this appears to have solved at least<br>> part of the problem: poll_resolv() not being called again within a<br>> reasonable timeframe after a "retry later" condition. I need to keep<br>> moving forward; I'm not sure I have the time for an in-depth investigation.<br>> Other code already sets a select() timeout, so this code<br>> path is not unprecedented and the risk should be low.<br>> <br>> > But I would like to know what you all think of this... does this<br>> > make sense to do? Is there ever a case where we might not get any<br>> > work on the files select is monitoring for nearly an hour? I am<br>> > trying to make sense of this issue. <br>> <br>> Albert: I am not entirely sure what you mean by "Is there ever a case<br>> where we might not get any work on the files select is monitoring for<br>> nearly an hour"; I will assume you mean "Is there a normal case where<br>> dnsmasq would not poll for changes in resolv files for an hour". If<br>> so, then I would say it depends on how much traffic dnsmasq receives<br>> and how much of it can be answered from cache.<br>> <br>> John: Your interpretation is correct. Thanks for the info and your<br>> help, Albert. I am glad I have someone listening. 
When I am done, I<br>> will forward the diffs for the changes I have made to dnsmasq for<br>> your review.<br>> <br>> > Thanks,<br>> ><br>> > John Knight <br>> <br>> Kind regards,<br>> --<br>> Albert.<br><br>_______________________________________________<br>Dnsmasq-discuss mailing list<br><a dir="ltr" href="mailto:Dnsmasq-discuss@lists.thekelleys.org.uk" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="5">Dnsmasq-discuss@lists.thekelleys.org.uk</a><br><a dir="ltr" href="http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss" x-apple-data-detectors="true" x-apple-data-detectors-type="link" x-apple-data-detectors-result="6">http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss</a><br><br><br></div>