[Dnsmasq-discuss] [PATCH] TCP client timeout setting

Petr Menšík pemensik at redhat.com
Thu May 25 19:32:11 UTC 2023


This problem is best demonstrated by an example, taken from [2] and 
slightly modified.

Let's create a hypothetical network issue with one forwarder, which 
worked fine a while ago.

$ sudo iptables -I INPUT -i lo -d 127.0.0.255 -j DROP

Now start dnsmasq and send a TCP query to it:

$ dnsmasq -d --log-queries --port 2053 --no-resolv --conf-file=/dev/null \
    --server=127.0.0.255 --server=127.0.0.1
$ dig +tcp @localhost -p 2053 test

;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to 127.0.0.1#2053: timed out

; <<>> DiG 9.18.15 <<>> +tcp @localhost -p 2053 test
; (2 servers found)
;; global options: +cmd
;; no servers could be reached

Because dig gives up much sooner than dnsmasq does, it never receives 
any reply, even though the other server is responding just fine. That 
is the main advantage of having a local cache running, isn't it? It 
should improve things!

Now let's be persistent and keep trying:

$ time for TRY in {1..6}; do dig +tcp @localhost -p 2053 test; done

After a few timeouts, dnsmasq finally notices something is wrong and 
also tries the second server, which answers quickly. However, this works 
only with dnsmasq -d, which is not used in production. If I replace it 
with dnsmasq -k, it does not answer at all!

$ dnsmasq -k --log-queries --port 2053 --no-resolv --conf-file=/dev/null \
    --server=127.0.0.255 --server=127.0.0.1
$ time for TRY in {1..8}; do dig +tcp @localhost -p 2053 test; done

...
;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to ::1#2053: timed out
;; communications error to 127.0.0.1#2053: timed out

; <<>> DiG 9.18.15 <<>> +tcp @localhost -p 2053 test
; (2 servers found)
;; global options: +cmd
;; no servers could be reached


real    5m20,602s
user    0m0,094s
sys    0m0,115s
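
A rough breakdown of that elapsed time, as a sketch. The only inputs 
are the numbers shown above; the count of four timed-out attempts per 
dig run is an assumption based on the one run whose output is shown.

```python
# Break down the `real 5m20,602s` figure reported by `time` above.
real_seconds = 5 * 60 + 20.602   # 5m20,602s (comma is the locale's decimal mark)
dig_runs = 8                     # for TRY in {1..8}
attempts_per_run = 4             # assumption: every run printed 4 "timed out" lines

per_run = real_seconds / dig_runs
per_attempt = per_run / attempts_per_run

print(f"{per_run:.1f} s per dig run, {per_attempt:.1f} s per attempt")
# prints: 40.1 s per dig run, 10.0 s per attempt
```

So each attempt gave up after roughly 10 seconds, far sooner than the 
roughly 120 seconds the upstream TCP connection takes to time out 
according to the quoted message below.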

This is because with -k it spawns TCP workers, which always start with 
whatever last_server was set by the last UDP query. Until a UDP query 
arrives to save the day, each worker stubbornly tries the 
non-responding server first, even when the other one answers in 
milliseconds. Notice it has been trying for 5 minutes without success.
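
The effect can be imitated with a small model. This is only a sketch, 
not dnsmasq code: it assumes each TCP worker is a forked child that 
gets a copy of the parent's state, and the names state, 
run_tcp_queries and responsive are made up for the illustration.

```python
import os

def run_tcp_queries(state, responsive, queries):
    """Fork one worker per query; return the server each worker tried first."""
    tried_first = []
    for _ in range(queries):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Worker: begins with a copy of the parent's state.
            os.close(read_fd)
            first = state["last_server"]
            if first not in responsive:
                # The worker finds a working server, but only in its own copy.
                state["last_server"] = next(
                    s for s in state["servers"] if s in responsive)
            os.write(write_fd, first.encode())
            os._exit(0)
        # Parent: read what the worker tried first, then reap the worker.
        os.close(write_fd)
        with os.fdopen(read_fd) as pipe:
            tried_first.append(pipe.read())
        os.waitpid(pid, 0)
    return tried_first

state = {"servers": ["127.0.0.255", "127.0.0.1"],
         "last_server": "127.0.0.255"}
tried = run_tcp_queries(state, responsive={"127.0.0.1"}, queries=3)
print(tried)                 # every worker starts with 127.0.0.255
print(state["last_server"])  # parent still has 127.0.0.255
```

Whatever a worker learns is lost when it exits, so in this model the 
next worker repeats the same mistake unless something updates the 
parent's state.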

I think this has to be fixed somehow. It is a corner case, because TCP 
queries are usually caused by UDP queries answered with the TC bit set. 
But there are real-world cases where a TCP-only query makes sense, and 
dnsmasq does not handle them well. I summarized this at [3].

My proposal would be that, when forwarding a query fails, the TCP 
worker sends a UDP query with an EDNS0 header to the main process, 
which can then test forwarder responsiveness and change last_server to 
a working one. Subsequent attempts would then not fall into the 
blackhole again and again. The EDNS0 header would be there to increase 
the chance of a positive reply from upstream, which can be cached.
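
The intended effect of the proposal, modelled without any networking. 
This is a sketch under assumptions, not a patch: MainProcess, 
on_tcp_failure, spawn_tcp_worker and the responsive set are 
hypothetical names, and the probe is only recorded, not sent.

```python
from dataclasses import dataclass, field

@dataclass
class MainProcess:
    servers: list
    last_server: str
    responsive: set                      # stand-in for real network behaviour
    probes_sent: list = field(default_factory=list)

    def spawn_tcp_worker(self):
        """Return the server a new TCP worker would try first."""
        return self.last_server

    def on_tcp_failure(self, query_name):
        """Hypothetical handler for the UDP query sent by a failed worker."""
        for server in self.servers:
            # In the proposal this would be a real UDP query with EDNS0.
            self.probes_sent.append((server, query_name))
        answered = [s for s in self.servers if s in self.responsive]
        if answered:
            # Prefer a forwarder that actually replied.
            self.last_server = answered[0]

main = MainProcess(servers=["127.0.0.255", "127.0.0.1"],
                   last_server="127.0.0.255",
                   responsive={"127.0.0.1"})

first = main.spawn_tcp_worker()
if first not in main.responsive:
    main.on_tcp_failure("test")      # worker reports the failure
second = main.spawn_tcp_worker()

print(first, "->", second)           # prints: 127.0.0.255 -> 127.0.0.1
```

In this model only the first TCP query pays for the dead forwarder; 
the next worker starts with the server that replied to the probe.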

Do you have other ideas on how to solve this problem?

Cheers,
Petr

[2] https://bugzilla.redhat.com/show_bug.cgi?id=2160466#c6
[3] https://bugzilla.redhat.com/show_bug.cgi?id=2160466#c13

On 19. 05. 23 13:40, Petr Menšík wrote:
> When analysing report [1] for non-responding queries over TCP, I 
> found that forwarded TCP connections have quite a high timeout. If, 
> for whatever reason, the forwarder currently set as the last used 
> forwarder is dropping packets without rejecting them, the TCP 
> connection takes about 120 seconds to time out on my system. That is 
> way too much; I think any TCP client will give up long before that. 
> This is just a quick workaround to improve the situation, not a final 
> fix.
>
> ...
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466
>
-- 
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
