[Dnsmasq-discuss] [PATCH] TCP client timeout setting

Thu May 18 10:30:14 UTC 2023

When analysing report [1] about non-responding queries over TCP, I
found that forwarded TCP connections have quite a high timeout. If, for
whatever reason, the forwarder currently set as the last used forwarder
drops packets without rejecting them, the TCP connection times out only
after about 120 seconds on my system. That is far too long; I think any
TCP client will give up well before that. This is just a quick
workaround to improve the situation, not a final fix.
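The attached patch is not reproduced in this archive, so the following is only an illustration and not its contents. It is a minimal C sketch of one common way to bound a TCP connection attempt. The socket is put into non-blocking mode, connect() is started, and poll() waits at most timeout_ms milliseconds, instead of waiting for the kernel's own connect timeout. The helper name connect_with_timeout() and its millisecond parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: open a TCP connection to addr, giving up after
 * timeout_ms instead of waiting for the kernel's connect timeout.
 * Returns a connected socket in blocking mode, or -1 on failure/timeout. */
static int connect_with_timeout(const struct sockaddr *addr, socklen_t addrlen,
                                int timeout_ms)
{
  int fd = socket(addr->sa_family, SOCK_STREAM, 0);
  if (fd < 0)
    return -1;

  /* Non-blocking mode makes connect() return immediately. */
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
    goto fail;

  if (connect(fd, addr, addrlen) < 0)
    {
      if (errno != EINPROGRESS)
        goto fail;

      /* Writable means the handshake finished (successfully or not). */
      struct pollfd pfd = { .fd = fd, .events = POLLOUT };
      int rc;
      do
        rc = poll(&pfd, 1, timeout_ms);
      while (rc < 0 && errno == EINTR);
      if (rc <= 0)              /* 0 = timed out, < 0 = poll error */
        goto fail;

      /* Fetch the result of the asynchronous connect. */
      int err = 0;
      socklen_t len = sizeof err;
      if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;
    }

  /* Restore blocking mode for the rest of the exchange. */
  if (fcntl(fd, F_SETFL, flags) < 0)
    goto fail;
  return fd;

 fail:
  close(fd);
  return -1;
}
```

Reads and writes on the established connection would need a similar bound (for example SO_RCVTIMEO/SO_SNDTIMEO, or poll() before each call). The sketch only covers the connection attempt.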

The problem is that if a client chooses to use TCP for whatever reason,
dnsmasq will never switch to a server that actually works until some
UDP request arrives and triggers re-evaluation of last_server
responsiveness. It might switch, but only inside a single TCP process.
Currently it stubbornly forwards even the 5th retry to the same server,
when another one might be able to answer right away.
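To make the retry behaviour concrete, here is a hedged C sketch of the alternative. Within one TCP process, it moves to the next upstream server after each failed attempt rather than retrying the same one, and it remembers the server that answered. struct upstream, forward_tcp() and forward_with_failover() are hypothetical simplifications. dnsmasq's real server list and forwarding code look different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one upstream server. */
struct upstream {
  const char *name;
  int reachable;   /* stand-in for "answered within the TCP timeout" */
};

/* Hypothetical stand-in for one forwarding attempt over TCP.
 * Returns 1 if the server answered, 0 if it timed out. */
static int forward_tcp(const struct upstream *srv)
{
  return srv->reachable;
}

/* Make up to `retries` attempts, starting at *last_server but moving to
 * the next server after each failure instead of retrying the same one.
 * On success *last_server is updated so later queries start from the
 * server that worked. Returns the index used, or -1 if all failed. */
static int forward_with_failover(const struct upstream *servers, size_t count,
                                 size_t *last_server, int retries)
{
  assert(count > 0);
  size_t idx = *last_server % count;

  for (int attempt = 0; attempt < retries; attempt++)
    {
      if (forward_tcp(&servers[idx]))
        {
          *last_server = idx;
          return (int)idx;
        }
      idx = (idx + 1) % count;   /* rotate rather than hammer one server */
    }
  return -1;
}
```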

I think the proper solution would be to implement event-driven reading
of TCP sockets in the main process. I don't think using threads is
possible, because quite a lot of globals are used. Even without the
--no-daemon parameter it should not fork other processes, but instead
use poll() to watch socket readiness and then read whatever data is
already available. Since a TCP DNS message states its length at the
start, it can allocate a buffer big enough for that connection and read
iteratively without blocking. Once the message is complete, it can be
parsed and processed. A bit of socket magic would be required, but a
similar approach could also solve sending with multiple calls without
blocking. That would be a big change, however.
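Here is a rough C sketch of the event-driven reading proposed above, assuming one non-blocking socket per connection. It relies on the two-byte, network-byte-order length prefix that DNS over TCP uses (RFC 1035, section 4.2.2). The code first collects the two length bytes, then allocates a buffer of exactly that size and fills it across as many poll() wakeups as needed. struct tcp_conn, tcp_conn_read() and poll_and_read() are hypothetical names, not existing dnsmasq functions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical per-connection state for reading one DNS-over-TCP message. */
struct tcp_conn {
  int fd;                  /* non-blocking socket */
  unsigned char lenbuf[2]; /* big-endian length prefix */
  size_t have_len;         /* prefix bytes read so far */
  unsigned char *msg;      /* allocated once the length is known */
  size_t msg_len;          /* message length from the prefix */
  size_t have_msg;         /* message bytes read so far */
};

/* Read whatever is available right now. Returns 1 when the whole message
 * has arrived, 0 if more data is needed, -1 on error or premature EOF. */
static int tcp_conn_read(struct tcp_conn *c)
{
  assert(c->fd >= 0);
  for (;;)
    {
      unsigned char *dst;
      size_t want;

      if (c->have_len < 2)
        {
          dst = c->lenbuf + c->have_len;
          want = 2 - c->have_len;
        }
      else
        {
          if (c->have_msg == c->msg_len)
            return 1;
          dst = c->msg + c->have_msg;
          want = c->msg_len - c->have_msg;
        }

      ssize_t n = read(c->fd, dst, want);
      if (n == 0)
        return -1;                        /* peer closed mid-message */
      if (n < 0)
        {
          if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                     /* nothing more for now */
          if (errno == EINTR)
            continue;
          return -1;
        }

      if (c->have_len < 2)
        {
          c->have_len += (size_t)n;
          if (c->have_len == 2)
            {
              c->msg_len = ((size_t)c->lenbuf[0] << 8) | c->lenbuf[1];
              c->msg = malloc(c->msg_len ? c->msg_len : 1);
              if (!c->msg)
                return -1;
            }
        }
      else
        c->have_msg += (size_t)n;
    }
}

/* Poll all connections once and advance the readable ones.
 * Returns how many connections have a complete message. */
static int poll_and_read(struct tcp_conn *conns, struct pollfd *pfds,
                         size_t n, int timeout_ms)
{
  for (size_t i = 0; i < n; i++)
    {
      pfds[i].fd = conns[i].fd;
      pfds[i].events = POLLIN;
      pfds[i].revents = 0;
    }
  if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
    return 0;

  int done = 0;
  for (size_t i = 0; i < n; i++)
    if ((pfds[i].revents & (POLLIN | POLLHUP | POLLERR))
        && tcp_conn_read(&conns[i]) == 1)
      done++;
  return done;
}
```

In a real main loop, the same pollfd array would also carry the UDP and listening sockets. A completed message would then go to the existing query processing. The sketch only shows the buffering logic.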

I think some feedback should be delivered from the TCP processing
children to the main dnsmasq process, just as the cache is updated from
cache_end_insert(). The main process should then be able to switch the
last_server in use based on feedback from a TCP child process. I even
think there should be separate last_server values for UDP and for TCP,
but not until TCP can report issues too. I am not sure which approach to
choose, though. At first I thought about a special F_* flag, but all
bits of the cache record (struct crec) flags are already used.
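Here is one possible shape for the child-to-parent feedback, sketched in C. It assumes each TCP child keeps a pipe to the main process, mirroring how cache updates reach the main process via cache_end_insert(). The record format below is invented for illustration and is not dnsmasq's. The child writes a small fixed-size record naming the server it used and whether the attempt failed. The parent reads it and, if the failing server is the current last_server, moves to the next one. struct server_feedback, send_feedback() and apply_feedback() are hypothetical. Because the record is far smaller than PIPE_BUF, each write() to the pipe is atomic even if several children share it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical feedback record sent from a TCP child to the main process. */
struct server_feedback {
  uint32_t server;   /* index of the upstream server the child used */
  uint8_t failed;    /* 1 = timed out or refused, 0 = answered */
};

/* Child side: report the outcome of one forwarding attempt.
 * Returns 0 on success, -1 on error. */
static int send_feedback(int pipe_fd, uint32_t server, int failed)
{
  struct server_feedback fb = { .server = server, .failed = failed ? 1 : 0 };
  ssize_t n;
  do
    n = write(pipe_fd, &fb, sizeof fb);
  while (n < 0 && errno == EINTR);
  return n == (ssize_t)sizeof fb ? 0 : -1;
}

/* Parent side: consume one record and move *last_server away from a
 * server the child saw failing. Returns 0 on success, -1 on error. */
static int apply_feedback(int pipe_fd, size_t *last_server, size_t count)
{
  struct server_feedback fb;
  ssize_t n;

  assert(count > 0);
  do
    n = read(pipe_fd, &fb, sizeof fb);
  while (n < 0 && errno == EINTR);
  if (n != (ssize_t)sizeof fb)
    return -1;

  if (fb.failed && fb.server == *last_server)
    *last_server = (*last_server + 1) % count;
  return 0;
}
```

In the daemon, send_feedback() would run in the forked TCP child. apply_feedback() would run in the main loop whenever poll() reports the pipe readable.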

An alternative quick fix might be this: when sending a TCP request to
some server fails, the TCP child process could generate a UDP request
with an EDNS header added and send it to the main process's UDP socket.
That would ensure the UDP check is done in the main process, which
might change the currently used resolver for subsequent TCP connections
too.
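Here is a hedged C sketch of this UDP quick fix. The TCP child builds an ordinary query carrying an EDNS0 OPT pseudo-record. Per RFC 6891, that record has the root as owner name, TYPE 41, and a CLASS field holding the advertised UDP payload size. The child sends the query to the main process's UDP listener, so the main process runs its normal UDP forwarding and server re-evaluation. Several details are assumptions for illustration: which question the child would send, the destination address, and the names build_edns_query() and send_probe().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Store a 16-bit value in network byte order. */
static void put16(unsigned char *p, uint16_t v)
{
  p[0] = (unsigned char)(v >> 8);
  p[1] = (unsigned char)(v & 0xff);
}

/* Build a recursive query for qname/qtype with an EDNS0 OPT record in the
 * additional section. Returns the message length, or 0 if buf is too small
 * or qname contains an empty or over-long label. */
static size_t build_edns_query(unsigned char *buf, size_t buflen, uint16_t id,
                               const char *qname, uint16_t qtype,
                               uint16_t udp_size)
{
  size_t pos = 12;

  if (buflen < 12)
    return 0;
  memset(buf, 0, 12);
  put16(buf, id);
  put16(buf + 2, 0x0100);   /* flags: RD=1 */
  put16(buf + 4, 1);        /* QDCOUNT */
  put16(buf + 10, 1);       /* ARCOUNT: the OPT record */

  /* QNAME as a sequence of length-prefixed labels. */
  const char *label = qname;
  while (*label)
    {
      const char *dot = strchr(label, '.');
      size_t len = dot ? (size_t)(dot - label) : strlen(label);
      if (len == 0 || len > 63 || pos + 1 + len >= buflen)
        return 0;
      buf[pos++] = (unsigned char)len;
      memcpy(buf + pos, label, len);
      pos += len;
      label += len;
      if (*label == '.')
        label++;
    }

  /* Root terminator + QTYPE/QCLASS + 11-byte OPT record. */
  if (pos + 16 > buflen)
    return 0;
  buf[pos++] = 0;
  put16(buf + pos, qtype); pos += 2;
  put16(buf + pos, 1); pos += 2;          /* QCLASS IN */

  buf[pos++] = 0;                         /* OPT owner name: root */
  put16(buf + pos, 41); pos += 2;         /* TYPE OPT */
  put16(buf + pos, udp_size); pos += 2;   /* CLASS: UDP payload size */
  memset(buf + pos, 0, 4); pos += 4;      /* TTL: ext RCODE, version 0, flags */
  put16(buf + pos, 0); pos += 2;          /* RDLEN: no options */
  return pos;
}

/* Send the probe to the main process's UDP listener (address assumed).
 * Returns 0 if the whole datagram was sent, -1 otherwise. */
static int send_probe(int udp_fd, const struct sockaddr *dst, socklen_t dstlen,
                      const unsigned char *msg, size_t len)
{
  ssize_t n = sendto(udp_fd, msg, len, 0, dst, dstlen);
  return n == (ssize_t)len ? 0 : -1;
}
```

For "example.com" this produces a 40-byte message: a 12-byte header, 13 bytes of QNAME, 4 bytes of QTYPE/QCLASS and the 11-byte OPT record.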

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466

-- 
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-dns-tcp-timeout-option.patch
Type: text/x-patch
Size: 4870 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20230518/23870acd/attachment.bin>