[Dnsmasq-discuss] [PATCH] TCP client timeout setting
Petr Menšík
pemensik at redhat.com
Thu May 18 10:30:14 UTC 2023
While analysing report [1] about queries over TCP that never get a response,
I found that forwarded TCP connections have quite a high timeout. If, for
whatever reason, the forwarder currently set as the last used forwarder is
dropping packets without rejecting them, the TCP connection takes about 120
seconds to time out on my system. That is way too much; I think any TCP
client will give up far sooner than that. This is just a quick workaround to
improve the situation, not a final fix.
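
For illustration only, here is a minimal sketch of one way a per-socket
timeout could be applied to a forwarded TCP connection. The attachment was
scrubbed from the archive, so this is not the code from the patch, and the
helper name set_tcp_timeout is hypothetical. It assumes Linux, where
socket(7) documents that the send timeout also bounds a blocking connect().

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper, not the code from the attached patch.
   Bounds blocking send and receive calls on fd to `seconds` seconds;
   a value of 0 means no timeout.
   Returns 0 on success, -1 with errno set on failure. */
int set_tcp_timeout(int fd, int seconds)
{
  assert(seconds >= 0);

  struct timeval tv = { .tv_sec = seconds, .tv_usec = 0 };

  /* Send timeout; on Linux this also limits a blocking connect(). */
  if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
    return -1;

  /* Receive timeout for reading the upstream reply. */
  return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

The helper would be called on the upstream socket right after socket() and
before connect(), so that a silently dropping forwarder fails after a few
seconds instead of the system default.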

The problem is that if the client chooses to use TCP for whatever reason,
dnsmasq will never switch to a server that actually works until some UDP
request arrives to trigger re-evaluation of last_server responsiveness.
It might switch, but only inside a single TCP process. Right now it just
stubbornly forwards even the 5th retry to the same server, when another one
might be able to answer right away.

I think the proper solution would be to implement event-driven reading of
TCP sockets in the main process. I don't think using threads is possible,
because quite a lot of globals are used. It should not fork additional
processes without the --no-daemon parameter, but just use poll() to watch
socket readiness, then read whatever is already available. Since a TCP DNS
message states its length at the start, it can just allocate a buffer big
enough for that connection and read iteratively without blocking. Once the
message is read, it can parse and process it. A bit of socket magic would be
required, but a similar approach could also solve sending with multiple
calls without blocking. That would be a big change, however.
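
The incremental read described above can be sketched as a small state
machine. This is only an illustration under stated assumptions: struct
tcp_reader and tcp_reader_step are hypothetical names that do not exist in
dnsmasq, and the only protocol fact relied on is the two-byte length prefix
from RFC 1035 section 4.2.2. MSG_DONTWAIT is assumed to be available, as it
is on Linux and the BSDs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection state; none of these names exist in dnsmasq. */
struct tcp_reader {
  unsigned char hdr[2];  /* two-byte length prefix, network byte order */
  size_t hdr_got;        /* prefix bytes received so far */
  unsigned char *msg;    /* buffer, allocated once the length is known */
  size_t msg_len;        /* length announced by the prefix */
  size_t msg_got;        /* message bytes received so far */
};

/* Call when poll() reports POLLIN on fd. Never blocks.
   Returns 1 when a whole message is in r->msg, 0 when more data is
   needed, -1 on error or when the peer closed the connection. */
int tcp_reader_step(int fd, struct tcp_reader *r)
{
  assert(r != NULL);

  for (;;)
    {
      unsigned char *dst;
      size_t want;
      int in_header = r->hdr_got < 2;

      if (in_header)
        {
          dst = r->hdr + r->hdr_got;
          want = 2 - r->hdr_got;
        }
      else
        {
          if (!r->msg)
            {
              /* Prefix complete: size the buffer for exactly one message. */
              r->msg_len = ((size_t)r->hdr[0] << 8) | r->hdr[1];
              r->msg = malloc(r->msg_len ? r->msg_len : 1);
              if (!r->msg)
                return -1;
            }
          if (r->msg_got == r->msg_len)
            return 1;
          dst = r->msg + r->msg_got;
          want = r->msg_len - r->msg_got;
        }

      ssize_t n = recv(fd, dst, want, MSG_DONTWAIT);
      if (n > 0)
        {
          if (in_header)
            r->hdr_got += (size_t)n;
          else
            r->msg_got += (size_t)n;
          continue;
        }
      if (n == 0)
        return -1;                      /* peer closed mid-message */
      if (errno == EINTR)
        continue;
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;                       /* wait for the next POLLIN */
      return -1;
    }
}
```

The main loop would keep one such structure per accepted connection, add
the descriptor to its poll() set, and call tcp_reader_step() whenever the
descriptor becomes readable. The caller frees r->msg and zeroes the
structure after handling a message.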

I think some feedback should be delivered to the main dnsmasq process from
the TCP processing children, just like the cache is updated from
cache_end_insert(). I think it should be able to switch the last_server used
based on feedback from a TCP client process. I even think there should be a
separate last_server for UDP and for TCP, but not until TCP can report
issues too. But I am not sure which approach to choose. At first I thought
about a special F_flag, but all bits for a cache record (struct crec) are
already used.
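
One possible shape for such feedback, sketched here purely as an
assumption, is a small fixed-size record written by the child to a pipe
that the main process polls. The structure tcp_feedback, both function
names, and the idea of identifying servers by index are hypothetical; this
is not how dnsmasq passes cache updates today, only a simplified analogue.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message; dnsmasq has no such structure. */
struct tcp_feedback {
  uint32_t server_index;  /* which upstream server the report is about */
  uint32_t failed;        /* 1 = timed out or refused, 0 = answered */
};

/* Child side: report the outcome for one upstream server.
   The record is far smaller than PIPE_BUF, so the write is atomic.
   Returns 0 on success, -1 on error. */
int feedback_send(int pipe_wr, uint32_t server_index, int failed)
{
  struct tcp_feedback fb = { .server_index = server_index,
                             .failed = failed ? 1u : 0u };
  ssize_t n;

  do
    n = write(pipe_wr, &fb, sizeof(fb));
  while (n == -1 && errno == EINTR);

  return n == (ssize_t)sizeof(fb) ? 0 : -1;
}

/* Parent side: call when poll() reports the pipe readable.
   Advances *last_server to the next of nservers only when the report is
   a failure about the server currently in use.
   Returns 1 if it switched, 0 if not, -1 on read error. */
int feedback_handle(int pipe_rd, uint32_t *last_server, uint32_t nservers)
{
  struct tcp_feedback fb;
  ssize_t n;

  assert(last_server != NULL);

  do
    n = read(pipe_rd, &fb, sizeof(fb));
  while (n == -1 && errno == EINTR);

  if (n != (ssize_t)sizeof(fb))
    return -1;

  if (fb.failed && fb.server_index == *last_server && nservers > 1)
    {
      *last_server = (*last_server + 1) % nservers;
      return 1;
    }
  return 0;
}
```

Ignoring reports about a server that is no longer current keeps several
children that all failed against the same forwarder from skipping past a
working one.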

An alternative quick fix might be, in case sending the TCP request to some
server fails, to generate a UDP request with an EDNS header added, sent from
the TCP child process to the main process UDP socket. That would ensure a
UDP check is done in the main process, which might change the currently used
resolver for following TCP connections too.
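
As a rough sketch of the "EDNS header added" step, the helper below appends
an empty OPT pseudo-record (RFC 6891) to an existing query and bumps
ARCOUNT. The name add_edns0_opt is hypothetical, and the sketch assumes the
message is a plain query ending right after the question section, with no
OPT record present yet. Sending the result to the main process socket is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a dnsmasq function.
   msg holds a DNS query of len bytes in a buffer of cap bytes.
   Appends an 11-byte OPT RR advertising udp_size and increments ARCOUNT.
   Returns the new message length, or 0 if it does not fit. */
size_t add_edns0_opt(unsigned char *msg, size_t len, size_t cap,
                     uint16_t udp_size)
{
  assert(msg != NULL);

  /* Need a full 12-byte header and room for the OPT record. */
  if (len < 12 || cap < len || cap - len < 11)
    return 0;

  uint16_t arcount = (uint16_t)((msg[10] << 8) | msg[11]);
  if (arcount == 0xffff)
    return 0;

  unsigned char *p = msg + len;
  *p++ = 0;                                 /* owner name: root */
  *p++ = 0;
  *p++ = 41;                                /* TYPE = OPT */
  *p++ = (unsigned char)(udp_size >> 8);    /* CLASS = UDP payload size */
  *p++ = (unsigned char)(udp_size & 0xff);
  memset(p, 0, 6);                          /* TTL field and RDLENGTH = 0 */

  arcount++;
  msg[10] = (unsigned char)(arcount >> 8);
  msg[11] = (unsigned char)(arcount & 0xff);

  return len + 11;
}
```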
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466
--
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-dns-tcp-timeout-option.patch
Type: text/x-patch
Size: 4870 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20230518/23870acd/attachment.bin>