[Dnsmasq-discuss] [PATCH] TCP client timeout setting
Geert Stappers
stappers at stappers.nl
Mon Jul 17 13:20:26 UTC 2023
On Fri, May 19, 2023 at 01:40:30PM +0200, Petr Menšík wrote:
> When analysing report [1] for non-responding queries over TCP, I have found
> forwarded TCP connections have quite high timeout. If for whatever reason
> the forwarder currently set as a last used forwarder is dropping packets
> without reject, the TCP will timeout for about 120 seconds on my system.
> That is way too much, I think any TCP clients will give up far before that.
> This is just quick workaround to improve the situation, not final fix.
>
> The problem is if the client chooses to use TCP for whatever reason, dnsmasq
> will never switch to actually working server until some UDP request arrives
> to trigger re-evaluation of last_server responsiveness. It might do so, but
> just inside single TCP process. It just stubbornly forwards even 5th retry
> to the same server now, when another one might be able to answer right away.
>
> I think the proper solution would be implementation of event driven reading
> of TCP sockets in the main process. I don't think using threads is possible,
> because quite a lot of globals used. It should not fork another processes
> without --no-daemon parameter, but just use poll() to watch socket
> readiness, then read whatever is prepared already. Since TCP DNS message
> says its length at the start, it can just allocate buffer big enough for
> that connection and iteratively read without blocking. Once it is read, it
> can parse it, process it. A bit of socket magic would be required, but
> similar approach could solve also sending with multiple calls without
> blocking. That would be big change however.
>
> I think some feedback should be delivered to main dnsmasq process from tcp
> processing children, just like cache is updated from cache_end_insert(). I
> think it should be able to switch last_server used due to feedback from tcp
> client process. I even think there should be different last_server for UDP
> and different for TCP, but not untill TCP can report issues too. But not
> sure what approach to choose. At first I though about special F_flag, but
> all bits for cache record (struct crec) are already used.
>
> Alternative quick-fix might be in case the TCP request sending fails to some
> server to generate UDP request with EDNS header added from tcp child process
> to main process UDP socket. It would ensure UDP check is done at the main
> process, which might change current used resolver for following TCP
> connections too.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466
>
> --
> Petr Menšík
> Software Engineer, RHEL
> Red Hat, http://www.redhat.com/
> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
> From c02cfcb0a358e04636ffd2bcc595860b25b3a440 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemensik at redhat.com>
> Date: Wed, 17 May 2023 21:40:19 +0200
> Subject: [PATCH] Add --dns-tcp-timeout option
>
> Changes send timeout of outgoing TCP connections. Allows waiting just
> short time for successful connection before trying another server.
> Makes possible faster switch to working server if previous is not
> responding. Default socket timeout seems too high for DNS connections.
> ---
> man/dnsmasq.8 | 4 ++++
> src/config.h | 1 +
> src/dnsmasq.h | 1 +
> src/forward.c | 10 ++++++++--
> src/option.c | 10 +++++++++-
> 5 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/man/dnsmasq.8 b/man/dnsmasq.8
> index 30429df..94fd5b1 100644
> --- a/man/dnsmasq.8
> +++ b/man/dnsmasq.8
> @@ -860,6 +860,10 @@ where this needs to be increased is when using web-server log file
> resolvers, which can generate large numbers of concurrent queries. This
> parameter actually controls the number of concurrent queries per server group, where a server group is the set of server(s) associated with a single domain. So if a domain has it's own server via --server=/example.com/1.2.3.4 and 1.2.3.4 is not responding, but queries for *.example.com cannot go elsewhere, then other queries will not be affected. On configurations with many such server groups and tight resources, this value may need to be reduced.
> .TP
> +.B --dns-tcp-timeout=<seconds>
> +Sets send timeout for forwarded TCP connections. Can be used to reduce time of waiting
> +for successful TCP connection. Default value 0 skips the change of it.
> +.TP
Missing information about sane values.
> .B --dnssec
> Validate DNS replies and cache DNSSEC data. When forwarding DNS queries, dnsmasq requests the
> DNSSEC records needed to validate the replies. The replies are validated and the result returned as
> diff --git a/src/config.h b/src/config.h
> index 88cf72e..5fd5cdf 100644
> --- a/src/config.h
> +++ b/src/config.h
> @@ -61,6 +61,7 @@
> #define LOOP_TEST_TYPE T_TXT
> #define DEFAULT_FAST_RETRY 1000 /* ms, default delay before fast retry */
> #define STALE_CACHE_EXPIRY 86400 /* 1 day in secs, default maximum expiry time for stale cache data */
> +#define TCP_TIMEOUT 0 /* timeout of tcp outgoing connections */
>
> /* compile-time options: uncomment below to enable or do eg.
> make COPTS=-DHAVE_BROKEN_RTC
> diff --git a/src/dnsmasq.h b/src/dnsmasq.h
> index 2f95c12..4113ccb 100644
> --- a/src/dnsmasq.h
> +++ b/src/dnsmasq.h
> @@ -1256,6 +1256,7 @@ extern struct daemon {
> int tcp_pipes[MAX_PROCS];
> int pipe_to_parent;
> int numrrand;
> + int tcp_timeout;
> struct randfd *randomsocks;
> struct randfd_list *rfl_spare, *rfl_poll;
> int v6pktinfo;
> diff --git a/src/forward.c b/src/forward.c
> index ecfeebd..f323fee 100644
> --- a/src/forward.c
> +++ b/src/forward.c
> @@ -1929,8 +1929,14 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
> /* Copy connection mark of incoming query to outgoing connection. */
> if (have_mark)
> setsockopt(serv->tcpfd, SOL_SOCKET, SO_MARK, &mark, sizeof(unsigned int));
> -#endif
> -
Thanks for removing that trailing white space.
> +#endif
> +
> + if (daemon->tcp_timeout>0)
> + {
> + struct timeval tv = {daemon->tcp_timeout, 0};
> + setsockopt(serv->tcpfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
> + }
> +
> if ((!local_bind(serv->tcpfd, &serv->source_addr, serv->interface, 0, 1)))
> {
> close(serv->tcpfd);
> diff --git a/src/option.c b/src/option.c
> index 8322725..b94a5ff 100644
> --- a/src/option.c
> +++ b/src/option.c
> @@ -190,6 +190,7 @@ struct myoption {
> #define LOPT_FILTER_RR 381
> #define LOPT_NO_DHCP6 382
> #define LOPT_NO_DHCP4 383
> +#define LOPT_TCP_TIMEOUT 384
>
> #ifdef HAVE_GETOPT_LONG
> static const struct option opts[] =
> @@ -279,6 +280,7 @@ static const struct myoption opts[] =
> { "leasefile-ro", 0, 0, '9' },
> { "script-on-renewal", 0, 0, LOPT_SCRIPT_TIME},
> { "dns-forward-max", 1, 0, '0' },
> + { "dns-tcp-timeout", 1, 0, LOPT_TCP_TIMEOUT },
> { "clear-on-reload", 0, 0, LOPT_RELOAD },
> { "dhcp-ignore-names", 2, 0, LOPT_NO_NAMES },
> { "enable-tftp", 2, 0, LOPT_TFTP },
> @@ -3391,7 +3393,12 @@ static int one_opt(int option, char *arg, char *errstr, char *gen_err, int comma
> case '0': /* --dns-forward-max */
> if (!atoi_check(arg, &daemon->ftabsize))
> ret_err(gen_err);
> - break;
Thanks again
> + break;
> +
> + case LOPT_TCP_TIMEOUT: /* --dns-tcp-timeout */
> + if (!atoi_check(arg, &daemon->tcp_timeout))
> + ret_err(gen_err);
> + break;
>
> case 'q': /* --log-queries */
> set_option_bool(OPT_LOG);
....
> --
> 2.40.1
>
Groeten
Geert Stappers
--
Silence is hard to parse
More information about the Dnsmasq-discuss
mailing list