[Dnsmasq-discuss] [PATCH] TCP client timeout setting

Geert Stappers stappers at stappers.nl
Mon Jul 17 13:20:26 UTC 2023


On Fri, May 19, 2023 at 01:40:30PM +0200, Petr Menšík wrote:
> When analysing report [1] for non-responding queries over TCP, I have found
> forwarded TCP connections have quite high timeout. If for whatever reason
> the forwarder currently set as a last used forwarder is dropping packets
> without reject, the TCP will timeout for about 120 seconds on my system.
> That is way too much, I think any TCP clients will give up far before that.
> This is just quick workaround to improve the situation, not final fix.
> 
> The problem is if the client chooses to use TCP for whatever reason, dnsmasq
> will never switch to actually working server until some UDP request arrives
> to trigger re-evaluation of last_server responsiveness. It might do so, but
> just inside single TCP process. It just stubbornly forwards even 5th retry
> to the same server now, when another one might be able to answer right away.
> 
> I think the proper solution would be implementation of event driven reading
> of TCP sockets in the main process. I don't think using threads is possible,
> because quite a lot of globals used. It should not fork another processes
> without --no-daemon parameter, but just use poll() to watch socket
> readiness, then read whatever is prepared already. Since TCP DNS message
> says its length at the start, it can just allocate buffer big enough for
> that connection and iteratively read without blocking. Once it is read, it
> can parse it, process it. A bit of socket magic would be required, but
> similar approach could solve also sending with multiple calls without
> blocking. That would be big change however.
> 
> I think some feedback should be delivered to main dnsmasq process from tcp
> processing children, just like cache is updated from cache_end_insert(). I
> think it should be able to switch last_server used due to feedback from tcp
> client process. I even think there should be different last_server for UDP
> and different for TCP, but not untill TCP can report issues too. But not
> sure what approach to choose. At first I though about special F_flag, but
> all bits for cache record (struct crec) are already used.
> 
> Alternative quick-fix might be in case the TCP request sending fails to some
> server to generate UDP request with EDNS header added from tcp child process
> to main process UDP socket. It would ensure UDP check is done at the main
> process, which might change current used resolver for following TCP
> connections too.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2160466
> 
> -- 
> Petr Menšík
> Software Engineer, RHEL
> Red Hat, http://www.redhat.com/
> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB

> From c02cfcb0a358e04636ffd2bcc595860b25b3a440 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Petr=20Men=C5=A1=C3=ADk?= <pemensik at redhat.com>
> Date: Wed, 17 May 2023 21:40:19 +0200
> Subject: [PATCH] Add --dns-tcp-timeout option
> 
> Changes send timeout of outgoing TCP connections. Allows waiting just
> short time for successful connection before trying another server.
> Makes possible faster switch to working server if previous is not
> responding. Default socket timeout seems too high for DNS connections.
> ---
>  man/dnsmasq.8 |  4 ++++
>  src/config.h  |  1 +
>  src/dnsmasq.h |  1 +
>  src/forward.c | 10 ++++++++--
>  src/option.c  | 10 +++++++++-
>  5 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/man/dnsmasq.8 b/man/dnsmasq.8
> index 30429df..94fd5b1 100644
> --- a/man/dnsmasq.8
> +++ b/man/dnsmasq.8
> @@ -860,6 +860,10 @@ where this needs to be increased is when using web-server log file
>  resolvers, which can generate large numbers of concurrent queries. This
>  parameter actually controls the number of concurrent queries per server group, where a server group is the set of server(s) associated with a single domain. So if a domain has it's own server via --server=/example.com/1.2.3.4 and 1.2.3.4 is not responding, but queries for *.example.com cannot go elsewhere, then other queries will not be affected. On configurations with many such server groups and tight resources, this value may need to be reduced.
>  .TP
> +.B --dns-tcp-timeout=<seconds>
> +Sets send timeout for forwarded TCP connections. Can be used to reduce time of waiting
> +for successful TCP connection. Default value 0 skips the change of it.
> +.TP

Missing information about sane values.


>  .B --dnssec
>  Validate DNS replies and cache DNSSEC data. When forwarding DNS queries, dnsmasq requests the 
>  DNSSEC records needed to validate the replies. The replies are validated and the result returned as 
> diff --git a/src/config.h b/src/config.h
> index 88cf72e..5fd5cdf 100644
> --- a/src/config.h
> +++ b/src/config.h
> @@ -61,6 +61,7 @@
>  #define LOOP_TEST_TYPE T_TXT
>  #define DEFAULT_FAST_RETRY 1000 /* ms, default delay before fast retry */
>  #define STALE_CACHE_EXPIRY 86400 /* 1 day in secs, default maximum expiry time for stale cache data */
> +#define TCP_TIMEOUT 0 /* timeout of tcp outgoing connections */
>   
>  /* compile-time options: uncomment below to enable or do eg.
>     make COPTS=-DHAVE_BROKEN_RTC
> diff --git a/src/dnsmasq.h b/src/dnsmasq.h
> index 2f95c12..4113ccb 100644
> --- a/src/dnsmasq.h
> +++ b/src/dnsmasq.h
> @@ -1256,6 +1256,7 @@ extern struct daemon {
>    int tcp_pipes[MAX_PROCS];
>    int pipe_to_parent;
>    int numrrand;
> +  int tcp_timeout;
>    struct randfd *randomsocks;
>    struct randfd_list *rfl_spare, *rfl_poll;
>    int v6pktinfo; 
> diff --git a/src/forward.c b/src/forward.c
> index ecfeebd..f323fee 100644
> --- a/src/forward.c
> +++ b/src/forward.c
> @@ -1929,8 +1929,14 @@ static ssize_t tcp_talk(int first, int last, int start, unsigned char *packet,
>  	  /* Copy connection mark of incoming query to outgoing connection. */
>  	  if (have_mark)
>  	    setsockopt(serv->tcpfd, SOL_SOCKET, SO_MARK, &mark, sizeof(unsigned int));
> -#endif			  
> -	  

Thanks for removing that trailing white space.

> +#endif
> +
> +	  if (daemon->tcp_timeout>0)
> +	    {
> +	      struct timeval tv = {daemon->tcp_timeout, 0};
> +	      setsockopt(serv->tcpfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
> +	    }
> +
>  	  if ((!local_bind(serv->tcpfd,  &serv->source_addr, serv->interface, 0, 1)))
>  	    {
>  	      close(serv->tcpfd);
> diff --git a/src/option.c b/src/option.c
> index 8322725..b94a5ff 100644
> --- a/src/option.c
> +++ b/src/option.c
> @@ -190,6 +190,7 @@ struct myoption {
>  #define LOPT_FILTER_RR     381
>  #define LOPT_NO_DHCP6      382
>  #define LOPT_NO_DHCP4      383
> +#define LOPT_TCP_TIMEOUT   384
>  
>  #ifdef HAVE_GETOPT_LONG
>  static const struct option opts[] =  
> @@ -279,6 +280,7 @@ static const struct myoption opts[] =
>      { "leasefile-ro", 0, 0, '9' },
>      { "script-on-renewal", 0, 0, LOPT_SCRIPT_TIME},
>      { "dns-forward-max", 1, 0, '0' },
> +    { "dns-tcp-timeout", 1, 0, LOPT_TCP_TIMEOUT },
>      { "clear-on-reload", 0, 0, LOPT_RELOAD },
>      { "dhcp-ignore-names", 2, 0, LOPT_NO_NAMES },
>      { "enable-tftp", 2, 0, LOPT_TFTP },
> @@ -3391,7 +3393,12 @@ static int one_opt(int option, char *arg, char *errstr, char *gen_err, int comma
>      case '0':  /* --dns-forward-max */
>        if (!atoi_check(arg, &daemon->ftabsize))
>  	ret_err(gen_err);
> -      break;  

Thanks again


> +      break;
> +
> +    case LOPT_TCP_TIMEOUT: /* --dns-tcp-timeout */
> +      if (!atoi_check(arg, &daemon->tcp_timeout))
> +	 ret_err(gen_err);
> +      break;
>      
>      case 'q': /* --log-queries */
>        set_option_bool(OPT_LOG);
     ....
> -- 
> 2.40.1
> 

Groeten
Geert Stappers
-- 
Silence is hard to parse



More information about the Dnsmasq-discuss mailing list