[Dnsmasq-discuss] [PATCH] Fix potential untracked tcp_request sub-processes
Tijs Van Buggenhout
tijs.van.buggenhout at axsguard.com
Fri Apr 9 10:04:14 UTC 2021
Hi all,
We are using dnsmasq as part of OpenWrt for a managed VPN solution. As soon as
the VPN is up, all DNS requests are forwarded to an upstream DNS server over
the VPN.
In case of VPN transmission problems there is a delay until it is finally
decided that the tunnel is effectively down, during which DNS requests are
still accepted, but never answered.
It happens in those circumstances clients switch to TCP protocol rather than
UDP, doing so for all IP families configured for their device. The same
queries now enter both on dnsmasq TCP IPv4 and IPv6 socket. Queries are
accepted, and tcp_request sub-processes spawned, which are tracked using
tcp_pids[MAX_PROCS].
However an overcommit may happen on tcp_pids in the sense that there is an
implicit relationship (1-to-1) between the TCP listen sockets selected for
POLLIN, and the tcp_request processes spawned and accounted for in tcp_pids.
Meaning that a free spot in tcp_pids can only be guaranteed if only one (new)
TCP connection is accepted per poll cycle. If more than one is allowed, e.g.
IPv4 and IPv6, and tcp_pids has only one spot free, both connections are
accepted and result in tcp_request sub-processes, for which the last one is
not accounted for in tcp_pids.
In our situation, when the VPN tunnel went down eventually, dnsmasq is
restarted, all pids accounted for in tcp_pids are signaled the ALRM has
finished, and the main process waits for them to stop. However those
unaccounted for continued to exist for minutes after the main process was
terminated, and hence a failure to (re)start dnsmasq due to the bind failure
(address already in use).
Two problems thus arise:
* untracked tcp_request sub-processes may occur
* tcp_request sub-processes do not close inherited listen sockets
For the first problem I could think of two solutions: ensure that only one
tcp_request is accepted per poll cycle (downside is delay and a preference for
IPv4 over IPv6), or introduce a new tcp6_pids[MAX_PROC] to track IPv6
connections (in the latest code that also means an additional tcp6_pipes).
Closing listening sockets in the child process seems like the right thing to
do, but due to the many may be cumbersome to implement. The problem is less
apparent if all outstanding connections are waited for before termination of
the main process. Still, what if the main process receives a SIGKILL rather
than a SIGTERM.
Any thoughts/impressions? Below is the (simplest) patch addressing the one
tcp_request per poll cycle.
Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout at axsguard.com>
--
diff --git a/src/dnsmasq.c b/src/dnsmasq.c
index 256d2bc..52dda2f 100644
--- a/src/dnsmasq.c
+++ b/src/dnsmasq.c
@@ -1753,7 +1753,7 @@ static void check_dns_listeners(time_t now)
struct serverfd *serverfdp;
struct listener *listener;
struct randfd_list *rfl;
- int i;
+ int i, tcp_requests = 0;
int pipefd[2];
for (serverfdp = daemon->sfds; serverfdp; serverfdp = serverfdp->next)
@@ -1797,6 +1797,10 @@ static void check_dns_listeners(time_t now)
tftp_request(listener, now);
#endif
+ /* accept one tcp request at a time over all families */
+ if (listener->tcpfd != -1 && tcp_requests > 0)
+ continue;
+
if (listener->tcpfd != -1 && poll_check(listener->tcpfd, POLLIN))
{
int confd, client_ok = 1;
@@ -1914,6 +1918,8 @@ static void check_dns_listeners(time_t now)
daemon->tcp_pipes[i] = pipefd[0];
break;
}
+
+ tcp_requests += 1;
}
close(confd);
More information about the Dnsmasq-discuss
mailing list