[Dnsmasq-discuss] dnsmasq 2.91test9 + TCP + stale cache

Fri Jan 31 23:20:27 UTC 2025

On 1/31/25 16:19, Dominik Derigs wrote:
> Hey Simon,
> 
> we have found another (small) thing. The requirements for reproducing 
> it are:
> 
> 1. Using --use-stale-cache
> 2. A query is received via TCP
> 3. The cached record is stale
> 
> Querying the stale record causes the record to be "refreshed". However, 
> if the client disconnects at the same time (and the TCP fork exits 
> accordingly), the upstream reply will never be received and so never 
> finds its way into the mother process's cache.
> 
> Could we postpone the shutdown of TCP forks in case a refresh query 
> is still ongoing?
> 

That's not what happens, or at least that's not what is supposed to happen.

The control flow in tcp_request() when stale data is found in the cache 
goes as follows.

1) Read the query from the TCP client connection.
2) Look it up in the cache and get a stale answer.
3) Return the stale answer to the requestor over the TCP connection.
4) Close the client TCP connection server-side. This forces the client to 
open another connection, and so create a new process, if it has more 
queries. The reason is that the existing process now carries on with 
steps 5 to 7, and blocks while doing so.
5) Send the query upstream and block awaiting the answer.
6) Receive the answer and cache it in the local process. This also 
serialises the answer into the pipe to the parent process.
7) Return from tcp_request(), at which point the process exits.

At this point the data has either been read from the pipe by the parent 
process and inserted into its cache, or it is still in the pipe buffer 
and will shortly be read. This is why the process-management code in 
dnsmasq.c doesn't free a process slot until _both_ the process has gone 
and the pipe has returned EOF and been closed.
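
As a rough illustration of that rule, here is a minimal sketch of the 
bookkeeping. The struct, its fields and the function names are all 
hypothetical and are not taken from dnsmasq.c; the only point is that the 
slot is released when both events have been seen, in whichever order they 
arrive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-child bookkeeping; names are illustrative only
   and do not come from dnsmasq.c. */
struct child_slot {
  bool in_use;
  bool child_exited; /* the child process has been reaped */
  bool pipe_eof;     /* read() on the child's pipe has returned 0 */
};

/* Mark the slot as occupied by a newly forked child. */
void slot_start(struct child_slot *s)
{
  assert(s != NULL);
  s->in_use = true;
  s->child_exited = false;
  s->pipe_eof = false;
}

/* Release the slot only once BOTH conditions hold. */
static void slot_maybe_free(struct child_slot *s)
{
  if (s->child_exited && s->pipe_eof)
    s->in_use = false;
}

/* Call when the child has been reaped. */
void slot_on_child_exit(struct child_slot *s)
{
  s->child_exited = true;
  slot_maybe_free(s);
}

/* Call when the pipe has been drained, returned EOF and been closed. */
void slot_on_pipe_eof(struct child_slot *s)
{
  s->pipe_eof = true;
  slot_maybe_free(s);
}
```

With this arrangement a child that exits while its answer is still queued 
in the pipe keeps its slot until the parent has read that answer.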

Pipes don't disappear and lose data until both ends have been closed. If 
the write end is closed but there is queued data, then the read end can 
still read the queued data.
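
The pipe behaviour described above can be checked in isolation. The 
following is a small stand-alone sketch, not dnsmasq code: the function 
name read_after_writer_exit() is made up for the demonstration. A child 
writes into a pipe and exits immediately, the parent deliberately waits 
for the child to be gone, and only then reads.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo, not dnsmasq code.
   Forks a child that writes msg into a pipe and exits at once. The
   parent reaps the child first, so the write end is fully closed before
   any read happens, and then drains the pipe.
   msg must be small enough to fit in the pipe buffer, otherwise the
   child would block in write() while the parent blocks in waitpid().
   Returns the number of bytes recovered, or -1 on error. */
ssize_t read_after_writer_exit(const char *msg, char *buf, size_t buflen)
{
  int fds[2];
  pid_t pid;
  ssize_t n = 0, total = 0;

  assert(msg != NULL && buf != NULL);

  if (pipe(fds) == -1)
    return -1;

  pid = fork();
  if (pid == -1)
    {
      close(fds[0]);
      close(fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: queue the data and exit without waiting for a reader. */
      close(fds[0]);
      if (write(fds[1], msg, strlen(msg)) == -1)
        _exit(1);
      _exit(0);
    }

  /* Parent: drop our copy of the write end, then reap the child.
     Once waitpid() returns, no process holds the write end open. */
  close(fds[1]);
  if (waitpid(pid, NULL, 0) == -1)
    {
      close(fds[0]);
      return -1;
    }

  /* The queued bytes are still readable. read() returns 0 (EOF) only
     after they have been consumed. */
  while ((size_t)total < buflen &&
         (n = read(fds[0], buf + total, buflen - total)) > 0)
    total += n;

  close(fds[0]);
  return n == -1 ? -1 : total;
}
```

Calling read_after_writer_exit("cached-answer", buf, sizeof buf) with a 
64-byte buffer should hand back all 13 bytes, even though the writer has 
already exited by the time the first read() is issued.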

The control flow in tcp_request() is hard to follow, since it repeats 
the same loop both to read another query and then get an answer to it, 
and to get a fresh answer to a query which has already been answered 
with stale data and send that back to the parent.

The pseudocode which describes what a TCP child process does looks like 
this.

do_stale = FALSE

while (1)
{
    have_answer = FALSE

    if (!do_stale)
      {
        /* Normal pass: serve the next query from the client. */
        read_query()

        if (no_query_client_connection_closed())
           exit()

        if (query_in_cache())
           have_answer = TRUE
      }

    /* Cache miss, or the refresh pass after a stale answer. */
    if (!have_answer)
      {
        send_query_upstream()
        get_answer_from_upstream()
        insert_answer_into_cache_and_pipe_to_parent()
      }

    /* Refresh pass: the new data is already in the pipe, so exit. */
    if (do_stale)
       exit()

    return_answer()

    if (answer_was_stale())
      {
        do_stale = TRUE
        close_client_connection()
      }
}
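
To make the ordering easier to see, here is the same loop as a small C 
simulation that records each step in a trace string. It is only a sketch 
of the pseudocode above: there is no networking, and the names 
simulate_tcp_child(), note() and enum cache_state are invented for the 
example and do not exist in dnsmasq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical outcome of the cache lookup for one query. */
enum cache_state { CACHE_MISS, CACHE_FRESH, CACHE_STALE };

/* Append one step name to the trace, separated by spaces. */
static void note(char *trace, size_t len, const char *step)
{
  size_t used = strlen(trace);
  if (used < len)
    snprintf(trace + used, len - used, "%s%s", used ? " " : "", step);
}

/* Simulate a TCP child: the client sends nqueries queries with the
   given cache outcomes and then closes the connection. */
void simulate_tcp_child(const enum cache_state *queries, size_t nqueries,
                        char *trace, size_t len)
{
  bool do_stale = false;
  size_t next = 0;
  enum cache_state state = CACHE_MISS;

  assert(trace != NULL && len > 0);
  trace[0] = '\0';

  while (1)
    {
      bool have_answer = false;

      if (!do_stale)
        {
          if (next == nqueries)
            {
              /* No more queries: the client closed the connection. */
              note(trace, len, "exit");
              return;
            }
          note(trace, len, "read");
          state = queries[next++];
          if (state != CACHE_MISS)
            have_answer = true;
        }

      if (!have_answer)
        {
          note(trace, len, "upstream");
          note(trace, len, "cache+pipe");
        }

      if (do_stale)
        {
          /* Refresh pass done: the data is already in the pipe. */
          note(trace, len, "exit");
          return;
        }

      note(trace, len, "reply");

      if (state == CACHE_STALE)
        {
          do_stale = true;
          note(trace, len, "close");
        }
    }
}
```

For a single stale query the trace comes out as "read reply close 
upstream cache+pipe exit", so the only exit on that path is after the 
write to the pipe.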

The client disconnecting doesn't cause the process to exit before the 
new data has been pushed into the pipe. If you can demonstrate that it 
does, that's a bug.

Cheers,

Simon.