[Dnsmasq-discuss] dnsmasq 2.91test9 + TCP + stale cache
Simon Kelley
simon at thekelleys.org.uk
Sat Feb 1 10:12:01 UTC 2025
On 1/31/25 23:20, Simon Kelley wrote:
>
>
> On 1/31/25 16:19, Dominik Derigs wrote:
>> Hey Simon,
>>
>> we have found another (small) thing. The requirements for
>> reproducing it are:
>>
>> 1. Using --use-stale-cache
>> 2. A query is received via TCP
>> 3. The cached record is stale
>>
>> Querying the stale record causes the query to be "refreshed". However,
>> if the client disconnects at the same time (and the TCP fork exits
>> accordingly), the reply will never be received and will never find its
>> way into the mother process's cache.
>>
>> Could we postpone the shutdown of TCP forks while a refresh query is
>> still in progress?
>>
>
> That's not what happens, or at least that's not what is supposed to happen.
>
> The control flow in tcp_request() when stale data is found in the cache
> goes as follows.
>
> 1) Read the query from the TCP client connection.
> 2) Look it up in the cache and get a stale answer.
> 3) Return the stale answer to the requestor over the TCP connection.
> 4) Close the client TCP connection server-side. This forces the client to
> open another connection, and so create a new process, if it has more
> queries. This is done because the existing process now
> 5) sends the query upstream and blocks awaiting the answer.
> 6) receives the answer and caches it in the local process. This also
> serialises the answer into the pipe to the parent process.
> 7) returns from tcp_request(), and the process exits.
>
> At this point the data has either been read from the pipe by the parent
> process and inserted into its cache, or it is still in the pipe buffer
> and will shortly be read. This is why the process-management code in
> dnsmasq.c doesn't free a process slot until _both_ the process has gone
> and the pipe has returned EOF and been closed.
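A minimal sketch of that reaping rule, assuming a hypothetical struct tcp_slot with reaped and pipe_eof flags (the struct, field and function names are illustrative, not taken from dnsmasq.c): the slot is released only once both events have been seen, in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical per-child bookkeeping in the parent; names are
   illustrative, not dnsmasq's own. */
struct tcp_slot
{
  pid_t pid;      /* 0 when the slot is free */
  bool reaped;    /* waitpid() has collected the child */
  bool pipe_eof;  /* read() on the child's pipe returned 0 */
};

/* Event: the child process has been reaped. */
void on_child_reaped(struct tcp_slot *s)
{
  assert(s->pid != 0);
  s->reaped = true;
}

/* Event: the parent has drained the pipe and seen EOF. */
void on_pipe_eof(struct tcp_slot *s)
{
  assert(s->pid != 0);
  s->pipe_eof = true;
}

/* Release the slot only when BOTH events have happened: a child that
   has already been reaped may still have cache data queued in the pipe. */
bool maybe_free_slot(struct tcp_slot *s)
{
  if (s->pid != 0 && s->reaped && s->pipe_eof)
    {
      s->pid = 0;
      s->reaped = false;
      s->pipe_eof = false;
      return true;
    }
  return false;
}
```

Reaping the child alone leaves the slot busy; only the later EOF on the pipe lets maybe_free_slot() release it.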
>
> Pipes don't disappear and lose data until both ends have been closed. If
> the write end is closed but there is queued data, then the read end can
> still read the queued data.
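This pipe behaviour is plain POSIX and easy to check. The sketch below is a small standalone test, not dnsmasq code; read_after_writer_exits() is a hypothetical helper in which a forked child writes a message, closes its write end and exits, and the parent reaps it before reading anything. It assumes the message fits in the pipe buffer, otherwise the child would block.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes msg, closes its write end and exits. The parent
   reaps it first and only then reads, so every byte it gets was still
   queued in the pipe after the writer had gone. msg must be small
   enough to fit in the pipe buffer, or the child would block.
   Returns the number of bytes read, or -1 on error. */
ssize_t read_after_writer_exits(const char *msg, char *buf, size_t bufsz)
{
  assert(msg != NULL && buf != NULL);

  int fds[2];
  if (bufsz == 0 || pipe(fds) == -1)
    return -1;

  pid_t pid = fork();
  if (pid == -1)
    {
      close(fds[0]);
      close(fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: stands in for the TCP handler process. */
      size_t len = strlen(msg);
      close(fds[0]);
      ssize_t n = write(fds[1], msg, len);
      close(fds[1]);
      _exit(n == (ssize_t)len ? 0 : 1);
    }

  /* Parent: drop its own write end, or read() would never see EOF. */
  close(fds[1]);

  int status;
  if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)
      || WEXITSTATUS(status) != 0)
    {
      close(fds[0]);
      return -1;
    }

  /* Both write ends are now closed, yet the queued data is intact. */
  size_t total = 0;
  ssize_t n = 0;
  while (total < bufsz - 1
         && (n = read(fds[0], buf + total, bufsz - 1 - total)) > 0)
    total += (size_t)n;
  buf[total] = '\0';
  close(fds[0]);
  return n < 0 ? -1 : (ssize_t)total;
}
```

Called with "cached-answer" and a 64-byte buffer, it returns 13 and the buffer holds that string, even though the writer had exited before the first read().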
>
> The control flow in tcp_request() is hard to follow, since it repeats
> the loop both to read another query and to get an answer to it, and to
> get an answer to a query which has already been answered and send that
> back to mother.
>
> The pseudocode which describes what a TCP child process does looks like
> this.
>
> do_stale = FALSE;
>
> while (1)
>   {
>     have_answer = FALSE;
>
>     if (!do_stale)
>       {
>         read_query();
>
>         if (no_query_client_connection_closed())
>           exit();
>
>         if (query_in_cache())
>           have_answer = TRUE;
>       }
>
>     if (!have_answer)
>       {
>         send_query_upstream();
>         get_answer_from_upstream();
>         insert_answer_into_cache_and_pipe_to_parent();
>       }
>
>     if (do_stale)
>       exit();
>
>     return_answer();
>
>     if (answer_was_stale())
>       {
>         do_stale = TRUE;
>         close_client_connection();
>       }
>   }
>
> The client disconnecting doesn't cause the process to exit before the
> new data has been pushed into the pipe. If you can demonstrate that it
> does, that's a bug.
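To make the ordering concrete, here is the pseudocode loop turned into a small deterministic C simulation. It is a sketch under stated assumptions: the side effects are replaced by a trace of hypothetical EV_* events, the client is modelled by a counter of remaining queries, and tcp_child() is an illustrative name, not the real tcp_request().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trace events standing in for the real side effects. */
enum event
{
  EV_READ_QUERY, EV_CACHE_HIT, EV_UPSTREAM, EV_PIPE_TO_PARENT,
  EV_RETURN_ANSWER, EV_CLOSE_CLIENT, EV_EXIT
};

#define TRACE_MAX 32

struct sim
{
  int queries_left;       /* queries the client sends before closing */
  bool cache_has_answer;  /* cache lookup succeeds */
  bool cache_is_stale;    /* ...but the record has expired */
  enum event trace[TRACE_MAX];
  int ntrace;
};

static void record(struct sim *s, enum event e)
{
  assert(s->ntrace < TRACE_MAX);
  s->trace[s->ntrace++] = e;
}

/* Follows the pseudocode loop step by step; returns the event count. */
int tcp_child(struct sim *s)
{
  bool do_stale = false;

  for (;;)
    {
      bool have_answer = false;
      bool answer_stale = false;

      if (!do_stale)
        {
          if (s->queries_left == 0)   /* client closed the connection */
            {
              record(s, EV_EXIT);
              return s->ntrace;
            }
          s->queries_left--;
          record(s, EV_READ_QUERY);

          if (s->cache_has_answer)
            {
              record(s, EV_CACHE_HIT);
              have_answer = true;
              answer_stale = s->cache_is_stale;
            }
        }

      if (!have_answer)
        {
          record(s, EV_UPSTREAM);
          record(s, EV_PIPE_TO_PARENT);  /* cache locally + pipe to parent */
          s->cache_has_answer = true;
          s->cache_is_stale = false;
        }

      /* The refresh pass ends here, after the pipe write. */
      if (do_stale)
        {
          record(s, EV_EXIT);
          return s->ntrace;
        }

      record(s, EV_RETURN_ANSWER);

      if (answer_stale)
        {
          do_stale = true;
          record(s, EV_CLOSE_CLIENT);
        }
    }
}
```

With one stale hit the trace is read, hit, return, close, upstream, pipe, exit. The refresh pass never consults the client at all, so the pipe write always precedes the exit whether or not the client has already gone away.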
>
>
>
My sleeping brain had a thought. Because the TCP-handler process doesn't
go through the poll() loop, it has to call check_log_writer() explicitly
to make logging happen; the call is there in tcp_request(). It looks
possible that in the stale-cache path there is no call to
check_log_writer() between the new data getting cached (and logged) and
the process terminating. That would cause a failure to log the replies
even though they do end up in the parent process's cache, and may be
what you are seeing.
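The suspected failure mode can be modelled with a toy deferred logger. This is a sketch under stated assumptions: log_msg(), flush_log() and struct logq are hypothetical, flush_log() merely stands in for the role check_log_writer() plays in the TCP child, and none of it is dnsmasq's log.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy deferred logger: log_msg() only queues a line, and nothing
   reaches the sink until flush_log() runs. Hypothetical names; this is
   not dnsmasq's log.c. */
#define LOGQ_MAX 8

struct logq
{
  char pending[LOGQ_MAX][80];
  int npending;  /* queued but not yet written */
  int nwritten;  /* lines that actually reached the sink */
};

static void log_msg(struct logq *q, const char *msg)
{
  if (q->npending < LOGQ_MAX)   /* drop when full, like a bounded queue */
    {
      snprintf(q->pending[q->npending], sizeof q->pending[0], "%s", msg);
      q->npending++;
    }
}

/* Stands in for an explicit check_log_writer()-style flush. */
static void flush_log(struct logq *q, FILE *out)
{
  assert(out != NULL);
  for (int i = 0; i < q->npending; i++)
    fprintf(out, "%s\n", q->pending[i]);
  q->nwritten += q->npending;
  q->npending = 0;
}

/* Tail of the stale-refresh path: the fresh reply is cached, logged
   and piped to the parent, then the process exits. Anything still in
   q->pending at that point is lost. Returns the lines written. */
int refresh_then_exit(struct logq *q, FILE *out, bool flush_before_exit)
{
  log_msg(q, "reply example.com is 192.0.2.1");  /* illustrative line */
  /* ...answer also serialised into the pipe to the parent here... */
  if (flush_before_exit)
    flush_log(q, out);
  return q->nwritten;
}
```

With flush_before_exit false the reply line is queued but never written, which matches the symptom described: the data reaches the parent's cache, the log line does not.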
I shall look at this later; I have other things to do now.
Cheers,
Simon.