[Dnsmasq-discuss] Leftover helper process after main process exit on FreeBSD

Simon Kelley simon at thekelleys.org.uk
Sun Jun 22 22:32:45 UTC 2025



On 6/22/25 10:29, Simon Kelley wrote:
> 
> 
> On 22/06/2025 07:04, Roman Bogorodskiy wrote:
>> Simon Kelley wrote:
>>
>>> On 6/13/25 13:30, Roman Bogorodskiy wrote:
>>>> Hi,
>>>>
>>>> I've noticed an issue on FreeBSD which I can reproduce this way:
>>>>
>>>> # ./src/dnsmasq --interface=bridge0 --except-interface=lo0 --dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 --dhcp-script=/usr/bin/true
>>>> $ ps aux|grep dnsm
>>>> nobody     12741    0,0  0,0    14500    3128  -  I    13:43    0:00,00 ./src/dnsmasq --interface=bridge0 --except-interface=lo0 --dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 --dhcp-script=/usr/bin/true
>>>> root       12742    0,0  0,0    14500    3008  -  I    13:43    0:00,00 ./src/dnsmasq --interface=bridge0 --except-interface=lo0 --dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 --dhcp-script=/usr/bin/true
>>>> novel      12763    0,0  0,0    14192    2588  1  S+   13:44    0:00,00 grep dnsm
>>>> $
>>>> # kill 12741
>>>> $ ps aux|grep dns
>>>> root       12742    0,0  0,0    14500    3008  -  I    13:43    0:00,00 ./src/dnsmasq --interface=bridge0 --except-interface=lo0 --dhcp-range=192.168.127.2,192.168.127.254,255.255.255.0 --dhcp-script=/usr/bin/true
>>>> novel      12785    0,0  0,0    14192    2560  1  S+   13:45    0:00,00 grep dns
>>>> $
>>>>
>>>> There is a leftover process. When I attach to it using gdb I see:
>>>>
>>>> (gdb) attach 12742
>>>> Attaching to program: /usr/home/novel/code/dnsmasq/src/dnsmasq, process 12742
>>>> Reading symbols from /lib/libc.so.7...
>>>> Reading symbols from /usr/lib/debug//lib/libc.so.7.debug...
>>>> Reading symbols from /lib/libsys.so.7...
>>>> Reading symbols from /usr/lib/debug//lib/libsys.so.7.debug...
>>>> Reading symbols from /libexec/ld-elf.so.1...
>>>> Reading symbols from /usr/lib/debug//libexec/ld-elf.so.1.debug...
>>>> _read () at _read.S:4
>>>> 4       PSEUDO(read)
>>>> (gdb) bt
>>>> #0  _read () at _read.S:4
>>>> #1  0x00000000002208a1 in read_write (fd=19, packet=0x8204deea8 "\260\236\212\"\b", size=112, rw=1) at util.c:783
>>>> #2  0x000000000024e6ca in create_helper (event_fd=16, err_fd=18, uid=0, gid=0, max_fd=1877346) at helper.c:199
>>>> #3  0x000000000023b1f1 in main (argc=5, argv=0x8204df170) at dnsmasq.c:743
>>>> (gdb)
>>>>
>>>> So it looks like it's stuck reading from pipefd[0]:
>>>>
>>>> (gdb) fr 2
>>>> #2  0x000000000024e6ca in create_helper (event_fd=16, err_fd=18, uid=0, gid=0, max_fd=1877346) at helper.c:199
>>>> 199           if (!read_write(pipefd[0], (unsigned char *)&data, sizeof(data), RW_READ))
>>>> (gdb)
>>>>
>>>> It also looks like both fds are open on the helper side:
>>>>
>>>> (gdb) p pipefd
>>>> $12 = {19, 20}
>>>> (gdb)
>>>>
>>>> (gdb) call fcntl(20, 1)
>>>> $13 = 0
>>>> (gdb)
>>>>
>>>> Now if I close(20):
>>>>
>>>> (gdb) call close(20)
>>>> $14 = 0
>>>> (gdb) c
>>>> Continuing.
>>>> [Inferior 1 (process 12742) exited normally]
>>>> (gdb)
>>>>
>>>>
>>>> So the following change fixed this for me:
>>>>
>>>> --- a/src/helper.c
>>>> +++ b/src/helper.c
>>>> @@ -96,6 +96,8 @@ int create_helper(int event_fd, int err_fd, uid_t uid, gid_t gid, long max_fd)
>>>>          close(pipefd[0]); /* close reader side */
>>>>          return pipefd[1];
>>>>        }
>>>> +  else
>>>> +      close(pipefd[1]);
>>>>
>>>>      /* ignore SIGTERM and SIGINT, so that we can clean up when the main process gets hit
>>>>         and SIGALRM so that we can use sleep() */
>>>>
>>>>
>>>> FWIW, that's happening on FreeBSD 15.0-CURRENT amd64 and latest master
>>>> of dnsmasq.
>>>>
>>>> However, I'm not sure that these reproduction steps are 100% sufficient.
>>>> I wasn't able to reproduce that on another FreeBSD 14.2-RELEASE amd64
>>>> system with Dnsmasq version 2.91.
>>>
>>>
>>> I'm not sure what the bug is, but I'm very suspicious of commit
>>> 8a5fe8ce6bb6c2bd81f237a0f4a2583722ffbd1c, even though it's in the 2.91
>>> codebase.
>>>
>>> The write side of the pipe in the helper process is supposed to be closed by
>>> the call
>>>
>>> close_fds(max_fd, pipefd[0], event_fd, err_fd);
>>>
>>> at line 134 of src/helper.c
>>>
>>> That call should close() ALL open fds except STDIN, STDOUT and STDERR, and
>>> the three fds passed in as arguments. This preserves the reader-side, as
>>> pipefd[0] is one of the arguments, but the write side should be closed. I
>>> checked in Linux (which doesn't exhibit the bug) and that's exactly what
>>> does happen.
>>>
>>> If you look at the code for close_fds() there are two code paths. A dumb one
>>> which calls close() for every possible fd between zero and the system max
>>> except for the six which are to be spared. Then there's a smart path which
>>> reads a directory in /proc to find out which fds are actually open, and only
>>> closes those.
>>>
>>> The smart path saves a lot of work on servers which are configured to
>>> support enormous numbers of open files per process.
>>>
>>> The smart path used to only exist on Linux, but was introduced on BSD during
>>> the 2.91 development at the end of 2024. My suspicion is that that is the
>>> cause of the regression.
>>>
>>> The smart path is the same for Linux and BSD except that the directory full of
>>> links to open files is at /proc/self/fd on Linux and /dev/fd on *BSD. If
>>> these directories don't exist then the code falls back to the dumb code
>>> path.
>>>
>>> So, can you try and determine why close_fds() is not closing the write-side
>>> of the pipe in the helper process, since that should already be doing what
>>> your patch does?
>>
>> Hi Simon,
>>
>> Thanks for the hint, the problem is indeed related to the fdescfs.
>>
>> I noticed that dnsmasq tries to look up its open fds in /dev/fd, that is,
>> it follows the smart path. But apparently, it always gets only fd 0, 1, 2, so
>> it wasn't really closing anything.
>>
>> Then, if I just do "ls /dev/fd", it also returns only those 3 fds. Then
>> I checked whether I have fdescfs mounted at all, and it turned out that
>> I don't.
>>
>> Once I mount it, things start working as expected. Once I unmount it,
>> things get back to the previous behaviour. Also, in the fdescfs(5)
>> manpage I see:
>>
>>         Note: /dev/fd/0, /dev/fd/1 and /dev/fd/2 files are created by default
>>         when devfs alone is mounted.  fdescfs creates entries for all file
>>         descriptors opened by the process.
>>
>> Looks like that describes my configuration, as I have devfs mounted,
>> fdescfs not mounted, and see only fd 0, 1, 2.
>>
>> So it looks a bit tricky: the existence of /dev/fd doesn't necessarily
>> mean fdescfs is mounted, so the smart path may or may not work properly.
>>
> 
> Thanks for diagnosing this. Sigh. A classic case of "no good deed goes
> unpunished".
> 
> The best solution I can come up with is to check that /dev/fd is a
> mountpoint, by checking that st_dev from stat() of /dev/fd is different
> from st_dev from stat() of /dev. If fdescfs isn't mounted, we already have
> a fallback.
> 

I fired up my FreeBSD VM and implemented this. It seems to do the job.

https://thekelleys.org.uk/gitweb/?p=dnsmasq.git;a=commit;h=15841f187d2b208a6113d4e2d479d3af4275bb1c
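
For anyone following along, the idea can be sketched roughly as below. This is
only a minimal illustration of the st_dev comparison described above, not the
code from the commit: the function name fd_dir_is_mounted() and its two
parameters are made up for this example, and treating a failed stat() as "not
mounted" is an assumption chosen so that the caller drops back to the dumb path.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper, not taken from the dnsmasq source.
   Returns 1 if dir lives on a different device from parent, which is
   what we expect when a filesystem (e.g. fdescfs) is mounted on dir.
   Returns 0 if both are on the same device or if either stat() fails,
   so the caller can fall back to closing every possible fd. */
int fd_dir_is_mounted(const char *dir, const char *parent)
{
  struct stat dir_st, parent_st;

  if (stat(dir, &dir_st) == -1 || stat(parent, &parent_st) == -1)
    return 0;

  return dir_st.st_dev != parent_st.st_dev;
}
```

On *BSD the call would be something like fd_dir_is_mounted("/dev/fd", "/dev"),
and the directory scan would only be trusted when it returns 1.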

Simon.



