[Dnsmasq-discuss] No DHCPOffer back but DHCPDiscover is being received by UML machine

Simon Kelley simon at thekelleys.org.uk
Fri Apr 24 14:29:18 BST 2020


Having looked at the docs for UML, I doubt that this is a UML problem;
it looks like a pure kernel problem (in this case, in the kernel running
under UML).

A regression test on those three kernels would therefore be useful.

Googling for combinations of recvmsg MSG_PEEK regression UDP MSG_TRUNC
shows a few candidates over the last few years, but no obvious smoking gun.

Assuming we've diagnosed the kernel misbehaviour correctly, the code in
dnsmasq could be changed to work around the problem at the expense of a
small probability of dropping a packet, which is not a problem in this case.

I'll look at doing that.
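
As a rough sketch of the kind of change meant here, and not the actual
dnsmasq code: read each datagram with a single recvmsg() call that never
uses MSG_PEEK. The names recv_once and struct pkt_buf are made up for
illustration. It assumes Linux, where MSG_TRUNC on a datagram socket
reports the real packet length. A packet larger than the current buffer
is lost, and the buffer is grown so that a retransmission fits.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical growable receive buffer; not a dnsmasq structure. */
struct pkt_buf {
  char *data;
  size_t size;
};

/* Read one datagram with a single recvmsg() call and no MSG_PEEK.
   Returns the real datagram length, or -1 on error (including EAGAIN
   on a non-blocking socket). *dropped is set to 1 when the datagram
   did not fit and was therefore lost, otherwise to 0. */
ssize_t recv_once(int fd, struct pkt_buf *buf, int *dropped)
{
  struct iovec iov = { .iov_base = buf->data, .iov_len = buf->size };
  struct msghdr msg = { 0 };
  ssize_t len;

  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  *dropped = 0;

  /* MSG_TRUNC asks for the real length even when the buffer is too
     small. The datagram is removed from the queue either way. */
  len = recvmsg(fd, &msg, MSG_TRUNC);
  if (len < 0)
    return -1;

  if ((size_t)len > buf->size) {
    /* Only a truncated copy arrived: count the packet as dropped and
       grow the buffer so the next packet of this size fits. */
    char *bigger = realloc(buf->data, (size_t)len);
    if (bigger) {
      buf->data = bigger;
      buf->size = (size_t)len;
    }
    *dropped = 1;
  }
  return len;
}
```

A DHCP client retransmits DHCPDISCOVER, so losing the first oversized
packet only costs a retry.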

Simon.


On 23/04/2020 21:05, Josh H wrote:
> Hi there,
> 
> I'm not sure of a way of testing it with a real network device, but I'm
> happy to attempt to build an older UML kernel and test it from there. As
> I said in my original email, the last fully known working build was way
> back in kernel 3.2 and a lot has changed since then, so it could very
> well be a kernel issue and due to the edge use case, no one has ever
> really come across it. Is there a kernel version you'd like me to try
> out? Debian has a standard usermodelinux package which contains prebuilt
> UML images with kernel versions of 4.9, 4.19 or 5.5 if they'd be handy?
> https://tracker.debian.org/pkg/user-mode-linux.
> 
> Thanks for the support,
> Josh
> 
> On Thu, 23 Apr 2020 at 20:30, Simon Kelley <simon at thekelleys.org.uk> wrote:
> 
>     Ok, so Josh ran the strace and sent me the results as requested.
> 
>     The interesting bit is here.
> 
>     recvmsg(4, {msg_name={sa_family=AF_INET, sin_port=htons(68),
>     sin_addr=inet_addr("0.0.0.0")}, msg_namelen=16,
>     msg_iov=[{iov_base="\1\1\6\0\310\261\311+\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\366\226}H"...,
>     iov_len=548}], msg_iovlen=1, msg_control=[{cmsg_len=24,
>     cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO,
>     cmsg_data={ipi_ifindex=if_nametoindex("eth0"),
>     ipi_spec_dst=inet_addr("192.168.1.1"),
>     ipi_addr=inet_addr("255.255.255.255")}}], msg_controllen=24,
>     msg_flags=0}, MSG_PEEK|MSG_TRUNC) = 300
>     recvmsg(4, {msg_namelen=16}, 0)         = -1 EAGAIN (Resource
>     temporarily unavailable)
> 
> 
> 
>     The first call to recvmsg has the MSG_PEEK and MSG_TRUNC flags set.
>     MSG_TRUNC causes the result to be the actual length of the received
>     packet, even if it's longer than the supplied buffer (548), and MSG_PEEK is
>     defined as:
> 
> 
>      MSG_PEEK
>            This  flag  causes the receive operation to return data from the
>            beginning of the receive queue without removing that  data  from
>            the queue.  Thus, a subsequent receive call will return the same
>            data.
> 
>     So this allows the buffer to be expanded if necessary and then recvmsg
>     gets called again when the buffer is big enough, to actually get the
>     data and remove it from the queue. In this case the packet is 300 bytes
>     long and the buffer is already 548 bytes, so no expansion is needed, we
>     just do the call again, without the MSG_PEEK|MSG_TRUNC flags. That's the
>     second call to recvmsg, which returns EAGAIN - the socket is
>     non-blocking, and this return says there's no packet queued. It looks
>     like the kernel is ignoring the MSG_PEEK flag, and dequeueing the data
>     on the first call.
> 
>     I think this is a kernel bug.
> 
>     Josh, does this work with an older kernel or with a real network device,
>     rather than the UML virtual device? It would be good to work out where
>     the regression happened.
> 
> 
>     Simon.
> 
>     On 16/04/2020 15:40, Josh H wrote:
>     >
>     >     First, answer a simple question the answer to which I may have missed.
>     >     Is dnsmasq logging the receipt of DHCPDISCOVER messages? Can we see the
>     >     whole log showing that?
>     >
>     >
>     > Based on the config I provided in the initial message, I have the log
>     > file writing to /var/log/dnsmasq.log. This is the whole content of that
>     > file:
>     >
>     > root at dns:~# cat /var/log/dnsmasq.log
>     > Apr 16 15:36:50 dnsmasq[1695]: started, version 2.80 DNS disabled
>     > Apr 16 15:36:50 dnsmasq[1695]: compile time options: IPv6 GNU-getopt DBus i18n IDN DHCP DHCPv6 no-Lua TFTP conntrack ipset auth DNSSEC loop-detect inotify dumpfile
>     > Apr 16 15:36:50 dnsmasq-dhcp[1695]: DHCP, IP range 192.168.1.3 -- 192.168.1.8, lease time 12h
>     >
>     > No mention of the DHCPDiscover being acknowledged.
>     >
>     >     The next stage is to run dnsmasq under strace (check back here if you
>     >     need instructions on that) and see what system calls it's making.
>     >
>     >
>     > What command would I need to run for this? And what service is best to
>     > upload the strace result, pastebin?
>     >
>     > Thanks,
>     > Josh 
>     >
>     > On Thu, 16 Apr 2020 at 12:49, Simon Kelley <simon at thekelleys.org.uk> wrote:
>     >
>     >
>     >
>     >     On 15/04/2020 19:27, Josh H wrote:
>     >
>     >     > It's difficult for me to share the config outright as I'm using a
>     >     > modified version of netkit that I've updated to a much newer kernel
>     >     > - http://netkit-ng.github.io/. The netkit version that is available on
>     >     > that link is the one that worked with dnsmasq just fine, and that
>     >     > version was 2.62 and kernel 3.2. However I've updated it and am running
>     >     > 2.80 and kernel 5.6.
>     >     >
>     >     > Anything else I can provide you with that might help? It's a very
>     >     > unique setup so I appreciate it's probably not the easiest thing to
>     >     > try and debug.
>     >     >
>     >
>     >     First, answer a simple question the answer to which I may have missed.
>     >     Is dnsmasq logging the receipt of DHCPDISCOVER messages? Can we see the
>     >     whole log showing that?
>     >
>     >     The next stage is to run dnsmasq under strace (check back here if you
>     >     need instructions on that) and see what system calls it's making.
>     >
>     >
>     >     Simon.
>     >
>     >
>     >     _______________________________________________
>     >     Dnsmasq-discuss mailing list
>     >     Dnsmasq-discuss at lists.thekelleys.org.uk
>     >     http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss
>     >
> 
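
The peek-then-read sequence described in the quoted message of
23 Apr can be sketched roughly as follows. This is an illustration
only, not the dnsmasq source; the function name recv_peek_then_read
and its parameters are made up. It assumes Linux, where MSG_TRUNC on a
datagram socket returns the real packet length.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Peek at the next datagram to learn its length, grow the buffer if
   it is too small, then read the datagram for real.
   Returns the datagram length, or -1 on error. */
ssize_t recv_peek_then_read(int fd, char **bufp, size_t *sizep)
{
  struct iovec iov;
  struct msghdr msg;
  ssize_t len;

  memset(&msg, 0, sizeof(msg));
  iov.iov_base = *bufp;
  iov.iov_len = *sizep;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  /* First call: MSG_PEEK should leave the datagram queued, and
     MSG_TRUNC reports its real length even if the buffer is short. */
  len = recvmsg(fd, &msg, MSG_PEEK | MSG_TRUNC);
  if (len < 0)
    return -1;

  if ((size_t)len > *sizep) {
    char *bigger = realloc(*bufp, (size_t)len);
    if (!bigger)
      return -1;
    *bufp = bigger;
    *sizep = (size_t)len;
  }

  /* The kernel may have modified msg, so set it up again. */
  memset(&msg, 0, sizeof(msg));
  iov.iov_base = *bufp;
  iov.iov_len = *sizep;
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;

  /* Second call: no flags, so the datagram is dequeued. In the strace
     above this is the call that returned EAGAIN. */
  return recvmsg(fd, &msg, 0);
}
```

On a kernel that honours MSG_PEEK the second call returns the same
packet the first call saw. With a non-blocking socket and the behaviour
in the trace, it returns -1 with errno set to EAGAIN instead.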


