[Dnsmasq-discuss] Weird TFTP Failure

Craig Perry craigp84 at gmail.com
Wed Jul 14 22:06:05 UTC 2021


Hello,

I'm battling a weird PXE boot issue: I can PXE boot Ubuntu 20.04 just fine,
but not 21.04. When booting 20.04 (which works), the behaviour I see is:

1. Transfers the vmlinuz kernel image via TFTP - OK
2. Transfers the initrd image via TFTP - OK
3. The kernel boots and mounts the initial ramdisk. That's the end of the
TFTP stuff; everything thereafter happens over HTTP and it's fine

However, when booting 21.04, the behaviour I see is:

1. Transfers the vmlinuz kernel image via TFTP - OK
2. Transfers the initrd image via TFTP - FAILS, always near the end of the
file transfer

The 20.04 initrd is 80 MB in size; the 21.04 one is 99 MB.
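One quick way to compare the two images is to count TFTP blocks. This is a sketch under stated assumptions: that "length 1412" in the capture below is the UDP payload, i.e. the 4-byte TFTP DATA header plus 1408 bytes of file data, and that "80mb"/"99mb" mean roughly 80 and 99 MiB (decimal megabytes give the same yes/no answer). The block number in TFTP DATA and ACK packets is a 16-bit field (RFC 1350), and RFC 1350 does not say what happens after block 65535, so this is only a pointer for further digging, not a diagnosis. The function names are hypothetical helpers, not dnsmasq code.

```python
TFTP_HEADER = 4      # opcode (2 bytes) + block number (2 bytes)
MAX_BLOCK = 0xFFFF   # block numbers are a 16-bit field (RFC 1350)


def blocks_needed(file_size: int, blksize: int) -> int:
    """Number of DATA blocks for a file, numbered from 1.

    A transfer ends with a block shorter than blksize, so a file that is an
    exact multiple of blksize needs one extra, empty block.
    """
    return file_size // blksize + 1


def max_size_without_wrap(blksize: int) -> int:
    """Largest file whose last block number still fits in 16 bits."""
    return MAX_BLOCK * blksize - 1


def wraps(file_size: int, blksize: int) -> bool:
    """True if the block counter would have to go past 65535."""
    return blocks_needed(file_size, blksize) > MAX_BLOCK


if __name__ == "__main__":
    MiB = 2**20
    # Assumption: 1412-byte UDP payload = 4-byte header + 1408 data bytes.
    for label, size in [("20.04 initrd (~80 MiB)", 80 * MiB),
                        ("21.04 initrd (~99 MiB)", 99 * MiB)]:
        print(f"{label}: {blocks_needed(size, 1408)} blocks, "
              f"wraps={wraps(size, 1408)}")
    print("limit at blksize 1408:", max_size_without_wrap(1408), "bytes")
    print("limit at blksize 512: ", max_size_without_wrap(512), "bytes")
```

Under these assumptions the 80 MiB image stays below the roughly 92.3 MB limit for 1408-byte blocks and the 99 MiB image does not; with the 512-byte default used when no block size is negotiated, the limit drops to about 33.55 MB.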

I chased some logs without much luck; they just showed a failure to
transfer the file. I tried disabling dnsmasq's block size negotiation, which
halved the throughput of the transfers, but it still failed near the end of
the initrd transfer.

So then I decided to run tcpdump on the server that runs dnsmasq, and I
can't fully wrap my head around what's happening:

Initially, I see lots of:

 20:08:48.751726 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
 20:08:48.753260 IP pxe-client-machine.1173 > server.62770: UDP, length 4
... repeated ...
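Because the transfer runs between ephemeral ports (62770 and 1173) rather than port 69, tcpdump most likely prints it as plain UDP instead of decoding it as TFTP. Per RFC 1350, the 1412-byte packets would be DATA and the 4-byte ones ACKs, with the block number in the first four bytes of each payload; `tcpdump -X` prints payload bytes in hex, so those block numbers can be read off. The sketch below decodes such a header; the function name is a hypothetical helper and the example payloads are synthetic, not taken from this capture.

```python
import struct

# Opcodes from RFC 1350, plus OACK from the option extension (RFC 2347).
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def decode_data_or_ack(payload: bytes) -> tuple[str, int, int]:
    """Return (opcode name, block number, data bytes) for a DATA or ACK payload.

    Both packet types start with a 2-byte opcode and a 2-byte block number in
    network byte order; DATA then carries file data, ACK carries nothing more.
    """
    if len(payload) < 4:
        raise ValueError("too short for a TFTP DATA/ACK header")
    opcode, block = struct.unpack("!HH", payload[:4])
    name = OPCODES.get(opcode, f"opcode {opcode}")
    if name not in ("DATA", "ACK"):
        raise ValueError(f"not a DATA or ACK packet: {name}")
    return name, block, len(payload) - 4


if __name__ == "__main__":
    # Synthetic payloads shaped like the ones in the capture.
    data = struct.pack("!HH", 3, 42) + bytes(1408)  # "length 1412"
    ack = struct.pack("!HH", 4, 42)                 # "length 4"
    print(decode_data_or_ack(data))  # ('DATA', 42, 1408)
    print(decode_data_or_ack(ack))   # ('ACK', 42, 0)
```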

But then the behaviour changes near the end of the transfer:

20:08:48.762027 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:49.580276 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:50.083707 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:50.405353 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:51.227949 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:52.024222 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:52.051790 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:52.875666 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:53.699587 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:54.523490 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:55.016630 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:55.348003 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:56.171466 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:56.995009 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:57.819022 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:58.071977 IP server.62770 > pxe-client-machine.1173: UDP, length 1412

The pattern of a 1412-byte transfer (this is below the MTU I have configured
on this route) immediately followed by a 4-byte ack / reply breaks down, and
I see duplicated 4-byte replies, which maybe suggests an unreliable network?
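The timing of the excerpt above can be summarised mechanically. This is a minimal sketch that assumes the capture lines are in exactly the tcpdump format shown (the hostname spelling is normalised to "pxe-client-machine"); it counts how many client packets arrive after each server packet and measures the gaps between sends. The function names are hypothetical.

```python
import re

# The end-of-transfer excerpt from the capture above.
CAPTURE = """\
20:08:48.762027 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:49.580276 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:50.083707 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:50.405353 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:51.227949 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:52.024222 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:52.051790 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:52.875666 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:53.699587 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:54.523490 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:55.016630 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
20:08:55.348003 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:56.171466 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:56.995009 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:57.819022 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:08:58.071977 IP server.62770 > pxe-client-machine.1173: UDP, length 1412
"""

LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}\.\d+) IP (\S+) > (\S+): UDP, length (\d+)$")


def parse(capture: str):
    """Yield (seconds since midnight, source, UDP length) per matching line."""
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            h, mi, s, src, _dst, length = m.groups()
            yield int(h) * 3600 + int(mi) * 60 + float(s), src, int(length)


def replies_per_send(capture: str, server: str = "server.") -> list[int]:
    """Count client packets seen after each server packet, before the next."""
    counts: list[int] = []
    for _t, src, _length in parse(capture):
        if src.startswith(server):
            counts.append(0)
        elif counts:
            counts[-1] += 1
    return counts


def gaps(capture: str, prefix: str) -> list[float]:
    """Seconds between consecutive packets whose source starts with prefix."""
    times = [t for t, src, _ in parse(capture) if src.startswith(prefix)]
    return [b - a for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print("client replies per server packet:", replies_per_send(CAPTURE))
    print("server send gaps (s):", [round(g, 2) for g in gaps(CAPTURE, "server.")])
    print("client reply gaps (s):", [round(g, 2) for g in gaps(CAPTURE, "pxe-client")])
```

On this excerpt the reply counts are 1, 2, 4, 4 (and 0 after the last send), the server's sends are spaced roughly 1.3 s to 3 s apart, and the client's 4-byte packets arrive about every 0.82 s. That only describes the timing; the block numbers inside the payloads would be needed to say which block each reply refers to.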

I really doubt the unreliable network idea, though. This is always exactly
reproducible, and the thing that makes me totally discount it is that the
tcpdump is being captured on the server: I am seeing those retried 4-byte
replies arrive at the server, at least in tcpdump, so dnsmasq should be
getting them, I think.

Anyway, this culminates in the PXE firmware in the client NIC flaking out
and rebooting the machine, but just before that, I see this in the
tcpdump:

20:09:18.416590 IP server > pxe-client-machine: ICMP server udp port 62770 unreachable, length 40
20:09:23.359873 IP pxe-client-machine.1173 > server.62770: UDP, length 4
20:09:23.359897 IP server > pxe-client-machine: ICMP server udp port 62770 unreachable, length 40
20:09:29.126861 IP pxe-client-machine.1173 > server.62770: UDP, length 4


The server is telling the client that the ephemeral port dnsmasq used to
send the file is unreachable, i.e. nothing on the server is listening on
that port any more.

My current hypothesis is that there's some kind of failure in dnsmasq
related to the length of the file. The very last packet sent by dnsmasq is
still 1412 bytes long, but the initrd is not a multiple of 1412 bytes, so I
think the final packet should be smaller?
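Per RFC 1350, every DATA packet carries a full block except the last, which is shorter and signals the end of the transfer. Assuming the 1412-byte packets are a 4-byte header plus 1408 bytes of data, the divisor that matters is 1408 rather than 1412, and a full 1412-byte packet can never be the final one; if it is the last packet seen, the transfer never reached its end. A minimal sketch of that arithmetic, with hypothetical function names and a made-up file size:

```python
TFTP_HEADER = 4  # 2-byte opcode + 2-byte block number


def final_packet_length(file_size: int, blksize: int) -> int:
    """UDP payload length of the last DATA packet of a TFTP transfer.

    The last block carries file_size % blksize bytes; if that is zero the
    transfer still ends with an empty DATA packet (header only).
    """
    return TFTP_HEADER + file_size % blksize


def is_final(udp_length: int, blksize: int) -> bool:
    """A DATA packet ends the transfer only if it is shorter than a full one."""
    return udp_length < TFTP_HEADER + blksize


if __name__ == "__main__":
    # Hypothetical file size: 100 full 1408-byte blocks plus 500 bytes.
    size = 100 * 1408 + 500
    print(final_packet_length(size, 1408))  # 504
    print(is_final(1412, 1408))             # False
```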

The server host is a FreeBSD 13.0 machine running the dnsmasq 2.85 binary
from the FreeBSD ports collection.

> Dnsmasq version 2.85  Copyright (c) 2000-2021 Simon Kelley
> Compile time options: IPv6 GNU-getopt no-DBus no-UBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect no-inotify dumpfile

Both client and server are Dell T3610s. My config is here:
https://github.com/craigjperry2/home-network/blob/main/roles/home-server/files/dnsmasq.conf

I'm not really sure of the best way to proceed with debugging this. Any
hints, tips or suggestions?

All the best,

Craig