[Dnsmasq-discuss] TFTP Boot update "for those who find this problem in the future"

Simon Kelley simon at thekelleys.org.uk
Tue Aug 25 15:07:32 BST 2009

richardvoigt at gmail.com wrote:
> On Tue, Aug 25, 2009 at 4:31 AM, Simon Kelley<simon at thekelleys.org.uk> wrote:
>> richardvoigt at gmail.com wrote:
>>> I can't think of a single circumstance where a manufacturer-provided
>>> boot PROM would have more appropriate network-specific settings than
>>> the TFTP server configuration.
>>> Maybe tftp-no-blocksize should be set by default (with a
>>> tftp-honor-blocksize to negate it).
>>> But I don't use BOOTP remote booting, so Simon probably has good
>>> reasons for doing things the way they are.
>> Setting tftp-no-blocksize forces 512-byte blocks and makes the already-slow
>> TFTP transfer three times slower. Since most netbooting happens over a local
>> net which is a physical ethernet with well-known MTU, it makes sense for the
>> client to request a blocksize suitable for that media.
> Is that 512 adjustable?  b/c the local dnsmasq admin can surely make a
> better choice than the PROM developer. 

Sort of. If the client doesn't invoke the blocksize extension, then it
has to be 512. If the client says "I want blocksize x" then the server
can reply "you can have blocksize y" where y <= x.
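
As a rough sketch of that rule (not dnsmasq's actual code), the choice
could look like the following. The function name and the server_limit
parameter are hypothetical; the 512 default and the 8..65464 range come
from the TFTP RFCs.

```python
DEFAULT_BLKSIZE = 512                 # used when the client sends no blksize option
MIN_BLKSIZE, MAX_BLKSIZE = 8, 65464   # range the blocksize option allows


def negotiate_blksize(requested, server_limit):
    """Pick the block size the server would offer.

    requested is None when the client did not invoke the extension.
    server_limit is a hypothetical admin-chosen ceiling, assumed to lie
    inside the allowed range.
    """
    if requested is None:
        return DEFAULT_BLKSIZE
    if not MIN_BLKSIZE <= requested <= MAX_BLKSIZE:
        # Assumption: an out-of-range request is treated as if no option
        # had been sent at all.
        return DEFAULT_BLKSIZE
    # The server may only go down from what the client asked for.
    return min(requested, server_limit)
```

So a client asking for 8192 against a limit of 1468 would be offered
1468, and a client that never mentions blocksize stays at 512.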

> Plus I think most tcp/ip
> stacks automatically determine path MTU, don't know if dnsmasq could
> retrieve the value estimated for some other local host on the same
> interface as a reasonable default in the absence of configuration.
> There's probably no portable way to do that though.

Path-MTU discovery is turned off for the UDP socket used for TFTP,
because the presence of the Don't Fragment bit confuses some PXE ROMs.
Sadly it looks like receiving fragmented packets confuses other PXE ROMs!

> Also, a quick look at the protocol indicates that "only one packet may
> be in-flight at a time" but that data packets and acknowledgements all
> carry sequence numbers, I'm not sure what exactly about the format
> requires stop-and-wait.

It's specified in the RFC: each block has to be acknowledged before the
next one is sent. The T in TFTP stands for "trivial".
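
To put rough numbers on what lock-step costs, here is a small sketch
that counts DATA packets, each of which needs its own round trip. The
4 MiB image size and the 1468-byte blocksize are illustrative
assumptions, not measurements.

```python
def data_packets(file_size, blksize):
    """Number of DATA packets needed to send file_size bytes.

    A transfer ends with a packet shorter than blksize, so a file that
    is an exact multiple of blksize needs one extra, empty packet.
    """
    return file_size // blksize + 1


image = 4 * 1024 * 1024            # hypothetical 4 MiB boot image
small = data_packets(image, 512)   # blocksize extension not used
large = data_packets(image, 1468)  # assumed blocksize for a 1500-byte MTU
print(small, large, small / large)
```

With one packet in flight at a time, the ratio of packet counts is a
fair first estimate of the ratio of transfer times.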

>> It's not clear to me why the MTU on Philippe's network is smaller, but I
>> think a small MTU is a fairly rare occurrence. Even when it does happen, it
>> shouldn't be a show stopper: that takes badly broken client firmware that
>> has clearly never had any code-paths other than the most common ones tested.
> Passing through a switch which adds VLAN marking often causes
> fragmentation of maximally sized payloads.  Wireless hops could change
> MSS as well.

A possible fix for some (but not all) situations is to check the MTU on 
the interface handling the TFTP traffic and scale back blocksize 
requests to match that.
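
A minimal sketch of that idea, assuming IPv4 without IP options and
that the interface MTU has already been obtained somehow. The function
names are hypothetical and this is not what dnsmasq does today.

```python
IPV4_HEADER = 20       # bytes, assuming no IP options
UDP_HEADER = 8         # bytes
TFTP_DATA_HEADER = 4   # 2-byte opcode + 2-byte block number


def max_unfragmented_blksize(mtu):
    """Largest TFTP payload that fits in one packet on this interface."""
    return mtu - IPV4_HEADER - UDP_HEADER - TFTP_DATA_HEADER


def cap_blksize(requested, mtu):
    """Scale a client's blocksize request back to what the MTU allows."""
    return min(requested, max_unfragmented_blksize(mtu))
```

On a standard 1500-byte ethernet MTU this leaves 1468 bytes per block.
It only helps when the small MTU is on the server's own interface; a
smaller MTU further along the path would still cause fragmentation.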

> But maybe the best solution would just be to mention tftp-no-blocksize
> in the error message as a possible fix.

Easier said than done: the sequence we saw with the NVIDIA PXE ROM was

PXE asks for data
{
   PXE gets data (fragmented) and ignores it
   server times out and retries
} repeat
PXE times out and sends a completely nonsensical ACK packet to the wrong port
dnsmasq generates "unsupported request" because it doesn't understand
the packet.

The extent of broken-ness in netboot firmware is astonishing.

