[Dnsmasq-discuss] pxe-service entries in dnsmasq conf seem to fail non-proxy EFI boot

Petr Menšík pemensik at redhat.com
Wed Sep 29 14:59:46 UTC 2021

It is hard to tell which result belongs to which configuration (1., 2.,
3.). It is unclear to me what the computer printed for each variant.

1. seems to have the wrong pcap file, or it does not use the
configuration attached in the linked archive. It seems to offer the menu
items from the 2. archive, the one with the custom pxe-service entries.

Option 43 Suboption: (9) PXE boot menu
    Length: 41
    boot menu:
        Type: Unknown (32768)
        Length: 21
        Description: PXELINUX (X86-64_EFI)
        Type: Unknown (32769)
        Length: 14
        Description: PXELINUX (EFI)

The above is not present in the config file presented for 1., but it is
in the one for 2. Are you sure you killed dnsmasq and started it again?
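
For reference, the boot menu above can be checked by hand. Here is a
minimal Python sketch, assuming each menu entry is a 2-byte big-endian
type, a 1-byte description length and then the description, which is how
wireshark presents it above. The function names are mine and purely
illustrative, and the payload is rebuilt from the two descriptions in
the dump, not copied from the pcap.

```python
import struct


def build_entry(boot_type: int, description: str) -> bytes:
    """Encode one boot menu entry: type (2 bytes), length (1 byte), text."""
    desc = description.encode("ascii")
    return struct.pack("!HB", boot_type, len(desc)) + desc


def parse_pxe_boot_menu(payload: bytes) -> list:
    """Split a suboption 9 payload into (type, description) tuples."""
    entries = []
    pos = 0
    while pos < len(payload):
        if pos + 3 > len(payload):
            raise ValueError("truncated entry header")
        boot_type, desc_len = struct.unpack_from("!HB", payload, pos)
        pos += 3
        desc = payload[pos:pos + desc_len]
        if len(desc) != desc_len:
            raise ValueError("truncated description")
        entries.append((boot_type, desc.decode("ascii")))
        pos += desc_len
    return entries


# Rebuild the menu shown in the wireshark dump (types 32768 and 32769).
payload = build_entry(0x8000, "PXELINUX (X86-64_EFI)")
payload += build_entry(0x8001, "PXELINUX (EFI)")
menu = parse_pxe_boot_menu(payload)

print("suboption length:", len(payload))
for boot_type, desc in menu:
    print(f"type={boot_type} (0x{boot_type:04x}) length={len(desc)} {desc}")
```

The rebuilt payload is 41 bytes, the same as the Length: 41 in the dump,
so the two entries account for the whole suboption.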

I think the difference might be in how the file is served when it is
chosen via the pxe-service boot menu. I have noticed there are two ways
to specify the boot file in DHCP for IPv4. One is the file field in the
fixed header, and the first attempt, chosen from the menu, uses that.
The other is option 67. The pxe-service options make the client send a
direct query to the DHCP server, marked proxyDHCP in wireshark. This
proxy ACK is followed by TFTP.

I used this filter in wireshark: "dhcp or (!tftp.destination_file && tftp)"

However, the DHCP reply that follows offers the boot file path ONLY in
the option 67 value. The boot file field in the fixed header is all
zeros. It seems to me this is the part the snponly.efi firmware does not
understand. It does not try to use the path from the option, but may
insist on the file field only. Since option 52 (overload) is not in the
packet, I guess dnsmasq should have used mess->file for the path and not
option 67. But the rules at rfc2131.c:2476 are simple: if the client has
requested option 67, it should be able to handle option 67. I guess it
is a bug in snponly.efi. Either it should not include option 67 among
the requested options, or it should actually handle the option. Dnsmasq
would offer the boot path in both cases.
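
To check a capture for this, it helps to look at both places separately.
Here is a minimal Python sketch, assuming the standard BOOTP/DHCP layout
(file field at bytes 108-235, magic cookie at 236, options from 240).
The function names are mine, and the two sample packets are synthetic,
built only to show the two cases, not taken from the pcaps.

```python
FILE_OFFSET = 108      # start of the 128-byte "file" field
COOKIE_OFFSET = 236    # end of the fixed header, start of the magic cookie
OPTIONS_OFFSET = 240   # first option byte
MAGIC_COOKIE = bytes([99, 130, 83, 99])
OPT_PAD, OPT_OVERLOAD, OPT_BOOTFILE, OPT_END = 0, 52, 67, 255


def parse_options(data: bytes) -> dict:
    """Walk the code/length/value option list until the end marker."""
    options = {}
    pos = 0
    while pos < len(data):
        code = data[pos]
        if code == OPT_END:
            break
        if code == OPT_PAD:
            pos += 1
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated option header")
        length = data[pos + 1]
        options[code] = data[pos + 2:pos + 2 + length]
        pos += 2 + length
    return options


def boot_file_sources(packet: bytes) -> dict:
    """Report the boot file as seen in the header and in option 67."""
    if packet[COOKIE_OFFSET:OPTIONS_OFFSET] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    options = parse_options(packet[OPTIONS_OFFSET:])
    overload = options.get(OPT_OVERLOAD, b"\x00")[0]
    file_field = packet[FILE_OFFSET:COOKIE_OFFSET]
    # With overload bit 1 set, the file field carries options, not a name.
    if overload & 1:
        header_file = ""
    else:
        header_file = file_field.split(b"\x00", 1)[0].decode("ascii")
    option_file = options.get(OPT_BOOTFILE, b"").rstrip(b"\x00").decode("ascii")
    return {"header_file": header_file, "option_67": option_file,
            "overload": overload}


def make_packet(header_file: bytes = b"", option_file: bytes = b"") -> bytes:
    """Build a synthetic reply; all other header fields are left zeroed."""
    header = bytearray(COOKIE_OFFSET)
    header[FILE_OFFSET:FILE_OFFSET + len(header_file)] = header_file
    options = bytes([53, 1, 2])  # option 53, message type DHCPOFFER
    if option_file:
        options += bytes([OPT_BOOTFILE, len(option_file)]) + option_file
    return bytes(header) + MAGIC_COOKIE + options + bytes([OPT_END])


option_only = boot_file_sources(make_packet(option_file=b"ltsp/snponly.efi"))
header_only = boot_file_sources(make_packet(header_file=b"ltsp/snponly.efi"))
print("path only in option 67:", option_only)
print("path only in header   :", header_only)
```

The first sample is the shape I describe above: an empty file field, no
overload option, and the path only in option 67.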

Interestingly, dnsmasq is inconsistent with itself. It behaves a bit
differently in PXE proxy mode, where the file field in the header is
always used. In normal mode, unless --dhcp-no-override is used, the
option is used if the client requested it.

Can you please check whether the dhcp-no-override option fixes your
issue? I think dnsmasq should behave the same way in both situations.
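
To make the inconsistency concrete, here is a toy Python model of the
placement rule as I described it in this message. It is only my reading
written down as a function, not dnsmasq source code, and the function
name and parameters are made up for illustration.

```python
OPT_BOOTFILE = 67


def place_boot_file(requested_options, proxy_mode=False, no_override=False):
    """Return where the boot file name would be placed in the reply.

    Assumed rule, taken only from the description in this thread:
    - PXE proxy mode always uses the file field in the fixed header.
    - dhcp-no-override forces the file field in the fixed header.
    - Otherwise option 67 is used when the client requested it.
    """
    if proxy_mode or no_override:
        return "header"
    if OPT_BOOTFILE in requested_options:
        return "option 67"
    return "header"


# A client that lists option 67 among its requested options.
requested = [1, 3, 6, 66, 67]
print("normal mode       :", place_boot_file(requested))
print("dhcp-no-override  :", place_boot_file(requested, no_override=True))
print("PXE proxy mode    :", place_boot_file(requested, proxy_mode=True))
```

Under this model only the plain normal-mode case leaves the header field
empty, which is the case the clients here seem to have trouble with.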

I have attached a patch which sets the boot file for pxe-service the
same way as for dhcp-boot. It may require dhcp-no-override where it was
not needed before. Could you please try it?

On 9/28/21 11:54, Shrenik Bhura wrote:
> Hi Petr,
> As per your guidance, we have enabled logging (LOG_ALL in
> config/console.h) and recompiled the ipxe binaries. Below are the
> latest observations.
> Taking down the scenarios from the previous post for ease of reference -
> 1. Default dnsmasq config with default ltsp's pxe-service entries -
> https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing
> 2. Custom pxe-service entries (just to prove that pxe-service and
> dhcp-boot do seem to successfully co-exist) -
> https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing
> 3. Without pxe-service entries -
> https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing
> I'll try to summarise the understanding and the remaining ambiguities
> thus far, to help assign responsibility for the multiple things that
> may be going wrong here:
> Between scenario (1) and (2), we see that ltsp.ipxe is being served in
> (2) which doesn't happen in (1).
> In (1), the primary issue is that EFI clients do not receive
> snponly.efi, thus they do not advertise option 175 and hence are not
> sent the ltsp.ipxe. Since it has not got to the iPXE stage as yet,
> there are no logs available from ipxe.  All that is visible
> momentarily on the client side is these two lines -
> Station IP address is
> PXE-E21: Remote boot cancelled.
> Quoting from an explanation herein [1] for "Remote boot cancelled" -
> "This message is also displayed when a DHCP/proxyDHCP server sends a
> menu that auto-selects Local Boot and when a bootserver sends a
> bootstrap program that returns control to the PXE LoadFile protocol."
> In scenario (2), PXE boot menu is displayed as defined in the
> pxe-service lines, option 175 is received back from the client,
> ltsp.ipxe is sent but is not "downloaded" by the client. There is
> nothing reported in the ipxe logs. On the client, the last line says -
> No more network devices.
> But, above all, if we simply comment out all the pxe-service lines, as
> in scenario (3), including the one with tag:rpi, the EFI clients boot
> up perfectly. iPXE log has -
> ipxe: Downloaded "ltsp.ipxe"
> ipxe: Executing "ltsp.ipxe"
> ipxe: Downloaded "vmlinuz"
> ipxe: Downloaded "ltsp.img"
> ipxe: Downloaded "initrd.img"
> ipxe: Executing "vmlinuz"
> The question thus arises: why does dnsmasq not ignore the
> pxe-service lines which have an unmatched "tag:proxy" or "tag:rpi"
> when dnsmasq is operating in non-proxy mode? Or does it ignore and yet
> there is a problem outside dnsmasq? With respect to scenario (1),
> there could be a problem in the UEFI implementation, with respect to
> (2), there could be an issue with iPXE but what we can immediately
> control within dnsmasq is to ignore lines of pxe-service with tags
> that have not been set.
> Your thoughts?
> [1]
> https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html
> On Mon, 27 Sept 2021 at 22:56, Petr Menšík <pemensik at redhat.com> wrote:
>     Hello,
>     I made a mistake when reading the code. You are right. The part I
>     mentioned only affects the vendor-class information in option 43,
>     and only in DHCPREQUEST or DHCPINFORM, which is not in the request
>     in the pcap you have sent.
>     It seems to me the problem is somewhere on the iPXE side, in
>     decoding the reply dnsmasq sent to it. I took a look at the second
>     offer in both without-pxe and default-ltsp. It seems the only
>     difference is in the vendor-class information containing the PXE
>     menu. Without-pxe continues to TFTP, whereas default-ltsp is stuck.
>     The answer is on the decoding side. The client was assigned the
>     same boot file successfully in both configurations. I am afraid the
>     problem is in the decoding PXE client, which may not understand the
>     menu dnsmasq tried to send.
>     According to the option 43 decoding in wireshark, the PXE
>     suboptions look fine, except suboption 9, the boot menu. Type
>     unknown 0x8000 does seem weird, but should be just "Vendor use"
>     according to the IBM docs [1].
>     Why it did not do anything else should be answered by the iPXE
>     people. It should continue after 2 seconds even without any action.
>     Did it at least display the boot menu on that station? Did it show
>     anything? Are those machines with normal VGA output? Perhaps
>     LOG_LEVEL in iPXE [2] might reveal the true reason.
>     Cheers,
>     Petr
>     1.
>     https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions
>     2. https://ipxe.org/buildcfg/log_level
>     On 9/27/21 16:04, Shrenik Bhura wrote:
>>     Hello Petr,
>>     Thanks for your guidance.
>>     It does seem that dhcp-boot is being reached even when
>>     pxe-service is successfully executed. Taking a hint from this
>>     discussion on UEFI and PXE
>>     (https://bbs.archlinux.org/viewtopic.php?id=237655), we tried
>>     this custom configuration -
>>     pxe-prompt="Press any key for boot menu",2
>>     pxe-service=X86-64_EFI,"PXELINUX (X86-64_EFI)",ltsp/snponly.efi
>>     pxe-service=7,"PXELINUX (EFI)",ltsp/snponly.efi
>>     dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>     dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi
>>     dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe
>>     (full file attached below)
>>     The server does proceed to offer ltsp.ipxe to the client via DHCP,
>>     but the file is ultimately not transferred via TFTP.
>>     I have attached the logs, pcaps and dnsmasq configurations of
>>     three scenarios -
>>     1. Default dnsmasq config with default ltsp's pxe-service entries
>>     2. Custom pxe-service entries
>>     3. Without pxe-service entries
>>     We have tested these with two systems - Intel NUC and Dell
>>     Optiplex 3040 with their updated firmware and have found the same
>>     results.
>>     I hope this helps to zoom further into the problem area.
>>     Best regards,
>>     Shrenik
>>     On Mon, 27 Sept 2021 at 17:00, Petr Menšík <pemensik at redhat.com>
>>     wrote:
>>         Hi Alkis,
>>         It would be helpful if you could record a pcap with those lines
>>         commented out and with them enabled. It seems suspicious that
>>         the dhcp-boot option is present at the same time as
>>         pxe-service. From what I understood, pxe-service should offer
>>         boot options only to clients with the PXEClient vendor string.
>>         I think it saves you the need for
>>         dhcp-match=set:X86PC,option:client-arch,0
>>         which is then matched in
>>         dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>         dhcp-boot=tag:iPXE,tag:X86PC,ltsp/ltsp.ipxe
>>         I just checked my Raspberry 3. I guess the architecture the RPi
>>         reports in its DHCP request is clearly wrong. Unfortunately it
>>         also reports it wrong in the vendor class, ARCH:0000.
>>         Anyway, dnsmasq might not handle tags correctly. Around
>>         src/rfc2131.c:891, it searches for a pxe service without using
>>         tags. That search is not used to find the correct service, just
>>         to find the correct context.
>>         Also, it seems that if any pxe-service is defined and the
>>         incoming DHCP packet contains PXEClient in the VendorClass
>>         option, it MUST be handled by pxe-service. If no correct
>>         service and context is found, no reply is generated for it. It
>>         cannot fall back to a normal DHCP reply in that case, which can
>>         be fixed. But the current situation seems clear to me: if any
>>         pxe-service is present, all PXEClient packets have to be
>>         handled by it.
>>         It seems to me you define tags per arch anyway, so I guess you
>>         can avoid pxe-service just fine.
>>         I made an attempt to respond to a PXE request only when a
>>         correct service matches. But I have no setup prepared for it; I
>>         only tested that it compiles. Could you check whether it helps?
>>         Cheers,
>>         Petr
>>         On 3/19/21 10:05, Alkis Georgopoulos wrote:
>>         > Hi all,
>>         >
>>         > I'm one of the LTSP developers; I asked Shrenik to contact the
>>         > dnsmasq mailing list because I feel this might be a dnsmasq
>>         > issue.
>>         >
>>         > Specifically, success or failure depends on whether these five
>>         > lines are commented out or not:
>>         >
>>         > #pxe-service=tag:proxy,tag:!iPXE,X86PC,"undionly.kpxe",ltsp/undionly.kpxe
>>         > #pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi
>>         > #pxe-service=tag:proxy,tag:iPXE,X86PC,"ltsp.ipxe",ltsp/ltsp.ipxe
>>         > #pxe-service=tag:proxy,tag:iPXE,X86-64_EFI,"ltsp.ipxe",ltsp/ltsp.ipxe
>>         > #pxe-service=tag:rpi,X86PC,"Raspberry Pi Boot   ",unused
>>         >
>>         > You may find the full configuration files and logs at:
>>         > https://github.com/ltsp/ltsp/pull/417
>>         >
>>         > The reason I feel it might be a dnsmasq issue, is that these
>>         > tags are NOT matched in Shrenik's use case. He's not using
>>         > proxy mode and he's not booting a Raspberry Pi.
>>         >
>>         > So, "pxe-service" lines that are NOT matched, cause the
>>         > problem, yet if they're commented out, the problem is gone...
>>         >
>>         > Would that be an issue with dnsmasq, or with the UEFI PXE
>>         > stack?
>>         >
>>         > Thanks,
>>         > Alkis Georgopoulos
>>         >
>>         > _______________________________________________
>>         > Dnsmasq-discuss mailing list
>>         > Dnsmasq-discuss at lists.thekelleys.org.uk
>>         > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>>         >
>>         -- 
>>         Petr Menšík
>>         Software Engineer
>>         Red Hat, http://www.redhat.com/
>>         email: pemensik at redhat.com
>>         PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
Petr Menšík
Software Engineer
Red Hat, http://www.redhat.com/
email: pemensik at redhat.com
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Send-boot-file-the-same-way-on-pxe-service-and-dhcp-.patch
Type: text/x-patch
Size: 3771 bytes
Desc: not available
URL: <http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/attachments/20210929/c04452eb/attachment-0001.bin>
