[Dnsmasq-discuss] pxe-service entries in dnsmasq conf seem to fail non-proxy EFI boot

Shrenik Bhura shrenik.bhura at gmail.com
Tue Sep 28 09:54:59 UTC 2021


Hi Petr,

As per your guidance, we have enabled logging (LOG_ALL in
config/console.h) and recompiled the iPXE binaries. Below are the latest
observations.

Restating the scenarios from the previous post for ease of reference -
1. Default dnsmasq config with default ltsp's pxe-service entries -
https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing
2. Custom pxe-service entries (just to prove that pxe-service and dhcp-boot
do seem to successfully co-exist) -
https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing
3. Without pxe-service entries -
https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing

I'll try to summarise our understanding and the remaining ambiguities so
far, to help attribute responsibility for the several things that may be
going wrong here:

Comparing scenarios (1) and (2), we see that ltsp.ipxe is offered in (2)
but not in (1).
In (1), the primary issue is that EFI clients do not receive snponly.efi;
they therefore never advertise option 175 and hence are never sent ltsp.ipxe.
Since the boot has not reached the iPXE stage yet, there are no logs
available from iPXE. All that is visible momentarily on the client side is
these two lines -

*Station IP address is 192.168.67.134*
*PXE-E21: Remote boot cancelled.*
Quoting from an explanation herein [1] for "Remote boot cancelled" -
*" This message is also displayed when a DHCP/proxyDHCP server sends a menu
that auto-selects Local Boot and when a bootserver sends a bootstrap
program that returns control to the PXE LoadFile protocol. "*

In scenario (2), the PXE boot menu is displayed as defined in the
pxe-service lines, option 175 is received from the client, and ltsp.ipxe is
offered via DHCP but is never downloaded by the client. Nothing is reported
in the iPXE logs. On the client, the last line says -
No more network devices.

But, above all, if we simply comment out all the pxe-service lines, as in
scenario (3), including the one with tag:rpi, the EFI clients boot up
perfectly. iPXE log has -
ipxe: Downloaded "ltsp.ipxe"
ipxe: Executing "ltsp.ipxe"
ipxe: Downloaded "vmlinuz"
ipxe: Downloaded "ltsp.img"
ipxe: Downloaded "initrd.img"
ipxe: Executing "vmlinuz"
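
For reference, the boot-related part of scenario (3) reduces to roughly the
sketch below. The dhcp-boot lines are the ones quoted further down in this
thread; the dhcp-match lines are an assumption about how the iPXE, X86PC and
X86-64_EFI tags get set (option 175 for iPXE, client-arch 0 for BIOS and 7
for x86-64 EFI), so they should be checked against the attached
configuration rather than taken as authoritative -

# Tag clients by what they advertise (assumed tag definitions)
dhcp-match=set:iPXE,175
dhcp-match=set:X86PC,option:client-arch,0
dhcp-match=set:X86-64_EFI,option:client-arch,7
# First stage: hand the firmware an iPXE binary for its architecture
dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi
# Second stage: iPXE requests again with option 175 and gets the script
dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe

With no pxe-service lines in the file, the client is simply given a boot
file name per tag, and the two-stage sequence matches the iPXE log above.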

The question thus arises: why does dnsmasq not ignore the pxe-service
lines that carry an unmatched "tag:proxy" or "tag:rpi" when it is operating
in non-proxy mode? Or does it ignore them, and the problem lies outside
dnsmasq? In scenario (1) there could be a problem in the UEFI
implementation, and in (2) there could be an issue with iPXE, but what we
can immediately control within dnsmasq is to ignore pxe-service lines whose
tags have not been set.
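
To illustrate the expectation, here is a minimal sketch. It assumes the
"proxy" tag is only ever set by a proxy-mode dhcp-range line, as in the
LTSP-style line shown commented out; the address range in the non-proxy
line is hypothetical (only the subnet is taken from the client IP above) -

# Proxy mode would set the tag (not used in our setup):
#dhcp-range=set:proxy,192.168.67.0,proxy,255.255.255.0
# Non-proxy mode, as in our setup; the "proxy" tag is never set
# (hypothetical address range and lease time):
dhcp-range=192.168.67.20,192.168.67.250,8h
# ...so we would expect a line like this one to have no effect at all:
pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi

What we observe instead is that the mere presence of such a line changes
the reply sent to EFI clients.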

Your thoughts?

[1]
https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html

On Mon, 27 Sept 2021 at 22:56, Petr Menšík <pemensik at redhat.com> wrote:

> Hello,
>
> I made a mistake when reading the code. You are right. The part I
> mentioned only affects vendor-class information (option 43), and only in
> DHCPREQUEST or DHCPINFORM, which is not in the request in the pcap you sent.
>
> It seems to me the problem is somewhere on the iPXE side, in decoding the
> reply dnsmasq sent to it. I took a look at the second offer in both
> without-pxe and default-ltsp. The only difference seems to be the
> vendor-class information containing the PXE menu. Without-pxe continues to
> TFTP, whereas default-ltsp gets stuck. The answer lies on the decoding
> side. The same boot file was assigned successfully in both configurations.
> I am afraid the problem is in the decoding PXE client, which may not
> understand the menu dnsmasq tried to send.
>
> According to the option 43 decoding in Wireshark, the PXE suboptions look
> fine, except suboption 9 (boot menu). Type unknown 0x8000 does seem weird,
> but it should be just "Vendor use" according to the IBM docs [1]. Why it
> did not do anything else should be answered by the iPXE people. It should
> continue after 2 seconds even without any action. Did it at least display
> the boot menu on that station? Did it show anything? Are those machines
> with normal VGA output? Perhaps LOG_LEVEL in iPXE [2] might reveal the true
> reason.
>
> Cheers,
> Petr
>
> 1.
> https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions
> 2. https://ipxe.org/buildcfg/log_level
> On 9/27/21 16:04, Shrenik Bhura wrote:
>
> Hello Petr,
>
> Thanks for your guidance.
>
> It does seem that dhcp-boot is being reached even when pxe-service is
> successfully executed. Taking a hint from this discussion on UEFI and PXE (
> https://bbs.archlinux.org/viewtopic.php?id=237655), we tried this custom
> configuration -
>
> pxe-prompt="Press any key for boot menu",2
> pxe-service=X86-64_EFI,"PXELINUX (X86-64_EFI)",ltsp/snponly.efi
> pxe-service=7,"PXELINUX (EFI)",ltsp/snponly.efi
> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
> dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi
> dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe
>
> (full file attached below)
>
> The server does proceed to offer ltsp.ipxe to the client via DHCP, but the
> file is ultimately not transferred via TFTP.
>
> I have attached logs, pcaps and dnsmasq configurations for three scenarios -
> 1. Default dnsmasq config with default ltsp's pxe-service entries
> 2. Custom pxe-service entries
> 3. Without pxe-service entries
>
> We have tested these with two systems - Intel NUC and Dell Optiplex 3040
> with their updated firmware and have found the same results.
>
> I hope this helps to zoom further into the problem area.
>
> Best regards,
> Shrenik
>
>
>
>
> On Mon, 27 Sept 2021 at 17:00, Petr Menšík <pemensik at redhat.com> wrote:
>
>> Hi Alkis,
>>
>> It would be helpful if you could record a pcap both with those lines
>> commented out and with them enabled. It seems suspicious that the
>> dhcp-boot option is present at the same time as pxe-service. From what I
>> understood, pxe-service should offer boot options only to clients with
>> the PXEClient vendor string. I think it saves you the need for
>> dhcp-match=set:X86PC,option:client-arch,0
>>
>> then matched in
>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>> dhcp-boot=tag:iPXE,tag:X86PC,ltsp/ltsp.ipxe
>>
>> I just checked my Raspberry 3. I guess architecture of RPi in DHCP
>> request is clearly wrong. Unfortunately it reports it wrong also in
>> vendorclass ARCH:0000.
>>
>> Anyway, it might not handle tags correctly. Around src/rfc2131.c:891, it
>> searches for pxe service without using tags. It is not used to find
>> correct service, just to find correct context.
>>
>> Also, it seems that if any pxe-service is defined and the incoming DHCP
>> packet contains PXEClient in the VendorClass option, it MUST be handled
>> by pxe-service. If no correct service & context is found, the reply is
>> not handled for it. It cannot fall back to a normal DHCP reply in that
>> case, which can be fixed. But the current situation seems clear to me: if
>> any pxe-service is present, all PXEClient packets have to be handled by
>> it. It seems to me you define tags per arch anyway, so I guess you can
>> avoid pxe-service just fine.
>>
>> I made an attempt to respond to a PXE request only when the correct
>> service matches. But I have no setup prepared for it; I only tested that
>> it compiles. Could you try whether it helps?
>>
>> Cheers,
>> Petr
>>
>> On 3/19/21 10:05, Alkis Georgopoulos wrote:
>> > Hi all,
>> >
>> > I'm one of the LTSP developers; I asked Shrenik to contact the dnsmasq
>> > mailing list because I feel this might be a dnsmasq issue.
>> >
>> > Specifically, success or failure depends on whether these five lines
>> > are commented out or not:
>> >
>> > #pxe-service=tag:proxy,tag:!iPXE,X86PC,"undionly.kpxe",ltsp/undionly.kpxe
>> > #pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi
>> > #pxe-service=tag:proxy,tag:iPXE,X86PC,"ltsp.ipxe",ltsp/ltsp.ipxe
>> > #pxe-service=tag:proxy,tag:iPXE,X86-64_EFI,"ltsp.ipxe",ltsp/ltsp.ipxe
>> > #pxe-service=tag:rpi,X86PC,"Raspberry Pi Boot   ",unused
>> >
>> > You may find the full configuration files and logs at:
>> > https://github.com/ltsp/ltsp/pull/417
>> >
>> > The reason I feel it might be a dnsmasq issue, is that these tags are
>> > NOT matched in Shrenik's use case. He's not using proxy mode and he's
>> > not booting a Raspberry Pi.
>> >
>> > So, "pxe-service" lines that are NOT matched, cause the problem,
>> > yet if they're commented out, the problem is gone...
>> >
>> > Would that be an issue with dnsmasq, or with the UEFI PXE stack?
>> >
>> > Thanks,
>> > Alkis Georgopoulos
>> >
>> > _______________________________________________
>> > Dnsmasq-discuss mailing list
>> > Dnsmasq-discuss at lists.thekelleys.org.uk
>> > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>> >
>> --
>> Petr Menšík
>> Software Engineer
>> Red Hat, http://www.redhat.com/
>> email: pemensik at redhat.com
>> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB