[Dnsmasq-discuss] pxe-service entries in dnsmasq conf seem to fail non-proxy EFI boot

Shrenik Bhura shrenik.bhura at gmail.com
Thu Sep 30 10:47:13 UTC 2021


> 1. seems to have wrong pcap file or it does not use configuration
> attached in linked archive. It seems it offers menu items from 2. archive
> with custom pxe-services.

Apologies, there was definitely some mistake.

We have applied the patch and tried both with and without dhcp-no-override,
but it still fails to boot. Here are the pcap and the logs for this case:
https://drive.google.com/file/d/1-GvsId99FC8f8B2I0YaTVuje5385u4LC/view?usp=sharing

Also included is the qemu pcap, in which it does boot successfully.

On Wed, 29 Sept 2021 at 20:29, Petr Menšík <pemensik at redhat.com> wrote:

> It is somewhat hard to guess the results described for each configuration
> (1., 2., 3.). It is unclear to me what the computer printed for each
> variant.
>
> Archive 1. seems to have the wrong pcap file, or it does not use the
> configuration attached in the linked archive. It seems to offer menu items
> from archive 2., with custom pxe-services.
>
> Option 43 Suboption: (9) PXE boot menu
>     Length: 41
>     boot menu:
> 8000155058454c494e555820285838362d36345f4546492980010e5058454c494e555820…
>         Type: Unknown (32768)
>         Length: 21
>         Description: PXELINUX (X86-64_EFI)
>         Type: Unknown (32769)
>         Length: 14
>         Description: PXELINUX (EFI)
>
> The above is not present in the config file presented for 1., but it is in
> 2. Are you sure you killed dnsmasq and started it again?
>
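
For readers who want to check the Wireshark decoding above by hand, here is a
minimal Python sketch of how the suboption 9 payload splits into entries. It
assumes each entry is a 2-byte big-endian type, a 1-byte description length
and then the description, which is what the decoded fields above suggest. The
function name parse_pxe_boot_menu is made up for this illustration, and the
payload is rebuilt from the decoded fields, not from the truncated hex string.

```python
import struct


def parse_pxe_boot_menu(data: bytes):
    """Split a PXE boot menu payload (option 43, suboption 9) into entries.

    Assumed layout per entry: 2-byte big-endian type, 1-byte description
    length, then the description bytes.
    """
    entries = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated entry header")
        server_type, desc_len = struct.unpack_from("!HB", data, offset)
        offset += 3
        desc = data[offset:offset + desc_len]
        if len(desc) != desc_len:
            raise ValueError("truncated description")
        entries.append((server_type, desc.decode("ascii")))
        offset += desc_len
    return entries


# Payload rebuilt from the decoded fields in the dump (types 0x8000 and
# 0x8001, lengths 21 and 14); this is an illustration, not captured data.
menu = (
    bytes.fromhex("8000") + bytes([21]) + b"PXELINUX (X86-64_EFI)"
    + bytes.fromhex("8001") + bytes([14]) + b"PXELINUX (EFI)"
)

entries = parse_pxe_boot_menu(menu)
for server_type, description in entries:
    print(hex(server_type), description)
```

The two entries add up to 41 bytes, matching the suboption length in the dump,
and the types 32768 and 32769 are simply 0x8000 and 0x8001.
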
> I think the difference might be in how a file served via pxe-service and
> chosen from the boot menu is delivered. I have noticed there are two ways
> to specify the boot file in DHCP for IPv4. One is in the fixed header, and
> the first attempt chosen from the menu uses that. pxe-service options make
> the client send a direct query to the DHCP server, marked proxyDHCP in
> Wireshark. This proxy ACK is followed by TFTP.
>
> I used this filter in Wireshark: "dhcp or (!tftp.destination_file && tftp)"
>
> However, the following DHCP offer carries the boot file path ONLY in the
> option 67 value. The boot file field in the fixed header is all zeroes. It
> seems to me this is the part the snponly.efi firmware does not understand.
> It does not try to use the path in the option, but may insist only on the
> file field. Since option 52 (overload) is not in the packet, I guess
> dnsmasq should have used mess->file for the path and not option 67. But the
> rules at rfc2131.c:2476 are simple: if the client has requested option 67,
> it should handle it as option 67. I guess it is a bug in snponly.efi.
> Either it should not include option 67 among the requested options, or it
> should actually handle the option. Dnsmasq would offer the boot path in
> both cases.
>
> Interestingly enough, dnsmasq is inconsistent with itself. It behaves a bit
> differently in PXE proxy mode, where the file field in the header is always
> used. In normal mode, unless --dhcp-no-override is used, the option is used
> if requested.
>
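
The rule described above can be written down as a small Python model. This is
only a sketch of the behaviour as stated in this thread, not a copy of the
dnsmasq source: the function name bootfile_location and its parameters are
invented for the illustration, and "file" stands for the fixed header field
while "option67" stands for DHCP option 67.

```python
OPTION_BOOTFILE = 67  # DHCP option 67 (bootfile name)


def bootfile_location(requested_options, no_override=False, proxy_mode=False):
    """Return where the boot file name ends up, per the thread's description.

    requested_options: option numbers from the client's parameter request list.
    no_override: models running dnsmasq with --dhcp-no-override.
    proxy_mode: models the PXE proxy case, said above to always use the header.
    """
    if proxy_mode:
        return "file"
    if not no_override and OPTION_BOOTFILE in requested_options:
        return "option67"
    return "file"


# A client that asks for option 67, as snponly.efi appears to do here.
print(bootfile_location({1, 3, 67}))                    # option 67 is used
print(bootfile_location({1, 3, 67}, no_override=True))  # header field is used
```

If this model is right, a client that requests option 67 but only reads the
header field would see an empty file name unless dhcp-no-override is set.
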
> Can you please check whether the dhcp-no-override option fixes your
> issues? I think dnsmasq should behave the same way in both situations.
>
> I attached a patch which sets the boot file for pxe-service the same way
> as for dhcp-boot. It may require dhcp-no-override where it did not before.
> Could you please try it?
> On 9/28/21 11:54, Shrenik Bhura wrote:
>
> Hi Petr,
>
> As per your guidance, we have enabled logging (LOG_ALL in
> config/console.h) and recompiled the ipxe binaries. Below are the latest
> observations.
>
> Taking down the scenarios from the previous post for ease of reference -
> 1. Default dnsmasq config with default ltsp's pxe-service entries -
> https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing
> 2. Custom pxe-service entries (just to prove that pxe-service and
> dhcp-boot do seem to successfully co-exist) -
> https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing
> 3. Without pxe-service entries -
> https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing
>
> I'll try to summarise our understanding and the remaining ambiguities so
> far, to help assign responsibility for the multiple things that may be
> going wrong here:
>
> Between scenario (1) and (2), we see that ltsp.ipxe is being served in (2)
> which doesn't happen in (1).
> In (1), the primary issue is that EFI clients do not receive snponly.efi,
> thus they do not advertise option 175 and hence are not sent the ltsp.ipxe.
> Since it has not got to the iPXE stage as yet, there are no logs available
> from ipxe.  All that is visible momentarily on the client side is these two
> lines -
>
> *Station IP address is 192.168.67.134 *
> *PXE-E21: Remote boot cancelled.*
> Quoting from an explanation herein [1] for "Remote boot cancelled" -
> *" This message is also displayed when a DHCP/proxyDHCP server sends a
> menu that auto-selects Local Boot and when a bootserver sends a bootstrap
> program that returns control to the PXE LoadFile protocol. "*
>
> In scenario (2), PXE boot menu is displayed as defined in the pxe-service
> lines, option 175 is received back from the client, ltsp.ipxe is sent but
> is not "downloaded" by the client. There is nothing reported in the ipxe
> logs. On the client, the last line says -
> No more network devices.
>
> But, above all, if we simply comment out all the pxe-service lines, as in
> scenario (3), including the one with tag:rpi, the EFI clients boot up
> perfectly. iPXE log has -
> ipxe: Downloaded "ltsp.ipxe"
> ipxe: Executing "ltsp.ipxe"
> ipxe: Downloaded "vmlinuz"
> ipxe: Downloaded "ltsp.img"
> ipxe: Downloaded "initrd.img"
> ipxe: Executing "vmlinuz"
>
> The question thus arises: why does dnsmasq not ignore the pxe-service
> lines which have an unmatched "tag:proxy" or "tag:rpi" when dnsmasq is
> operating in non-proxy mode? Or does it ignore them, and the problem lies
> outside dnsmasq? In scenario (1) there could be a problem in the UEFI
> implementation, and in (2) there could be an issue with iPXE, but what we
> can immediately control within dnsmasq is to ignore pxe-service lines whose
> tags have not been set.
>
> Your thoughts?
>
> [1]
> https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html
>
> On Mon, 27 Sept 2021 at 22:56, Petr Menšík <pemensik at redhat.com> wrote:
>
>> Hello,
>>
>> I made a mistake when reading the code. You are right. The part I
>> mentioned only affects vendor-class information (option 43), and only in
>> DHCPREQUEST or DHCPINFORM, which is not in the request in the pcap you
>> sent.
>>
>> It seems to me the problem is somewhere on the iPXE side, in decoding the
>> reply dnsmasq sent to it. I took a look at the second offer in both
>> without-pxe and default-ltsp. It seems the only difference is in the
>> vendor-class information containing the PXE menu. Without-pxe continues to
>> TFTP, whereas default-ltsp gets stuck. The answer lies on the decoding
>> side. The assignment carried the same boot file successfully in both
>> configurations. I am afraid the problem is in the PXE client's decoding,
>> which may not understand the menu dnsmasq tried to send.
>>
>> According to the option 43 decoding in Wireshark, the PXE suboptions look
>> fine, except suboption 9, the boot menu. Type unknown 0x8000 does seem
>> weird, but should just be "Vendor use" according to the IBM docs [1]. Why
>> the client did not do anything else should be answered by the iPXE people.
>> It should continue after 2 seconds even without any action. Did it at
>> least display the boot menu on that station? Did it show anything? Are
>> those machines with normal VGA output? Perhaps LOG_LEVEL in iPXE [2] might
>> reveal the true reason.
>>
>> Cheers,
>> Petr
>>
>> 1.
>> https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions
>> 2. https://ipxe.org/buildcfg/log_level
>> On 9/27/21 16:04, Shrenik Bhura wrote:
>>
>> Hello Petr,
>>
>> Thanks for your guidance.
>>
>> It does seem that dhcp-boot is being reached even when pxe-service is
>> successfully executed. Taking a hint from this discussion on UEFI and PXE (
>> https://bbs.archlinux.org/viewtopic.php?id=237655), we tried this custom
>> configuration -
>>
>> pxe-prompt="Press any key for boot menu",2
>> pxe-service=X86-64_EFI,"PXELINUX (X86-64_EFI)",ltsp/snponly.efi
>> pxe-service=7,"PXELINUX (EFI)",ltsp/snponly.efi
>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>> dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi
>> dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe
>>
>> (full file attached below)
>>
>> The server does proceed to offer ltsp.ipxe to the client via DHCP, but the
>> file is ultimately not transferred via TFTP.
>>
>> Have attached logs, pcap and dnsmasq configuration of three scenarios -
>> 1. Default dnsmasq config with default ltsp's pxe-service entries
>> 2. Custom pxe-service entries
>> 3. Without pxe-service entries
>>
>> We have tested these with two systems - Intel NUC and Dell Optiplex 3040
>> with their updated firmware and have found the same results.
>>
>> I hope this helps to zoom further into the problem area.
>>
>> Best regards,
>> Shrenik
>>
>>
>>
>>
>> On Mon, 27 Sept 2021 at 17:00, Petr Menšík <pemensik at redhat.com> wrote:
>>
>>> Hi Alkis,
>>>
>>> It would be helpful if you could record a pcap with those lines commented
>>> out and another with them enabled. It seems suspicious that the dhcp-boot
>>> option is present at the same time as pxe-service. From what I understood,
>>> pxe-service should offer boot options only to clients with the PXEClient
>>> vendor string. I think it saves you the need for
>>> dhcp-match=set:X86PC,option:client-arch,0
>>>
>>> then matched in
>>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>> dhcp-boot=tag:iPXE,tag:X86PC,ltsp/ltsp.ipxe
>>>
>>> I just checked my Raspberry Pi 3. I guess the architecture the RPi
>>> reports in its DHCP request is clearly wrong. Unfortunately it also
>>> reports it wrongly in the vendor class, as ARCH:0000.
>>>
>>> Anyway, dnsmasq might not handle tags correctly. Around
>>> src/rfc2131.c:891, it searches for a pxe-service without using tags. That
>>> search is not used to find the correct service, just to find the correct
>>> context.
>>>
>>> Also, it seems that if any pxe-service is defined and the incoming DHCP
>>> packet contains PXEClient in the VendorClass option, it MUST be handled
>>> by pxe-service. If no matching service and context are found, no reply is
>>> produced for it. It cannot fall back to a normal DHCP reply in that case,
>>> which can be fixed. But the current situation seems clear to me: if any
>>> pxe-service is present, all PXEClient packets have to be handled by it.
>>> It seems to me you define tags per arch anyway, so I guess you can avoid
>>> pxe-service just fine.
>>>
>>> I made an attempt to respond to a PXE request only when a correct service
>>> matches. But I have no setup prepared for it; I only tested that it
>>> compiles. Could you try whether it helps?
>>>
>>> Cheers,
>>> Petr
>>>
>>> On 3/19/21 10:05, Alkis Georgopoulos wrote:
>>> > Hi all,
>>> >
>>> > I'm one of the LTSP developers; I asked Shrenik to contact the dnsmasq
>>> > mailing list because I feel this might be a dnsmasq issue.
>>> >
>>> > Specifically, success or failure depends on whether these five lines
>>> > are commented out or not:
>>> >
>>> > #pxe-service=tag:proxy,tag:!iPXE,X86PC,"undionly.kpxe",ltsp/undionly.kpxe
>>> > #pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi
>>> > #pxe-service=tag:proxy,tag:iPXE,X86PC,"ltsp.ipxe",ltsp/ltsp.ipxe
>>> > #pxe-service=tag:proxy,tag:iPXE,X86-64_EFI,"ltsp.ipxe",ltsp/ltsp.ipxe
>>> > #pxe-service=tag:rpi,X86PC,"Raspberry Pi Boot   ",unused
>>> >
>>> > You may find the full configuration files and logs at:
>>> > https://github.com/ltsp/ltsp/pull/417
>>> >
>>> > The reason I feel it might be a dnsmasq issue, is that these tags are
>>> > NOT matched in Shrenik's use case. He's not using proxy mode and he's
>>> > not booting a Raspberry Pi.
>>> >
>>> > So, "pxe-service" lines that are NOT matched, cause the problem,
>>> > yet if they're commented out, the problem is gone...
>>> >
>>> > Would that be an issue with dnsmasq, or with the UEFI PXE stack?
>>> >
>>> > Thanks,
>>> > Alkis Georgopoulos
>>> >
>>> > _______________________________________________
>>> > Dnsmasq-discuss mailing list
>>> > Dnsmasq-discuss at lists.thekelleys.org.uk
>>> > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>>> >
>>> --
>>> Petr Menšík
>>> Software Engineer
>>> Red Hat, http://www.redhat.com/
>>> email: pemensik at redhat.com
>>> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB
>>>
>
>