[Dnsmasq-discuss] pxe-service entries in dnsmasq conf seem to fail non-proxy EFI boot

Shrenik Bhura shrenik.bhura at gmail.com
Sat Oct 2 14:34:49 UTC 2021


Hi Petr,

Is there anything else needed from me on this to diagnose this further?

The last thing I shared was the log and pcap for case 1, i.e.,
pxe-service entries with tag:proxy together with dhcp-boot.

Regards,
Shrenik

On Thu, 30 Sep, 2021, 16:17 Shrenik Bhura, <shrenik.bhura at gmail.com> wrote:

> > 1. seems to have the wrong pcap file, or it does not use the configuration
> > attached in the linked archive. It seems to offer menu items from the 2.
> > archive, with custom pxe-services.
>
> Apologies, there was definitely some mistake.
>
> We have applied the patch and tried both with and without dhcp-no-override,
> but it still fails to boot. Here are the pcap and the logs for this case.
>
> https://drive.google.com/file/d/1-GvsId99FC8f8B2I0YaTVuje5385u4LC/view?usp=sharing
>
> Also included is the QEMU pcap, in which it does boot successfully.
>
> On Wed, 29 Sept 2021 at 20:29, Petr Menšík <pemensik at redhat.com> wrote:
>
>> It is hard to tell which results belong to which configuration (1., 2.,
>> 3.). It is unclear to me what the computer printed for each variant.
>>
>> 1. seems to have the wrong pcap file, or it does not use the configuration
>> attached in the linked archive. It seems to offer menu items from the 2.
>> archive, with custom pxe-services.
>>
>> Option 43 Suboption: (9) PXE boot menu
>>     Length: 41
>>     boot menu:
>> 8000155058454c494e555820285838362d36345f4546492980010e5058454c494e555820…
>>         Type: Unknown (32768)
>>         Length: 21
>>         Description: PXELINUX (X86-64_EFI)
>>         Type: Unknown (32769)
>>         Length: 14
>>         Description: PXELINUX (EFI)
>>
>> The above is not present in the config file presented for 1., but it is in
>> 2. Are you sure you killed dnsmasq and started it again?
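
To make the suboption 9 layout above concrete, here is a minimal Python sketch. It assumes each menu entry is a 2-byte big-endian type, a 1-byte description length, and then the description, which is how the Wireshark decoding reads. The payload is rebuilt from the decoded type and description fields, not from the truncated hex dump, and the function names are illustrative only.

```python
import struct

def encode_boot_menu(items):
    """Pack (type, description) pairs as type (2 bytes, big-endian),
    description length (1 byte), then the ASCII description."""
    out = b""
    for boot_type, desc in items:
        text = desc.encode("ascii")
        out += struct.pack("!HB", boot_type, len(text)) + text
    return out

def decode_boot_menu(payload):
    """Walk the payload and return the (type, description) pairs it holds."""
    items = []
    pos = 0
    while pos + 3 <= len(payload):
        boot_type, length = struct.unpack_from("!HB", payload, pos)
        pos += 3
        items.append((boot_type, payload[pos:pos + length].decode("ascii")))
        pos += length
    return items

# Entries as decoded by Wireshark: types 32768 (0x8000) and 32769 (0x8001).
MENU = [(0x8000, "PXELINUX (X86-64_EFI)"), (0x8001, "PXELINUX (EFI)")]

payload = encode_boot_menu(MENU)
print(len(payload))               # total suboption length in bytes
print(decode_boot_menu(payload))  # round-trips back to MENU
```

The rebuilt payload is 41 bytes long, which agrees with the "Length: 41" reported for the suboption, and its leading bytes agree with the visible part of the hex dump.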
>>
>> I think the difference might lie in how the file served by pxe-service is
>> chosen via the boot menu. I have noticed there are two ways to specify the
>> file to boot in DHCP for IPv4. One is the fixed header, and the first
>> attempt, chosen from the menu, uses that. The pxe-service options make the
>> client send a direct query to the DHCP server, marked proxyDHCP in
>> Wireshark. This proxy ACK is followed by TFTP.
>>
>> I used this filter in Wireshark: "dhcp or (!tftp.destination_file && tftp)"
>>
>> However, the following DHCP offer carries the boot file path ONLY in the
>> option 67 value. The fixed header boot file field is all zeroes. It seems to
>> me this is the part the snponly.efi firmware does not understand. It does
>> not try to use the path in the option, but may insist on the file field
>> only. Since option 52 (overload) is not in the packet, I guess dnsmasq
>> should have used mess->file for the path and not option 67. But the rules at
>> rfc2131.c:2476 are simple: if the client has requested option 67, it should
>> handle it as option 67. I guess it is a bug in snponly.efi. Either it should
>> not include option 67 among the requested options, or it should actually
>> handle the option. Dnsmasq would offer the boot path in both cases.
>>
>> Interestingly enough, dnsmasq is inconsistent with itself. It behaves a bit
>> differently in PXE proxy mode, where the file header field is always used.
>> In normal mode, unless --dhcp-no-override is used, the option is used if
>> requested.
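
The placement rule described above can be sketched in Python as follows. This is a simplified model of the behaviour as described in this message, not dnsmasq's actual code. The Reply class, place_boot_file() and its parameters are hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field

OPTION_FILENAME = 67  # DHCP "bootfile name" option

@dataclass
class Reply:
    file_header: str = ""  # the fixed "file" field of the BOOTP header
    options: dict = field(default_factory=dict)

def place_boot_file(path, requested_options, no_override=False, proxy_mode=False):
    """Decide where the boot file path goes, following the description above.

    The header field is used in proxy mode, when dhcp-no-override is set,
    or when the client did not request option 67. Otherwise the path is
    sent as option 67 and the header field stays empty.
    """
    reply = Reply()
    if proxy_mode or no_override or OPTION_FILENAME not in requested_options:
        reply.file_header = path
    else:
        reply.options[OPTION_FILENAME] = path
    return reply

# Client requested option 67: path goes into the option, header stays empty.
normal = place_boot_file("ltsp/snponly.efi", {OPTION_FILENAME})
# Same client with dhcp-no-override: path goes into the header instead.
forced = place_boot_file("ltsp/snponly.efi", {OPTION_FILENAME}, no_override=True)
# Proxy mode: header is used regardless of what the client requested.
proxy = place_boot_file("ltsp/snponly.efi", {OPTION_FILENAME}, proxy_mode=True)
print(normal, forced, proxy, sep="\n")
```

In this model the first case is the one that leaves the header field zeroed, which is the situation suspected of confusing the client.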
>>
>> Can you please check whether the dhcp-no-override option fixes your issues?
>> I think it should behave the same way in both situations.
>>
>> I attached a patch which sets the boot file for pxe-service the same way
>> as for dhcp-boot. It may require dhcp-no-override where it did not before.
>> Could you please try it?
>> On 9/28/21 11:54, Shrenik Bhura wrote:
>>
>> Hi Petr,
>>
>> As per your guidance, we have enabled logging (LOG_ALL in
>> config/console.h) and recompiled the iPXE binaries. Below are the latest
>> observations.
>>
>> Repeating the scenarios from the previous post for ease of reference:
>> 1. Default dnsmasq config with default ltsp's pxe-service entries -
>> https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing
>> 2. Custom pxe-service entries (just to prove that pxe-service and
>> dhcp-boot do seem to successfully co-exist) -
>> https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing
>> 3. Without pxe-service entries -
>> https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing
>>
>> I'll try to summarise our understanding and the remaining ambiguities so
>> far, to help assign responsibility for the several things that may be going
>> wrong here:
>>
>> Between scenarios (1) and (2), we see that ltsp.ipxe is being served in
>> (2), which doesn't happen in (1).
>> In (1), the primary issue is that EFI clients do not receive snponly.efi,
>> so they do not advertise option 175 and hence are not sent ltsp.ipxe.
>> Since the client has not yet reached the iPXE stage, there are no logs
>> available from iPXE. All that is visible momentarily on the client side is
>> these two lines:
>>
>> *Station IP address is 192.168.67.134 *
>> *PXE-E21: Remote boot cancelled.*
>> Quoting from an explanation herein [1] for "Remote boot cancelled" -
>> *" This message is also displayed when a DHCP/proxyDHCP server sends a
>> menu that auto-selects Local Boot and when a bootserver sends a bootstrap
>> program that returns control to the PXE LoadFile protocol. "*
>>
>> In scenario (2), the PXE boot menu is displayed as defined in the
>> pxe-service lines, option 175 is received back from the client, and
>> ltsp.ipxe is sent but is not "downloaded" by the client. Nothing is reported
>> in the iPXE logs. On the client, the last line says:
>> No more network devices.
>>
>> But, above all, if we simply comment out all the pxe-service lines, as in
>> scenario (3), including the one with tag:rpi, the EFI clients boot up
>> perfectly. iPXE log has -
>> ipxe: Downloaded "ltsp.ipxe"
>> ipxe: Executing "ltsp.ipxe"
>> ipxe: Downloaded "vmlinuz"
>> ipxe: Downloaded "ltsp.img"
>> ipxe: Downloaded "initrd.img"
>> ipxe: Executing "vmlinuz"
>>
>> The question thus arises: why does dnsmasq not ignore the pxe-service
>> lines which have an unmatched "tag:proxy" or "tag:rpi" when it is
>> operating in non-proxy mode? Or does it ignore them, and the problem lies
>> outside dnsmasq? For scenario (1) there could be a problem in the UEFI
>> implementation, and for (2) there could be an issue with iPXE, but what we
>> can immediately control within dnsmasq is to ignore pxe-service lines
>> whose tags have not been set.
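
As an illustration of the behaviour this question expects, here is a small Python sketch of tag matching applied to the five default LTSP pxe-service lines quoted further down in this thread. It assumes the usual dnsmasq tag semantics (every plain tag must be set, every "!tag" must not be set). It models the expectation only and is not a statement of what dnsmasq actually does for pxe-service. The function names and the sets of active tags are assumptions made for the example.

```python
def tags_match(required, active):
    """True when every plain tag is set and every '!tag' is not set."""
    for tag in required:
        if tag.startswith("!"):
            if tag[1:] in active:
                return False
        elif tag not in active:
            return False
    return True

# (tags, client architecture, boot file) from the default LTSP config lines.
PXE_SERVICES = [
    (["proxy", "!iPXE"], "X86PC", "ltsp/undionly.kpxe"),
    (["proxy", "!iPXE"], "X86-64_EFI", "ltsp/snponly.efi"),
    (["proxy", "iPXE"], "X86PC", "ltsp/ltsp.ipxe"),
    (["proxy", "iPXE"], "X86-64_EFI", "ltsp/ltsp.ipxe"),
    (["rpi"], "X86PC", "unused"),
]

def matching_services(active_tags):
    """Return the services whose tag conditions hold for the active tags."""
    return [s for s in PXE_SERVICES if tags_match(s[0], active_tags)]

# Assumed tag set for a non-proxy EFI client before iPXE is loaded.
print(matching_services({"X86-64_EFI"}))
# Assumed tag set for a proxy-mode client that already runs iPXE.
print(matching_services({"proxy", "iPXE"}))
```

Under these assumptions, no pxe-service line matches the non-proxy EFI client, so a tag-aware selection would leave such a client to the dhcp-boot lines.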
>>
>> Your thoughts?
>>
>> [1]
>> https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html
>>
>> On Mon, 27 Sept 2021 at 22:56, Petr Menšík <pemensik at redhat.com> wrote:
>>
>>> Hello,
>>>
>>> I made a mistake when reading the code. You are right. The part I
>>> mentioned only affects the vendor-class information option 43, and only in
>>> DHCPREQUEST or DHCPINFORM, which is not in the request in the pcap you have
>>> sent.
>>>
>>> It seems to me the problem is somewhere on the iPXE side, in decoding the
>>> reply dnsmasq sent to it. I took a look at the second offer in both
>>> without-pxe and default-ltsp. It seems the only difference is the
>>> vendor-class information containing the PXE menu. Without-pxe continues to
>>> TFTP, whereas default is stuck. The answer is on the decoding side. The
>>> assignment got the same boot file successfully in both configurations. I am
>>> afraid the problem is in the PXE client's decoding, which may not
>>> understand the menu dnsmasq tried to send.
>>>
>>> According to the option 43 decoding in Wireshark, the PXE suboptions look
>>> fine, except suboption 9, the boot menu. Type unknown 0x8000 does seem
>>> weird, but should be just "Vendor use" according to the IBM docs [1]. Why
>>> it did not do anything else should be answered by the iPXE people. It
>>> should continue after 2 seconds even without any action. Did it at least
>>> display the boot menu on that station? Did it show anything? Are those
>>> machines with normal VGA output? Perhaps LOG_LEVEL in iPXE [2] might reveal
>>> the true reason.
>>>
>>> Cheers,
>>> Petr
>>>
>>> 1.
>>> https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions
>>> 2. https://ipxe.org/buildcfg/log_level
>>> On 9/27/21 16:04, Shrenik Bhura wrote:
>>>
>>> Hello Petr,
>>>
>>> Thanks for your guidance.
>>>
>>> It does seem that dhcp-boot is being reached even when pxe-service is
>>> successfully executed. Taking a hint from this discussion on UEFI and PXE (
>>> https://bbs.archlinux.org/viewtopic.php?id=237655), we tried this
>>> custom configuration -
>>>
>>> pxe-prompt="Press any key for boot menu",2
>>> pxe-service=X86-64_EFI,"PXELINUX (X86-64_EFI)",ltsp/snponly.efi
>>> pxe-service=7,"PXELINUX (EFI)",ltsp/snponly.efi
>>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>> dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi
>>> dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe
>>>
>>> (full file attached below)
>>>
>>> The server does proceed to offer ltsp.ipxe to the client via DHCP, but the
>>> file is ultimately not transferred via TFTP.
>>>
>>> I have attached logs, pcaps and dnsmasq configurations for three scenarios:
>>> 1. Default dnsmasq config with default ltsp's pxe-service entries
>>> 2. Custom pxe-service entries
>>> 3. Without pxe-service entries
>>>
>>> We have tested these with two systems - Intel NUC and Dell Optiplex 3040
>>> with their updated firmware and have found the same results.
>>>
>>> I hope this helps to zoom further into the problem area.
>>>
>>> Best regards,
>>> Shrenik
>>>
>>>
>>>
>>>
>>> On Mon, 27 Sept 2021 at 17:00, Petr Menšík <pemensik at redhat.com> wrote:
>>>
>>>> Hi Alkis,
>>>>
>>>> It would be helpful if you could record a pcap with those lines commented
>>>> out and another with them enabled. It seems suspicious that the dhcp-boot
>>>> option is present at the same time as pxe-service. From what I understood,
>>>> pxe-service should offer boot options only to clients with the PXEClient
>>>> vendor string. I think it saves you the need for
>>>> dhcp-match=set:X86PC,option:client-arch,0
>>>> then matched in
>>>> dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe
>>>> dhcp-boot=tag:iPXE,tag:X86PC,ltsp/ltsp.ipxe
>>>>
>>>> I just checked my Raspberry Pi 3. I guess the architecture the RPi reports
>>>> in its DHCP request is clearly wrong. Unfortunately it also reports it
>>>> wrongly in the vendor class, ARCH:0000.
>>>>
>>>> Anyway, dnsmasq might not handle tags correctly. Around
>>>> src/rfc2131.c:891, it searches for a pxe service without using tags. That
>>>> search is not used to find the correct service, just to find the correct
>>>> context.
>>>>
>>>> Also, it seems that if any pxe-service is defined and the incoming DHCP
>>>> packet contains PXEClient in the VendorClass option, it MUST be handled by
>>>> pxe-service. If no correct service and context is found, the packet gets
>>>> no reply. It cannot fall back to a normal DHCP reply in that case, which
>>>> can be fixed. But the current situation seems clear to me: if any
>>>> pxe-service is present, all PXEClient packets have to be handled by it.
>>>> It seems to me you define tags per arch anyway, so I guess you can avoid
>>>> pxe-service just fine.
>>>>
>>>> I made an attempt to respond to PXE requests only when a correct service
>>>> matches. But I have no setup prepared for it; I only tested that it
>>>> compiles. Could you check whether it helps?
>>>>
>>>> Cheers,
>>>> Petr
>>>>
>>>> On 3/19/21 10:05, Alkis Georgopoulos wrote:
>>>> > Hi all,
>>>> >
>>>> > I'm one of the LTSP developers; I asked Shrenik to contact the dnsmasq
>>>> > mailing list because I feel this might be a dnsmasq issue.
>>>> >
>>>> > Specifically, success or failure depends on whether these five lines
>>>> > are commented out or not:
>>>> >
>>>> >
>>>> > #pxe-service=tag:proxy,tag:!iPXE,X86PC,"undionly.kpxe",ltsp/undionly.kpxe
>>>> > #pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi
>>>> > #pxe-service=tag:proxy,tag:iPXE,X86PC,"ltsp.ipxe",ltsp/ltsp.ipxe
>>>> > #pxe-service=tag:proxy,tag:iPXE,X86-64_EFI,"ltsp.ipxe",ltsp/ltsp.ipxe
>>>> > #pxe-service=tag:rpi,X86PC,"Raspberry Pi Boot   ",unused
>>>> >
>>>> > You may find the full configuration files and logs at:
>>>> > https://github.com/ltsp/ltsp/pull/417
>>>> >
>>>> > The reason I feel it might be a dnsmasq issue, is that these tags are
>>>> > NOT matched in Shrenik's use case. He's not using proxy mode and he's
>>>> > not booting a Raspberry Pi.
>>>> >
>>>> > So, "pxe-service" lines that are NOT matched, cause the problem,
>>>> > yet if they're commented out, the problem is gone...
>>>> >
>>>> > Would that be an issue with dnsmasq, or with the UEFI PXE stack?
>>>> >
>>>> > Thanks,
>>>> > Alkis Georgopoulos
>>>> >
>>>> > _______________________________________________
>>>> > Dnsmasq-discuss mailing list
>>>> > Dnsmasq-discuss at lists.thekelleys.org.uk
>>>> > https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss
>>>> >
>>>> --
>>>> Petr Menšík
>>>> Software Engineer
>>>> Red Hat, http://www.redhat.com/
>>>> email: pemensik at redhat.com
>>>> PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB