<div dir="auto"><div>Hi Petr,<div dir="auto"><br></div><div dir="auto">Is there anything else you need from me to diagnose this further?</div><div dir="auto"><br></div><div dir="auto">The last thing I shared was the log and pcap for case 1, i.e., pxe-service entries with tag:proxy together with dhcp-boot.<br><br><div data-smartmail="gmail_signature" dir="auto">Regards,<br>Shrenik</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 30 Sep, 2021, 16:17 Shrenik Bhura, <<a href="mailto:shrenik.bhura@gmail.com">shrenik.bhura@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">
<div id="m_-6684599476645665730gmail-:1zw"><div dir="ltr"><div dir="ltr"><span><div>>
1. seems to have wrong pcap file or it does not use configuration
attached in linked archive. It seems it offers menu items from 2.
archive with custom pxe-services. <br></div><div><br></div></span><div>Apologies, there was definitely some mistake.</div><div><br></div><div>We
have applied the patch and tried with and without dhcp-no-override but
it still fails to boot. Herein are the pcap and the logs for
this case. <br></div><div><a href="https://drive.google.com/file/d/1-GvsId99FC8f8B2I0YaTVuje5385u4LC/view?usp=sharing" target="_blank" rel="noreferrer">https://drive.google.com/file/d/1-GvsId99FC8f8B2I0YaTVuje5385u4LC/view?usp=sharing</a></div><div><br></div><div>Additionally, also included is the qemu pcap wherein it does boot successfully.</div></div></div></div>
</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 29 Sept 2021 at 20:29, Petr Menšík <<a href="mailto:pemensik@redhat.com" target="_blank" rel="noreferrer">pemensik@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<p>It is somewhat hard to work out the results you describe for each
configuration (1, 2, 3). It is unclear to me what the computer
printed for each variant.</p>
<p>1. seems to have wrong pcap file or it does not use configuration
attached in linked archive. It seems it offers menu items from 2.
archive with custom pxe-services.<br>
</p>
<p>Option 43 Suboption: (9) PXE boot menu<br>
Length: 41<br>
boot menu:
8000155058454c494e555820285838362d36345f4546492980010e5058454c494e555820…<br>
Type: Unknown (32768)<br>
Length: 21<br>
Description: PXELINUX (X86-64_EFI)<br>
Type: Unknown (32769)<br>
Length: 14<br>
Description: PXELINUX (EFI)<br>
</p>
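<p>For reference, the layout Wireshark decodes above (a 2-byte type,
a 1-byte length, then the description) can be walked with a short
script. This is only a sketch: the function names are made up for
illustration, and the sample bytes are rebuilt from the
Type/Length/Description fields shown above rather than copied from
the truncated hex dump.</p>
<pre>
```python
import struct


def build_menu(items):
    """Encode (type, description) pairs as a PXE boot menu payload."""
    out = bytearray()
    for item_type, description in items:
        text = description.encode("ascii")
        # 2-byte big-endian type, 1-byte length, then the description.
        out += struct.pack("!HB", item_type, len(text)) + text
    return bytes(out)


def parse_menu(payload):
    """Decode a PXE boot menu payload into (type, description) pairs."""
    items = []
    offset = 0
    while offset != len(payload):
        item_type, length = struct.unpack_from("!HB", payload, offset)
        start = offset + 3
        end = start + length
        text = payload[start:end]
        if len(text) != length:
            raise ValueError("truncated boot menu entry")
        items.append((item_type, text.decode("ascii")))
        offset = end
    return items


# Sample rebuilt from the decoded fields above (an assumption, not the raw capture).
sample = build_menu([
    (0x8000, "PXELINUX (X86-64_EFI)"),
    (0x8001, "PXELINUX (EFI)"),
])

for item_type, description in parse_menu(sample):
    print(hex(item_type), len(description), description)
```
</pre>
<p>The rebuilt payload is 41 bytes long, which agrees with the
"Length: 41" reported for suboption 9.</p>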
<p>The menu above is not present in the config file supplied for 1., but it is in the one for 2.
Are you sure you killed dnsmasq and started it again?<br>
</p>
<p>I think the difference may lie in how the file served by
pxe-service is chosen via the boot menu. I have noticed there are
two ways to specify the file to boot in DHCP for IPv4. One is the
file field in the fixed header, and the first choice from the menu
arrives there. The pxe-service options make the client send a
direct query to the DHCP server, marked proxyDHCP in
Wireshark. This proxy ACK is followed by TFTP.</p>
<p>I used this filter in Wireshark: "dhcp or (!tftp.destination_file
&& tftp)"<br>
</p>
<p>However, the following DHCP offer carries the boot file path ONLY
in the option 67 value. The boot file field in the fixed header is
all zeroes. It seems to me this is the part the snponly.efi firmware
does not understand. It does not try to use the path in the option,
but may insist on the file field alone. Since the option #52
overload is not in the packet, I guess dnsmasq should have
used mess->file for the path and not option 67. But the rules at
rfc2131.c:2476 are simple. If the client has requested option 67, it
should handle it as option 67. I guess it is a bug in snponly.efi.
Either it should not include option 67 among the requested options,
or it should actually handle the option. Dnsmasq would offer the boot
path in both cases.</p>
<p>Interestingly, dnsmasq is inconsistent with itself. It
behaves slightly differently in PXE proxy mode, where the file field
of the header is always used. In normal mode, unless --dhcp-no-override
is set, the option is used if the client requested it.</p>
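<p>To make the two cases concrete, here is a minimal sketch of the
behaviour as I described it above. It is not dnsmasq's code: the
Reply class, the function name and its parameters are invented for
illustration, and only the 128-byte file field and option number 67
come from the DHCP specification.</p>
<pre>
```python
from dataclasses import dataclass, field

OPTION_BOOTFILE_NAME = 67   # DHCP option carrying the boot file name
FILE_FIELD_SIZE = 128       # size of the fixed-header 'file' field


@dataclass
class Reply:
    """Only the two places a boot file path can live in a reply."""
    file_field: bytes = bytes(FILE_FIELD_SIZE)
    options: dict = field(default_factory=dict)


def place_boot_file(path, requested, no_override=False, pxe_proxy=False):
    """Put the path in option 67 or in the header, per the rules described above."""
    encoded = path.encode("ascii")
    if len(encoded) not in range(FILE_FIELD_SIZE):
        raise ValueError("path does not fit the fixed-header file field")

    reply = Reply()
    use_option = (
        not pxe_proxy                        # proxy mode: header is always used
        and not no_override                  # dhcp-no-override: header is forced
        and OPTION_BOOTFILE_NAME in requested
    )
    if use_option:
        reply.options[OPTION_BOOTFILE_NAME] = encoded
    else:
        reply.file_field = encoded.ljust(FILE_FIELD_SIZE, b"\x00")
    return reply


# A client that lists option 67 in its parameter request list.
normal = place_boot_file("ltsp/snponly.efi", {1, 3, 6, 67})
forced = place_boot_file("ltsp/snponly.efi", {1, 3, 6, 67}, no_override=True)
print(normal.options, normal.file_field == bytes(FILE_FIELD_SIZE))
print(forced.options, forced.file_field.rstrip(b"\x00"))
```
</pre>
<p>In the first call the header file field stays zeroed and the path
travels only in option 67, which is the shape of the offer seen in
the capture. In the second call the path lands in the header.</p>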
<p>Could you please check whether the dhcp-no-override option fixes
your issue? I think dnsmasq should behave the same way in both situations.</p>
<p>I have attached a patch that sets the boot file for pxe-service the
same way as for dhcp-boot. It may require dhcp-no-override where it
was not needed before. Could you please try it?<br>
</p>
<div>On 9/28/21 11:54, Shrenik Bhura wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hi Petr,</div>
<div><br>
</div>
<div>As per your guidance, we have enabled logging (LOG_ALL in
config/console.h) and recompiled the iPXE binaries. Below are
the latest observations.<br>
</div>
<div> <br>
</div>
<div>Restating the scenarios from the previous post for ease
of reference:<br>
</div>
<div>
<div>1. Default dnsmasq config with default ltsp's pxe-service
entries - <a href="https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing" target="_blank" rel="noreferrer">https://drive.google.com/file/d/1-BGnZw4RMAuIbJudVA2D4a1vasNeAd1j/view?usp=sharing</a></div>
<div>2. Custom pxe-service entries (just to prove that
pxe-service and dhcp-boot do seem to successfully co-exist)
- <a href="https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing" target="_blank" rel="noreferrer">https://drive.google.com/file/d/1-CjHXxlKmYw-9aOTD7xK8m5uAdj4qyAB/view?usp=sharing</a></div>
<div>3. Without pxe-service entries - <a href="https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing" target="_blank" rel="noreferrer">https://drive.google.com/file/d/1-6Q_1Fg6zVVNruzQTJjxvmKRRkRnCBmh/view?usp=sharing</a>
</div>
</div>
<div><br>
</div>
<div>I'll try to summarise our understanding and the remaining
ambiguities so far, to help assign responsibility for the several
things that may be going wrong here:</div>
<div><br>
</div>
<div>Between scenario (1) and (2), we see that ltsp.ipxe is
being served in (2) which doesn't happen in (1).</div>
<div>In (1), the primary issue is that EFI clients do not
receive snponly.efi, thus they do not advertise option 175 and
hence are not sent the ltsp.ipxe. Since it has not got to the
iPXE stage as yet, there are no logs available from ipxe. All
that is visible momentarily on the client side is these two
lines -</div>
<div><span><b>Station IP address is 192.168.67.134<br>
</b></span></div>
<div><span><b>PXE-E21: Remote boot cancelled.</b></span></div>
<div><span>Quoting from an explanation herein [1] for "Remote
boot cancelled" -</span></div>
<div><span><i>"
This message is also displayed when a DHCP/proxyDHCP
server sends a menu that auto-selects <b>Local Boot</b>
and when a bootserver sends a bootstrap program that
returns control to the PXE LoadFile protocol.
"</i><b><br>
</b></span></div>
<div><span><b><br>
</b></span></div>
<div><span>In scenario (2), PXE boot menu is displayed as
defined in the pxe-service lines, option 175 is received
back from the client, ltsp.ipxe is sent but is not
"downloaded" by the client. There is nothing reported in the
ipxe logs. On the client, the last line says - <br>
</span></div>
<div><span>No more network devices.<b> <br>
</b></span></div>
<div><span><b><br>
</b></span></div>
<div>But, above all, if we simply comment out all the
pxe-service lines, as in scenario (3), including the one with
tag:rpi, the EFI clients boot up perfectly. iPXE log has -<br>
</div>
<div><span>ipxe: Downloaded "ltsp.ipxe"<br>
</span></div>
<div><span>ipxe: Executing "ltsp.ipxe"</span></div>
<div><span>ipxe: Downloaded "vmlinuz"</span></div>
<div><span>ipxe: Downloaded "ltsp.img"</span></div>
<div><span>ipxe: Downloaded "initrd.img"<br>
</span></div>
<div><span>ipxe: Executing "vmlinuz"</span></div>
<div><br>
</div>
<div>The question thus arises: why does dnsmasq not ignore
the pxe-service lines that have an unmatched "tag:proxy" or
"tag:rpi" when it is operating in non-proxy mode? Or does
it ignore them, and the problem lies outside dnsmasq? With
respect to scenario (1), there could be a problem in the UEFI
implementation; with respect to (2), there could be an issue
with iPXE. What we can immediately control within dnsmasq,
though, is to ignore pxe-service lines whose tags have not been
set.<br>
</div>
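<div>To spell out what "ignore lines whose tags have not been set"
would mean, here is a small sketch of the tag rule as I understand
it from the dnsmasq man page: a line applies only when every plain
tag is set and every negated tag (tag:!name) is not. The function
names and the set of active tags are assumptions for illustration;
this is not the code path in rfc2131.c.</div>
<pre>
```python
def leading_tags(value):
    """Collect the tag:... items at the start of a pxe-service/dhcp-boot value."""
    tags = []
    for part in value.split(","):
        if not part.startswith("tag:"):
            break
        tags.append(part[len("tag:"):])
    return tags


def tags_match(required, active):
    """True when all plain tags are set and no negated tag is set."""
    for tag in required:
        if tag.startswith("!"):
            if tag[1:] in active:
                return False
        elif tag not in active:
            return False
    return True


# Assumed tags for an EFI client in non-proxy mode before iPXE is loaded.
active = {"X86-64_EFI"}

lines = [
    'tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi',
    'tag:rpi,X86PC,"Raspberry Pi Boot ",unused',
    "tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi",
]
for value in lines:
    print(tags_match(leading_tags(value), active), value)
```
</pre>
<div>Under that rule the two pxe-service values would be skipped for
such a client and only the dhcp-boot value would apply.</div>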
<div><br>
</div>
<div>Your thoughts?</div>
<div><br>
</div>
<div>[1] <a href="https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html" target="_blank" rel="noreferrer">https://techpubs.jurassic.nl/manuals/hdwr/enduser/SG750_UG/sgi_html/ch04.html</a></div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 27 Sept 2021 at 22:56,
Petr Menšík <<a href="mailto:pemensik@redhat.com" target="_blank" rel="noreferrer">pemensik@redhat.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote">
<div>
<p>Hello,</p>
<p>I made a mistake when reading the code. You are right.
The part I mentioned only affects the vendor-class
information option 43, and only in DHCPREQUEST or DHCPINFORM,
which is not in the request in the pcap you sent.</p>
<p>It seems to me the problem is somewhere on the iPXE side, in
decoding the reply dnsmasq sent to it. I took a look at the
second offer in both without-pxe and default-ltsp. It
seems the only difference is the vendorclass information
containing the PXE menu. Without pxe-service the client continues
to TFTP, whereas with the default config it gets stuck. The answer
lies on the decoding side. The assignment got the same boot file
successfully in both configurations. I am afraid the problem is in
the client's PXE decoding, which may not understand the menu dnsmasq
tried to send.</p>
<p>According to the option 43 decoding in Wireshark, the PXE
suboptions look fine, except suboption 9, the boot menu. Type
unknown 0x8000 does seem odd, but should just be vendor
use according to the IBM docs [1]. Why the client did not do anything
else should be answered by the iPXE people. It should continue
after 2 seconds even without any action. Did it at least display
the boot menu on that station? Did it show anything? Are
those machines with normal VGA output? Perhaps LOG_LEVEL
in iPXE [2] might reveal the true reason.</p>
<p>Cheers,<br>
Petr<br>
</p>
<p>1.
<a href="https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions" target="_blank" rel="noreferrer">https://www.ibm.com/docs/en/aix/7.2?topic=daemon-pxe-vendor-container-suboptions</a><br>
2. <a href="https://ipxe.org/buildcfg/log_level" target="_blank" rel="noreferrer">https://ipxe.org/buildcfg/log_level</a><br>
</p>
<div>On 9/27/21 16:04, Shrenik Bhura wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Hello Petr,</div>
<div><br>
</div>
<div>Thanks for your guidance.</div>
<div><br>
</div>
<div>It does seem that dhcp-boot is being reached even
when pxe-service is successfully executed. Taking a
hint from this discussion on UEFI and PXE (<a href="https://bbs.archlinux.org/viewtopic.php?id=237655" target="_blank" rel="noreferrer">https://bbs.archlinux.org/viewtopic.php?id=237655</a>),
we tried this custom configuration -</div>
<div><br>
<span>pxe-prompt="Press any key for boot menu",2<br>
pxe-service=X86-64_EFI,"PXELINUX
(X86-64_EFI)",ltsp/snponly.efi<br>
pxe-service=7,"PXELINUX (EFI)",ltsp/snponly.efi</span></div>
<div><span>dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe<br>
dhcp-boot=tag:!iPXE,tag:X86-64_EFI,ltsp/snponly.efi<br>
dhcp-boot=tag:iPXE,ltsp/ltsp.ipxe</span></div>
<div><br>
</div>
<div>(full file attached below)</div>
<div><br>
</div>
<div>The server does proceed to offer ltsp.ipxe to the
client via DHCP, but the file is ultimately not
transferred via TFTP.<br>
</div>
<div><br>
</div>
<div>I have attached logs, pcaps and the dnsmasq configuration
for three scenarios:</div>
<div>1. Default dnsmasq config with default ltsp's
pxe-service entries<br>
</div>
<div>2. Custom pxe-service entries <br>
</div>
<div>3. Without pxe-service entries<br>
</div>
<div><br>
</div>
<div>We have tested these with two systems - Intel NUC
and Dell Optiplex 3040 with their updated firmware and
have found the same results. <br>
</div>
<div><br>
</div>
<div>I hope this helps to zoom further into the problem
area. <br>
</div>
<div><br>
</div>
<div>Best regards,<br>
</div>
<div>Shrenik</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, 27 Sept 2021
at 17:00, Petr Menšík <<a href="mailto:pemensik@redhat.com" target="_blank" rel="noreferrer">pemensik@redhat.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote">Hi Alkis,<br>
<br>
It would be helpful if you could record pcaps with those lines both<br>
commented out and enabled. It seems suspicious that the dhcp-boot option<br>
is present at the same time as pxe-service. From what I understood,<br>
pxe-service should offer boot options only to clients with the PXEClient<br>
vendor string. I think it saves you the need for<br>
dhcp-match=set:X86PC,option:client-arch,0<br>
<br>
which is then matched in<br>
dhcp-boot=tag:!iPXE,tag:X86PC,ltsp/undionly.kpxe<br>
dhcp-boot=tag:iPXE,tag:X86PC,ltsp/ltsp.ipxe<br>
<br>
I just checked my Raspberry Pi 3. I guess the architecture the RPi puts in<br>
its DHCP request is simply wrong. Unfortunately, it also reports it wrongly<br>
in the vendorclass, as ARCH:0000.<br>
<br>
Anyway, dnsmasq might not handle tags correctly here. Around<br>
src/rfc2131.c:891, it searches for a pxe-service without using tags. That<br>
search is not used to find the correct service, just to find the correct<br>
context.<br>
<br>
Also, it seems that if any pxe-service is defined and the incoming DHCP<br>
packet contains PXEClient in the VendorClass option, it MUST be handled by<br>
pxe-service. If no correct service and context is found, no reply is<br>
produced for it. It cannot fall back to a normal DHCP reply in that case,<br>
which can be fixed. But the current situation seems clear to me: if any<br>
pxe-service is present, all PXEClient packets have to be handled by it.<br>
It seems to me you define tags per arch anyway, so I guess you can avoid<br>
pxe-service just fine.<br>
<br>
I made an attempt to respond to a PXE request only when the correct<br>
service matches. But I have no setup prepared for it, so I only tested<br>
that it compiles. Could you check whether it helps?<br>
<br>
Cheers,<br>
Petr<br>
<br>
On 3/19/21 10:05, Alkis Georgopoulos wrote:<br>
> Hi all,<br>
><br>
> I'm one of the LTSP developers; I asked Shrenik
to contact the dnsmasq<br>
> mailing list because I feel this might be a
dnsmasq issue.<br>
><br>
> Specifically, success or failure depends on
whether these five lines<br>
> are commented out or not:<br>
><br>
>
#pxe-service=tag:proxy,tag:!iPXE,X86PC,"undionly.kpxe",ltsp/undionly.kpxe<br>
>
#pxe-service=tag:proxy,tag:!iPXE,X86-64_EFI,"snponly.efi",ltsp/snponly.efi<br>
><br>
>
#pxe-service=tag:proxy,tag:iPXE,X86PC,"ltsp.ipxe",ltsp/ltsp.ipxe<br>
>
#pxe-service=tag:proxy,tag:iPXE,X86-64_EFI,"ltsp.ipxe",ltsp/ltsp.ipxe<br>
> #pxe-service=tag:rpi,X86PC,"Raspberry Pi Boot
",unused<br>
><br>
> You may find the full configuration files and
logs at:<br>
> <a href="https://github.com/ltsp/ltsp/pull/417" rel="noreferrer noreferrer" target="_blank">https://github.com/ltsp/ltsp/pull/417</a><br>
><br>
> The reason I feel it might be a dnsmasq issue, is
that these tags are<br>
> NOT matched in Shrenik's use case. He's not using
proxy mode and he's<br>
> not booting a Raspberry Pi.<br>
><br>
> So, "pxe-service" lines that are NOT matched,
cause the problem,<br>
> yet if they're commented out, the problem is
gone...<br>
><br>
> Would that be an issue with dnsmasq, or with the
UEFI PXE stack?<br>
><br>
> Thanks,<br>
> Alkis Georgopoulos<br>
><br>
> _______________________________________________<br>
> Dnsmasq-discuss mailing list<br>
> <a href="mailto:Dnsmasq-discuss@lists.thekelleys.org.uk" target="_blank" rel="noreferrer">Dnsmasq-discuss@lists.thekelleys.org.uk</a><br>
> <a href="https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss" rel="noreferrer noreferrer" target="_blank">https://lists.thekelleys.org.uk/cgi-bin/mailman/listinfo/dnsmasq-discuss</a><br>
><br>
-- <br>
Petr Menšík<br>
Software Engineer<br>
Red Hat, <a href="http://www.redhat.com/" rel="noreferrer noreferrer" target="_blank">http://www.redhat.com/</a><br>
email: <a href="mailto:pemensik@redhat.com" target="_blank" rel="noreferrer">pemensik@redhat.com</a><br>
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB<br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div>
</blockquote></div></div></div>