UEFI, SecureBoot, PXE, and You

For a while now we’ve had a need to PXE-boot computers that are set up for UEFI and SecureBoot but haven’t quite been able to pull it off.  For a long time, information on the subject was really difficult to come by and was mainly in the form of discussions by experts in the process of research and development.  I’m not an expert in the field of…anything really; I’m just an everyday computer repair grunt who knows enough about a lot of different things to make him wish he knew a lot about any one thing.  Over time I’ve occasionally taken the time to do a little more searching to see if tutorial-format information had become available and today I was not disappointed.

I came across this page on the Ubuntu wiki, and it was the glue I needed to finally make the pieces fit together in my mind.  It didn’t work exactly as described, so I’ll document my actual steps here.  I’m starting off with an already existing PXE environment that was working the old way, e.g. BIOS booting using pxelinux.  This is one of the big differences between my setup and the Ubuntu wiki page and that’s the starting point for this tutorial.  There are a lot of great resources out there for that so if you need help it’s only a web search away.

One thing to keep in mind is that my server is running CentOS, but my bootable PXE environment is LinuxMint.  What that basically means is that the config files will be CentOS related but I’m borrowing files from Ubuntu (since they have signed bootloader files easily documented in their tutorial).

First, here’s a list of files as per the Ubuntu wiki page:

  • shim.efi.signed from the shim-signed package, installed as bootx64.efi under the tftp root

  • grubnetx64.efi.signed from the grub2 source package (and shipped in the grub-efi-amd64-signed binary package), installed as ‘grubx64.efi’ under the tftp root

  • unicode.pf2 from the grub-common package, installed as grub/fonts/unicode.pf2 under the tftp root.

So getting these files is pretty easy.  We’re just going to extract them directly from the packages they belong to.  Now Ubuntu has a script on their page that is supposed to do this stuff, but I want to do it manually.  For one thing they have you grab grubnetx64 from the Saucy repo but that did not work for me.  It would load the grub menu, then work intermittently, otherwise giving two errors: “couldn’t send network packet” and “you need to load the kernel first”  Doing some searching it appears there have been bugs in that file fixed recently and the one from Trusty worked for me fine.  Here’s what we want to do (I ran these from my existing BIOS PXE environment):

apt-get download shim-signed
ar vx shim-signed_1.6+0.4-0ubuntu4_amd64.deb
tar -xvJf data.tar.xz
cp ./usr/lib/shim/shim.efi.signed ./bootx64.efi
# !! now you're ready to copy ./bootx64.efi to your tftproot
rm -rf ./usr data.tar.xz control.tar.gz debian-binary shim-signed_1.6+0.4-0ubuntu4_amd64.deb

wget -O grubx64.efi http://archive.ubuntu.com/ubuntu/dists/trusty/main/uefi/grub2-amd64/current/grubnetx64.efi.signed
# !! now copy ./grubx64.efi to your tftproot

apt-get download grub-common
ar vx grub-common_2.02~beta2-9ubuntu1_amd64.deb 
tar -xvJf data.tar.xz
cp ./usr/share/grub/unicode.pf2 ./
# !! now copy unicode.pf2 to your tpftproot under grub/fonts (e.g. /tftpboot/grub/fonts/)
rm -rf ./usr data.tar.xz control.tar.gz debian-binary grub-common_2.02~beta2-9ubuntu1_amd64.deb

Now we need to create a grub configuration file that will be stored on tftproot under the “grub” directory (e.g. /tftpboot/grub/grub.cfg).  Mine is a bit different from the one on the Ubuntu wiki since I’m mounting an NFS root and they were not.  My NFS root is on my server (192.168.1.15) under /exports/nfsrootqiana.  Keep in mind that in the kernel load lines , “(pxe)/” refers to the tftp root directory and that’s where the files vmlinuz-3.15.3 and initrd.img-qiana are located.  So here’s my config file:

# /tftpboot/grub/grub.cfg
set default="0"
set timeout=-1

if loadfont unicode ; then
 set gfxmode=auto
 set locale_dir=$prefix/locale
 set lang=en_US
fi
terminal_output gfxterm

set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
if background_color 44,0,30; then
 clear
fi

function gfxmode {
 set gfxpayload="${1}"
 if [ "${1}" = "keep" ]; then
 set vt_handoff=vt.handoff=7
 else
 set vt_handoff=
 fi
}

set linux_gfx_mode=keep

export linux_gfx_mode

menuentry 'Linuxmint Qiana' {
 gfxmode $linux_gfx_mode
 linux (pxe)/vmlinuz-3.15.3 $vt_handoff root=/dev/nfs initrd=initrd.img-qiana nfsroot=192.168.1.15:/exports/nfsrootqiana ip=dhcp rw
 initrd (pxe)/initrd.img-qiana
}

We also need to tweak our DHCP config to respond to UEFI PXE requests; dnsmasq is my DHCP server of choice, as it gives local DNS registration for free. Here is my entire updated config, including the lines that make the old BIOS PXE boot work:

# /etc/dnsmasq.d/dhcp.conf
dhcp-range=192.168.1.100,192.168.1.254,12h
dhcp-option=3,192.168.1.1
dhcp-option=15,alltech.local
dhcp-boot=pxelinux.0
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi
except-interface=wan0
dhcp-authoritative

Or if your tftp root is on a different host than the one running dnsmasq, you can use a  configuration like the one below.  In this case, dnsmasq is running on 192.168.1.1 and the tftp root is hosted from a machine named server1 with the ip 192.168.1.15. See man dnsmasq for full details regarding the syntax for each option.

# /etc/dnsmasq.d/dhcp.conf
dhcp-range=192.168.1.100,192.168.1.254,12h
dhcp-option=3,192.168.1.1
dhcp-option=15,alltech.local
dhcp-boot=pxelinux.0,server1,192.168.1.15
dhcp-match=set:efi-x86_64,option:client-arch,7
dhcp-boot=tag:efi-x86_64,bootx64.efi,server1,192.168.1.15
except-interface=wan0
dhcp-authoritative

On our old CentOS (5.10), the builtin dnsmasq didn’t work with that “tag:” syntax so I had to update the program.  The source version of dnsmasq doesn’t come with the init scripts for Red Hat so I had to cheat a little bit.  I built dnsmasq from source and did a “make install” like usual.  Then I simply edited /etc/init.d/dnsmasq and changed it like so:

# /etc/init.d/dnsmasq
# change...
dnsmasq=/usr/sbin/dnsmasq
# to...
dnsmasq=/usr/local/sbin/dnsmasq

Now it will happily use the nice new version of dnsmasq. No fuss, no muss.

We should now be able to boot PXE from either a BIOS motherboard or a UEFI motherboard with or without SecureBoot enabled. For the curious, here’s the sequence of events as I understand it. Please correct me in the comments if I get something wrong:

  1. You tell the computer you want to boot via PXE.  It sends out a PXE request.
  2. DHCP Server initially sets the boot image file to pxelinux.0
  3. If booting from UEFI, the DHCP Server sees that the client architecture is 7 (EFI) and sets the tag “efi-x86_64”.
  4. If the efi-x86_64 tag is set, the DHCP Server switches the boot image to bootx64.efi (otherwise, the PC boots from pxelinux.0)
  5. DHCP Server sends the response to the client
  6. (from here on, I’m following UEFI sequence) The client requests the file bootx64.efi via TFTP
  7. SecureBoot checks the signature on bootx64.efi, and it has a valid Microsoft signature
  8. The firmware then loads and runs bootx64.efi
  9. bootx64.efi looks for grubx64.efi via tftp and checks its signature.  It’s signed by Canonical, and passes the check
  10. bootx64.efi loads and runs grubx64.efi, which in turn loads the grub config from tftp
  11. Normal grub boot sequence occurs, and we’re cooking with gas
Advertisements

Why uid != username

Due to various problems with Ubuntu Server, I’m migrating our server at the shop to CentOS.  Part of that migration included copying over the chroot environments that we use for PXE booting.  When it comes to copying large amounts of data over the network, especially when the permissions and ownership of that data is crucial, rsync is always my goto tool.  It can use compression to save network bandwidth, uses delta copies for files that are already copied but have changes, and has options to delete files from the destination that don’t exist in the source.  On the whole, rsync is pretty easy to use as well and the man page has EXCELLENT examples.  Here’s a simplified version of what I was using (the real one uses ssh tricks to get rsync to run as sudo on the other end):

rsync -avz 192.16 8.1.14:/nfsrootmav/ /nfsrootmav/

Looks good, right?  What this tells rsync to do is copy recursively and maintain ownership/permissions (-a), use verbose output messages (-v), and use compression during the copy (-z).  I ran this command and all my files copied happily and without complaint.  When I tried to test the copied chroot environment, however, it wasn’t quite right.  When I tried to shut the computer down (booted to the chroot over PXE) it logged out of Gnome instead.  The shutdown command in GDM didn’t work either.

I knew that the only difference between the original and the copy is that the copy…had been copied.  This lead me in the direction of permissions, though I couldn’t understand why they’d be different if rsync was running in archive mode.  I tried running diff on `ls` dumps of the original vs. the copied directories, but I never really got the outputs on the different servers right for the diff to tell me everything I wanted to know.  What I did glean from my diff was that if I looked at the uid’s and gid’s of the files numerically, the original and copy had different ownerships on some files!

What?  How could that be?  Then it occurred to me…I had copied the chroot from the host OS, not from within the chroot.  It turns out that by default rsync copies the user and group owners of a file by name, not uid and gid.  That makes sense, doesn’t it?  If I have a file that’s owned by, say, “messagebus” on one computer and I copy it to another, I want it to be owned by “messagebus” on the other computer too even if the “messagebus” user has a different numeric uid.  It went wrong here because files are stored with a numeric uid/gid, and any time an operation requests or sets the text user/group then they are simply looked up in /etc/passwd or /etc/group.  The files in my chroot had users that matched uids in it’s own /etc/passwd, but rsync was referencing the host OS’s /etc/passwd.

The solution?  rysnc has an option called “–numeric-ids”.  This will cause rsync to use the numeric uid’s and gid’s directly rather than trying to look up the plain text names.  I added that and my files copied properly.  The PXE-booted computer shutdown as it should and all was well with the world.

I actually never would have had this problem if I’d chrooted into my /nfsrootmav directory and rsynced out from there because rsync would have been looking at the right /etc/passwd and /etc/group, but then I never would have learned this great lesson.

New server at the shop

It has been a long time since I’ve written anything, I know. I’ve been so busy recently with several different things; I’ve wanted to write about all of it but I just haven’t had the time. Maybe I’ll cover a little bit more of it soon but right now I wanted to talk about an interesting problem we had with the new server at the computer shop.

On the hardware side it’s a Dell PowerEdge 1950 with a quad-core Xeon 2.33GHz CPU, 4 GB RAM, and a mirrored 500 GB RAID array. It’s not what you’d call cutting edge in the data-center but it’s a far cry from the dorky little single-core Pentium 4 thing we had been using.

Software-wise it’s running Ubuntu 10.04 Server which is taking care of our file-server needs along with housing our squid caching proxy and PXE boot environments. Since this server is equipped with dual gigabit LAN we bit the bullet and bought a couple of gigabit switches as well, which has really made a difference in Quickbooks’ performance alone.

For various reasons (not least of which is Quickbooks) all of our workstations in the shop but mine are running Windows which means file sharing must be done over SMB using Samba. I set everything up as usual with mostly defaults in my smb.conf. The shares were added as usershares with full write access by everyone for the sake of simplicity; that shouldn’t be a problem since iptables is blocking all traffic originating from the WAN port.

Since everyone was so excited about the gigabit stuff, my boss fired up a couple of file transfers and began listening to mp3 files over SMB just to see what kind of performance we could get. The funny thing was that every so often the music would just hang for no reason for 20-30 seconds at a time. During the hang his computer wouldn’t be able to browse the file shares at all and all file transfers would hang as well. I checked the system logs on his computer and dmesg on the server and neither mentioned anything related to the problem. Finally I found the Samba session logs (these are in /var/log/samba on Ubuntu) which are unique per client and saw lines in it like this:

"Unable to connect to CUPS server localhost:631 - Connection timed out"

Timed out? Yeah, that sounds like it. A bit more investigation led me to realize the timing of the pauses and the errors seemed to correlate. Searching for that error on the Internet was very difficult, though; it seemed like everybody with a Samba client log posted on the web had that error and nearly all the posts I found we’re asking about unrelated issues.

Disgusted with searching for a bit, I tried the obvious solution of commenting out all printer related lines in smb.conf. No luck…apparently printing support is enabled by default even if it isn’t configured. Back to searching.

Finally I found one guy who was complaining about a big lag during log-in on password protected shares and was getting that same CUPS error. He figured out how to get Samba to stop trying to connect to CUPS using these configuration lines. He admits that they might not all be needed, but it doesn’t hurt my feelings to smash a bug a little harder than necessary:

load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes

Around the same time I also found that many people were saying that this particular configuration tweak was supposed to give better throughput when serving Windows clients:

socket options = TCP_NODELAY

I shamelessly applied both fixes at once.  At the time I just really needed it to work (we’d already moved our Quickbooks company file) and didn’t care too much which one fixed it so long as it was fixed.  And it was.

No one really mentioned the socket options tweak affecting any sort of hangs or delays so I’m certain that disabling print services in Samba was the real kicker.  Apparently there are very few people who set up Samba on a server without CUPS, but we have no need for a print server like that.  We don’t have a big copier or anything; our only printer is a small black and white laser at the front desk which is 30 feet and several walls away from our server.

So we have another problem solved and I feel pretty good about it. It’s nice to find the solution to a tough problem and I thank the good Lord for helping me.