2 September 2019

PXEBooting Ubuntu

Now that the broken laptops are mostly in a usable state (well, about 67% anyway) and I have some spare time at the end of vacation, I want to get Kubernetes running on them. First, I need a base install. And I don’t want to have to do that manually. Setting up a PXE boot server so these things automatically install an OS should be fairly straightforward, right?

Why do I punish myself this way?

Despite having managed systems like this before, for literally years, this was more painful than I expected. I never learn, though–I do things differently every time (because exciting! new! things!) and then I have to figure out a whole bunch of new stuff even though I’m doing the same old thing. (Because I’m not, really.)

Overview

Broken laptops

In January 2018 I bought a bag (literally) of broken laptops from a local guy wanting to get out of the laptop-repair business. Of the six laptops, all of which had higher specs than my home desktop–I am not a fancy man–I eventually managed to get five of them to work in some capacity. The sixth, alas, had no screen, which meant it would completely refuse to boot and according to the intertubes there was no way around it, at least with that model.

The most powerful laptop, with the nicest screen, had a broken and irreparable hinge assembly, so it was basically useless as a laptop. I re-soldered the loose power connection, which was incredibly tricky given the lack of wire to work with, mounted it on a piece of aluminum tread plate I bought at a hardware store, and now it hangs on my wall. With a wireless keyboard/mouse combo, it’s my new workstation/objet d’art.

That leaves me with four laptops for my Kubernetes cluster, which is completely adequate. One still has a thermal issue because it requires a thermal pad instead of paste for its graphics chip, and none of the local shops know what a thermal pad is. So I have three I can use.

Except I’m temporarily using one for something else and haven’t moved that off of it yet, so I have two. But that’s fine. I will have one master and one worker. That’s good enough to start, and with this automated process I should be able to rebuild pretty easily. So let’s go.

End goals

I am working towards a Kubespray installation. I have used Kubespray at work a number of times (and have submitted various patches to its configuration and documentation) but that has all been with OpenStack. Since I will be using bare metal, deploying Kubernetes should be quite a bit simpler. I have found with Kubespray that the bug discovery, patching and reconfiguration I have to do literally every time I deploy is with the Terraform stuff that handles OpenStack to build a node inventory. The Ansible part that takes over is generally fairly solid, since it’s much more tested.

So to that end I need to have these laptops built up with a base OS install with SSH access, hopefully passwordless. Each node must have its hostname defined but that’s about it.

Build up intranet management server

I have set up a separate intranet for the Kubernetes cluster. My old workstation, Blue, has two NICs and I’ve bridged the second, previously unused NIC for this purpose. Blue also has mirrored hard drives and so is a sturdy box for the more critical tasks. It runs LXC containers under LXD. LXC containers are more system-like than Docker containers, so they’re sort of like “lite” VMs. One of these containers, Cyan, will be the intranet management server and will tell the laptops who they are and what they do.

Host configuration

To start with, I created a bridge interface on Blue using the second NIC. I called it k8sbr, and I now use the k8sbr device instead of eth1. I have the following in /etc/network/interfaces:

# k8s cluster
auto k8sbr
iface k8sbr inet static
    address 192.168.4.1
    network 192.168.4.0
    netmask 255.255.255.0
    gateway 192.168.4.1
    broadcast 192.168.4.255
    dns-nameservers 192.168.1.2 192.168.1.254
    bridge_ports eth1
    bridge_fd 0
    bridge_maxwait 0

Blue is an Ubuntu 16.04 box. Newer releases use Netplan so that configuration will be different.

LXC configuration

When I originally created Cyan, I built it up using lxc commands. I then set this whole project aside for months. I have now rebuilt Cyan using Terraform, which has worked quite well, although I am still working some things out.

Once Cyan was up, I added the bridged network for the Kubernetes cluster using:

lxc config device add cyan eth1 nic nictype=bridged parent=k8sbr

This means Cyan has the regular LXD network on eth0 and the Kubernetes (“k8s”) intranet on eth1. This allows me to get to Cyan from workstations without using a bastion host (aka jump host) because the default network for the LXC containers is bridged to the normal house intranet.

I also have the Ubuntu 18.04 ISO mounted on Blue at /mnt/ubuntu-18.04.2-iso and make it available on Cyan with the following:

lxc config device add cyan u18iso disk source=/mnt/ubuntu-18.04.2-iso path=/mnt/ubuntu-18.04.2-iso

This will be necessary later.

DHCP/DNS server

Once Cyan is ready for direct management, I log in and set up DHCP and DNS services. In a previous iteration I used the ISC DHCP server package, with which I am somewhat familiar. On the second build (remember how I mentioned I set this aside for some time) I restarted with Dnsmasq. This is what OpenWRT uses, which I ran for years on an early-generation WRT-54G router. It’s simple, well-documented and stable, it handles DHCP, DNS and TFTP, and its sample configuration covers PXE booting.

The following is the result of grep -Ev '^#|^\s*$' /etc/dnsmasq.conf with some comments added back in:

# I don't want to serve DHCP for the house intranet.  This is badly serviced
# by the wireless router--I might replace that but not yet.
no-dhcp-interface=eth0

# Add domain to hostnames
expand-hosts
domain=k8s

# DHCP range
dhcp-range=192.168.4.16,192.168.4.128,12h

# read MAC addresses from /etc/ethers, which is just sort of tidy
read-ethers

# these provide necessary information for PXE booting
dhcp-option-force=208,f1:00:74:7e
dhcp-option-force=210,/tftp/
dhcp-option-force=211,30i
dhcp-boot=pxelinux.0

# enable and configure TFTP
enable-tftp
tftp-root=/tftp

# only serve files under /tftp which are owned by user running dnsmasq
tftp-secure

Dnsmasq is straightforward enough that this basically worked on the first try. It took me a little time to work through the well-commented, self-documented /etc/dnsmasq.conf provided by the APT package, but I basically got everything right. This is a pretty impressive piece of software.
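With read-ethers, Dnsmasq picks up host entries from /etc/ethers, where each line is a hardware address followed by a hostname or IP address. As a minimal sketch, the helper below formats one such line and rejects malformed MAC addresses. The function name, the MAC address and the node name are all made up for illustration.

```shell
# Print one /etc/ethers line ("MAC name") for a host.
# ethers_line is a hypothetical helper, not part of Dnsmasq.
ethers_line() {
    mac=$1 host=$2
    # Accept exactly six colon-separated hex octets.
    if ! printf '%s\n' "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'; then
        echo "invalid MAC address: $mac" >&2
        return 1
    fi
    # Normalise the MAC to lowercase so entries are easy to compare.
    printf '%s %s\n' "$(printf '%s' "$mac" | tr 'A-F' 'a-f')" "$host"
}
```

Something like `ethers_line AA:BB:CC:DD:00:11 node1 >> /etc/ethers` followed by a Dnsmasq restart would then register the host. Running `dnsmasq --test` beforehand syntax-checks the main configuration file.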

Masquerading

Cyan will also need to provide an outgoing network route for the k8s intranet. Enter good old IP masquerading.

The following commands must be applied to support this:

# eth0 is the outward-facing (house) network; eth1 is the k8s intranet
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT

Also, IP forwarding must be turned on, which seems to be the default:

net.ipv4.ip_forward = 1

If not, this can be set through sysctl and made persistent by editing /etc/sysctl.conf or adding it to an appropriate file in /etc/sysctl.d/.
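A minimal sketch of checking and enabling forwarding follows. The function names and the file name 99-k8s-forward.conf are arbitrary choices for illustration, and enable_forwarding needs root.

```shell
# Succeed if IPv4 forwarding is on. The path is a parameter only so the
# check can be tried against an ordinary file instead of /proc.
forwarding_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Turn forwarding on for the running kernel, then persist the setting
# in a drop-in file so it survives a reboot. Requires root.
enable_forwarding() {
    sysctl -w net.ipv4.ip_forward=1 &&
    echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-k8s-forward.conf
}
```

On Cyan this would be used as `forwarding_enabled || enable_forwarding`.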

PXE booting

PXE stands for Preboot eXecution Environment. It is a standard way for a booting computer to fetch its instructions over the network before it loads an OS. When you choose “Network boot” from your computer’s BIOS boot manager, this is what it means.

PXE booting works by sending out a DHCP query and receiving, as part of the normal response, an extended portion that identifies a file to download and execute. In this setup that file is PXELINUX, a network variant of the SYSLINUX boot loader typically used to boot Linux machines, and it is maintained under that project. It sets up a minimal operating environment to begin OS installation.

TFTP boot and sources

TFTP (Trivial File Transfer Protocol) is handled by Dnsmasq, as configured above. TFTP is a basic file transfer service with no authentication. It is presumably simpler to implement than a basic web server and client (full disclosure: I’ve never implemented the TFTP or HTTP protocols) and so is suitable for the minimal operating footprint of a booting client.

The first file made available for PXE booting is pxelinux.0. This is the file pointed to by the DHCP extensions described above. Once this loads, it looks for a PXE configuration for the host by searching the pxelinux.cfg directory for the first file matching the host, in the following order:

abcdef01-2345-6789-abcd-ef0123456789
01-aa-bb-cc-dd-00-11
C0A8025B
C0A8025
C0A802
C0A80
C0A8
C0A
C0
C
default

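In this example, the first file matches a client with that UUID and the second matches a client with that MAC address (prefixed with 01-, the hardware type for Ethernet; PXELINUX looks for this name in lowercase hex). The files after that represent the client’s IPv4 address converted to uppercase hex, here 192.168.2.91, with one digit dropped at each step so that each name matches a wider range of addresses. The last, default, matches any host that hasn’t matched any of the previous names.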
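To see where those names come from, here is a small sketch that derives them for a given client. The two function names are my own invention; only the naming scheme itself comes from PXELINUX.

```shell
# Print the IP-based names PXELINUX tries for a dotted-quad address,
# most specific first, ending with "default".
# The body runs in a subshell so the IFS change does not leak out.
pxe_ip_names() (
    IFS=.
    set -- $1   # split the address into its four octets
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    while [ -n "$hex" ]; do
        echo "$hex"
        hex=${hex%?}   # drop the last hex digit
    done
    echo default
)

# Print the MAC-based name: "01-" plus the address in lowercase hex,
# with dashes in place of colons.
pxe_mac_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}
```

Running `pxe_ip_names 192.168.2.91` prints C0A8025B, then each shorter prefix down to C, then default.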

I have only created default because any host using this DHCP server during a network boot should be an install candidate. When the machines boot off of their hard drives and are already running the OS when they first send out a DHCP request, they will ignore the PXE boot extensions. (Possibly they won’t request them. I could go look that up, but not right now.)

I also recursively copy the contents of /mnt/ubuntu-18.04.2-iso/install/netboot to the TFTP root. These files are either expected by the PXE boot binary in pxelinux.0 or referenced by the configuration, described in the next section.
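The copy itself could be sketched as below. The function name is hypothetical. The optional owner argument is there because tftp-secure only serves files owned by the user running Dnsmasq, and which user that is (I would assume dnsmasq on a stock Ubuntu package) should be checked rather than trusted.

```shell
# Copy the contents of the netboot directory into the TFTP root.
# Usage: stage_netboot SRC DEST [OWNER]
stage_netboot() {
    src=$1 dest=$2 owner=${3:-}
    mkdir -p "$dest" || return 1
    # "SRC/." copies what is inside SRC rather than SRC itself.
    cp -R "$src"/. "$dest"/ || return 1
    # Changing ownership is optional and normally requires root.
    if [ -n "$owner" ]; then
        chown -R "$owner" "$dest"
    fi
}
```

In this setup the call would be something like `stage_netboot /mnt/ubuntu-18.04.2-iso/install/netboot /tftp dnsmasq`.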

PXE configuration

The PXE configuration contained in /tftp/pxelinux.cfg/default looks like this:

 DEFAULT linux
  SAY Now starting k8s node install from PXE...
 LABEL linux
  KERNEL ubuntu-installer/amd64/linux
  APPEND auto=true vga=788 initrd=ubuntu-installer/amd64/initrd.gz \
    preseed/url=tftp://192.168.4.3/preseed/k8s.preseed \
    preseed/interactive=false locale=en_CA.UTF-8 \
    console-setup/ask_detect=false console-setup/layoutcode=us \
    keyboard-configuration/layoutcode=us \
    mirror/http/mirror=ca.archive.ubuntu.com netcfg/get_hostname=

This tells PXELINUX to load the installer’s kernel and initial ramdisk, tells the installer where to find the preseed configuration, and sets some options for the Ubuntu installer that are needed before the preseed file is downloaded, such as locale and keyboard layout.

Preseeding

Preseeding was not straightforward.

Preseeding is Ubuntu’s (actually, Debian’s) automation of its installer. Every decision requiring interactive response in the Ubuntu installer can have an answer pre-selected in the preseeding configuration. This is similar to the Kickstart system developed by Red Hat for that distribution, but less well documented, less user-friendly and less popular (according to my experience, opinion and impression, respectively, at least). I have experience with Kickstart and found Ubuntu’s preseeding to be rather finicky.

Actually, scratch that: the samples are well-commented, and I built the preseeding configuration from a current example in the 18.04 documentation, in the same manner as I configured Dnsmasq. It just required a lot more tweaking, online research and frustration.

Preseeding configuration is way out of scope, but for reference, here is the configuration that has worked at this point:

d-i debian-installer/language string en
d-i debian-installer/country string CA
d-i debian-installer/locale string en_US.UTF-8
d-i console-setup/ask_detect boolean false
d-i keyboard-configuration/xkb-keymap select us
d-i netcfg/choose_interface select auto
d-i netcfg/get_hostname string unassigned-hostname
d-i netcfg/get_domain string unassigned-domain
d-i netcfg/wireless_wep string
d-i mirror/country string ca
d-i mirror/http/hostname string ca.archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
d-i mirror/http/proxy string
d-i passwd/root-login boolean true
d-i passwd/make-user boolean false
d-i passwd/root-password-crypted password <crypt>
d-i user-setup/encrypt-home boolean false
d-i clock-setup/utc boolean true
d-i time/zone string Canada/Pacific
d-i clock-setup/ntp boolean true
d-i partman-auto/disk string /dev/sda
d-i partman-auto/method string regular
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-md/device_remove_md boolean true
d-i partman-auto/choose_recipe select atomic
d-i partman-md/confirm boolean true
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i live-installer/net-image string tftp://192.168.4.3/ubuntu/install/filesystem.squashfs
tasksel tasksel/first multiselect openssh-server
d-i pkgsel/upgrade select safe-upgrade
d-i pkgsel/language-packs multiselect en
d-i pkgsel/update-policy select unattended-upgrades
d-i pkgsel/updatedb boolean false
d-i grub-installer/only_debian boolean true
d-i grub-installer/with_other_os boolean true
d-i grub-installer/bootdev  string /dev/sda
d-i finish-install/reboot_in_progress note
d-i cdrom-detect/eject boolean false

This almost certainly involves unnecessary statements as I struggled to figure out how to skip the mirror selection. It’s worth grabbing a recent example configuration and working through the well-commented file, with the above as an added reference for a known-to-work configuration.
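The `<crypt>` placeholder in the passwd/root-password-crypted line stands for a crypt(3) hash of the root password. One way to produce one, as a sketch: the wrapper name is made up, and the -6 (SHA-512) option needs OpenSSL 1.1.1 or newer. `mkpasswd -m sha-512` from the whois package is an alternative.

```shell
# Read a plain-text password on stdin and print a SHA-512 crypt hash.
# Using stdin keeps the password out of the process list.
crypt_password() {
    openssl passwd -6 -stdin
}
```

For example, `printf '%s\n' 'my-password' | crypt_password` prints a string starting with `$6$` that can be pasted in place of `<crypt>`. The finished preseed file can then be format-checked with `debconf-set-selections -c`.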

Client configuration

I have configured the BIOS on the client laptops to boot from the hard drive by default. In this way, if I ever want to rebuild the OS on one of these machines, I press F12 at the BIOS boot screen to engage the network boot; otherwise, the laptop boots as previously configured. It is possible to control this via the PXE boot process itself, by configuring the BIOS to boot from the network and updating the PXE configuration to tell the machine either to boot from its disk or to re-install. For this situation, though, the F12 approach is simpler to implement and manage going forward.
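For reference, the PXE-controlled alternative could be sketched like this. It is not what I run. The function name is hypothetical, and it assumes a machine with no per-host file falls through to the install configuration in default. A per-host file containing PXELINUX’s LOCALBOOT directive sends that machine to its own disk instead.

```shell
# Switch one machine between booting from disk and re-installing.
# Usage: pxe_set_mode CFGDIR MAC local|install
pxe_set_mode() {
    cfgdir=$1 mac=$2 mode=$3
    # Per-host file name: "01-" plus the MAC in lowercase hex with dashes.
    name="01-$(printf '%s' "$mac" | tr 'A-F:' 'a-f-')"
    case $mode in
        local)
            # LOCALBOOT 0 tells PXELINUX to continue with a normal local boot.
            printf 'DEFAULT local\nLABEL local\n  LOCALBOOT 0\n' > "$cfgdir/$name"
            ;;
        install)
            # With no per-host file, the machine falls back to "default".
            rm -f "$cfgdir/$name"
            ;;
        *)
            echo "usage: pxe_set_mode CFGDIR MAC local|install" >&2
            return 1
            ;;
    esac
}
```

After a successful install, `pxe_set_mode /tftp/pxelinux.cfg AA:BB:CC:DD:00:11 local` would pin that laptop to its disk, and switching it back to `install` would queue a rebuild on the next boot.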

In order to avoid confusion about which NIC to boot with, I have removed the WiFi card from each laptop. This is a very simple operation on most laptops for some reason; changing out the hard drive is often harder. I could specify the NIC in the boot parameters, but since I don’t need WiFi and I don’t want to worry about configuring (disabling) unnecessary networks (and attack vectors), I have simply removed the cards. I’ve set them aside just in case, and will undoubtedly come across them years from now and remember this project fondly, long after the laptops have been retired and recycled.