[bitfolk] Thoughts on pvgrub, proposal for switching

Author: Andy Smith
Date:  
To: users
Subject: [bitfolk] Thoughts on pvgrub, proposal for switching

gpg: Signature made Mon Feb 13 00:54:42 2017 UTC
gpg: using DSA key 2099B64CBF15490B
gpg: Good signature from "Andy Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andrew James Smith <andy@strugglers.net>" [unknown]
gpg: aka "Andy Smith (UKUUG) <andy.smith@ukuug.org>" [unknown]
gpg: aka "Andy Smith (BitFolk Ltd.) <andy@bitfolk.com>" [unknown]
gpg: aka "Andy Smith (Linux User Groups UK) <andy@lug.org.uk>" [unknown]
gpg: aka "Andy Smith (Cernio Technology Cooperative) <andy.smith@cernio.com>" [unknown]
Hello,

As you may be aware, it's long been on the BitFolk TODO list to
implement a better way (than pygrub) to boot your VPSes.

Unfortunately I have to explain some rather lengthy history so you
can know how things arrived at where they are now, and what the
options are for the future.

If you want to skip all that then please do at least skip down to
the heading "== Now ==" so you are aware of what I am proposing to
do. Without reading the history though it may not make sense.

== Ancient times ==

Kernels and initrds had to live outside of the guest in dom0. Users
couldn't update their own kernel but had to find some means to sync
their kernel and initrd to some place in the dom0 (the privileged
guest that controls everything else). People came up with various
hacks to allow users to do that, the disadvantage being that quite a
lot of that had to run in dom0.

== Still rather a long time ago ==

Someone invented pygrub, a Python implementation of something that
looked a bit like GNU GRUB 0.x. It was capable of looking inside an
image file, block device or partitions on same, to try to find a
file called menu.lst.

If it found that file it would try to parse it like GRUB would,
display a menu emulating GRUB's menu, and pass the chosen kernel,
initrd and boot arguments to Xen.
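
To make that concrete, here is a minimal Python sketch of the sort of
parsing involved. It is not pygrub's actual code: the function name
parse_menu_lst and the sample file contents are made up for
illustration, and it only understands the title, kernel and initrd
directives of a GRUB-legacy menu.lst.

```python
def parse_menu_lst(text):
    """Parse GRUB-legacy menu.lst text into a list of boot entries.

    Sketch only: handles the `title`, `kernel` and `initrd` directives and
    ignores everything else (default, timeout, root, comments).
    """
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        directive = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if directive == "title":
            # Each `title` line starts a new menu entry.
            current = {"title": rest, "kernel": None, "args": "", "initrd": None}
            entries.append(current)
        elif current is not None and directive == "kernel":
            # First word is the kernel path, the remainder is the command line.
            kparts = rest.split(None, 1)
            current["kernel"] = kparts[0] if kparts else None
            current["args"] = kparts[1] if len(kparts) > 1 else ""
        elif current is not None and directive == "initrd":
            current["initrd"] = rest
    return entries


# Hypothetical menu.lst contents, for illustration only.
SAMPLE = """\
default 0
timeout 5

title  Example Linux
root   (hd0,0)
kernel /boot/vmlinuz root=/dev/xvda1 ro
initrd /boot/initrd.img
"""

for entry in parse_menu_lst(SAMPLE):
    print(entry["title"], entry["kernel"], entry["args"], entry["initrd"])
```

The kernel path, initrd path and arguments of the chosen entry are the
three things that get handed to Xen.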

Advantages:

- Users could install their own kernel packages, since these would
maintain a menu.lst file in their guest, and pygrub would then
pick up the changes when they next booted.

- Users could control their own kernel command line since this also
was in the menu.lst file.

Disadvantages:

- It still all runs in dom0. pygrub is opening an actual filesystem
that is supplied by the user, in the context of a user in dom0,
and that is not a particularly safe thing to be doing.

- It isn't actual GRUB, it's just trying to emulate it. That means
it doesn't behave quite the same and doesn't support every single
thing that is valid in a real GRUB menu.lst file.

- As a further consequence of the above, pygrub is relying on
userspace filesystem access libraries that aren't part of actual
GRUB. This sometimes led to surprising discrepancies between what
GRUB was expected to support and what pygrub would support, such
as XFS filesystems, XZ-compressed initrds, etc.

- GRUB 0.x is now referred to as GRUB-legacy. GRUB moved on to 1.x,
it got a lot more complicated, and it now puts its configuration
in /boot/grub/grub.cfg. Many Linux distributions gave up on
GRUB-legacy and only support grub.cfg now, so users of such are
left to maintain their own menu.lst file.

On Debian and Ubuntu you can still install the package grub-legacy
and it will maintain menu.lst, but it's getting increasingly
persistent about wanting you to move to grub.cfg.

This is the boot method that BitFolk has almost always been using.
Up until a couple of months ago there was still one customer who for
various reasons had to have a kernel hard-coded in their dom0 config
file, but now there's none. Everyone's on pygrub.

== Recent-ish history ==

Around 2010, in light of the above disadvantages, a member of the Xen
project ported GRUB 0.x to boot as a paravirtual guest under Xen.
That's the same sort of thing that your VPSes are. So you get a VM
started, and the VM runs actual GRUB. Once GRUB knows what it wants
to boot, it chain loads to that.

This was known as "pvgrub", an awful name really since it differs
from pygrub in only the descender of one letter. Should have passed
that one by the marketing department first.

Advantages:

- Runs only inside a VM.

- Still allows user control.

Disadvantages:

- Part of the Xen code tree, not really packaged anywhere.

- Still only GRUB-legacy support.

Due to it still only supporting GRUB-legacy, which wasn't any
different from what pygrub supports, BitFolk did not pursue it.

We ran into a couple of sticky situations such as when XZ-compressed
kernels became popular in Debian and pygrub didn't support that, but
we worked around it¹.

Still it became more pressing to support a better boot system
because it would only be a matter of time before more distributions
cease to support menu.lst or do something else that pygrub doesn't
support.

== Just 3¼ years ago! ==

A GNU GRUB committer added PV-booting support to upstream GRUB 2:

    https://lists.xen.org/archives/html/xen-devel/2013-11/msg01216.html


What this means is that using only the upstream GRUB 2 binaries you
can generate a GRUB-as-kernel image that can be booted as a Xen PV
virtual machine, which can then do everything that GRUB 2 can
normally do: namely, look inside your block devices for GRUB configs
and boot your own kernels.

It took a couple of years for this to filter down to being supported
in Debian stable and to iron out some bugs, so, fast forward to…

== Now ==

My recent experiments with GRUB indicate that it may now be a
good time to switch to this method of booting instead of pygrub.

Here's how I am thinking of doing it.

- BitFolk's 64-bit or 32-bit GRUB2 image is booted as your kernel,
in your VM.

- It checks for existence of:

    - /boot/xen/pvboot-x86_64.elf
    - /xen/pvboot-x86_64.elf
    - /boot/xen/pvboot-i386.elf
    - /xen/pvboot-i386.elf


on any of your block devices/partitions.

If any of those are found then a menu entry for chainloading to
your own bootloader at that path is created.

- It checks for existence of:

    - /boot/grub/grub.cfg
    - /grub/grub.cfg


on any of your block devices/partitions.

If either of those is found then a menu entry for reading your
GRUB 2 config from that path is created.

- It checks for existence of:

    - /boot/grub/menu.lst
    - /grub/menu.lst


on any of your block devices/partitions.

If either of those is found then a menu entry for reading a
GRUB-legacy config from that path is created.

- The available menu entries are displayed for 5 seconds. If no
selection is made then a boot attempt is made for each menu
entry, starting at the top and going in order.

- Selection of a menu entry (or allowing it to time out and be
selected by default) will result in some text being displayed for
2 seconds explaining what is going to happen, e.g.:

    Loading your GRUB config at (xen/xvda,msdos1)/boot/grub/grub.cfg...


would indicate that a GRUB2 config was found on the first MSDOS
partition of the Xen disk called xvda.

The message and 2 second wait could be cancelled by pressing
Escape though obviously you'd have to be looking at the console to
do that.
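
For anyone unfamiliar with GRUB's way of naming things, a location
like the one in the message above breaks down into a disk, an optional
partition and a path. The following Python sketch splits such a string
apart. The function name split_grub_path is made up for illustration,
and the partition-less form "(xen/xvda)" is my assumption of how a
filesystem placed directly on the disk would be named.

```python
import re

# Matches strings such as "(xen/xvda,msdos1)/boot/grub/grub.cfg".
# The ",partition" part is optional, e.g. "(xen/xvda)/boot/grub/menu.lst"
# (assumed form for a filesystem directly on the disk).
GRUB_PATH = re.compile(
    r"^\((?P<disk>[^,()]+)(?:,(?P<partition>[^()]+))?\)(?P<path>/.*)$"
)


def split_grub_path(location):
    """Split a GRUB-style location into (disk, partition, path).

    `partition` is None when the location names a whole disk.
    """
    match = GRUB_PATH.match(location)
    if match is None:
        raise ValueError("not a GRUB-style location: %r" % (location,))
    return match.group("disk"), match.group("partition"), match.group("path")


print(split_grub_path("(xen/xvda,msdos1)/boot/grub/grub.cfg"))
```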

As you can see, the order of boot methods is:

1. Chainload to your own bootloader.
2. Use your GRUB2 config.
3. Use your GRUB-legacy config.

Although there is some limited interactivity in this proposal (you
get to select a menu item if you're looking at your console),
obviously servers do need to boot automatically, so assuming no
intervention then after 5 seconds it's going to pick the first
available from the above and then after 2 seconds more it will do
it.
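
The detection and ordering described above can be modelled in a few
lines of Python. This is only an illustrative sketch, not the
configuration that is embedded in the GRUB image. The function name
build_menu, the method labels, and the idea of describing a guest as a
mapping from device name to the set of paths present on it are all
assumptions made for the example, as is creating at most one menu
entry per boot method.

```python
# Candidate paths per boot method, in priority order, as listed above.
CANDIDATES = [
    ("chainload", ["/boot/xen/pvboot-x86_64.elf", "/xen/pvboot-x86_64.elf",
                   "/boot/xen/pvboot-i386.elf", "/xen/pvboot-i386.elf"]),
    ("grub2 config", ["/boot/grub/grub.cfg", "/grub/grub.cfg"]),
    ("grub-legacy config", ["/boot/grub/menu.lst", "/grub/menu.lst"]),
]


def build_menu(devices):
    """Return (method, device, path) entries in the order they would be tried.

    `devices` is a hypothetical mapping of device name -> set of paths that
    exist on it. Only existence is checked, never whether the file is valid.
    Assumption: at most one entry per method, using the first match found.
    """
    menu = []
    for method, paths in CANDIDATES:
        match = next(
            ((device, path)
             for path in paths
             for device, present in devices.items()
             if path in present),
            None,
        )
        if match is not None:
            menu.append((method,) + match)
    return menu


# A guest with both a GRUB 2 config and a GRUB-legacy config on one partition.
guest = {"xen/xvda,msdos1": {"/boot/grub/grub.cfg", "/boot/grub/menu.lst"}}
for entry in build_menu(guest):
    print(entry)
```

With no intervention, the first entry in the returned list is the one
that gets booted, and the later ones are only tried if it fails.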

Unfortunately the configuration for how GRUB will behave has to be
baked into the binary image itself so I can't think of an easy way
to allow you to select your own timeouts or remove them altogether,
not on a per-user basis anyway.

It does mean a 7 second delay to booting, or more if for some reason
one of the earlier steps fails. A step can fail if the image or
config is incorrect. BitFolk's GRUB will only be checking for the
existence of a file path, not that what is there is valid. So for
example if you have GRUB2 installed so there is a
/boot/grub/grub.cfg file present but then you also touch the file
/boot/xen/pvboot-x86_64.elf, it will try to chainload to that first.
If that fails then it will fall back to parsing /boot/grub/grub.cfg,
which will add another 2 second pause.

Perhaps it would be acceptable to reduce the initial 5 second
countdown to 2 seconds? Pressing any key aborts the countdown and
gives you as long as you like to choose a menu entry, as usual with
GRUB.
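
The arithmetic behind those delays is simple enough to write down.
This is a sketch under the assumption that a failed attempt costs
nothing beyond its 2 second message pause; the function name
boot_delay and its parameter names are made up for illustration.

```python
def boot_delay(failed_attempts=0, menu_timeout=5, message_pause=2):
    """Seconds spent waiting in the bootloader with nobody at the console.

    One menu countdown, then one message pause for every entry tried:
    each failed attempt plus the one that finally works.
    """
    return menu_timeout + message_pause * (failed_attempts + 1)


print(boot_delay())                   # 7: first entry boots
print(boot_delay(failed_attempts=1))  # 9: first entry fails, second boots
print(boot_delay(menu_timeout=2))     # 4: with a shorter menu countdown
```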

The chainloading option hasn't been explained yet. In practice I
don't expect many people will be interested in it, but it does
provide for running your own bootloader.

At the moment although you may have GRUB-legacy installed in order
to maintain a menu.lst file, your bootloader is never actually
executed, only its config file is parsed, by pygrub. I am proposing
to move to using GRUB2 to parse your grub.cfg, but still this is
BitFolk's GRUB2 binary that is being run, not yours.

If for some reason you did need to run your own bootloader, perhaps
because you need to boot off of something that is not supported by
the GRUB binary that BitFolk is using, then you could install your
own bootloader that supports the PV boot protocol.

As far as I am aware only Debian has distribution support for this.
As of Debian jessie, installing grub-xen in your guest installs a
GRUB-as-PV-kernel image in /boot/xen/pvboot-$ARCH.elf. Using this
option, the last thing that runs before your VPS's kernel would be
your VPS's bootloader.

As I say, I expect for most people that will be an unneeded
complication, as you don't really care whose bootloader is being
run, just that it boots. But the option would be there should you
need it.

I am not intending for the switching to pvgrub from pygrub to be
optional or reversible. I have put some effort into trying to ensure
that pvgrub will boot from a guest's menu.lst and all customers
currently have a menu.lst, so I am hoping I do not need to build in
a setting for this. Naturally if we switch and then someone finds
out it doesn't work then we can switch them back to pygrub until the
problem is worked out.

At the moment I have tested the following configurations:

- 64-bit Debian jessie guest, single root filesystem on xvda1:
    - Chainload to GRUB, parse /boot/grub/grub.cfg
    - Parse /boot/grub/grub.cfg
    - Parse /boot/grub/menu.lst


- i686 Debian jessie guest, single root filesystem on xvda1:
    - Parse /boot/grub/grub.cfg
    - Parse /boot/grub/menu.lst


- i686 Debian jessie guest, single root filesystem directly on xvda:
    - Parse /boot/grub/menu.lst


If you have a planned reboot coming up, it would be useful to me if
you would let me do the reboot for you. I would take the opportunity
to switch you to pvgrub booting to check it works in your
configuration. If it didn't then I would just revert you back to
pygrub and leave you be.

So, that is what I am thinking of doing and how I am proposing it
will work. If you have any comments about this, especially if you
think it will not work for you, or you have suggestions about how it
could be done better, I really want to hear about it.

Thanks for reading this immense email!

Cheers,
Andy

¹ You may recall, we modified pygrub to detect an XZ-compressed
kernel and unpack it using the actual xz utility, until a newer
version of pygrub could be installed which supported those
kernels.

--
https://bitfolk.com/ -- No-nonsense VPS hosting

> I'd be interested to hear any (even two word) reviews of their sofas…

Provides seating.
— Andy Davidson