
Sunday, November 17, 2024

notes on installing Kubernetes on Debian

Not instructions per se, but a guide to the instructions plus thoughts on choices that arise in installing Kubernetes onto a Debian machine.

I began writing this posting as specific instructions on how to install a specific version of Kubernetes onto a specific version of Debian. In the process, I realized that there isn't a one-size-fits-all approach to installing Kubernetes. So these are the notes that remain: the parts of what I learned that are still generally applicable. Most of their value is that they linearize the thought process of installing Kubernetes from scratch.

The Kubernetes website has good, thorough instructions on how to install Kubernetes on Debian, and its instructions on “Installing kubeadm” include pointers to installing CRI-O as well; those pages should be considered the authority on the subject. (They will be referenced again, below.)

If you're not familiar with the Kubernetes cluster architecture, you should consult that page as needed.

not the quick solution

If you're looking to quickly get Kubernetes running on a small scale, there are faster ways. This post is aimed at the production-grade approach, the kind that a business (e.g., one of my employers or clients, and maybe one of yours) would use. So it's aimed at developing an understanding of how business-grade systems work. If you want the quick solution, here are some that were recommended to me: k3s, RKE2, microk8s, Charmed Kubernetes, OpenShift.

caveats

These are notes that I made while going through the process for the first time. I haven't tested them a second time, except inasmuch as I had to repeat parts for the second node. So there may be things that I forgot to document.

These notes are based on Debian 12, which mostly means that systemd is used.

nodes

The first subject to consider is the set of nodes on which you'll install the Kubernetes cluster. Kubernetes separates its design into a control plane and a set of worker nodes (what might be called a "data plane").

  • The control plane runs on nodes, and may run all on one node, or replicated across multiple nodes for high availability. There must be at least one control plane node in order for a cluster to exist.
  • There must be at least one worker node in order for the cluster to service workloads.
  • The control plane and workload processes can be mixed together on a node. This requires removing taints on such nodes that would normally prohibit workloads from running there. (See InitConfiguration.nodeRegistration.taints for the setting, and the example following this list.) The case of having a single-node cluster with both control plane and workload combined may be useful in development or testing.
  • Extra control plane or worker nodes can be added for redundancy. A common choice in production is to have an odd number (at least three) of control plane nodes; this can help the cluster decide which part is authoritative if the network is partitioned. Distributing the nodes across hardware, racks, power supplies, or data centers can improve the reliability.
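For example, to allow workloads on an existing control plane node, the taint can also be removed after the fact (a sketch; the node name is hypothetical):

kubectl taint nodes control-1 node-role.kubernetes.io/control-plane:NoSchedule-

(The trailing “-” means “remove this taint”.)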

There are several ways to place the workload with respect to the control plane. I chose to separate the control plane into nodes that are separate from the worker nodes, in order to protect the control plane services from any workload that may manage to break out of its restraints (Linux namespace and cgroup). I’m starting with one node of each type.

The Kubernetes tools (kubeadm in particular) allow one to reconfigure a cluster, so the initial configuration doesn't matter much.

Xen domUs

If your machines are Xen domUs, you'll want to set vcpus equal to maxvcpus in the domU config, because vcpus is what determines the number of vCPUs that appear in /proc/cpuinfo, and thus determines the number of vCPUs that Kubernetes believes to be present. If you over-allocate the vCPUs among the Xen domains, perhaps in order to ensure that they're not underutilized, you can use scheduler weights to affect the priority that each domain has to the CPUs.

For example, with a 24-core, 32-thread Intel CPU, 32 Xen vCPUs would be available and could be allocated thus:

  • dom0: dom0_max_vcpus=1-4
  • control-plane domU: vcpus = 2, maxvcpus = 2, cpu_weight = 192
  • worker domU: vcpus = 30, maxvcpus = 30, cpu_weight = 128
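In a domU configuration file, the worker's share of that allocation would look something like this (a sketch; see the PVH domU section in the next post for the full context):

# worker domU (/etc/xen/*.cfg)
vcpus      = 30
maxvcpus   = 30
cpu_weight = 128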

deployment method

On dedicated hardware, there are several ways to deploy the control plane:

  • A “traditional” deployment uses systemd to run (and keep running) the control plane services. This is a manual configuration. Using kubeadm is preferred because it configures the cluster in a more consistent way; see the next point.
  • The “static pods” deployment, used by kubeadm, lets kubelet manage the life of the control plane services as static pods.
  • The “self-hosted” deployment, in which the cluster itself manages the running of the control plane services. This seems a bit fragile in that a problem in a control plane could cause the whole control plane to fail.

So the way that I prefer for my on-premises hardware is to use kubeadm.

Debian repository limitations

Note that the Debian repository has kubernetes-client, containerd, crun, and runc packages, but we're not using these: as is usual with stable Debian releases, the packages are out of date by a year on average, and updates to this code are frequent and often security-related. There are also restrictions on which version of kubectl can be used with kubelet: usually they should differ by no more than one minor version. Further, the repository doesn't contain the other Kubernetes packages. So simply installing from Debian isn't an option.

package installation

These deb packages need to be installed on every node. The cri-o package's services run containers on the node, and the kubelet service connects the node to the control plane. Some of the control plane is itself implemented as containers (performing control-plane-specific actions), so all of this infrastructure is needed on every node. The kubeadm package is only needed for managing the node's membership in the cluster. The kubectl package may not be strictly necessary on every node, but it helps to have it if proxying isn't working.

The process involves setting up apt to fetch from the Docker and Kubernetes Debian-style repos. These official Debian packages will be needed to do that:

apt install curl gpg
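As a sketch of the pattern for the Kubernetes repo (the version in the URL is an example; take the current lines from the official instructions):

mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key \
    | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' \
    > /etc/apt/sources.list.d/kubernetes.list
apt update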

CRI-O

Kubernetes first needs a container runtime that supports CRI. CRI-O is the more modern choice among the popular implementations. You'll first need to check that some Linux prerequisites are in place, according to the instructions on the “Container Runtimes” page. The CRI-O project has installation instructions, which include Debian-specific instructions.

Following those instructions, and before running systemctl start crio, it's necessary to remove or move aside CRI-O's *-crio-*.conflist CNI configuration files from /etc/cni/net.d/. We will use Calico's CNI configuration instead, which the Calico operator will install. Then continue with running systemctl start crio. Stop after that; the rest of the instructions are addressed in the next step.
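A sketch of that move-aside step (the destination directory is arbitrary):

mkdir -p /root/disabled-cni
mv /etc/cni/net.d/*-crio-*.conflist /root/disabled-cni/
systemctl start crio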

Kubernetes will automatically detect that CRI-O is running, from the presence of its UNIX socket. CRI-O needs an OCI-compatible runtime to which it can delegate the actual running of containers; it comes with crun and is configured to use it by default.

Kubernetes

Continue on by following “Installing kubeadm, kubelet and kubectl”. (You will have done most of this in the previous step. The apt-mark hold is still needed.)
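The hold, per those instructions, pins the Kubernetes packages so that a routine apt upgrade won't move them out from under the cluster:

apt-mark hold kubelet kubeadm kubectl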

Once all the packages are installed, you're ready to run kubeadm init. But first you'll need to understand how to configure it.

dual-stack IPv4/IPv6

In order to avoid DNAT to any Services published from the Kubernetes cluster, and SNAT from any Pods, one would like the Pods and Services to have publicly-routable IPv6 addresses. (IPv4 addresses on the Pods would probably be RFC 1918 internal addresses, since IPv4 addresses aren't as abundant, and so would need to be NATed in any case.) The Pods and Services should have both IPv4 and IPv6 addresses, so that they can interact easily with either stack. The IPv6 addresses used here are from the documentation prefix (2001:db8::/32). They stand in for global-scope unicast addresses; we could instead use unique local addresses, with the caveat that we'd need to NAT external communication.

This is a choice that needs to be made when the cluster is created; it can't be changed later, nor can the chosen CIDRs (at least not via kubeadm). One subnet of each type is needed for Pods and Services. This of course assumes that you have an IPv6 subnet assigned to you from which you can allocate. When sizing the subnets, mind that every Pod and Service needs an address.

For the examples, we'll use 10.0.0.0/16 and 2001:db8:0:0::/64 for the Pod subnet, and 10.1.0.0/16 and 2001:db8:0:1::/108 for the Service subnet. (The services IPv6 subnet can be no larger than /108. This isn't documented, but the scripts check for it.) For your real-life cluster, I recommend that you choose a random 10. subnet, instead of these above or the 10.96 default, to avoid address collisions when you have access to multiple clusters or private networks in general. It's probably best to choose two consecutive subnets as above, to make firewalling rules shorter; that is, 10.0.0.0/15 and 2001:db8:0:0::/63 cover both of the above pairs of subnets.

Forwarding of IPv4 packets must be enabled for most CNI implementations, but most will do this automatically if needed. For dual-stack, you'll also need to enable IPv6 forwarding; it's not clear whether CNI implementations will also do this automatically. In any case, these settings are required for the kubeadm pre-flight checks to pass. Run this on every node:

sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv6.conf.all.forwarding=1

and create /etc/sysctl.d/k8s.conf:

net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1

pod network plugin (Calico)

Kubernetes requires a pod network plugin to manage the networking between containers. Container networking is designed to scale to an extremely large number of containers, while still presenting a simple OSI layer 2 (Ethernet), or at least a layer 3 (IP), view of the Kubernetes container network. Optimal use of the network avoids using an overlay network (e.g., VXLAN or IPIP) unless necessary (e.g., between nodes). Here we can avoid an overlay because we have full control over the on-prem network. Specifically, we can use an arbitrary subnet for the pods and services, but still route packets across the cluster, or into or out of the cluster, because we can adjust the routing tables as needed.

Calico also supports NetworkPolicy, a desirable feature.
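For example, a minimal NetworkPolicy that blocks all ingress traffic to the Pods in a namespace (a sketch; the namespace is hypothetical):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-app
spec:
  podSelector: {}    # selects every Pod in the namespace
  policyTypes:
  - Ingress          # no ingress rules listed, so all ingress is denied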

The Pod's DNS Policy is typically the default, ClusterFirst (not Default!); setting hostNetwork: true in the PodSpec would be unusual.

full-mesh BGP routing

Since we have a small (two-node) cluster here, we're going to use a BGP full-mesh routing infrastructure, as provided by Calico. This is the default mode of the Calico configuration. As long as the nodes are all on the same OSI layer 2 network, their BGP servers can find each other and forward packets, without resorting to IPIP encapsulation.

kubeadm init options

The kubeadm init command accepts a (YAML) configuration file with an InitConfiguration and/or a ClusterConfiguration. Let's call it kubeadm-init.yaml:

---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration

clusterName: kubernetes
controlPlaneEndpoint: control.kubernetes.internal

networking:
  dnsDomain: kubernetes.internal
  podSubnet:     10.0.0.0/16,2001:db8:0:0::/64
  serviceSubnet: 10.1.0.0/16,2001:db8:0:1::/108

apiServer:
  extraArgs:
  - name: service-cluster-ip-range
    value: 10.1.0.0/16,2001:db8:0:1::/108  # same as networking.serviceSubnet

controllerManager:
  extraArgs:
  - name: cluster-cidr
    value: 10.0.0.0/16,2001:db8:0:0::/64  # same as networking.podSubnet

  - name: service-cluster-ip-range
    value: 10.1.0.0/16,2001:db8:0:1::/108  # same as networking.serviceSubnet

  - name: allocate-node-cidrs
    value: "false"

---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration

nodeRegistration:
  kubeletExtraArgs:
  # Required for dual-stack; defaults only to IPv4 default address
  # Has no KubeletConfiguration parameter, else we'd set it there.
  - name: node-ip
    value: 192.168.4.2,2001:db8:0:4::2

---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration

cgroupDriver: systemd
clusterDomain: kubernetes.internal

# Enables the node to work with and use swap.
# See https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory
failSwapOn: false
memorySwap:
  swapBehavior: LimitedSwap

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration

mode: nftables
clusterCIDR: 10.0.0.0/16,2001:db8:0:0::/64  # same as ClusterConfiguration.networking.podSubnet

  • Although this looks like a YAML file, it doesn't accept all of the YAML 1.2 syntax. If you try to place comments or directives before the first "---", kubeadm will fail to parse it.
  • Note that the convention that I use here is that the DNS domain in which all of this cluster's parts exist is formed from the name of the cluster (here, kubernetes). So the domain is kubernetes.internal, the load-balanced control endpoint is control.kubernetes.internal, the Services subdomain is service.kubernetes.internal, etc.
  • We're going to pretend here that the machine on which we're installing Kubernetes has IP addresses 192.168.4.2 and 2001:db8:0:4::2. In order for the kubelet service on the (worker) node to listen on both the IPv4 and IPv6 addresses of its node, we need to configure InitConfiguration.nodeRegistration.kubeletExtraArgs as above. Else it would only listen on one of those.
  • We're using here the subnets that we chose above.
  • Since we're preparing here for later expansion of the control plane to other nodes, we'll need to specify the ClusterConfiguration.​controlPlaneEndpoint. (This represents the --control-plane-endpoint command-line value.) Easiest is to use DNS round-robin A records, but since I plan to run DNS on the cluster, I'll just add an /etc/hosts entry for now.
  • There's no need for Kubernetes to allocate CIDRs to the nodes, because Calico's IPAM plugin will do this instead.
  • The KubeletConfiguration and KubeProxyConfiguration documents are included in the same file, as shown above; kubeadm accepts them alongside its own kinds and applies them cluster-wide.
  • In the KubeletConfiguration, we're taking advantage of any swap that may be available to the machine.

kubeadm reset

If the kubeadm init or kubeadm join fails to complete, kubeadm reset can be used to revert most of its effect. The command output spells out what other state needs clean-up.

initial control plane node

The first control plane node is where the Kubernetes cluster is created/initialized. Any additional control plane node will be joined to the cluster, and so would be handled differently.

It's assumed that Debian is already installed on the node, and that there is no firewalling set up on the node. As an option, Kubernetes provides its own mechanism for adding firewalling that's compatible with its services, which we'll add below.

build the cluster

Then we run:

kubeadm init --config=kubeadm-init.yaml

You should use script to capture the command's output when you run it, since the final output includes a token and certificate needed for future steps. If you perform the configuration setup that's described in the command's final output, you can then, as a non-root user, run kubectl cluster-info to verify that the cluster's running. You might also want to enable Bash autocompletion for kubectl.
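A sketch of that sequence (script(1) is in Debian's bsdutils package):

script -c 'kubeadm init --config=kubeadm-init.yaml' kubeadm-init.log

# then, as the non-root user, after setting up ~/.kube/config:
kubectl cluster-info
echo 'source <(kubectl completion bash)' >> ~/.bashrc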

deploy Calico

Once the cluster is running, and before adding applications, we need to install the network plugin. This has two parts: install the Tigera Kubernetes Operator, and deploy Calico (using the operator). We'll follow the Calico instructions for a self-managed, on-premises cluster. (The same thing could be done with Helm, but that presumes having Helm already installed.) If you've set up the non-root user's ~/.kube/config according to the instructions output by kubeadm init, you can run the instructions as that user.
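The operator installation is a single kubectl command; as a sketch (the version in the URL is an example; take the current one from the Calico instructions):

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.0/manifests/tigera-operator.yaml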

You will need to customize the custom-resources.yaml file's Installation.spec.calicoNetwork.ipPools to match the IPv4 and IPv6 pools chosen above. Calico has instructions for configuring for dual-stack. For example, given the above subnets, the Installation configuration should be:

apiVersion: operator.tigera.io/v1
kind: Installation

metadata:
  name: default
spec:
  calicoNetwork:
    linuxDataplane: Nftables
    ipPools:
    - name: default-ipv4-pool
      cidr: 10.0.0.0/16  # from ClusterConfiguration.networking.podSubnet
      encapsulation: None
      natOutgoing: Enabled

    - name: default-ipv6-pool
      cidr: 2001:db8:0:0::/64  # from ClusterConfiguration.networking.podSubnet
      encapsulation: None
      natOutgoing: Disabled
---
apiVersion: operator.tigera.io/v1
kind: APIServer

metadata:
  name: default
spec: {}

  • We're assuming that the IPv6 addresses are global scope, so SNAT is disabled; if we chose unique local addresses instead, then we would enable SNAT.
  • We're assuming that the Kubernetes nodes (on which BIRD, Calico's BGP server, runs) are all connected together on the same OSI layer 2 network, so that the BIRD instances can find each other automatically.
  • We need to tell Calico to use nftables, consistent with our choice for Kubernetes.
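Then create the customized resources and watch the Calico pods come up (a sketch):

kubectl create -f custom-resources.yaml
watch kubectl get pods -n calico-system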

If you have a slow Internet connection, it may take some minutes for the Calico pods to come up, because images need to be downloaded. (This is, of course, true for any new images that you'll be starting.)

You can optionally validate dual-stack networking.

worker node(s)

Before you can deploy workloads to the cluster, you'll need to add a worker node. We assume that a machine has been set up according to the “package installation” and “dual-stack” sections above. The end of the output from the kubeadm init command above contains the kubeadm join command that you should run on the machine that will be the new worker node, e.g.:

kubeadm join control.kubernetes.internal:6443 --token asdf… \
        --discovery-token-ca-cert-hash sha256:deadbeef…

The preferred way to use these values is in a JoinConfiguration file (e.g., called kubeadm-join.yaml):

apiVersion: kubeadm.k8s.io/v1beta4
kind: JoinConfiguration

discovery:
  bootstrapToken:
    apiServerEndpoint: control.kubernetes.internal:6443
    token: "asdf…"
    caCertHashes:
    - "sha256:deadbeef…"

nodeRegistration:
  kubeletExtraArgs:
  - name: "node-ip"
    value: "192.168.4.3,2001:db8:0:4::3"

  • We're going to pretend here that the machine on which we're installing Kubernetes has IP addresses 192.168.4.3 and 2001:db8:0:4::3, analogous to what we did in the InitConfiguration.
  • The token is only good for 24 hours (by default). You can create a new one with kubeadm token create on the control node.
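For example, on a control plane node:

kubeadm token create --print-join-command

prints a complete, fresh kubeadm join command, including a new token and the CA certificate hash.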

Then you can run kubeadm join --config=kubeadm-join.yaml. (Note that the server name, control.kubernetes.internal, needs to be resolvable on the worker node, so you might need to add it to /etc/hosts, DNS, etc.)

machine shutdown

So what happens when the machine on which we're running the whole cluster shuts down? Debian uses systemd, so the cri-o and kubelet services will be shut down, and with them all of the workload and control plane containers that they run.

upgrading

Kubernetes releases a new minor version about three times per year, with patch versions in between. CRI-O follows the Kubernetes minor release cycle. See “Upgrading kubeadm clusters” for the full details. “Skipping MINOR versions when upgrading is unsupported.”

You may wonder why we need to place a hold on the Kubernetes packages. Couldn't we remove the hold and perform the upgrades automatically, using unattended-upgrades? The answer is no, because there are sometimes manual steps required even after the software packages have been upgraded. Specifically, a kubeadm upgrade apply will be needed, and sometimes the configuration API may change.
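As a sketch of the per-node package step from that page (the target version is a placeholder):

apt-mark unhold kubeadm
apt install kubeadm='1.32.0-*'   # hypothetical target version
apt-mark hold kubeadm
kubeadm upgrade plan             # on the first control plane node
kubeadm upgrade apply v1.32.0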

next steps

Monday, September 2, 2024

how to install Xen on Debian 12 “bookworm”

A brief guide to installing Xen with a single "PVH" domU on Debian using contemporary hardware.

motivation

There are a few “official” instructions for installing Xen, and many unofficial ones (like this). The “Xen Project Beginners Guide” is a useful introduction to Xen, and even uses Debian as its host OS. But for someone who is already familiar with Xen and simply wants a short set of current instructions, it's too verbose. It even includes basics on installing the Debian OS prior to installing Xen, which is something that I take as a given here. Further, it doesn't address the optimized “PVH” configuration for domUs, which is available for modern hardware. Much of the Xen documentation seems to have last been touched in 2018, when AWS was still using Xen.

The Debian wiki also has a series of pages on “Xen on Debian”, but the writing appears unfocused, speculating about all sorts of alternative approaches one could take. Some useful information can be gleaned from it, but it doesn't have the brevity that I'm looking for here.

The Xen wiki's “Xen Common Problems” page is a good source for various factoids, but not a set of cookbook instructions. Various unofficial instructions can be found on the Web, but I found them to be incomplete for my purposes.

preparation

Xen 4.17 is the current version in Debian 12.6.0; the latest upstream release is Xen 4.19, so the Debian version is only a little behind and probably sufficiently recent for most needs.

VT-d virtualization is required for Intel processors. (I don't address AMD or other chip virtualization standards, but the corresponding technology is required in that case.) In /boot/config-*, one can confirm that CONFIG_INTEL_IOMMU=y for the kernel, and “dmesg | grep DMAR” (in the non-Xen kernel[1]) returns lines like:

ACPI: DMAR 0x00000000… 000050 (v02 INTEL  EDK2     00000002     01000013)
ACPI: Reserving DMAR table memory at [mem 0x…-0x…]
DMAR: Host address width 39
DMAR: DRHD base: 0x000000fed91000 flags: 0x1
…
DMAR-IR: Enabled IRQ remapping in xapic mode
…
DMAR: Intel(R) Virtualization Technology for Directed I/O

so VT-d seems to be working. If VT-d is not available, you may need to enable it in your BIOS settings.

The Xen wiki's “Linux PVH” page has information on PVH mode on Linux, but it hasn’t been updated since 2018 and references Xen 4.11 at the latest. All of the kernel config settings mentioned there are present in the installed kernel, except that CONFIG_PARAVIRT_GUEST doesn’t exist. I assume it was removed.

Xen installation on Debian

start Xen

apt install xen-system-amd64 xen-tools

installs the Xen Debian packages and the tools for creating the domUs. (If you see references that say to install the xen-hypervisor virtual package, know that xen-system-amd64 depends on the latest xen-hypervisor-*-amd64. You may need a different architecture than -amd64.) Rebooting will boot into Xen: it will have been added to GRUB as the default kernel.
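After the reboot, you can confirm that the hypervisor is running:

xl list

should show Domain-0 (your dom0) as the only domain.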

configure dom0

configure GRUB’s Xen configuration

The Xen command line options are described in /etc/default/grub.d/xen.cfg. The complete documentation is at https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html.

In /etc/default/grub.d/xen.cfg, set:

XEN_OVERRIDE_GRUB_DEFAULT=1

so that GRUB doesn’t whine about Xen being prioritized.

  • By default, dom0 is given access to all vCPUs (we'll assume 32 on this hypothetical hardware) and all memory (64GB, here). It doesn’t need that much. Furthermore, as domUs are started, the dom0 memory is ballooned down in size, so that the dom0 Linux kernel no longer has as much memory as it thought it had at start-up. So the first step is to scale this back: dom0_mem=4G,max:4G. (I've been told that even 1G should suffice for a server.) The fixed size will avoid ballooning at all. Likewise, for the vCPUs: dom0_max_vcpus=1-4.
  • Since we’re not running unsafe software in dom0, we can set xpti off there.
  • Since modern processors support deeper sleep states, we may benefit by enabling Xen to use those states with the cpuidle option.

So in /etc/default/grub.d/xen.cfg, set:

GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=4G,max:4G dom0_max_vcpus=1-4 xpti=dom0=false,domu=true cpuidle"

(There’s no need to change the autoballoon setting in /etc/xen/xl.conf, since "auto" does what’s needed.)

Then run update-grub and reboot.

configure Xen networking

create a xenbr0 bridge

Xen domUs require a bridge in order to attach to the dom0’s network interface. (There are other options, but bridging is the most common.) Following the Xen network configuration wiki page, in /etc/network/interfaces, change:

allow-hotplug eno1
iface eno1 inet static
…

to:

iface eno1 inet manual

auto xenbr0
iface xenbr0 inet static
	bridge_ports eno1
	bridge_waitport 0
	bridge_fd 0
	… # the rest of the original eno1 configuration

(Obviously this is assuming that your primary interface is named eno1, which is typical for an onboard Ethernet NIC.) Run ifdown eno1 before saving this change, and ifup xenbr0 after.

xenbr0 is the default bridge name for the XL networking tools, which is what we’ll use.
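You can check the result with iproute2 (a sketch):

ip -br addr show xenbr0      # the bridge now carries the host's address
ip link show master xenbr0   # eno1 (and, later, the domUs' vifs) are enslaved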

about Netfilter bridge filtering

You may see a message in the kernel logs:

bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.

This is of no concern. It gets printed when the bridge module gets loaded, in the process of bringing up xenbr0 for the first time since boot.

It used to be that bridged packets would be sent through Netfilter. This was considered to be confusing, since it required setting up a Netfilter FORWARD rule to accept those packets, something that most people expected to happen automatically with a bridge. The solution was to move that behavior into a new module (br_netfilter). This message is a remnant reminder of the change, for those who were depending on the old behavior. See the kernel patch; it has been this way since 3.18.

configure the dom0 restart behavior

When dom0 is shut down, by default Xen will save the images of all running domUs, in order to restore them on reboot. This takes some time, and disk space for the images, and most likely you'll want to shut down the domUs instead. To configure that, in /etc/default/xendomains, set:

XENDOMAINS_RESTORE=false

and comment out XENDOMAINS_SAVE.

create a PVH domU

create the domU configuration

There isn’t any obvious documentation on using xen-create-image, the usual tool, for creating a specifically PVH Linux guest, so here is a summary.

Edit /etc/xen-tools/xen-tools.conf, which is used to set variables for the Perl scripts that xen-tools uses, to set:

lvm = "server-disk"
pygrub = 1

This assumes that you're using LVM to provide a root file system to the domUs, and the VG for the domUs is named as shown. Then run this (I recommend using script, though a log is created by default):

xen-create-image --hostname=my-domu.internal --verbose \
  --ip=192.168.1.2 --gateway=192.168.1.1 --netmask=255.255.255.0 \
  --memory=60G --maxmem=60G --size=100G --swap=60G

or whatever settings you choose; see the xen-create-image man page for explanations.

The --memory setting can be tuned later to the maximum available memory, if you're not adding any other domU. It's only used to set the memory setting in the /etc/xen/*.cfg file, and can be edited there. Likewise for maxmem. Setting them equal provides the benefit that no memory ballooning will be performed by Xen on the domU, so there will be no surprises while the domU is running and unable to obtain more memory. The available memory for domUs can be found in the free_memory field in the xl info output in the dom0; it may not be precisely what you can use, since there may be some unaccounted-for overhead in starting the domU.
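For example (a sketch):

xl info | grep free_memory

shows what's left for new domUs.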

The --size and --swap settings for the root and swap LV partitions can be expanded later if needed, using the LVM tools in the usual way.

Adjust the /etc/xen/*.cfg file by adding:

type     = "pvh"
vcpus    = 4
maxvcpus = 31
xen_platform_pci = 1
cpu_weight = 128

The maxvcpus setting here assumes that 32 vCPUs are available; it leaves one for the exclusive use of the dom0. Four vCPUs should be enough to start the domU quickly. The cpu_weight deprioritizes the domU's CPU usage vs. the dom0's. xl sched-credit2 shows the current weights.
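For example, to inspect or adjust weights at runtime (a sketch; the domain name is from the example above):

xl sched-credit2                               # show all domains' weights
xl sched-credit2 -d my-domu.internal -w 128    # set the domU's weight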

To have the domU automatically start when dom0 starts:

mkdir -p /etc/xen/auto
cd /etc/xen/auto
ln -s ../my-domu.internal.cfg .

(You may see instructions that tell you to symlink the whole /etc/xen/auto directory to /etc/xen. The downside is that Xen will then report warnings as it tries to parse the non-configuration example files in /etc/xen.)

fix the domU networking configuration

Due to Debian bug #1060394, eth0 is used in /etc/network/interfaces, not enX0. You can mount the domU's LV disk device temporarily in order to correct this.
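A sketch of that fix (the LV path follows xen-create-image's usual naming; check with lvs):

mount /dev/server-disk/my-domu.internal-disk /mnt
sed -i 's/eth0/enX0/g' /mnt/etc/network/interfaces
umount /mnt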

add software to the domU

Except for the sudo package, which is needed for the next step, the rest of this is optional, but it's what I typically do. This requires starting the domU (with xl create) and logging in to it (with xl console, using the root password given at the end of the xen-create-image output).

Edit /etc/apt/sources.list to drop the deb-src and use non-free-firmware. While you're in there, fix the Debian repositories to be the ones that you wanted; there's a bug in xen-create-image in that it doesn't use the repos from dom0. Then:

apt update
apt upgrade
apt install ca-certificates

Now you can edit /etc/apt/sources.list to use https. Then:

apt update
apt install sudo

plus any other tools you find useful (aptitude, curl, emacs, gpg, lsof, man-db, net-tools, unattended-upgrades,…).

add a normal user to the domU

Connecting to the domU can initially only be done with xl console, which requires logging in as root, since that’s the only user that we created so far. (The generated root password will have been printed at the end of the xen-create-image output.) xl console appears to have limitations in its terminal emulation, so connecting via SSH is better; an SSH server is already installed. The SSH daemon (by default) prohibits login as root, and anyway it's best not to log in as root, even via the console, so create a normal user that has complete root privileges [2]:

adduser john
adduser john sudo
passwd --lock root

That's all you need to get started with your new domU!


1 The “non-Xen kernel” is the Linux kernel that is installed with a simple Debian installation, that is, a kernel booted without the Xen hypervisor. When the hypervisor is running, it hides certain information from the kernel; most of that information can then be found by using “xl dmesg”.

2 Note that once you lock root's password, if you log out of the console without having created the normal user with admin privileges, you will be locked out of the domU. The way to get access again in that case is to shut down the domU, mount its file system, and edit /etc/shadow to remove the “!” at the start of root's password.