Virtualisation course

Welcome and thanks for joining!

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

What we will cover

How and why virtualisation helps the industry save money, improve uptime and secure IT systems.

© Course authors (CC BY-SA 4.0) - Image: © Håkan Dahlström (CC BY 2.0)

How we will do it

  • Lectures and Q&A
  • Individual and group presentations
  • Lab exercises
  • Continuous reflection
  • Quizzes and scored tests
© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

For notes, slides and similar, see:
t.menacit.se/virt.zip.

These resources should be seen as a complement to an instructor-led course, not a replacement.

© Course authors (CC BY-SA 4.0)

Free as in beer and speech

Is anything unclear? Got ideas for improvements? Don't fancy the animals in the slides?

Create an issue or submit a pull request to
github.com/menacit/virt_base_course.

© Course authors (CC BY-SA 4.0) - Image: © Ludm (CC BY-SA 2.0)

Acknowledgements

Thanks to IT-Högskolan and Särimner for enabling development of the course.

Hats off to all FOSS developers and free culture contributors making it possible.

Course contributors

Joel Rangsmo, semeiths and
Palsmo.

© Course authors (CC BY-SA 4.0) - Image: © Marcus Hansson (CC BY 2.0)

Let us dig in!

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Background and basics

Why and how we virtualise

© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Computers were humans.

© Course authors (CC BY-SA 4.0) - Image: © Robert Sullivan (CC0 1.0)

Computers became...

  • Problem-specific devices
  • General purpose
  • Time-shared/multi-user
© Course authors (CC BY-SA 4.0) - Image: © C. Watts (CC BY 2.0)

Initial motivations

To save money and "keep things simple".

© Course authors (CC BY-SA 4.0) - Image: © Beraldo Leal (CC BY 2.0)

HW-level virtualisation

Pretend to be hardware well enough to make operating systems run.

OS-level virtualisation

Pretend to be an operating system environment well enough to run applications.

Domain-specific virtual machines

More about that later...

© Course authors (CC BY-SA 4.0) - Image: © Jonathan Brandt (CC0 1.0)

VMware

  • Early player introducing the industry to x86 virtualisation
  • Capable of virtualising hardware for most operating systems
  • Still popular today, though a bit messy
© Course authors (CC BY-SA 4.0)

FreeBSD Jails

  • Introduced in version 4.0 (2000)
  • Lazy solution to "UID 0 problem"
  • Most OS functionality provided in an isolated manner to each jail
  • Very popular for web and shell hosting
© Course authors (CC BY-SA 4.0) - Image: © Fandrey (CC BY 2.0)

Glossary and loose synonyms

Physical machine

Host, hypervisor, virtual machine manager (VMM)

Virtual machine

Guest, instance, VM

© Course authors (CC BY-SA 4.0) - Image: © Jonathan Miske (CC BY-SA 2.0)

Course prerequisites

What you need to know and setup

© Course authors (CC BY-SA 4.0) - Image: © Gytis B (CC BY-SA 2.0)

You need to know...

Basic networking and shell usage of Linux.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

You need to setup...

For details and guidance, see:
"resources/lab_setup.md".

© Course authors (CC BY-SA 4.0) - Image: © Kristina Hoeppner (CC BY-SA 2.0)

Economic benefits

Time to save money and dress sharply

© Course authors (CC BY-SA 4.0) - Image: © Bixentro (CC BY 2.0)

Less hardware means...

  • Less data center space
  • Less supporting infrastructure
  • Less electricity and cooling
  • Less things that can break
© Course authors (CC BY-SA 4.0) - Image: © Sergei F (CC BY 2.0)

Studies show average data center CPU utilization at 6-30%

© Course authors (CC BY-SA 4.0) - Image: © Roy Luck (CC BY 2.0)

Resource sharing means...

  • Shared margins for resource utilization
  • Easier capacity planning
  • Cost savings through use of more efficient/powerful hardware
© Course authors (CC BY-SA 4.0) - Image: © Jonathan Brandt (CC0 1.0)

Technical interlude

© Course authors (CC BY-SA 4.0) - Image: © Wolfgang Stief (CC0 1.0)

Guest migration

Move VM from one hypervisor to another.

Offline

Guest is shutdown before migration and started afterwards.

Online

Guest remains running and is almost seamlessly migrated between hypervisors.
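As a hedged sketch of what this can look like in practice with the libvirt stack (covered later in the course) - the guest name srv-1 and destination host node-2 are placeholders:

```shell
# Live migration: the guest keeps running while its memory is
# copied to the destination hypervisor over SSH.
virsh migrate --live --persistent srv-1 qemu+ssh://node-2/system

# Offline migration: stop the guest, move it, start it again.
virsh shutdown srv-1
virsh migrate --offline --persistent srv-1 qemu+ssh://node-2/system
virsh --connect qemu+ssh://node-2/system start srv-1
```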

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Back to business

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Migration of VMs, especially live migration, enables servicing of hypervisors during office hours.

If hardware breaks, just start up the VM on another host.

Humans are likely the biggest cost.

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

Deploying software stacks on hardware requires time-consuming planning, procurement and setup.

Automation and self-service features cut the time required for meetings and hand-overs.

Virtualisation makes organisations faster.

© Course authors (CC BY-SA 4.0) - Image: © Jeena Paradies (CC BY 2.0)

Virtualisation provides a (more or less) predictable execution environment.

Testing, installation and support of complex software stacks is expensive.

Software vendors love virtual appliances.

© Course authors (CC BY-SA 4.0) - Image: © Jusotil_1943 (CC0 1.0)

Reliability benefits

Not only nerds care about uptime

© Course authors (CC BY-SA 4.0) - Image: © Meddygarnet (CC BY 2.0)

Virtualisation enables...

  • Easy maintenance of hypervisors and supporting infrastructure thanks to migration
  • Legacy workloads to run on decent and supported HW
  • Less maintenance work due to less HW
  • Better testing due to predictable runtime environment
© Course authors (CC BY-SA 4.0) - Image: © Loco Steve (CC BY-SA 2.0)

Makes multi-server and multi-site redundancy economically viable for many organisations.

© Course authors (CC BY-SA 4.0) - Image: © Takomabibelot (CC BY 2.0)

Snapshots

  • Freeze and store guest state
  • Roll back changes with ease
  • Enabled through Copy-On-Write disk storage and/or overlay file systems
© Course authors (CC BY-SA 4.0) - Image: © J.V.R Sharma (CC BY-SA 2.0)
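To make the Copy-On-Write mechanism concrete, here is a minimal sketch using qemu-img (assuming it is installed; file names are placeholders). The overlay only stores blocks that differ from its backing image:

```shell
# Create a base disk image and a Copy-On-Write overlay on top of it.
# Guest writes go to overlay.qcow2 while base.qcow2 stays untouched,
# so "rolling back" is as simple as discarding the overlay.
qemu-img create -f qcow2 base.qcow2 1G
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay.qcow2

# Snapshots can also be stored inside a single qcow2 file:
qemu-img snapshot -c before_upgrade overlay.qcow2
qemu-img snapshot -l overlay.qcow2
```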

Snapshots enable operators to apply bug fixes and perform other changes with more confidence.

© Course authors (CC BY-SA 4.0) - Image: © Nicholas Day (CC BY 2.0)

Environmental benefits

How virtualisation can make things a little less terrible

© Course authors (CC BY-SA 4.0) - Image: © Pumpkinmook (CC BY 2.0)

Exercise: Pitch to Greta

Describe environmental benefits of using virtualisation based on what you've learned so far.

Participants are split into groups.
After discussion, send notes to:
courses+virt_010501@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Peggy Dembicer (CC BY 2.0)

Less HW means less waste

  • Doesn't just apply to servers, but supporting equipment and infrastructure as well
  • Smaller utilization margins, since instances can be scaled
  • Bigger/more powerful systems makes sense
  • Studies show average data center CPU utilization at 6-30%
© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Less HW means less energy

  • Doesn't just apply to servers, but supporting equipment and infrastructure as well
  • Bigger/more powerful systems are (usually) more energy efficient
  • Migration enables power cycling of servers based on utilization
© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

VM (live) migration enables temporary
transfer of workloads to regions with
a surplus of renewable energy.

© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Security benefits

(Yes, there are several)

© Course authors (CC BY-SA 4.0) - Image: © Tim Green (CC BY 2.0)

Safely running multi-user/multi-tenant systems is hard.

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

One VM per task enables...

  • A smaller attack surface
  • Fewer privileged users per system
  • Tighter network restrictions
  • Easier anomaly and intrusion detection
© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Qubes OS and unikernels
such as MirageOS represent the logical extreme of these arguments.

© Course authors (CC BY-SA 4.0) - Image: © Freestocks.org (CC0 1.0)

Computers are stateful

Snapshots enable a more aggressive security patching process.

Properly cleaning up a hacked server is no trivial task - neither is forensics.

Modifications and malware can hide not only in the OS, but also in firmware and OoB mechanisms.

© Course authors (CC BY-SA 4.0) - Image: © Wolfgang Stief (CC0 1.0)

Downsides of virtualisation

The good, the bad and the ugly

© Course authors (CC BY-SA 4.0) - Image: © Wendelin Jacober (CC0 1.0)

Economics

More messy licenses to care about.

"System/VM sprawl" - many guests to take care of.

© Course authors (CC BY-SA 4.0) - Image: © Tofoli Douglas (CC0 1.0)

Reliability

More code means more complexity and things that can break.

Large blast radius if virtualisation fails.

Lazy ways to implement reliability aren't always the best ones.

© Course authors (CC BY-SA 4.0) - Image: © Jesse James (CC BY 2.0)

Security

Isolation is not perfect, many eggs in one basket.

Must trust underlying infrastructure, control plane and humans.

Way too easy to spin up and forget about systems.

© Course authors (CC BY-SA 4.0) - Image: © Kristina Hoeppner (CC BY-SA 2.0)

Performance

Always some overhead; depending on the workload it may be significant.

Less predictable due to "noisy neighbours" and hidden counters.

Trickier to utilize accelerators and CPU features.

© Course authors (CC BY-SA 4.0) - Image: © Jonathan Brandt (CC0 1.0)

Reflections

What have we learned so far?

© Course authors (CC BY-SA 4.0) - Image: © Alan Shearman (CC BY 2.0)

Answer the following questions

  • What are your most important take-aways?
  • Did you have any "Ahaaa!"-moments?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_010801@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)
© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Introductory recap

Refreshing what we've learned so far

© Course authors (CC BY-SA 4.0) - Image: © Bixentro (CC BY 2.0)

Covered what virtualisation is and its benefits.

Built foundation for further understanding of technologies available on the market.

© Course authors (CC BY-SA 4.0) - Image: © John Regan (CC BY 2.0)

Recap of prerequisites

Install HashiCorp Vagrant and
a VMM (like VirtualBox) on your physical machine.

Install Docker Engine on Ubuntu using the method
described in Docker's official documentation.

© Course authors (CC BY-SA 4.0) - Image: © Kojach (CC BY 2.0)

Warm-up quiz

  • Describe three ways virtualisation can improve reliability
  • Explain the main difference between HW-level and OS-level virtualisation
  • Explain why virtualisation can have a negative impact on performance

courses+virt_010901@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Virtualisation software

Examples of HW-level solutions

© Course authors (CC BY-SA 4.0) - Image: © ETC Project (CC0 1.0)

VMware ESXi

  • Directly installed on HW
  • Primarily used for hosting virtual servers
  • Great functionality for clustering through "vCenter"
  • Expensive and proprietary, but batteries are included
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Hyper-V

  • Windows' native virtualisation technology
  • Also used for "Windows Subsystem for Linux v2" and "Virtualization-based Security" features
  • Somewhat messy licensing
© Course authors (CC BY-SA 4.0) - Image: © Daniel Oliva Barbero (CC BY 2.0)

"KVM"

What people really mean is
libvirt + QEMU + KVM.

Most widely used virtualisation stack
on Linux and the base for offerings
from Red Hat and Canonical.

Commonly used by public cloud providers.

© Course authors (CC BY-SA 4.0) - Image: © Maja Dumat (CC BY 2.0)

libvirt

  • Abstraction layer for running/managing VMs
  • Supports several different hypervisors
  • Does the dirty work of setting up networking, storage, snapshots etc.
  • Mess of Python and XML files
© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)

QEMU

  • Swiss army knife of emulation
  • Emulates BIOS/UEFI and devices
  • Designed with flexibility in mind, not security
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

KVM

  • Feature of the Linux kernel
  • Used to run performance critical operations in kernel space and access HW acceleration features
  • Not only utilized by QEMU!
© Course authors (CC BY-SA 4.0) - Image: © Aka Tman (CC BY 2.0)

Proxmox VE

  • All-in-one solution deployed on HW
  • Managed through web UI or API
  • Uses QEMU + KVM and LXC
  • Built-in ZFS support
© Course authors (CC BY-SA 4.0) - Image: © Masahiko Ohkubo (CC BY 2.0)

Cloud Hypervisor

© Course authors (CC BY-SA 4.0) - Image: © Edd Thomas (CC BY 2.0)

Firecracker

  • Originally developed by AWS
  • Runs secure and performant "microVMs"
  • Not designed for general purpose hosting
  • Shares code with Cloud Hypervisor
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Xen

  • Pioneering solution for para-virtualisation
  • Performant "bare metal" hypervisor
  • Enables small Trusted Computing Base
  • XCP-ng provides an ESXi-/Proxmox-like experience
© Course authors (CC BY-SA 4.0) - Image: © Lars Juhl Jensen (CC BY 2.0)

There are many more solutions available
for hardware-level virtualisation -
all with their own pros/cons.

This was just an appetizer to make
you familiar with the most common ones!

© Course authors (CC BY-SA 4.0) - Image: © ETC Project (CC0 1.0)

Qubes OS showcase

An alternative vision of the desktop

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)
[Screenshot walkthrough of the Qubes OS desktop]

© Course authors (CC BY-SA 4.0)

Not everything is awesome

  • Requires somewhat fancy HW
  • Quite a lot of overhead
  • No 3D acceleration
  • Lots to manage and keep updated
  • Lacking support for newest HW
© Course authors (CC BY-SA 4.0) - Image: © Eric Savage (CC BY-SA 2.0)

Get inspired

  • Use VMs and snapshots for risky things
  • Set up a router VM to force traffic through a VPN
  • Separate VM per customer/project for easy cleanup and leak prevention
© Course authors (CC BY-SA 4.0) - Image: © Mathias Appel (CC0 1.0)

Confidential computing

Making computers a bit more trustworthy

© Course authors (CC BY-SA 4.0) - Image: © Nikki Tysoe (CC BY 2.0)

Traditionally, we must trust the hypervisor and operators of underlying infrastructure.

© Course authors (CC BY-SA 4.0) - Image: © Eric Chan (CC BY 2.0)

We can encrypt disks of instances, but with somewhat limited value.

© Course authors (CC BY-SA 4.0) - Image: © Eric Kilby (CC BY-SA 2.0)

Operators can build trust using transparency.

Trust may not be enough due to regulatory requirements.

Even if operators act in good faith and auditors are happy, things can still get owned.

© Course authors (CC BY-SA 4.0) - Image: © Jan Bommes (CC BY 2.0)

Currently, lots of interest (see "money") in solving these issues for cloud and edge computing.

"Confidential computing" is the collective term used for attempted solutions.

© Course authors (CC BY-SA 4.0) - Image: © Quinn Dombrowski (CC BY-SA 2.0)

Memory encryption

  • Prevent host (and other guests) from peeking at/modifying VM memory
  • Guests are still vulnerable to many different types of attacks
  • AMD SEV* is an example of a solution
© Course authors (CC BY-SA 4.0) - Image: © Sergei F (CC BY 2.0)

"Full guest isolation"

  • Hardware and software features to prevent host from messing with guests
  • Owners of guests verify the runtime environment before starting/decrypting data
  • Intel tries to solve this with TDX
  • AMD SEV-SNP is another solution
© Course authors (CC BY-SA 4.0) - Image: © Lydur Skulason (CC BY 2.0)

What's the catch?

  • Still early days, lacking support in virtualisation software stack
  • Current state of verified boot is quite bad, especially in Linux distributions
  • Requires trust in a proprietary black box
  • Several vulnerabilities have been identified, breaking the promise
© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)

Mayhaps a good complement to minimize risk, if understood properly.

If you find this interesting,
check out Joel's talk from SEC-T.

© Course authors (CC BY-SA 4.0) - Image: © Graham Drew (CC BY 2.0)

Optimizing virtualisation

Making HW-level VMs go wroooom!

© Course authors (CC BY-SA 4.0) - Image: © Chris Dlugosz (CC BY 2.0)

CPU assisted virtualisation

Allow guests to access parts of
CPU, RAM and PCI bus without going
through virtualisation software.

Requires support from hardware,
like Intel VT-(x|d) and AMD-V features.

May require use of "feature masking"
for live migration and security.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)
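A quick way to check for these features on a Linux host is sketched below; the flag names in /proc/cpuinfo are vendor specific:

```shell
# "vmx" = Intel VT-x, "svm" = AMD-V. No match usually means the CPU
# lacks hardware assistance or it is masked/disabled (e.g. in firmware).
if grep -q -E 'vmx|svm' /proc/cpuinfo; then
    echo "hardware-assisted virtualisation supported"
else
    echo "no vmx/svm flags found"
fi
```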

Para-virtualisation

Let's not pretend that guests are running on real hardware.

Virtual devices don't necessarily need to act like physical HW and can be much more efficient.

VIRTIO offers standardised virtual devices
such as NICs, block devices ("disk drives"),
sockets and sound cards.

© Course authors (CC BY-SA 4.0) - Image: © Micah Elizabeth Scott (CC BY-SA 2.0)

Memory and data optimization

Deduplication (only store unique data once)
may be used for both guest memory/RAM
and disk to save a lot of space.

"Memory ballooning" features enable
overprovisioning of RAM in guests.

© Course authors (CC BY-SA 4.0) - Image: © Freestocks.org (CC0 1.0)

Security recap

Pros and cons of virtualisation

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Exercise: Convince Eddy

Participants are split into two or more groups.

Half should present the security benefits of using virtualisation while the other half does the opposite.

After presentation, send slides to
course+virt_011401@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © AK Rockefeller (CC BY-SA 2.0)

Reflections

What have we learned about hardware-level virtualisation?

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Answer the following questions

  • What are your most important take-aways?
  • Did you have any "Ahaaa!"-moments?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_011501@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)
© Course authors (CC BY-SA 4.0) - Image: © Alan Levine (CC0 1.0)

Course progress recap

Refreshing what we've learned so far

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Covered what virtualisation is
and its pros/cons.

Talked about HW-level virtualisation
and some of the solutions available.

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)
  • High-level concepts and pros/cons
  • Hardware-level virtualisation
  • OS-level virtualisation
  • Virtualised development and containers
  • Lab
  • Course recap and summary
© Course authors (CC BY-SA 4.0) - Image: © Jon Evans (CC BY 2.0)

Things are about to get more complex.

Remember to ask questions and utilize available learning resources!

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Q&A, reflections and feedback

© Course authors (CC BY-SA 4.0) - Image: © John K. Thorne (CC0 1.0)

In most use-cases, the security pros far outweigh the cons.

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Qubes OS is neat, but requires decent hardware.

>16GB RAM, Intel VT-x/AMD-V
and Intel VT-d/AMD-Vi.

Check out the "hardware compatibility list".

© Course authors (CC BY-SA 4.0) - Image: © Andrew Hart (CC BY-SA 2.0)

Para-virtualised devices, such as network cards and disks, don't pretend to be physical things.

By getting rid of the illusion, a lot of overhead is removed, which makes them faster.

© Course authors (CC BY-SA 4.0) - Image: © Helsinki Hacklab (CC BY 2.0)

Vertical scaling

Add/Remove CPU, RAM, etc.

Horizontal scaling

Increase/Decrease number of VM instances.

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)
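As a sketch using LXD's lxc CLI (guest names srv-1/srv-2 are placeholders; the same idea applies to any virtualisation stack):

```shell
# Vertical scaling: resize an existing guest's resource limits.
sudo lxc config set srv-1 limits.cpu 4
sudo lxc config set srv-1 limits.memory 2GiB

# Horizontal scaling: add another instance of the same guest.
sudo lxc copy srv-1 srv-2
sudo lxc start srv-2
```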

If you got some spare hardware,
why not try setting up
Proxmox VE
or XCP-ng?

Play around with advanced features,
such as live migration and
PCI forwarding.

© Course authors (CC BY-SA 4.0) - Image: © Jeena Paradies (CC BY 2.0)

Questions/thoughts before we keep going?

© Course authors (CC BY-SA 4.0) - Image: © Tero Karppinen (CC BY 2.0)

OS-level virtualisation

What makes it different and desirable?

© Course authors (CC BY-SA 4.0) - Image: © Wendelin Jacober (CC0 1.0)

HW-level virtualisation

Pretend to be hardware well enough to make operating systems run.

OS-level virtualisation

Pretend to be an operating system environment well enough to run applications.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Pretend to be another OS
and/or
Create an isolated environment

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

FreeBSD's Linuxulator and Windows' WSL v1 allow execution of (many) Linux applications.

Wine allows (many) Windows programs to run on Linux, *BSD and macOS.

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Pretend to be another OS
and/or
Create an isolated environment

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Why use OS-level virtualisation?

© Course authors (CC BY-SA 4.0) - Image: © Rolf Dietrich Brecher (CC BY 2.0)

Shared kernel saves memory and CPU cycles:

  • No HW emulation overhead
  • Concurrent usage of accelerators like GPUs
  • More efficient scheduling of processes

Cheaper, faster and more Greta-friendly.

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

The one-way mirror allows a hypervisor to continuously monitor guests for malicious activity.

Falco is a kool example.

© Course authors (CC BY-SA 4.0) - Image: © NASA/Chris Gunn (CC BY 2.0)

Virtuozzo

  • First commercially available solution for Linux
  • FOSSed in 2005 as "OpenVZ"
  • Popular in the low-end hosting space

LXC

  • Designed to fit into mainline Linux
  • Development currently led by Canonical
  • Used by ChromeOS to run regular Linux apps
© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

LXC/LXD demonstration

A peek into OS-level virtualisation

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)
$ sudo snap install lxd

lxd 5.9-9879096 from Canonical✓ installed
$ sudo lxd init --minimal
© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)
$ sudo lxc image list images: | shuf

centos/8-Stream (3 more)
openwrt/21.02 (3 more)
fedora/36 (3 more)
almalinux/9 (3 more)
opensuse/15.4/desktop-kde (1 more)
mint/tina (3 more)
alpine/3.17 (3 more)
[...]
© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)
$ sudo lxc launch images:almalinux/9 srv-1

Creating srv-1
Retrieving image: Unpack: 100% (3.81GB/s)
Starting srv-1
© Course authors (CC BY-SA 4.0) - Image: © Price Capsule (CC BY-SA 2.0)
$ cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="Ubuntu 22.04.1 LTS"

$ sudo lxc exec -t srv-1 -- /bin/bash

[root@srv-1 ~]# cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="AlmaLinux 9.1 (Lime Lynx)"
© Course authors (CC BY-SA 4.0)
$ sudo lxc info srv-1 --resources

Name: srv-1
Status: RUNNING

Resources:
  Processes: 15
  Disk usage:
    root: 2.62MiB
  Memory usage:
    Memory (current): 54.58MiB
[...]
© Course authors (CC BY-SA 4.0) - Image: © NASA (CC BY 2.0)
$ sudo lxc exec -t srv-2 -- /bin/ash

~ # apk add fortune
(1/3) Installing libmd (1.0.4-r0)
(3/3) Installing fortune (0.1-r1)

~ # fortune 

Matter cannot be created or destroyed,
nor can it be returned without a receipt.
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)
$ sudo lxc snapshot srv-2 pre_upgrade
$ sudo lxc restore srv-2 pre_upgrade
© Course authors (CC BY-SA 4.0) - Image: © OLCF at ORNL (CC BY 2.0)
$ sudo lxc cluster list

node-1
node-2
node-3
$ sudo lxc info srv-1 | grep Location

Location: node-1
$ sudo lxc cluster evacuate node-1
$ sudo lxc info srv-1 | grep Location

Location: node-2
© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

Other neat features

© Course authors (CC BY-SA 4.0) - Image: © NASA/Bill Stafford (CC BY 2.0)

Recent drama in the development community
has caused several contributors to leave
the LXD project.

Checkout the community-driven
fork called Incus.

© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

OS-level virtualisation

How does it work on Linux?

© Course authors (CC BY-SA 4.0) - Image: © Enrique Jiménez (CC BY-SA 2.0)

There is no such thing as an OS-level VM in the Linux kernel.

Gluing together features like chroot, namespaces and cgroups creates the illusion.

This functionality has other neat use-cases besides virtualisation.

No worries if you don't grok all of this!

© Course authors (CC BY-SA 4.0) - Image: © Eric Nielsen (CC BY 2.0)

chroot

  • Introduced in UNIX during the 70s
  • "Change file system root"
  • Not designed as a security feature
© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)
$ sudo debootstrap buster my_debian_root http://deb.debian.org/debian/

I: Retrieving InRelease 
I: Resolving dependencies of required packages...
I: Retrieving libacl1 2.2.53-4
[...]
I: Configuring libc-bin...
I: Base system installed successfully.
$ ls my_debian_root/

bin  boot  dev  etc  home  lib  lib32  lib64 [...]
© Course authors (CC BY-SA 4.0)
$ cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="Ubuntu 22.04.1 LTS"

$ sudo chroot my_debian_root /bin/bash

root@node-1:/# cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
© Course authors (CC BY-SA 4.0)
root@node-1:/# dmesg | wc -l
1002

root@node-1:/# ps -xa | grep password
/bin/pacemaker loop --password G0d!

root@node-1:/# tcpdump -i eth0

tcpdump: listening on eth0
[...]
160 packets captured
© Course authors (CC BY-SA 4.0) - Image: © Andrew Hart (CC BY-SA 2.0)

Namespaces

"Functionality to partition a group of processes' view of the system".

Host can see through all, members can't.

We'll focus on "process", "network" and "user".

© Course authors (CC BY-SA 4.0) - Image: © Torkild Retvedt (CC BY-SA 2.0)

Process/PID namespace

Members get their own view of the process tree.

© Course authors (CC BY-SA 4.0) - Image: © David J (CC BY 2.0)
$ ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp

$ sudo chroot my_debian_root ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp
© Course authors (CC BY-SA 4.0) - Image: © Helsinki Hacklab (CC BY 2.0)
$ ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp

$ sudo unshare --fork --pid -- chroot my_debian_root /bin/bash

root@node-1:/# ps -e
PID   TTY   TIME      CMD
1     ?     00:00:00  bash
2     ?     00:00:00  ps
© Course authors (CC BY-SA 4.0)

Network namespace

Separate "network stack" for member processes.

Example use-cases:

  • Configure per-application FW rules
  • Force cherry-picked services through a VPN
  • Handle overlapping network segments
© Course authors (CC BY-SA 4.0) - Image: © Dave Herholz (CC BY-SA 2.0)
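A minimal sketch with iproute2 (requires root; the namespace name vpn_only is a placeholder):

```shell
# Create a network namespace - it starts with only a loopback
# interface, isolated from the host's network stack.
sudo ip netns add vpn_only
sudo ip netns exec vpn_only ip link show

# Bring up loopback inside the namespace; processes started via
# "ip netns exec" only see this stack and any rules applied to it.
sudo ip netns exec vpn_only ip link set lo up

sudo ip netns del vpn_only
```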

User namespace

Root in chroot is root.

Lots of things, such as package managers, expect root privileges but don't really need them.

User namespaces give members their own group and user lists.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)
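A sketch using unshare(1), assuming unprivileged user namespaces are enabled (the default on most modern distributions):

```shell
# A regular user's UID on the host:
id -u

# Inside a fresh user namespace, the same user can be mapped to
# UID 0 - "root" in the namespace, still unprivileged on the host.
unshare --user --map-root-user id -u   # prints 0
```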

cgroup

Members' usage of system resources (CPU, memory, disk I/O, etc.) can be limited.

Used together with CRIU for live migration.

Not just used for virtualisation.

© Course authors (CC BY-SA 4.0) - Image: © Pete Seeger (CC BY 2.0)
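On a system with cgroup v2 this can be sketched directly against the cgroup file system (assumes root and a unified hierarchy at /sys/fs/cgroup; the group name demo is a placeholder):

```shell
# Create a cgroup and cap memory usage of its members at 256 MiB.
sudo mkdir /sys/fs/cgroup/demo
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell into the group - it and all child
# processes are now subject to the limit.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```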

seccomp

Limit which/how system calls can be used.

Some syscalls allow breaking out of the isolation.

Minimize attack surface of shared kernel.

© Course authors (CC BY-SA 4.0) - Image: © Wendelin Jacober (CC0 1.0)

Capabilities

Originally developed to make root less omnipotent.

Caps like "NET_BIND_SERVICE" and "SYS_CHROOT" can be given to non-root users.

Not very fine-grained and some are unsafe.

© Course authors (CC BY-SA 4.0) - Image: © Bengt Nyman (CC BY 2.0)
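A sketch using libcap's command line tools (assuming they are installed; the binary path is a placeholder):

```shell
# Grant a non-root web server the right to bind ports below 1024
# without giving it any other root privileges.
sudo setcap cap_net_bind_service=+ep /usr/local/bin/webserver

# Inspect the file capabilities afterwards.
getcap /usr/local/bin/webserver
```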

These and other features make up the beautiful mess we call OS-level virtualisation on Linux!

© Course authors (CC BY-SA 4.0) - Image: © Stacy B.H (CC BY 2.0)

OS-level virtualisation

The downsides compared to hardware-level virtualisation

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)

Reliability

Live migration is wonky.

Kernel version may be different from what is expected by the guest.

Linux can behave... interestingly... when many different tasks are running.

Bugs triggered by guests could crash the host.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

Security

Gaps between the different isolation features.

Host exposes a huge attack surface to the guests.

Compatibility has been the priority, not security.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Reflections

What have we learned so far about OS-level virtualisation?

© Course authors (CC BY-SA 4.0) - Image: © Simon Claessen (CC BY-SA 2.0)

Answer the following questions

  • What is the largest benefit of OS-level virtualisation in your opinion?
  • Can you think of any other use-cases where namespaces could help?
  • Suggestions for ways to compensate/mitigate the risks introduced by OS-level virtualisation?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_012101@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Virtualised development

Why devops teams have grown to love it

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

In theory, collaborative development and
testing of software shouldn't be that hard.

We got interpreted languages and
great tools like GitHub to help us.

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

app.py

from flask import Flask, request

app = Flask("kool_app")

@app.route("/api/request_info")
def request_information():
    return {
        "ip_address": request.remote_addr,
        "browser": request.headers["User-Agent"]}
© Course authors (CC BY-SA 4.0)
$ flask run

* Environment: production
* Debug mode: off
* Running on http://127.0.0.1:5000
$ curl http://localhost:5000/api/request_info | jq

{
  "browser": "curl/7.81.0",
  "ip_address": "127.0.0.1"
}
© Course authors (CC BY-SA 4.0)

What dependencies (and versions) are
installed on developers' workstations?

How about in the different server environments,
such as staging and production?

What about other components in the stack?
(OS, load balancer, databases, etc.)

© Course authors (CC BY-SA 4.0) - Image: © Scott Skippy (CC BY-SA 2.0)
$ cat requirements.txt

Flask==2.0.3
$ python3 -m venv my_env
$ ls my_env

bin include lib lib64
pyvenv.cfg

$ my_env/bin/pip install -r requirements.txt

Successfully installed Flask-2.0.3

$ my_env/bin/flask run

* Environment: production
* Running on http://127.0.0.1:5000
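One way to see which dependency versions actually ended up in an environment is to ask the interpreter itself - a minimal sketch using only the standard library (the package names passed in are just examples):

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def report_environment(package_names):
    """Describe the running interpreter and the installed
    version (or None) of each requested package."""
    report = {"python": sys.version.split()[0]}
    for name in package_names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report

print(report_environment(["flask", "pip"]))
```

Run under my_env/bin/python the report reflects the venv; run under the system interpreter it may differ - which is exactly the drift that pinned requirements try to prevent.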
© Course authors (CC BY-SA 4.0) - Image: © Qubodup (CC BY 2.0)

Some problems

  • Sharing venvs is not trivial
  • Only handles Python dependencies*
  • Code may not behave exactly the same on all operating systems
  • Doesn't describe how the stack should be run
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Is all doom and gloom?

Let's explore some tools that try
to help us with these challenges!

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Vagrant demonstration

Shareable development environments

© Course authors (CC BY-SA 4.0) - Image: © NASA/JPL-Caltech (CC BY 2.0)

What is HashiCorp Vagrant?

Software for automating the process of
setting up test/development environments
using virtual machines.

Released back in 2010
by Mitchell Hashimoto.

© Course authors (CC BY-SA 4.0) - Image: © NASA/JPL-Caltech (CC BY 2.0)

How does it work?

Instructions for how a VM should be
configured are defined as code in
shareable "Vagrantfiles".

"Boxes" are pre-configured OS images
that can be uploaded/downloaded from
a remote repository, like "Vagrant Cloud".

Relies on a "provider", like VirtualBox
or VMware Fusion, for the heavy lifting.

© Course authors (CC BY-SA 4.0) - Image: © NASA/JPL-Caltech (CC BY 2.0)
$ cat Vagrantfile 

Vagrant.configure("2") do |config|
  # Pre-built OS image used as base for guest
  # downloaded from a remote registry
  config.vm.box = "ubuntu/bionic64"

  # Directory shared between host and guest
  config.vm.synced_folder "./code", "/vagrant_data"

  # Commands to run after spawning guest
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y python3 python3-pip
    pip3 install -r /vagrant_data/requirements.txt
  SHELL
end
© Course authors (CC BY-SA 4.0)
$ vagrant validate
Vagrantfile validated successfully.

$ vagrant up

Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/bionic64'...
==> default: Forwarding ports...
==> default: Waiting for machine to boot.
==> default: Running provisioner: shell...
[...]
Done
© Course authors (CC BY-SA 4.0)
$ vagrant ssh

vagrant@ubuntu-bionic:~$ cd /vagrant_data/
vagrant@ubuntu-bionic:/vagrant_data$ flask run

* Environment: production
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
$ vagrant destroy --force
==> default: Forcing shutdown of VM...
==> default: Destroying VM...
© Course authors (CC BY-SA 4.0)

Other neat things

Multiple VMs can be configured and
launched using a single Vagrantfile.
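A hypothetical sketch of what such a multi-machine Vagrantfile could look like (the machine names "web" and "db" and the box are made up for illustration):

```ruby
Vagrant.configure("2") do |config|
  # Two guests defined in one file
  config.vm.define "web" do |web|
    web.vm.box = "ubuntu/bionic64"
  end

  config.vm.define "db" do |db|
    db.vm.box = "ubuntu/bionic64"
  end
end
```

vagrant up then starts both guests, and vagrant ssh web targets a specific one.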

"Vagrant Share" plugin enables you
to easily provide remote access
to a guest for collaboration.

For building production OS images,
try HashiCorp Packer.

© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)

Docker demonstration

Tooling behind application containers

© Course authors (CC BY-SA 4.0) - Image: © Milan Bhatt (CC BY-SA 2.0)

What is Docker?

Software for packaging/running
applications and their dependencies
in predictable execution environments.

Users can download and execute an
"application container" without caring
about code interpreter configuration,
software library versions, startup scripts
and similar boring/error-prone stuff.

© Course authors (CC BY-SA 4.0) - Image: © Milan Bhatt (CC BY-SA 2.0)

How does it work?

Instructions for how a container should
be configured are defined as code in
shareable "Dockerfiles".

"Images" are pre-configured OS images
that can be uploaded/downloaded from
a remote repository, like "Docker Hub".

Relies on OS-level virtualisation
(by default) to provide low overhead
and speedy execution of guests.

© Course authors (CC BY-SA 4.0) - Image: © Milan Bhatt (CC BY-SA 2.0)
$ docker run -ti ubuntu:18.04 /bin/bash

Unable to find image 'ubuntu:18.04' locally
18.04: Pulling from library/ubuntu
a055bf07b5b0: Pull complete 

root@79424287b988:/#
© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)
# Pre-built OS/application image used as base
# for guest downloaded from a remote registry
FROM debian:bookworm-slim

# Commands to run during the build process
# to install dependencies, configure
# options and similar
RUN apt-get update \
    && apt-get install -y curl qrencode

# Copy source code and other application
# artifacts into guest image
COPY my_script.sh /usr/local/bin/

# Instructions for how the guest should
# be executed when run by a user
CMD /usr/local/bin/my_script.sh --fetch
© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)
# Pre-built OS/application image used as base
# for guest downloaded from a remote registry
FROM python:3.11.0

# Commands to run during the build process
# to install dependencies, configure
# options and similar
RUN pip install Flask==2.2.2

# Copy source code and other application
# artifacts into guest image
COPY app.py .

# Instructions for how the guest should
# be executed when run by a user
USER 10000
CMD ["flask", "run", "--host", "0.0.0.0"]
© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)
$ docker build --tag kool_app:0.2 .

Sending build context to Docker daemon  4.608kB

Step 1/5 : FROM python:3.11.0
Step 2/5 : RUN pip install Flask==2.2.2
Step 3/5 : COPY app.py .
Step 4/5 : USER 10000
Step 5/5 : CMD ["flask", "run", "--host", "0.0.0.0"]
Successfully tagged kool_app:0.2
$ sudo docker run --publish 5000:5000 kool_app:0.2

* Debug mode: off
* Running on all addresses (0.0.0.0)
© Course authors (CC BY-SA 4.0)

Our example application is
quite simple - let's grab a
more complex/realistic example...

I choose you, Nextcloud!

© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)
(Image slides: Nextcloud demonstration)

© Course authors (CC BY-SA 4.0)
$ docker run \
  --publish 80:80 \
  --volume my_nextcloud_data:/var/www/html \
  --env NEXTCLOUD_ADMIN_USER=bob \
  --env NEXTCLOUD_ADMIN_PASSWORD=hunt_er2 \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)

The number of command line flags
to our docker run is growing.

Might need other components to
properly run our application,
like a proper database or
reverse proxy for HTTPS.

We could try to document the
setup or write a bash script
to start our containers.

To handle these problems,
"Docker Compose" was created.

© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

nextcloud/docker-compose.yml

services:
  database:
    image: "mysql:8.2"
    volumes:
      - "my_db_data:/var/lib/mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "hunt_er2"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"

  application:
    image: "nextcloud:26.0.10"
    volumes:
      - "my_nextcloud_data:/var/www/html"
    environment:
      NEXTCLOUD_ADMIN_USER: "bob"
      NEXTCLOUD_ADMIN_PASSWORD: "hunt_er2"
      MYSQL_HOST: "database"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"
    ports:
      - "8080:80"

volumes:
  my_db_data: {}
  my_nextcloud_data: {}
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

nextcloud/docker-compose.yml

services:
  database:
    image: "mysql:8.2"
    volumes:
      - "my_db_data:/var/lib/mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "hunt_er2"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"

[...]
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

nextcloud/docker-compose.yml

[...]

  application:
    image: "nextcloud:26.0.10"
    volumes:
      - "my_nextcloud_data:/var/www/html"
    environment:
      NEXTCLOUD_ADMIN_USER: "bob"
      NEXTCLOUD_ADMIN_PASSWORD: "hunt_er2"
      MYSQL_HOST: "database"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"
    ports:
      - "8080:80"
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)
$ docker compose up --detach

 [+] Running 2/2
 ✔ Container nextcloud-application-1  Started
 ✔ Container nextcloud-database-1     Started
$ docker compose stop

 [+] Stopping 2/2
 ✔ Container nextcloud-application-1  Stopped
 ✔ Container nextcloud-database-1     Stopped 
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

Options in the "docker-compose.yml" file
can be utilized to configure secrets,
health/ready checks, networking,
restart behavior, etc.
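For example, a restart policy and a health check could be added to a service definition along these lines (a sketch - the probed endpoint and the timings are made up):

```yaml
services:
  application:
    image: "nextcloud:26.0.10"
    restart: "unless-stopped"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost/status.php"]
      interval: "30s"
      timeout: "5s"
      retries: 3
```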

For more complex orchestration needs,
Kubernetes is a popular alternative.

© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

Further awesomeness

Makes it easy to build immutable systems.

Tons of pre-built images on Docker Hub.

Supports both OS-level and
(with some effort) HW-level virtualisation.

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

Hope you've enjoyed the introduction.

We'll get back to how Docker
works in detail later!

In the meantime,
Read The Fine Manual!

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

Container images

What are they really?

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Container pros

Benefits (AKA "All kool kids like it")

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)

Containers make it easy to...

© Course authors (CC BY-SA 4.0) - Image: © Mathias Appel (CC0 1.0)

Container cons

Everything ain't awesome

© Course authors (CC BY-SA 4.0) - Image: © Shannon Kringen (CC BY 2.0)

Containers make it easy to...

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Not understand what you are running.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Use software packaged by dubious people.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Forget to update/patch things.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Reflections

What have we learned about virtualised development so far?

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

Answer the following questions

  • What are your most important take-aways?
  • Did you have any "Ahaaa!"-moments?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_012801@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Helsinki Hacklab (CC BY 2.0)

Virtualisation labs

Putting knowledge to practice

© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)

Lab descriptions tell you what to do,
not exactly how you should do it.

Your submissions should document
how you solved the problem.

© Course authors (CC BY-SA 4.0) - Image: © Mario Hoppmann (CC BY 2.0)

Lab: Shared dev environment

Graded exercise to use HashiCorp Vagrant
and Docker Compose to package/run
a Python application and its dependencies.

For detailed instructions, carefully read:
"resources/lab/README.md".

Remember to download the latest version of
the resources archive! ("virt.zip")

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Docker revisited

© Course authors (CC BY-SA 4.0) - Image: © NASA (CC BY 2.0)

Docker's primary focus is to package/run
applications and their dependencies in a
predictable execution environment.

Let's take a deeper look at
how it achieves this!

© Course authors (CC BY-SA 4.0) - Image: © NASA (CC BY 2.0)

Let's run a container!

$ docker run \
  --publish 8080:80 \
  --volume my_nextcloud_data:/var/www/html \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

Fetching container image

Unable to find image 'nextcloud:26.0.10' locally
26.0.10: Pulling from library/nextcloud
af107e978371: Already exists
6480d4ad61d2: Already exists
[...]
e9daddf7d237: Pull complete
d929539deed8: Pull complete
Digest: sha256:7f053b3fd21391e08[...]ec9c1fe7a7
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

Observing application output

Initializing nextcloud 26.0.10.1 ...
New nextcloud instance
Initializing finished
[...]
[mpm_prefork:notice] [pid 1] AH00163:
Apache/2.4.57 (Debian) PHP/8.2.14 configured
-- resuming normal operations
[core:notice] [pid 1] AH00094:
Command line: 'apache2 -D FOREGROUND'
© Course authors (CC BY-SA 4.0) - Image: © Jorge Franganillo (CC BY 2.0)
(Image slides)

© Course authors (CC BY-SA 4.0)

Hmm, a lot just happened there.

Let's try to break it down step-by-step!

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Let's run a container!

$ docker run \
  --publish 8080:80 \
  --volume my_nextcloud_data:/var/www/html \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

The docker CLI utility is a client
speaking HTTP with the Docker daemon.

Can be used to run and interact with
containers on local/remote hosts.

Sub-command "run" provides hundreds
of different option flags to control
how the container is executed,
such as CPU and RAM limits.

docker.io/library/nextcloud:26.0.10
indicates which container image
should be used...

© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)
$ docker volume inspect my_nextcloud_data
[
  {
    "Name": "my_nextcloud_data",
    "Driver": "local",
    "Labels": null,
    "Mountpoint":
"/var/lib/docker/volumes/my_nextcloud_data/_data",

    "Options": null,
    "Scope": "local"
  }
]
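Since the inspect output is plain JSON, it is easy to post-process - a small sketch that extracts the mountpoint from output shaped like the above:

```python
import json

# Output shaped like what `docker volume inspect` prints
inspect_output = """
[
  {
    "Name": "my_nextcloud_data",
    "Driver": "local",
    "Labels": null,
    "Mountpoint": "/var/lib/docker/volumes/my_nextcloud_data/_data",
    "Options": null,
    "Scope": "local"
  }
]
"""

def mountpoint_of(raw_json):
    """Return the host directory backing the first listed volume."""
    volumes = json.loads(raw_json)
    return volumes[0]["Mountpoint"]

print(mountpoint_of(inspect_output))
```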
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

The docker CLI utility is a client
speaking HTTP with the Docker daemon.

Can be used to run and interact with
containers on local/remote hosts.

Sub-command "run" provides hundreds
of different option flags to control
how the container is executed,
such as CPU and RAM limits.

docker.io/library/nextcloud:26.0.10
indicates which container image
should be used...

© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

Fetching container image

Unable to find image 'nextcloud:26.0.10' locally
26.0.10: Pulling from library/nextcloud
af107e978371: Already exists
6480d4ad61d2: Already exists
[...]
e9daddf7d237: Pull complete
d929539deed8: Pull complete
Digest: sha256:7f053b3fd21391e08[...]ec9c1fe7a7
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

The container image is a bundle containing all
files, scripts and code libraries required to
run an application predictably.

Could be compared to a ZIP file of a
"debootstrapp'ed" directory used in
previous examples with chroot.

Container images are typically stored/shared
using a repository such as "Docker Hub".

© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

How do we create container images?

Typically, a "Dockerfile" and the
command docker build is used...

© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)
# Name of existing image to use as base
FROM debian:bookworm-slim

# Command(s) executed inside container
# during image build process
RUN apt-get update \
    && apt-get install -y curl qrencode

# Copies file from host into container
COPY my_script.sh /usr/local/bin/

# Metadata indicating to container runtime
# which program to execute on start-up
CMD /usr/local/bin/my_script.sh --fetch
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)
$ docker build --tag my_script:v2.3 .

=> [internal] load build definition from
Dockerfile [...]
=> [1/3] FROM debian:bookworm-slim
  => resolve docker.io/library/
  debian:bookworm-slim@sha256:bac354c[...]
=> [2/3] RUN apt-get update [...]
=> [3/3] COPY my_script.sh /usr/local/bin/
=> exporting to image
[+] Building 13.3s (8/8) FINISHED

$ docker image save my_script:v2.3 \
  > container_image-my_script-v2.3.tar
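The archive written by docker image save is an ordinary tar file and can be inspected with standard tools. A self-contained sketch using Python's tarfile module - it builds a tiny stand-in archive, since a real image tar additionally contains layer tarballs and config blobs:

```python
import io
import json
import tarfile

def list_image_entries(tar_path):
    """Return the member names found inside an image archive."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()

# Build a minimal stand-in so the example runs anywhere;
# manifest.json is part of the real `docker image save` format
manifest = json.dumps(
    [{"RepoTags": ["my_script:v2.3"], "Layers": []}]).encode()

with tarfile.open("fake_image.tar", "w") as tar:
    member = tarfile.TarInfo("manifest.json")
    member.size = len(manifest)
    tar.addfile(member, io.BytesIO(manifest))

print(list_image_entries("fake_image.tar"))
```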
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)
# Minify client side assets (JavaScript)
FROM node:latest AS build-js

RUN npm install gulp gulp-cli -g
WORKDIR /build
COPY . .
RUN npm install --only=dev && gulp

# Build Golang binary
FROM golang:1.15.2 AS build-golang

WORKDIR /go/src/github.com/gophish/gophish
COPY . .
RUN go get -v && go build -v

# Runtime container
FROM debian:stable-slim

WORKDIR /opt/gophish
COPY --from=build-golang \
  /go/src/github.com/gophish/gophish/ ./

COPY --from=build-js \
  /build/static/js/dist/ ./static/js/dist/

[...]
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

Observing application output

Initializing nextcloud 26.0.10.1 ...
New nextcloud instance
Initializing finished
[...]
[mpm_prefork:notice] [pid 1] AH00163:
Apache/2.4.57 (Debian) PHP/8.2.14 configured
-- resuming normal operations
[core:notice] [pid 1] AH00094:
Command line: 'apache2 -D FOREGROUND'
© Course authors (CC BY-SA 4.0) - Image: © Jorge Franganillo (CC BY 2.0)

Most of the heavy lifting is
delegated to containerd.

Shared component used by several
container management tools besides Docker,
like Kubernetes, "Pouch" and balenaEngine.

Uses runc by default to spawn containers
using OS-level virtualisation, but supports
other runtimes like runhcs (Windows) and
Kata (HW-level virtualisation).

© Course authors (CC BY-SA 4.0) - Image: © Jorge Franganillo (CC BY 2.0)
(Image slide)

© Course authors (CC BY-SA 4.0)

Tweaking application configuration

$ docker run \
  --publish 8080:80 \
  --volume my_nextcloud_data:/var/www/html \
  --env NEXTCLOUD_ADMIN_USER=bob \
  --env NEXTCLOUD_ADMIN_PASSWORD=hunt_er2 \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

The "Docker Compose" tool enables us to
describe how one or more containers should
be run (environment variables, volumes, etc.).

Configuration defined in a text file called
"docker-compose.yml", which can be shared
with developers, system administrators
and end-users.

© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)
services:
  database:
    image: "mysql:8.2"
    volumes:
      - "my_db_data:/var/lib/mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "hunt_er2"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"

  application:
    image: "nextcloud:26.0.10"
    volumes:
      - "my_nextcloud_data:/var/www/html"
    environment:
      MYSQL_HOST: "database"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"
    ports:
      - "8080:80"

[...]
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)
$ docker compose up --detach

 [+] Running 2/2
 ✔ Container nextcloud-application-1  Started
 ✔ Container nextcloud-database-1     Started
$ docker compose stop 

 [+] Stopping 2/2
 ✔ Container nextcloud-application-1  Stopped
 ✔ Container nextcloud-database-1     Stopped 

$ docker compose rm --volumes --force

 [+] Removing 2/2
 ✔ Container nextcloud-application-1  Removed
 ✔ Container nextcloud-database-1     Removed
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

Introducing VDI

"Virtual Desktop Infrastructure"

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

Goal: Centralise compute resources.

© Course authors (CC BY-SA 4.0) - Image: © Scott McCallum (CC BY-SA 2.0)

Resource pooling means...

  • Less overall hardware
  • More processing power when needed
  • Shared usage of expensive software and accelerators such as GPUs
© Course authors (CC BY-SA 4.0) - Image: © Bret Bernhoft (CC0 1.0)

Minimize cost of setup, management and support.

Allows admins to focus their efforts in one place to store and back up data securely*.

© Course authors (CC BY-SA 4.0) - Image: © Bret Bernhoft (CC0 1.0)

How do users access their virtual desktops?

© Course authors (CC BY-SA 4.0) - Image: © Jan Bocek (CC BY 2.0)

Thin clients

Low-cost devices with a NIC, display output and basic I/O. No internal storage.

Minimal local compute resources.

Typically shared between multiple users.

Connects to appropriate virtual desktop
based on user+password, smart card, etc.

© Course authors (CC BY-SA 4.0) - Image: © VIA Gallery (CC BY 2.0)

Web and native apps

Allows users to access their virtual desktop from any device anywhere at any time.

© Course authors (CC BY-SA 4.0) - Image: © Adam Greig (CC BY-SA 2.0)

The downsides...

  • Many users still need offline compute
  • Requires stable low latency connection
  • Client must still be trusted
  • Many eggs in one basket
© Course authors (CC BY-SA 4.0) - Image: © Maja Dumat (CC BY 2.0)

Virtualisation course recap

Summarising what we've covered

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Exercise: Help the CTO

Participants are split into four or more groups.

Each group is assigned to present either the economic, reliability, security or environmental
pros/cons of virtualisation, targeted towards a CTO role in an assigned example organisation.

The presentation should cover usage of HW-/OS-level virtualisation, virtual/container development
and VDI. After presentation, send slides to
courses+virt_013301@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

Org. 1: Examplify

Cash-strapped startup on a mission to make customers (and stockholders) happy.

Develops, hosts and supports web pages/apps for (temporary) PR campaigns.

Needs to grow/shrink fast and has employees all over the world working from home.

Focus on the economic pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Yves Sorge (CC BY-SA 2.0)

Org. 2: Xample Industries

Well-established company in heavy manufacturing.

Every second of downtime and delays means huge monetary losses.

Depends on many legacy IT systems for their factory production line.

Focus on the reliability pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Org. 3: Department Of Examples

Spook agency making the world a safer place.

Keeping information secret and systems secure is of utmost importance.

Plagued by bureaucrats, spies and sinister insiders.

Focus on the security pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Org. 4: Example Airways

Transportation company with urgent need for efficient greenwashing.

Has several data centers spread around the world.

Needs to handle the occasional surge of visitors and bookings on their website.

Focus on the environmental pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Gerben Oolbekkink (CC BY 2.0)

Remember the target audience/focus area.

If it suits your presentation, add/modify the example organisation's challenges.

© Course authors (CC BY-SA 4.0) - Image: © Mauricio Snap (CC BY 2.0)

Let's summarise!

© Course authors (CC BY-SA 4.0) - Image: © OLCF at ORNL (CC BY 2.0)

Economic benefits

Less HW == Less electricity, DC space and painful human labor.

Enables self-service and automation which helps organisations move faster.

© Course authors (CC BY-SA 4.0) - Image: © Jennifer Morrow (CC BY 2.0)

Reliability benefits

Maintenance of underlying infrastructure without disruptions.

Easier testing and operations due to predictable runtime environments.

Enables quick recovery to a known good state.

© Course authors (CC BY-SA 4.0) - Image: © USGS EROS (CC BY 2.0)

Security benefits

Reduces attack surface through fine grained segmentation of systems.

Enables users to somewhat safely handle untrusted data.

Enables quick recovery to a known good state.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Environmental benefits

Less HW == less e-waste and energy consumption.

Power up/down hypervisors depending on load.

Migrate workloads to regions with green electricity.

© Course authors (CC BY-SA 4.0) - Image: © Nacho Jorganes (CC BY-SA 2.0)

HW-level virtualisation

Pretend to be hardware well enough to make operating systems run.

OS-level virtualisation

Pretend to be an (isolated) operating system environment well enough to run applications.

© Course authors (CC BY-SA 4.0) - Image: © Mauricio Snap (CC BY 2.0)

OS-level > HW-level

  • Faster!
  • More efficient/higher density
  • Easier to inspect and monitor
© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

OS-level < HW-level

  • Easier to break isolation
  • Less stable/predictable
  • Barely supported live-migration
© Course authors (CC BY-SA 4.0) - Image: © Marco Verch (CC BY 2.0)

Development tools

Virtualisation helps developers and users
by providing predictable execution environments
for applications.

Vagrant can be used for automated setup of
test environments on any OS using
HW-level virtualisation.

Docker can be used for packaging and
running (Linux) apps using
OS-level virtualisation.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

The main downside with virtualisation
is decreased performance ("overhead"),
especially for HW-level.

Hardware assistance (e.g. Intel VT-x)
and para-virtualisation features like
Virtio devices can minimize this problem.

Increased technical complexity
is another challenge.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

The End

Thanks for listening!

© Course authors (CC BY-SA 4.0) - Image: © Bret Bernhoft (CC0 1.0)

Welcome participants and wait for everyone to get settled. Introduction of the lecturers and their background. Segue: In this part of the course we'll talk about virtualisation...

- These days almost everyone virtualises their systems, an ongoing transformation for the past 20 years - Why they do it: basically what the slide says - Enabled the success of companies such as AWS and countless startups - Learn about the different kinds of virtualisation technologies and their pros/cons

- We'll cover lots of things in a short amount of time - In order to be able to do this we'll use scientifically proven methods to Make It Stick - Basically what the slide says - Don't forget to have fun!

- There are several resources to help you learn - Speaker notes in slides are heavily recommended for recaps/deep diving - May also be available through an LMS, depending on how the course is consumed - The course is designed to be instructor-led; you won't make the most of it on your own, see it as an aid

- Encourage participants to make the course better - Learners are likely the best to provide critique; lecturers tend to be a bit blind to the material's flaws - No cats or dogs allowed! - Feel free to share it with friends or use it yourself later in your career

The course wouldn't be available if it wasn't for financial support - Thanks!

- We'll begin with a retrospective to understand how we ended up where we are today - Try to clarify the different kinds of virtualisation available

- A bit of pop history: The picture shows a group of NACA computers, precursor to NASA - Often rooms full of women, minorities and anyone else not considered fully esteemed - Things started to change a lot with Turing's "Christopher" computer - Don't get stuck here, it's fun but there are whole books available for those who are interested

- Digital computers became a big thing as they could solve things a lot faster than humans - Slowly but surely became field-programmable and general purpose, saved a lot of money! - Everyone (see all scientists and Fortune 50s) wanted access to a computer, so time sharing and remote access became a thing: a very early version of the cloud. Many experts at the time thought computing would be provided by regional providers in a similar manner to water and electricity. The word terminal is still with us - but the view of computing as a utility looks slightly different today. - Multi-user systems became the norm and suddenly not everyone with access to the computer was equally trusted or given equal amounts of processing power. Things started to get more and more complex. Richard Stallman, the father of GNU and the FOSS movement, was famous for actively working against restrictions of computer usage. Segue: Proto-virtualisation already existed in different forms on mainframes, but in the beginning of the 2000s the need became apparent in the commodity server market as well.

- So what were the main motivations: basically what the slide says - Up until that point, every hertz could have a use and the benefits (see dreamy-eyed potential) of digitalisation had been many. Maybe the IT bubble burst started putting focus on cost again. - Computers had started to become way too powerful for single tasks, but the security model of OSes had remained largely the same - multi-tenancy was not simple and admin privileges, which were more or less required for everything, were omnipotent and hard to restrict. - Of course these problems aren't impossible to solve, but the effort required to change hardware, software and ways-of-working (see knowledge) was deemed too big a task. Segue: Two primary methods for solving these problems started to appear

- HW-level virtualisation is probably what comes to mind for most people when they hear the term. - If you've used VirtualBox, you know this. - Quite complex and slow to pretend to be hardware, but the method soon showed many benefits. - Makes legacy operating systems and applications run without customisation: business as usual. - Most users do however not care about how the OS is doing, they care about their apps running. - OS-level virtualisation tries to provide an environment that is as close as possible to what applications and admins expect: its job is not to fool the operating system. - OS-level virtualisation requires changes to the OS, but that is probably easier than pretending to behave like a physical object with all its quirks. - Mostly used to virtualise OSes of the same kind (Linux on Linux), but not only - The idea of uncoupling applications from hardware and/or operating systems became a thing as well, but more about that later in the course: for now, focus on HW-level and OS-level virt. - OS-level instances are often called "containers"; I try to not use the term as it is a bit confusing and could mean other things (more about that later) Segue: To make things a bit less diffuse, let's talk about early contenders in the field.

- Company that brought HW-level virtualisation to the masses since the early 2000s. - Many people considered it a bit of a moon shot at the time. - While slow, it quickly became popular as it helped companies save a lot of money (more about that later in the course). - Early in the space of clustering nodes. - Used to be the unchallenged king of HW-level virt for a long time, but these days there are several who battle for the title: many of which are freely available.

- First virtualisation on UNIX, inspired Solaris Zones, OpenVZ and other solutions. - The "UID 0 problem" was the main focus: splitting root up into smaller parts was deemed too much of a hassle. - Creates isolated jails, which have their own root and own view of the system, such as processes and the file system root. Segue: We are already using a lot of terms, let's try to clarify what some of them mean.

Many of these terms are not strictly correct, but they are widely used so I've tried to group them.

- Participants need to know basic Ethernet and IP networking to understand some concepts and labs. - Being somewhat comfortable in the Linux shell is a requirement. Some of the things we'll cover can't be done through a GUI. - When covering OS virtualisation we'll also dig a bit deeper into Linux internals. You don't need to be an expert but knowing what kernel space and user space are will surely help.

- We'll need some software and a lab environment for the course.
- Ubuntu will be used in most demos and labs.
- For now, just make sure that the software is installed and seems to be working.
- Check out the presentation links: I will however not provide a step-by-step guide, you won't get one in real life.

- It's all 'bout the money, it's all 'bout the dumdudmudumdudm.
- Unfortunately, this is what people other than nerds generally care about.
- Useful knowledge to understand how we got here.
- Important for you to make friends/get promoted/earn a decent salary at most places.

Segue: Let's start with the simple stuff - HW.

- Data center space is expensive, as it needs to be secure, climate controlled and somewhat clean.
- Servers need a lot of supporting infrastructure such as network switches, UPSes and (especially back in the days) KVMs (Keyboard-Video-Mouse, yikes). It all costs money and takes space.
- These days it should not come as a surprise to anyone, but electricity is expensive. Cooling also requires a lot of energy (in general).
- Lots of HW means many things to maintain and many things that can break - not just servers but also supporting infrastructure. Humans need to fix/order/replace these, and humans are expensive and annoying: you need to find them, give them money and keep them happy.

Segue: Another thing to consider is that we barely use the computers that we've bought.

- There are studies out there made by various actors: all of them show very low utilisation.
- These have been produced with more or less effort: many contain circular references.
- Others, such as those published by Google and Alibaba, are likely not very representative of average enterprise environments.
- They match the instructor's anecdotal observations.
- We could fit these computing needs on a few hypervisors instead.
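A back-of-the-envelope calculation can make the consolidation argument concrete. The numbers below are purely illustrative (40 servers, 10% average utilisation, hypervisors loaded to at most 70%), not taken from any particular study:

```shell
# Hypothetical numbers: 40 physical servers averaging 10% utilisation,
# consolidated onto hypervisors we are willing to load to 70%.
servers=40
avg_util=10   # percent of one machine
hv_target=70  # percent load we accept per hypervisor

# Total demand expressed in "percent of one machine":
demand=$(( servers * avg_util ))
# Number of hypervisors needed, rounded up:
hypervisors=$(( (demand + hv_target - 1) / hv_target ))
echo "Could replace ${servers} servers with ${hypervisors} hypervisors"
```

Even with generous headroom, the fleet shrinks by an order of magnitude in this sketch.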

- We also don't need to think so much about margins. We can always vertically scale guests, in some cases even without shutting down the instance: Linux supports hot swapping of CPUs and similar.
- Same goes for capacity planning, we don't need to spend so much effort on every system.
- In general, more powerful HW is more cost efficient from a power and/or space perspective. It would probably be hard to justify investment in such fancy HW without knowing if it would be used.

Let's take a deep breath: in order to explain further economic benefits we need to understand the underlying technology a bit better. Let's talk about guest migration.

- In a clustered environment we can move guests between hypervisors.
- Some solutions support migrating state (disk, memory, CPU registers, etc.) and through sleight of hand make the guest run seamlessly on the other host. Some don't: offline VS online.
- While not a strict requirement, online migration typically relies on some type of shared storage for VM disks/file systems.
- Not trivial, but we'll cover the implementation details later in the course.
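To make this less abstract, here is roughly what a live migration looks like with the libvirt/QEMU stack covered later in the course. The guest name "web01" and destination host "hv2" are made up, and the example assumes both hypervisors can reach the guest's disk image on shared storage:

```shell
# Live-migrate the guest "web01" to hypervisor "hv2" over SSH.
# Memory and CPU state are streamed across; the disk image is assumed
# to live on storage shared by both hosts.
virsh migrate --live --verbose web01 qemu+ssh://hv2/system

# Afterwards the guest should be gone here and running on hv2:
virsh list --all
```

An offline migration is conceptually simpler: shut the guest down, copy its disk image, and define it on the new host.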

Time to tie the tie again.

- Being able to do maintenance without screaming users is huge.
- Handling HW breakage seems trivial; not so much back in the days when Windows was tightly coupled with the hardware it was installed on. Still not trivial - both due to technical and licensing problems.
- Legacy systems can survive long past their HW's due date. Companies love this, sysadmins not so much.
- Deserves repeating: human labor is expensive and something most organisations try to avoid.

- Costs needed to be anticipated in detail. The requirement to pay a lot up front for new systems was a huge barrier for startups/small players.
- Buying HW, finding space in a rack and getting someone to put it up takes a lot of effort.
- Lead times for hardware are really long these days and the supply chain is unpredictable.
- If we can remove humans we can start to automate things. It is time consuming (expensive) to move tasks between developers and the HW team, with lots of room for errors and miscommunication.
- We wouldn't have devops the way it is today without virtualisation.

- Selling software that other parties are supposed to install, manage and/or maintain is a real mess and quite annoying.
- You need to take a lot of variables into account: OS version, hardware/drivers, faulty setup.
- The problem used to be solved by shipping pizza boxes to customers, but that's also meh.
- Virtual appliances can be delivered preconfigured in a somewhat predictable environment, only exposing limited configuration options that the customers can screw up.

Virtualisation helps us run systems reliably, but how?

A lot of the same things that make virtualisation good from an economic perspective apply to reliability as well.

Segue: There are some other neat benefits as well.

- Computers break, we should not rely on a single one. Same goes for data centers.
- Three is a magic number in computing (and voting in general) as it enables a quorum to be kept. Redundancy with two nodes may be better than with one, but handling state and data is a mess: it requires a lot of fencing and painful reconciliation.
- Doing this with HW only would be quite expensive and out of the question for most organisations.

- The ability to capture the state of a guest and roll back.
- Some virtualisation solutions only support disk snapshots and not RAM/CPU state. This really only matters if you want to be able to revert an instance to an already running state.
- Modern filesystems also implement snapshots, but VM snapshots leave less room for trouble.
- Important to note that a disk-only snapshot should preferably be taken when the guest is shut down: things could be cached in memory, and databases are notorious.

Segue: Let's put this feature to some use for reliability.
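As an illustration, snapshot handling with libvirt/QEMU might look like the session below. The guest name "db01" and the snapshot name are examples; without `--disk-only`, RAM/CPU state is included so the guest can be reverted to a running state:

```shell
# Take a snapshot (including RAM/CPU state) before risky changes:
virsh snapshot-create-as db01 pre-upgrade \
  --description "Known good state before applying updates"

# List snapshots for the guest:
virsh snapshot-list db01

# The upgrade went sideways? Roll back:
virsh snapshot-revert db01 pre-upgrade
```

A disk-only variant (`--disk-only`) is best taken with the guest shut down, for the caching reasons mentioned above.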

- Organisations are often scared to mess with things "that work", even when a change is needed or improves reliability long-term.
- Move fast and don't break things.

You young people care about the environment, don't you?

- Many of the properties provided by virtualisation showed benefits for both cost-savings and reliability. The same goes for its environmental benefits.
- There are also other things that are relevant, put some effort into thinking about it.

Basically the same as for saving money on HW.

- Basically the same as for saving money on HW.
- Outside the peaks, most hypervisors could likely be shut down to save electricity/produce less heat. For this to be reasonable, live migration and/or batch workloads are likely needed.

- One of the initial motivations for using virtualisation.
- Most companies don't fully trust all their employees: even fewer trust their competition. Multi-tenancy is an important use-case to save money, but hard to do in commodity general purpose OSes.
- Simpler to just use VMs per customer/user/application.

Segue: Besides keeping things simple, limiting the job of a host has other benefits.

- From an infosec perspective, we want a small attack surface. Fewer services means less code, which means less risk of security vulnerabilities.
- Perhaps not everyone in the IT staff needs to be root on every system? Still often the case, but for security sensitive organisations this is now an option.
- Systems that perform a limited set of tasks are easier to restrict and monitor; there is lots of noise otherwise.

Segue: Some people have taken this reasoning to heart and very far.

- Qubes OS runs everything in VMs: there is no reason why your personal web browsing should be able to compromise your work for customer X. We'll talk more about Qubes later in the course.
- Unikernels (or library OSes) build a runtime environment which bundles an application and only enough OS to get things running. Great for minimizing attack surface and keeping things performant, with low overhead for kernel context switching.
- This wouldn't be reasonable if it wasn't for the predictable runtime environment that virtualisation provides.

- Being able to utilise snapshots to aggressively change systems (such as applying updates) has many security benefits as well.
- Snapshots provide a known good state: extremely helpful if a system is hacked. They can also be used to diff in order to find what the hacker did.
- Physical computers store a lot of state that can be changed and persisted even after an OS is reinstalled (or even if the disk is physically destroyed).
- Also relevant for multi-tenancy: who has had and who will have access to the server after you?
- The server HW industry has started to make changes, but VMs are far superior in this regard.

Not everything is rosy, though.

- Software vendors that sell licenses are happy about virtualisation.
- It is quite easy to spin up new systems without thinking about the effort required to take care of them: they still need to be patched, and when they break, humans need to take care of them.

- All the abstractions can introduce... interesting problems and breakage.
- High-availability is usually best implemented near the application. A client that can handle reconnections and migration between remote endpoints is likely a way simpler solution.
- Reality may not always meet expectations: migration can fail and introduce problems.

Basically what the slide says.

- All emulation has a performance cost. OS-level virt usually introduces way less, but it is still there.
- Depending on the workload this can be a big deal: especially I/O intensive workloads tend to suffer.
- You can't always know how well your instance will perform, as the HW is shared with others. One day can be fine while the next is terrible. Hypervisors are almost always overprovisioned.
- Lots of sensors and counters used for advanced performance analysis can't be observed from inside the guest, making debugging and performance analysis trickier.

Take some time to reflect before forcing more knowledge into your brain.

- Put some effort into this.
- There is a lot of value for your learning in just putting what we covered into your own words.
- It helps the instructor make the course better and clarify questions.

For those who wish to dig deeper into the things we've talked about.

- So far we've talked about the "soft" parts: pros and cons of virtualisation as a general concept.
- These apply to both HW-level and OS-level virtualisation.

Clarify and remind students about required software setup for the course

Spaced repetition and generation

- So far everything we've covered applies more or less to both HW-level and OS-level virtualisation.
- Look at a few solutions and their fit on the market.
- We've been quite soft and general, this section will be slightly more technical.
- Remind students to ask questions if things are unclear.

- ESXi is installed on HW just like you would a Linux distribution.
- The hypervisor cluster is controlled by vCenter.
- Live migration, vSwitch and vSAN make setting up highly available hypervisor clusters easy.
- Decent choice for SMBs with limited IT staff and high availability requirements.
- "No one ever got fired for buying VMware".
- Owned by Broadcom, getting more expensive.

- Built in to Windows.
- A VM manager (similar to VirtualBox) can be activated as a Windows feature.
- Requires Windows 10 Pro/Enterprise or Windows 11 Pro. Included in Windows Server, but you often need to pay per VM and/or CPU cores. It's boring.
- Was quite late to enter the game and even later to become a competitive option, but works alright for both Windows and Linux guests these days.
- WSL == Windows Subsystem for Linux, the ability to run Linux distros in highly integrated VMs.
- Clarify that this applies to WSL version 2, not WSL version 1 which is OS-level virtualisation.
- VBS == Virtualisation Based Security; examples are Credential Guard, Device Guard and the old Edge sandbox.
- VBS uses security properties provided by virtualisation and related HW features without running traditional OSes in VMs.

- People use the term KVM incorrectly.
- Several different software components make up what people refer to.
- For a long while this was the most common solution for virtualisation on Linux servers.
- Red Hat makes RHEL (which CentOS/Alma Linux/Rocky Linux are based on) and sponsors Fedora.
- Canonical makes Ubuntu.

Segue: Let's break these components down and see what purpose they serve.

- If you have used virt-manager or virsh, you've used libvirt.
- Most often a client (such as virsh) talks to libvirtd running as a system service.
- Handles a lot of the grunt work, configures routing/NAT rules etc.
- The project shows its age.
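A quick taste of what talking to libvirtd through virsh looks like. The guest name "web01" is hypothetical; the commands themselves are standard virsh subcommands:

```shell
# Ask libvirtd about defined guests, running or not:
virsh list --all

virsh start web01           # boot a guest
virsh dominfo web01         # CPU/memory allocation, state, autostart
virsh net-list              # virtual networks (NAT/routing handled by libvirt)
virsh shutdown web01        # graceful ACPI shutdown request
```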

- QEMU is very flexible, it can emulate obscure CPU architectures and ancient floppy devices.
- Most OSes expect a BIOS, hardware devices, etc.: QEMU emulates several kinds of these depending on the OS's needs.
- QEMU's primary goal (at least initially) was to be a somewhat fast and very flexible solution for emulation. These days, it's primarily used to run server VMs in data centers, which is quite far from this initial goal.
- It's a bit like the "Good/Cheap/Fast" rule: you need lots of code to implement all these features and options - lots of code means a high risk of bugs, worst case security vulnerabilities.
- A great example is the "VENOM" VM escape vulnerability (https://en.wikipedia.org/wiki/VENOM), which affected code used to emulate a floppy device that was enabled in most (every?) guest. Most users in the current century probably never needed this feature, but it was still available.
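Running QEMU by hand (no libvirt in between) shows what is actually being emulated. The disk image name is made up; the flags are regular QEMU options for a modern x86 guest:

```shell
# Hypothetical guest: modern "q35" machine type, KVM acceleration,
# host CPU passthrough, 2 GiB RAM, 2 vCPUs, a qcow2 disk and user-mode NAT:
qemu-system-x86_64 \
  -machine q35,accel=kvm -cpu host \
  -m 2048 -smp 2 \
  -drive file=guest.qcow2,format=qcow2 \
  -nic user,model=virtio-net-pci
```

Drop `accel=kvm` and QEMU falls back to pure emulation: everything still works, just much slower.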

- KVM is what makes QEMU fast and usable; fully emulating a CPU and RAM is very slow.
- Not only used by QEMU, as we are about to see.

Segue: Before we get into other users of KVM, there is a project that only uses two of the components (not libvirt).
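Checking whether KVM is actually available on a Linux host is a common first troubleshooting step. These are standard checks on Ubuntu (the `kvm-ok` helper comes from the `cpu-checker` package):

```shell
# Does the CPU advertise Intel VT-x or AMD-V?
grep -c -E 'vmx|svm' /proc/cpuinfo

# Are the kernel modules loaded?
lsmod | grep kvm            # expect kvm plus kvm_intel or kvm_amd

# The device node user space (QEMU etc.) talks to:
ls -l /dev/kvm

# Ubuntu convenience wrapper that summarises the above:
kvm-ok
```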

- Makes use of QEMU and KVM, but not libvirt as it is a bit of a mess.
- Basically the same installation procedure as for ESXi.
- Supports automation through an API: Terraform and Ansible can use this.
- Besides HW-level VMs it can run lightweight OS-level instances using LXC, more about that later.
- Includes out-of-the-box support for ZFS, which provides great software RAID, snapshots and data corruption detection among other things.
- Supports many of the same clustering features as VMware ESXi, but not nearly as polished.
- Great for similar use-cases as ESXi, and also for home labs - it's FOSS!
- https://www.proxmox.com/en/proxmox-virtual-environment/overview
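On a Proxmox node, VMs are managed through the `qm` tool (and LXC containers through `pct`). A hypothetical session, with VM ID 100 and names chosen for the example:

```shell
# Create a VM with 2 GiB RAM, 2 cores and a virtio NIC on bridge vmbr0:
qm create 100 --name web01 --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0

qm start 100                  # boot it
qm snapshot 100 pre-upgrade   # snapshot, e.g. backed by ZFS
qm list                       # overview of VMs on this node
```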

- Initially developed by Intel, but now governed by the Linux Foundation.
- The main focus is not to run everything like QEMU, but to run modern server OSes.
- Supports Linux and, since somewhat recently, the latest Windows Server.
- Priorities are security and performance, not flexibility.
- Development is sponsored by several (in some cases competing) companies, such as Intel, ARM, AMD, Microsoft, ByteDance (developers of TikTok) and Alibaba (probably the largest non-US cloud).
- Uses KVM on Linux to make things fast!
- Written in Rust, while QEMU is written in C: preferable for safety.
- Will likely be a reasonable replacement for libvirt + QEMU for most users soon.

Segue: Another user of KVM is Firecracker.

- Firecracker isn't built to run general purpose server operating systems/workloads.
- Initially developed for Amazon's FaaS Lambda, later Fargate.
- They wanted the speed and low overhead of OS-level virtualisation but the security of HW-level.
- Both Firecracker and Cloud Hypervisor are written in Rust, a language that helps developers write safer and more performant applications. The projects share a lot of their low-level code, so they make each other better.

- Originally a research project, released as FOSS in 2003.
- High performance with focus on para-virtualisation (more about that later in the course).
- Served as the base for AWS instances (nowadays they seem to use KVM more and more).

Segue: Xen has an interesting design and is quite different from QEMU + KVM...

- https://en.wikipedia.org/wiki/Hypervisor#Classification - https://wiki.xenproject.org/wiki/Xen_Project_Beginners_Guide - TODO: Add better speaker notes for this somewhat complicated topic

- TCB == Trusted Computing Base: HW and SW that, if compromised, could affect the overall security of a system. Kernel level software, such as drivers, and components with firmware often have more or less unrestricted access to memory, so a compromise of them could allow an attacker to gain full control of all software running on the machine.
- Having a small TCB is desirable, as less code is easier to test and audit for security issues.
- As Xen acts as a thin layer and lots of functionality normally associated with a virtual machine manager is delegated to dom0 (the privileged VM used to manage the system and other VMs), Xen has a relatively small TCB.
- If you wanna try it out, the easiest way is probably to install XCP-ng (https://xcp-ng.org/).

Segue: Due to these properties, Xen was chosen as the virtualisation technology for Qubes OS...

- Qubes OS is a FOSS desktop operating system with focus on security.
- Utilizes Xen virtualisation to provide security compartmentalisation (explanation will follow).
- Co-founded by security researcher Joanna Rutkowska.
- Allegedly used by Edward Snowden and lots of other paranoid/security conscious users, such as the Freedom of the Press Foundation, the Mullvad VPN company and the organisation running "Let's Encrypt".

Segue: To demonstrate the OS, I've set it up on a laptop and taken some screenshots in action.

- What you're seeing is the default Qubes desktop when booted and logged in.
- Not a pretty sight, made slightly worse by trying to increase the font size for the demos.
- Ships with the XFCE (https://xfce.org/) desktop, but is also available with i3 (https://i3wm.org/).

Segue: Things look a bit more interesting with applications running...

- The screenshot is a bit messy and probably doesn't make a lot of sense right now.
- Most important is to note the border color of the windows (red, grey, yellow and blue); these let us know which "qube" (their terminology for a VM) the application is running in.
- Each qube acts as a security boundary: the video player running in the "personal" qube can't access files or processes running in the "work" qube.
- The same is true for the "vault" qube, which is provided to store very sensitive information in, such as private keys and password databases. This qube is further restricted as it isn't even provided with network access, to minimize the risk of leaks.
- Qubes shines in the way qubes are integrated with the desktop and each other.

Segue: Let's clean up a bit and talk about how you use Qubes...

- The screenshot shows the Qubes menu (a bit like the Windows start menu).
- The list shows qubes installed on the system and has sub-menus for starting applications in them.

Segue: Let's start a file browser in the "work" qube...

- Qubes utilises snapshots and temporary VMs in some very neat ways.
- If you have an untrusted/potentially malicious file, such as a document received via email or a program downloaded from a suspicious site on the Internet, it can be risky to open/run it.
- The screenshot shows a menu in the file manager that allows us to view the document in a "DisposableVM".

- Once selected, we get a status notification telling us a qube is starting.
- A fresh new VM is created (from a template) just for viewing this document, and it's quite fast.

- Notice the border color (and title) of the document editor window; it's different from the file browser in the "work" qube.
- Even if the document contained malicious code, it would be restricted to data and processes in the disposable qube (which likely contains very little of interest).

- Once we are done reading the document and the window is closed, we get another notification that the disposable qube is shutting down.
- It is not only shut down, but also deleted.
- This lets us feel quite comfortable that any malicious backdoors are deleted with it.

- DisposableVMs are just like other qubes, except that they are created (from a snapshot) on the fly and destroyed once all apps running in them have been closed.
- Using a disposable qube for web browsing is quite a good practice, for the reasons described in the previous example.

Segue: Qubes OS comes with a bunch of default qubes as we've seen, but you can easily create more.

- The "Qubes tools" section of the menu contains utilities for managing the system.
- Let's select "Create Qubes VM".

- Choose a name and a border color for the qube: a qube per customer and/or project is a good thing.
- We can also select which template to use for the VM. Qubes OS comes with Debian and Fedora; Kali Linux (https://www.kali.org/) can be installed as well (a good choice for penetration testers).

- We can also select which type of qube to create.
- A "TemplateVM" is a qube which you can install software in, configure and use as the base for other qubes/VMs.
- An "AppVM" provides a persistent home directory for the qube, but is otherwise cloned from a template VM each time it is started. If you want to install additional software or similar, it will likely need to be added to the template VM it is based on instead.
- A "StandaloneVM" is also based on a template, but once created it is fully persistent and can be modified without performing changes to the template. A good choice if you need to set up some complex software that you don't wanna include in your template VM, but it will consume more disk space and be a bit slower to start than an "AppVM".
- A "DisposableVM" is just what we talked about previously.

Segue: Qubes OS provides some tools that help us use the desktop without giving up on security.

- A quite common need is the ability to copy files between VMs.
- Qubes OS shows a dialog in the grey dom0 (the administrative VM - not a regular qube, could be considered "the host" to keep things simple) that tells us that VM X wants to copy a file to VM Z, which prevents qubes from moving data around without the user noticing.

- Another example is the Qubes clipboard, which controls which VMs can access copy/paste data.
- This may seem silly, but the clipboard could contain sensitive information such as passwords.

- The same goes for peripherals such as microphones, webcams and USB storage drives.
- Sure, there are likely use-cases where you want VMs to access them, but not all of them and not all the time.
- You probably want your work qube (or a disposable one) to access the webcam and microphone for meetings.

- It may be a bit hard to see due to scaling issues, but the upper right menu bar contains a notification area.
- If we click on the red signal strength indicator, a drop-down menu appears with a red border, containing network interfaces and visible WiFi networks.
- The laptop's network interfaces are forwarded to a dedicated qube that provides networking for other VMs (the qube "sys-net" by default).
- It's actually the same for USB and other peripheral devices, as we just saw (the qube "sys-usb").
- Qubes OS does it this way to protect the system from attacks delivered via the network and external devices.
- If, for example, firmware in the wireless card is compromised by an attacker on the same network, the attacker's access is limited to the sys-net qube. The attacker would likely be able to intercept network traffic, but not much more, and the traffic from other qubes is hopefully encrypted.

- You can actually see this isolation by listing network interfaces in dom0, which has none besides the loopback interface.

Segue: We can take a look at how this is done by opening "Qubes manager" in the tools menu.

- This quite messy application shows us information about qubes configured on the system Segue: It also allows us to edit their settings...

- The screenshot shows the configuration window for a qube.
- It allows us to modify VM settings in a similar fashion to other VMMs.

- By clicking on the "Devices" menu, we can see PCI devices forwarded/dedicated to the qube.
- The current example shows how the laptop's two USB controllers are dedicated to the "sys-usb" qube, which we saw in a previous example.
- A similar configuration exists for the networking VM, which has a wired and a wireless NIC forwarded.
- You could forward other devices as well to qubes of your choice, such as a GPU or a studio sound card.

- The qube settings also allow us to configure network restrictions for the qube.
- This can of course be done in other ways inside the specific qubes, but it's quite neat to have the firewall functionality so easily accessible.
- Firewalling is actually done by a separate qube by default - you can have multiple of these, some of which are connected to a VPN for example, forcing all qubes configured to use them through the tunnel or proxy.

- If you fancy the terminal or want to automate things, everything shown in the demo and much more can be done with CLI tools.
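Rough CLI equivalents of the GUI workflow from the demo, using the standard qvm-* tools. The qube name "customer-x" and the template name are examples:

```shell
qvm-ls                                       # list qubes and their state

# Create an AppVM with a blue border, based on a Fedora template
# (template name is an example, use whatever your system ships with):
qvm-create --class AppVM --template fedora-36 --label blue customer-x

qvm-run customer-x firefox                   # start an app in the qube
qvm-copy-to-vm customer-x report.pdf         # copy a file into the qube
qvm-firewall customer-x list                 # per-qube network rules
qvm-shutdown customer-x
```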

- While Qubes OS is really cool and I recommend everyone to try it, it's not problem free.
- To run Qubes smoothly, you need a somewhat beefy computer with CPU virtualisation features that are not available on all consumer systems - check out their system requirements.
- While it works quite well, there is a lot of overhead with all the qubes running - if the "system" and "service" qubes are included, there were 12 different VMs running during my demo.
- RAM especially is a thing you'll need quite a lot of.
- For security reasons, and due to the way windows from different qubes are shown, there is no 3D or hardware codec acceleration available. You can't really game or do things like CAD on Qubes unless you forward a dedicated GPU to the qube (which is seldom completely trivial).
- Lots of things to manage, including several templates that must be patched and maintained; requires a bit of knowledge of the underlying technology.
- The Qubes dom0 is based on a patched older version of Fedora, which means that it rarely has good support for new hardware. If you are gonna use Qubes, you have better chances with a device that is a few years old.

Segue: Even if you consider these problems deal-breakers or find it all slightly too paranoid, there are still a lot of things to learn and adopt from Qubes' design...

- Even if you don't use Qubes you can still have a snapshotted VM that can be considered disposable.
- Why not do your web browsing in a VM that can be nuked and paved every day?
- If you don't want all your Internet traffic to pass through a VPN, why not set up a dedicated VM for it? If you set up two VMs, one as a router/gateway and the other as a client to it, you can restrict connections and minimize the risk of leaks.
- Basically what the slide says.

- Virtualisation is in general very beneficial for security, as we've discovered during the course.

Segue: It does however have its downsides and doesn't solve every problem...

- Guests depend on the host's security, as it is in full control of their execution*.
- The hypervisor can snoop on the guests' disks and memory. This could be used to steal sensitive information, such as credentials from memory and confidential databases from disk.
- From the perspective of the guest, it is in general not possible to inspect the security of the underlying hypervisor and its supporting infrastructure. Instead, users are forced to more or less trust the infrastructure operators.
- This problem also exists for physical servers that are colocated in a third-party's data center (which is the common thing these days, as data centers are expensive facilities to own/operate), but in many cases exploiting it requires physical access to the servers (which are usually in a "locked" rack or cage). A malicious actor who has access to a hypervisor or the virtualisation control plane (think vCenter or similar) usually has a far easier time gaining access to VMs.
- Add a caveat about the terrible state of BMCs/IPMI/HW management interfaces.

Segue: So what can we do to limit trust in the operators of physical infrastructure?

- Disk encryption can be used to protect data at rest.
- If someone were to walk away with the hypervisor or network storage appliance that the guest's disk image is stored on, at least it would be encrypted.
- Relevant as we don't necessarily know how the operator handles backups of data and HW decommissioning/broken disks (perhaps they are sent to a third-party vendor?).
- Regardless, the key needed to unlock the disk must be in the memory of the guest that accesses the data.
- It is probably stored somehow, as we want to be able to reboot servers without entering a password or similar every time. Encrypted storage on servers, especially OS disks, is a bit of a mess.
- If the decryption key for the data is available in the guest, chances are that the hypervisor can access it as well - either through snooping of memory or by manipulating storage for the guest operating system.
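For reference, setting up data-at-rest encryption inside a guest with LUKS might look like this. The device path `/dev/vdb` (an extra virtual disk dedicated to sensitive data) and the mapping name are examples:

```shell
# One-time: initialise the encrypted volume (prompts for a passphrase):
cryptsetup luksFormat /dev/vdb

# Unlock it; note that the key now lives in guest RAM:
cryptsetup open /dev/vdb secure-data

# Create a filesystem and mount it like any other block device:
mkfs.ext4 /dev/mapper/secure-data
mount /dev/mapper/secure-data /mnt/data
```

This illustrates the caveat above: while the volume is mounted, the key sits in memory that the hypervisor can usually read.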

- Infrastructure operators, regardless if they are your company's IT department or a third-party hosting provider, can do things to earn trust even though we can't inspect security from the guest.
- Allow independent auditors to inspect and test security controls/processes.
- Not just technical things such as patch levels and network restrictions, but also physical security, background checks of personnel and similar.
- As a wise and pessimistic man once said: "trust is just a word for lacking security".
- Laws and industry compliance frameworks may prevent organisations from letting a third party host/control their infrastructure even if they want to.
- Even if operators are not actively malicious and auditors didn't find any obvious problems, systems can still get hacked. New vulnerabilities and attack techniques are discovered every day.

- Many different organisations would like to solve a problem that sounds a bit like a pipe-dream: not having to trust the computer your software (in this case a VM) is running on.
- Organisations don't necessarily have the skills, resources or interest to operate their own infrastructure.
- Many would like to use a cloud provider, but don't feel comfortable giving away access to their data.
- Cloud providers would very much like to take these peoples' money.
- Large and security conscious organisations would like to save money by using internally pooled virtual infrastructure, but have so many different security requirements that it is hard to do so.
- There is growing interest in edge computing (https://en.wikipedia.org/wiki/Edge_computing), which often means that servers will move from relatively safe data centers (from a physical perspective) to small service lockers next to the highway or in the forest. It may be tricky to prevent malicious actors from physically gaining access to the HW in these cases.
- It would likely also be way too expensive for everyone with "edge needs" to own their own geographically spread out infrastructure. Sharing will be necessary if anyone but the biggest players are to utilise it.

- OT intro: The benefits of disk encryption are widely understood, but many don't know that keys stored in RAM can often be extracted using so-called "cold boot attacks" (https://en.wikipedia.org/wiki/Cold_boot_attack) or similar methods.
- Data is typically stored unencrypted in RAM (or the keys to decrypt it are stored in RAM next to it), which means that an attacker with physical access to RAM may be able to extract it.
- This is obviously bad if someone gains physical access to the hypervisor, but the hypervisor can, as previously mentioned, most often snoop on the memory of guests and even manipulate it.
- Processor/chipset vendors started introducing features that encrypt all data stored in RAM with a random key generated per boot by the CPU (in which the key is, supposedly securely, stored).
- Initially, the goal was to protect against physical attacks as described above, but new features were added that generate a random encryption key per guest (or rather per area of memory dedicated to each guest), which in theory would prevent the hypervisor (and other guests which may have managed to escape/break out) from peeking at it.
- The hypervisor can still modify executable files on the guest disk or use a number of other techniques to gain access, which makes memory encryption alone a bit irrelevant.
- AMD's naming scheme is a mess, see https://developer.amd.com/sev/.

Segue: The dream lives on, and since then new features have been introduced which try to address some of these vulnerabilities/attacks...

- HW vendors switched focus from just memory encryption to a more holistic approach - These features would not only prevent hypervisors from messing with guest memory, but also CPU registers/state - Perhaps most importantly, they provide a method for guests to verify during startup that they are actually running in this encrypted/isolated environment - Once the guest knows that the coast is clear, it can decrypt/process sensitive data - Quite complicated how this works on a technical level and a bit out-of-scope for this course Segue: This sounds wonderful and almost too good to be true...

- These HW features have just started getting upstream support in the Linux kernel and virtualisation stack - it will take a while before they are fully rolled out and stable - Verified/attested boot chains, which seem to be a requirement for guests to run without trusting the host operating system, are far from problem-free. The Linux implementations of secure boot (which is one of the pieces) seem to be motivated by getting it to run on all HW, not making things actually more secure - In practice, we are moving trust from the hypervisor (which is often based on/running FOSS and auditable code) to CPUs running closed-source firmware with a bad track record - These isolation technologies are not perfect and vulns have been discovered in them. Intel SGX, which is a similar technology that has been on the market for quite some time, has had several flaws that completely break the security promise Segue: So, should we just give up on these technologies?

- The security of a system should be multi-layered: technologies such as Intel TDX may be able to stop an attacker that has managed to get a foothold on a hypervisor - Everything that improves the overall security of a system should be considered, even if it's not a silver bullet - Perhaps it prevents companies from building business models based on snooping - The main danger is the promise it makes and how people interpret that

- We've talked about the many benefits provided by virtualisation - Performance is not one of them - There are ways however to make the overhead of HW-level VMs smaller

- One of the largest gains is to move things to HW (seems a bit counter-intuitive) - Emulating a CPU in SW is complex and usually quite slow - Virtualisation extensions are supported by most CPUs these days, while features allowing direct access to PCI devices (such as GPUs) may only be available on workstation and server systems - KVM utilises and exposes these hardware acceleration features - In order to allow live migration of guests, we can't forward HW devices or use fancy CPU features that may not exist on the target hypervisor. Feature masking enables a host to hide CPU features from the guests

- Back in the days when virtualisation was new, hypervisors emulated commonly supported devices such as NICs and SATA HDDs - These days almost all servers are run virtualised and it seems a bit silly to pretend they are HW - Para-virtualisation is about stopping the pretending while still benefiting from virtualisation - Virtio devices are supported out-of-the-box by most modern operating systems running as guests - Usage of these can drastically improve performance/lower load on the hypervisor

- These techniques don't improve performance, but they minimize the overhead cost - Deduplication == Only store one piece of data once, for memory/RAM pages and for disk blocks - If a hypervisor is running 100s of Windows VMs, chances are that a lot of storage space can be saved if duplicated files can be deduplicated - Same goes for RAM, many VMs will likely run the same Linux distro and store the same kernel in memory - Deduplication can be costly from a performance perspective, but sometimes that is outweighed by the potentially huge savings - "Offline/Post-process" deduplication is often way cheaper from a performance perspective than "online/in-line" (https://en.wikipedia.org/wiki/Data_deduplication#Classification) - Memory ballooning allows the host to signal to guests that they can use more memory/should clean up their caches to save memory during runtime. This enables hypervisor operators to overprovision memory slightly more and make each hypervisor more cost effective
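On Linux, RAM deduplication is provided by KSM ("Kernel Samepage Merging"). A minimal sketch of poking at it through sysfs, assuming a kernel built with KSM support and root privileges; note that applications must opt in by marking memory regions as mergeable, which hypervisors like QEMU/KVM do for guest RAM:

```shell
# Enable the KSM scanner thread (requires root)
$ echo 1 | sudo tee /sys/kernel/mm/ksm/run

# After it has scanned for a while, see how many identical memory
# pages have been merged into a single shared copy
$ cat /sys/kernel/mm/ksm/pages_sharing
```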

- Let's take a step back and think about the security pros/cons of virtualisation

- Reminder: Focus on security aspects of virtualisation, not the rest

- Take your time to think these over, it's good for your learning - If you have patience for it, revisit the questions in a week or so to further engrave them into your knowledge

- As we learned, virtualisation has many benefits related to security, economics, reliability and environmental aspects - HW-level virtualisation == Emulation of a computer with the goal of making an operating system run on it like it would on real HW - Has anyone tested Proxmox, Qubes or related?

- Basically course progress status

- Some of the following topics will be more complex and technical - Remind students that they need to spend time studying outside of lessons - Ask for help and clarifications - Remind students that they can help make things better by clarifying things for future students, no stupid questions exist!

- Throughout the course, students have been asked to submit their reflections after each segment - Several things were brought up that seem to have left impressions, many found Qubes kool and were surprised to find so many benefits of virtualisation - Several students requested a glossary - Questions and other feedback came up that I would like to take some time to address before we continue progressing into OS-level virtualisation

- We've talked a lot about the security aspects related to virtualisation, both good and bad - One of the exercises included argumentation against virt from a security perspective - While many of these arguments are true, for most use-cases the pros far outweigh the cons - Talk a bit about the pros - Hypervisors are usually configured in large cluster setups with a single control plane (like vCenter for VMware), which involves some risks. While this is usually cost effective, you can run hypervisors more or less independently to mitigate the risks. It's not an inherent problem with virt - Performance and complexity are probably the strongest arguments against virtualisation

- Students showed interest in Qubes OS, but don't know which HW to run it on

- In the beginning, hypervisors wanted to be able to run the OSes everyone was using - To make things smooth, they emulated devices that were commonly supported/had drivers shipped with the OSes - These days almost all servers are virtualised and have been for quite some time - Purpose-built virtio drivers are included in most modern OSes that talk to a virtual device which doesn't pretend to be physical (doesn't have to abide by the laws of nature and act in ways that are natural in HW but slow to emulate in SW) - This removes unnecessary emulation overhead, which results in lower load on hypervisors that can be translated into higher VM density and thereby economic benefits. Segue: On the topic of performance, there may be need to clarify some terms commonly used...

- Upscaling: Making systems more powerful/able to handle more load - Downscaling: Making systems less powerful/able to handle less load, but saving money - Vertical upscaling is done by adding more CPU cores or whatever resource is necessary to a VM. Some hypervisors and OSes support doing this while the guest is running, others require a reboot - Horizontal upscaling means deploying more VMs running the same application and using a load balancer or similar to spread user traffic across the servers - Vertical upscaling only takes you so far, eventually you'll outgrow the physical server. Not all OSes and apps handle super big instances very well - Horizontal scaling may be a better choice and adds redundancy, but it adds additional complexity

- Up until this point, we've mostly talked about things from a HW-level virtualisation perspective Segue: May be time to refresh the differences between HW-level and OS-level

- In general, developers and companies don't care much if the OS believes it's running on HW, they just want to run their application stacks - The goal of OS-level virtualisation is to convince users and applications, not OSes - OS-level instances are often called "containers", I try to not use the term as it is a bit confusing and could mean other things (more about that later) Segue: To further complicate things, we can create sub-groups of OS-level virt...

- Basically what the slide says

- Syscalls are translated from Linux to FreeBSD or Windows syscalls - Some other parts of the system are also emulated to create the illusion of running on Linux - Linux on x86_64 has over 300 different syscalls that have many different options. It's quite hard to fully emulate this and implement the same quirky behavior as on Linux - FreeBSD's Linux compatibility layer is stuck on an old kernel version and WSL v2 changed to running HW-level virtualised Linux kernels instead - Wine has gotten a lot better during the last couple of years since Valve (developers of Steam and games like Counter-Strike) got involved to make software run on the Steam Deck (https://www.steamdeck.com) - Far from problem-free, but can be very useful

- Our main focus will be virtualising the same OS as the host, this is by far the most common

- What motivates us to use OS-level instead of HW-level? We've already got a solution Segue: Well, there are several benefits...

- As we don't need to pretend to be hardware, we can get rid of emulation layers that cause a lot of overhead. Applications can get more or less direct (though restricted) access to actual HW - When running HW-level virt, each guest runs its own OS kernel. This requires RAM and processor capacity - With OS-level virt the same kernel is used by the host and all guests, shaving off that overhead - This also enables concurrent sharing of HW devices that could otherwise only be used by one HW-level guest at a time - Efficiently sharing resources among HW-level VMs is hard, as the hypervisor has fairly limited knowledge of what goes on inside each guest. Are they using RAM because they need to or because of caching? Are the processes using the CPU batch jobs that could be given less priority? - For the hypervisor, OS-level virtualisation is a bit like a one-way mirror: it can see everything going on in the guests but not the other way around - This allows the hypervisor to treat all programs running in guests as regular processes and efficiently schedule them - All in all, these shared kernel benefits allow a much higher density of guests Segue: The one-way mirrorness has other benefits as well...

- While it's true that HW-level hypervisors can typically snoop on guests (as we've discussed), in practice it is a bit cumbersome and involves quite a bit of overhead - As the host has great insight into what is happening in OS-level guests, activity can be observed - Falco is an Intrusion Detection System (IDS) for OS-level virtualisation on Linux (specifically containers, we'll talk more about that later). You can configure it to alert you when specific applications are run and abnormal behavior occurs: Should your web server really spawn a port scanner process? Segue: Enough about the benefits, what are our options?

- We've already talked a bit about FreeBSD jails (see "Types of virtualisation" presentation) - Limiting ourselves to Linux, there are no really wide-spread solutions for Windows - Virtuozzo is often shortened to just VZ - Very early solution that showed great potential - Required use of proprietary software with license fees and a heavily patched Linux kernel, which perhaps prevented it from being more widely used - The core was open-sourced as "OpenVZ", still quite tightly tied to the SWsoft company - Still quite popular today, especially in web hosting and for cheap VPSes as it allows great density of guests on a single node - Somewhat quirky behavior at times, applications didn't always behave as expected. Full HW-level virtualisation probably seemed easier for many users (as they were used to one OS on a physical server) - As stated, one of the things preventing wide-spread adoption was that the development and community were heavily controlled by SWsoft. You had to use a kernel with their patches, which was somewhat limiting - LXC was initially developed by IBM and other parties with the goal of only using (or at least directly upstreaming) functionality in the Linux kernel (more about that later) - Canonical are the makers of Ubuntu and that's where it's most tightly integrated

- Let's look at how LXC can be used and what neat things it provides - LXC == Container technology, LXD == service to manage it and other things (think libvirt) - Far from a complete demo of the feature set and I don't expect you to remember the commands, more of a tasting to see what it's capable of

- Installation is trivial: If you're on Ubuntu, chances are that it's already installed - As Canonical is the lead, Ubuntu has the best integration. The upstream project recommends installing via snap (https://snapcraft.io/), but some other distros have native packages - Once installed, the "lxd init" command allows you to enable lxd and customize settings, such as cluster configuration. Passing the "--minimal" flag provides automated configuration without any questions

- Before we can create an OS-level VM (or "system container" as they are called in LXD lingo), we need an image to base it on - The project builds images for a large number of different Linux distros, you aren't just limited to the one running on your host/hypervisor - The example command shows a randomized sample of distros - These are preconfigured, no need to go through an install wizard Segue: Let's select one and get it started...

- In this example we'll create an instance based on the Alma Linux image (successor to CentOS) - Of course there are many options that can be tweaked, but creating an OS-level container can be this easy and (if the image has already been downloaded to the host before) deployed in a few seconds
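A launch can look something like this; note that the image alias "almalinux/9" is an assumption and may differ depending on the image remote and LXD version:

```shell
# Download the Alma Linux image (if not already cached) and start
# an instance named "srv-1" in a single step
$ lxc launch images:almalinux/9 srv-1
Creating srv-1
Starting srv-1
```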

- Once created, we can execute an interactive shell in the guest to start running commands - The example output shows that the host is running Ubuntu 22.04, while our newly created guest is an Alma Linux instance Segue: This is quite neat and sure it was easy to get started, but previously density and low overhead were raised as the main advantages...

- The command shows a truncated version of the info command, which among other things shows resource usage of the instance - Granted, we have not installed any particular software or service besides what is provided by the container image, but it currently only consumes 2.6MiB of disk space and 55MiB of RAM - The disk usage in this output does not include the Alma Linux image, but the kool thing is that it will be shared by other instances using the same template - In other words, the only per-instance storage requirements are changes or files we add after initial deployment - We'll talk more about this sorcery later in the course

- We can do most things we expect out of a Linux server, such as install and run software - "srv-2" is based on the "Alpine Linux" distribution, which is very minimal and popular in the space

- As with HW-level VMs, we can create snapshots on a regular basis or before we do stupid things - They can be disk-only or contain runtime state (memory, CPU registers, etc.) Segue: As briefly mentioned, LXD has built-in clustering functionality which is quite easy to get going (especially compared to other virtualisation solutions)
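A sketch of the snapshot workflow; the instance and snapshot names are made up for this example:

```shell
# Create a disk-only snapshot before attempting something risky
$ lxc snapshot srv-1 before-upgrade

# Snapshots are listed as part of the instance info
$ lxc info srv-1

# Roll back if things went sideways
$ lxc restore srv-1 before-upgrade
```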

- The example cluster contains three hypervisor nodes - Let's say that we want to perform maintenance (upgrade, repair, etc.) on a node - The "cluster evacuate" sub-command migrates all instances running on the specified host and disables scheduling on it - The example shows how our Alma Linux instance is moved from "node-1" to "node-2"
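The evacuation dance roughly looks like this (a sketch, assuming a healthy cluster and a recent LXD version that ships the "cluster restore" sub-command):

```shell
# Migrate all instances off "node-1" and mark it as unavailable
# for new instances
$ lxc cluster evacuate node-1

# ...perform the maintenance, then let the node accept work again
$ lxc cluster restore node-1
```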

- Runs on a Raspberry Pi, no need for fancy hardware. If you have a few you can set up a cluster! - As noted, LXD clustering is easy to get started with. The "lxd init" wizard can set up an overlay network for you, which enables communication between guests on different nodes even if one of the hypervisor nodes is in your basement and the other is a VPS in a public cloud - To make things a bit more confusing, LXD supports HW-level virtualisation with QEMU + KVM since a few versions back. While still a bit early, it's nice to not deal with libvirt and to make use of the easy clustering - LXD brags about live migration of both OS-level and HW-level guests. This doesn't always work, especially not for OS-level guests. As a rule of thumb, expect the ability to live migrate HW-level guests but not OS-level guests (they are however usually quite fast to offline migrate)

- We'll take a peek into which features of Linux enable solutions like LXC and VZ to work - Don't expect you to remember all details and commands, but these isolation features serve as the foundation for OS-level virtualisation security.

- There is no spoon - Unlike OSes such as FreeBSD and Solaris, there is no kernel concept of an OS-level VM in Linux - For better or worse, a hodgepodge of technologies are used together to build something that looks like an OS on a real server - The flip side of this is that the features can be used to achieve other kool things outside of the system virtualisation space - We won't cover all technologies, but rather focus on the core ones Segue: We have to start somewhere, may as well travel back to the epoch...

- Somewhat mysterious origins - Theory is that it was originally developed to aid development of new UNIX OS versions - Syscall to change a process's (and its sub-processes') view of the file system tree - Prevents processes in a chroot from accessing files outside of the specified directory - Not designed for security, as we're about to see Segue: Probably makes more sense with an example...

- Debootstrap is a tool to download and "install" a Debian-based distribution into a directory - Used by installers and similar during system setup - In this example Debian 10 (codename "buster") is installed to the directory "my_debian_root" - When listing files in the directory, we recognize these from the file system root ("/") - This will serve as the base for our chroot

- To show that this actually works, OS information is printed on the host and in the chroot ("guest") - The CLI program "chroot" uses the "chroot" syscall - Spawn an interactive shell in the chroot (bash) - Notice how the prompt changes, this is Debian's default prompt - The file "/etc/os-release" in the Debian chroot is actually "my_debian_root/etc/os-release" Segue: This kind-of-works, we have some sort of proto-virtualisation going on here, but as I said chroot was not designed as a security feature...
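The whole demo condensed into a session sketch; the host distro shown is an assumption:

```shell
# "Install" a minimal Debian 10 into a directory (requires root)
$ sudo debootstrap buster my_debian_root

# OS information on the host...
$ grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"

# ...and inside the chroot, where the file read is actually
# "my_debian_root/etc/os-release"
$ sudo chroot my_debian_root bash
root@host:/# grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
```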

- The same kernel is shared inside and outside of the chroot - root is still root and can list/kill/manipulate processes, read sensitive kernel logs, dump network traffic, load kernel modules, restart the host, etc. - The only thing that is isolated is the file system Segue: How do we solve this? Kernel namespaces are a part of the solution...

- Since the early 2000s, the Linux kernel's namespace functionality has grown organically - Allows us to create new and isolated "views" of parts of the kernel, such as users, file systems, network interfaces/routes and processes - Members in this case are one or more processes (including their sub-processes) - Works like a one-way mirror, the host can see what is going on inside all namespaces, but their members can only see things in their dedicated namespace(s) - There are currently eight different ones and new ones are in development, we'll focus on the ones that are most important and/or easy to understand - Cgroup, IPC, Network, Mount, PID, Time, User, UTS Segue: This is a bit complicated to explain, some examples may help...

- All programs that are started on a Linux system (processes) get their own process ID (PID) - The PIDs start from 1, which is the first program started upon boot and typically an init system like systemd or OpenRC - A user can view/kill/interact with their own processes, root can do it for all (by default) - A malicious program could do bad things to other processes - The PID namespace gives its members their own listing - a PID inside a NS may be 5 from the perspective of the host, but 1 inside the namespace - This means that namespace members can only mess with their own processes

- Listing of processes inside and outside of the chroot, showing that they're the same

- "unshare" and "nsenter" are two good CLI utilities for playing around with NSes - In this case, we create a new PID namespace that the chroot command is executed in - The example shows how PID 1 is different in these contexts
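A sketch of playing with PID namespaces using "unshare"; the host-side PID shown is illustrative:

```shell
# On the host, our shell is an ordinary process with a high PID
$ echo $$
4242

# Start a shell in a new PID namespace ("--mount-proc" remounts
# /proc so that tools like ps see the new namespace)
$ sudo unshare --pid --fork --mount-proc bash

# Inside the namespace, the shell believes it is PID 1
root@host:/# echo $$
1
```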

- The network namespace may be the most versatile one - When a new network NS is created, no network devices are in it and it gets its own routing table, firewall rules, etc - We can either move one or more physical interfaces into the NS or create virtual interfaces that route through the host namespace - This is how OS-level guests are network isolated from each other, but it's very useful in other cases as well as the slide says TODO: Write better descriptions of non-virt use-cases
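Named network namespaces can be explored with the "ip netns" sub-commands (requires root); the interface listing is truncated:

```shell
# Create a new network namespace called "demo"
$ sudo ip netns add demo

# List interfaces inside it - only an unconfigured loopback
# device, none of the host's interfaces are visible
$ sudo ip netns exec demo ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN ...

# Clean up
$ sudo ip netns delete demo
```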

- Read the first sentence five times fast! - As mentioned, root in the chroot can still mess with the shared kernel - Many things expect to be run as root/UID 0, but don't usually need it as they only poke around on the file system ("my_debian_root" in this case) and won't use dangerous/fancy kernel features - Like the PID NS, members of a user NS get their own account list: UID 0 (root) for the NS members may be UID 2031 on "the host"/outside of the NS Segue: We could talk a lot more about NSes, let's move on to cgroups...
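On many distros, an unprivileged user can try this themselves; a sketch, assuming the kernel allows unprivileged user namespaces (a distro-specific setting):

```shell
# As a regular user...
$ id -u
1000

# ...enter a new user namespace where our UID is mapped to 0
$ unshare --user --map-root-user bash

# We look like root, but only inside the namespace - dangerous
# operations against the shared kernel are still denied
$ id -u
0
```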

- Control groups - Limit how much resources a group of processes can use - Possibility to pin them on specific CPU cores for example, which can be good for performance and security (prevent "noisy neighbors" on the CPU core, keep cache warm/isolated) - Supports a feature called checkpointing which can be used to snapshot the state of processes in a cgroup - this state can be restored or migrated to another host (in some cases hehe) - Heavily used by systemd, all services and users run in their own cgroup to allow granular resource measurements/restrictions. Also good if you wanna kill a service and not leave zombie processes lying around (https://en.wikipedia.org/wiki/Zombie_process)
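The raw cgroup interface is just a file system. A minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup and root privileges:

```shell
# Create a new cgroup and cap its memory usage at 100MiB
$ sudo mkdir /sys/fs/cgroup/demo
$ echo 100M | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell (and its future children) into the
# group - combined they can now never use more than 100MiB of RAM
$ echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```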

- Mentioned system calls (syscalls) several times during the course: the way user applications tell the kernel to do things, such as open files or network sockets - x86_64 Linux exposes over 300 different syscalls, some of which are simple and others have complex configuration options (syscall arguments) - Large attack surface, some are known to break isolation. We don't want a guest to be able to issue a syscall that reboots the system, do we? - As always, we want to limit the attack surface: most applications don't use too many of them anyway - Some applications, like the Firefox rendering process, do this to sandbox themselves and limit the potential things an attacker could do if they were compromised Segue: We've talked about how OS-level virtualisation was partly introduced because making proper multi-user/admin restrictions is too tricky. Well, we've tried to do that also...
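To get a feel for how few syscalls a typical program actually needs, "strace -c" (assuming strace is installed) summarizes which ones a command used; the summary is truncated here:

```shell
# Run "ls" and print a per-syscall usage summary ("-c" counts
# calls, the summary is written to stderr)
$ strace -c ls > /dev/null
% time     seconds  usecs/call     calls    errors syscall
...
```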

- Does root in all cases need to be able to do everything? - Doesn't it seem stupid to give a user root privileges if they "only" need to modify network settings? - Capabilities can be given to/taken away from processes, including root processes - They can also be configured as an extended attribute on executable files - A typical historic use-case was to run network services as root if they needed to listen on a port beneath 1024 (https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html). This is an unnecessary risk, as an attacker would often gain complete root access if the service was owned - The capability "CAP_NET_BIND_SERVICE" allows non-root processes to do this - chroot normally requires root privs, but there exists a capability to change that - The available caps often allow much more functionality than may be needed. Some like "CAP_NET_ADMIN" have known flaws as they expose functionality that can be used to gain unrestricted root access. It's tricky to bolt on these types of restrictions after a system has been built and the interest in fixing it is limited - Removing caps for an OS-level guest can minimize the kernel attack surface
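File capabilities are managed with "setcap"/"getcap" from libcap; the binary path here is a made-up example:

```shell
# Allow a specific web server binary to bind ports below 1024
# without running as root ("e" = effective, "p" = permitted)
$ sudo setcap cap_net_bind_service=+ep /usr/local/bin/mywebserver

# Inspect which capabilities have been granted
$ getcap /usr/local/bin/mywebserver
/usr/local/bin/mywebserver cap_net_bind_service=ep
```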

- We've only scratched the surface, a lot of other things make up OS-level virt - Could go more in-depth if interest and time exist - Understand that it's a lot to take in, especially if you're not super comfortable with Linux - As noted, you're not expected to remember all of this in detail. If you can only take away one thing, it is that OS-level virtualisation on Linux is composed of several parts and not developed as one feature, which will lead to some issues as we're about to see

- Important to be aware of when OS-level virtualisation may be a bad choice - We've touched on it a bit and some of you may be able to guess: please do!

- Live migration is a feature that, in some environments, is heavily used to minimize disruptions. While this in theory works, and sometimes even in practice, it's hard to depend on it - Offline migration can still be used, but may not be good enough - It's nice that we can run different flavors and versions of distros, but a ten year old Ubuntu userland may behave slightly differently on a modern Fedora kernel - The kernel is shared among the host and all guests. It quickly becomes a lot of different processes and not all resources (like inotify watches) are properly name-spaced. Cgroups don't regulate everything. This is usually less of a problem with HW-level VMs, as the host kernel is doing far fewer different things - A bug in an obscure syscall could crash the kernel. As the kernel is shared, it would affect every guest and the host

- As OS-level virt was not coherently developed for Linux, there are gaps between the different isolation technologies such as NSes and LSM (https://en.wikipedia.org/wiki/Linux_Security_Modules) - All the syscalls pose a huge attack surface. Quite complex compared to what a HW-level hypervisor needs to do - Syscalls and capabilities that are known to be dangerous (break isolation) are black-/deny-listed - A far better approach when it comes to security would be to white-/allow-list known safe ones - This would probably break a lot of apps and make OS-level virt harder to adopt

Take some time to reflect before forcing more knowledge into your brain.

- We've touched on this, especially related to virtual appliances - Let's dig a bit deeper to understand what the problem really is

- Take your time to think these over, it's good for your learning - If you have patience for it, revisit the questions in a week or so to further engrave them into your knowledge

- So far, we've been going through a lot of theory - We're about to put that knowledge to the test

- Labs will be based on different scenarios you may encounter in real life - Just like in real life, no one will tell you exactly how to do it. If they could, the job should be replaced by a shell script - No one knows everything off the top of their head, feel free to Google, discuss with class mates, look through old slides and ask for support - Let me know how you approached the challenge and why you chose the solution steps you did

- We've talked about virtualising servers and running VMs on the desktop - Why not virtualise the desktop? - Many do and many more have tried, turns out that what seems like a good idea may not be so - VDI's popularity comes and goes. Segue: But why, what's the idea?

- In the beginning, there were large computers with user-facing terminals connected to them - These terminals were little more than a screen, keyboard and perhaps a futuristic looking mouse - Didn't have much of a choice at the time, when computers became cheaper everyone started getting their own. This represented a decentralisation Segue: What are the benefits of centralising people's desktops? Many are the same as for servers

- As with server workloads, we can often overprovision the hypervisors running virtual desktops - Margins can be a lot smaller: if everyone has the occasional need to perform a performance critical task, everyone needs a somewhat beefy machine. Chances are that these needs are not concurrent for all users, meaning that pooling makes a lot of sense - The same goes the other way: a virtual desktop could be a lot more powerful if the guest is running on server HW than would be possible to fit in a light-weight laptop - Powerful accelerators, such as CAD GPUs, could be plugged into energy-efficient servers in climate-controlled data centers instead of being crammed into bulky desktop workstations - Access to powerful computers on demand makes a lot of sense and, at least in theory, would save a lot of money for each individual user - Some software has very annoying licenses and/or very specific runtime environment requirements - Especially if there is only the occasional need, sharing this SW through virtual desktops could make a lot of sense - Basis for streaming gaming services such as Google Stadia and Geforce Now - Many cloud providers offer powerful virtual desktops with accelerators for usage on-demand

- IT admins can use scripts to automatically create and update virtual desktops when needed - Supporting users with physical desktops around the world can be a PITA - If the virtual desktop has been corrupted or suspicious software has been installed on it, snapshots and/or image templates can be used to quickly return to a working state - Users store important/sensitive data on their local HDDs, USB drives and similar. Besides the risks of these getting stolen/lost, backing up the data can be hard - You can't really steal a virtual desktop. When a physical laptop is stolen/lost (especially if powered on), lots of work is required to make sure that access is revoked and that the sensitive information was actually inaccessible/encrypted at the time

- So far we've talked about these things very abstractly - Usually the virtual desktops are run with HW-level virtualisation and in some cases OS-level - How do users gain access to these? They need a network connection, but what else?

- Boring low-cost low-energy computers with one job: connect the user to a central computer - Designed to be low maintenance - Connect a screen, keyboard, mouse and network connection - Typically mounted behind a screen or under the table. Commonly spread throughout an office/campus - Used to all look more or less like in the picture, these days many look like a ChromeCast - Quite neat, especially if combined with a smart card: just plug it into any device in any room and access your desktop how you last left it

- If you've used RDP, it's a lot like that - Can be quite nice for users that work from multiple devices in multiple locations: at least their desktop/running apps become consistent - Quite cheap if users can utilize their own devices to run the client on - IT doesn't need to maintain any desktop hardware, yay! - Popular with economically constrained and globally diverse organisations Segue: This presentation wouldn't be part of the course if we didn't end on a low note...

- These days, it's quite uncommon that users stay in one physical location all the time. They go to visit customers/providers, work from home and need to participate in the occasional off-site - While some of them may be able to survive on a tablet, most will need a laptop. Now you have to manage twice the amount of computers and the wins of VDI get a lot smaller if not nullified - Requires a decent network connection between the client and virtual desktop server. Especially crucial is latency: the experience quickly becomes painful with just a little bit of input lag - The biggest user critique against VDI is the experience: if the network is not great, things feel sluggish. This somewhat mitigates the "work from anywhere" argument - While it's quite common to combine BYOD and VDI client apps, it may not be desirable from a security perspective. The org has no or very limited control over the user's endpoint device. While sensitive data may "only be streamed" to the client and not stored on the user's personal device, security critical information such as passwords or other credentials is still inputted using the user's keyboard. In practice, devices that run VDI clients must be trusted - When the VDI solution fails, it often prevents a lot of people from working at the same time. While users are typically very dependent on shared services such as email or internal file sharing, at least some tasks may still be possible if those are unavailable. If VDI is inaccessible and a requirement, productivity will likely be severely lowered (especially if thin clients are used)