Virtualisation course

Welcome and thanks for joining!

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

What we will cover

How and why virtualisation helps the industry save money, improve uptime and secure IT systems.

© Course authors (CC BY-SA 4.0) - Image: © Håkan Dahlström (CC BY 2.0)

How we will do it

  • Lectures and Q&A
  • Individual and group presentations
  • Lab exercises
  • Continuous reflection
  • Quizzes and scored tests
© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

For notes, slides and similar, see:
t.menacit.se/virt.zip.

These resources should be seen as a complement to an instructor-led course, not a replacement.

© Course authors (CC BY-SA 4.0)

Free as in beer and speech

Is anything unclear? Got ideas for improvements? Don't fancy the animals in the slides?

Create an issue or submit a pull request to
github.com/menacit/virt_base_course.

© Course authors (CC BY-SA 4.0) - Image: © Ludm (CC BY-SA 2.0)

Acknowledgements

Thanks to IT-Högskolan and Särimner for enabling development of the course.

Hats off to all FOSS developers and free culture contributors making it possible.

Course contributors

Joel Rangsmo, semeiths and
Palsmo.

© Course authors (CC BY-SA 4.0) - Image: © Marcus Hansson (CC BY 2.0)

Let us dig in!

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Background and basics

Why and how we virtualise

© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Computers were humans.

© Course authors (CC BY-SA 4.0) - Image: © Robert Sullivan (CC0 1.0)

Computers became...

  • Problem-specific devices
  • General purpose
  • Time-shared/multi-user
© Course authors (CC BY-SA 4.0) - Image: © C. Watts (CC BY 2.0)

Initial motivations

To save money and "keep things simple".

© Course authors (CC BY-SA 4.0) - Image: © Beraldo Leal (CC BY 2.0)

HW-level virtualisation

Pretend to be hardware well enough to make operating systems run.

OS-level virtualisation

Pretend to be an operating system environment well enough to run applications.

Domain-specific virtual machines

More about that later...

© Course authors (CC BY-SA 4.0) - Image: © Jonathan Brandt (CC0 1.0)

VMware

  • Early player introducing the industry to x86 virtualisation
  • Capable of virtualising hardware for most operating systems
  • Still popular today, though a bit messy
© Course authors (CC BY-SA 4.0)

FreeBSD Jails

  • Introduced in version 4.0 (2000)
  • Lazy solution to "UID 0 problem"
  • Most OS functionality provided in an isolated manner to each jail
  • Very popular for web and shell hosting
© Course authors (CC BY-SA 4.0) - Image: © Fandrey (CC BY 2.0)

Glossary and loose synonyms

Physical machine

Host, hypervisor, virtual machine manager (VMM)

Virtual machine

Guest, instance, VM

© Course authors (CC BY-SA 4.0) - Image: © Jonathan Miske (CC BY-SA 2.0)

Course prerequisites

What you need to know and setup

© Course authors (CC BY-SA 4.0) - Image: © Gytis B (CC BY-SA 2.0)

You need to know...

Basic networking and shell usage of Linux.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

You need to setup...

For details and guidance, see:
"resources/lab_setup.md".

© Course authors (CC BY-SA 4.0) - Image: © Kristina Hoeppner (CC BY-SA 2.0)

Economic benefits

Time to save money and dress sharply

© Course authors (CC BY-SA 4.0) - Image: © Bixentro (CC BY 2.0)

Less hardware means...

  • Less data center space
  • Less supporting infrastructure
  • Less electricity and cooling
  • Less things that can break
© Course authors (CC BY-SA 4.0) - Image: © Sergei F (CC BY 2.0)

Studies show average data center CPU utilization at 6-30%

© Course authors (CC BY-SA 4.0) - Image: © Roy Luck (CC BY 2.0)

Resource sharing means...

  • Shared margins for resource utilization
  • Easier capacity planning
  • Cost savings through use of more efficient/powerful hardware
© Course authors (CC BY-SA 4.0) - Image: © Jonathan Brandt (CC0 1.0)

Technical interlude

© Course authors (CC BY-SA 4.0) - Image: © Wolfgang Stief (CC0 1.0)

Guest migration

Move VM from one hypervisor to another.

Offline

Guest is shutdown before migration and started afterwards.

Online

Guest remains running and is almost seamlessly migrated between hypervisors.
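As a hedged sketch of what this can look like in practice with the libvirt stack (covered later in the course) - the guest name srv-1 and destination host node-2 are placeholders:

```shell
# Live migration: the guest keeps running while its memory is
# copied to the destination hypervisor over SSH.
virsh migrate --live --persistent srv-1 qemu+ssh://node-2/system

# Offline migration: stop the guest, move it, start it again.
virsh shutdown srv-1
virsh migrate --offline --persistent srv-1 qemu+ssh://node-2/system
virsh --connect qemu+ssh://node-2/system start srv-1
```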

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Back to business

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Migration of VMs, especially live migration, enables servicing of hypervisors during office hours.

If hardware breaks, just start up the VM on another host.

Humans are likely the biggest cost.

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

Deploying software stacks on hardware requires time-consuming planning, procurement and setup.

Automation and self-service features cut the time required for meetings and hand-overs.

Virtualisation makes organisations faster.

© Course authors (CC BY-SA 4.0) - Image: © Jeena Paradies (CC BY 2.0)

Virtualisation provides a (more or less) predictable execution environment.

Testing, installation and support of complex software stacks is expensive.

Software vendors love virtual appliances.

© Course authors (CC BY-SA 4.0) - Image: © Jusotil_1943 (CC0 1.0)

Reliability benefits

Not only nerds care about uptime

© Course authors (CC BY-SA 4.0) - Image: © Meddygarnet (CC BY 2.0)

Virtualisation enables...

  • Easy maintenance of hypervisors and supporting infrastructure thanks to migration
  • Legacy workloads to run on decent and supported HW
  • Less maintenance work due to less HW
  • Better testing due to predictable runtime environment
© Course authors (CC BY-SA 4.0) - Image: © Loco Steve (CC BY-SA 2.0)

Makes multi-server and multi-site redundancy economically viable for many organisations.

© Course authors (CC BY-SA 4.0) - Image: © Takomabibelot (CC BY 2.0)

Snapshots

  • Freeze and store guest state
  • Roll back changes with ease
  • Enabled through Copy-On-Write disk storage and/or overlay file systems
© Course authors (CC BY-SA 4.0) - Image: © J.V.R Sharma (CC BY-SA 2.0)
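To make the Copy-On-Write mechanism concrete, here is a minimal sketch using qemu-img (assuming it is installed; file names are placeholders). The overlay only stores blocks that differ from its backing image:

```shell
# Create a base disk image and a Copy-On-Write overlay on top of it.
# Guest writes go to overlay.qcow2 while base.qcow2 stays untouched,
# so "rolling back" is as simple as discarding the overlay.
qemu-img create -f qcow2 base.qcow2 1G
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay.qcow2

# Snapshots can also be stored inside a single qcow2 file:
qemu-img snapshot -c before_upgrade overlay.qcow2
qemu-img snapshot -l overlay.qcow2
```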

Snapshots enable operators to apply bug fixes and perform other changes with more confidence.

© Course authors (CC BY-SA 4.0) - Image: © Nicholas Day (CC BY 2.0)

Environmental benefits

How virtualisation can make things a little less terrible

© Course authors (CC BY-SA 4.0) - Image: © Pumpkinmook (CC BY 2.0)

Exercise: Pitch to Greta

Describe environmental benefits of using virtualisation based on what you've learned so far.

Participants are split into groups.
After discussion, send notes to:
courses+virt_010501@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Peggy Dembicer (CC BY 2.0)

Less HW means less waste

  • Doesn't just apply to servers, but supporting equipment and infrastructure as well
  • Smaller utilization margins, since instances can be scaled
  • Bigger/more powerful systems makes sense
  • Studies show average data center CPU utilization at 6-30%
© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Less HW means less energy

  • Doesn't just apply to servers, but supporting equipment and infrastructure as well
  • Bigger/more powerful systems are (usually) more energy efficient
  • Migration enables power cycling of servers based on utilization
© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

VM (live) migration enables temporary
transfer of workloads to regions with
a surplus of renewable energy.

© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Security benefits

(Yes, there are several)

© Course authors (CC BY-SA 4.0) - Image: © Tim Green (CC BY 2.0)

Safely running multi-user/multi-tenant systems is hard.

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

One VM per task enables...

  • A smaller attack surface
  • Fewer privileged users per system
  • Tighter network restrictions
  • Easier anomaly and intrusion detection
© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)

Qubes OS and unikernels
such as MirageOS represent the logical extreme of these arguments.

© Course authors (CC BY-SA 4.0) - Image: © Freestocks.org (CC0 1.0)

Computers are stateful

Snapshots enable a more aggressive security patching process.

Properly cleaning up a hacked server is no trivial task - neither is forensics.

Modifications and malware can hide not only in the OS, but also in firmware and OoB mechanisms.

© Course authors (CC BY-SA 4.0) - Image: © Wolfgang Stief (CC0 1.0)

Downsides of virtualisation

The good, the bad and the ugly

© Course authors (CC BY-SA 4.0) - Image: © Wendelin Jacober (CC0 1.0)

Economics

More messy licenses to care about.

"System/VM sprawl" - many guests to take care of.

© Course authors (CC BY-SA 4.0) - Image: © Tofoli Douglas (CC0 1.0)

Reliability

More code means more complexity and things that can break.

Large blast radius if virtualisation fails.

Lazy ways to implement reliability aren't always the best ones.

© Course authors (CC BY-SA 4.0) - Image: © Jesse James (CC BY 2.0)

Security

Isolation is not perfect, many eggs in one basket.

Must trust underlying infrastructure, control plane and humans.

Way too easy to spin up and forget about systems.

© Course authors (CC BY-SA 4.0) - Image: © Kristina Hoeppner (CC BY-SA 2.0)

Performance

Always some overhead; depending on the workload it may be significant.

Less predictable due to "noisy neighbours" and hidden counters.

Trickier to utilize accelerators and CPU features.

© Course authors (CC BY-SA 4.0) - Image: © Jonathan Brandt (CC0 1.0)

Reflections

What have we learned so far?

© Course authors (CC BY-SA 4.0) - Image: © Alan Shearman (CC BY 2.0)

Answer the following questions

  • What are your most important take-aways?
  • Did you have any "Ahaaa!"-moments?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_010801@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Austin Design (CC BY-SA 2.0)
© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Introductory recap

Refreshing what we've learned so far

© Course authors (CC BY-SA 4.0) - Image: © Bixentro (CC BY 2.0)

Covered what virtualisation is and its benefits.

Built foundation for further understanding of technologies available on the market.

© Course authors (CC BY-SA 4.0) - Image: © John Regan (CC BY 2.0)

Recap of prerequisites

Install HashiCorp Vagrant and
a VMM (like VirtualBox) on your physical machine.

Install Docker Engine on Ubuntu using the method
described in Docker's official documentation.

© Course authors (CC BY-SA 4.0) - Image: © Kojach (CC BY 2.0)

Warm-up quiz

  • Describe three ways virtualisation can improve reliability
  • Explain the main difference between HW-level and OS-level virtualisation
  • Explain why virtualisation can have a negative impact on performance

courses+virt_010901@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Reid Campbell (CC0 1.0)

Virtualisation software

Examples of HW-level solutions

© Course authors (CC BY-SA 4.0) - Image: © ETC Project (CC0 1.0)

VMware ESXi

  • Directly installed on HW
  • Primarily used for hosting virtual servers
  • Great functionality for clustering through "vCenter"
  • Expensive and proprietary, but batteries are included
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Hyper-V

  • Windows' native virtualisation technology
  • Also used for "Windows Subsystem for Linux v2" and "Virtualization-based Security" features
  • Somewhat messy licensing
© Course authors (CC BY-SA 4.0) - Image: © Daniel Oliva Barbero (CC BY 2.0)

"KVM"

What people really mean is
libvirt + QEMU + KVM.

Most widely used virtualisation stack
on Linux and the base for offerings
from Red Hat and Canonical.

Commonly used by public cloud providers.

© Course authors (CC BY-SA 4.0) - Image: © Maja Dumat (CC BY 2.0)

libvirt

  • Abstraction layer for running/managing VMs
  • Supports several different hypervisors
  • Does the dirty work of setting up networking, storage, snapshots etc.
  • Mess of Python and XML files
© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)

QEMU

  • Swiss army knife of emulation
  • Emulates BIOS/UEFI and devices
  • Designed with flexibility in mind, not security
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

KVM

  • Feature of the Linux kernel
  • Used to run performance critical operations in kernel space and access HW acceleration features
  • Not only utilized by QEMU!
© Course authors (CC BY-SA 4.0) - Image: © Aka Tman (CC BY 2.0)

Proxmox VE

  • All-in-one solution deployed on HW
  • Managed through web UI or API
  • Uses QEMU + KVM and LXC
  • Built-in ZFS support
© Course authors (CC BY-SA 4.0) - Image: © Masahiko Ohkubo (CC BY 2.0)

Cloud Hypervisor

© Course authors (CC BY-SA 4.0) - Image: © Edd Thomas (CC BY 2.0)

Firecracker

  • Originally developed by AWS
  • Runs secure and performant "microVMs"
  • Not designed for general purpose hosting
  • Shares code with Cloud Hypervisor
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Xen

  • Pioneering solution for para-virtualisation
  • Performant "bare metal" hypervisor
  • Enables small Trusted Computing Base
  • XCP-ng provides an ESXi-/Proxmox-like experience
© Course authors (CC BY-SA 4.0) - Image: © Lars Juhl Jensen (CC BY 2.0)

There are many more solutions available
for hardware-level virtualisation -
all with their own pros/cons.

This was just an appetizer to make
you familiar with the most common ones!

© Course authors (CC BY-SA 4.0) - Image: © ETC Project (CC0 1.0)

Qubes OS showcase

An alternative vision of the desktop

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)
[Screenshot walkthrough of the Qubes OS desktop]

© Course authors (CC BY-SA 4.0)

Not everything is awesome

  • Requires somewhat fancy HW
  • Quite a lot of overhead
  • No 3D acceleration
  • Lots to manage and keep updated
  • Lacking support for newest HW
© Course authors (CC BY-SA 4.0) - Image: © Eric Savage (CC BY-SA 2.0)

Get inspired

  • Use VMs and snapshots for risky things
  • Set up a router VM to force traffic through a VPN
  • Separate VM per customer/project for easy cleanup and leak prevention
© Course authors (CC BY-SA 4.0) - Image: © Mathias Appel (CC0 1.0)

Confidential computing

Making computers a bit more trustworthy

© Course authors (CC BY-SA 4.0) - Image: © Nikki Tysoe (CC BY 2.0)

Traditionally, we must trust the hypervisor and operators of underlying infrastructure.

© Course authors (CC BY-SA 4.0) - Image: © Eric Chan (CC BY 2.0)

We can encrypt disks of instances, but with somewhat limited value.

© Course authors (CC BY-SA 4.0) - Image: © Eric Kilby (CC BY-SA 2.0)

Operators can build trust using transparency.

Trust may not be enough due to regulatory requirements.

Even if operators act in good faith and auditors are happy, things can still get owned.

© Course authors (CC BY-SA 4.0) - Image: © Jan Bommes (CC BY 2.0)

Currently, lots of interest (see "money") in solving these issues for cloud and edge computing.

"Confidential computing" is the collective term used for attempted solutions.

© Course authors (CC BY-SA 4.0) - Image: © Quinn Dombrowski (CC BY-SA 2.0)

Memory encryption

  • Prevent host (and other guests) from peeking at/modifying VM memory
  • Guests are still vulnerable to many different types of attacks
  • AMD SEV* is an example of a solution
© Course authors (CC BY-SA 4.0) - Image: © Sergei F (CC BY 2.0)

"Full guest isolation"

  • Hardware and software features to prevent host from messing with guests
  • Owners of guests verify the runtime environment before starting/decrypting data
  • Intel tries to solve this with TDX
  • AMD SEV-SNP is another solution
© Course authors (CC BY-SA 4.0) - Image: © Lydur Skulason (CC BY 2.0)

What's the catch?

  • Still early days, lacking support in virtualisation software stack
  • Current state of verified boot is quite bad, especially in Linux distributions
  • Requires trust in a proprietary black box
  • Several vulnerabilities have been identified, breaking the promise
© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)

Mayhaps a good complement to minimize risk, if understood properly.

If you find this interesting,
check out Joel's talk from SEC-T.

© Course authors (CC BY-SA 4.0) - Image: © Graham Drew (CC BY 2.0)

Optimizing virtualisation

Making HW-level VMs go wroooom!

© Course authors (CC BY-SA 4.0) - Image: © Chris Dlugosz (CC BY 2.0)

CPU assisted virtualisation

Allow guests to access parts of
CPU, RAM and PCI bus without going
through virtualisation software.

Requires support from hardware,
like Intel VT-(x|d) and AMD-V features.

May require use of "feature masking"
for live migration and security.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)
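A quick way to check for these features on a Linux host is sketched below; the flag names in /proc/cpuinfo are vendor specific:

```shell
# "vmx" = Intel VT-x, "svm" = AMD-V. No match usually means the CPU
# lacks hardware assistance or it is masked/disabled (e.g. in firmware).
if grep -q -E 'vmx|svm' /proc/cpuinfo; then
    echo "hardware-assisted virtualisation supported"
else
    echo "no vmx/svm flags found"
fi
```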

Para-virtualisation

Let's not pretend that guests are running on real hardware.

Virtual devices don't necessarily need to act like physical HW and can be much more efficient.

VIRTIO offers standardised virtual devices
such as NICs, block devices ("disk drives"),
sockets and sound cards.

© Course authors (CC BY-SA 4.0) - Image: © Micah Elizabeth Scott (CC BY-SA 2.0)

Memory and data optimization

Deduplication (only store unique data once)
may be used for both guest memory/RAM
and disk to save a lot of space.

"Memory ballooning" features enable
overprovisioning of RAM in guests.

© Course authors (CC BY-SA 4.0) - Image: © Freestocks.org (CC0 1.0)

Security recap

Pros and cons of virtualisation

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Exercise: Convince Eddy

Participants are split into two or more groups.

Half should present the security benefits of using virtualisation while the other half does the opposite.

After presentation, send slides to
course+virt_011401@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © AK Rockefeller (CC BY-SA 2.0)

Reflections

What have we learned about hardware-level virtualisation?

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Answer the following questions

  • What are your most important take-aways?
  • Did you have any "Ahaaa!"-moments?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_011501@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)
© Course authors (CC BY-SA 4.0) - Image: © Alan Levine (CC0 1.0)

Course progress recap

Refreshing what we've learned so far

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Covered what virtualisation is
and its pros/cons.

Talked about HW-level virtualisation
and some of the solutions available.

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)
  • High-level concepts and pros/cons
  • Hardware-level virtualisation
  • OS-level virtualisation
  • Virtualised development and containers
  • Lab
  • Course recap and summary
© Course authors (CC BY-SA 4.0) - Image: © Jon Evans (CC BY 2.0)

Things are about to get more complex.

Remember to ask questions and utilize available learning resources!

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Q&A, reflections and feedback

© Course authors (CC BY-SA 4.0) - Image: © John K. Thorne (CC0 1.0)

In most use-cases, the security pros far outweigh the cons.

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Qubes OS is neat, but requires decent hardware.

>16GB RAM, Intel VT-x/AMD-V
and Intel VT-d/AMD-Vi.

Check out the "hardware compatibility list".

© Course authors (CC BY-SA 4.0) - Image: © Andrew Hart (CC BY-SA 2.0)

Para-virtualised devices, such as network cards and disks, don't pretend to be physical things.

By getting rid of the illusion, a lot of overhead is removed, which makes them faster.

© Course authors (CC BY-SA 4.0) - Image: © Helsinki Hacklab (CC BY 2.0)

Vertical scaling

Add/Remove CPU, RAM, etc.

Horizontal scaling

Increase/Decrease number of VM instances.

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)
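As a sketch using LXD's lxc CLI (guest names srv-1/srv-2 are placeholders; the same idea applies to any virtualisation stack):

```shell
# Vertical scaling: resize an existing guest's resource limits.
sudo lxc config set srv-1 limits.cpu 4
sudo lxc config set srv-1 limits.memory 2GiB

# Horizontal scaling: add another instance of the same guest.
sudo lxc copy srv-1 srv-2
sudo lxc start srv-2
```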

If you got some spare hardware,
why not try setting up
Proxmox VE
or XCP-ng?

Play around with advanced features,
such as live migration and
PCI forwarding.

© Course authors (CC BY-SA 4.0) - Image: © Jeena Paradies (CC BY 2.0)

Questions/thoughts before we keep going?

© Course authors (CC BY-SA 4.0) - Image: © Tero Karppinen (CC BY 2.0)

OS-level virtualisation

What makes it different and desirable?

© Course authors (CC BY-SA 4.0) - Image: © Wendelin Jacober (CC0 1.0)

HW-level virtualisation

Pretend to be hardware well enough to make operating systems run.

OS-level virtualisation

Pretend to be an operating system environment well enough to run applications.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Pretend to be another OS
and/or
Create an isolated environment

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

FreeBSD's Linuxulator and Windows' WSL v1 allow execution of (many) Linux applications.

Wine allows (many) Windows programs to run on Linux, *BSD and macOS.

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Pretend to be another OS
and/or
Create an isolated environment

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Why use OS-level virtualisation?

© Course authors (CC BY-SA 4.0) - Image: © Rolf Dietrich Brecher (CC BY 2.0)

Shared kernel saves memory and CPU cycles:

  • No HW emulation overhead
  • Concurrent usage of accelerators like GPUs
  • More efficient scheduling of processes

Cheaper, faster and more Greta-friendly.

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

The one-way mirror allows a hypervisor to continuously monitor guests for malicious activity.

Falco is a kool example.

© Course authors (CC BY-SA 4.0) - Image: © NASA/Chris Gunn (CC BY 2.0)

Virtuozzo

  • First commercially available solution for Linux
  • FOSSed in 2005 as "OpenVZ"
  • Popular in the low-end hosting space

LXC

  • Designed to fit into mainline Linux
  • Development currently led by Canonical
  • Used by ChromeOS to run regular Linux apps
© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

LXC/LXD demonstration

A peek into OS-level virtualisation

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)
$ sudo snap install lxd

lxd 5.9-9879096 from Canonical✓ installed
$ sudo lxd init --minimal
© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)
$ sudo lxc image list images: | shuf

centos/8-Stream (3 more)
openwrt/21.02 (3 more)
fedora/36 (3 more)
almalinux/9 (3 more)
opensuse/15.4/desktop-kde (1 more)
mint/tina (3 more)
alpine/3.17 (3 more)
[...]
© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)
$ sudo lxc launch images:almalinux/9 srv-1

Creating srv-1
Retrieving image: Unpack: 100% (3.81GB/s)
Starting srv-1
© Course authors (CC BY-SA 4.0) - Image: © Price Capsule (CC BY-SA 2.0)
$ cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="Ubuntu 22.04.1 LTS"

$ sudo lxc exec -t srv-1 -- /bin/bash

[root@srv-1 ~]# cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="AlmaLinux 9.1 (Lime Lynx)"
© Course authors (CC BY-SA 4.0)
$ sudo lxc info srv-1 --resources

Name: srv-1
Status: RUNNING

Resources:
  Processes: 15
  Disk usage:
    root: 2.62MiB
  Memory usage:
    Memory (current): 54.58MiB
[...]
© Course authors (CC BY-SA 4.0) - Image: © NASA (CC BY 2.0)
$ sudo lxc exec -t srv-2 -- /bin/ash

~ # apk add fortune
(1/3) Installing libmd (1.0.4-r0)
(3/3) Installing fortune (0.1-r1)

~ # fortune 

Matter cannot be created or destroyed,
nor can it be returned without a receipt.
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)
$ sudo lxc snapshot srv-2 pre_upgrade
$ sudo lxc restore srv-2 pre_upgrade
© Course authors (CC BY-SA 4.0) - Image: © OLCF at ORNL (CC BY 2.0)
$ sudo lxc cluster list

node-1
node-2
node-3
$ sudo lxc info srv-1 | grep Location

Location: node-1
$ sudo lxc cluster evacuate node-1
$ sudo lxc info srv-1 | grep Location

Location: node-2
© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

Other neat features

© Course authors (CC BY-SA 4.0) - Image: © NASA/Bill Stafford (CC BY 2.0)

Recent drama in the development community
has caused several contributors to leave
the LXD project.

Checkout the community-driven
fork called Incus.

© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

OS-level virtualisation

How does it work on Linux?

© Course authors (CC BY-SA 4.0) - Image: © Enrique Jiménez (CC BY-SA 2.0)

There is no such thing as an OS-level VM in the Linux kernel.

Gluing together features like chroot, namespaces and cgroups creates the illusion.

This functionality has other neat use-cases besides virtualisation.

No worries if you don't grok all of this!

© Course authors (CC BY-SA 4.0) - Image: © Eric Nielsen (CC BY 2.0)

chroot

  • Introduced in UNIX during the 70s
  • "Change file system root"
  • Not designed as a security feature
© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)
$ sudo debootstrap buster my_debian_root http://deb.debian.org/debian/

I: Retrieving InRelease 
I: Resolving dependencies of required packages...
I: Retrieving libacl1 2.2.53-4
[...]
I: Configuring libc-bin...
I: Base system installed successfully.
$ ls my_debian_root/

bin  boot  dev  etc  home  lib  lib32  lib64 [...]
© Course authors (CC BY-SA 4.0)
$ cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="Ubuntu 22.04.1 LTS"

$ sudo chroot my_debian_root /bin/bash

root@node-1:/# cat /etc/os-release | grep -F PRETTY_NAME
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
© Course authors (CC BY-SA 4.0)
root@node-1:/# dmesg | wc -l
1002

root@node-1:/# ps -xa | grep password
/bin/pacemaker loop --password G0d!

root@node-1:/# tcpdump -i eth0

tcpdump: listening on eth0
[...]
160 packets captured
© Course authors (CC BY-SA 4.0) - Image: © Andrew Hart (CC BY-SA 2.0)

Namespaces

"Functionality to partition a group of processes' view of the system".

Host can see through all, members can't.

We'll focus on "process", "network" and "user".

© Course authors (CC BY-SA 4.0) - Image: © Torkild Retvedt (CC BY-SA 2.0)

Process/PID namespace

Members get their own view of the process tree.

© Course authors (CC BY-SA 4.0) - Image: © David J (CC BY 2.0)
$ ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp

$ sudo chroot my_debian_root ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp
© Course authors (CC BY-SA 4.0) - Image: © Helsinki Hacklab (CC BY 2.0)
$ ps -e | head -n 4
PID   TTY   TIME      CMD
1     ?     00:00:01  systemd
2     ?     00:00:00  kthreadd
3     ?     00:00:00  rcu_gp

$ sudo unshare --fork --pid -- chroot my_debian_root /bin/bash

root@node-1:/# ps -e
PID   TTY   TIME      CMD
1     ?     00:00:00  bash
2     ?     00:00:00  ps
© Course authors (CC BY-SA 4.0)

Network namespace

Separate "network stack" for member processes.

Example use-cases:

  • Configure per-application FW rules
  • Force cherry-picked services through a VPN
  • Handle overlapping network segments
© Course authors (CC BY-SA 4.0) - Image: © Dave Herholz (CC BY-SA 2.0)
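A minimal sketch with iproute2 (requires root; the namespace name vpn_only is a placeholder):

```shell
# Create a network namespace - it starts with only a loopback
# interface, isolated from the host's network stack.
sudo ip netns add vpn_only
sudo ip netns exec vpn_only ip link show

# Bring up loopback inside the namespace; processes started via
# "ip netns exec" only see this stack and any rules applied to it.
sudo ip netns exec vpn_only ip link set lo up

sudo ip netns del vpn_only
```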

User namespace

Root in chroot is root.

Lots of things, such as package managers, expect root privileges but don't really need them.

User namespaces give members their own group and user lists.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)
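A sketch using unshare(1), assuming unprivileged user namespaces are enabled (the default on most modern distributions):

```shell
# A regular user's UID on the host:
id -u

# Inside a fresh user namespace, the same user can be mapped to
# UID 0 - "root" in the namespace, still unprivileged on the host.
unshare --user --map-root-user id -u   # prints 0
```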

cgroup

Members' usage of system resources (CPU, memory, disk I/O, etc.) can be limited.

Used together with CRIU for live migration.

Not just used for virtualisation.

© Course authors (CC BY-SA 4.0) - Image: © Pete Seeger (CC BY 2.0)
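On a system with cgroup v2 this can be sketched directly against the cgroup file system (assumes root and a unified hierarchy at /sys/fs/cgroup; the group name demo is a placeholder):

```shell
# Create a cgroup and cap memory usage of its members at 256 MiB.
sudo mkdir /sys/fs/cgroup/demo
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell into the group - it and all child
# processes are now subject to the limit.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```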

seccomp

Limit which/how system calls can be used.

Some syscalls allow breaking out of the isolation.

Minimize attack surface of shared kernel.

© Course authors (CC BY-SA 4.0) - Image: © Wendelin Jacober (CC0 1.0)

Capabilities

Originally developed to make root less omnipotent.

Caps like "NET_BIND_SERVICE" and "SYS_CHROOT" can be given to non-root users.

Not very fine-grained and some are unsafe.

© Course authors (CC BY-SA 4.0) - Image: © Bengt Nyman (CC BY 2.0)
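A sketch using libcap's command line tools (assuming they are installed; the binary path is a placeholder):

```shell
# Grant a non-root web server the right to bind ports below 1024
# without giving it any other root privileges.
sudo setcap cap_net_bind_service=+ep /usr/local/bin/webserver

# Inspect the file capabilities afterwards.
getcap /usr/local/bin/webserver
```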

These and other features make up the beautiful mess we call OS-level virtualisation on Linux!

© Course authors (CC BY-SA 4.0) - Image: © Stacy B.H (CC BY 2.0)

OS-level virtualisation

The downsides compared to hardware-level virtualisation

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)

Reliability

Live migration is wonky.

Kernel version may be different from what is expected by the guest.

Linux can behave... interestingly... when many different tasks are running.

Bugs triggered by guests could crash the host.

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

Security

Gaps between the different isolation features.

Host exposes a huge attack surface to the guests.

Compatibility has been the priority, not security.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Reflections

What have we learned so far about OS-level virtualisation?

© Course authors (CC BY-SA 4.0) - Image: © Simon Claessen (CC BY-SA 2.0)

Answer the following questions

  • What is the largest benefit of OS-level virtualisation in your opinion?
  • Can you think of any other use-cases where namespaces could help?
  • Suggestions for ways to compensate/mitigate the risks introduced by OS-level virtualisation?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_012101@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Virtualised development

Why devops teams have grown to love it

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

In theory, collaborative development and
testing of software shouldn't be that hard.

We got interpreted languages and
great tools like GitHub to help us.

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

app.py

from flask import Flask, request

app = Flask("kool_app")

@app.route("/api/request_info")
def request_information():
    return {
        "ip_address": request.remote_addr,
        "browser": request.headers["User-Agent"]}
© Course authors (CC BY-SA 4.0)
$ flask run

* Environment: production
* Debug mode: off
* Running on http://127.0.0.1:5000
$ curl http://localhost:5000/api/request_info | jq

{
  "browser": "curl/7.81.0",
  "ip_address": "127.0.0.1"
}
© Course authors (CC BY-SA 4.0)

What dependencies (and versions) are
installed on developers' workstations?

How about in the different server environments,
such as staging and production?

What about other components in the stack?
(OS, load balancer, databases, etc.)

© Course authors (CC BY-SA 4.0) - Image: © Scott Skippy (CC BY-SA 2.0)
$ cat requirements.txt

Flask==2.0.3
$ python3 -m venv my_env
$ ls my_env

bin include lib lib64
pyvenv.cfg

$ my_env/bin/pip install -r requirements.txt

Successfully installed Flask-2.0.3

$ my_env/bin/flask run

* Environment: production
* Running on http://127.0.0.1:5000
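One way to see which dependency versions actually ended up in an environment is to ask the interpreter itself - a minimal sketch using only the standard library (the package names passed in are just examples):

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def report_environment(package_names):
    """Describe the running interpreter and the installed
    version (or None) of each requested package."""
    report = {"python": sys.version.split()[0]}
    for name in package_names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report

print(report_environment(["flask", "pip"]))
```

Run under my_env/bin/python the report reflects the venv; run under the system interpreter it may differ - which is exactly the drift that pinned requirements try to prevent.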
© Course authors (CC BY-SA 4.0) - Image: © Qubodup (CC BY 2.0)

Some problems

  • Sharing venvs is not trivial
  • Only handles Python dependencies*
  • Code may not behave exactly the same on all operating systems
  • Doesn't describe how the stack should be run
© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Is all doom and gloom?

Let's explore some tools that try
to help us with these challenges!

© Course authors (CC BY-SA 4.0) - Image: © Pelle Sten (CC BY 2.0)

Vagrant demonstration

Shareable development environments

© Course authors (CC BY-SA 4.0) - Image: © NASA/JPL-Caltech (CC BY 2.0)

What is HashiCorp Vagrant?

Software for automating the process of
setting up test/development environments
using virtual machines.

Released back in 2010
by Mitchell Hashimoto.

© Course authors (CC BY-SA 4.0) - Image: © NASA/JPL-Caltech (CC BY 2.0)

How does it work?

Instructions for how a VM should be
configured are defined as code in
shareable "Vagrantfiles".

"Boxes" are pre-configured OS images
that can be uploaded/downloaded from
a remote repository, like "Vagrant Cloud".

Relies on a "provider", like VirtualBox
or VMware Fusion, for the heavy lifting.

© Course authors (CC BY-SA 4.0) - Image: © NASA/JPL-Caltech (CC BY 2.0)
$ cat Vagrantfile 

Vagrant.configure("2") do |config|
  # Pre-built OS image used as base for guest
  # downloaded from a remote registry
  config.vm.box = "ubuntu/bionic64"

  # Directory shared between host and guest
  config.vm.synced_folder "./code", "/vagrant_data"

  # Commands to run after spawning guest
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y python3 python3-pip
    pip3 install -r /vagrant_data/requirements.txt
  SHELL
end
© Course authors (CC BY-SA 4.0)
$ vagrant validate
Vagrantfile validated successfully.

$ vagrant up

Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'ubuntu/bionic64'...
==> default: Forwarding ports...
==> default: Waiting for machine to boot.
==> default: Running provisioner: shell...
[...]
Done
© Course authors (CC BY-SA 4.0)
$ vagrant ssh

vagrant@ubuntu-bionic:~$ cd /vagrant_data/
vagrant@ubuntu-bionic:/vagrant_data$ flask run

* Environment: production
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
$ vagrant destroy --force
==> default: Forcing shutdown of VM...
==> default: Destroying VM...
© Course authors (CC BY-SA 4.0)

Other neat things

Multiple VMs can be configured and
launched using a single Vagrantfile.
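A hypothetical sketch of what such a multi-machine Vagrantfile could look like (the machine names "web" and "db" and the box are made up for illustration):

```ruby
Vagrant.configure("2") do |config|
  # Two guests defined in one file
  config.vm.define "web" do |web|
    web.vm.box = "ubuntu/bionic64"
  end

  config.vm.define "db" do |db|
    db.vm.box = "ubuntu/bionic64"
  end
end
```

vagrant up then starts both guests, and vagrant ssh web targets a specific one.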

"Vagrant Share" plugin enables you
to easily provide remote access
to a guest for collaboration.

For building production OS images,
try HashiCorp Packer.

© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)

Docker demonstration

Tooling behind application containers

© Course authors (CC BY-SA 4.0) - Image: © Milan Bhatt (CC BY-SA 2.0)

What is Docker?

Software for packaging/running
applications and their dependencies
in predictable execution environments.

Users can download and execute an
"application container" without caring
about code interpreter configuration,
software library versions, startup scripts
and similar boring/error-prone stuff.

© Course authors (CC BY-SA 4.0) - Image: © Milan Bhatt (CC BY-SA 2.0)

How does it work?

Instructions for how a container should
be configured are defined as code in
shareable "Dockerfiles".

"Images" are pre-configured OS images
that can be uploaded/downloaded from
a remote repository, like "Docker Hub".

Relies on OS-level virtualisation
(by default) to provide low overhead
and speedy execution of guests.

© Course authors (CC BY-SA 4.0) - Image: © Milan Bhatt (CC BY-SA 2.0)
$ docker run -ti ubuntu:18.04 /bin/bash

Unable to find image 'ubuntu:18.04' locally
18.04: Pulling from library/ubuntu
a055bf07b5b0: Pull complete 

root@79424287b988:/#
© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)
# Pre-built OS/application image used as base
# for guest downloaded from a remote registry
FROM debian:bookworm-slim

# Commands to run during the build process
# to install dependencies, configure
# options and similar
RUN apt-get update \
    && apt-get install -y curl qrencode

# Copy source code and other application
# artifacts into guest image
COPY my_script.sh /usr/local/bin/

# Instructions for how the guest should
# be executed when run by a user
CMD /usr/local/bin/my_script.sh --fetch
© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)
# Pre-built OS/application image used as base
# for guest downloaded from a remote registry
FROM python:3.11.0

# Commands to run during the build process
# to install dependencies, configure
# options and similar
RUN pip install Flask==2.2.2

# Copy source code and other application
# artifacts into guest image
COPY app.py .

# Instructions for how the guest should
# be executed when run by a user
USER 10000
CMD ["flask", "run", "--host", "0.0.0.0"]
© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)
$ docker build --tag kool_app:0.2 .

Sending build context to Docker daemon  4.608kB

Step 1/5 : FROM python:3.11.0
Step 2/5 : RUN pip install Flask==2.2.2
Step 3/5 : COPY app.py .
Step 4/5 : USER 10000
Step 5/5 : CMD ["flask", "run", "--host", "0.0.0.0"]
Successfully tagged kool_app:0.2
$ sudo docker run --publish 5000:5000 kool_app:0.2

* Debug mode: off
* Running on all addresses (0.0.0.0)
© Course authors (CC BY-SA 4.0)

Our example application is
quite simple - let's grab a
more complex/realistic example...

I choose you, Nextcloud!

© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)
(Image slides: Nextcloud demonstration)

© Course authors (CC BY-SA 4.0)
$ docker run \
  --publish 80:80 \
  --volume my_nextcloud_data:/var/www/html \
  --env NEXTCLOUD_ADMIN_USER=bob \
  --env NEXTCLOUD_ADMIN_PASSWORD=hunt_er2 \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Jan Hrdina (CC BY-SA 2.0)

The number of command line flags
to our docker run is growing.

Might need other components to
properly run our application,
like a proper database or
reverse proxy for HTTPS.

We could try to document the
setup or write a bash script
to start our containers.

To handle these problems,
"Docker Compose" was created.

© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

nextcloud/docker-compose.yml

services:
  database:
    image: "mysql:8.2"
    volumes:
      - "my_db_data:/var/lib/mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "hunt_er2"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"

  application:
    image: "nextcloud:26.0.10"
    volumes:
      - "my_nextcloud_data:/var/www/html"
    environment:
      NEXTCLOUD_ADMIN_USER: "bob"
      NEXTCLOUD_ADMIN_PASSWORD: "hunt_er2"
      MYSQL_HOST: "database"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"
    ports:
      - "8080:80"

volumes:
  my_db_data: {}
  my_nextcloud_data: {}
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

nextcloud/docker-compose.yml

services:
  database:
    image: "mysql:8.2"
    volumes:
      - "my_db_data:/var/lib/mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "hunt_er2"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"

[...]
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

nextcloud/docker-compose.yml

[...]

  application:
    image: "nextcloud:26.0.10"
    volumes:
      - "my_nextcloud_data:/var/www/html"
    environment:
      NEXTCLOUD_ADMIN_USER: "bob"
      NEXTCLOUD_ADMIN_PASSWORD: "hunt_er2"
      MYSQL_HOST: "database"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"
    ports:
      - "8080:80"
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)
$ docker compose up --detach

 [+] Running 2/2
 ✔ Container nextcloud-application-1  Started
 ✔ Container nextcloud-database-1     Started
$ docker compose stop

 [+] Stopping 2/2
 ✔ Container nextcloud-application-1  Stopped
 ✔ Container nextcloud-database-1     Stopped 
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

Options in the "docker-compose.yml" file
can be utilized to configure secrets,
health/ready checks, networking,
restart behavior, etc.
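For example, a restart policy and a health check could be added to a service definition along these lines (a sketch - the probed endpoint and the timings are made up):

```yaml
services:
  application:
    image: "nextcloud:26.0.10"
    restart: "unless-stopped"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost/status.php"]
      interval: "30s"
      timeout: "5s"
      retries: 3
```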

For more complex orchestration needs,
Kubernetes is a popular alternative.

© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

Further awesomeness

Makes it easy to build immutable systems.

Tons of pre-built images on Docker Hub.

Supports both OS-level and
(with some effort) HW-level virtualisation.

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

Hope you've enjoyed the introduction.

We'll get back to how Docker
works in detail later!

In the meantime,
Read The Fine Manual!

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

Container images

What are they really?

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Container pros

Benefits (AKA "All kool kids like it")

© Course authors (CC BY-SA 4.0) - Image: © Pedro Ribeiro Simões (CC BY 2.0)

Containers make it easy to...

© Course authors (CC BY-SA 4.0) - Image: © Mathias Appel (CC0 1.0)

Container cons

Everything ain't awesome

© Course authors (CC BY-SA 4.0) - Image: © Shannon Kringen (CC BY 2.0)

Containers make it easy to...

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Not understand what you are running.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Use software packaged by dubious people.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Forget to update/patch things.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Reflections

What have we learned about virtualised development so far?

© Course authors (CC BY-SA 4.0) - Image: © Kuhnmi (CC BY 2.0)

Answer the following questions

  • What are your most important take-aways?
  • Did you have any "Ahaaa!"-moments?
  • Was there anything that was unclear or that you didn't understand?

courses+virt_012801@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Helsinki Hacklab (CC BY 2.0)

Virtualisation labs

Putting knowledge to practice

© Course authors (CC BY-SA 4.0) - Image: © Fritzchens Fritz (CC0 1.0)

Lab descriptions tell you what to do,
not exactly how you should do it.

Your submissions should document
how you solved the problem.

© Course authors (CC BY-SA 4.0) - Image: © Mario Hoppmann (CC BY 2.0)

Lab: Shared dev environment

Graded exercise to use HashiCorp Vagrant
and Docker Compose to package/run
a Python application and its dependencies.

For detailed instructions, carefully read:
"resources/lab/README.md".

Remember to download the latest version of
the resources archive! ("virt.zip")

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Docker revisited

© Course authors (CC BY-SA 4.0) - Image: © NASA (CC BY 2.0)

Docker's primary focus is to package/run
applications and their dependencies in a
predictable execution environment.

Let's take a deeper look at
how it achieves this!

© Course authors (CC BY-SA 4.0) - Image: © NASA (CC BY 2.0)

Let's run a container!

$ docker run \
  --publish 8080:80 \
  --volume my_nextcloud_data:/var/www/html \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

Fetching container image

Unable to find image 'nextcloud:26.0.10' locally
26.0.10: Pulling from library/nextcloud
af107e978371: Already exists
6480d4ad61d2: Already exists
[...]
e9daddf7d237: Pull complete
d929539deed8: Pull complete
Digest: sha256:7f053b3fd21391e08[...]ec9c1fe7a7
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

Observing application output

Initializing nextcloud 26.0.10.1 ...
New nextcloud instance
Initializing finished
[...]
[mpm_prefork:notice] [pid 1] AH00163:
Apache/2.4.57 (Debian) PHP/8.2.14 configured
-- resuming normal operations
[core:notice] [pid 1] AH00094:
Command line: 'apache2 -D FOREGROUND'
© Course authors (CC BY-SA 4.0) - Image: © Jorge Franganillo (CC BY 2.0)
(Image slides)

© Course authors (CC BY-SA 4.0)

Hmm, a lot just happened there.

Let's try to break it down step-by-step!

© Course authors (CC BY-SA 4.0) - Image: © Martin Fisch (CC BY 2.0)

Let's run a container!

$ docker run \
  --publish 8080:80 \
  --volume my_nextcloud_data:/var/www/html \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

The docker CLI utility is a client
speaking HTTP with the Docker daemon.

Can be used to run and interact with
containers on local/remote hosts.

Sub-command "run" provides hundreds
of different option flags to control
how the container is executed,
such as CPU and RAM limits.

docker.io/library/nextcloud:26.0.10
indicates which container image
should be used...

© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)
$ docker volume inspect my_nextcloud_data
[
  {
    "Name": "my_nextcloud_data",
    "Driver": "local",
    "Labels": null,
    "Mountpoint":
"/var/lib/docker/volumes/my_nextcloud_data/_data",

    "Options": null,
    "Scope": "local"
  }
]
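Since the inspect output is plain JSON, it is easy to post-process - a small sketch that extracts the mountpoint from output shaped like the above:

```python
import json

# Output shaped like what `docker volume inspect` prints
inspect_output = """
[
  {
    "Name": "my_nextcloud_data",
    "Driver": "local",
    "Labels": null,
    "Mountpoint": "/var/lib/docker/volumes/my_nextcloud_data/_data",
    "Options": null,
    "Scope": "local"
  }
]
"""

def mountpoint_of(raw_json):
    """Return the host directory backing the first listed volume."""
    volumes = json.loads(raw_json)
    return volumes[0]["Mountpoint"]

print(mountpoint_of(inspect_output))
```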
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

The docker CLI utility is a client
speaking HTTP with the Docker daemon.

Can be used to run and interact with
containers on local/remote hosts.

Sub-command "run" provides hundreds
of different option flags to control
how the container is executed,
such as CPU and RAM limits.

docker.io/library/nextcloud:26.0.10
indicates which container image
should be used...

© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

Fetching container image

Unable to find image 'nextcloud:26.0.10' locally
26.0.10: Pulling from library/nextcloud
af107e978371: Already exists
6480d4ad61d2: Already exists
[...]
e9daddf7d237: Pull complete
d929539deed8: Pull complete
Digest: sha256:7f053b3fd21391e08[...]ec9c1fe7a7
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

The container image is a bundle containing all
files, scripts and code libraries required to
run an application predictably.

Could be compared to a ZIP file of a
"debootstrapp'ed" directory used in
previous examples with chroot.

Container images are typically stored/shared
using a repository such as "Docker Hub".

© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

How do we create container images?

Typically, a "Dockerfile" and the
command docker build is used...

© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)
# Name of existing image to use as base
FROM debian:bookworm-slim

# Command(s) executed inside container
# during image build process
RUN apt-get update \
    && apt-get install -y curl qrencode

# Copies file from host into container
COPY my_script.sh /usr/local/bin/

# Metadata indicating to container runtime
# which program to execute on start-up
CMD /usr/local/bin/my_script.sh --fetch
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)
$ docker build --tag my_script:v2.3 .

=> [internal] load build definition from
Dockerfile [...]
=> [1/3] FROM debian:bookworm-slim
  => resolve docker.io/library/
  debian:bookworm-slim@sha256:bac354c[...]
=> [2/3] RUN apt-get update [...]
=> [3/3] COPY my_script.sh /usr/local/bin/
=> exporting to image
[+] Building 13.3s (8/8) FINISHED

$ docker image save my_script:v2.3 \
  > container_image-my_script-v2.3.tar
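The archive written by docker image save is an ordinary tar file and can be inspected with standard tools. A self-contained sketch using Python's tarfile module - it builds a tiny stand-in archive, since a real image tar additionally contains layer tarballs and config blobs:

```python
import io
import json
import tarfile

def list_image_entries(tar_path):
    """Return the member names found inside an image archive."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()

# Build a minimal stand-in so the example runs anywhere;
# manifest.json is part of the real `docker image save` format
manifest = json.dumps(
    [{"RepoTags": ["my_script:v2.3"], "Layers": []}]).encode()

with tarfile.open("fake_image.tar", "w") as tar:
    member = tarfile.TarInfo("manifest.json")
    member.size = len(manifest)
    tar.addfile(member, io.BytesIO(manifest))

print(list_image_entries("fake_image.tar"))
```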
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)
# Minify client side assets (JavaScript)
FROM node:latest AS build-js

RUN npm install gulp gulp-cli -g
WORKDIR /build
COPY . .
RUN npm install --only=dev && gulp

# Build Golang binary
FROM golang:1.15.2 AS build-golang

WORKDIR /go/src/github.com/gophish/gophish
COPY . .
RUN go get -v && go build -v

# Runtime container
FROM debian:stable-slim

WORKDIR /opt/gophish
COPY --from=build-golang \
  /go/src/github.com/gophish/gophish/ ./

COPY --from=build-js \
  /build/static/js/dist/ ./static/js/dist/

[...]
© Course authors (CC BY-SA 4.0) - Image: © Steven Kay (CC BY-SA 2.0)

Observing application output

Initializing nextcloud 26.0.10.1 ...
New nextcloud instance
Initializing finished
[...]
[mpm_prefork:notice] [pid 1] AH00163:
Apache/2.4.57 (Debian) PHP/8.2.14 configured
-- resuming normal operations
[core:notice] [pid 1] AH00094:
Command line: 'apache2 -D FOREGROUND'
© Course authors (CC BY-SA 4.0) - Image: © Jorge Franganillo (CC BY 2.0)

Most of the heavy lifting is
delegated to containerd.

Shared component used by several
container management tools besides Docker,
like Kubernetes, "Pouch" and balenaEngine.

Uses runc by default to spawn containers
using OS-level virtualisation, but supports
other runtimes like runhcs (Windows) and
Kata (HW-level virtualisation).

© Course authors (CC BY-SA 4.0) - Image: © Jorge Franganillo (CC BY 2.0)
(Image slide)

© Course authors (CC BY-SA 4.0)

Tweaking application configuration

$ docker run \
  --publish 8080:80 \
  --volume my_nextcloud_data:/var/www/html \
  --env NEXTCLOUD_ADMIN_USER=bob \
  --env NEXTCLOUD_ADMIN_PASSWORD=hunt_er2 \
  docker.io/library/nextcloud:26.0.10
© Course authors (CC BY-SA 4.0) - Image: © Rod Waddington (CC BY-SA 2.0)

The "Docker Compose" tool enables us to
describe how one or more containers should
be run (environment variables, volumes, etc.).

Configuration defined in a text file called
"docker-compose.yml", which can be shared
with developers, system administrators
and end-users.

© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)
services:
  database:
    image: "mysql:8.2"
    volumes:
      - "my_db_data:/var/lib/mysql"
    environment:
      MYSQL_ROOT_PASSWORD: "hunt_er2"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"

  application:
    image: "nextcloud:26.0.10"
    volumes:
      - "my_nextcloud_data:/var/www/html"
    environment:
      MYSQL_HOST: "database"
      MYSQL_DATABASE: "nextcloud"
      MYSQL_USER: "nextcloud"
      MYSQL_PASSWORD: "4pp.pass"
    ports:
      - "8080:80"

[...]
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)
$ docker compose up --detach

 [+] Running 2/2
 ✔ Container nextcloud-application-1  Started
 ✔ Container nextcloud-database-1     Started
$ docker compose stop 

 [+] Stopping 2/2
 ✔ Container nextcloud-application-1  Stopped
 ✔ Container nextcloud-database-1     Stopped 

$ docker compose rm --volumes --force

 [+] Removing 2/2
 ✔ Container nextcloud-application-1  Removed
 ✔ Container nextcloud-database-1     Removed
© Course authors (CC BY-SA 4.0) - Image: © Kurayba (CC BY-SA 2.0)

Introducing VDI

"Virtual Desktop Infrastructure"

© Course authors (CC BY-SA 4.0) - Image: © Thierry Ehrmann (CC BY 2.0)

Goal: Centralise compute resources.

© Course authors (CC BY-SA 4.0) - Image: © Scott McCallum (CC BY-SA 2.0)

Resource pooling means...

  • Less overall hardware
  • More processing power when needed
  • Shared usage of expensive software and accelerators such as GPUs
© Course authors (CC BY-SA 4.0) - Image: © Bret Bernhoft (CC0 1.0)

Minimize cost of setup, management and support.

Allows admins to focus their efforts in one place to store and back up data securely*.

© Course authors (CC BY-SA 4.0) - Image: © Bret Bernhoft (CC0 1.0)

How do users access their virtual desktops?

© Course authors (CC BY-SA 4.0) - Image: © Jan Bocek (CC BY 2.0)

Thin clients

Low-cost devices with a NIC, display output and basic I/O. No internal storage.

Minimal local compute resources.

Typically shared between multiple users.

Connects to appropriate virtual desktop
based on user+password, smart card, etc.

© Course authors (CC BY-SA 4.0) - Image: © VIA Gallery (CC BY 2.0)

Web and native apps

Allows users to access their virtual desktop from any device anywhere at any time.

© Course authors (CC BY-SA 4.0) - Image: © Adam Greig (CC BY-SA 2.0)

The downsides...

  • Many users still need offline compute
  • Requires stable low latency connection
  • Client must still be trusted
  • Many eggs in one basket
© Course authors (CC BY-SA 4.0) - Image: © Maja Dumat (CC BY 2.0)

Virtualisation course recap

Summarising what we've covered

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Exercise: Help the CTO

Participants are split into four or more groups.

Each group is assigned to present either the economic, reliability, security or environmental
pros/cons of virtualisation, targeted towards a CTO role in an assigned example organisation.

The presentation should cover usage of HW-/OS-level virtualisation, virtual/container development
and VDI. After presentation, send slides to
courses+virt_013301@0x00.lt

© Course authors (CC BY-SA 4.0) - Image: © Freed eXplorer (CC BY 2.0)

Org. 1: Examplify

Cash-strapped startup on a mission to make customers (and stockholders) happy.

Develops, hosts and supports web pages/apps for (temporary) PR campaigns.

Needs to grow/shrink fast and has employees all over the world working from home.

Focus on the economic pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Yves Sorge (CC BY-SA 2.0)

Org. 2: Xample Industries

Well-established company in heavy manufacturing.

Every second of downtime and delays means huge monetary losses.

Depends on many legacy IT systems for their factory production line.

Focus on the reliability pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Org. 3: Department Of Examples

Spook agency making the world a safer place.

Keeping information secret and systems secure is of utmost importance.

Plagued by bureaucrats, spies and sinister insiders.

Focus on the security pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Org. 4: Example Airways

Transportation company with urgent need for efficient greenwashing.

Has several data centers spread around the world.

Needs to handle the occasional surge of visitors and bookings on their website.

Focus on the environmental pros/cons.

© Course authors (CC BY-SA 4.0) - Image: © Gerben Oolbekkink (CC BY 2.0)

Remember the target audience/focus area.

If it suits your presentation, add/modify the example organisation's challenges.

© Course authors (CC BY-SA 4.0) - Image: © Mauricio Snap (CC BY 2.0)

Let's summarise!

© Course authors (CC BY-SA 4.0) - Image: © OLCF at ORNL (CC BY 2.0)

Economic benefits

Less HW == Less electricity, DC space and painful human labor.

Enables self-service and automation which helps organisations move faster.

© Course authors (CC BY-SA 4.0) - Image: © Jennifer Morrow (CC BY 2.0)

Reliability benefits

Maintenance of underlying infrastructure without disruptions.

Easier testing and operations due to predictable runtime environments.

Enables quick recovery to a known good state.

© Course authors (CC BY-SA 4.0) - Image: © USGS EROS (CC BY 2.0)

Security benefits

Reduces attack surface through fine grained segmentation of systems.

Enables users to somewhat safely handle untrusted data.

Enables quick recovery to a known good state.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

Environmental benefits

Less HW == less e-waste and energy consumption.

Power up/down hypervisors depending on load.

Migrate workloads to regions with green electricity.

© Course authors (CC BY-SA 4.0) - Image: © Nacho Jorganes (CC BY-SA 2.0)

HW-level virtualisation

Pretend to be hardware well enough to make operating systems run.

OS-level virtualisation

Pretend to be an (isolated) operating system environment well enough to run applications.

© Course authors (CC BY-SA 4.0) - Image: © Mauricio Snap (CC BY 2.0)

OS-level > HW-level

  • Faster!
  • More efficient/higher density
  • Easier to inspect and monitor
© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

OS-level < HW-level

  • Easier to break isolation
  • Less stable/predictable
  • Barely supported live-migration
© Course authors (CC BY-SA 4.0) - Image: © Marco Verch (CC BY 2.0)

Development tools

Virtualisation helps developers and users
by providing predictable execution environments
for applications.

Vagrant can be used for automated setup of
test environments on any OS using
HW-level virtualisation.

Docker can be used for packaging and
running (Linux) apps using
OS-level virtualisation.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

The main downside with virtualisation
is decreased performance ("overhead"),
especially for HW-level.

Hardware assistance (e.g. Intel VT-x)
and para-virtualisation features like
Virtio devices can minimize this problem.

Increased technical complexity
is another challenge.

© Course authors (CC BY-SA 4.0) - Image: © Dennis van Zuijlekom (CC BY-SA 2.0)

The End

Thanks for listening!

© Course authors (CC BY-SA 4.0) - Image: © Bret Bernhoft (CC0 1.0)

Welcome participants and wait for everyone to get settled. Introduction of the lecturers and their background. Segue: In this part of the course we'll talk about virtualisation...

- These days almost everyone virtualises their systems, an ongoing transformation for the past 20 years - Why they do it: basically what the slide says - Enabled the success of companies such as AWS and countless startups - Learn about the different kinds of virtualisation technologies and their pros/cons

- We'll cover lots of things in a short amount of time - In order to be able to do this we'll use scientifically proven methods to Make It Stick - Basically what the slide says - Don't forget to have fun!

- There are several resources to help you learn - Speaker notes in slides are heavily recommended for recaps/deep diving - May also be available through an LMS, depending on how the course is consumed - The course is designed to be instructor-led; you won't make the most of it on your own, see it as an aid

- Encourage participants to make the course better - Learners are likely the best to provide critique; lecturers tend to be a bit blind to the material's flaws - No cats or dogs allowed! - Feel free to share it with friends or use it yourself later in your career

The course wouldn't be available if it wasn't for financial support - Thanks!

- We'll begin with a retrospective to understand how we ended up where we are today - Try to clarify the different kinds of virtualisation available

- A bit of pop history: The picture shows a group of NACA computers, precursor to NASA - Often rooms full of women, minorities and anyone else not considered fully esteemed - Things started to change a lot with Turing's "Christopher" computer - Don't get stuck here, it's fun but there are whole books available for those who are interested

- Digital computers became a big thing as they could solve things a lot faster than humans - Slowly but surely became field-programmable and general purpose, saved a lot of money! - Everyone (see all scientists and Fortune 50s) wanted access to a computer, so time sharing and remote access became a thing: a very early version of the cloud. Many experts at the time thought computing would be provided by regional providers in a similar manner to water and electricity. The word terminal is still with us - but the view of computing as a utility looks slightly different today. - Multi-user systems became the norm and suddenly not everyone with access to the computer was equally trusted or given equal amounts of processing power. Things started to get more and more complex. Richard Stallman, the father of GNU and the FOSS movement, was famous for actively working against restrictions of computer usage. Segue: Proto-virtualisation already existed in different forms on mainframes, but in the beginning of the 2000s the need became apparent in the commodity server market as well.

- So what were the main motivations: basically what the slide says - Up until that point, every hertz could have a use and the benefits (see dreamy-eyed potential) of digitalisation had been many. Maybe the IT bubble burst started putting focus on cost again. - Computers had started to become way too powerful for single tasks, but the security model of OSes had remained largely the same - multi-tenancy was not simple and admin privileges, which were more or less required for everything, were omnipotent and hard to restrict. - Of course these problems aren't impossible to solve, but the effort required to change hardware, software and ways-of-working (see knowledge) was deemed too big a task. Segue: Two primary methods for solving these problems started to appear

- HW-level virtualisation is probably what comes to mind for most people when they hear the term. - If you've used VirtualBox, you know this. - Quite complex and slow to pretend to be hardware, but the method soon showed many benefits. - Makes legacy operating systems and applications run without customisation: business as usual. - Most users do however not care about how the OS is doing, they care about their apps running. - OS-level virtualisation tries to provide an environment that is as close as possible to what applications and admins expect: its job is not to fool the operating system. - OS-level virtualisation requires changes to the OS, but that is probably easier than pretending to behave like a physical object with all its quirks. - Mostly used to virtualise OSes of the same kind (Linux on Linux), but not only - The idea of uncoupling applications from hardware and/or operating systems became a thing as well, but more about that later in the course: for now, focus on HW-level and OS-level virt. - OS-level instances are often called "containers"; I try to not use the term as it is a bit confusing and could mean other things (more about that later) Segue: To make things a bit less diffuse, let's talk about early contenders in the field.

- Company that brought HW-level virtualisation to the masses since the early 2000s. - Many people considered it a bit of a moon shot at the time. - While slow, it quickly became popular as it helped companies save a lot of money (more about that later in the course). - Early in the space of clustering nodes. - Used to be the unchallenged king of HW-level virt for a long time, but these days there are several who battle for the title: many of which are freely available.

- First virtualisation on UNIX, inspired Solaris Zones, OpenVZ and other solutions. - The "UID 0 problem" was the main focus: splitting root up into smaller parts was deemed too much of a hassle. - Creates isolated jails, which have their own root and own view of the system, such as processes and the file system root. Segue: We are already using a lot of terms, let's try to clarify what some of them mean.

Many of these terms are not strictly correct, but they are widely used so I've tried to group them.

- Participants need to know basic Ethernet and IP networking to understand some concepts and labs. - Being somewhat comfortable in the Linux shell is a requirement. Some of the things we'll cover can't be done through a GUI. - When covering OS virtualisation we'll also dig a bit deeper into Linux internals. You don't need to be an expert but knowing what kernel space and user space are will surely help.

- We'll need some software and a lab environment for the course.
- Ubuntu will be used in most demos and labs.
- For now, just make sure that the software is installed and seems to be working.
- Check out the presentation links: I will however not provide a step-by-step guide, you won't get one in real life.

- It's all 'bout the money, it's all 'bout the dumdudmudumdudm.
- Unfortunately, this is what people other than nerds generally care about.
- Useful knowledge to understand how we got here.
- Important for you to make friends/get promoted/earn a decent salary at most places.

Segue: Let's start with the simple stuff - HW.

- Data center space is expensive, as it needs to be secure, climate controlled and somewhat clean.
- Servers need a lot of supporting infrastructure such as network switches, UPSes and (especially back in the days) KVMs (Keyboard-Video-Mouse, yikes). It all costs money and takes space.
- These days it should not come as a surprise to anyone, but electricity is expensive. Cooling also requires a lot of energy (in general).
- Lots of HW means many things to maintain and many things that can break - not just servers but also supporting infrastructure. Humans need to fix/order/replace these, and humans are expensive and annoying: you need to find them, give them money and keep them happy.

Segue: Another thing to consider is that we barely use the computers that we've bought.

- There are studies out there made by various actors: all of them show very low utilisation.
- These have been produced with more or less effort: many contain circular references.
- Others, such as those published by Google and Alibaba, are likely not very representative of average enterprise environments.
- They match the instructor's anecdotal observations.
- We could fit these computing needs on a few hypervisors instead.
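A back-of-the-envelope calculation can make the consolidation argument concrete. The numbers below are purely illustrative (40 servers, 10% average utilisation, hypervisors loaded to at most 70%), not taken from any particular study:

```shell
# Hypothetical numbers: 40 physical servers averaging 10% utilisation,
# consolidated onto hypervisors we are willing to load to 70%.
servers=40
avg_util=10   # percent of one machine
hv_target=70  # percent load we accept per hypervisor

# Total demand expressed in "percent of one machine":
demand=$(( servers * avg_util ))
# Number of hypervisors needed, rounded up:
hypervisors=$(( (demand + hv_target - 1) / hv_target ))
echo "Could replace ${servers} servers with ${hypervisors} hypervisors"
```

Even with generous headroom, the fleet shrinks by an order of magnitude in this sketch.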

- We also don't need to think so much about margins. We can always vertically scale guests, in some cases even without shutting down the instance: Linux supports hot swapping of CPUs and similar.
- Same goes for capacity planning, we don't need to spend so much effort on every system.
- In general, more powerful HW is more cost efficient from a power and/or space perspective. It would probably be hard to justify investment in such fancy HW without knowing if it would be used.

Let's take a deep breath: in order to explain further economic benefits we need to understand the underlying technology a bit better. Let's talk about guest migration.

- In a clustered environment we can move guests between hypervisors.
- Some solutions support migrating state (disk, memory, CPU registers, etc.) and through sleight of hand make the guest run seamlessly on the other host. Some don't: offline VS online.
- While not a strict requirement, online migration typically relies on some type of shared storage for VM disks/file systems.
- Not trivial, but we'll cover the implementation details later in the course.
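To make this less abstract, here is roughly what a live migration looks like with the libvirt/QEMU stack covered later in the course. The guest name "web01" and destination host "hv2" are made up, and the example assumes both hypervisors can reach the guest's disk image on shared storage:

```shell
# Live-migrate the guest "web01" to hypervisor "hv2" over SSH.
# Memory and CPU state are streamed across; the disk image is assumed
# to live on storage shared by both hosts.
virsh migrate --live --verbose web01 qemu+ssh://hv2/system

# Afterwards the guest should be gone here and running on hv2:
virsh list --all
```

An offline migration is conceptually simpler: shut the guest down, copy its disk image, and define it on the new host.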

Time to tie the tie again.

- Being able to do maintenance without screaming users is huge.
- Handling HW breakage seems trivial; not so much back in the days when Windows was tightly coupled with the hardware it was installed on. Still not trivial - both due to technical and licensing problems.
- Legacy systems can survive long past their HW's due date. Companies love this, sysadmins not so much.
- Deserves repeating: human labor is expensive and something most organisations try to avoid.

- Costs needed to be anticipated in detail. The requirement to pay a lot up front for new systems was a huge barrier for startups/small players.
- Buying HW, finding space in a rack and getting someone to put it up takes a lot of effort.
- Lead times for hardware are really long these days and the supply chain is unpredictable.
- If we can remove humans we can start to automate things. It is time consuming (expensive) to move tasks between developers and the HW team, with lots of room for errors and miscommunication.
- We wouldn't have devops the way it is today without virtualisation.

- Selling software that other parties are supposed to install, manage and/or maintain is a real mess and quite annoying.
- You need to take a lot of variables into account: OS version, hardware/drivers, faulty setup.
- The problem used to be solved by shipping pizza boxes to customers, but that's also meh.
- Virtual appliances can be delivered preconfigured in a somewhat predictable environment, only exposing limited configuration options that the customers can screw up.

Virtualisation helps us run systems reliably, but how?

A lot of the same things that make virtualisation good from an economic perspective apply to reliability as well.

Segue: There are some other neat benefits as well.

- Computers break, we should not rely on a single one. Same goes for data centers.
- Three is a magic number in computing (and voting in general) as it enables a quorum to be kept. Redundancy with two nodes may be better than with one, but handling state and data is a mess: it requires a lot of fencing and painful reconciliation.
- Doing this with HW only would be quite expensive and out of the question for most organisations.

- The ability to capture the state of a guest and roll back.
- Some virtualisation solutions only support disk snapshots and not RAM/CPU state. This really only matters if you want to be able to revert an instance to an already running state.
- Modern filesystems also implement snapshots, but VM snapshots leave less room for trouble.
- Important to note that a disk-only snapshot should preferably be taken when the guest is shut down: things could be cached in memory, and databases are notorious.

Segue: Let's put this feature to some use for reliability.
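As an illustration, snapshot handling with libvirt/QEMU might look like the session below. The guest name "db01" and the snapshot name are examples; without `--disk-only`, RAM/CPU state is included so the guest can be reverted to a running state:

```shell
# Take a snapshot (including RAM/CPU state) before risky changes:
virsh snapshot-create-as db01 pre-upgrade \
  --description "Known good state before applying updates"

# List snapshots for the guest:
virsh snapshot-list db01

# The upgrade went sideways? Roll back:
virsh snapshot-revert db01 pre-upgrade
```

A disk-only variant (`--disk-only`) is best taken with the guest shut down, for the caching reasons mentioned above.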

- Organisations are often scared to mess with things "that work", even when a change is needed or improves reliability long-term.
- Move fast and don't break things.

You young people care about the environment, don't you?

- Many of the properties provided by virtualisation showed benefits for both cost-savings and reliability. The same goes for its environmental benefits.
- There are also other things that are relevant, put some effort into thinking about it.

Basically the same as for saving money on HW.

- Basically the same as for saving money on HW.
- Outside the peaks, most hypervisors could likely be shut down to save electricity/produce less heat. For this to be reasonable, live migration and/or batch workloads are likely needed.

- One of the initial motivations for using virtualisation.
- Most companies don't fully trust all their employees: even fewer trust their competition. Multi-tenancy is an important use-case to save money, but hard to do in commodity general purpose OSes.
- Simpler to just use VMs per customer/user/application.

Segue: Besides keeping things simple, limiting the job of a host has other benefits.

- From an infosec perspective, we want a small attack surface. Fewer services means less code, which means less risk of security vulnerabilities.
- Perhaps not everyone in the IT staff needs to be root on every system? Still often the case, but for security sensitive organisations this is now an option.
- Systems that perform a limited set of tasks are easier to restrict and monitor; there is lots of noise otherwise.

Segue: Some people have taken this reasoning to heart and very far.

- Qubes OS runs everything in VMs: there is no reason why your personal web browsing should be able to compromise your work for customer X. We'll talk more about Qubes later in the course.
- Unikernels (or library OSes) build a runtime environment which bundles an application and only enough OS to get things running. Great for minimizing attack surface and keeping things performant, with low overhead for kernel context switching.
- This wouldn't be reasonable if it wasn't for the predictable runtime environment that virtualisation provides.

- Being able to utilise snapshots to aggressively change systems (such as applying updates) has many security benefits as well.
- Snapshots provide a known good state: extremely helpful if a system is hacked. They can also be used to diff in order to find what the hacker did.
- Physical computers store a lot of state that can be changed and persisted even after an OS is reinstalled (or even if the disk is physically destroyed).
- Also relevant for multi-tenancy: who has had and who will have access to the server after you?
- The server HW industry has started to make changes, but VMs are far superior in this regard.

Not everything is rosy, though.

- Software vendors that sell licenses are happy about virtualisation.
- It is quite easy to spin up new systems without thinking about the effort required to take care of them: they still need to be patched, and when they break, humans need to take care of them.

- All the abstractions can introduce... interesting problems and breakage.
- High-availability is usually best implemented near the application. A client that can handle reconnections and migration between remote endpoints is likely a way simpler solution.
- Reality may not always meet expectations: migration can fail and introduce problems.

Basically what the slide says.

- All emulation has a performance cost. OS-level virt usually introduces way less, but it is still there.
- Depending on the workload this can be a big deal: especially I/O intensive workloads tend to suffer.
- You can't always know how well your instance will perform, as the HW is shared with others. One day can be fine while the next is terrible. Hypervisors are almost always overprovisioned.
- Lots of sensors and counters used for advanced performance analysis can't be observed from inside the guest, making debugging and performance analysis trickier.

Take some time to reflect before forcing more knowledge into your brain.

- Put some effort into this.
- There is a lot of value for your learning in just putting what we covered into your own words.
- It helps the instructor make the course better and clarify questions.

For those who wish to dig deeper into the things we've talked about.

- So far we've talked about the "soft" parts: pros and cons of virtualisation as a general concept.
- These apply to both HW-level and OS-level virtualisation.

Clarify and remind students about required software setup for the course

Spaced repetition and generation

- So far everything we've covered applies more or less to both HW-level and OS-level virtualisation.
- Look at a few solutions and their fit on the market.
- We've been quite soft and general, this section will be slightly more technical.
- Remind students to ask questions if things are unclear.

- ESXi is installed on HW just like you would a Linux distribution.
- The hypervisor cluster is controlled by vCenter.
- Live migration, vSwitch and vSAN make setting up highly available hypervisor clusters easy.
- Decent choice for SMBs with limited IT staff and high availability requirements.
- "No one ever got fired for buying VMware".
- Owned by Broadcom, getting more expensive.

- Built in to Windows.
- A VM manager (similar to VirtualBox) can be activated as a Windows feature.
- Requires Windows 10 Pro/Enterprise or Windows 11 Pro. Included in Windows Server, but you often need to pay per VM and/or CPU cores. It's boring.
- Was quite late to enter the game and even later to become a competitive option, but works alright for both Windows and Linux guests these days.
- WSL == Windows Subsystem for Linux, the ability to run Linux distros in highly integrated VMs.
- Clarify that this applies to WSL version 2, not WSL version 1 which is OS-level virtualisation.
- VBS == Virtualisation Based Security; examples are Credential Guard, Device Guard and the old Edge sandbox.
- VBS uses security properties provided by virtualisation and related HW features without running traditional OSes in VMs.

- People use the term KVM incorrectly.
- Several different software components make up what people refer to.
- For a long while this was the most common solution for virtualisation on Linux servers.
- Red Hat makes RHEL (which CentOS/Alma Linux/Rocky Linux are based on) and sponsors Fedora.
- Canonical makes Ubuntu.

Segue: Let's break these components down and see what purpose they serve.

- If you have used virt-manager or virsh, you've used libvirt.
- Most often a client (such as virsh) talks to libvirtd running as a system service.
- Handles a lot of the grunt work, configures routing/NAT rules etc.
- The project shows its age.
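A quick taste of what talking to libvirtd through virsh looks like. The guest name "web01" is hypothetical; the commands themselves are standard virsh subcommands:

```shell
# Ask libvirtd about defined guests, running or not:
virsh list --all

virsh start web01           # boot a guest
virsh dominfo web01         # CPU/memory allocation, state, autostart
virsh net-list              # virtual networks (NAT/routing handled by libvirt)
virsh shutdown web01        # graceful ACPI shutdown request
```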

- QEMU is very flexible, it can emulate obscure CPU architectures and ancient floppy devices.
- Most OSes expect a BIOS, hardware devices, etc.: QEMU emulates several kinds of these depending on the OS's needs.
- QEMU's primary goal (at least initially) was to be a somewhat fast and very flexible solution for emulation. These days, it's primarily used to run server VMs in data centers, which is quite far from this initial goal.
- It's a bit like the "Good/Cheap/Fast" rule: you need lots of code to implement all these features and options - lots of code means a high risk of bugs, worst case security vulnerabilities.
- A great example is the "VENOM" VM escape vulnerability (https://en.wikipedia.org/wiki/VENOM), which affected code used to emulate a floppy device that was enabled in most (every?) guest. Most users in the current century probably never needed this feature, but it was still available.
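Running QEMU by hand (no libvirt in between) shows what is actually being emulated. The disk image name is made up; the flags are regular QEMU options for a modern x86 guest:

```shell
# Hypothetical guest: modern "q35" machine type, KVM acceleration,
# host CPU passthrough, 2 GiB RAM, 2 vCPUs, a qcow2 disk and user-mode NAT:
qemu-system-x86_64 \
  -machine q35,accel=kvm -cpu host \
  -m 2048 -smp 2 \
  -drive file=guest.qcow2,format=qcow2 \
  -nic user,model=virtio-net-pci
```

Drop `accel=kvm` and QEMU falls back to pure emulation: everything still works, just much slower.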

- KVM is what makes QEMU fast and usable; fully emulating a CPU and RAM is very slow.
- Not only used by QEMU, as we are about to see.

Segue: Before we get into other users of KVM, there is a project that only uses two of the components (not libvirt).
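Checking whether KVM is actually available on a Linux host is a common first troubleshooting step. These are standard checks on Ubuntu (the `kvm-ok` helper comes from the `cpu-checker` package):

```shell
# Does the CPU advertise Intel VT-x or AMD-V?
grep -c -E 'vmx|svm' /proc/cpuinfo

# Are the kernel modules loaded?
lsmod | grep kvm            # expect kvm plus kvm_intel or kvm_amd

# The device node user space (QEMU etc.) talks to:
ls -l /dev/kvm

# Ubuntu convenience wrapper that summarises the above:
kvm-ok
```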

- Makes use of QEMU and KVM, but not libvirt as it is a bit of a mess.
- Basically the same installation procedure as for ESXi.
- Supports automation through an API: Terraform and Ansible can use this.
- Besides HW-level VMs it can run lightweight OS-level instances using LXC, more about that later.
- Includes out-of-the-box support for ZFS, which provides great software RAID, snapshots and data corruption detection among other things.
- Supports many of the same clustering features as VMware ESXi, but not nearly as polished.
- Great for similar use-cases as ESXi, and also for home labs - it's FOSS!
- https://www.proxmox.com/en/proxmox-virtual-environment/overview
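On a Proxmox node, VMs are managed through the `qm` tool (and LXC containers through `pct`). A hypothetical session, with VM ID 100 and names chosen for the example:

```shell
# Create a VM with 2 GiB RAM, 2 cores and a virtio NIC on bridge vmbr0:
qm create 100 --name web01 --memory 2048 --cores 2 \
  --net0 virtio,bridge=vmbr0

qm start 100                  # boot it
qm snapshot 100 pre-upgrade   # snapshot, e.g. backed by ZFS
qm list                       # overview of VMs on this node
```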

- Initially developed by Intel, but now governed by the Linux Foundation.
- The main focus is not to run everything like QEMU, but to run modern server OSes.
- Supports Linux and, since somewhat recently, the latest Windows Server.
- Priorities are security and performance, not flexibility.
- Development is sponsored by several (in some cases competing) companies, such as Intel, ARM, AMD, Microsoft, ByteDance (developers of TikTok) and Alibaba (probably the largest non-US cloud).
- Uses KVM on Linux to make things fast!
- Written in Rust, while QEMU is written in C: preferable for safety.
- Will likely be a reasonable replacement for libvirt + QEMU for most users soon.

Segue: Another user of KVM is Firecracker.

- Firecracker isn't built to run general purpose server operating systems/workloads.
- Initially developed for Amazon's FaaS Lambda, later Fargate.
- They wanted the speed and low overhead of OS-level virtualisation but the security of HW-level.
- Both Firecracker and Cloud Hypervisor are written in Rust, a language that helps developers write safer and more performant applications. The projects share a lot of their low-level code, so they make each other better.

- Originally a research project, released as FOSS in 2003.
- High performance with focus on para-virtualisation (more about that later in the course).
- Served as the base for AWS instances (nowadays they seem to use KVM more and more).

Segue: Xen has an interesting design and is quite different from QEMU + KVM...

- https://en.wikipedia.org/wiki/Hypervisor#Classification - https://wiki.xenproject.org/wiki/Xen_Project_Beginners_Guide - TODO: Add better speaker notes for this somewhat complicated topic

- TCB == Trusted Computing Base: HW and SW that, if compromised, could affect the overall security of a system. Kernel level software, such as drivers, and components with firmware often have more or less unrestricted access to memory, so a compromise of them could allow an attacker to gain full control of all software running on the machine.
- Having a small TCB is desirable, as less code is easier to test and audit for security issues.
- As Xen acts as a thin layer and lots of functionality normally associated with a virtual machine manager is delegated to dom0 (the privileged VM used to manage the system and other VMs), Xen has a relatively small TCB.
- If you wanna try it out, the easiest way is probably to install XCP-ng (https://xcp-ng.org/).

Segue: Due to these properties, Xen was chosen as the virtualisation technology for Qubes OS...

- Qubes OS is a FOSS desktop operating system with focus on security.
- Utilizes Xen virtualisation to provide security compartmentalisation (explanation will follow).
- Co-founded by security researcher Joanna Rutkowska.
- Allegedly used by Edward Snowden and lots of other paranoid/security conscious users, such as the Freedom of the Press Foundation, the Mullvad VPN company and the organisation running "Let's Encrypt".

Segue: To demonstrate the OS, I've set it up on a laptop and taken some screenshots in action.

- What you're seeing is the default Qubes desktop when booted and logged in.
- Not a pretty sight, made slightly worse by trying to increase the font size for the demos.
- Ships with the XFCE (https://xfce.org/) desktop, but is also available with i3 (https://i3wm.org/).

Segue: Things look a bit more interesting with applications running...

- The screenshot is a bit messy and probably doesn't make a lot of sense right now.
- Most important is to note the border color of the windows (red, grey, yellow and blue); these let us know which "qube" (their terminology for a VM) the application is running in.
- Each qube acts as a security boundary: the video player running in the "personal" qube can't access files or processes running in the "work" qube.
- The same is true for the "vault" qube, which is provided to store very sensitive information in, such as private keys and password databases. This qube is further restricted as it isn't even provided with network access, to minimize the risk of leaks.
- Qubes shines in the way qubes are integrated with the desktop and each other.

Segue: Let's clean up a bit and talk about how you use Qubes...

- The screenshot shows the Qubes menu (a bit like the Windows start menu).
- The list shows qubes installed on the system and has sub-menus for starting applications in them.

Segue: Let's start a file browser in the "work" qube...

- Qubes utilises snapshots and temporary VMs in some very neat ways.
- If you have an untrusted/potentially malicious file, such as a document received via email or a program downloaded from a suspicious site on the Internet, it can be risky to open/run it.
- The screenshot shows a menu in the file manager that allows us to view the document in a "DisposableVM".

- Once selected, we get a status notification telling us a qube is starting.
- A fresh new VM is created (from a template) just for viewing this document, and it's quite fast.

- Notice the border color (and title) of the document editor window; it's different from the file browser in the "work" qube.
- Even if the document contained malicious code, it would be restricted to data and processes in the disposable qube (which likely contains very little of interest).

- Once we are done reading the document and the window is closed, we get another notification that the disposable qube is shutting down.
- It is not only shut down, but also deleted.
- This lets us feel quite comfortable that any malicious backdoors are deleted with it.

- DisposableVMs are just like other qubes, except that they are created (from a snapshot) on the fly and destroyed once all apps running in them have been closed.
- Using a disposable qube for web browsing is quite a good practice, for the reasons described in the previous example.

Segue: Qubes OS comes with a bunch of default qubes as we've seen, but you can easily create more.

- The "Qubes tools" section of the menu contains utilities for managing the system.
- Let's select "Create Qubes VM".

- Choose a name and a border color for the qube: a qube per customer and/or project is a good thing.
- We can also select which template to use for the VM. Qubes OS comes with Debian and Fedora; Kali Linux (https://www.kali.org/) can be installed as well (a good choice for penetration testers).

- We can also select which type of qube to create.
- A "TemplateVM" is a qube which you can install software in, configure and use as the base for other qubes/VMs.
- An "AppVM" provides a persistent home directory for the qube, but is otherwise cloned from a template VM each time it is started. If you want to install additional software or similar, it will likely need to be added to the template VM it is based on instead.
- A "StandaloneVM" is also based on a template, but once created it is fully persistent and can be modified without performing changes to the template. A good choice if you need to set up some complex software that you don't wanna include in your template VM, but it will consume more disk space and be a bit slower to start than an "AppVM".
- A "DisposableVM" is just what we talked about previously.

Segue: Qubes OS provides some tools that help us use the desktop without giving up on security.

- A quite common need is the ability to copy files between VMs.
- Qubes OS shows a dialog in the grey dom0 (the administrative VM - not a regular qube, could be considered "the host" to keep things simple) that tells us that VM X wants to copy a file to VM Z, which prevents qubes from moving data around without the user noticing.

- Another example is the Qubes clipboard, which controls which VMs can access copy/paste data.
- This may seem silly, but the clipboard could contain sensitive information such as passwords.

- The same goes for peripherals such as microphones, webcams and USB storage drives.
- Sure, there are likely use-cases where you want VMs to access them, but not all of them and not all the time.
- You probably want your work qube (or a disposable one) to access the webcam and microphone for meetings.

- It may be a bit hard to see due to scaling issues, but the upper right menu bar contains a notification area.
- If we click on the red signal strength indicator, a drop-down menu appears with a red border, containing network interfaces and visible WiFi networks.
- The laptop's network interfaces are forwarded to a dedicated qube that provides networking for other VMs (the qube "sys-net" by default).
- It's actually the same for USB and other peripheral devices, as we just saw (the qube "sys-usb").
- Qubes OS does it this way to protect the system from attacks delivered via the network and external devices.
- If, for example, firmware in the wireless card is compromised by an attacker on the same network, the attacker's access is limited to the sys-net qube. The attacker would likely be able to intercept network traffic, but not much more, and the traffic from other qubes is hopefully encrypted.

- You can actually see this isolation by listing network interfaces in dom0, which has none besides the loopback interface.

Segue: We can take a look at how this is done by opening "Qubes manager" in the tools menu.

- This quite messy application shows us information about qubes configured on the system Segue: It also allows us to edit their settings...

- The screenshot shows the configuration window for a qube.
- It allows us to modify VM settings in a similar fashion to other VMMs.

- By clicking on the "Devices" menu, we can see PCI devices forwarded/dedicated to the qube.
- The current example shows how the laptop's two USB controllers are dedicated to the "sys-usb" qube, which we saw in a previous example.
- A similar configuration exists for the networking VM, which has a wired and a wireless NIC forwarded.
- You could forward other devices as well to qubes of your choice, such as a GPU or a studio sound card.

- The qube settings also allow us to configure network restrictions for the qube.
- This can of course be done in other ways inside the specific qubes, but it's quite neat to have the firewall functionality so easily accessible.
- Firewalling is actually done by a separate qube by default - you can have multiple of these, some of which are connected to a VPN for example, forcing all qubes configured to use them through the tunnel or proxy.

- If you fancy the terminal or want to automate things, everything shown in the demo and much more can be done with CLI tools.
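Rough CLI equivalents of the GUI workflow from the demo, using the standard qvm-* tools. The qube name "customer-x" and the template name are examples:

```shell
qvm-ls                                       # list qubes and their state

# Create an AppVM with a blue border, based on a Fedora template
# (template name is an example, use whatever your system ships with):
qvm-create --class AppVM --template fedora-36 --label blue customer-x

qvm-run customer-x firefox                   # start an app in the qube
qvm-copy-to-vm customer-x report.pdf         # copy a file into the qube
qvm-firewall customer-x list                 # per-qube network rules
qvm-shutdown customer-x
```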

- While Qubes OS is really cool and I recommend everyone to try it, it's not problem free.
- To run Qubes smoothly, you need a somewhat beefy computer with CPU virtualisation features that are not available on all consumer systems - check out their system requirements.
- While it works quite well, there is a lot of overhead with all the qubes running - if the "system" and "service" qubes are included, there were 12 different VMs running during my demo.
- RAM especially is a thing you'll need quite a lot of.
- For security reasons, and due to the way windows from different qubes are shown, there is no 3D or hardware codec acceleration available. You can't really game or do things like CAD on Qubes unless you forward a dedicated GPU to the qube (which is seldom completely trivial).
- Lots of things to manage, including several templates that must be patched and maintained; requires a bit of knowledge of the underlying technology.
- The Qubes dom0 is based on a patched older version of Fedora, which means that it rarely has good support for new hardware. If you are gonna use Qubes, you have better chances with a device that is a few years old.

Segue: Even if you consider these problems deal-breakers or find it all slightly too paranoid, there are still a lot of things to learn and adopt from Qubes' design...

- Even if you don't use Qubes you can still have a snapshotted VM that can be considered disposable.
- Why not do your web browsing in a VM that can be nuked and paved every day?
- If you don't want all your Internet traffic to pass through a VPN, why not set up a dedicated VM for it? If you set up two VMs, one as a router/gateway and the other as a client to it, you can restrict connections and minimize the risk of leaks.
- Basically what the slide says.

- Virtualisation is in general very beneficial for security, as we've discovered during the course.

Segue: It does however have its downsides and doesn't solve every problem...

- Guests depend on the host's security, as it is in full control of their execution*.
- The hypervisor can snoop on the guests' disks and memory. This could be used to steal sensitive information, such as credentials from memory and confidential databases from disk.
- From the perspective of the guest, it is in general not possible to inspect the security of the underlying hypervisor and its supporting infrastructure. Instead, users are forced to more or less trust the infrastructure operators.
- This problem also exists for physical servers that are colocated in a third-party's data center (which is the common thing these days, as data centers are expensive facilities to own/operate), but in many cases exploiting it requires physical access to the servers (which are usually in a "locked" rack or cage). A malicious actor who has access to a hypervisor or the virtualisation control plane (think vCenter or similar) usually has a far easier time gaining access to VMs.
- Add a caveat about the terrible state of BMCs/IPMI/HW management interfaces.

Segue: So what can we do to limit trust in the operators of physical infrastructure?

- Disk encryption can be used to protect data at rest.
- If someone were to walk away with the hypervisor or network storage appliance that the guest's disk image is stored on, at least it would be encrypted.
- Relevant as we don't necessarily know how the operator handles backups of data and HW decommissioning/broken disks (perhaps they are sent to a third-party vendor?).
- Regardless, the key needed to unlock the disk must be in the memory of the guest that accesses the data.
- It is probably stored somehow, as we want to be able to reboot servers without entering a password or similar every time. Encrypted storage on servers, especially OS disks, is a bit of a mess.
- If the decryption key for the data is available in the guest, chances are that the hypervisor can access it as well - either through snooping of memory or by manipulating storage for the guest operating system.
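For reference, setting up data-at-rest encryption inside a guest with LUKS might look like this. The device path `/dev/vdb` (an extra virtual disk dedicated to sensitive data) and the mapping name are examples:

```shell
# One-time: initialise the encrypted volume (prompts for a passphrase):
cryptsetup luksFormat /dev/vdb

# Unlock it; note that the key now lives in guest RAM:
cryptsetup open /dev/vdb secure-data

# Create a filesystem and mount it like any other block device:
mkfs.ext4 /dev/mapper/secure-data
mount /dev/mapper/secure-data /mnt/data
```

This illustrates the caveat above: while the volume is mounted, the key sits in memory that the hypervisor can usually read.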

- Infrastructure operators, regardless if they are your company's IT department or a third-party hosting provider, can do things to earn trust even though we can't inspect security from the guest.
- Allow independent auditors to inspect and test security controls/processes.
- Not just technical things such as patch levels and network restrictions, but also physical security, background checks of personnel and similar.
- As a wise and pessimistic man once said: "trust is just a word for lacking security".
- Laws and industry compliance frameworks may prevent organisations from letting a third party host/control their infrastructure even if they want to.
- Even if operators are not actively malicious and auditors didn't find any obvious problems, systems can still get hacked. New vulnerabilities and attack techniques are discovered every day.

- Many different organisations would like to solve a problem that sounds a bit like a pipe-dream: not having to trust the computer your software (in this case a VM) is running on.
- Organisations don't necessarily have the skills, resources or interest to operate their own infrastructure.
- Many would like to use a cloud provider, but don't feel comfortable giving away access to their data.
- Cloud providers would very much like to take these peoples' money.
- Large and security conscious organisations would like to save money by using internally pooled virtual infrastructure, but have so many different security requirements that it is hard to do so.
- There is growing interest in edge computing (https://en.wikipedia.org/wiki/Edge_computing), which often means that servers will move from relatively safe data centers (from a physical perspective) to small service lockers next to the highway or in the forest. It may be tricky to prevent malicious actors from physically gaining access to the HW in these cases.
- It would likely also be way too expensive for everyone with "edge needs" to own their own geographically spread out infrastructure. Sharing will be necessary if anyone but the biggest players are to utilise it.

- OT intro: The benefits of disk encryption are widely understood, but many don't know that keys stored in RAM can often be extracted using so-called "cold boot attacks" (https://en.wikipedia.org/wiki/Cold_boot_attack) or similar methods.
- Data is typically stored unencrypted in RAM (or the keys to decrypt it are stored in RAM next to it), which means that an attacker with physical access to RAM may be able to extract it.
- This is obviously bad if someone gains physical access to the hypervisor, but the hypervisor can, as previously mentioned, most often snoop on the memory of guests and even manipulate it.
- Processor/chipset vendors started introducing features that encrypt all data stored in RAM with a random key generated per boot by the CPU (in which the key is, supposedly securely, stored).
- Initially, the goal was to protect against physical attacks as described above, but new features were added that generate a random encryption key per guest (or rather per area of memory dedicated to each guest), which in theory would prevent the hypervisor (and other guests which may have managed to escape/break out) from peeking at it.
- The hypervisor can still modify executable files on the guest disk or use a number of other techniques to gain access, which makes memory encryption alone a bit irrelevant.
- AMD's naming scheme is a mess, see https://developer.amd.com/sev/.

Segue: The dream lives on, and since then new features have been introduced which try to address some of these vulnerabilities/attacks...

- HW vendors switched focus from just memory encryption to a more holistic approach - These features would not only prevent hypervisors from messing with guest memory, but also CPU registers/state - Perhaps most importantly, they provide a method for guests to verify during startup that they are actually running in this encrypted/isolated environment - Once the guest knows that the coast is clear, it can decrypt/process sensitive data - Quite complicated how this works on a technical level and a bit out-of-scope for this course Segue: This sounds wonderful and almost too good to be true...

- These HW features have just started getting upstream support in the Linux kernel and virtualisation stack - it will take a while before they are fully rolled out and stable - Verified/attested boot chains, which seem to be a requirement for guests to run without trusting the host operating system, are far from problem-free. The Linux implementations of secure boot (which is one of the pieces) seem to be motivated by getting it to run on all HW, not making things actually more secure - In practice, we are moving trust from the hypervisor (which is often based on/running FOSS and auditable code) to CPUs running closed-source firmware with a bad track record - These isolation technologies are not perfect and vulns have been discovered in them. Intel SGX, which is a similar technology that has been on the market for quite some time, has had several flaws that completely break the security promise Segue: So, should we just give up on these technologies?

- The security of a system should be multi-layered: technologies such as Intel TDX may be able to stop an attacker that has managed to get a foothold on a hypervisor - Everything that improves the overall security of a system should be considered, even if it's not a silver bullet - Perhaps it prevents companies from building business models based on snooping - The main danger is the promise it makes and how people interpret that

- We've talked about the many benefits provided by virtualisation - Performance is not one of them - There are ways however to make the overhead of HW-level VMs smaller

- One of the largest gains is to move things to HW (seems a bit counter-intuitive) - Emulating a CPU in SW is complex and usually quite slow - Virtualisation extensions are supported by most CPUs these days, while features allowing direct access to PCI devices (such as GPUs) may only be available on workstation and server systems - KVM utilises and exposes these hardware acceleration features - In order to allow live migration of guests, we can't forward HW devices or use fancy CPU features that may not exist on the target hypervisor. Feature masking enables a host to hide CPU features from the guests

- Back in the days when virtualisation was new, hypervisors emulated commonly supported devices such as NICs and SATA HDDs - These days almost all servers are run virtualised and it seems a bit silly to pretend they are HW - Para-virtualisation is about stopping the pretending while still benefiting from virtualisation - Virtio devices are supported out-of-the-box by most modern operating systems running as guests - Usage of these can drastically improve performance/lower load on the hypervisor

- These techniques don't improve performance, but they minimize the overhead cost - Deduplication == Only store one piece of data once, for memory/RAM pages and for disk blocks - If a hypervisor is running 100s of Windows VMs, chances are that a lot of storage space can be saved if duplicated files can be deduplicated - Same goes for RAM, many VMs will likely run the same Linux distro and store the same kernel in memory - Deduplication can be costly from a performance perspective, but sometimes that is outweighed by the potentially huge savings - "Offline/Post-process" deduplication is often way cheaper from a performance perspective than "online/in-line" (https://en.wikipedia.org/wiki/Data_deduplication#Classification) - Memory ballooning allows the host to signal to guests that they can use more memory/should clean up their caches to save memory during runtime. This enables hypervisor operators to overprovision memory slightly more and make each hypervisor more cost effective
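On Linux, RAM deduplication is provided by KSM ("Kernel Samepage Merging"). A minimal sketch of poking at it through sysfs, assuming a kernel built with KSM support and root privileges; note that applications must opt in by marking memory regions as mergeable, which hypervisors like QEMU/KVM do for guest RAM:

```shell
# Enable the KSM scanner thread (requires root)
$ echo 1 | sudo tee /sys/kernel/mm/ksm/run

# After it has scanned for a while, see how many identical memory
# pages have been merged into a single shared copy
$ cat /sys/kernel/mm/ksm/pages_sharing
```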

- Let's take a step back and think about the security pros/cons of virtualisation

- Reminder: Focus on security aspects of virtualisation, not the rest

- Take your time to think these over, it's good for your learning - If you have patience for it, revisit the questions in a week or so to further engrave them into your knowledge

- As we learned, virtualisation has many benefits related to security, economics, reliability and environmental aspects - HW-level virtualisation == Emulation of a computer with the goal of making an operating system run on it like it would on real HW - Has anyone tested Proxmox, Qubes or related?

- Basically course progress status

- Some of the following topics will be more complex and technical - Remind students that they need to spend time studying outside of lessons - Ask for help and clarifications - Remind students that they can help make things better by clarifying things for future students, no stupid questions exist!

- Throughout the course, students have been asked to submit their reflections after each segment - Several things were brought up that seem to have left impressions, many found Qubes kool and were surprised to find so many benefits of virtualisation - Several students requested a glossary - Questions and other feedback came up that I would like to take some time to address before we continue progressing into OS-level virtualisation

- We've talked a lot about the security aspects related to virtualisation, both good and bad - One of the exercises included argumentation against virt from a security perspective - While many of these arguments are true, for most use-cases the pros far outweigh the cons - Talk a bit about the pros - Hypervisors are usually configured in large cluster setups with a single control plane (like vCenter for VMware), which involves some risks. While this is usually cost effective, you can run hypervisors more or less independently to mitigate the risks. It's not an inherent problem with virt - Performance and complexity are probably the strongest arguments against virtualisation

- Students showed interest in Qubes OS, but don't know which HW to run it on

- In the beginning, hypervisors wanted to be able to run the OSes everyone was using - To make things smooth, they emulated devices that were commonly supported/had drivers shipped with the OSes - These days almost all servers are virtualised and have been for quite some time - Purpose-built virtio drivers are included in most modern OSes that talk to a virtual device which doesn't pretend to be physical (doesn't have to abide by the laws of nature and act in ways that are natural in HW but slow to emulate in SW) - This removes unnecessary emulation overhead, which results in lower load on hypervisors that can be translated into higher VM density and thereby economic benefits. Segue: On the topic of performance, there may be need to clarify some terms commonly used...

- Upscaling: Making systems more powerful/able to handle more load - Downscaling: Making systems less powerful/able to handle less load, but saving money - Vertical upscaling is done by adding more CPU cores or whatever resource is necessary to a VM. Some hypervisors and OSes support doing this while the guest is running, others require a reboot - Horizontal upscaling means deploying more VMs running the same application and using a load balancer or similar to spread user traffic across the servers - Vertical upscaling only takes you so far, eventually you'll outgrow the physical server. Not all OSes and apps handle super big instances very well - Horizontal scaling may be a better choice and adds redundancy, but it adds additional complexity

- Up until this point, we've mostly talked about things from a HW-level virtualisation perspective Segue: May be time to refresh the differences between HW-level and OS-level

- In general, developers and companies don't care much if the OS believes it's running on HW, they just want to run their application stacks - The goal of OS-level virtualisation is to convince users and applications, not OSes - OS-level instances are often called "containers", I try to not use the term as it is a bit confusing and could mean other things (more about that later) Segue: To further complicate things, we can create sub-groups of OS-level virt...

- Basically what the slide says

- Syscalls are translated from Linux to FreeBSD or Windows syscalls - Some other parts of the system are also emulated to create the illusion of running on Linux - Linux on x86_64 has over 300 different syscalls that have many different options. It's quite hard to fully emulate this and implement the same quirky behavior as on Linux - FreeBSD's Linux compatibility layer is stuck on an old kernel version and WSL v2 changed to running HW-level virtualised Linux kernels instead - Wine has gotten a lot better during the last couple of years since Valve (developers of Steam and games like Counter-Strike) got involved to make software run on the Steam Deck (https://www.steamdeck.com) - Far from problem-free, but can be very useful

- Our main focus will be virtualising the same OS as the host, this is by far the most common

- What motivates us to use OS-level instead of HW-level? We've already got a solution Segue: Well, there are several benefits...

- As we don't need to pretend to be hardware, we can get rid of emulation layers that cause a lot of overhead. Applications can get more or less direct (though restricted) access to actual HW - When running HW-level virt, each guest runs its own OS kernel. This requires RAM and processor capacity - With OS-level virt the same kernel is used by the host and all guests, shaving off that overhead - This also enables concurrent sharing of HW devices that could otherwise only be used by one HW-level guest at a time - Efficiently sharing resources among HW-level VMs is hard, as the hypervisor has fairly limited knowledge of what goes on inside each guest. Are they using RAM because they need to or because of caching? Are the processes using the CPU batch jobs that could be given less priority? - For the hypervisor, OS-level virtualisation is a bit like a one-way mirror: it can see everything going on in the guests but not the other way around - This allows the hypervisor to treat all programs running in guests as regular processes and efficiently schedule them - All in all, these shared kernel benefits allow a much higher density of guests Segue: The one-way mirrorness has other benefits as well...

- While it's true that HW-level hypervisors can typically snoop on guests (as we've discussed), in practice it is a bit cumbersome and involves quite a bit of overhead - As the host has great insight into what is happening in OS-level guests, activity can be observed - Falco is an Intrusion Detection System (IDS) for OS-level virtualisation on Linux (specifically containers, we'll talk more about that later). You can configure it to alert you when specific applications are run and abnormal behavior occurs: Should your web server really spawn a port scanner process? Segue: Enough about the benefits, what are our options?

- We've already talked a bit about FreeBSD jails (see "Types of virtualisation" presentation) - Limiting ourselves to Linux, there are no really wide-spread solutions for Windows - Virtuozzo is often shortened to just VZ - Very early solution that showed great potential - Required use of proprietary software with license fees and a heavily patched Linux kernel, which perhaps prevented it from being more widely used - The core was open-sourced as "OpenVZ", still quite tightly tied to the SWsoft company - Still quite popular today, especially in web hosting and for cheap VPSes as it allows great density of guests on a single node - Somewhat quirky behavior at times, applications didn't always behave as expected. Full HW-level virtualisation probably seemed easier for many users (as they were used to one OS on a physical server) - As stated, one of the things preventing wide-spread adoption was that the development and community were heavily controlled by SWsoft. You had to use a kernel with their patches, which was somewhat limiting - LXC was initially developed by IBM and other parties with the goal of only using (or at least directly upstreaming) functionality in the Linux kernel (more about that later) - Canonical are the makers of Ubuntu and that's where it's most tightly integrated

- Let's look at how LXC can be used and what neat things it provides - LXC == Container technology, LXD == service to manage it and other things (think libvirt) - Far from a complete demo of the feature set and I don't expect you to remember the commands, more of a tasting to see what it's capable of

- Installation is trivial: If you're on Ubuntu, chances are that it's already installed - As Canonical is the lead, Ubuntu has the best integration. The upstream project recommends installing via snap (https://snapcraft.io/), but some other distros have native packages - Once installed, the "lxd init" command allows you to enable lxd and customize settings, such as cluster configuration. Passing the "--minimal" flag provides automated configuration without any questions

- Before we can create an OS-level VM (or "system container" as they are called in LXD lingo), we need an image to base it on - The project builds images for a large number of different Linux distros, you aren't just limited to the one running on your host/hypervisor - The example command shows a randomized sample of distros - These are preconfigured, no need to go through an install wizard Segue: Let's select one and get it started...

- In this example we'll create an instance based on the Alma Linux image (successor to CentOS) - Of course there are many options that can be tweaked, but creating an OS-level container can be this easy and (if the image has already been downloaded to the host before) deployed in a few seconds
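A launch can look something like this; note that the image alias "almalinux/9" is an assumption and may differ depending on the image remote and LXD version:

```shell
# Download the Alma Linux image (if not already cached) and start
# an instance named "srv-1" in a single step
$ lxc launch images:almalinux/9 srv-1
Creating srv-1
Starting srv-1
```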

- Once created, we can execute an interactive shell in the guest to start running commands - The example output shows that the host is running Ubuntu 22.04, while our newly created guest is an Alma Linux instance Segue: This is quite neat and sure it was easy to get started, but previously density and low overhead were raised as the main advantages...

- The command shows a truncated version of the info command, which among other things shows resource usage of the instance - Granted, we have not installed any particular software or service besides what is provided by the container image, but it currently only consumes 2.6MiB of disk space and 55MiB of RAM - The disk usage in this output does not include the Alma Linux image, but the kool thing is that it will be shared by other instances using the same template - In other words, the only per-instance storage requirements are changes or files we add after initial deployment - We'll talk more about this sorcery later in the course

- We can do most things we expect out of a Linux server, such as install and run software - "srv-2" is based on the "Alpine Linux" distribution, which is very minimal and popular in the space

- As with HW-level VMs, we can create snapshots on a regular basis or before we do stupid things - They can be disk-only or contain runtime state (memory, CPU registers, etc.) Segue: As briefly mentioned, LXD has built-in clustering functionality which is quite easy to get going (especially compared to other virtualisation solutions)
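A sketch of the snapshot workflow; the instance and snapshot names are made up for this example:

```shell
# Create a disk-only snapshot before attempting something risky
$ lxc snapshot srv-1 before-upgrade

# Snapshots are listed as part of the instance info
$ lxc info srv-1

# Roll back if things went sideways
$ lxc restore srv-1 before-upgrade
```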

- The example cluster contains three hypervisor nodes - Let's say that we want to perform maintenance (upgrade, repair, etc.) on a node - The "cluster evacuate" sub-command migrates all instances running on the specified host and disables scheduling on it - The example shows how our Alma Linux instance is moved from "node-1" to "node-2"
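The evacuation dance roughly looks like this (a sketch, assuming a healthy cluster and a recent LXD version that ships the "cluster restore" sub-command):

```shell
# Migrate all instances off "node-1" and mark it as unavailable
# for new instances
$ lxc cluster evacuate node-1

# ...perform the maintenance, then let the node accept work again
$ lxc cluster restore node-1
```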

- Runs on a Raspberry Pi, no need for fancy hardware. If you have a few you can set up a cluster! - As noted, LXD clustering is easy to get started with. The "lxd init" wizard can set up an overlay network for you, which enables communication between guests on different nodes even if one of the hypervisor nodes is in your basement and the other is a VPS in a public cloud - To make things a bit more confusing, LXD supports HW-level virtualisation with QEMU + KVM since a few versions back. While still a bit early, it's nice to not deal with libvirt and to make use of the easy clustering - LXD brags about live migration of both OS-level and HW-level guests. This doesn't always work, especially not for OS-level guests. As a rule of thumb, expect the ability to live migrate HW-level guests but not OS-level guests (they are however usually quite fast to offline migrate)

- We'll take a peek into which features of Linux enable solutions like LXC and VZ to work - Don't expect you to remember all details and commands, but these isolation features serve as the foundation for OS-level virtualisation security.

- There is no spoon - Unlike OSes such as FreeBSD and Solaris, there is no kernel concept of an OS-level VM in Linux - For better or worse, a hodgepodge of technologies are used together to build something that looks like an OS on a real server - The flip side of this is that the features can be used to achieve other kool things outside of the system virtualisation space - We won't cover all technologies, but rather focus on the core ones Segue: We have to start somewhere, may as well travel back to the epoch...

- Somewhat mysterious origins - Theory is that it was originally developed to aid development of new UNIX OS versions - Syscall to change a process's (and its sub-processes') view of the file system tree - Prevents processes in a chroot from accessing files outside of the specified directory - Not designed for security, as we're about to see Segue: Probably makes more sense with an example...

- Debootstrap is a tool to download and "install" a Debian-based distribution into a directory - Used by installers and similar during system setup - In this example Debian 10 (codename "buster") is installed to the directory "my_debian_root" - When listing files in the directory, we recognize these from the file system root ("/") - This will serve as the base for our chroot

- To show that this actually works, OS information is printed on the host and in the chroot ("guest") - The CLI program "chroot" uses the "chroot" syscall - Spawn an interactive shell in the chroot (bash) - Notice how the prompt changes, this is Debian's default prompt - The file "/etc/os-release" in the Debian chroot is actually "my_debian_root/etc/os-release" Segue: This kind-of-works, we have some sort of proto-virtualisation going on here, but as I said chroot was not designed as a security feature...
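The whole demo condensed into a session sketch; the host distro shown is an assumption:

```shell
# "Install" a minimal Debian 10 into a directory (requires root)
$ sudo debootstrap buster my_debian_root

# OS information on the host...
$ grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"

# ...and inside the chroot, where the file read is actually
# "my_debian_root/etc/os-release"
$ sudo chroot my_debian_root bash
root@host:/# grep PRETTY_NAME /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
```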

- The same kernel is shared inside and outside of the chroot - root is still root and can list/kill/manipulate processes, read sensitive kernel logs, dump network traffic, load kernel modules, restart the host, etc. - The only thing that is isolated is the file system Segue: How do we solve this? Kernel namespaces are a part of the solution...

- Since the early 2000s, the Linux kernel's namespace functionality has grown organically - Allows us to create new and isolated "views" of parts of the kernel, such as users, file systems, network interfaces/routes and processes - Members in this case are one or more processes (including their sub-processes) - Works like a one-way mirror, the host can see what is going on inside all namespaces, but their members can only see things in their dedicated namespace(s) - There are currently eight different ones and new ones are in development, we'll focus on the ones that are most important and/or easy to understand - Cgroup, IPC, Network, Mount, PID, Time, User, UTS Segue: This is a bit complicated to explain, some examples may help...

- All programs that are started on a Linux system (processes) get their own process ID (PID) - The PIDs start from 1, which is the first program started upon boot and typically an init system like systemd or OpenRC - A user can view/kill/interact with their own processes, root can do it for all (by default) - A malicious program could do bad things to other processes - The PID namespace gives its members their own listing - a PID inside a NS may be 5 from the perspective of the host, but 1 inside the namespace - This means that namespace members can only mess with their own processes

- Listing of processes inside and outside of the chroot, showing that they're the same

- "unshare" and "nsenter" are two good CLI utilities for playing around with NSes - In this case, we create a new PID namespace that the chroot command is executed in - The example shows how PID 1 is different in these contexts
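A sketch of playing with PID namespaces using "unshare"; the host-side PID shown is illustrative:

```shell
# On the host, our shell is an ordinary process with a high PID
$ echo $$
4242

# Start a shell in a new PID namespace ("--mount-proc" remounts
# /proc so that tools like ps see the new namespace)
$ sudo unshare --pid --fork --mount-proc bash

# Inside the namespace, the shell believes it is PID 1
root@host:/# echo $$
1
```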

- The network namespace may be the most versatile one - When a new network NS is created, no network devices are in it and it gets its own routing table, firewall rules, etc - We can either move one or more physical interfaces into the NS or create virtual interfaces that route through the host namespace - This is how OS-level guests are network isolated from each other, but it's very useful in other cases as well as the slide says TODO: Write better descriptions of non-virt use-cases
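Named network namespaces can be explored with the "ip netns" sub-commands (requires root); the interface listing is truncated:

```shell
# Create a new network namespace called "demo"
$ sudo ip netns add demo

# List interfaces inside it - only an unconfigured loopback
# device, none of the host's interfaces are visible
$ sudo ip netns exec demo ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN ...

# Clean up
$ sudo ip netns delete demo
```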

- Read the first sentence five times fast! - As mentioned, root in the chroot can still mess with the shared kernel - Many things expect to be run as root/UID 0, but don't usually need it as they only poke around on the file system ("my_debian_root" in this case) and won't use dangerous/fancy kernel features - Like the PID NS, members of a user NS get their own account list: UID 0 (root) for the NS members may be UID 2031 on "the host"/outside of the NS Segue: We could talk a lot more about NSes, let's move on to cgroups...
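On many distros, an unprivileged user can try this themselves; a sketch, assuming the kernel allows unprivileged user namespaces (a distro-specific setting):

```shell
# As a regular user...
$ id -u
1000

# ...enter a new user namespace where our UID is mapped to 0
$ unshare --user --map-root-user bash

# We look like root, but only inside the namespace - dangerous
# operations against the shared kernel are still denied
$ id -u
0
```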

- Control groups - Limit how much resources a group of processes can use - Possibility to pin them on specific CPU cores for example, which can be good for performance and security (prevent "noisy neighbors" on the CPU core, keep cache warm/isolated) - Supports a feature called checkpointing which can be used to snapshot the state of processes in a cgroup - this state can be restored or migrated to another host (in some cases hehe) - Heavily used by systemd, all services and users run in their own cgroup to allow granular resource measurements/restrictions. Also good if you wanna kill a service and not leave zombie processes lying around (https://en.wikipedia.org/wiki/Zombie_process)
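The raw cgroup interface is just a file system. A minimal sketch, assuming cgroup v2 mounted at /sys/fs/cgroup and root privileges:

```shell
# Create a new cgroup and cap its memory usage at 100MiB
$ sudo mkdir /sys/fs/cgroup/demo
$ echo 100M | sudo tee /sys/fs/cgroup/demo/memory.max

# Move the current shell (and its future children) into the
# group - combined they can now never use more than 100MiB of RAM
$ echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs
```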

- Mentioned system calls (syscalls) several times during the course: the way user applications tell the kernel to do things, such as open files or network sockets - x86_64 Linux exposes over 300 different syscalls, some of which are simple and others have complex configuration options (syscall arguments) - Large attack surface, some are known to break isolation. We don't want a guest to be able to issue a syscall that reboots the system, do we? - As always, we want to limit the attack surface: most applications don't use too many of them anyway - Some applications, like the Firefox rendering process, do this to sandbox themselves and limit the potential things an attacker could do if they were compromised Segue: We've talked about how OS-level virtualisation was partly introduced because making proper multi-user/admin restrictions is too tricky. Well, we've tried to do that also...
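To get a feel for how few syscalls a typical program actually needs, "strace -c" (assuming strace is installed) summarizes which ones a command used; the summary is truncated here:

```shell
# Run "ls" and print a per-syscall usage summary ("-c" counts
# calls, the summary is written to stderr)
$ strace -c ls > /dev/null
% time     seconds  usecs/call     calls    errors syscall
...
```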

- Does root in all cases need to be able to do everything? - Doesn't it seem stupid to give a user root privileges if they "only" need to modify network settings? - Capabilities can be given to/taken away from processes, including root processes - They can also be configured as an extended attribute on executable files - A typical historic use-case was to run network services as root if they needed to listen on a port beneath 1024 (https://www.w3.org/Daemon/User/Installation/PrivilegedPorts.html). This is an unnecessary risk, as an attacker would often gain complete root access if the service was owned - The capability "CAP_NET_BIND_SERVICE" allows non-root processes to do this - chroot normally requires root privs, but there exists a capability to change that - The available caps often allow much more functionality than may be needed. Some like "CAP_NET_ADMIN" have known flaws as they expose functionality that can be used to gain unrestricted root access. It's tricky to bolt on these types of restrictions after a system has been built and the interest in fixing it is limited - Removing caps for an OS-level guest can minimize the kernel attack surface
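File capabilities are managed with "setcap"/"getcap" from libcap; the binary path here is a made-up example:

```shell
# Allow a specific web server binary to bind ports below 1024
# without running as root ("e" = effective, "p" = permitted)
$ sudo setcap cap_net_bind_service=+ep /usr/local/bin/mywebserver

# Inspect which capabilities have been granted
$ getcap /usr/local/bin/mywebserver
/usr/local/bin/mywebserver cap_net_bind_service=ep
```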

- We've only scratched the surface, a lot of other things make up OS-level virt - Could go more in-depth if interest and time exist - Understand that it's a lot to take in, especially if you're not super comfortable with Linux - As noted, you're not expected to remember all of this in detail. If you can only take away one thing, it is that OS-level virtualisation on Linux is composed of several parts and not developed as one feature, which will lead to some issues as we're about to see

- Important to be aware of when OS-level virtualisation may be a bad choice - We've touched on it a bit and some of you may be able to guess: please do!

- Live migration is a feature that, in some environments, is heavily used to minimize disruptions. While this in theory works, and sometimes even in practice, it's hard to depend on it - Offline migration can still be used, but may not be good enough - It's nice that we can run different flavors and versions of distros, but a ten year old Ubuntu userland may behave slightly differently on a modern Fedora kernel - The kernel is shared among the host and all guests. It quickly becomes a lot of different processes and not all resources (like inotify watches) are properly name-spaced. Cgroups don't regulate everything. This is usually less of a problem with HW-level VMs, as the host kernel is doing far fewer different things - A bug in an obscure syscall could crash the kernel. As the kernel is shared, it would affect every guest and the host

- As OS-level virt was not coherently developed for Linux, there are gaps between the different isolation technologies such as NSes and LSM (https://en.wikipedia.org/wiki/Linux_Security_Modules) - All the syscalls pose a huge attack surface. Quite complex compared to what a HW-level hypervisor needs to do - Syscalls and capabilities that are known to be dangerous (break isolation) are black-/deny-listed - A far better approach when it comes to security would be to white-/allow-list known safe ones - This would probably break a lot of apps and make OS-level virt harder to adopt

Take some time to reflect before forcing more knowledge into your brain.

- We've touched on this, especially related to virtual appliances - Let's dig a bit deeper to understand what the problem really is

- Take your time to think these over, it's good for your learning - If you have patience for it, revisit the questions in a week or so to further engrave them into your knowledge

- So far, we've been going through a lot of theory - We're about to put that knowledge to the test

- Labs will be based on different scenarios you may encounter in real life - Just like in real life, no one will tell you exactly how to do it. If they could, the job should be replaced by a shell script - No one knows everything off the top of their head, feel free to Google, discuss with class mates, look through old slides and ask for support - Let me know how you approached the challenge and why you chose the solution steps you did

- We've talked about virtualising servers and running VMs on the desktop - Why not virtualise the desktop? - Many do and many more have tried, turns out that what seems like a good idea may not be so - VDI's popularity comes and goes. Segue: But why, what's the idea?

- In the beginning, there were large computers with user-facing terminals connected to them - These terminals were little more than a screen, keyboard and perhaps a futuristic looking mouse - Didn't have much of a choice at the time, when computers became cheaper everyone started getting their own. This represented a decentralisation Segue: What are the benefits of centralising people's desktops? Many are the same as for servers

- As with server workloads, we can often overprovision the hypervisors running virtual desktops - Margins can be a lot smaller: if everyone has the occasional need to perform a performance critical task, everyone needs a somewhat beefy machine. Chances are that these needs are not concurrent for all users, meaning that pooling makes a lot of sense - The same goes the other way: a virtual desktop could be a lot more powerful if the guest is running on server HW than would be possible to fit in a light-weight laptop - Powerful accelerators, such as CAD GPUs, could be plugged into energy-efficient servers in climate-controlled data centers instead of being crammed into bulky desktop workstations - Access to powerful computers on demand makes a lot of sense and, at least in theory, would save a lot of money for each individual user - Some software has very annoying licenses and/or very specific runtime environment requirements - Especially if there is only the occasional need, sharing this SW through virtual desktops could make a lot of sense - Basis for streaming gaming services such as Google Stadia and Geforce Now - Many cloud providers offer powerful virtual desktops with accelerators for usage on-demand

- IT admins can use scripts to automatically create and update virtual desktops when needed - Supporting users with physical desktops around the world can be a PITA - If the virtual desktop has been corrupted or suspicious software has been installed on it, snapshots and/or image templates can be used to quickly return to a working state - Users store important/sensitive data on their local HDDs, USB drives and similar. Besides the risks of these getting stolen/lost, backing up the data can be hard - You can't really steal a virtual desktop. When a physical laptop is stolen/lost (especially if powered on), lots of work is required to make sure that access is revoked and that the sensitive information was actually inaccessible/encrypted at the time

- So far we've talked about these things very abstractly - Usually the virtual desktops are run with HW-level virtualisation and in some cases OS-level - How do users gain access to these? They need a network connection, but what else?

- Boring low-cost low-energy computers with one job: connect the user to a central computer - Designed to be low maintenance - Connect a screen, keyboard, mouse and network connection - Typically mounted behind a screen or under the table. Commonly spread throughout an office/campus - Used to all look more or less like in the picture, these days many look like a ChromeCast - Quite neat, especially if combined with a smart card: just plug it into any device in any room and access your desktop how you last left it

- If you've used RDP, it's a lot like that - Can be quite nice for users that work from multiple devices in multiple locations: at least their desktop/running apps become consistent - Quite cheap if users can utilize their own devices to run the client on - IT doesn't need to maintain any desktop hardware, yay! - Popular with economically constrained and globally diverse organisations Segue: This presentation wouldn't be part of the course if we didn't end on a low note...

- These days, it's quite uncommon that users stay in one physical location all the time. They go to visit customers/providers, work from home and need to participate in the occasional off-site - While some of them may be able to survive on a tablet, most will need a laptop. Now you have to manage twice the amount of computers and the wins of VDI get a lot smaller if not nullified - Requires a decent network connection between the client and virtual desktop server. Especially crucial is latency: the experience quickly becomes painful with just a little bit of input lag - The biggest user critique against VDI is the experience: if the network is not great, things feel sluggish. This somewhat mitigates the "work from anywhere" argument - While it's quite common to combine BYOD and VDI client apps, it may not be desirable from a security perspective. The org has no or very limited control over the user's endpoint device. While sensitive data may "only be streamed" to the client and not stored on the user's personal device, security critical information such as passwords or other credentials is still inputted using the user's keyboard. In practice, devices that run VDI clients must be trusted - When the VDI solution fails, it often prevents a lot of people from working at the same time. While users are typically very dependent on shared services such as email or internal file sharing, at least some tasks may still be possible if those are unavailable. If VDI is inaccessible and a requirement, productivity will likely be severely lowered (especially if thin clients are used)