Firecracker: Lightweight Virtualization for Serverless Applications

Notation Legend

#something : number of something
→ : excerpt from paper
() : an opinion or some information that’s not present in paper


Pre-Read Thoughts

This paper’s been floating around in the community for a while. I think this will be the first paper I’m reading that goes somewhat deep into OS-related concepts (specifically Linux)

It’s going to be interesting, as I have been (and still am) a heavy user of serverless for the last few years, with CF Workers, AWS Lambda, Vercel Functions (which is again AWS) and various other providers

From using them, I have just some high-level idea, which isn’t much, but because of NOC21 CS15: CC & DS, I know virtualization and its related topics, which I think will be helpful for this paper

I felt like completing this before the paper On-demand Container Loading in AWS Lambda, which was published in 2023 by the same presenter, Marc Brooker

(to be continued in above paper)

Introduction

(Hypervisor based VMs ⇒ KVM/Xen + QEMU)

(Xen and KVM are type-1 hypervisors, QEMU is type 2)

(KVM is a Linux kernel feature, whereas Xen can run beneath a wide range of OSes)

Container solutions are Docker, LXC (Linux Containers)

Normal container deployments on Linux are carried out using

  1. cgroups (control groups)
    1. help with limiting resources to processes grouped under it
  2. namespaces
    1. isolates processes, like a VM
  3. seccomp-bpf (SECure COMputing using Berkeley Packet Filters)
    1. with seccomp + BPF, a process’s access to syscalls can be restricted (like in a sandbox)
  4. (chroot)
    1. for isolated fs
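These primitives are visible from any process on a Linux host; a quick read-only look (no root needed, assuming procfs is mounted):

```shell
# Namespaces this process belongs to (pid, net, mnt, uts, ipc, user, ...)
ls /proc/self/ns

# cgroup membership (one line per controller on v1, a single line on v2)
cat /proc/self/cgroup

# Seccomp mode: 0 = disabled, 1 = strict, 2 = filter (seccomp-bpf)
grep Seccomp /proc/self/status
```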

but they still depend on the shared host OS kernel, which comes with tradeoffs

Firecracker emerges as an efficient middle path, choosing neither hypervisor-based VMs (which come with overhead since they’re full VMs) nor LXC (Linux containers, where there are security and compatibility tradeoffs)

(generally, in the KVM + QEMU combo, KVM provides hardware-assisted virtualization close to the kernel, which is very fast compared to QEMU’s software emulation)

(though QEMU can also handle the hardware stack standalone, KVM handles it best, so by combining them, the best performance can be achieved)

Firecracker replaces QEMU as the VMM (Virtual Machine Monitor); KVM is still kept for kernel-based virtualization

Choosing an Isolation Solution

initially in Lambda, a single VM was allocated per customer for all their functions, which means a customer’s functions were not isolated from each other

redesign goals

  • Isolation
  • Overhead and Density
    • density ⇒ efficiently running more funcs on same machine
  • Performance
  • Compatibility
    • for functions with linux binaries
  • Fast switching
    • start new funcs, clean old funcs
  • Soft allocation
    • over commit resources but only consume until needed, and leave remaining for other funcs

Evaluating the Isolation Options

  • Containers
    • share the host kernel; isolation is also enforced by that same kernel
  • Virtualization
    • VMs under hypervisor
  • Language VMs
    • (like JVM, V8, CLR)
    • the runtime is responsible for isolation from the OS and from other VMs

LXC

uses kernel features like cgroups, namespaces, seccomp-bpf, and chroot

main concern is direct access to the shared kernel, and from there privilege escalation and side-channel information disclosure

Language-Specific Isolation

not suitable, because users’ functions will run arbitrary Linux binaries

Virtualization

Challenges with this are density (running a large number of functions per machine) and overhead (because of the hypervisor), and on top of those, startup time

Implementing hypervisors or VMMs is also a bigger challenge: the VMM is part of the Trusted Computing Base (TCB), so it has to be small and trustworthy

Firecracker’s approach is to replace the VMM but keep using KVM

The Firecracker VMM

Firecracker is a VMM built on KVM, supporting Linux hosts and Linux or OSv guests

they chose KVM with the standard Linux programming model since they were already knowledgeable about Linux

Implementation started by stripping unneeded code from the ==crosvm VMM== (from Google’s Chrome OS project)

Device Model

Unlike QEMU, which provides many emulated devices, Firecracker only implements what’s needed: network & block devices, serial ports, and a keyboard controller (i8042)

for nw & block devices, virtio is used

API

→ We chose REST because clients are available for nearly any language ecosystem, it is a familiar model for our targeted developers, and because OpenAPI allows us to provide a machine- and human-readable specification of the API

→ REST APIs exist for specifying the guest kernel and boot arguments, network configuration, block device configuration, guest machine configuration and cpuid, logging, metrics, rate limiters, and the metadata service
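As a concrete sketch, configuring and booting a microVM over that API looks roughly like this (endpoint paths and field names as in the public Firecracker API docs; the socket, kernel, and rootfs paths are illustrative):

```shell
# Assumes a firecracker process listening on ./fc.sock
curl --unix-socket ./fc.sock -X PUT 'http://localhost/boot-source' \
  -H 'Content-Type: application/json' \
  -d '{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1"}'

# Attach a root block device (virtio-blk behind the scenes)
curl --unix-socket ./fc.sock -X PUT 'http://localhost/drives/rootfs' \
  -H 'Content-Type: application/json' \
  -d '{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4",
       "is_root_device": true, "is_read_only": false}'

# Boot the guest
curl --unix-socket ./fc.sock -X PUT 'http://localhost/actions' \
  -H 'Content-Type: application/json' \
  -d '{"action_type": "InstanceStart"}'
```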

Rate Limiters, Performance and Machine Configuration

the machine config API lets the host configure the memory and cores available to the microVM

nw and block devices can be rate limited: IOPS for disk, packets per second for nw
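For example, the machine config and a per-drive token-bucket rate limiter look roughly like this (field names per the public Firecracker API docs; the values are illustrative — `size` is in bytes for the bandwidth bucket, operations for the ops bucket, and `refill_time` is in milliseconds):

```shell
# vCPUs and memory for the microVM
curl --unix-socket ./fc.sock -X PUT 'http://localhost/machine-config' \
  -H 'Content-Type: application/json' \
  -d '{"vcpu_count": 2, "mem_size_mib": 512}'

# Drive with token-bucket limits: ~1 MiB per 100 ms of bandwidth, 1000 ops per 100 ms
curl --unix-socket ./fc.sock -X PUT 'http://localhost/drives/scratch' \
  -H 'Content-Type: application/json' \
  -d '{"drive_id": "scratch", "path_on_host": "./scratch.ext4",
       "is_root_device": false, "is_read_only": false,
       "rate_limiter": {"bandwidth": {"size": 1048576, "refill_time": 100},
                        "ops": {"size": 1000, "refill_time": 100}}}'
```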

Security w/ Jailer

→ The (Firecracker’s) jailer implements a wrapper around Firecracker which places it into a restrictive sandbox before it boots the guest, including running it in a chroot, isolating it in pid and network namespaces, dropping privileges, and setting a restrictive seccomp-bpf profile.
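An illustrative jailer invocation matching that description (flag names per the public Firecracker docs; the id, uid/gid, and paths are made up):

```shell
# Chroots firecracker under /srv/jailer/firecracker/my-microvm/root,
# drops to the given uid/gid, and isolates it before the guest boots;
# everything after "--" is passed through to the firecracker binary
jailer --id my-microvm \
       --exec-file /usr/local/bin/firecracker \
       --uid 123 --gid 100 \
       --chroot-base-dir /srv/jailer \
       -- --api-sock /run/firecracker.socket
```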

Firecracker In Production

Inside AWS Lambda

image 19.png

→ Invoke traffic arrives at the frontend via the Invoke REST API, where requests are authenticated and checked for authorization, and function metadata is loaded

The Worker Manager handles routing, and its info is also replicated for HA

→ Once the Worker Manager has identified which worker to run the code on, it advises the invoke service which sends the payload directly to the worker to reduce round-trips.

There’s also concurrency control when invoking functions against workers

→ Each Lambda worker offers a number of slots, with each slot providing a pre-loaded execution environment for a function

a slot is for a single func and will be used for serial invocations, like in the picture on the right

image 1 14.png

→ Where no slot is available, either because none exists or because traffic to a function has increased to require additional slots, the Worker Manager calls the Placement service to request that a new slot is created for the function.

The Placement service optimizes placement in terms of resources like CPU and memory across workers and functions, then asks a worker to create a new slot with a time-bounded lease

→ Using a lease protocol allows the system to both maintain efficient sticky routing (and hence locality) and have clear ownership of resources.

Firecracker In The Lambda Worker

→ Each MicroVM contains a single sandbox for a single customer function, along with a minimized Linux kernel and userland, and a shim control process.

→ MicroManager provides slot management and locking APIs to placement, and an event invoke API to the Frontend

image 2 13.png

→ The shim process in each MicroVM communicates through the MicroVM boundary via a TCP/IP socket with the MicroManager, a per-worker process which is responsible for managing the Firecracker processes.

the front-end, on receiving slot details, goes to the MicroManager, which passes the request payload to the Lambda shim and gets back the function response

→ The MicroManager also keeps a small pool of pre-booted MicroVMs, ready to be used when Placement requests a new slot.

The Role of Multi-Tenancy

→ Slots use different amounts of resources in each state. When they are idle they consume memory, keeping the function state available.

image 3 9.png

→ When they are initializing and busy, they use memory but also resources like CPUtime, caches, network and memory bandwidth and any other resources in the system.

Experiences Deploying and Operating Firecracker & Evaluation

from reference pdf

Post-Read Thoughts

Initial sections were quite a brush-up on virtualization and its derived topics

there’s not too much info exposed on the programming side of FC, apart from saying that it’s a stripped-down version of Google’s crosvm. It’s FOSS, and the best thing is anyone can visit it anytime!

“FC In Production” was the most interesting section and laid foundations for next paper

Considering the benchmarks, Cloud Hypervisor actually does a good job in this space, probably because it was a more polished product than FC (at that time)

Further Reading

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&query=firecracker&sort=byPopularity&type=story

awslambdaturns

https://brooker.co.za/blog/2024/11/14/lambda-ten-years.html

On-demand Container Loading in AWS Lambda

https://blog.cloudflare.com/virtual-networking-101-understanding-tap/ for n/w

https://brooker.co.za/blog/2022/11/29/snapstart.html