(bash is similar to zsh, if that is the shell you are used to). Now that we have our VM set up and running, fire up the terminal, then download and install Ruboxer, entering your sudo password when prompted. On Ubuntu:

$ wget https://github.com/DonaldKellett/ruboxer/releases/download/v0.1.0/ruboxer_0.1.0_amd64.deb && sudo apt install ./ruboxer_0.1.0_amd64.deb
Or, on an EL8-based distribution (RHEL 8, CentOS 8 and friends):

$ wget https://github.com/DonaldKellett/ruboxer/releases/download/v0.1.0/ruboxer-0.1.0-1.el8.x86_64.rpm && sudo dnf install ./ruboxer-0.1.0-1.el8.x86_64.rpm
Note that the $ / # at the beginning of each line represents your command prompt and should not be entered as part of the command. Either package installs the ruboxer executable, plus a man page you can read by entering the following command (press the up and down arrow keys to scroll, 'q' to exit):

$ man 8 ruboxer
The man page is quite terse, so let's go through some examples instead. First, download the root filesystem image for our containers, unpack it and make two copies:

$ wget https://github.com/DonaldKellett/containers-from-first-principles-with-rust/releases/download/v0.1.0/rootfs.tar.xz
$ tar xvf rootfs.tar.xz
$ mv rootfs container1
$ cp -r container1 container2
Start by viewing the help text of the ruboxer command:

$ ruboxer --help
ruboxer takes two arguments: the first is the absolute path to the unpacked image, and the second is the command to execute within the container. Assuming you executed the above commands in your home directory, the absolute path to your unpacked image is $HOME/container1; if not, modify the commands that follow accordingly. Let's first peer into the unpacked image:

$ ls $HOME/container1
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
Compare this with the root directory of your host system:

$ ls /
bin dev lib libx32 mnt root snap sys var
boot etc lib32 lost+found opt run srv tmp
cdrom home lib64 media proc sbin swap.img usr
Now run ruboxer with $HOME/container1 as the image and bash as the command, which will spawn a shell inside the container. sudo is required here to elevate privileges to root - we'll explain why in a moment:

$ sudo ruboxer $HOME/container1 bash
Notice how your command prompt changed from something like dsleung@ubuntu2:~$ to something like root@ubuntu2:/# (exact output may differ). In particular, the $ changed to a # (indicating we're now running as root instead of a normal user) and the tilde ~ changed to / (indicating that we moved from our home directory to what appears to the container as the root directory of the filesystem). This should look familiar if you have peered inside a Docker container before, using a command similar to the following (this may not work on your newly installed Linux system if it doesn't have Docker installed and running):

$ docker container run -it ubuntu:focal /bin/bash
One noticeable difference is that the hostname (the part between the @ and the : above) of a Docker container might look something like 23f66e3c1dc8 instead of retaining the hostname of your host system. Back in our Ruboxer container, list the contents of the root directory:

# ls /
bin dev home lib64 mnt proc run srv tmp var
boot etc lib media opt root sbin sys usr
Notice that this is exactly what we saw when listing $HOME/container1 from the host: to the containerized process, $HOME/container1 is the root directory / - the container cannot (should not) see any files and/or directories outside of the container root $HOME/container1. This brings us to the POSIX syscall central to implementing containers: chroot().
chroot() is a syscall (i.e. an API call provided by the OS kernel to applications) that restricts the filesystem view of the currently running process to a particular subdirectory. It is said to virtualize the filesystem, analogous to how containers virtualize the OS and VMs virtualize the hardware. However, it only provides filesystem-level isolation - the chroot()ed process can still view and interact with all processes running on the host system and access all device nodes, for example. Therefore, a container runtime must combine chroot() with other methods of isolation to prevent container processes from escaping the container into the host system. Calling chroot() also requires root privileges (the CAP_SYS_CHROOT capability, to be precise), which is why we needed sudo above.
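To make this concrete, here is a minimal sketch (not Ruboxer's actual source) of how a runtime might enter a chroot jail before executing the containerized command, using the nix crate's wrappers around chroot(2) and chdir(2); the rootfs path is taken from our example above:

use nix::unistd::{chdir, chroot};
use std::os::unix::process::CommandExt;
use std::process::Command;

fn main() {
    // Path to the unpacked image, as in the article's example.
    let rootfs = "/home/dsleung/container1";

    chroot(rootfs).expect("chroot(2) requires root privileges");
    // Always chdir("/") right after chroot(): otherwise the process
    // keeps a working directory outside the new root and can escape.
    chdir("/").expect("chdir failed");

    // Replace the current process image with the containerized command.
    let err = Command::new("bash").exec();
    panic!("exec failed: {err}");
}

The chdir("/") line is one reason chroot() alone is not a security boundary: a careless runtime that skips it leaves the jail trivially escapable.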
Now let's see some isolation beyond chroot() in action. Inside the container, run the following command to mount the proc filesystem at /proc:

# mount -t proc proc /proc
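As an aside, a container runtime would typically perform this mount programmatically rather than shelling out. A minimal sketch using the nix crate's mount(2) wrapper (an illustration, not necessarily what Ruboxer does):

use nix::mount::{mount, MsFlags};

fn main() {
    // Programmatic equivalent of `mount -t proc proc /proc`.
    mount(
        Some("proc"),     // source; a placeholder name for pseudo-filesystems
        "/proc",          // mount target
        Some("proc"),     // filesystem type
        MsFlags::empty(), // no special mount flags
        None::<&str>,     // no extra mount options
    )
    .expect("mount(2) requires root privileges");
}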
procfs is a pseudo-filesystem used by the Linux kernel to expose system information (such as a list of running processes) to applications. Mounting it within our container allows command-line utilities present in the container, such as ps, to obtain a list of running processes and display them to the user. Now get the list of running processes (as seen from within the container) by executing:

# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.3 21964 3568 ? S 08:18 0:00 bash
root 3 0.0 0.2 19184 2304 ? R+ 08:21 0:00 ps aux
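Incidentally, mounting procfs is precisely what makes this listing possible: ps works by scanning the numeric entries under /proc. A tiny Rust sketch of the idea (illustrative only, not how the ps binary in the image is implemented):

use std::fs;

fn main() {
    // Every numeric directory under /proc corresponds to a running process.
    for entry in fs::read_dir("/proc").expect("is /proc mounted?") {
        let name = entry.unwrap().file_name();
        if let Some(s) = name.to_str() {
            if !s.is_empty() && s.chars().all(|c| c.is_ascii_digit()) {
                // /proc/<pid>/comm holds the process's command name.
                let comm = fs::read_to_string(format!("/proc/{s}/comm"))
                    .unwrap_or_default();
                println!("{s}\t{}", comm.trim());
            }
        }
    }
}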
Back to the listing above: notice that the only processes visible are our Bash shell with PID 1 and the ps aux itself (which would've died by the time you saw the output) - none of the host's processes appear. This is process-level isolation, provided by the Linux kernel through process namespaces, which are manipulated with the syscalls unshare() and setns(). By default, Ruboxer creates a new process namespace for each container so it can only see its own processes, but it also accepts an option --procns-pid for specifying the PID of a process whose process namespace the new container should join instead. A sketch of both paths follows.
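Here is a hypothetical sketch of the two namespace paths, again using a recent version of the nix crate (function names here are illustrative, not Ruboxer's actual API):

use std::fs::File;

use nix::sched::{setns, unshare, CloneFlags};

// Default path: give the container a fresh PID namespace. After
// unshare(CLONE_NEWPID), it is the *children* of the calling process
// that land in the new namespace, so a runtime forks next and the
// first child becomes PID 1.
fn new_pid_namespace() {
    unshare(CloneFlags::CLONE_NEWPID).expect("unshare(2) requires root privileges");
}

// The --procns-pid path: join the PID namespace of an existing process
// by opening its namespace file under /proc and handing it to setns(2).
fn join_pid_namespace(pid: u32) {
    let ns = File::open(format!("/proc/{pid}/ns/pid"))
        .expect("no such process, or /proc not mounted");
    setns(ns, CloneFlags::CLONE_NEWPID).expect("setns(2) requires root privileges");
}

fn main() {
    new_pid_namespace();
    // To join an existing namespace instead, a runtime would call e.g.:
    // join_pid_namespace(1002);
}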
Leaving the first container running, open a second terminal on the host and list all processes with the strings "bash" and "root" in them (most of which are actual Bash processes executed by user root):

$ ps aux | grep bash | grep root

You should see output similar to the following (possibly with occurrences of "bash" highlighted):

root 999 0.0 0.4 11188 4720 pts/0 S 08:17 0:00 sudo ruboxer /home/dsleung/container1 bash
root 1001 0.0 0.0 3140 704 pts/0 S 08:18 0:00 ruboxer /home/dsleung/container1 bash
root 1002 0.0 0.3 21964 3568 pts/0 S+ 08:18 0:00 bash
The number we're after is the PID of the containerized Bash process itself - in the listing above, 1002. Note that this number is very likely different in your case, so note down your number and substitute it for 1002 in the upcoming commands.

Now start a second container with $HOME/container2 as filesystem root and bash as the command, but this time joining the process namespace corresponding to PID 1002 (replace with your PID) instead of creating a new process namespace:

$ sudo ruboxer --procns-pid 1002 $HOME/container2 bash
Inside this new container, mount procfs at /proc again and list the running processes:

# mount -t proc proc /proc
# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.3 21964 3568 ? S+ 08:18 0:00 bash
root 4 0.0 0.3 21960 3480 ? S 08:53 0:00 bash
root 6 0.0 0.2 19184 2444 ? R+ 08:54 0:00 ps aux
This time we see two Bash processes, in addition to a ps aux that had already exited by the time we saw the output. The first Bash process, with PID 1 as seen from within the container, is the one running in our original container, while the second, with PID 4 (in my case - you may get a different number), is the current Bash process in the new container we just started. We can also execute ps aux in our original container again to see both (exact output may differ):

# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.3 21964 3572 ? S 08:18 0:00 bash
root 4 0.0 0.3 21960 3480 ? S+ 08:53 0:00 bash
root 7 0.0 0.2 19184 2396 ? R+ 08:58 0:00 ps aux
Ruboxer also limits the resources a container can consume, using the Linux kernel's control groups (cgroups), which are exposed to applications under the sysfs pseudo-filesystem, which resembles a device tree exported by the Linux kernel to applications. By default, Ruboxer enforces a memory limit of 128MiB (base-2) for each container.
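Here is a rough sketch of how a runtime might impose such a limit, assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup (the group name and paths below are assumptions for illustration, not Ruboxer's actual layout):

use std::fs;
use std::process;

fn main() -> std::io::Result<()> {
    // Create a dedicated cgroup for the container (hypothetical name).
    let cg = "/sys/fs/cgroup/ruboxer-demo";
    fs::create_dir_all(cg)?;

    // Cap memory usage at 128MiB (base-2); the kernel's OOM killer
    // terminates processes in the group that push it past this limit.
    fs::write(format!("{cg}/memory.max"), (128 * 1024 * 1024).to_string())?;

    // Move the current process (and hence its future children) into
    // the cgroup by writing its PID to cgroup.procs.
    fs::write(format!("{cg}/cgroup.procs"), process::id().to_string())?;
    Ok(())
}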
To see the memory limit in action, mount devtmpfs (yet another pseudo-filesystem) at /dev in both the original and new containers:

# mount -t devtmpfs devtmpfs /dev
The image ships with a small program, memeater, which you can inspect with:

# cat /bin/memeater

memeater reads from /dev/zero (which can only be accessed by mounting devtmpfs) in chunks of 16MiB (base-2) repeatedly, until it exhausts all system memory or gets killed by the system. Now run it (in either container) by executing:

# memeater
16MiB
32MiB
48MiB
Killed
If memeater is not killed automatically as shown above, terminate it with Ctrl-C before it eats up all the memory in your system, causing it to hang and become unresponsive.
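For the curious, here is roughly what such a program might look like in Rust - a sketch based purely on the behaviour described above; the actual /bin/memeater in the image may differ:

use std::fs::File;
use std::io::Read;

const CHUNK: usize = 16 * 1024 * 1024; // 16MiB (base-2), per the description

fn main() {
    let mut zero = File::open("/dev/zero").expect("mount devtmpfs at /dev first");
    let mut hoard: Vec<Box<[u8]>> = Vec::new();
    let mut total_mib = 0usize;
    loop {
        // Allocate a fresh 16MiB buffer and fill it from /dev/zero;
        // the read forces the kernel to actually commit the pages.
        let mut buf = vec![0u8; CHUNK];
        zero.read_exact(&mut buf).expect("read failed");
        hoard.push(buf.into_boxed_slice()); // keep every chunk alive
        total_mib += 16;
        println!("{total_mib}MiB");
    }
}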
When you're done, unmount devtmpfs and procfs in each container using umount (NOT a typo!) and exit:

# umount /dev
# umount /proc
# exit
Your prompt should change from # back to $. Now restart the new container with an adjusted memory limit:

$ sudo ruboxer --mem-max-bytes 512M $HOME/container2 bash
Once inside, mount devtmpfs and run memeater again:

# mount -t devtmpfs devtmpfs /dev
# memeater

Notice that memeater is now allowed to eat much more memory before being killed:

16MiB
32MiB
48MiB
64MiB
80MiB
96MiB
112MiB
128MiB
144MiB
160MiB
176MiB
192MiB
208MiB
224MiB
240MiB
256MiB
272MiB
288MiB
304MiB
320MiB
336MiB
352MiB
368MiB
384MiB
400MiB
416MiB
432MiB
Killed

(memeater likely gets killed somewhat short of the full 512MiB because the cgroup memory limit accounts for all memory charged to the container - the shell, page cache and so on - not just memeater's own allocations.)
References:

- chroot(2) man page: https://man7.org/linux/man-pages/man2/chroot.2.html
- unshare(2) man page: https://man7.org/linux/man-pages/man2/unshare.2.html
- setns(2) man page: https://man7.org/linux/man-pages/man2/setns.2.html
- cgroups(7) man page: https://man7.org/linux/man-pages/man7/cgroups.7.html
- capabilities(7) man page: https://man7.org/linux/man-pages/man7/capabilities.7.html
- seccomp(2) man page: https://man7.org/linux/man-pages/man2/seccomp.2.html