Microsoft confirmed in early September 2021 that it had been informed of a first-of-its-kind cloud security vulnerability. Customers of Azure’s container-as-a-service (CaaS) offering, Azure Container Instances (ACI), were notified that Unit 42 at Palo Alto Networks had uncovered an exploit that allowed malicious users to break out of their containers and into other users’ containers. By exploiting a two-year-old vulnerability, attackers could have run code on other users’ containers, stolen secrets and images, and even run cryptomining operations. The vulnerability, nicknamed “Azurescape,” has been described by Unit 42 researcher Yuval Avrahami as “the first cross-account container takeover in the public cloud.” While there is no evidence that the vulnerability was exploited, customers are still being asked to monitor their containers for unauthorized usage and to revoke any privileged credentials deployed on ACI before Aug. 31, 2021.
How did this cross-account container takeover occur?
ACI infrastructure
To understand what a cross-account takeover is, it helps to first understand what ACI does. ACI is a Microsoft service that allows customers to deploy containers to Azure without having to manage the underlying infrastructure. Developers can focus on building and designing their apps, while ACI handles scaling, request routing, and scheduling. Container groups run in isolation without a shared kernel, which is meant to protect them from other container groups running side by side in the public cloud.
The infrastructure of ACI comprises multi-tenant Kubernetes clusters (and Service Fabric clusters, which are not covered here) that host customers’ containers. While these clusters are multi-tenant, each customer container runs in its own Kubernetes pod on a single-tenant node. This node-per-tenant approach is how ACI enforces strong boundaries between tenants. If a malicious user operating in a cluster breaks out of their container, they will find themselves in an isolated node, unable to communicate with other containers in the cluster, thus preventing a cross-account takeover. Or at least that’s how it should have been.
Breaking out of the container
First, Unit 42 ran WhoC, a container image they created to reveal the container runtime that executes it. They found that Microsoft was using an outdated version of runC (v1.0.0-rc2), despite there being several releases since, including a stable 1.0 release in June 2021. The obsolete runC version was nearly five years old and had two known container breakout CVEs. Unit 42 exploited one of these vulnerabilities, CVE-2019-5736, to break out of the container and onto the Kubernetes node. However, they were still within the tenant boundary.
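A version check of this kind is easy to automate. The following is a minimal sketch, not Unit 42’s tooling: it assumes runC version strings of the form `v1.0.0-rc2` or `1.0.0`, and it assumes `1.0.0-rc7` as the first release containing the CVE-2019-5736 fix (the CVE covers runC through 1.0-rc6). The function names are hypothetical.

```python
import re

# Assumption: first runC release that fixed CVE-2019-5736.
FIRST_PATCHED = "1.0.0-rc7"


def parse_runc_version(version: str) -> tuple:
    """Turn 'v1.0.0-rc2' or '1.0.0' into a sortable tuple.

    A final release sorts after all of its release candidates.
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)(?:-rc(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized runC version: {version!r}")
    major, minor, patch, rc = match.groups()
    # No "-rcN" suffix means a final release, which ranks above every candidate.
    rc_rank = float("inf") if rc is None else int(rc)
    return (int(major), int(minor), int(patch), rc_rank)


def is_vulnerable_to_cve_2019_5736(version: str) -> bool:
    """True if the version predates the assumed first patched release."""
    return parse_runc_version(version) < parse_runc_version(FIRST_PATCHED)


if __name__ == "__main__":
    for candidate in ("v1.0.0-rc2", "1.0.0-rc7", "1.0.0"):
        print(candidate, is_vulnerable_to_cve_2019_5736(candidate))
```

Version strings with extra suffixes (for example, distribution-specific build tags) are rejected with a `ValueError` rather than guessed at, so an unparseable runtime is flagged for manual review instead of being silently treated as safe.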
Scanning the environment revealed that the clusters were running Kubernetes v1.8.4, v1.9.10, or v1.10.9. These versions were released between November 2017 and October 2018 and have several known vulnerabilities. Unit 42 tried exploiting one of them, CVE-2018-1002102, to spread malicious Kubelets across the cluster, but the attempt failed: Microsoft had already mitigated this vulnerability by redirecting all exec requests from the api-server to a custom bridge pod.
However, that was not all. During this attempt, Unit 42 noticed that the exec requests arriving at their node carried an Authorization header containing a Kubernetes service account token. Such a token is a JSON Web Token (JWT) whose contents are signed but not encrypted, so it can easily be decoded. They found that this token belonged to the bridge service account and could be used to execute commands on any pod in the cluster, even the api-server pod. This essentially made them cluster admins with control over all customer containers within the multi-tenant cluster.
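To illustrate why an unencrypted JWT is easy to read, here is a minimal sketch that decodes a token’s header and claims with only the Python standard library. The sample token is built in the script itself; its claim values (namespace and account name) are hypothetical and its signature is fake, so nothing here reflects the real token Unit 42 observed.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt_unverified(token: str) -> tuple:
    """Return (header, claims) WITHOUT checking the signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# Hypothetical claims, shaped like a Kubernetes service account token.
sample_claims = {
    "iss": "kubernetes/serviceaccount",
    "sub": "system:serviceaccount:example-namespace:example-bridge",
}
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps(sample_claims).encode()),
    b64url_encode(b"not-a-real-signature"),
])

header, claims = decode_jwt_unverified(sample_token)
print(claims["sub"])
```

Decoding reveals who the token belongs to, but it is the token itself, presented as a bearer credential, that grants access. That is why a token leaking to a tenant-controlled node matters so much.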
A second vulnerability?
Believe it or not, the service account token was only the first of two vulnerabilities the Unit 42 team found. They could also gain full administrative control over the multi-tenant cluster by exploiting a server-side request forgery (SSRF) vulnerability in the bridge pod. They found that the bridge pod did not verify that a pod’s status.hostIP value was a valid IP address, and that the api-server accepted any string for this field, even URL components. They entered a hostIP value that tricked the bridge pod into executing a command on the api-server container instead of their own container, which once again gave them admin status in the multi-tenant cluster.
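The following sketch shows, under stated assumptions, why an unvalidated hostIP is dangerous and what a basic validity check looks like. The URL layout in `build_exec_url` (port 10250 and an `/exec/<namespace>/<pod>/<container>` path), the function names, and all sample values are illustrative assumptions, not the actual bridge pod code.

```python
import ipaddress
from urllib.parse import urlsplit


def is_valid_host_ip(value: str) -> bool:
    """Accept only strings that parse as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def build_exec_url(host_ip: str, namespace: str, pod: str, container: str) -> str:
    # Hypothetical: naive string concatenation that trusts host_ip as-is.
    return f"https://{host_ip}:10250/exec/{namespace}/{pod}/{container}"


# A hypothetical injected value: URL components smuggled into the "IP" field.
# The trailing '#' turns everything appended afterwards into a URL fragment.
injected = "10.0.0.9:10250/exec/kube-system/target-pod/target-container?command=id#"

url = build_exec_url(injected, "tenant-ns", "tenant-pod", "tenant-container")
parts = urlsplit(url)
print(parts.path)      # the path now points at the attacker-chosen target
print(parts.fragment)  # the intended path was pushed into the fragment

print(is_valid_host_ip("10.0.0.9"))  # a plain address passes
print(is_valid_host_ip(injected))    # the injected string is rejected
```

Rejecting anything that is not a bare IP address before building the request closes this particular hole, which matches the kind of check described in the fix.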
What’s the fix?
The bad news is that Microsoft was running ACI on outdated software with known vulnerabilities. The good news is that they patched it as soon as Unit 42 informed them of the exploit and notified potentially affected customers to revoke any credentials deployed before Aug. 31, 2021. Bridge pods no longer send service account tokens to nodes when issuing exec requests, and they verify that a pod’s status.hostIP field is valid before issuing exec requests. However, this attack serves as a grim reminder of the vulnerabilities that can be exploited from within, even with adequate security measures in place.
In addition to revoking deployed credentials on a regular basis, Microsoft recommended following its ACI security recommendations and best practices. They also recommended more in-depth security monitoring and configuring alerts to notify customers of any suspicious behavior.
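The revocation advice can be turned into a simple inventory check. This is a minimal sketch that assumes you keep your own record of credentials deployed to ACI; the record fields (`name`, `privileged`, `deployed_on`) and the sample entries are hypothetical, and the only value taken from the advisory is the Aug. 31, 2021 cutoff.

```python
from datetime import date

# Cutoff from the advisory: credentials deployed before this date should be revoked.
CUTOFF = date(2021, 8, 31)


def needs_rotation(credentials, cutoff=CUTOFF):
    """Return names of privileged credentials deployed before the cutoff."""
    return [
        cred["name"]
        for cred in credentials
        if cred["privileged"] and cred["deployed_on"] < cutoff
    ]


# Hypothetical inventory, for illustration only.
inventory = [
    {"name": "registry-pull-secret", "privileged": True, "deployed_on": date(2021, 3, 14)},
    {"name": "storage-account-key", "privileged": True, "deployed_on": date(2021, 9, 2)},
    {"name": "public-config", "privileged": False, "deployed_on": date(2020, 11, 1)},
]

print(needs_rotation(inventory))
```

Running the same check on a schedule, with a rolling cutoff instead of a fixed date, is one way to make regular revocation routine rather than a one-off response.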
Unit 42, which discovered the exploit, recommended that enterprises adopt a defense-in-depth (DiD) approach to cybersecurity. DiD involves layering security mechanisms throughout a network to cover a range of potential breaches. While it may not insulate a company against all possible attacks, it increases redundancy so that if one defense fails, another security measure can kick in to protect your data. DiD effectively decreases your attack surface.
Protect yourself from cross-account container takeover
First, don’t assume that your cloud service provider is employing security best practices and using the latest software versions. If you want your containers to run in secure environments, you need to know what you’re signing up for. Audit your providers’ defenses and infrastructure before entrusting them with your data. Ask them about their detection mechanisms for zero-day attacks.
Second, adopt a DiD approach. Strengthening defenses and increasing security redundancy increases the complexity of an attack. This makes it more likely that malicious activity will be detected and mitigated before doing any real damage. Some principles of DiD are:
- Least privilege: Central to a DiD approach is the principle of least privilege: only users who need access to systems and resources are granted it, and that access is reviewed and revoked regularly.
- Detection and prevention: DiD also requires in-depth monitoring systems that alert you to suspicious behavior and implement automated security protocols in response to potential breaches and zero-day attacks.
- Firewalls: Firewalls serve as security gates to your applications and networks, and they can detect and deny access to malicious users or suspicious access points.
- Network segmentation: Splitting your network into segments for each section of your business can ensure that a breach in one department does not compromise the continuity of your entire business.
- Passwords: Another requirement is strong, complex passwords, ideally with multifactor authentication.
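As a concrete example of the least-privilege principle in a Kubernetes setting, the sketch below flags overly broad access rules. It assumes rules are available as plain dictionaries with `resources` and `verbs` lists, mirroring the fields of a Kubernetes RBAC rule; the function name and the sample rules are hypothetical, and the criteria (wildcards and exec access to pods) are one reasonable starting point rather than a complete audit.

```python
def risky_rules(rules):
    """Return rules that use wildcards or allow exec into pods."""
    flagged = []
    for rule in rules:
        verbs = set(rule.get("verbs", []))
        resources = set(rule.get("resources", []))
        if "*" in verbs or "*" in resources or "pods/exec" in resources:
            flagged.append(rule)
    return flagged


# Hypothetical rules, for illustration only.
sample_rules = [
    {"resources": ["pods"], "verbs": ["get", "list"]},
    {"resources": ["pods/exec"], "verbs": ["create"]},
    {"resources": ["*"], "verbs": ["*"]},
]

for rule in risky_rules(sample_rules):
    print(rule)
```

A service account that can exec into any pod in a cluster is effectively a cluster admin, so rules like the second and third samples deserve the closest scrutiny.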
Today, it’s not enough to blindly trust your cloud vendor. Though new kinds of attacks like cross-account takeover are making security more complex, you can protect your organization by taking the necessary precautions.