How Grafeas handles metadata and why your Kubernetes system needs it

Anyone who’s ever downloaded a file or visited a web page over the Internet has probably come across the term “downloading metadata.” It normally takes only a few seconds and can seem rather insignificant. It’s anything but insignificant, however, and metadata has often been compared to a catalog in a library or the index of a book. To put it plainly, if a book were a block of data, metadata would be the index that tells you exactly what that block of data is, what it can do, where to find it, and how best to organize it.

Quality and monitoring have long been a thorn in the side of people administering container environments in a software supply chain, mostly because conventional tools just aren’t built to deal with containers at scale. There’s a long list of things that can go wrong with container security, ranging from unmanaged vulnerabilities in images to pulling images from registries of questionable origin. It all basically boils down to a trade-off between deployment velocity and security: as the former goes up, the latter goes down. With Grafeas, an open source project that defines a metadata API for software components, you can worry less about the trade-off.

Securing the supply chain

Security is not something we can afford to lose in any kind of trade-off where building software is concerned, and efforts are being made to make containers easier to track and monitor. It’s important to understand that the scale and ephemeral qualities that give containers their deployment velocity act as a double-edged sword here and make them “almost” impossible to keep track of. This is where our old friend “metadata” steps in and saves the day, as tracking the metadata just happens to be one way to effectively keep track of the corresponding containers.

The reason behind this is that while containers come and go as they please, the metadata (which is a record of exactly what transpired and how) is published and available for scrutiny. Metadata supports security, integration, and orchestration and is pretty straightforward to gather. If you happen to be using Docker, for example, with Kubernetes on top of Docker and Rancher on top of Kubernetes, you can gather the metadata from all three platforms just by querying the Docker API.

Now while access to the metadata does give us the basic tools necessary to understand a container attack or a threat, organizations generate vast quantities of metadata, all in different formats, from different vendors and generally stored in different places. Additionally, at each stage of the software supply chain (code, build, test, deploy and operate), different tools generate metadata about various software components. These can include the identity of the developer, when the code was built, what tests were conducted, what vulnerabilities were detected and so on.

Truth serum for containers

The problem here is that without some sort of “standard metadata format,” it’s quite difficult to answer questions like whether a certain software component is currently deployed, or if a certain vulnerability is going to affect your production code. Grafeas is meant to provide a uniform metadata schema that allows VMs, containers, JAR files, and other software artifacts to describe themselves to the environments they run in and to the users that manage them.

In other words, Grafeas provides that central source of “truth” for organizations trying to track metadata from containers. It does this by capturing all the metadata and making it available through its metadata API, so build, auditing and compliance tools can then connect to and query it. The goal is to allow other processes in that environment to communicate with and make changes to the software in a consistent and reliable way.

Grafeas uses two API concepts: notes and occurrences. A note holds details about some aspect of the software; it could be a description of a known vulnerability, details on how the software was built, a history of its deployment, and the like. Occurrences are instances of notes, or in other words, records of when the instance occurred along with details on where and how it “happened.” A note describing a known software vulnerability, for example, could have occurrences recording which vulnerability scanner detected it, when it was detected, and whether or not the vulnerability has been addressed yet.
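To make the note/occurrence split concrete, here is a minimal in-memory sketch in Python. It is not the Grafeas API or its real schema: the class names, field names, the "fixed" flag, and the CVE and image identifiers are all assumptions made up for illustration. It only shows how one note can be shared by many occurrences, and how that lets you ask which artifacts are still affected by a given vulnerability.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """Describes something that can be true of software (hypothetical shape)."""
    name: str         # identifier of the note, e.g. a made-up CVE id
    kind: str         # e.g. "VULNERABILITY", "BUILD", "DEPLOYMENT"
    description: str


@dataclass
class Occurrence:
    """One instance of a note, attached to a specific artifact."""
    note_name: str    # which Note this is an instance of
    resource: str     # the artifact it applies to, e.g. an image reference
    reported_by: str  # the tool that recorded it, e.g. a scanner
    details: dict = field(default_factory=dict)


def affected_resources(occurrences, note_name):
    """Return the sorted resources with an unaddressed occurrence of the note."""
    return sorted({
        o.resource
        for o in occurrences
        if o.note_name == note_name and not o.details.get("fixed", False)
    })


# Illustrative data only: one vulnerability note, three occurrences of it.
note = Note("CVE-0000-0001", "VULNERABILITY", "Example flaw in a base library")
occurrences = [
    Occurrence(note.name, "registry.example/app@sha256:aaa", "scanner-a"),
    Occurrence(note.name, "registry.example/api@sha256:bbb", "scanner-a",
               {"fixed": True}),
    Occurrence(note.name, "registry.example/web@sha256:ccc", "scanner-b"),
]

print(affected_resources(occurrences, note.name))
```

The design point is that the description of the vulnerability lives in one place (the note), while each scanner or build tool only has to record where and when it saw it (the occurrence).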

Here comes the judge

As part of Grafeas, Google also introduced Kritis (which means “judge” in Greek), a deployment authorization framework for Kubernetes. Kritis allows us to use the metadata stored in Grafeas to build and enforce real-time deployment policies with Kubernetes. Like Grafeas, Kritis also uses two key concepts: attestation authorities and policies.

“Attestations” are made by attestation authorities, which are described as named entities that can attest to, or “certify,” a deployment. A policy then names one or more attestation authorities whose attestations are required before a container can be deployed.

While software is being developed, tested, and run, a number of audits are performed against the containers and attestations are generated. The policies enforced with Kritis on Kubernetes are written in terms of these attestations. Enforcement is based on proof of container image properties stored in Grafeas.

A core part of the Kritis framework is a Kubernetes Admission Controller that checks for expected “attestations” and blocks deployment when they are not present.

If, for example, the metadata pulled from Grafeas for an image does not include an attestation certifying that the image is compliant, deployment will be blocked.
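The admission decision described above can be sketched in a few lines of Python. This is a simplified illustration, not Kritis code: the `Attestation` and `Policy` classes, the `review` function, and the authority and image names are all hypothetical, and the cryptographic verification of attestations that a real attestation check would involve is left out entirely. The sketch only shows the rule "deploy only if every required authority has attested to this image."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """Records that an authority has certified an image (signature omitted)."""
    image: str
    authority: str


@dataclass(frozen=True)
class Policy:
    """Names the authorities whose attestations are required to deploy."""
    required_authorities: frozenset


def review(image, policy, attestations):
    """Return (allowed, missing): allowed only if no required authority is missing."""
    present = {a.authority for a in attestations if a.image == image}
    missing = sorted(policy.required_authorities - present)
    return (not missing, missing)


# Illustrative data only: two required authorities, two candidate images.
policy = Policy(frozenset({"built-by-ci", "vulnerability-scan"}))
attestations = [
    Attestation("registry.example/app@sha256:aaa", "built-by-ci"),
    Attestation("registry.example/app@sha256:aaa", "vulnerability-scan"),
    Attestation("registry.example/app@sha256:bbb", "built-by-ci"),
]

for image in ("registry.example/app@sha256:aaa",
              "registry.example/app@sha256:bbb"):
    allowed, missing = review(image, policy, attestations)
    print(image, "ALLOW" if allowed else f"BLOCK (missing: {missing})")
```

Returning the list of missing authorities alongside the decision is a deliberate choice in this sketch: a blocked deployment is much easier to act on when the rejection says which attestation was absent.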

Grafeas at work

Now while Grafeas grew out of work at Google, others like JFrog, Red Hat, IBM, Black Duck, Twistlock, Aqua Security, and CoreOS have since joined the initiative. This open source initiative to define a uniform way for auditing and governing the modern software supply chain is pretty much production ready and is already being used by e-commerce platform Shopify.

Shopify runs 6,000 container builds a day and has 330,000 images in its primary registry, so tracking them all has definitely been a challenge. In a recent blog post, the company explains how integrating Grafeas and Kritis into its Kubernetes pipeline has allowed it to automatically store build and vulnerability information about each container. It also allows for strict enforcement of a built-by-Shopify policy, which makes sure all Kubernetes clusters run only images signed by its builder.

Google is hosting an alpha of the Grafeas API and hopes to build an ecosystem around it. Since Grafeas is not coupled with any particular platform or technology, it can run anywhere and store metadata wherever it lives, even in hybrid cloud use cases. It also integrates and aggregates metadata from existing tools. A hosted alpha implementation of Kritis, called Binary Authorization (BinAuthz), is available on Google Container Engine (GKE) and is planned for a further open source release soon. The key concepts of BinAuthz are the “attestation authority” and the “policy,” realized as REST resources managed through a REST API.

So, it’s a lot easier to track and monitor automated processes when you can rely on metadata extracted from their components. Since these components are more often than not created with different technologies, a standard format makes that metadata a lot easier to deal with. And if, in addition to tracking your deployment chain, you start adding your own private metadata in the same standard format, Grafeas and Kubernetes can help you open up a whole new world of opportunity for automation and governance.
