The principle of minimalism
TL;DR: Your default should be the lowest common denominator of what you actually need.
Your most common security posture will almost always be what you make your default. Over the years we have seen countless examples where a bad default was chosen because it made something easy, and we then spent years trying to educate developers not to rely on it, because changing a default is a breaking API change. This is especially painful when it happens in category-defining technologies that become pervasive, such as Docker and Kubernetes.
One example of this is the default root user in Docker containers, which Kubernetes then aligned to. Another example is Kubernetes allowing tag-based deployments (vs. digest) and its default “pull policy”. Kubernetes having unversioned mutable ConfigMaps and Secrets with no meaningful rollout story is yet another (it does now support immutable secrets). As a final example, many runtime systems (Kubernetes included) support projecting secrets into environment variables (you should use volumes!).
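To make the tag-versus-digest point concrete, here is a minimal sketch in Python of a check that flags image references that are not pinned to a digest. The function names and the example registry are hypothetical, and the pattern assumes SHA-256 digests only; it is an illustration, not a complete image reference parser.

```python
import re

# A digest-pinned reference ends with "@sha256:" followed by 64 hex characters.
# Assumption: only sha256 digests are in use.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned_by_digest(image_ref: str) -> bool:
    """Return True if the image reference names an immutable digest."""
    return bool(_DIGEST_RE.search(image_ref))


def unpinned_images(image_refs):
    """Return the references that rely on a mutable tag (or no tag at all)."""
    return [ref for ref in image_refs if not is_pinned_by_digest(ref)]
```

A check like this could run in CI or at admission time, so that deploying by mutable tag becomes the audited exception rather than the default.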
Experience shows that developers will reach for roughly the subset of knobs that it takes to get something working and move on. Occasionally rigorous developers will flip a few more knobs, but generally only those of which they are aware. Turning knobs should be the exceptional case, and reserved for those advanced users who wish to intentionally deviate from the secure defaults in specific ways. When we talk about secure by design or secure by default, this is what it means: reducing the number of “necessary knobs” a user must turn to zero. As with many “exception” processes, these should be regularly audited and wherever possible revoked.
The ideas we are talking about here are not new and encompass others you have likely heard of like “least privilege.” Let’s walk through several categories and explore how minimalistic defaults and “exceptions” should apply to them.
Access control
Since we mentioned “least privilege,” let’s start with Access Control. Depending on your workload, there are a number of surfaces where this principle applies. For example, in a Kubernetes context there are typically at least three:
1. System Calls (syscalls): The syscall boundary between the container and the shared kernel is generally authorized by a set of capabilities, some of which are “privileged.” By default the container should have no privileged capabilities, run as a non-root user and use a minimal seccomp profile.
2. Kubernetes API Server: Every Kubernetes workload runs as a service account (possibly the namespace default service account). Each workload should run as its own service account and this service account should only be granted access to exactly the “nouns” and “verbs” needed. Wherever possible scope this access to a single namespace (good) or resource (better). When no access is needed, mounting of the service account token should be disabled.
3. Cloud Provider IAM: Wherever possible, leverage “workload identity” mechanisms to scope IAM roles to a specific workload. This is generally via a service account and another important reason to have these be 1:1. Any capabilities granted to this role should follow similar guidance to #2 and minimize the set of capabilities the workload has access to (starting with an empty set).
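As a rough illustration of the first two surfaces, the sketch below audits a pod spec (represented as a plain Python dict mirroring the Kubernetes field names) against the minimal defaults described above. The `audit_pod_spec` helper is hypothetical, and it simplifies real Kubernetes behavior: for example, it ignores settings inherited from the ServiceAccount object and only looks at regular containers.

```python
from typing import List


def audit_pod_spec(pod_spec: dict) -> List[str]:
    """Return human-readable findings where a pod deviates from minimal defaults."""
    findings = []
    pod_sc = pod_spec.get("securityContext", {})

    # Kubernetes mounts the service account token unless told otherwise.
    if pod_spec.get("automountServiceAccountToken", True):
        findings.append("pod: service account token is mounted")

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})

        # Container-level settings take precedence over pod-level ones.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            findings.append(f"{name}: may run as root")
        if sc.get("privileged", False):
            findings.append(f"{name}: runs privileged")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            findings.append(f"{name}: does not drop all capabilities")

        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            findings.append(f"{name}: no seccomp profile set")

    return findings
```

An empty list means the pod needs no exceptions; each finding is a knob someone should have to justify turning.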
Following the “principle of ephemerality,” wherever credentials are used to grant access, the lifespan of these credentials should also be minimized.
Networking
The next category is networking. In part thanks to the rising popularity of service meshes, a lot of developers are realizing that “perimeter security” is inadequate and “zero-trust networking” is a necessary mitigation for certain classes of attacks. A zero-trust architecture makes it harder for an attacker to move laterally within your network or “phone home.” A useful analogy is that zero-trust networks are like hard candies, whereas perimeter security is like hard-shell candies with a “gooey center.”
Generally, developers like to talk about network security in terms of north-south (ingress and egress) and east-west (internal communication). Let’s walk through these.
North-South
I would wager the least contentious thing I will say is that the points of ingress to your network should be minimized and tightly controlled. Services should not be exposed publicly by default, and those that are should have a well thought out security model. This is “Perimeter Security 101.”
More contentious, but gaining acceptance, is the idea that egress should be just as tightly controlled. When attackers are scanning for exploitable systems, a network that allows a malicious payload to “phone home” is like rolling out the welcome mat. This behavior is exactly what we saw with the infamous Log4j vulnerability. That outbound connection may start as a simple ping, but could later be used to give an attacker remote control of your workload and/or exfiltrate your sensitive data (e.g. customer data, intellectual property, credentials or keys).
In truth, is there really a need for most of your workloads to be able to contact arbitrary public endpoints? Most likely they talk to specific API endpoints, which will often have a strong correlation to the fine-grained IAM policies we discussed above, so you could consider fine-grained egress policies as a sort of extension of fine-grained IAM.
There will certainly be some workloads that need to send data to so many destinations that it is impractical or impossible to grant exceptions case by case. For example, consider things like web crawlers or customer notification webhooks. However, these are examples of truly exceptional workloads, which should be granted exceptions! The default should be one where egress is denied, and just like fine-grained IAM, we should allow things through with similarly fine-grained policies.
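The default-deny model with fine-grained allowances and explicit exceptions can be sketched as a small decision function. Everything named here (the workloads, the hostnames, `EGRESS_ALLOWLIST`, `EGRESS_EXCEPTIONS`) is hypothetical, and a real deployment would enforce this in the network layer rather than in application code.

```python
from typing import Dict, Set, Tuple

Destination = Tuple[str, int]  # (hostname, port)

# Hypothetical per-workload allowlist: each workload may reach only the
# specific endpoints it depends on.
EGRESS_ALLOWLIST: Dict[str, Set[Destination]] = {
    "billing": {("api.payments.example", 443)},
    "notifier": {("smtp.mail.example", 587)},
}

# Hypothetical audited exceptions: workloads such as crawlers that
# genuinely need to reach arbitrary public endpoints.
EGRESS_EXCEPTIONS: Set[str] = {"crawler"}


def egress_allowed(workload: str, host: str, port: int) -> bool:
    """Deny by default; allow only listed destinations or excepted workloads."""
    if workload in EGRESS_EXCEPTIONS:
        return True
    return (host.lower(), port) in EGRESS_ALLOWLIST.get(workload, set())
```

Note that an unknown workload gets an empty allowlist, so forgetting to write a policy fails closed rather than open.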
East-West
In zero-trust networking, just because a network request is within your network does not mean it is authorized. It is becoming increasingly commonplace to use “mutual TLS” (mTLS) both to encrypt traffic between services and to establish a cryptographically secure “caller identity.”
Building on the strong foundation of mTLS for caller identity, network policies can be crafted to describe the exact set of services a caller should be able to call. At this point they start to look a lot like what we discussed above for egress, “do you really have services that need to arbitrarily call other services on your network?” By allowing unnecessary internal traffic, you enable an attacker to potentially access sensitive data or move laterally within your system. These network policies strongly correlate with the system dependencies within your architecture, and in a sense, track your system’s own internal IAM (whether implicit or explicit).
The foundation of mTLS also enables the secure exchange of credentials between internal services to perform more context-sensitive authorization decisions.
The good news is that most service meshes, such as Istio and Linkerd, have invested deeply in observability that lets you see empirically which services you are calling, to help you craft these network policies ahead of enforcement.
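One way to picture the "observe first, then enforce" workflow is the sketch below. It assumes you can export observed calls as (caller, callee) pairs, with the caller identity taken from the mTLS certificate; the function names and data shape are hypothetical and not tied to any particular mesh's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_policy(observed_calls: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Derive a caller -> allowed callees map from observed traffic."""
    policy = defaultdict(set)
    for caller, callee in observed_calls:
        policy[caller].add(callee)
    return dict(policy)


def call_allowed(policy: Dict[str, Set[str]], caller: str, callee: str) -> bool:
    """Deny by default: a caller may reach only the callees in its policy."""
    return callee in policy.get(caller, set())
```

A policy derived this way is only a draft: it should be reviewed before enforcement, since anything an attacker was already doing during the observation window would be captured as "normal."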
Filesystem
The next category is the filesystem. Another practice that is becoming increasingly common is minimizing the parts of the filesystem that can be written at runtime. This has become very common for hardened “node” runtimes for container orchestration systems (e.g. CoreOS, Google’s Container Optimized OS, Amazon’s Bottlerocket), which generally have “immutable filesystems.” In the context of containers this is less common because in most (all?) orchestrators it is not the default, but orchestrators such as Kubernetes allow containers to opt into “read-only root filesystems.”
Generally, having a read-only filesystem is the most appropriate default, and containers can opt into writable regions by defining writable “volumes” for the filesystem locations where they need to write files.
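The resulting model is simple enough to state in a few lines: a path is writable only if it falls under an explicitly declared writable mount. The sketch below is a conceptual model with a hypothetical `is_writable` helper; it normalizes `..` segments but does not resolve symlinks, which a real kernel-enforced read-only mount handles for you.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Iterable


def is_writable(path: str, writable_mounts: Iterable[str]) -> bool:
    """Return True only if path is at or below a declared writable mount."""
    # Collapse "." and ".." so "/tmp/../etc" is treated as "/etc".
    target = PurePosixPath(posixpath.normpath(path))
    for mount in writable_mounts:
        mount_path = PurePosixPath(posixpath.normpath(mount))
        if target == mount_path or mount_path in target.parents:
            return True
    return False
```

With a declared mount at `/tmp`, for instance, `/tmp/cache/item` is writable while `/etc/passwd` and `/tmpfoo` are not.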
It is also prudent to be careful and minimize the set of volumes that you link into your containers because certain types of volumes have been responsible for container escape vulnerabilities.
Packages
Our final category is packages. Developers are increasingly recognizing that using traditional Linux distributions (aka “distros”) for container base images is as much an anti-pattern as running multiple processes in containers. Traditional distros cater to system administrators who need to interactively bootstrap environments for other humans, and the vast majority of the packages in these distros are not needed for containerized applications at runtime (many are not even used when bootstrapping the container image!). My favorite analogy for this is that traditional distros are more like a JDK, where what your application really needs at runtime is a JRE.
A rapidly growing concern with excess packages is that these become tools for an attacker to use “live off the land” techniques and attack the container or pivot through your system. We are seeing this show up in more and more attacks to enable advanced persistent threats (APTs) to remain undetected for longer by leveraging the veritable cornucopia of tools that most developers leave littering their containers. Put simply, if an attacker can use a preinstalled netcat or curl binary, then they can escape any detections from tools that monitor for unknown binaries at runtime.
Another big win from minimizing included packages is improving the signal-to-noise ratio when assessing the impact of CVEs in common packages. For example, if a vulnerability is found in bash, are 100% of your applications really vulnerable? If a package only exists where it is used, the “mean time to remediation” can be dramatically reduced by focusing on applications that are actually affected. This also reduces the scope of binaries that runtime detection tooling needs to track to spot anomalous or malicious activity.
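To illustrate the signal-to-noise point, suppose you keep a per-image package inventory (for example, extracted from SBOMs). The inventory contents and image names below are made up, and `affected_images` is a hypothetical helper; the point is only that minimal images shrink the answer to "who is affected?"

```python
from typing import Dict, List, Set

# Hypothetical inventory: image name -> packages present in that image.
INVENTORY: Dict[str, Set[str]] = {
    "web-frontend": {"ca-certificates", "tzdata"},
    "batch-runner": {"bash", "busybox", "ca-certificates"},
    "legacy-api": {"bash", "coreutils", "curl", "openssl"},
}


def affected_images(inventory: Dict[str, Set[str]], package: str) -> List[str]:
    """Return the images that actually contain the vulnerable package."""
    return sorted(image for image, pkgs in inventory.items() if package in pkgs)
```

In this made-up inventory, a bash vulnerability touches two of the three images, so remediation can start there instead of treating every application as affected.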
A few years ago these ideas gave birth to the “distroless” container base images, which Dan and I started at Google. These images have been getting a lot of attention as developers have adopted them in popular projects like Kubernetes. Distroless images typically don’t contain things like shells or other tools an attacker might use to “live off the land,” which makes it much harder for them to hide in plain sight. However, the original distroless images were at best a proof of concept. At Chainguard, we have been working on the next generation of distroless tooling with Chainguard Images to make the solution more sustainable and production-ready, with built-in software supply chain security measures through our “undistro” Wolfi and tools like apko.
What's next?
If the “necessary knobs” and poor defaults we walked through are an area you are struggling with or want to learn more about, we can help. At Chainguard, we want to make doing the wrong thing hard for developers and the right thing the default for security. Historically, as we discussed above, poor defaults and tooling can create tension between what’s best for security and developer velocity. Developers will look for loopholes and other ways around security mandates to unblock themselves and ship software faster. At the same time, security teams often don’t fully understand the ins and outs of core developer tooling like containers or CI/CD pipelines, which are essential tools for developers to get their work done. To help bridge the gaps between developers and CISOs, we need to be building security into every point of the software development lifecycle by default. That’s core to our mission at Chainguard. Contact us to learn more.