Conquering your Build Horizon
One indicator of good production hygiene is the “freshness” of deployed software. Stale software metastasizes into technical debt and can ultimately become a source of vulnerabilities. This applies in a range of contexts, including:
The age of your own software (in the spirit of continuous delivery).
The age of “Off-the-Shelf” (OTS) components (e.g. prometheus, flux, otel-collector, cilium).
The age of dependencies you compile into your software, such as base images.
In this post we talk about a practice, dubbed the “build horizon” at Google, that imposes a maximum age on build artifacts.
"Build horizon" refers to the maximum period a build artifact, such as a binary or a container image, is allowed to be used in production before it must be rebuilt. This practice ensures that software dependencies are regularly updated, mitigating risks from outdated libraries, security vulnerabilities, and other issues associated with long-lived builds.
Generally, our philosophy on dependencies is to embrace the Principle of Ephemerality and, wherever possible, to automate pulling in new dependencies through your standard production qualification process. For library dependencies, tools like dependabot are great. Our own Carlos Panato put together a GitHub action we call digesta-bot that sends us automated pull requests to update our image references (e.g. base images).
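The core of that kind of automation is simple: resolve each tag to the digest it currently points at and, when that differs from the pinned one, rewrite the reference and open a pull request. As a hedged illustration (digesta-bot’s actual implementation differs, and `resolve_digest` below is a stand-in for a registry lookup such as shelling out to `crane digest`), the rewrite step might look like:

```python
import re

# Matches pinned references like "cgr.dev/chainguard/static:latest@sha256:<64 hex chars>".
PINNED = re.compile(r"(?P<ref>[\w./-]+:[\w.-]+)@(?P<digest>sha256:[0-9a-f]{64})")

def bump_digests(manifest: str, resolve_digest) -> str:
    """Replace each pinned digest with the digest its tag currently resolves to.

    `resolve_digest` is a hypothetical callback standing in for a registry
    lookup (e.g. `crane digest <ref>`); it takes a tagged reference and
    returns a "sha256:..." string.
    """
    def repl(m: re.Match) -> str:
        current = resolve_digest(m.group("ref"))
        return f"{m.group('ref')}@{current}"
    return PINNED.sub(repl, manifest)
```

Run over a manifest file, any reference whose tag has moved gets its digest bumped; committing the diff and opening a PR is then ordinary CI plumbing.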
However, even with automation to help, things slip through! We recently discovered a leftover service running on one of our own “staging” clusters: we had renamed the service from “foo” to “bar”, and “foo” had never been cleaned up. This (and some fun new upstream features in sigstore/policy-controller) gave us the perfect excuse to put together the “build horizon” policy I had been itching to write, and gain some defense-in-depth against stale artifacts!
This policy works by accessing the container image’s “config” using the new fetchConfigFile functionality in sigstore/policy-controller. Let’s look at an example of such a “config” using crane:
# crane config cgr.dev/chainguard/static | jq .
{
  "architecture": "amd64",
  "author": "github.com/chainguard-dev/apko",
  "created": "2022-12-22T00:08:21Z",
  "history": [
    {
      "author": "apko",
      "created": "2022-12-22T00:08:21Z",
      "created_by": "apko",
      "comment": "This is an apko single-layer image"
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:6c107d6bd6dad5f936c4bd15e4842cb0766992681f9170fc4e888f3638654e1f"
    ]
  },
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
    ],
    "User": "65532"
  }
}
The container’s config contains a lot of interesting information, including the default entrypoint, user, and environment for launching the container. However, for this policy we are after the “created” timestamp shown above. When fetchConfigFile is specified, the input passed to the policy contains a field named config that maps each platform/architecture to its config JSON (linux/amd64 shown above). For example, in rego you would access the config above as input.config["linux/amd64"], or act on all architectures with input.config[_].
We favor rego over cue for this policy because it has better time functions. Leveraging the above, we can write the following to check that an image was built within the past 30 days:
package sigstore

nanosecs_per_second = 1000 * 1000 * 1000

nanosecs_per_day = 24 * 60 * 60 * nanosecs_per_second

# Change this to the maximum number of days to allow.
maximum_age = 30 * nanosecs_per_day

# isCompliant is what the cosign policy contract checks for.
default isCompliant = false

isCompliant {
  created := time.parse_rfc3339_ns(input.config[_].created)
  time.now_ns() < created + maximum_age
}
We can wrap this in a ClusterImagePolicy to control which images it applies to and how severely violations are treated:
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: maximum-image-age
spec:
  # This applies to all images, but you can tailor this to
  # match more specific patterns.
  images: [{ glob: "**" }]
  authorities: [{ static: { action: pass } }]
  # In warn mode, things won’t be blocked, but they will report
  # warnings back to kubectl and show up in Enforce.
  mode: warn
  policy:
    # This policy accesses the container image’s configuration.
    fetchConfigFile: true
    # We use rego (vs. cue) since it has better time functions.
    type: "rego"
    data: |
      package sigstore
      nanosecs_per_second = 1000 * 1000 * 1000
      nanosecs_per_day = 24 * 60 * 60 * nanosecs_per_second
      # Change this to the maximum number of days to allow.
      maximum_age = 30 * nanosecs_per_day
      default isCompliant = false
      isCompliant {
        created := time.parse_rfc3339_ns(input.config[_].created)
        time.now_ns() < created + maximum_age
      }
One “gotcha” with this policy is that it will always trip for naively built reproducible images, since most reproducible images use the Unix epoch as their timestamp. Take for example the Google distroless images, which suffered from this (until recently):
# crane config gcr.io/distroless/static | jq .
{
  "architecture": "amd64",
  "author": "Bazel",
  "created": "1970-01-01T00:00:00Z", # This is the unix epoch!
  "history": [
    {
      "author": "Bazel",
      "created": "1970-01-01T00:00:00Z",
      "created_by": "bazel build ..."
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:cb60fb9b862c6a89f92e484bc3b72bbc0352b41166df5c4a68bfb52f52504a7d"
    ]
  },
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
    ],
    "User": "0",
    "WorkingDir": "/"
  }
}
However, many reproducible build tools support an environment variable called SOURCE_DATE_EPOCH, which lets users align the artifact’s timestamp with the timestamp of the source commit it is built from.
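The mechanics are simple: a SOURCE_DATE_EPOCH-aware tool reads the variable (a Unix timestamp, e.g. from `git log -1 --format=%ct`) and uses it as the image’s created time, falling back to the epoch when it is unset. A small Python sketch of that convention (the `created_timestamp` helper is illustrative, not from any particular build tool):

```python
import os
from datetime import datetime, timezone

def created_timestamp(default_epoch: int = 0) -> str:
    """Derive an image "created" field the way SOURCE_DATE_EPOCH-aware tools do:
    honor the variable when set, otherwise fall back to the Unix epoch."""
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", default_epoch))
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

With the variable unset you get the 1970-01-01T00:00:00Z seen in the distroless config above; with it set to the commit time, builds stay reproducible while still carrying a timestamp a build-horizon policy can reason about.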
If you are interested in learning more about this topic, please reach out! We're happy to discuss.