
Wolfi’s upstream security inspection: Scanning with OpenSSF Scorecard

Eliza DiMarco, Chainguard Labs Research Intern

TL;DR


We evaluated the security of upstream Wolfi packages by running the OpenSSF Scorecard tool on 1,500+ GitHub repositories. Our analysis revealed:


  • The average score is 5.4 out of 10, and the distribution is bell-shaped (aka “normal”).


  • A 100x increase in GitHub stars is associated, on average, with a one-point increase in the project’s Scorecard score.


  • Almost all projects satisfied Scorecard’s most heavily weighted check (the “dangerous workflow” check), while checks with lower risk levels typically have much more room for improvement.


  • The repositories associated with Ruby and C packages have notably lower average scores: 4.8 and 4.7, respectively.


A first step toward improving the security of open source software is inspecting its current condition. In an effort to better understand the security of the upstream repositories that make up the Wolfi distribution, we used the Scorecard tool to evaluate the security posture of 1,511 GitHub repositories. Scorecard is a tool designed by the Open Source Security Foundation (OpenSSF) to help maintainers improve security practices and judge the safety of their dependencies.


How does Scorecard work?


Scorecard is an OpenSSF tool that enables developers to assess the risks that open source dependencies introduce. Scorecard performs 18 security checks; each check is scored from 0 to 10, weighted by severity, and combined into a single aggregate score.

For example, the most heavily weighted check, “Dangerous Workflow,” looks for dangerous code patterns in GitHub Actions workflows and is labeled “critical.” Less severe checks such as “Code Review” and “License” are labeled “high” and “low” and contribute less to the final score. A full list and explanation of Scorecard’s checks can be found in the Scorecard documentation.
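If you want to try this on a single repository, the sketch below shells out to the Scorecard CLI and prints the aggregate score alongside the per-check scores. It assumes the scorecard binary is installed and a GitHub token is exported as GITHUB_AUTH_TOKEN; the `score` and `checks` fields match the JSON report of recent Scorecard releases, though the exact layout may vary by version.

```python
# Minimal sketch: run the Scorecard CLI against one repository and read
# its JSON report. Assumes `scorecard` is on PATH and GITHUB_AUTH_TOKEN is set.
import json
import subprocess


def scorecard_report(repo_url: str) -> dict:
    """Return the parsed Scorecard JSON report for a single repository."""
    result = subprocess.run(
        ["scorecard", f"--repo={repo_url}", "--format=json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


report = scorecard_report("github.com/ossf/scorecard")
print(f"aggregate score: {report['score']}")
for check in report["checks"]:
    # A score of -1 means Scorecard could not run the check conclusively.
    print(f"{check['name']:>25}: {check['score']}")
```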


How we gathered the data


Scorecard takes a URL to a GitHub source code repository as input. We collected these URLs by scanning the .yaml build definition for every package in Wolfi, which yielded 1,550 links to the packages’ upstream repositories, covering approximately 62 percent of the top-level packages in Wolfi.
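As a rough sketch of that collection step (not our exact script), the following assumes a local clone of the wolfi-dev/os repository, whose top-level .yaml files are the melange build definitions, and simply pulls every github.com URL it finds out of each file:

```python
# Sketch: extract candidate upstream GitHub URLs from Wolfi package YAMLs.
# Assumes a local clone of https://github.com/wolfi-dev/os in ./os, where each
# top-level .yaml file is one package's melange build definition.
import pathlib
import re

GITHUB_URL = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+")


def upstream_repos(wolfi_os_dir: str = "os") -> set[str]:
    """Return the unique github.com repository URLs mentioned in the YAMLs."""
    repos: set[str] = set()
    for yaml_path in pathlib.Path(wolfi_os_dir).glob("*.yaml"):
        text = yaml_path.read_text(errors="ignore")
        repos.update(m.group(0).removesuffix(".git") for m in GITHUB_URL.finditer(text))
    return repos


print(f"found {len(upstream_repos())} candidate upstream repositories")
```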


Finding #1: The average Scorecard score for Wolfi upstream is 5.4


The mean aggregate score of all scanned Wolfi packages was 5.4. See Figure 1 for a histogram of the Scorecard results over all 1,500+ repositories. The distribution appears to be bell-shaped, or “normal.”


Figure 1. Distribution of aggregate Scorecard scores of 1,511 Wolfi packages

Are these scores “high” or “low”? Though it's hard to say given the newness of this tool and style of analysis, past research suggests that these scores are typical. Historically, many open source projects tend to have Scorecard scores between four and six. The Wolfi upstream repositories’ scores are similar.


Finding #2: Popular packages have better security


We also assessed whether more popular packages, as measured by GitHub stars, are associated with higher Scorecard scores. See Figure 2 for a scatterplot of the relationship between the number of GitHub stars and the aggregate Scorecard score for all analyzed Wolfi upstream repositories. Note: The x-axis is on a logarithmic scale.


Figure 2. Plot of aggregate Scorecard score as a function of GitHub stars

Figure 2 suggests that a 100x increase in stars is associated, on average, with a one-point increase in the project’s Scorecard score. In other words, popular projects appear to be, on average, more secure.
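The “100x” framing comes from regressing score on log10(stars): a 100x increase is two units on that axis, so a fitted slope of about 0.5 corresponds to roughly one point per hundredfold increase in stars. Here is a minimal sketch of that fit, assuming a hypothetical wolfi_scorecard.csv with `stars` and `score` columns (not our actual data file):

```python
# Sketch: fit score as a linear function of log10(stars).
# `wolfi_scorecard.csv` (columns: stars, score) is a hypothetical stand-in
# for the per-repository data behind Figure 2.
import numpy as np
import pandas as pd

df = pd.read_csv("wolfi_scorecard.csv")
df = df[df["stars"] > 0]                    # log10 is undefined at zero stars

slope, intercept = np.polyfit(np.log10(df["stars"]), df["score"], deg=1)
print(f"score ~ {intercept:.2f} + {slope:.2f} * log10(stars)")
print(f"estimated change for a 100x increase in stars: {2 * slope:+.2f} points")
```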


Finding #3: Almost all projects implement the most critical check


The analysis also examined the extent to which each check is implemented. We were curious whether critical checks are implemented more consistently than checks labeled high, medium, or low. See Figure 3 for a graphical analysis.


Figure 3. Average scores of checks by severity. Note: the lines represent the averages of each group (critical, high, medium, low) of checks.

The only check associated with critical risk appears to be widely implemented. Checks with lower risk levels typically have much more room for improvement.
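To reproduce the grouping behind Figure 3, one can average each check’s score across repositories and then average those per-check means within each risk tier. The sketch below assumes a hypothetical long-format export (wolfi_checks.csv with repo, check, and score columns) and spells out only the risk labels mentioned in this post; the full check-to-risk mapping is in the Scorecard documentation.

```python
# Sketch: per-check averages, then averages per risk tier (as in Figure 3).
# `wolfi_checks.csv` (columns: repo, check, score) is a hypothetical export
# of the per-check Scorecard results for every scanned repository.
import pandas as pd

# Partial mapping covering only the checks named in this post; extend it with
# the remaining checks and risk labels from the Scorecard documentation.
RISK = {"Dangerous-Workflow": "critical", "Code-Review": "high", "License": "low"}

checks = pd.read_csv("wolfi_checks.csv")
checks = checks[checks["score"] >= 0]   # drop checks Scorecard marked inconclusive (-1)

per_check = checks.groupby("check")["score"].mean()
per_tier = per_check.groupby(per_check.index.map(RISK)).mean()
print(per_check.round(1))
print(per_tier.round(1))
```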


Finding #4: The average Scorecard score varies by language


We can also break the average scores down by programming language. Table 1 summarizes the results for the five most common languages in Wolfi.

Table 1. Average aggregate Scorecard score by language
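Assuming the per-repository results carry each package’s primary language (the `language` column below is hypothetical), the per-language averages behind Table 1 reduce to a simple group-by:

```python
# Sketch: average aggregate Scorecard score for the most common languages.
# Reuses the hypothetical wolfi_scorecard.csv, here with an added `language`
# column giving each upstream repository's primary language.
import pandas as pd

df = pd.read_csv("wolfi_scorecard.csv")
by_language = (
    df.groupby("language")["score"]
      .agg(["mean", "count"])
      .sort_values("count", ascending=False)
      .head(5)                              # the five most common languages
)
print(by_language.round(1))
```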

What does this analysis say about the security of Wolfi upstream?


This analysis suggests that the Wolfi upstream is neither obviously more nor obviously less secure than other open source ecosystems. It’s normal!


Future efforts could include trying to improve the Scorecard scores of select Wolfi upstream projects. This analysis helps shed light on areas to focus on, such as less popular projects and projects written in Ruby and C. Additionally, some checks are implemented relatively rarely and could benefit from more widespread adoption among Wolfi upstreams.


Please let us know if you’re interested in collaborating on such an upstream improvement project!
