Tuesday, 21 November, 2017 UTC


Summary

Last week, we released our first annual State of Open Source Security report. One of the discoveries the report mentions is that an analysis of around 433,000 sites found that 77% of them use at least one front-end JavaScript library with a known security vulnerability. This number mirrors the one we reported back in March, but thanks to Google Chrome’s Lighthouse now testing for vulnerable JavaScript libraries using Snyk, we can get much more thorough results.
Lighthouse data is collected as part of HTTP Archive, and the data is available for querying through BigQuery. As a result, we get to query Lighthouse audit data on a very large scale.
Looking at how many sites are vulnerable
The October 15th data (the most recent run available) on BigQuery contains data collected from 439,176 different URLs. After excluding URLs where Lighthouse was unable to run, or where the audit itself didn’t complete, we get a dataset of 418,112 different sites to query against.
The first question is how many of those sites carry known vulnerabilities. We can run a query against the reports to get that information:
SELECT
  JSON_EXTRACT_SCALAR(report, "$.audits.no-vulnerable-libraries.score") AS score,
  COUNT(0) AS volume
FROM
  [httparchive:har.latest_lighthouse_mobile]
WHERE
  report IS NOT NULL
GROUP BY
  score
HAVING
  score IS NOT NULL
ORDER BY
  score
The results are very much in line with our smaller-scale study back in March: 77.3% (323,132) of those sites failed the audit. In other words, 77.3% of those sites contain at least one client-side JavaScript library with a known security vulnerability. The new version of the HTTP Archive site will report on how this changes over time.
We can drill down further to see how many known vulnerabilities are being carried by those libraries:
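As a quick check on the arithmetic, the failure rate is the number of failing sites divided by the number of sites with a usable audit result. The sketch below uses only the two figures quoted above; the `failureRate` helper is a hypothetical name for illustration and is not part of the original analysis.

```javascript
// Hypothetical helper: share of audited sites that failed the
// no-vulnerable-libraries audit, as a percentage with one decimal place.
function failureRate(failed, total) {
  if (total <= 0) {
    throw new RangeError("total must be positive");
  }
  return Math.round((failed / total) * 1000) / 10;
}

// Figures quoted in the text: 323,132 failing sites out of 418,112 audited.
const rate = failureRate(323132, 418112);
console.log(rate + "%");
```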
SELECT 
  REGEXP_EXTRACT(JSON_EXTRACT_SCALAR(report, "$.audits.no-vulnerable-libraries.displayValue"), r'^\S*') AS knownVulnerabilities,
  COUNT(0) AS volume
FROM
  `httparchive.lighthouse.2017_10_15_mobile`
WHERE
  report IS NOT NULL
AND
  JSON_EXTRACT_SCALAR(report, "$.audits.no-vulnerable-libraries.score") = 'false'
GROUP BY
  knownVulnerabilities
ORDER BY
  CAST(knownVulnerabilities as int64)
It turns out that if you carry at least one known vulnerability, you likely carry more. 51.8% of vulnerable sites carry more than one known security vulnerability. While the majority of those sites carry one or two, the long tail is scary. 9.2% of sites carry libraries with a combined four or more known security vulnerabilities.
Which libraries are most often found to be vulnerable
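The query above relies on the audit's display string starting with the vulnerability count, which `REGEXP_EXTRACT(..., r'^\S*')` pulls out as the leading run of non-whitespace characters. A minimal sketch of the same extraction and grouping follows. The sample strings are invented for illustration, and the real `displayValue` wording may differ; `leadingCount` and `histogram` are hypothetical helper names.

```javascript
// Mirror of the query's REGEXP_EXTRACT(..., r'^\S*'): take the leading run of
// non-whitespace characters and parse it as an integer (null if not numeric).
function leadingCount(displayValue) {
  const token = displayValue.match(/^\S*/)[0];
  const n = Number.parseInt(token, 10);
  return Number.isNaN(n) ? null : n;
}

// Count sites per vulnerability total, like GROUP BY knownVulnerabilities.
function histogram(displayValues) {
  const counts = new Map();
  for (const value of displayValues) {
    const n = leadingCount(value);
    if (n === null) continue; // skip strings without a leading number
    counts.set(n, (counts.get(n) || 0) + 1);
  }
  return counts;
}

// Invented sample strings (assumption: the count comes first).
const sample = [
  "1 vulnerability detected.",
  "2 vulnerabilities detected.",
  "2 vulnerabilities detected.",
  "5 vulnerabilities detected.",
];
const result = histogram(sample);
console.log([...result.entries()]);
```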
Using the Lighthouse audit data, we can also get an idea of which libraries are most commonly found to be vulnerable.
First, we can query to see which libraries are detected most often—whether they are vulnerable or not. The following query grabs the ten most commonly found libraries:
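CREATE TEMPORARY FUNCTION getLibs(items STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // With the g flag, match() returns whole matches such as "name":"jQuery"
  // rather than the capture group, so strip the wrapper to keep only the name.
  try {
    return (items.match(/"name":"([^"]*)"/ig) || []).map(function (m) {
      return m.replace(/^"name":"|"$/ig, '');
    });
  } catch (e) {
    return [];
  }
""";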

SELECT library, COUNT(0) Volume
FROM (
   SELECT getLibs(JSON_EXTRACT(report, "$.audits.no-vulnerable-libraries.extendedInfo.jsLibs")) AS libs
   FROM `httparchive.lighthouse.2017_10_15_mobile`
)
CROSS JOIN 
   UNNEST(libs) AS library
GROUP BY library
ORDER BY Volume DESC
LIMIT 10
Library | Number of times detected | Adoption %
jQuery | 344,643 | 82.4%
jQuery UI | 83,075 | 19.9%
Modernizr | 63,122 | 15.1%
Bootstrap | 57,154 | 13.7%
yepnope | 41,537 | 9.9%
FlexSlider | 33,002 | 7.9%
Underscore | 17,633 | 4.2%
Google Maps | 14,312 | 3.4%
Moment.js | 14,038 | 3.4%
SWFObject | 13,521 | 3.2%
Unsurprisingly, jQuery tops the list. This is right in line with what we saw back in March, and what you would probably expect. No library yet has come close to reaching jQuery’s universal appeal. One caveat here: React is currently being underreported. Once the updated detection script has been pulled into Lighthouse, its numbers will increase (and the overall percentage of vulnerable sites will likely increase slightly as well).
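The adoption column appears to be each library's detection count divided by the 418,112 sites in the dataset. That denominator is inferred from the numbers rather than stated in the text, so treat the sketch below as a sanity check under that assumption; `adoptionPercent` is a hypothetical helper name.

```javascript
// Assumption: adoption % = detections / total sites with a usable audit.
const TOTAL_SITES = 418112;

// Detection counts copied from the first three rows of the table.
const detections = {
  "jQuery": 344643,
  "jQuery UI": 83075,
  "Modernizr": 63122,
};

// Hypothetical helper: percentage formatted to one decimal place.
function adoptionPercent(count, total) {
  return ((count / total) * 100).toFixed(1);
}

for (const [library, count] of Object.entries(detections)) {
  console.log(library + ": " + adoptionPercent(count, TOTAL_SITES) + "%");
}
```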
Now, let’s change it up and look at which libraries are found to be carrying known vulnerabilities.
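CREATE TEMPORARY FUNCTION getLibs(items STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // With the g flag, match() returns whole matches such as "name":"jQuery"
  // rather than the capture group, so strip the wrapper to keep only the name.
  try {
    return (items.match(/"name":"([^"]*)"/ig) || []).map(function (m) {
      return m.replace(/^"name":"|"$/ig, '');
    });
  } catch (e) {
    return [];
  }
""";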

SELECT library, COUNT(0) Volume
FROM (
   SELECT getLibs(JSON_EXTRACT(report, "$.audits.no-vulnerable-libraries.extendedInfo.vulnerabilities")) AS libs
   FROM `httparchive.lighthouse.2017_10_15_mobile`
)
CROSS JOIN 
   UNNEST(libs) AS library
GROUP BY library
ORDER BY Volume DESC
LIMIT 10
The top two names on this list are the same as on the previous one.
Library | Number of times found vulnerable | % of all instances of this lib detected
jQuery | 318,786 | 92.5%
jQuery UI | 74,486 | 89.7%
Moment.js | 10,245 | 73.0%
AngularJS | 7,609 | 84.8%
Handlebars | 3,129 | 60.7%
Mustache | 1,925 | 51.0%
YUI 3 | 559 | 40.3%
jQuery Mobile | 413 | 3.7%
Knockout | 407 | 19.6%
React | 181 | 10.2%
Looking at the percentages doesn’t paint a rosy picture. jQuery is by far the most popular library on the web, and 92.5% of the jQuery instances found in production carry a known security vulnerability. In fact, of the ten libraries most commonly found to be carrying a known vulnerability, six are vulnerable in the majority of instances found in production.
This is the case despite the fact that every one of the libraries on this list has versions available that do not carry these vulnerabilities.
Library | Oldest version with no known vulnerabilities | Release date
jQuery | 3.0.0 | June 2016
jQuery UI | 1.10.0 | January 2013
Moment.js | 2.15.2 | October 2016
AngularJS | 1.6.1 | December 2016
Handlebars | 4.0.0 | September 2015
Mustache | 2.2.1 | December 2015
YUI 3 | 3.10.3 | June 2016
jQuery Mobile | 1.2.0 | October 2012
Knockout | 3.0.0 | October 2013
React | 0.14.0 | October 2015
Each of the front-end libraries most commonly found to be vulnerable has had a version free of known vulnerabilities available for roughly one to five years. The reality is that front-end libraries and frameworks often don’t get updated after they hit production.
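The range quoted above can be checked against the release dates in the table. The sketch below counts whole months from each release month to this post's date (November 2017). Using the post date as the reference point is an assumption, and `monthsSince` is a hypothetical helper name.

```javascript
// Reference point: this post's date (assumption: month granularity is enough).
const REFERENCE = { year: 2017, month: 11 };

// Release months copied from the table above.
const safeReleases = [
  { library: "jQuery", year: 2016, month: 6 },
  { library: "jQuery UI", year: 2013, month: 1 },
  { library: "Moment.js", year: 2016, month: 10 },
  { library: "AngularJS", year: 2016, month: 12 },
  { library: "Handlebars", year: 2015, month: 9 },
  { library: "Mustache", year: 2015, month: 12 },
  { library: "YUI 3", year: 2016, month: 6 },
  { library: "jQuery Mobile", year: 2012, month: 10 },
  { library: "Knockout", year: 2013, month: 10 },
  { library: "React", year: 2015, month: 10 },
];

// Hypothetical helper: whole months between a release and the reference month.
function monthsSince(release, reference) {
  return (reference.year - release.year) * 12 + (reference.month - release.month);
}

const ages = safeReleases.map((r) => monthsSince(r, REFERENCE));
const newest = Math.min(...ages);
const oldest = Math.max(...ages);
console.log("Fixed versions have been available for " + newest + " to " + oldest + " months");
```

By this measure the most recent fix in the table (AngularJS 1.6.1) has been out for a little under a year and the oldest (jQuery Mobile 1.2.0) for just over five.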
Reason for Hope
The picture is a bit grim right now—there’s no way to deny it. While this data doesn’t mean that all 77% of these sites are exploitable (it’s possible they could be avoiding the vulnerable methods), that’s small consolation. That’s 77% of sites that are one developer making one method call away from being vulnerable. As we’ve seen in 2017, open-source vulnerabilities need to be taken very seriously.
But there’s also a bright side. While there are a large number of vulnerabilities in production, those vulnerabilities have been addressed in the libraries themselves. Each of the major libraries has versions available that are free of known security vulnerabilities—we just need to get them into production.
To get to a better situation, we need a few things to happen. The first is improved tooling and tooling adoption. According to our State of Open Source Security survey, 38% of people using open-source don’t use any sort of automated tools to help keep their packages up to date. I am willing to wager that if you were to look specifically at front-end JavaScript usage, you would see even lower adoption.
That number should improve. Improvements to npm and Yarn have made front-end package management much simpler for developers. Pairing a solid package management workflow with tools like Snyk, which help you find, fix, prevent, and monitor vulnerabilities in those packages, will go a long way towards making the web more secure.
The second thing we need is an increase in the general awareness and understanding of the problem. That is why we published the State of Open Source Security report: to shed light on the challenges faced in securing open source and to help find ways we can improve.
Having the vulnerable libraries audit in Lighthouse (and Sonar) also helps. These tools make it much easier for developers to spot issues on the sites they build. And thanks to the HTTP Archive and BigQuery, we have easy-to-access data to help us see how the problem scales.
While the data right now isn’t encouraging, improved awareness and improved tooling make this a solvable problem for the future.