The OWASP Top Ten 2017 category A9 (which became A6 in OWASP Top Ten 2021) is dedicated to using components with known vulnerabilities. To cover this category in PVS-Studio, developers have to turn the analyzer into a full SCA solution. How will the analyzer look for vulnerabilities in the components used? What is SCA? Let's try to find the answers in this article!
One of the priority areas for PVS-Studio development is to cover categories from the OWASP Top Ten 2017 in the C# analyzer. We also plan to cover the Top Ten 2021 in the future. The most unusual for us is the A9:2017 category - Using Components with Known Vulnerabilities. This category has the A6 position in the preliminary version of OWASP 2021. The rule implementation for this category is an important task for our analyzer. It allows us to classify PVS-Studio as an SCA (Software Composition Analysis) tool. Which approach to implementation should we choose? Let's figure it out!
Using Components with Known Vulnerabilities
The A9 threat category (It turned into A6 in the preliminary OWASP 2021 version) is dedicated to using components with known vulnerabilities. These are the components that have the corresponding entries in the CVE database. CVE (Common Vulnerabilities and Exposures) is a database of records about real-life vulnerabilities in software, hardware, service components, etc.
A9 is quite atypical from the point of view of its coverage in PVS-Studio. That's because the existing analyzer architecture is designed to search for errors in the code itself. The architecture uses syntax trees, semantic model, various technologies like data-flow analysis and others. These technologies were generally sufficient to implement diagnostic rules that would cover certain categories from the OWASP Top Ten 2017.
For example, on the base of the existing data-flow mechanism we implemented taint analysis and various related diagnostic rules:
Each of these rules searches for potential vulnerabilities in code and works by traversing a syntax tree. At the same time, they correspond to one or more OWASP Top Ten 2017 categories. You can find the full list of correspondences here.
The situation with A9 is completely different. From the point of view of C# projects, the rule implementation for A9 is a check of all the project dependency libraries for CVE. In other words, for each dependency, we need to check whether there is a corresponding entry in the CVE database.
This task goes far beyond the usual syntax tree traversal and the study of code semantics. However, we are determined to cover this category. Besides, it is very important that the implementation of the A9 rule would let PVS-Studio position the analyzer as an SCA solution.
Software Composition Analysis
In general, SCA tools are designed to check the project for problematic dependencies.
Another possible problem is the presence of vulnerabilities in the library. In the context of SCA, we are talking about known vulnerabilities — CVE. It's almost impossible to determine the use of a dependency that contains a non-recorded vulnerability :) It is not difficult to guess that if we use a library with a (publicly known) vulnerability, we can make a product vulnerable to various attacks.
Besides, using libraries whose maintenance was discontinued is a dangerous approach. Potentially, these dependencies also contain vulnerabilities. However, developers most likely don't know about them. Fixing such vulnerabilities is out of question — no one is going to do that.
SCA and PVS-Studio
We are gradually coming to the main question — how to implement the SCA functionality? First, we need to say that we are going to develop these features within the coverage of the A9:2017 category (Using Components with Known Vulnerabilities). Thus, we are going to search for dependencies with known vulnerabilities in the first place. However, the PVS-Studio analyzer already has diagnostic rules that warn developers about copyleft licenses:
It is possible that over time we will implement other SCA features.
Detecting components with known vulnerabilities consists of two parts. The first step is to obtain all (both direct and transitive) project dependencies and then search for the CVEs that match them. The first part of this plan seems simple. The second part, though, is more difficult.
At the moment, we plan to implement the specified functionality for the C# analyzer. It's easy to obtain the list of dependencies for a C# project. Roslyn helps us a lot — our analyzer is built on its base. To be more precise, the main factor is the use of the same build platform (MSBuild) and a compiler for all C# projects. At the same time Roslyn is closely related to MSBuild. This makes obtaining the dependencies list trivial.
Since the ecosystem of C++ and Java is much more diverse, obtaining the dependencies list is going to be more difficult. We'll do this another time :).
Well, we got the dependencies from the project. How do we understand which of them have vulnerabilities? Besides, we need to keep in mind that the vulnerability may be relevant only for specific library versions. Obviously, we need some kind of database, where the dependencies, versions, and the corresponding CVEs would be stored.
The main question of implementation: how to find (or, perhaps, create) a database that allows us to compare the available information about project dependencies with specific CVE? The answer on that question depends on the tools you use.
Using a CPE open database
The first option we have studied is the approach used in OWASP Dependency Check. The approach is simple — for each dependency, this utility searches for a corresponding identifier in the CPE (Common Platform Enumeration) database. In fact, the CPE database is a list with information about products, their versions, vendors, and so on. To implement SCA, we must obtain CPE and CVE correspondences. Thus, getting a CVE list is just searching for the corresponding entry in the CPE database.
You can find the CPE database and CVE compliance on the official website National Vulnerability Database. One of the ways to get the necessary information is to use the Rest API. It's described here. For example, the following query allows us to get the first 20 elements of the CPE database including corresponding CVEs:
Below is an example of CPE for ActivePerl:
"title": "ActiveState ActivePerl",
"vulnerabilities": [ "CVE-2001-0815", "CVE-2004-0377" ]
The most important part here is the "cpe23Uri" value. It contains important information for us in a certain format, and, of course, "vulnerabilities" (although they are not a part of the CPE list). For simplicity we read the "cpe23Uri" string as
According to the specification, a hyphen in place of one of the fragments means logical "NA" value. As far as I understand, this can be interpreted as "the value is not set". The "*" character put in place of a fragment means "ANY".
When we implement a CPE-based solution, the main difficulty is to find the right element for each dependency. The problem here is that the library name (obtained when we parsed the project links) may not match the corresponding CPE entry. For example, the CPE list has entries with the following "cpe23Uri":
After processing the entries, the analyzer concludes that they are all related to various versions of a product with the name "asp.net_model_view_controller" released by a company called Microsoft. All these entries correspond to a vulnerability with the CVE-2014-4075 identifier. However, the library in which the vulnerability was discovered is called "System.Web.Mvc". Most likely we'll get this name from the list of dependencies. In CPE, the name of the product is "Microsoft ASP.NET Model View Controller".
Besides, we need to take into account the vendor, whose identifier is an integral part of the CPE entries. There are also problems with this – the actual dependency does not always provide the necessary information in any form suitable for parsing. Not to mention the compliance of this information with any entry from CPE.
You can guess that the similar problems arise with the library version.
Another problem is that many records in the database are not relevant when we look for matches. Let's take as an example the entry given at the beginning of this section:
ActivePerl is a distribution of the Perl language from ActiveState. The probability that something like this would be a dependency on a C# project... Well, is low. There are a lot of "unnecessary" (in the context of analyzing C# projects) entries. It's hard to say how we can teach the analyzer to distinguish them from the useful ones.
Despite the mentioned problems, the CPE-based approach can still be effective. Its implementation should be much trickier than a simple pair of string comparison. For example, the OWASP Dependency Check works in an interesting way. For each dependency, this tool collects evidence strings that can correspond to the vendor, product and version values from the desired CPE.
Using GitHub Advisory
We found another approach to searching for CVEs. We investigate GitHub Advisory to find the entries that correspond to the dependency we need to check. GitHub Advisory is a vulnerability database (CVE) discovered in open source projects that are stored on GitHub. The full list of positions is available here.
After we got acquainted with CPE, we understood that the method of recording data is extremely important when we choose the data source. We have to admit that in this case GitHub Advisory is much more convenient than CPE. Perhaps, this database was originally created for being used by various SCA tools. Anyway, various solutions like GitHub SCA and SCA by Microsoft use this database.
For programmatic access to GitHub Advisory, we need to use GraphQL. It's a powerful technology, but we must note that it's much easier to understand Rest API. Nevertheless, worn out by GitHub's GraphQL Explorer, I finally managed to make a query that outputs almost what I wanted. Namely, it outputs a list of packages and corresponding CVEs. Here's one of the elements I received:
"vulnerableVersionRange": "< 5.8.4"
Obviously, I did not make the most optimal query, so I got a little extra information at the output.
If you're an expert in GraphQL, please write in the comments how you would construct a query that allows you to get a list of matches of this form: (package name, version) => CVE list.
Anyway, the query result clearly indicates the package name — the one that corresponds to this dependency in NuGet. The package name corresponds to CVE, and versions, for which the vulnerabilities are relevant. I'm sure that with a better understanding of this topic, we could easily create a utility that would automatically download all the necessary information.
We must say that selecting packages specifically for NuGet is a useful feature. In many cases (if not all) we would like to look for entries that correspond to a particular dependency among those packages. More specifically, we would like to do it without all the stuff for Composer, pip, etc.
Alas, but this solution has its flaws. At the time of writing this article, the GitHub Advisory had 4753 entries and only 140 NuGet packages. In comparison to the CPE database that contains more than 700 000 entries, this collection doesn't look that impressive. Although we must note that not all CPEs have corresponding CVEs. Plus the description implies that the GitHub Advisory database will contain information about vulnerabilities of GitHub-stored projects only. This awfully narrows the sample.
Nevertheless, the convenience of presenting vulnerabilities in this database at least makes us think about using it, if not as the main, then at least as one of the auxiliary data sources.
Our own database
Powerful SCA tools, such as Black Duck and Open Source Lifecycle Management, form and use their own databases. These databases, judging by the description, contain even more information than the National Vulnerability Database. Obviously, such databases present information in the most convenient form for the relevant tools.
Working on this direction, we have to transform the public data found about vulnerable components into some form convenient for our analyzer. We only need to find data convenient for such a transformation. Most likely, all SCA tools have their own databases of vulnerable components. However, not all of them contain information about vulnerabilities that are not in NVD or some other public source. One of the important distinguishing features of powerful SCA solutions is that they build their custom base that surpasses similar bases of other tools. Therefore, when working on the SCA implementation in PVS-Studio, we will take into account the need to expand our vulnerability base in the future.
Places where vulnerable components are used
It may seem that the implementation of the SCA functionality in PVS-Studio will require creation of something fundamentally new, without the possibility of using any of our existing developments. And, frankly, not in vain. The fact is that dependency analysis is a brand-new functionality and PVS-Studio doesn't have anything like this right now.
However, we have an idea how we can use the existing architecture to enhance our SCA implementation. Instead of simply making the analyzer trigger at the presence of a link to an unsafe library, we will try to look for its use in code. We have plenty of ready-made mechanisms for this :).
In my opinion, if the library is not even used, the analyzer still should warn about its presence among the dependencies. And if the library capabilities are somehow applied in the code, then the analyzer should issue a warning of the highest level. So far these are just thoughts.
As you see, we haven't decided what implementation approach to use. We haven't resolved some issues about it, too. For example: if we use a library with a vulnerability many times in project, should the analyzer issue a warning for each place of use? Or will the user drown in warnings? Should the analyzer issue one warning per file or should we simply raise the level if it detects the use of such library?
We have a lot of such questions about this solution. That's why we would like to know — how would YOU like to see SCA in PVS-Studio? How should an effective tool for finding problematic vulnerabilities work? What level should warnings have? Should we try to find other information sources about vulnerabilities? Should the analyzer trigger at transitive (indirect) dependencies?
Anyway, we are waiting for your comments. Thank you for your attention!