Choosing Secure Open Source Packages Part 1
Many developers don’t feel qualified to make security decisions. In many ways, that’s a perfectly healthy attitude to have: Security decisions are hard, and even folks with training make mistakes. But a healthy respect for a hard problem shouldn’t result in decisions that make a hard problem even harder to solve. Sometimes, we need to recognize that many architectural decisions in a project are security decisions, whether we like it or not. We need to figure out how to make better choices.
One thing we’ve studied a lot in the Intel® Open Source Technology Center is the types of open source packages developers want to use and what makes those packages good or not-so-good from a security perspective. Developers are not choosing bad packages because they are malicious or stupid; developers are choosing them because they have other metrics in mind. Some of these might be:
- Does it meet our needs now and will it do so in the future?
- Does it meet our licensing requirements?
- Is it small enough for our desired footprint?
- Is it what others in the industry are using?
- Does it have a good tutorial so we can get someone ramped up to integrate it quickly?
Security often takes a back seat to the plethora of things developers must weigh when making a package decision. Sometimes that’s because security seems less urgent at the beginning than other concerns, but it’s also true that many developers don’t know how to evaluate the security of a package.
While it’s hard to verify the secure behaviour of a library, it’s easy to do a really simple security risk assessment. This is especially true for open source packages, where it might only take a few minutes to learn about the community and look at the code. Learning to do this will let you take some basic security risks into account when you choose open source packages.
What does a simple security risk assessment for open source packages look like? Here are five steps to help you make better-informed decisions:
- Take a first look. Are there warning signs?
- Check the contributors/activity.
- Check how they handle security issues.
- Look at the test suite.
- Be aware of assumptions.
There is a scorecard at the end of this post with some suggested marking criteria to help those who prefer defined metrics. But first, let’s look at some more details about what each of those steps means.
Step 1: Take a first look
The first step is to take a quick glance at the project and do a sniff test: does this project have obvious problems? You don’t have to agonize over it; just take five or ten minutes to read a few of the project’s readmes and pages to see if any obvious warning signs jump out at you.
Not sure how to do that? Here are some key questions you might want to consider:
- Have you read the readme, first pages of the website, and other readily available introductory information?
- Does the code appear to be held to good software development standards?
- Is this code used professionally or is it a hobby project?
- Are there any signs that there are known issues with this code?
- Does this code solve a personal problem for the developer, or is it robust enough for other use cases?
- Is this code active or is it an abandoned archive?
- Are there any warning signs?
You might be thinking “I’m not a security expert; how am I possibly going to recognize a security warning sign?” Fortunately, for a simple risk assessment, the warning signs are pretty easy to learn. Many of the warning signs you want to spot should be warning signs for any software development, not specific to security-aware software development. You can see some examples of warning signs found in real open source projects in the next section.
Let's look at some warning signs
The developers tell you to use something else
Open source developers are often quite happy to be clear about any shortcomings in their libraries. It’s a cultural thing in some ways: people are sharing code because they hope others will find it interesting or useful, and they don’t want others to have a bad experience that could be avoided with a gentle warning. For example:
This developer is warning you that this library might have issues surrounding speed and security. Which probably means it’s not a great thing to rely upon for your privacy. So why was this library put out there at all? There might be several reasons:
- This could be an older library that was supplanted by a newer one.
- The person who wrote it might have written it as a learning exercise and wanted to share.
- The person who wrote it might have thought it was great when they wrote it, but has since learned a lot more about available alternatives.
- The library might have been created, and still be used, for research purposes.
********************************************DISCLAIMER************************************************* This code is reference software only and is not feature complete. It should not be used in commercial products at this time. Intel makes no claims for the quality or completeness of this code
Similarly, the warning above is pretty common: this code is provided to help you build your own project, but it’s not production-quality in and of itself.
If the developers have told you that their code has known issues and that alternatives are better, you should listen!
Code of dubious provenance
Sometimes, the warning signs will leave you wondering if the code you’re getting is really what you’re expecting. Here’s a few examples:
“I didn’t write this code but I like it.”
What does this even mean? Is the license of this code legitimate, or are you potentially exposing your project to a copyright lawsuit? Is this a dead copy that’s not being updated? Where did this code come from?
Another common thing we see is code from old repositories. Code.google.com, for example, was shut down in early 2016, so any code from there is either not being maintained or has been moved. There was an easy export to GitHub*, so most projects went there. Sourceforge* is also a common place where you might find unmaintained code. For security purposes, you want code that will get fixes from the upstream community, so if you find something that is not being maintained, you should first look to see if you’ve missed a maintained version. If you can’t find a version that is being maintained, you should probably find an alternative library.
Some sources will also let you know outright if the origin of some code is dubious. CPAN*, as seen in the screenshot above, lets you know if code was not signed by the original author. CPAN notes that this can mean any of the following:
- A co-maintainer uploaded a new release, but because of an oversight wasn’t granted permission on one of the modules. This often happens with distributions that have a different release manager each cycle.
- Someone without co-maintainer permissions forked the distribution and uploaded it.
- An author makes a new release with a new namespace without realizing that namespace is taken by another author.
Without knowing what happened, it’s quite possible that you could download malicious code masquerading as a formerly known and trusted package. If you don’t know for sure where the code comes from, there is something wrong. Those packages might be fine, but you should do some more research to be sure that you aren’t using some weird unsupported fork. Do the legwork or find an alternative package that doesn’t have such a dubious provenance and save yourself the headache entirely! You should feel reluctant to use something in this category.
Code of dubious quality
“CryptoJS is a project that I enjoy and work on in my spare time, but unfortunately my 9-to-5 hasn’t left me with as much free time as it used to. I’d still like to continue improving it in the future, but I can’t say when that will be.”
“Opencsv was developed in a couple of hours by Glen Smith.”
Here are two examples of how a developer might let you know that you’re looking at a hobby project rather than something that is supported. In the first quote, you can see the dev is letting you know that they have limited time for the project. This means that if an issue is found, you cannot count on a fast response and a fix being available in a reasonable time. In the second, it’s not a sign of professional work with good error checking and testing if the code was “developed in a couple of hours.” It might be fine depending on what’s been done since those first few hours, and it might be fine if you’re only parsing .csv files that you generated yourself, but you’ll need to do more research to make sure that the project matches your expectations.
Some code is very up-front about known security flaws or other issues that make it not ideal to use. One example is:
“[This code is] slower and more subjective to side-channel attacks by nature.”
Sharing code with known problems and vulnerabilities is a great way to do research, but is it something that should be part of your released software?
“This project was made as a “proof of concept” demonstration of how to detect apps on an iOS device, from early 2011. Since then, it has been extensively used in many apps, to the point where Apple made the decision to ban the excessive use of - canOpenURL:, the method which iHasApp relies upon to determine app installation. As a result, using a list of URL schemes for app detection is no longer a viable method.”
Or there’s this one. This was a library to help you determine what other apps might be installed on a device. It was so popular and such a violation of privacy that Apple banned the method they used.
Many open source developers are quite willing to be up-front about known flaws or issues that might affect users of their code. You should be sure to read and take this sort of information into account before using an open source package, as the flaws could easily affect your final product.
Very old code
Sometimes, you’ll come across code that is just really old as far as software goes:
This code was last released in 1999. Remember that security tools and attacker exploit suites have come a very long way in the past few years, so attacks that were formerly too difficult to pull off are now often within reach even for “script kiddies” with little security knowledge. If the code you’re looking at is designed for older systems, and especially if it hasn’t been updated in years, you may want to proceed with caution.
Red flag words
“cJSON aims to be the dumbest possible parser that you can get your job done with.”
This one might not be so obvious. But let me tell you about some sneaky things that security folk hate seeing:
- “Elegant” means, to a security person, “We didn’t handle any edge cases.”
- “Lightweight” means, “We cut out all the input validation.”
- “Fast” means, “We cut out all the error checking.”
- “We wrote a new parser” means all of the above.
There are certain types of code, such as parsers, that often handle external input. These are among the most common ways that attackers get an initial foothold into your system. If you’re parsing your own generated output, using something small is fine, but if you parse any kind of external output you probably don’t want “the dumbest possible parser.” Instead, you want something robust, well-tested, and mature. So if you see code that touts how fast, lightweight, or elegant it is, proceed with caution and make sure that it has a robust test suite and that it handles errors properly. We will discuss test suites more in Step 4.
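To make the contrast concrete, here is a sketch (in Python, with hypothetical field names) of what “robust” looks like next to “the dumbest possible parser” when the input is untrusted:

```python
import json

def parse_user_record(raw: bytes, max_len: int = 1 << 16) -> dict:
    """Parse an untrusted JSON payload defensively.

    The "dumbest possible parser" would just call json.loads(raw) and
    index into the result. Here we validate size, encoding, shape, and
    the one required field before trusting anything. The "name" field
    and the size limit are illustrative choices, not a real schema.
    """
    if len(raw) > max_len:                       # reject oversized input
        raise ValueError("payload too large")
    try:
        data = json.loads(raw.decode("utf-8"))   # reject bad encodings / bad JSON
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"malformed payload: {exc}") from exc
    if not isinstance(data, dict):               # reject unexpected shapes
        raise ValueError("expected a JSON object")
    name = data.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("missing or invalid 'name' field")
    return {"name": name}

parse_user_record(b'{"name": "alice"}')   # → {'name': 'alice'}
```

The naive version accepts anything that happens to be valid JSON; the defensive version turns every bad input into one predictable, catchable error instead of a surprise deep in your program.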
Step 2: Check the contributors and activity
After doing a quick read-through of the front of the website and the readme file, the next thing to do is to take a look at the contributors and activity of the project.
Some key questions to ask are:
- How many contributors are active and significant?
- Are the key maintainers doing this as part of a job or a hobby?
- Is this code actively maintained, or is it abandoned?
- How many checkins were there in the past year?
- Are issues being fixed on a regular basis?
- Who is doing the code reviews?
- Who takes over if the main maintainer gets sick?
You’re not going to answer all of these questions in five minutes, but thankfully, common open source tools can give you really quick answers for a few of them.
For example, let’s look at a GitHub project:
On the right, below the short description of the project, you can see that this project has 61 contributors. Sounds great, right? But it’s important to know how many of these contributors are active and significant. Click on the number of contributors to display some contributor graphs:
This view lets you see that really, most of this project came from a single person, and that he hasn’t been seriously active since around 2012. This project suddenly looks a lot less great!
Remember that some of the questions listed above are subjective. For example, what does it mean to be a significant contributor for a project? For some projects, it might mean that a contributor has more than 50 contributions; for others it could mean more than 500.
In the graphs for CIAO*, a relatively young project started in 2016, you can see multiple active significant contributors. In this case, significant contributors have around 200 commits.
On the other hand, being a significant contributor to django* would take more than a thousand commits. This project is also old enough for you to see from the graphs where older contributors might have moved on and where new contributors have taken up maintenance and continued development, but (although it’s not evident from this screenshot) django is still a project with many active and significant contributors.
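If you want numbers rather than eyeballing the graphs, GitHub’s public API exposes the same contributor data. Here is a hedged sketch (stdlib Python only; the 10% “significant” threshold is an arbitrary choice of mine, and unauthenticated API requests are rate-limited):

```python
import json
import urllib.request

def summarize_contributors(contributors, significant_ratio=0.1):
    """Given GitHub's /contributors JSON (a list of dicts with a
    'contributions' count), report how concentrated the work is.
    'Significant' here means at least significant_ratio of the top
    contributor's commit count -- an arbitrary, adjustable threshold."""
    counts = sorted((c["contributions"] for c in contributors), reverse=True)
    total, top = sum(counts), counts[0]
    significant = [n for n in counts if n >= top * significant_ratio]
    return {
        "contributors": len(counts),
        "significant": len(significant),
        "top_share": round(top / total, 2),   # fraction of commits by #1
    }

def fetch_contributors(owner, repo):
    # Unauthenticated requests are rate-limited; fine for a spot check.
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Canned example: a project where one person wrote almost everything.
sample = [{"login": "alice", "contributions": 950},
          {"login": "bob", "contributions": 30},
          {"login": "carol", "contributions": 20}]
print(summarize_contributors(sample))
# → {'contributors': 3, 'significant': 1, 'top_share': 0.95}
```

A `top_share` near 1.0 is the numeric version of the graph above: lots of listed contributors, but effectively a one-person project.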
Step 3: Check how they handle security issues
Now that you’ve taken that first look at the project and seen if the project has active contributors, the next thing to ask is, “How does the project handle security issues?” Again, this is a hard question to answer with nuanced detail, but we don’t need much detail for an initial security screen. For this first pass, you want to know if the project fixes vulnerabilities at all and if they have a plan for handling problems in the future.
Here are some key questions you might ask:
- Is there a clear way to report security vulnerabilities?
  - An ideal procedure should involve a way to keep the vulnerability secret until a fix is found.
  - Typical good solutions include sending an email to a special security mailing list or filing a bug with a special “security” flag in the bug tracker.
  - If there is no way to report security issues specifically, assume the project has not thought about it (this is a bad sign).
- Is there evidence that vulnerabilities are fixed in a timely manner?
- Is there any explanation of what happens when a security issue is reported?
Most groups that have vulnerability-handling procedures make them pretty obvious, because the goal is to clearly explain how to submit vulnerabilities. If you wanted to find Intel’s procedure, for example, you could just search “Intel report vulnerabilities” and find a link to the Intel security center page on vulnerability handling guidelines. Intel’s page is an example of what we want to see.
Another good example from an open source foundation is the security guidelines for Apache*. Apache also has a page explaining exactly what happens after a vulnerability is filed that explains the full procedure very well.
Many smaller projects won’t have any information about how to handle vulnerabilities. In general, this is a sign of a project’s immaturity with respect to security. It doesn’t mean that the project is insecure, but it does mean that the project is likely to be less secure than one that documents its vulnerability handling, because no one has yet taken the time to think about how to handle security issues correctly.
If there’s no policy at all, the simplest way to check how security issues are handled is to check the bug tracker:
Do a quick scan for open bugs labelled “security.” How many are there? Has anyone from the development team commented on them? In general, projects that are more security aware will have developers triaging potential security bugs faster than regular bugs. If that’s not what you’re seeing, it might mean that the project is lacking resources for security.
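This quick scan can also be automated against GitHub’s issues API. A sketch (stdlib Python; treating “zero comments” as a proxy for “nobody has triaged this” is my assumption, not an official metric):

```python
import json
import urllib.request

def security_issue_summary(issues):
    """Given GitHub issue JSON (a list of dicts with 'state' and 'comments'),
    count open security-labelled issues and, as a rough triage proxy,
    how many have never received a single comment."""
    open_issues = [i for i in issues if i.get("state") == "open"]
    untriaged = [i for i in open_issues if i.get("comments", 0) == 0]
    return {"open_security": len(open_issues), "untriaged": len(untriaged)}

def fetch_security_issues(owner, repo):
    # Unauthenticated requests are rate-limited; fine for a spot check.
    url = (f"https://api.github.com/repos/{owner}/{repo}/issues"
           "?labels=security&state=open&per_page=100")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Canned example: three open security issues, two never commented on.
sample = [{"state": "open", "comments": 0},
          {"state": "open", "comments": 4},
          {"state": "open", "comments": 0}]
print(security_issue_summary(sample))   # → {'open_security': 3, 'untriaged': 2}
```

A pile of open, never-commented security issues is exactly the “nobody is triaging” signal described above.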
Another thing you can look for on a second pass is whether the project is doing any proactive security work. Is the project looking for potential security bugs before they get reported or exploited by others? Examples of proactive security include things like fuzz testing, static analysis, or security audits. If a project does do proactive security work, a search for the project name and “fuzz” or “static analysis” will usually yield some posts from mailing lists when the project was setting it up (if it is indeed set up). Coverity* also offers free static analysis scans for open source projects, and you can search the list of projects using their tools here to see if the project you’re interested in takes part.
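If you’re curious what fuzz testing looks like at its simplest, here is a toy illustration (stdlib Python; real projects use coverage-guided fuzzers such as AFL, libFuzzer, or atheris, so this only conveys the idea):

```python
import json
import random

def smoke_fuzz(parse, rounds=2000, seed=0):
    """Throw random byte strings at a parser and check that it either
    succeeds or fails in a controlled way (ValueError), never with an
    unexpected exception. Returns any inputs that caused surprises."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(64)))
        try:
            parse(blob)
        except ValueError:
            pass                        # controlled rejection is fine
        except Exception as exc:        # anything else is a bug worth filing
            crashes.append((blob, exc))
    return crashes

def parse(blob):
    # Wrap json.loads so all expected failures surface as ValueError.
    try:
        return json.loads(blob.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(str(exc)) from exc

print(len(smoke_fuzz(parse)))   # → 0: the parser fails cleanly on garbage
```

A project that runs something like this (but coverage-guided, and for millions of inputs) is doing exactly the proactive security work you want to see.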
In Part 2 of this blog, we'll look at more guidelines and examples for how to choose secure open source packages. Remember, the first time you do a “simple” risk assessment like this, it probably won’t feel very simple. But with practice you can get a quick sense of a project from reading a few pages, looking at a few sources of information, and answering a few questions. A security expert with open source knowledge can give you much more accurate risk assessments, but not everyone has access to experts, and risk assessment is a good way to narrow down your field of choices while keeping security in mind.
- “Keys” by Jessica Paterson https://www.flickr.com/photos/modernrelics/1093797721
- “Nullarbor Warning Sign” by Chris Fitall http://www.flickr.com/photos/chrisfithall/14664168646