Smatch Static Analysis Tool Overview, by Dan Carpenter

December 10, 2015 | 4 minute read
Text Size 100%:

This is a posting by Oracle Linux Kernel Engineer, Dan Carpenter.  In this, he provides an overview of Smatch, the C static analysis tool which he developed, and which he uses to test the mainline Linux kernel code for security bugs.

My job at Oracle is focused on finding security bugs in the Linux
kernel. My favorite type of bug is off by one bugs where the code says:

if (x > ARRAY_SIZE(table))

return -EINVAL;

The problem is that > should be changed to >= it so it says:

if (x >= ARRAY_SIZE(table))

return -EINVAL;

These are a one-line fix and an easy way for me to boost my patch
count. I have made over a hundred of these changes. In fact, I probably
have some kind of record for fixing the most off by one bugs! My record
breaking secret is that I use an open source, static analysis tool which
I wrote called Smatch.

Maybe the easiest way to understand how Smatch works is to download it
and play with it a bit. Here are the instructions:

You will first need
to install some dependencies such as the sqlite development packages for
C, BASH, Perl and Python. Also I would recommend installing the libXML,
gtk+2.0 and LLVM development packages as well but those are not
required. Then run the following commands:

git clone git://repo.or.cz/smatch.git

cd smatch

make

Now let's create a small test.c file:

#include "check_debug.h"
int var;
void function(void)
{

if (var < 0 || var >= 10)

return;

__smatch_implied(var);
}

The "check_debug.h" file can be included into any .c file. It is used
to display internal Smatch information which helps with debugging. If
you run `./smatch test.c`, then that prints the value of the "var"
variable.

test.c:9 function() implied: var = '0-9'

Smatch also tries to track some relationships between variables so let's
change our test.c file to look like this:

#include "check_debug.h"
int a, valid;
void function(void)
{

valid = 0;

if (a >= 0 && a < 10)

valid = 1;

if (a == -1)

__smatch_implied(valid);
}

With this code, since -1 is outside the 0-9 range, that means "valid" is
zero.

test.c:12 function() implied: valid = '0'

We could move the limit check into a separate function if we wanted:

#include "check_debug.h"
int is_valid_month(int month)
{

if (month < 1 || month > 12)

return 0;

return 1;
}
int var;
void function(void)
{

if (is_valid_month(var))

__smatch_implied(var);
}

It prints that valid values are 1-12 as expected.

Basically we're tracking the values of all the variables. The math
behind this is called flow analysis and the core part of Smatch is a
flow analysis engine. The flow analysis engine lets you track more
abstract things as well such as if a pointer has been freed or if it has
been dereferenced. It easy to hook into the Smatch flow analysis engine
and add more checks.

Since 2009, there have been around 3000 kernel bugs patched because of
Smatch warnings.
Most are minor bugs such as there might be an off by
one bug so the computer will crash when someone installs 256 graphics
cards. In that situation the programmer made a real mistake and we will
fix it, but it has no real world impact. Other times even minor
mistakes like returning a wrong error code can be serious, for example
in 6d97e55f7172 ('vhost: fix return code for log_access_ok()') we were
supposed to return zero on failure but instead we returned -EINVAL.
Since -EINVAL is non-zero, that meant access was granted when it was
supposed be denied.

The main complaint about every static analysis tool is that the rate of
false positives is too high. The problem in the Linux kernel is that
the developers fix all the real bugs and so only 100% false positives
remain. It's better to focus on new warnings from newly added code
because those are often real bugs. I try to discourage people from
changing the kernel code just to silence false positives. Changing the
code can be a good thing if it makes the code easier to understand but I
always tell people that Smatch is still improving so, hopefully, there
will be a way to silence the false positive by changing Smatch instead.

I always run Smatch on my patches before sending them to the kernel
maintainers and it saves me from embarrassing mistakes. The command to
do that is:

~/path/to/smatch/smatch_scripts/kchecker --spammy drivers/modified_file.c

Earlier I showed that Smatch can do cross function analysis. It does
analyze short functions inline, as you have seen, but to get the full benefit, you have to build the cross function database. It takes around
three hours. The command to is:

~/path/to/smatch/smatch_scripts/build_kernel_data.sh

Running that command creates a smatch_db.sqlite file. Then re-run the
kchecker script and it will use the new cross function database. Or if
you want to run Smatch over the whole kernel the command is:

~/path/to/smatch/smatch_scripts/test_kernel.sh

If you have any issues or suggestions feel free to email the list at
smatch@vger.kernel.org.

-- Dan


Dan Carpenter


Previous Post

Announcing the general availability of Oracle Linux 7.2

Michele Casey | 2 min read

Next Post


FRIDAY SPOTLIGHT: Oracle Linux 6.8 is now available

Michele Casey | 2 min read