News, tips, partners, and perspectives for the Oracle Linux operating system and upstream Linux kernel work

Improving Application Security with UndefinedBehaviorSanitizer (UBSan) and GCC

This blog entry was provided by Diane Meirowitz

Introduction

The UBSan ("UndefinedBehaviorSanitizer") tool is a very useful, yet relatively unknown member of the GNU/Linux Toolchain family. This tool can improve the security of an application by efficiently detecting several types of errors in the source code. It is a run time tool that reports errors as the program executes.

In this article we introduce and discuss the features of UBSan, we explain how to use it, and we provide some tips and tricks showing how to get the maximum benefit from this tool.

What is Undefined Behavior and How Does it Impact Application Security?

What is undefined behavior? Language specifications often do not define what should happen when code performs an operation outside the expected range of values. For example, the C specification does not say what the result should be if an array is indexed with an out-of-bounds value, or what should happen if the shift amount in a bitwise shift is greater than or equal to the width of the expression being shifted. Since the result for these cases is undefined, the compiler is free to generate any code that produces the correct result when the values are within the correct range, and to ignore the possibility of incorrect values. This means that different compilers may handle these situations differently, and in general the result of undefined behavior is unpredictable. The code may "work" with one compiler on certain hardware and not with another combination, or it may "work" without optimization and fail with optimization, or vice versa. These situations generally point to an undefined behavior bug in the code.

Undefined behavior can impact security in many ways. A buffer overflow can lead to exposure of personal information or even allow an attacker to take over a machine. As the OWASP Buffer Overflow page explains, "Writing outside the bounds of a block of allocated memory can corrupt data, crash the program, or cause the execution of malicious code." For example, an attacker could take advantage of a buffer overflow in a stack variable to overwrite the function return address, causing the function return to execute the attacker's code. A related issue is a buffer overflow read. A notorious example of this is the OpenSSL Heartbleed bug, described on the OWASP Heartbleed Bug page, which could expose passwords and other private data.

Example of Security Impact of Undefined Behavior

Buffer overflow is a well-known undefined behavior error, but there are many more types of undefined behavior, and all can compromise a program's security. An example is a Denial of Service (DoS) attack, which occurs when a service is made unavailable due to crashing, hanging, etc. One way this can happen is with signed integer overflow, which may cause a variable to be assigned an unexpected negative value. This could result in nontermination of a loop because the exit condition is never true. As a result, the program/server/site will hang and not be available to users.

Here is an example of an integer overflow that results in an infinite loop:

for (int i = 0; i <= INT_MAX; i++) {
   /* Do something with i... */
}

The loop variable "i" is incremented until it reaches 2147483647 (INT_MAX). Incrementing it once more produces a value that the int data type cannot represent. This signed integer overflow is undefined behavior; in practice "i" typically wraps around to a negative value, so the condition i <= INT_MAX is always true and the loop never terminates. Fortunately, UBSan catches this error and reports "signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'".
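
One way to visit every value up to and including INT_MAX without ever evaluating INT_MAX + 1 is to test for the last value inside the loop body. The following is a minimal sketch; the helper name count_through_int_max and its counting body are illustrative assumptions, not part of the original example.

```c
#include <limits.h>

/* Hypothetical helper: visits every int from start through INT_MAX,
 * inclusive, and returns how many values were visited. */
long long count_through_int_max(int start)
{
    long long visited = 0;

    for (int i = start; ; i++) {
        visited++;              /* stands in for "do something with i" */
        if (i == INT_MAX)       /* leave before i++ could overflow */
            break;
    }
    return visited;
}
```

Because the exit test runs before the increment, the expression i++ is never evaluated when i already holds INT_MAX, so there is nothing for UBSan to report.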

What is UBSan?

This tool detects a number of run time undefined behavior errors, including:

  • An array index that is out of bounds
  • Some cases of buffer overflow
  • Integer and pointer arithmetic overflow
  • A shift whose count is negative or too large, or whose result cannot be represented
  • A missing return statement
  • Plus several more.

How Does UBSan Work?

UBSan is intended to be used as part of the development cycle. There are two steps. First the application is compiled with flags that enable the instrumentation to check for errors. Next, the program is executed. At that point, run time errors may be flagged. It is good practice to run a variety of tests to catch as many errors as possible. Adding UBSan instrumentation slows down programs by around 2 to 3x, which is a small price to pay for increased security.

How is UBSan Different from Lint and ASan?

UBSan is different from the well-known lint tool in that it detects errors while the program executes. In contrast, lint performs a static analysis. While static analysis is very useful and should definitely be done as part of the development cycle, it is not possible for lint to catch most errors that depend upon values computed during the execution of the program.

UBSan's array bounds checking has some overlap with ASan's (AddressSanitizer) buffer overflow checking, but in general these tools detect a different set of errors. ASan checks whether memory references are valid memory accesses, whereas UBSan does true array bounds checking for variables defined as arrays. In some cases, both may flag the same error, but array bounds checking can detect errors using an index that is out of bounds for the declared type, but is still within the range of memory that belongs to the process.
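
To make the last point concrete, the sketch below uses a hypothetical struct record with two adjacent arrays and checks, with offsetof arithmetic only, whether a given index into the first array would still land inside the same object. Index 5 is out of bounds for the declared type 'int [4]', yet the address it refers to is ordinary, valid memory belonging to the struct. No out-of-bounds access is actually performed here; the type and function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record type: two adjacent arrays inside one object. */
struct record {
    int scores[4];
    int flags[4];
};

/* Returns true when scores[idx] would lie entirely inside the bytes of
 * a struct record, even though idx may exceed the declared bound of 4. */
bool lands_inside_record(size_t idx)
{
    size_t end = offsetof(struct record, scores) + (idx + 1) * sizeof(int);
    return end <= sizeof(struct record);
}
```

An expression such as r.scores[5] is the kind of access that a check on the declared array type can flag, while a check based purely on whether the address is valid has no reason to complain.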

In addition, UBSan detects many other errors, such as integer overflow, and incorrect shifts, that ASan does not cover, whereas ASan detects freed memory references, memory leaks, use after return, and use after scope which UBSan does not cover.

For thorough checking of a program, it is recommended to separately run both ASan and UBSan, as well as static checkers such as lint. For multi-threaded programs, run ThreadSanitizer (TSan) also.

Example Using UBSan

Below is an example using UBSan. This program incorrectly accesses the array using a negative index:

int main () {
   int my_array[10];
   return my_array[-1];
}

UBSan is integrated with recent versions of the gcc compiler. When compiling and linking, add the -fsanitize=undefined option to instruct the gcc compiler to insert instrumentation that checks for undefined behavior. We recommend using at least optimization level -O1 to increase the chance of detecting errors. The options -fno-omit-frame-pointer and -g enable UBSan to reliably display the call stack when using the print_stacktrace option.

This results in the following compile command to use UBSan:

% gcc -fsanitize=undefined -g -O1 -fno-omit-frame-pointer simple.c

Before executing this test program, we set the environment variable UBSAN_OPTIONS to indicate that we want a stack trace for each error. Note that UBSan does not abort when encountering an error.

% setenv UBSAN_OPTIONS "print_stacktrace=1"
% ./a.out
simple.c:3:15: runtime error: index -1 out of bounds for type 'int [10]'
#0 0x4005fa in main simple.c:3
#1 0x7f99a329a544 in __libc_start_main libc-start.c:266

More examples of run time errors detectable by UBSan can be found in Section "Interpreting and Fixing Errors Detected by UBSan".

If you do not have a recent version of gcc installed on your system or you have gcc, but not UBSan installed, see Appendix B for information on how to download and enable it. If UBSan is not installed, you might get a message such as ld: cannot find -lubsan.

Compile Time Options for UBSan

To enable UBSan instrumentation, use the following gcc options:

   -fsanitize=undefined -O1 -g -fno-omit-frame-pointer

It is important to use at least optimization level -O1 because it appears that some errors are not detected without optimization. If a stack trace is not needed, -g and -fno-omit-frame-pointer can be skipped.

Make sure to specify -fsanitize=undefined to BOTH the compile and the link line.

See Appendix A for a list of all UBSan suboptions. These can be useful to disable certain checks.

Run Time Options for UBSan

There are two ways to specify run time options for UBSan: an environment variable or a special function. Both methods can be used, but the environment variable overrides any options set in function __ubsan_default_options(). If you want to specify more than one option, separate each option with a colon as shown below.

Option 1 - Define the environment variable UBSAN_OPTIONS

Set UBSAN_OPTIONS prior to executing the program. For example, using bash:

% export UBSAN_OPTIONS="print_stacktrace=1:log_path=./ubsan_errs"
% ./a.out

Option 2 - Define the function __ubsan_default_options() and link it with the code

Here is an example:

const char *__ubsan_default_options(void) {
   return "print_stacktrace=1:log_path=./ubsan_errs";
}

The latest list of UBSan-specific run time options can be found here. More run time options common to UBSan and the other sanitizers can be found here.

Note that with older versions of gcc some flags may not be available yet.

Here are some useful UBSan options:

  • print_stacktrace=1 requests a stack trace to be emitted for errors, default is 0.

  • log_path=your_pathname names a file to which errors are written instead of stderr. A period and the current process ID are appended to the name.

  • strip_path_prefix=your_path_prefix strips the given file path prefix from file names in stack traces.

  • halt_on_error=1 requests UBSan to abort the program after the first error, default is 0.
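
As a sketch of how these options might be combined, the function below builds in defaults suited to an automated test run: a stack trace for each error, an abort on the first error, and shorter file names in traces. The path prefix /home/build/src/ is a made-up placeholder; substitute the root of your own source tree.

```c
/* Assumed defaults for automated test runs. The strip_path_prefix value
 * is a hypothetical placeholder, not a path from the original article. */
const char *__ubsan_default_options(void)
{
    return "print_stacktrace=1:halt_on_error=1:"
           "strip_path_prefix=/home/build/src/";
}
```

Any option given in the UBSAN_OPTIONS environment variable still takes precedence over these built-in defaults.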

Interpreting and Fixing Errors Detected by UBSan

Sometimes UBSan error messages can be cryptic. Below are several examples with hints to determine what is wrong and how to fix the reported error. We recommend that every error report be carefully reviewed, as our testing has not yet produced even a single case of a false-positive error. It can take a while to figure out what exactly is wrong. Here we explain what to look for to fix these errors.

Error messages: Load of address with insufficient space, Store to address with insufficient space

These confusing messages usually indicate a buffer overflow read (for a load instruction) or a buffer overflow write (for a store instruction).

Example of buffer overflow read/write:

1  #include <stdlib.h>
2  int main () {
3     int *pointer = (int*)malloc(20*sizeof(int));
4     int x = pointer[22];  /******** buffer overflow read ********/
5     pointer[21] = x;      /******** buffer overflow write ********/
6     return x;
7  }

% gcc -g -O1 -fno-omit-frame-pointer -fsanitize=undefined insuff.c
% ./a.out
insuff.c:4:7: runtime error: load of address 0x00000236bc78 with insufficient space for an object of type 'int'
insuff.c:5:13: runtime error: store to address 0x00000236bc74 with insufficient space for an object of type 'int'

Variable "pointer" is indexed out of bounds on lines 4 and 5. At line 4, an attempt is made to read from an invalid address, while at line 5 a value is stored at an invalid address. No array index out of bounds errors are reported for this test because the variable "pointer" is not defined as an array.
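
A common way to avoid this class of error is to keep the element count next to the pointer and to check every index against it. The sketch below assumes a hypothetical struct int_buffer wrapper; none of these names come from the example above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper pairing a heap buffer with its element count. */
struct int_buffer {
    int *data;
    size_t len;
};

/* Allocates len zero-initialized ints; returns false on failure. */
bool int_buffer_init(struct int_buffer *buf, size_t len)
{
    buf->data = calloc(len, sizeof(int));
    buf->len = buf->data ? len : 0;
    return buf->data != NULL;
}

/* Reads element idx into *out; returns false instead of reading
 * past the end of the allocation. */
bool int_buffer_get(const struct int_buffer *buf, size_t idx, int *out)
{
    if (idx >= buf->len)
        return false;
    *out = buf->data[idx];
    return true;
}
```

With a 20-element buffer, a request for index 22 is rejected by int_buffer_get rather than turned into a load from an invalid address.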

Error message: Index out of bounds

An array is indexed outside of the valid range of its type definition. In C/C++, valid array indices run from 0 to size-1. Note that with this error you may get an "insufficient space" error message in addition to the index out of bounds error.

This error is illustrated in the following example, where the non-existent element "array[10]" is referenced:

1   int main () {
2      int array[10];
3      return array[10];
4   }

% gcc -g -O1 -fno-omit-frame-pointer -fsanitize=undefined aob.c
% ./a.out
aob.c:3:15: runtime error: index 10 out of bounds for type 'int [10]'
aob.c:3:15: runtime error: load of address 0x7ffd370e3268 with insufficient space for an object of type 'int'
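
Off-by-one indexing like this often comes from a loop bound written with "<=" or from a hard-coded size. A minimal sketch of one defensive habit is to derive the element count from the array itself; the ARRAY_LEN macro and the function name below are illustrative assumptions.

```c
#include <stddef.h>

/* Number of elements in a true array (not valid for pointers). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Fills a local 10-element array with 1..10 and sums it without
 * ever indexing past element 9. */
int sum_first_ten(void)
{
    int array[10];
    int total = 0;

    for (size_t i = 0; i < ARRAY_LEN(array); i++)
        array[i] = (int)i + 1;

    /* "<" rather than "<=" keeps the last index at size - 1. */
    for (size_t i = 0; i < ARRAY_LEN(array); i++)
        total += array[i];

    return total;
}
```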

Error message: Integer and pointer arithmetic overflow

In the following example the errors are in the expressions on the right hand side of the equal sign. These errors occur because C/C++ evaluates many integer expressions as an int, which is signed and generally 32 bit in size. For example, if "j" is an int, the expression "j + 1" has type int. To force the expression type to be unsigned long, use 1lu, instead of 1.

Even if the assignment destination variable is large enough to hold the result of the expression, an error will occur when the expression is evaluated, unless the expression is forced to be larger through an explicit cast. Unfortunately, assigning a value that is too large for the destination variable is not currently diagnosed by UBSan.

Fixing the error on line 5 is as simple as inserting (unsigned int) before argc, but make sure the destination is large enough (at least unsigned int) to receive the value.

1   #include <limits.h>
2   int main (int argc, char **argv) {
3      void *pointer = (void*)ULONG_MAX;
4      pointer += 2;                    /* pointer arithmetic overflow */
5      unsigned int i = INT_MAX + argc; /* integer overflow */
6      return i;
7   }

% gcc -g -O1 -fno-omit-frame-pointer -fsanitize=undefined overflow.c
% ./a.out
overflow.c:4:11: runtime error: pointer index expression with base 0xffffffffffffffff overflowed to 0x000000000001
overflow.c:5:28: runtime error: signed integer overflow: 1 + 2147483647 cannot be represented in type 'int'
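
The sketch below shows two ways of dealing with the integer overflow case. The first performs the addition in unsigned arithmetic, which is the effect of the cast described above. The second checks in advance whether a signed sum is representable. Both function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds in unsigned arithmetic, which wraps around instead of
 * overflowing, so the expression itself is well defined. */
unsigned int add_as_unsigned(int a, int b)
{
    return (unsigned int)a + (unsigned int)b;
}

/* Adds two ints only when the sum is representable as an int;
 * otherwise returns false and leaves *sum untouched. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

Which of the two is appropriate depends on whether wraparound is an acceptable result for the program or an error that must be reported.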

Error message: Shift result cannot be represented

By default in C/C++, many integer expressions are evaluated as int, and 1 << 31 does not fit in the space allocated for an int, even though the result is assigned to an unsigned int. To force the expression to be unsigned int, cast the 1 or the 31 or both to unsigned int or use a type suffix, such as 1u or 31u.

1   int main (int argc, char **argv) {
2      unsigned int j = 1 << 31;
3      return j;
4   }

% gcc -g -O1 -fno-omit-frame-pointer -fsanitize=undefined shiftres.c
% ./a.out
shiftres.c:2:22: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
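
A minimal sketch of the fix, assuming that unsigned int is 32 bits wide as on the platforms discussed here (the function name is illustrative):

```c
/* Returns a mask with only bit 31 set. The 1u suffix makes the
 * shifted operand unsigned, so the result is representable. */
unsigned int top_bit_mask(void)
{
    return 1u << 31;
}
```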

Error message: Shift exponent is too large for type

The shift exponent refers to the shift count on the right of the shift operator, e.g., expression << exponent. The exponent must be smaller than the number of bits in the (promoted) type of the expression being shifted. This example shows the importance of operator precedence rules in C. It may be counterintuitive, but + and - have higher precedence than << and >>, meaning that the following expression is evaluated as 1 << (63 + argc), rather than the intended (1 << 63) + argc. The moral of this story is to always use parentheses (and UBSan!).

1   int main (int argc, char **argv) {
2      unsigned long j = (unsigned long)1 << 63 + argc;
3      return j;
4   }

% gcc -g -O1 -fno-omit-frame-pointer -fsanitize=undefined shiftexp.c
% ./a.out
shiftexp.c:2:39: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
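
The following sketch shows the parenthesized form, plus a small helper that rejects shift counts that are too large. It uses unsigned long long, which is at least 64 bits wide, instead of the unsigned long in the example; the function names are assumptions made for illustration.

```c
#include <stdbool.h>

/* Computes (1 << 63) + extra with explicit parentheses, so the
 * shift count stays at 63 regardless of the value of extra. */
unsigned long long high_bit_plus(unsigned int extra)
{
    return (1ull << 63) + extra;
}

/* Shifts 1 left by count bits, rejecting counts of 64 or more. */
bool checked_bit(unsigned int count, unsigned long long *out)
{
    if (count >= 64)
        return false;
    *out = 1ull << count;
    return true;
}
```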

Note that the data type of a shift expression is based on the type of the value to be shifted, not the exponent, and this can lead to unexpected errors. In the example below, "1 << exponent" is evaluated as int, even though both exponent and j are of type unsigned long. To fix this, use "1lu" or (unsigned long)1, rather than "1" in the shift.

1   int main (int argc, char **argv) {
2      unsigned long exponent = 63lu;
3      unsigned long j = 1 << exponent;
4      return j;
5   }

% gcc -g -O1 -fno-omit-frame-pointer -fsanitize=undefined shiftexp2.c
% ./a.out
shiftexp2.c:3:23: runtime error: shift exponent 63 is too large for 32-bit type 'int'
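
The rule can be checked without performing any shift at all, because sizeof does not evaluate its operand here. In this sketch (function names are hypothetical), the size of each shift expression follows the left operand, not the exponent:

```c
#include <stddef.h>

/* Size of "1 << exponent": the left operand is int, so the
 * expression has type int even though exponent is unsigned long. */
size_t width_of_int_shift(unsigned long exponent)
{
    return sizeof(1 << exponent);
}

/* Size of "1lu << exponent": the left operand is unsigned long,
 * so the expression has type unsigned long. */
size_t width_of_ulong_shift(unsigned long exponent)
{
    return sizeof(1lu << exponent);
}
```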

How to Disable One or More UBSan Checks

UBSan has an extensive set of gcc compile time suboptions, which makes it easy to disable checks that are not useful to a project. To do this, use the -fno-sanitize=suboption syntax for the particular suboption to be disabled. For example, -fno-sanitize=null disables null pointer checking. See Appendix A for the complete list of UBSan suboptions.

How to Disable UBSan for a Specific Function

It should be a very rare occurrence to disable UBSan for a function, but sometimes this may be needed. We recommend doing this only after exploring all other options because false positive errors almost never happen. Only very small functions should be bypassed so that the majority of the program benefits from undefined behavior checking. To disable UBSan for a function, the following attribute can be added to its definition:

__attribute__((no_sanitize("undefined")))
void myfunc() {
   ...
}

UBSan in an Operational Environment

At Oracle, we often use UBSan on large binaries on Oracle Linux and have found it to be invaluable to improve security. Despite regular testing of this code with both static and dynamic tools, we find that UBSan detects an entirely different set of errors from the other tools. Many of the examples above were derived from real errors that UBSan found during this ongoing testing.

Gotchas

  • Do not try to compile with both ASan and UBSan at the same time, even though it is possible to do so. Our experience is that the source code needs to be instrumented separately and different runs have to be conducted to detect all errors.
  • There is no -fsanitize=all option, because some sanitizers do not work together.
  • Make sure to specify -fsanitize=undefined to BOTH the compile and the link line.
  • Missing return statements are detected in C++ code only.
  • Use at least optimization level -O1 in order for all errors to be detected.
  • To get reliable stack traces, make sure to compile with both -g and -fno-omit-frame-pointer.

Appendix A - The gcc Compile Time Options for UBSan

Here are the UBSan suboptions in gcc release 10. All are enabled by default with -fsanitize=undefined unless otherwise specified below. The utility of having an extensive list of suboptions is that checks can be disabled using their -fno-sanitize counterpart. For example, -fno-sanitize=null disables null pointer checking.

The suboptions are also documented in the Program Instrumentation Options section of the Using the GNU Compiler Collection (GCC) guide.

-fsanitize=shift Enables checking that the result of a shift operation is not undefined. This option has two suboptions, -fsanitize=shift-base and -fsanitize=shift-exponent.

-fsanitize=shift-exponent Enables checking that the second argument of a shift operation is not negative and is smaller than the precision of the promoted first argument.

-fsanitize=shift-base If the second argument of a shift operation is within range, check that the result of a shift operation is not undefined.

-fsanitize=integer-divide-by-zero Detect integer division by zero as well as INT_MIN / -1 division.

-fsanitize=unreachable The compiler turns the __builtin_unreachable call into a diagnostics message call instead. When reaching the __builtin_unreachable call, the behavior is undefined.

-fsanitize=vla-bound Instructs the compiler to check that the size of a variable length array is positive.

-fsanitize=null Enables pointer checking. Particularly, the application built with this option turned on will issue an error message when it tries to dereference a NULL pointer, or if a reference (possibly an rvalue reference) is bound to a NULL pointer, or if a method is invoked on an object pointed to by a NULL pointer.

-fsanitize=return Enables return statement checking. Programs built with this option enabled issue an error message when the end of a non-void function is reached without actually returning a value. This option works in C++ only.

-fsanitize=signed-integer-overflow Enables signed integer overflow checking. UBSan checks that the result of +, *, and both unary and binary - does not overflow in the signed arithmetics. Note, integer promotion rules must be taken into account. The following is not an overflow because a++ is equivalent to a = a + 1, and a + 1 is evaluated as an int expression.

signed char a = SCHAR_MAX;
a++;

-fsanitize=bounds Enables instrumentation of array bounds. Various out of bounds accesses are detected. Flexible array members, flexible array member-like arrays, and initializers of variables with static storage are not instrumented.

-fsanitize=bounds-strict Enables strict instrumentation of array bounds. Most out of bounds accesses are detected, including flexible array members and flexible array member-like arrays. Initializers of variables with static storage are not instrumented.

-fsanitize=alignment Enables checking of alignment of pointers when they are dereferenced, or when a reference is bound to insufficiently aligned target, or when a method or constructor is invoked on insufficiently aligned object.

-fsanitize=object-size Enables instrumentation of memory references using the __builtin_object_size function. Various out of bounds pointer accesses are detected.

-fsanitize=float-divide-by-zero Detect floating-point division by zero. Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining the bit representations for Inf and NaN.

-fsanitize=float-cast-overflow Enables floating-point type to integer conversion checking. It is checked that the result of the conversion does not overflow. Unlike other similar options, -fsanitize=float-cast-overflow is not enabled by -fsanitize=undefined. This option does not work well with FE_INVALID exceptions enabled.

-fsanitize=nonnull-attribute Enables instrumentation of calls, checking whether null values are not passed to arguments marked as requiring a non-null value by the nonnull function attribute.

-fsanitize=returns-nonnull-attribute Enables instrumentation of return statements in functions marked with returns_nonnull function attribute, to detect returning of null values from such functions.

-fsanitize=bool Enables instrumentation of loads from bool. If a value other than 0/1 is loaded, a run-time error is issued.

-fsanitize=enum Enables instrumentation of loads from an enum type. If a value outside the range of values for the enum type is loaded, a run-time error is issued.

-fsanitize=vptr Enables instrumentation of C++ member function calls, member accesses and some conversions between pointers to base and derived classes, to verify the referenced object has the correct dynamic type.

-fsanitize=pointer-overflow Enables instrumentation of pointer arithmetics to check for overflow.

-fsanitize=builtin Enables instrumentation of arguments to selected builtin functions. If an invalid value is passed to such arguments, a run-time error is issued. E.g. passing 0 as the argument to __builtin_ctz or __builtin_clz invokes undefined behavior and is diagnosed by this option.

While -ftrapv causes traps for signed overflows to be emitted, -fsanitize=undefined gives a diagnostic message. This currently works only for the C family of languages.
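
As an illustration of the two cases named under -fsanitize=integer-divide-by-zero above (division by zero and INT_MIN / -1), the sketch below guards a division so that neither case is ever evaluated. The function name checked_div is an assumption made for this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Divides a by b only when the quotient is defined and representable;
 * otherwise returns false and leaves *quotient untouched. */
bool checked_div(int a, int b, int *quotient)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *quotient = a / b;
    return true;
}
```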

Appendix B - Installing gcc with UBSan on Oracle Linux

UBSan was first available in gcc 4.9 and it has been much improved since then. There are ongoing improvements and we recommend using the most up-to-date version of gcc to leverage the features and bug fixes.

First of all, you can download Oracle Linux here: www.oracle.com/linux. Then use the following commands to install gcc from devtoolset-8 (or later) with UBSan. Remember to add the scl enable command to your shell's initialization files after installing as described below.

Installing gcc and UBSan on Oracle Linux 7

Make sure that you have the "Software Collections" repository enabled, and if not, edit public-yum-ol7.repo and enable it by setting enabled=1. For example:

% cd /etc/yum.repos.d/
% grep -A5 collections public-yum-ol7.repo
[ol7_software_collections]
name=Software Collection Library for Oracle Linux 7 ($basearch)
baseurl=https://yum.oracle.com/repo/OracleLinux/OL7/SoftwareCollections/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

If you have not already done so, install devtoolset-8 (or later):

% yum install devtoolset-8

Install devtoolset-8-libubsan-devel, which will also cause installation of the dependent package libubsan1:

% yum install devtoolset-8-libubsan-devel
...
Installing:
devtoolset-8-libubsan-devel x86_64 8.3.1-3.2.0.1.el7 ol7_software_collections 186 k
Installing for dependencies:
libubsan1     x86_64 9.3.1-2.el7 ol7_software_collections   149k

After installation, you need to enable the devtoolset software collection. See below for instructions.

Installing gcc and UBSan on Oracle Linux 8

Make sure that both the BaseOS and the Application Stream are enabled in oracle-linux-ol8.repo, and if not, make it so:

% cd /etc/yum.repos.d/
% grep -B6 enabled=1 oracle-linux-ol8.repo
[ol8_baseos_latest]
name=Oracle Linux 8 BaseOS Latest ($basearch)
baseurl=https://yum$ociregion.oracle.com/repo/OracleLinux/OL8/baseos/latest/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1
[ol8_appstream]
name=Oracle Linux 8 Application Stream ($basearch)
baseurl=https://yum$ociregion.oracle.com/repo/OracleLinux/OL8/appstream/$basearch/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle
gpgcheck=1
enabled=1

Install gcc (if it is not already installed):

% yum install gcc

Install libubsan:

% yum install libubsan

If you installed gcc from a gcc-toolset rather than from the base packages shown above, enable the software collection as described in the next section.

Enabling the devtoolset software collection

If you have installed gcc and ubsan from a devtoolset or from a gcc-toolset, you'll need to enable the software collection so it can be found. Generally this would be put into your shell's initialization file. You can use any shell instead of bash, below.

% scl enable devtoolset-8 bash

Appendix C - More Information
