Sunday Feb 23, 2014

The goto fail heard ‘round the world

There has been much discussion online this weekend of what happens when you introduce an extra “goto fail” line into your code, such as making a mistake using a merge tool to combine your changes into those made in parallel to a code repository under active development by other engineers. (That is the most common cause I’ve seen of such duplication in source code, and thus the one I see most likely — others guess similarly and discuss that side more in depth — but that’s not what this post is about.)

Adam Langley posted a good explanation of this bug, including a brief snippet of code showing the general class of problem:

 1     extern int f();
 3     int g() {
 4             int ret = 1;
 6             goto out;
 7             ret = f();
 9     out:
10             return ret;
11     }
which he pointed out “If I compile with -Wall (enable all warnings), neither GCC 4.8.2 or Clang 3.3 from Xcode make a peep about the dead code.” Others pointed out that the -Wunreachable-code option is one of many that neither compiler includes in -Wall by default, since both compilers treat -Wall not as “all possible warnings” but rather more of “the subset of warnings that the compiler authors think are suitable for use on most software”, leaving out those that are experimental, unstable, controversial, or otherwise not quite ready for prime time.

Since this has a simple test case, and I’ve got a few compilers handy on my Solaris x86 machine for doing various builds with, I did some testing myself:

gcc 3.4.3

% /usr/gcc/3.4/bin/gcc -Wall -Wextra -c fail.c
[no warnings or other output]

% /usr/gcc/3.4/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
fail.c: In function `g':
fail.c:7: warning: will never be executed

As expected, gcc warned when the specific flag was given.

gcc 4.5.2 & 4.7.3

% /usr/gcc/4.5/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
[no warnings or other output]

% /usr/gcc/4.7/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
[no warnings or other output]

It turns out this is due to the gcc maintainers removing the support for -Wunreachable-code in later versions.

clang 3.1

% clang -Wall -Wextra -c fail.c
[no warnings or other output]

% clang -Wall -Wunreachable-code -c fail.c
fail.c:7:8: warning: will never be executed [-Wunreachable-code]
        ret = f();
1 warning generated.

% clang -Wall -Weverything -c fail.c
fail.c:3:5: warning: no previous prototype for function 'g'
int g() {
fail.c:7:8: warning: will never be executed [-Wunreachable-code]
        ret = f();
2 warnings generated.

So clang also finds it with -Wunreachable-code, which is also enabled by -Weverything, but not -Wall or -Wextra. (I didn’t see a similar -Weverything flag for gcc.)

Solaris Studio 12.3

% cc -c fail.c
"fail.c", line 7: warning: statement not reached

% lint -c fail.c

statement not reached

Both Studio’s default compiler flags and old-school lint static analyzer caught this by default. Though, as I noted last year, warnings about some unreachable code chunks will be seen in some releases but not others, as the information available to the compiler about the control flow evolves.

As anyone who has compiled much software with more than one of the above (including just different versions of the same compiler) can attest, the warnings generated on real code bases by various compilers will almost always be different. There’s many cases where gcc & clang warn about things Studio doesn’t, and vice versa, as every compiler & analyzer has been fed different test cases over the years of bad code that developers wanted them to find, or specific problems they needed to solve. This is why when compiling the Solaris kernel, we run two compilation passes on all the sources - once with Studio cc to both get its code analysis and to make the binaries we ship, and once with gcc just to get its code analysis, plus running it through Oracle’s internal parfait static analyser as well.

Having a growing body of warnings doesn’t help either, if you never do anything about them. This is why Solaris kernel and core utility code have policies on not introducing new warnings and we’ve been making efforts in X.Org to clean up warnings, so that you can see new issues when you introduce them, not lose them in a sea of thousands of existing bits of noise.

Of course, it’s too late to stop the bug that caused this discussion from slipping into the code base, and easy to second guess after the fact how it may have been prevented. We have lots of tools at our disposal and they continue to grow and improve, including human code review (though human review is much less reliable and repeatable than most tools, humans are much more flexible at finding things the automated tools weren't prepared to catch), and we need to use multiple ones (more on the most sensitive code, such as security protocols) to improve our chances of finding and fixing the bugs before we integrate them. Hopefully we can learn from the ones that slip through how to improve our checks for the future, such as adding -Wunreachable-code compiler flags where appropriate.


Engineer working on Oracle Solaris and with the X.Org open source community.


The views expressed on this blog are my own and do not necessarily reflect the views of Oracle, the X.Org Foundation, or anyone else.

See Also
Follow me on twitter


« February 2014 »