Managed strings

I'm off to a meeting of ISO/SC22/WG14, the C programming language committee, in a week.  Actually, I'm leaving today for a meeting with my engineering team in St. Petersburg, on my way to Berlin for the WG14 meeting.  Another piece of work the committee has been pursuing for over a year now involves mitigating security vulnerabilities.  This work is about to turn into a Draft Technical Report, currently titled:

Extensions to the C Library Part I: Bounds-checking interfaces

You can read more about it at:

There is a rationale at:

This work has generated a lot of interest.  One area of particular interest is the vulnerabilities that arise when manipulating strings in C.  Robert C. Seacord of Carnegie Mellon University has submitted a paper to the committee with ideas for library routines that manage strings to mitigate these issues.  Below is the introduction from the paper, and a link to the full document.


String manipulation errors

Many vulnerabilities in C programs arise through the use of the standard C string manipulating functions.  String manipulation errors include buffer overflow through string copying, truncation errors, termination errors and improper data sanitization.  Buffer overflow can easily occur when copying strings if the fixed-length destination of the copy is not large enough to accommodate the source of the string.  This is a particular problem when the source is user input, which is potentially unbounded.  The usual programming practice is to allocate a character array that is generally large enough.  The problem is that this can easily be exploited by malicious users who can supply a carefully crafted string that overflows the fixed length array in such a way that the security of the system is compromised.  This is still the most common exploit in fielded C code today.  In attempting to overcome the buffer overflow problem, some programmers try to limit the number of characters that are copied. This can result in strings being improperly truncated.  This, in turn, results in a loss of data which may lead to a different type of software vulnerability.
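
To make the overflow and truncation problems concrete, here is a minimal sketch of a copy routine that either copies the whole string or fails outright. The name `copy_checked` and its return convention are assumptions made for illustration; they are not taken from the paper or from the Technical Report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the proposal or the C library.
 * Copies src into dst only if the whole string, including its
 * terminating null character, fits in dst_size bytes.
 * Returns 0 on success, -1 on failure (dst is then set to ""). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';          /* refuse: no overflow, no silent truncation */
        return -1;
    }
    memcpy(dst, src, len + 1);  /* +1 copies the null terminator as well */
    return 0;
}
```

A caller with an 8-byte buffer gets 0 for "short" and -1 for "much too long", so the failure cannot be mistaken for a successful but shortened copy.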

A special case of truncation error is a termination error.  Many of the standard C string functions rely on strings being null terminated.  However, the length of a string does not include the null character.  If just the non-null characters of a string are copied then the resulting string may become improperly terminated.  A subsequent access may run off the end of the string and corrupt data that should not have been touched.
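
The termination error can be sketched in a few lines. The function names below are hypothetical and exist only for this illustration; both copy routines assume the caller has already made sure the destination is large enough.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf contains a null character within its first size bytes. */
int is_terminated(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}

/* The error: strlen() does not count the null character, so copying
 * strlen(src) bytes leaves dst without a terminator. */
void copy_without_terminator(char *dst, const char *src)
{
    memcpy(dst, src, strlen(src));
}

/* The fix: copy one extra byte so the terminator travels with the data. */
void copy_with_terminator(char *dst, const char *src)
{
    memcpy(dst, src, strlen(src) + 1);
}
```

If a 6-byte buffer is first filled with non-null bytes, copying "abcde" with the first routine leaves no terminator anywhere in the buffer, and a later `strlen` on it would read past the end.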

Finally, inadequate data sanitization can also lead to vulnerabilities.  Many applications require data to be constrained not to contain certain characters.  Very often, malicious users can be prevented from exploiting an application by ensuring that the illegal characters are not copied into the strings destined for the application.
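
One simple way to approach sanitization is to check a string against a set of permitted characters before it is copied anywhere. The sketch below uses the standard `strspn` function; the name `has_only_allowed` and the idea of passing the permitted set as a string are assumptions for illustration, not the paper's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every character of s appears in
 * allowed, 0 otherwise. An empty string is considered acceptable. */
int has_only_allowed(const char *s, const char *allowed)
{
    assert(s != NULL && allowed != NULL);

    /* strspn() gives the length of the leading run of permitted
     * characters; the string is clean only if that run reaches
     * the terminating null character. */
    size_t ok_prefix = strspn(s, allowed);
    return s[ok_prefix] == '\0';
}
```

With a permitted set of lowercase letters, digits, underscore and period, a name like "report_2014.txt" passes while "report; rm -rf /" is rejected at the semicolon.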

Proposed solution

A secure string library should provide facilities to guard against the problems described above. Furthermore, it should satisfy the following requirements:
  1. Operations should succeed or fail unequivocally.
  2. The facilities should be familiar to C programmers so that they can easily be adopted and existing code easily converted.
  3. There should be no surprises in using the facilities. The new facilities should have similar semantics to the standard C string manipulating functions.  Again, this will help with the conversion of legacy code.
Of course, some compromise is needed in order to meet these requirements.  For example, it is not possible to completely preserve the existing semantics and provide protection against the problems described above.

Libraries that provide string manipulation functions can be categorized as static or dynamic.  Static libraries rely on fixed-length arrays. A static approach cannot easily overcome the problems described. With a dynamic approach, strings are resized as necessary.  This approach can more easily solve the problems, but a consequence is that memory can be exhausted if input is not limited.  To mitigate this issue, the managed string library supports an implementation-defined maximum string length.  Additionally, the string creation function allows a per-string maximum length to be specified.
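
The dynamic approach with a per-string limit might look roughly like the sketch below. The type `dstring` and the functions `dstring_create`, `dstring_append` and `dstring_destroy` are invented for this illustration and should not be read as the managed string library's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic string with a per-string maximum length. */
typedef struct {
    char  *data;     /* always null terminated */
    size_t len;      /* length, excluding the terminator */
    size_t max_len;  /* per-string limit on len */
} dstring;

/* Returns 0 on success, -1 if init exceeds max_len or memory runs out. */
int dstring_create(dstring *s, const char *init, size_t max_len)
{
    assert(s != NULL && init != NULL);
    size_t len = strlen(init);
    if (max_len > SIZE_MAX - 1 || len > max_len)
        return -1;
    char *p = malloc(len + 1);
    if (p == NULL)
        return -1;
    memcpy(p, init, len + 1);
    s->data = p;
    s->len = len;
    s->max_len = max_len;
    return 0;
}

/* Grows the string as needed. Returns 0 on success; on failure returns -1
 * and leaves the string unchanged. tail must not point into s->data. */
int dstring_append(dstring *s, const char *tail)
{
    assert(s != NULL && s->data != NULL && tail != NULL);
    size_t add = strlen(tail);
    if (add > s->max_len - s->len)   /* would exceed the per-string limit */
        return -1;
    char *p = realloc(s->data, s->len + add + 1);
    if (p == NULL)
        return -1;                   /* the old buffer is still valid */
    memcpy(p + s->len, tail, add + 1);
    s->data = p;
    s->len += add;
    return 0;
}

void dstring_destroy(dstring *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

The string grows on demand, so there is no fixed-length array to overflow, and the limit check means unbounded input produces a clear failure instead of exhausting memory.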



Douglas is a principal software engineer working as the C compiler project lead and the Oracle Solaris Studio technical lead.

