Thursday May 07, 2009

The perils of strlen

Just been looking at an interesting bit of code. Here's a suitably benign version of it:

#include <string.h>
#include <stdio.h>

void main()
{
  char string[50];
  string[49]='\\0';
  int i;
  int j=0;
  for (i=0; i<strlen(string); i++)
  {
   if (string[i]=='1') {j=i;}
  }
  printf("%i\\n",j);
}

Compiling this bit of code leads to a loop that looks like:

                        .L900000109:
/\* 0x002c         12 \*/         cmp     %i5,49
/\* 0x0030         10 \*/         add     %i3,1,%i3
/\* 0x0034         12 \*/         move    %icc,%i4,%i2
/\* 0x0038         10 \*/         call    strlen  ! params =  %o0 ! Result =  %o0
/\* 0x003c            \*/         or      %g0,%i1,%o0
/\* 0x0040            \*/         add     %i4,1,%i4
/\* 0x0044            \*/         cmp     %i4,%o0
/\* 0x0048            \*/         bcs,a,pt        %icc,.L900000109
/\* 0x004c         12 \*/         ldsb    [%i3],%i5

The problem being that for each character tested there's also a call to strlen! The reason for this is that the compiler cannot be sure what the call to strlen actually returns. The return value might depend on some external variable that could change as the loop progresses.

There's a lot of functions defined in the libraries that the compiler could optimise, if it was certain that it recognised them. The compiler flag that enables recognition of the "builtin" functions is -xbuiltin (which is included in -fast. This enables the compiler to do things like recognise calls to memcpy or memset and in some instances produce more optimal code. However, it doesn't recognise the call the strlen.

In terms of solving the problem, there are two approaches. The most portable approach is to hold the length of the string in a temporary variable:

  int length=strlen(string);
  for (i=0; i<length; i++)

Another, less portable approach, is to use #pragma no_side_effect. This pragma means the return value of the function depends only on the parameters passed into the function. So the result of calling strlen only depends on the value of the constant string that is passed in. The modified code looks like:

#include <string.h>
#include <stdio.h>
#pragma no_side_effect(strlen)

void main()
{
  char string[50];
  string[49]='\\0';
  int i;
  int j=0;
  for (i=0; i<strlen(string); i++)
  {
   if (string[i]=='1') {j=i;}
  }
  printf("%i\\n",j);
}

And more importantly, the resulting disassembly looks like:

                        .L900000109:
/\* 0x0028          0 \*/         sub     %i1,49,%o7
/\* 0x002c         11 \*/         add     %i3,1,%i3
/\* 0x0030          0 \*/         sra     %o7,0,%o5
/\* 0x0034         13 \*/         movrz   %o5,%i4,%i2
/\* 0x0038         11 \*/         add     %i4,1,%i4
/\* 0x003c            \*/         cmp     %i4,%i5
/\* 0x0040            \*/         bcs,a,pt        %icc,.L900000109
/\* 0x0044         13 \*/         ldsb    [%i3],%i1
About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
5
6
8
9
10
12
13
14
15
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today
Bookmarks
The Developer's Edge
Solaris Application Programming
Publications
Webcasts
Presentations
OpenSPARC Book
Multicore Application Programming
Docs