Friday Apr 04, 2014

About the Studio 12.4 Beta Programme

Here's a short video where I talk about the Solaris Studio 12.4 Beta programme.

Tuesday Aug 27, 2013

My Oracle Open World and JavaOne schedule

I've got my schedule for Oracle Open World and JavaOne:

Note that on Thursday I have about 30 minutes between my two talks, so expect me to rush out of the database talk in order to get to the Java talk.

Monday Apr 01, 2013

OpenMP and language level parallelisation

The C11 and C++11 standards introduced some very useful features into the language. In particular they provided language-level access to threading and synchronisation primitives. So using the new standards we can write multithreaded code that compiles and runs on standard compliant platforms. I've tackled translating Windows and POSIX threads before, but not having to use a shim is fantastic news.

There's some ideas afoot to do something similar for higher level parallelism. I have a proposal for consideration at the April meetings - leveraging the existing OpenMP infrastructure.

Pretty much all compilers use OpenMP, a large chunk of shared memory parallel programs are written using OpenMP. So, to me, it seems a good idea to leverage the existing OpenMP library code, and existing developer knowledge. The paper is not arguing that we need take the OpenMP syntax - that is something that can be altered to fit the requirements of the language.

What do you think?

Wednesday Dec 05, 2012

Library order is important

I've written quite extensively about link ordering issues, but I've not discussed the interaction between archive libraries and shared libraries. So let's take a simple program that calls a maths library function:

#include <math.h>

int main()
  for (int i=0; i<10000000; i++)

We compile and run it to get the following performance:

bash-3.2$ cc -g -O fp.c -lm
bash-3.2$ timex ./a.out

real           6.06
user           6.04
sys            0.01

Now most people will have heard of the optimised maths library which is added by the flag -xlibmopt. This contains optimised versions of key mathematical functions, in this instance, using the library doubles performance:

bash-3.2$ cc -g -O -xlibmopt fp.c -lm
bash-3.2$ timex ./a.out

real           2.70
user           2.69
sys            0.00

The optimised maths library is provided as an archive library (libmopt.a), and the driver adds it to the link line just before the maths library - this causes the linker to pick the definitions provided by the static library in preference to those provided by libm. We can see the processing by asking the compiler to print out the link line:

bash-3.2$ cc -### -g -O -xlibmopt fp.c -lm
/usr/ccs/bin/ld ... fp.o -lmopt -lm -o a.out...

The flag to the linker is -lmopt, and this is placed before the -lm flag. So what happens when the -lm flag is in the wrong place on the command line:

bash-3.2$ cc -g -O -xlibmopt -lm fp.c
bash-3.2$ timex ./a.out

real           6.02
user           6.01
sys            0.01

If the -lm flag is before the source file (or object file for that matter), we get the slower performance from the system maths library. Why's that? If we look at the link line we can see the following ordering:

/usr/ccs/bin/ld ... -lmopt -lm fp.o -o a.out 

So the optimised maths library is still placed before the system maths library, but the object file is placed afterwards. This would be ok if the optimised maths library were a shared library, but it is not - instead it's an archive library, and archive library processing is different - as described in the linker and library guide:

"The link-editor searches an archive only to resolve undefined or tentative external references that have previously been encountered."

An archive library can only be used resolve symbols that are outstanding at that point in the link processing. When fp.o is placed before the libmopt.a archive library, then the linker has an unresolved symbol defined in fp.o, and it will search the archive library to resolve that symbol. If the archive library is placed before fp.o then there are no unresolved symbols at that point, and so the linker doesn't need to use the archive library. This is why libmopt needs to be placed after the object files on the link line.

On the other hand if the linker has observed any shared libraries, then at any point these are checked for any unresolved symbols. The consequence of this is that once the linker "sees" libm it will resolve any symbols it can to that library, and it will not check the archive library to resolve them. This is why libmopt needs to be placed before libm on the link line.

This leads to the following order for placing files on the link line:

  • Object files
  • Archive libraries
  • Shared libraries

If you use this order, then things will consistently get resolved to the archive libraries rather than to the shared libaries.

Thursday Oct 18, 2012

Mixing Java and native code

This was a bit of surprise to me. The slides are available from my presentation at JavaOne on mixed language development. What I wasn't expecting was that there would also be a video of the presentation.

Maximising T4 performance

My presentation from Oracle Open World is available for download.

Wednesday Aug 29, 2012

CON6714 - Mixed-Language Development: Leveraging Native Code from Java

Here's the abstract from my JavaOne talk:

There are some situations in which it is necessary to call native code (C/C++ compiled code) from Java applications. This session describes how to do this efficiently and how to performance-tune the resulting applications.

The objectives for the session are:

  • Explain reasons for using native code in Java applications
  • Describe pitfalls of calling native code from Java
  • Discuss performance-tuning of Java apps that use native code

I'll cover how to call native code from Java, debugging native code, and then I'll dig into performance tuning the code. The talk is not going too deep on performance tuning - focusing on the JNI specific topics; I'll do a bit more about performance tuning in my OpenWorld talk later in the day.

Friday Mar 30, 2012

Inline template efficiency

I like inline templates, and use them quite extensively. Whenever I write code with them I'm always careful to check the disassembly to see that the resulting output is efficient. Here's a potential cause of inefficiency.

Suppose we want to use the mis-named Leading Zero Detect (LZD) instruction on T4 (this instruction does a count of the number of leading zero bits in an integer register - so it should really be called leading zero count). So we put together an inline template called looking like:

.inline lzd
  lzd %o0,%o0

And we throw together some code that uses it:

int lzd(int);

int a;
int c=0;

int main()
  for(a=0; a<1000; a++)
  return 0;

We compile the code with some amount of optimisation, and look at the resulting code:

$ cc -O -xtarget=T4 -S lzd.c
$ more lzd.s
/* 0x001c         11 */         lzd     %o0,%o0
/* 0x0020          9 */         ld      [%i1],%i3
/* 0x0024         11 */         st      %o0,[%i2]
/* 0x0028          9 */         add     %i3,1,%i0
/* 0x002c            */         cmp     %i0,999
/* 0x0030            */         ble,pt  %icc,.L77000018
/* 0x0034            */         st      %i0,[%i1]

What is surprising is that we're seeing a number of loads and stores in the code. Everything could be held in registers, so why is this happening?

The problem is that the code is only inlined at the code generation stage - when the actual instructions are generated. Earlier compiler phases see a function call. The called functions can do all kinds of nastiness to global variables (like 'a' in this code) so we need to load them from memory after the function call, and store them to memory before the function call.

Fortunately we can use a #pragma directive to tell the compiler that the routine lzd() has no side effects - meaning that it does not read or write to memory. The directive to do that is #pragma no_side_effect(<routine name>), and it needs to be placed after the declaration of the function. The new code looks like:

int lzd(int);
#pragma no_side_effect(lzd)

int a;
int c=0;

int main()
  for(a=0; a<1000; a++)
  return 0;

Now the loop looks much neater:

/* 0x0014         10 */         add     %i1,1,%i1

!   11                !  {
!   12                !    c=lzd(c);

/* 0x0018         12 */         lzd     %o0,%o0
/* 0x001c         10 */         cmp     %i1,999
/* 0x0020            */         ble,pt  %icc,.L77000018
/* 0x0024            */         nop

Tuesday Aug 09, 2011

Standards and headers

Every so often I encounter, or hear about, a problem with function definitions when the standard header files are included. Most often its mmap, but sometimes it's something else. Every time I think that I should write something up. Well, it's finally happened, a short paper on how to write portable code using the standard headers.

Wednesday Jul 13, 2011

Presenting at Oracle Open World

I'll be presenting at Oracle Open World in October. The full searchable catalog is on-line, or you can browse the speakers by name.

Wednesday May 11, 2011

Best practices for linking libraries (part 1)

A while ago I was looking into some application start up problems. The problem turned out to be an issue relating to the order in which the libraries were loaded and initialised. It seemed to me that this was a rather tricky area, and it would be very helpful to document the best practices around it. I thought this would be a quick couple of pages, but it turned out to be a rather high page count, and I ended up working on the document with Steve Clamage (with Rod Evans helping out).

The first part of the document is available. This section covers basic linker good practices. Using -L and -R rather than LD_LIBRARY_PATH, generating relocatable code etc. The key take aways are:

  • Use -L to specify the path to where the libraries can be found at compile time.
  • Use -R to specify the location of the libraries at run time.
  • Use the token $ORIGIN to specify a relative path for the libraries' location. This avoids the need to have a hard-coded location where the libraries can be found.

Friday May 06, 2011

Calling functions

I was looking at some code today and it reminded me of a very common performance issue - reloading data around calls. Suppose I have some code like:

int variable;

void function(int *array)
  for (i=0; i<1000; i++)
     if (variable==1) 

You might be surprised to find that "variable" is reloaded very iteration of the loop. The reason for this is that the loop calls another function - either func1() or func2() and the compiler knows that the function might change the value of "variable" - so to be correct it needs to be reloaded.

This problem can be fixed by caching a local copy of the variable. The compiler "knows" that local (or stack based) variables don't get modified by function calls.

However, the problem is more general than this, in C++ you might observe a reloading of variables that are members of objects - for similar reasons. The general rule for avoiding this is to examine every load or store in the hot region of code to check whether it is necessary, or whether it has been introduced because of a function call.

Thursday Jan 27, 2011

Don't initialise local strings

Consider the following code:

void s(int i)
  char string[2048]="";
  sprinf(string,"Value = %i",i);
  printf("String = %s\\n",string);

The C standards require that if any elements of the character array string are initialised, then all of them should be. We can demonstrate this by compiling with gcc:

$ gcc -O -S f.c
$ more f.s
        .file   "f.c"
        .type   s, #function
        .proc   020
        save    %sp, -2160, %sp
        stx     %g0, [%fp-2064]
        add     %fp, -2056, %o0
        mov     0, %o1
        call    memset, 0
        mov    2040, %o2

You can see that explicitly initialising string caused all elements of string to be initialised with a call to memset(). Removing the explicit initialisation of string (the ="") avoids the call to memset().

Sunday Jun 06, 2010

Solaris Studio Express

The latest Solaris Studio Express release is out, there's also a feedback programme for submitting bugs and posting questions.

One of the first things I did with it was to launch the solstudio IDE. It has the expected functionality. Code completion, and hints on the parameters that are expected by a function:

There's also integrated debugging:

I'll add a couple more posts over the next few days showing some other features.

Monday Feb 15, 2010

Bitten by inlining (again)

So this relatively straight-forward looking code fails to compile without optimisation:

#include <stdio.h>

inline void f1()
  printf("In f1\\n");

inline void f2()
  printf("In f2\\n");

void main()
  printf("In main\\n");

Here's the linker error when compiled without optimisation:

% cc inline.c
Undefined                       first referenced
 symbol                             in file
f2                                  inline.o
ld: fatal: Symbol referencing errors. No output written to a.out

At low optimisation levels the compiler does not inline these functions, but because they are declared as inline functions the compiler does not generate function bodies for them - hence the linker error. To make the compiler generate the function bodies it is necessary to also declare them to be extern (this places them in every compilation unit, but the linker drops the duplicates). This can either be done by declaring them to be extern inline or by adding a second prototype. Both approaches are shown below:

#include <stdio.h>

extern inline void f1()
  printf("In f1\\n");

inline void f2()
  printf("In f2\\n");
extern void f2();

void main()
  printf("In main\\n");

It might be tempting to copy the entire function body into a support file:

#include <stdio.h>

void f1()
  printf("In duplicate f1\\n");

void f2()
  printf("In duplicate f2\\n");

This is a bad idea, as you might gather from the deliberate difference I've made to the source code. Now you get different code depending on whether the compiler chooses to inline the functions or not. You can demonstrate this by compiling with and without optimisation, but this only forces the issue to appear. The compiler is free to choose whether to honour the inline directive or not, so the functions selected for inlining could vary from build to build. Here's a demonstration of the issue:

% cc -O inline.c inline2.c
% ./a.out
In main
In f2
In f1

% cc inline.c inline2.c
% ./a.out
In main
In duplicate f2
In duplicate f1

Douglas Walls goes into plenty of detail on the situation with inlining on his blog.

Wednesday Jun 10, 2009

Code complete: coding style

I read Code Complete a couple of years back. It's an interesting book to read, but there were two parts that annoyed me. I was giving a presentation the other week on "Coding for performance" and I happened to mention the book, and say that I had these two reservations. So I figure I should write them up more formally.

My first issue was, basically, me just been a stuck in the mud. Those of you who regularly read my blog will see that I favour the following style of indenting:

  if (some condition)
     do something;

If you've read Solaris Application programming, you'll see that I actually use quite a few styles. In writing that book, there were particular places where there was limited space on the page and I ended up juggling the style to make it fit the medium. So I have preferences, but I'm pragmatic.

Anyway, CC on page 746 says identifies my preferred style as "unindented begin-end pairs" and says the following "Although this approach looks fine, it violates the Fundamental Theorem of Formatting; it doesn't show the logical structure of the code.".


So I wanted to read up more details on this Fundamental Theorem, perhaps I'm misreading the text, but this is how it appeared to me (pg 739) "A control construct in Visual Basic always has a beginning statement ... and it always has a corresponding End statement." (pg 740) "The controversy about formatting control blocks arises in part from the fact that some languages don't require block structures." (pg 740) "Uncoupling begin and end from the control structure - as languages like C++ and Java do - with { and } - leads to questions about where to put the begin and end. Consequently, many indentation problems are problems only because you have to compensate for poorly designed language structures." [Emphasis mine.] I read this as, basically, you need to make your untidy C/C++/Java code look more like VB. I guess that's why it's taken me a couple of years to calm down sufficiently to post this ;)

Actually, I'm not far from disagreeing. But let's return to this point in a moment. Let's start with the two approaches to indenting that are recommended in the book.

First of all, what is probably the most common style "pure block emulation":

  if (something) {
    do something;

The other recommended style is "begin and end as block boundaries":

  if (something)
    do something;

On page 745, a study by Hansen and Yim (1987) indicates that there's no difference in understandability between these two styles. Excellent - so it doesn't matter! I'm sure that if "unindented begin-end pairs" were also included in the survey then it too would provide indistinguishable understandability.

Anyway, the differences between the recommended "begin and end as block boundaries" and the shunned "unindented begin-end pairs" is basically four spaces, which I don't personally think is a lot.

Heading back to why I might actually agree with some of his comments. It is very easy to introduce a bug in a program where the begin and end braces have been omitted. For example:

  if ( a > max )
    max = a;

  if ( a > max )
    printf("New max = %i\\n",a);
    max = a;

So, whilst I agree that the absence of brackets can be a problem, I don't necessarily think that rigid adherence to a particular style naturally follows as the only solution to that problem.

I do have some rules that I tend to obey:

  • Indenting is a personal/project preference. There are tools out there that can render source code pretty much how you like it. The UI is a view of the source, and it doesn't really matter what the style of the source is. If I find the source hard to read, then I can process it to make it conform to what ever layout works best for me to solve the problem that I'm working on.
  • Always use begin and end brackets. They add a single character and can avoid the problem demonstrated above.
  • I tend to favour clarity over a rigid adherence to particular styles. I'm not above placing an entire statement on a single line when I feel that it is the best way to present the information. Taking the previous example:
    Multi-lineSingle line
      if ( a > max ) {
        max = a;
      if ( a > max ) { max = a; }

Tuesday May 27, 2008

Internationalisation with gettext

One of my projects for this week is interationalizing the tool spot. spot is coded in perl, but it's useful to prototype with a C version. You can find resources on the gnuversion of gettext and the Solaris version. There's also some helpful tutorials. There's also some examples, including a perl example.

Anyway, assume that the application initially looks like:

#include <stdio.h>
int main (int argc, char \*argv[])

The call to gettext will return a translated version of the string "test", but there's a couple of other calls that need to be performed to set things up.

The call setlocale will set the locale (or language, currency etc.) that the application will use. The call takes two parameters, the first is the item that should be translated - we want to translate only the messages, and the second is the language to translate into (if this is blank then the locale is obtained from the system setting).

The call to bindtextdomain takes a name of a domain, and then binds this to a directory where the translations will reside.

The call to textdomain will use the binding for an existing domain to look up the translations.

Probably the best way of seeing this is to look at the modified source:

#include <stdio.h>
#include <locale.h>
#include <libintl.h>

int main (int argc, char \*argv[])
  setlocale(LC_MESSAGES, "fr");
  bindtextdomain("EX", "/export/home/test/locale");
  return 0;

The call to setlocale requests that the code use the "fr" locale. The call to bindtextdomain tells the application to look in the root directory /export/home/test/locale/ for the translations.

The program can now be compiled:

$ cc test.c

However, there's no message catalogue to provide the translations. So the next step is to identify the text which requires translation. The command to do this is xgettext:

$ xgettext test.c

This command generates a file messages.po which contains all the messages that need translation:

$ more messages.po
domain "messages"
# File:m.c, line:10, textdomain("EX");
msgid  "test\\n"

The messages in the file can then be translated:

$ more messages.po
domain "messages"
# File:m.c, line:10, textdomain("EX");
msgid  "test\\n"
msgstr "notest\\n"

Once the message file exists it needs to be converted into a message file that gettext can read, this is performed by the command msgfmt:

$ msgfmt -o /export/home/test/locale/fr/LC_MESSAGES/ messages.po

The msgfmt command has an optional -o parameter to specify the location of the output file. The place where the file needs to reside and the name of the file are determined by the directory passed to bindtextdomain, the locale, and the name given to the domain.

When the program is run the output is:

$ a.out

Which shows that although the text was "test" the output from the program is the localised "notest".

Monday Apr 28, 2008

static and inline functions

Hit a problem when compiling a library. The problem is with mixing static and inline functions, which is not allowed by the standard, but is allowed by gcc. Example code looks like:

char \* c;

static void foo(char \*);

inline void work()

void foo(char\* c)

When this code is compiled it generates the following error:

% cc s.c
"s.c", line 7: reference to static identifier "foo" in extern inline function
cc: acomp failed for s.c

It turns out that there is a workaround for this problem, which is the flag -features=no%extinl. Douglas Walls describes the issue in much more detail.

Thursday Jan 24, 2008

-xalias_level in C and C++

The compiler flag -xalias_level takes various options, and specifies the amount of aliasing that the compiler should assume. For example -xalias_level=any (the default at optimisation levels below -fast) means that the compiler should assume that any pointers may alias (ie point to the same location in memory).

In Sun Studio the available levels have different names for C and C++. adopted the C levels for both C and C++. The following table shows the mapping between the C and C++ names:


Friday Nov 09, 2007

Mixed language development

Sometimes applications contain a mix of languages. For example, part in C and part in Fortran. This can require different libraries to be linked in depending on the support that the language requires. The general rule for mixed language linking is found in the documentation. The rule is link with the C++ compiler if the code contains C++, if not, link with the fortran compiler (obviously if the code is pure C, then link with the C compiler.). An exception to this is making a C library which uses C++ inside, where it is possible to link using the C compiler, specifying appropriate flags.


Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge


« April 2014
The Developer's Edge
Solaris Application Programming
OpenSPARC Book
Multicore Application Programming