Thursday Apr 17, 2014

What's new in C++11

I always enjoy chatting with Steve Clamage about C++, and I was really pleased to get to interview him about what we should expect from the new 2011 standard.

Wednesday Apr 16, 2014

Lambda expressions in C++11

Lambda expressions are, IMO, one of the interesting features of C++11. At first glance they do seem a bit hard to parse, but once you get used to them you start to appreciate how useful they are. Steve Clamage and I have put together a short paper introducing lambda expressions.

Monday Apr 07, 2014

RAW hazards revisited (again)

I've talked about RAW hazards in the past, and even written articles about them. They are an interesting topic because they are a situation where a small tweak to the code can avoid the problem.

In the article on RAW hazards there is some code that demonstrates various types of RAW hazard. One common situation is writing code to copy misaligned data into a buffer. The example code contains a test for this kind of copying. Compiled with Solaris Studio 12.3, the results from this test on my system look like:

Misaligned load v1 (bad) memcpy()
Elapsed = 16.486042 ns
Misaligned load v2 (bad) byte copy
Elapsed = 9.176913 ns
Misaligned load good
Elapsed = 5.243858 ns

However, we've done some work in the compiler on better identification of potential RAW hazards. If I recompile using the 12.4 Beta compiler I get the following results:

Misaligned load v1 (bad) memcpy()
Elapsed = 4.756911 ns
Misaligned load v2 (bad) byte copy
Elapsed = 5.005309 ns
Misaligned load good
Elapsed = 5.597687 ns

All three variants of the code produce the same performance - the RAW hazards have been eliminated!
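To give a feel for the kind of code involved, here is a rough sketch in C of three ways to pull a 32-bit value out of a misaligned address. This is not the code from the article: the function names are made up, and I am assuming the "good" variant assembles the value in a register with shifts. The first two variants store narrow data to memory and then load it back as a wider value, which is the store-then-load pattern that can trigger a RAW hazard.

```c
#include <stdint.h>
#include <string.h>

/* v1: memcpy() into an aligned temporary, then load the temporary.
   The value comes back in the machine's native byte order. */
uint32_t load_memcpy(const unsigned char *p)
{
  uint32_t value;
  memcpy(&value, p, sizeof value);
  return value;
}

/* v2: store one byte at a time, then load all four bytes at once.
   Also native byte order. */
uint32_t load_byte_copy(const unsigned char *p)
{
  union { uint32_t word; unsigned char bytes[4]; } tmp;
  for (int i = 0; i < 4; i++) { tmp.bytes[i] = p[i]; }
  return tmp.word;
}

/* Assumed "good" variant: build the value with shifts, so nothing is
   stored to memory and reloaded. This reads the bytes in big-endian
   order, which matches the native order on SPARC but not on x86. */
uint32_t load_shifts(const unsigned char *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Calling any of these with a pointer such as buf + 1 into a byte buffer exercises the misaligned case.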

Friday Apr 04, 2014

Interview with Don Kretsch

As well as filming the "to-camera" about the Beta program, I also got the opportunity to talk with my Senior Director Don Kretsch about the next compiler release.

About the Studio 12.4 Beta Programme

Here's a short video where I talk about the Solaris Studio 12.4 Beta programme.

Thursday Apr 03, 2014

SPARC roadmap

A new SPARC roadmap has been published. We have some very cool stuff coming :)

Monday Mar 31, 2014

Socialising Solaris Studio

I just figured that I'd talk about Studio's social media presence.

First off, we have our own forums. One for the compilers and one for the tools. This is a good place to post comments and questions; posting here will get our attention.

We also have a presence on Facebook and Twitter.

Moving to the broader Oracle community, these pages list social media presence for a number of products.

Looking at Oracle blogs, the first stop probably has to be the entertaining The OTN Garage. It's also probably useful to browse the blogs by keywords, for example here are posts tagged with Solaris.

Thursday Mar 27, 2014

Solaris Studio 12.4 documentation

The preview docs for Solaris Studio 12.4 are now available.

Tuesday Mar 25, 2014

Solaris Studio 12.4 Beta now available

The beta programme for Solaris Studio 12.4 has opened. So download the bits and take them for a spin!

There's a whole bunch of features - you can read about them in the what's new document, but I just want to pick a couple of my favourites:

  • C++ 2011 support. If you've not read about it, C++ 2011 is a big change. There's a lot of stuff that has gone into it - improvements to the Standard library, lambda expressions etc. So there is plenty to try out. However, there are some features not supported in the beta, so take a look at the what's new pages for C++.
  • Improvements to the Performance Analyzer. If you regularly read my blog, you'll know that this is the tool that I spend most of my time with. The new version has some very cool features; these might not sound much, but they fundamentally change the way you interact with the tool: an overview screen that helps you rapidly figure out what is of interest, improved filtering, mini-views which show more slices of the data on a single screen, etc. All of these massively improve the experience and the ease of using the tool.

There's a couple more things. If you try out features in the beta and you want to suggest improvements, or tell us about problems, please use the forums. There is a forum for the compiler and one for the tools.

Oh, and watch this space. I will be writing up some material on the new features....

Friday Mar 21, 2014

JavaOne award

I was thrilled to get a JavaOne 2013 Rockstar Award for Charlie Hunt's and my talk "Performance tuning where Java meets the hardware".

Getting the award was a surprise and a great honour. It's based on audience feedback, so it's really nice to find out that the audience enjoyed hearing the presentation as much as I enjoyed giving it.

Wednesday Feb 26, 2014

Multicore Application Programming available in Chinese!

This was a complete surprise to me. A box arrived on my doorstep, and inside were copies of Multicore Application Programming in Chinese. They look good, and have a glossy cover rather than the matte cover of the English version.

Article on RAW hazards

Feels like it's been a long while since I wrote up an article for OTN, so I'm pleased that I've finally got around to fixing that.

I've written about RAW hazards in the past. But I recently went through a patch of discovering them in a number of places, so I've written up a full article on them.

What is "nice" about RAW hazards is that once you recognise them for what they are (and that's the tricky bit), they are typically easy to avoid. So if you see 10 seconds of time attributable to RAW hazards in the profile, then you can often get the entire 10 seconds back by tweaking the code.

Monday Dec 30, 2013

Privileges

Before I start: this is not about security; it's probably the antithesis of security. So I'd recommend starting by reading about how using privileges can break the security of your system.

There are three tools that I regularly use that require escalated privileges: dtrace, cpustat, and busstat. You can read up on the way that Solaris manages privileges. But if you know what you want to do, the process to figure out how to get the necessary privileges is reasonably straightforward.

To find out what privileges you have you can use the ppriv -v $$ command. This will report all the privileges for the current shell.

To find out what privileges are stopping you from running a command, you should run it under ppriv -eD command. For example:

ppriv -eD cpustat -c instruction_counts 1 1
cpustat[13222]: missing privilege "sys_resource" (euid = 84945, syscall = 128) needed at rctl_rlimit_set+0x98
cpustat[13222]: missing privilege "cpc_cpu" (euid = 84945, syscall = 5) needed at kcpc_open+0x4
...

It is also possible to list all the privileges on the system using ppriv -l. This is helpful if the privilege has a name that maps onto what you want to do. The privileges for dtrace are good examples of this:

$ ppriv -l|grep dtrace
dtrace_kernel
dtrace_proc
dtrace_user

You can then use usermod -K ... to assign the necessary privileges to a user. For example:

$ usermod -K defaultpriv=basic,sys_resource,cpc_cpu username

Information about privileges for users is recorded in /etc/user_attr, so it is possible to directly edit that file to add or remove privileges.

Using this approach you can determine that busstat needs sys_config, cpustat needs sys_resource and cpc_cpu, and dtrace needs dtrace_kernel, dtrace_proc, and dtrace_user.

Thursday Sep 12, 2013

UKOUG Conference - Three presentations

Ok, my two hour presentation at UKOUG is now split into two one hour presentations. So my schedule now looks like:

  • Monday 2nd December: Getting the most out of Oracle Solaris Studio
  • Monday 2nd December: Where code meets the processor - performance tuning C/C++ applications
  • Wednesday 4th December: Multicore, Multiprocess, Multithread

I'm very pleased that I've got three separate hour long sessions. The material better fits this distribution, plus I really don't think that people could sit comfortably for two hours.

I'll be hanging out at the conference for the entire week, so please do take the time to find me for a chat.

Wednesday Sep 11, 2013

Presenting at UK Oracle User Group meeting

I'm very excited to have been invited to present at the UK Oracle User Group conference in Manchester, UK on 1-4 December.

Currently I'm down for two presentations:

As you might expect, I'm very excited to be over there; I've not visited Manchester in about 20 years!

Thursday Aug 29, 2013

Timezone troubles when dual booting

I have a laptop that dual boots Solaris and Windows XP. When I switched between the two OSes I would have to reset the clock because the time would be eight hours out. This has been nagging at me for a while, so I dug into what was going on.

It seems that Windows assumes that the Real-Time Clock (RTC) in the BIOS is using local time. So it will read the clock and display whatever time is shown there.

Solaris on the other hand assumes that the clock is in Coordinated Universal Time (UTC), so you have to apply a time zone transformation in order to get to the local time.

Obviously, if you adjust the clock to make one correct, then the other is wrong.

To me, it seems more natural to have the clock in a laptop work on UTC - because when you travel the local time changes. There is a registry setting in Windows that, when set to 1, tells the machine to use UTC:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal

However, it has some problems and is potentially not robust over sleep. So we have to work the other way, and get Solaris to use local time. Fortunately, this is a relatively simple change: run the following as root (pick the appropriate timezone for your location):

rtc -z US/Pacific

Monday Jul 08, 2013

JavaOne and Oracle Open World 2013

I'll be up at both JavaOne and Oracle Open World presenting. I have a total of three presentations:

  • Mixed-Language Development: Leveraging Native Code from Java
  • Performance Tuning Where Java Meets the Hardware
  • Developing Efficient Database Applications with C and C++

I'm excited by these opportunities - particularly working with Charlie Hunt diving into Java Performance.

Thursday May 30, 2013

Binding to the current processor

Just hacked up a snippet of code to stop a thread migrating to a different CPU while it runs. This should help the thread get, and keep, local memory. This in turn should reduce run-to-run variance.

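#include <stdio.h>
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>

/* Bind the calling thread (LWP) to the CPU it is currently running on. */
void bindnow()
{
  processorid_t proc = getcpuid();
  if (processor_bind(P_LWPID, P_MYID, proc, 0))
    { printf("Warning: Binding failed\n"); }
  else
    { printf("Bound to CPU %i\n", proc); }
}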

Thursday Nov 01, 2012

Rick Hetherington on the T5

There's an interview with Rick Hetherington about the new T5 processor. Well worth a quick read.

Thursday Oct 18, 2012

Maximising T4 performance

My presentation from Oracle Open World is available for download.

Tuesday Oct 09, 2012

25 years of SPARC

Looks like an interesting event at the Computer History Museum on 1 November. A panel discussing SPARC at 25: Past, Present and Future. Free sign up.

Wednesday Aug 29, 2012

CON6714 - Mixed-Language Development: Leveraging Native Code from Java

Here's the abstract from my JavaOne talk:

There are some situations in which it is necessary to call native code (C/C++ compiled code) from Java applications. This session describes how to do this efficiently and how to performance-tune the resulting applications.

The objectives for the session are:

  • Explain reasons for using native code in Java applications
  • Describe pitfalls of calling native code from Java
  • Discuss performance-tuning of Java apps that use native code

I'll cover how to call native code from Java, debugging native code, and then I'll dig into performance tuning the code. The talk is not going too deep on performance tuning - focusing on the JNI specific topics; I'll do a bit more about performance tuning in my OpenWorld talk later in the day.

Tuesday May 22, 2012

Square roots

If you are spending significant time calling sqrt() then to improve this you should compile with -xlibmil. Here's some example code that calls both fabs() and sqrt():

#include <math.h>
#include <stdio.h>

int main()
{
  double d=23.3;
  printf("%f\n",fabs(d));
  printf("%f\n",sqrt(d));
}

If we compile this with Studio 12.2 we will see calls to both fabs() and sqrt():

$ cc -S -O  m.c
$ grep call m.s|grep -v printf
/* 0x0018            */         call    fabs    ! params =  %o0 %o1     ! Result =  %f0 %f1
/* 0x0044            */         call    sqrt    ! params =  %o0 %o1     ! Result =  %f0 %f1

If we add -xlibmil then these calls get replaced by equivalent instructions:

$ cc -S -O -xlibmil  m.c
$ grep abs m.s|grep -v print; grep sqrt m.s|grep -v print
/* 0x0018          7 */         fabsd   %f4,%f0
/* 0x0038            */         fsqrtd  %f6,%f2

The default for Studio 12.3 is to inline fabs(), but you still need to add -xlibmil for the compiler to inline sqrt(), so it is a good idea to include the flag.

You can see the functions that are replaced by inline versions by grepping the inline template file (libm.il) for the word "inline":

$ grep inline libm.il
        .inline sqrtf,1
        .inline sqrt,2
        .inline ceil,2
        .inline ceilf,1
        .inline floor,2
        .inline floorf,1
        .inline rint,2
        .inline rintf,1
        .inline min_subnormal,0
        .inline min_subnormalf,0
        .inline max_subnormal,0
...

The caveat with -xlibmil is documented:

          However, these substitutions can cause the setting of
          errno to become unreliable. If your program depends on
          the value of errno, avoid this option. See the NOTES
          section at the end of this man page for more informa-
          tion.

An optimisation in the inline versions of these functions is that they do not set errno. This can be a problem for some codes, but most codes don't read errno.

Monday Apr 23, 2012

sincos()

If you are computing both the sine and cosine of an angle, then calling sincos() is about twice as quick as calling cos() and sin() independently:

#include <math.h>

int main()
{
  double a,b,c;
  a=1.0;
  for (int i=0;i<100000000;i++) { b=sin(a); c=cos(a); }
}

$ cc -O sc.c -lm
$ timex ./a.out
real          19.13

vs

#include <math.h>

int main()
{
  double a,b,c;
  a=1.0;
  for (int i=0;i<100000000;i++) { sincos(a,&b,&c); }
}
$ cc -O sc.c -lm
$ timex ./a.out
real           9.80

Monday Apr 02, 2012

Pragmas and exceptions

The compiler pragmas:

  #pragma no_side_effect(routinename)
  #pragma does_not_write_global_data(routinename)
  #pragma does_not_read_global_data(routinename)

are used to tell the compiler more about the routine being called, and enable it to do a better job of optimising around the routine. If a routine does not read global data, then global data does not need to be stored to memory before the call to the routine. If the routine does not write global data, then global data does not need to be reloaded after the call. The no side effect directive indicates that the routine does no I/O, does not read or write global data, and the result only depends on the input.
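As a small illustration of why the compiler has to be conservative without these hints, here is a sketch in C of a routine that does write global data, so none of the three pragmas could safely be applied to it. The names total, add_to_total, and sum_first are made up for this example.

```c
int total = 0;                 /* global data */

/* Writes global data, so the caller cannot keep 'total' in a register
   across this call. */
void add_to_total(int n)
{
  total += n;
}

int sum_first(int count)
{
  total = 0;                   /* must be stored before the call...   */
  for (int i = 1; i <= count; i++)
  {
    add_to_total(i);
  }
  return total;                /* ...and reloaded after it            */
}
```

If the compiler were wrongly told that add_to_total() does not write global data, it would be free to return the zero it stored before the loop.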

However, these pragmas should not be used on routines that throw exceptions. The following example indicates the problem:

#include <iostream>

extern "C"
{
  int exceptional(int);
  #pragma no_side_effect(exceptional)
}

int exceptional(int a)
{
  if (a==7)
  {
    throw 7;
  }
  else
  {
   return a+1;
  } 
}


int a;
int c=0;

class myclass
{
  public:
  int routine();
};

int myclass::routine()
{
  for(a=0; a<1000; a++)
  {
    c=exceptional(c);
  }
 return 0;
}

int main()
{
  myclass f;
  try
  {
    f.routine();
  }
  catch(...)
  {
    std::cout << "Something happened" << a << c << std::endl;
  }
  
}

The routine "exceptional" is declared as having no side effects, however it can throw an exception. The no side effects directive enables the compiler to avoid storing global data back to memory, and retrieving it after the function call, so the loop containing the call to exceptional is quite tight:

$ CC -O -S test.cpp
...
                        .L77000061:
/* 0x0014         38 */         call    exceptional     ! params =  %o0 ! Result =  %o0
/* 0x0018         36 */         add     %i1,1,%i1
/* 0x001c            */         cmp     %i1,999
/* 0x0020            */         ble,pt  %icc,.L77000061
/* 0x0024            */         nop

However, when the program is run the result is incorrect:

$ CC -O t.cpp
$ ./a.out
Something happened00

If the code had worked correctly, the output would have been "Something happened77": the exception is thrown when exceptional() is called with c equal to 7, at which point a is also 7. Instead, the program prints the original values of the variables 'a' and 'c'.

The problem is that the exception handler reads global data, and due to the no side effects directive the compiler has not updated the global data before the function call. So these pragmas should not be used on routines that have the potential to throw exceptions.

Friday Mar 30, 2012

Inline template efficiency

I like inline templates, and use them quite extensively. Whenever I write code with them I'm always careful to check the disassembly to see that the resulting output is efficient. Here's a potential cause of inefficiency.

Suppose we want to use the mis-named Leading Zero Detect (LZD) instruction on T4 (this instruction does a count of the number of leading zero bits in an integer register - so it should really be called leading zero count). So we put together an inline template called lzd.il looking like:

.inline lzd
  lzd %o0,%o0
.end

And we throw together some code that uses it:

int lzd(int);

int a;
int c=0;

int main()
{
  for(a=0; a<1000; a++)
  {
    c=lzd(c);
  }
  return 0;
}

We compile the code with some amount of optimisation, and look at the resulting code:

$ cc -O -xtarget=T4 -S lzd.c lzd.il
$ more lzd.s
                        .L77000018:
/* 0x001c         11 */         lzd     %o0,%o0
/* 0x0020          9 */         ld      [%i1],%i3
/* 0x0024         11 */         st      %o0,[%i2]
/* 0x0028          9 */         add     %i3,1,%i0
/* 0x002c            */         cmp     %i0,999
/* 0x0030            */         ble,pt  %icc,.L77000018
/* 0x0034            */         st      %i0,[%i1]

What is surprising is that we're seeing a number of loads and stores in the code. Everything could be held in registers, so why is this happening?

The problem is that the code is only inlined at the code generation stage - when the actual instructions are generated. Earlier compiler phases see a function call. The called functions can do all kinds of nastiness to global variables (like 'a' in this code) so we need to load them from memory after the function call, and store them to memory before the function call.

Fortunately we can use a #pragma directive to tell the compiler that the routine lzd() has no side effects - meaning that it does not read or write to memory. The directive to do that is #pragma no_side_effect(<routine name>), and it needs to be placed after the declaration of the function. The new code looks like:

int lzd(int);
#pragma no_side_effect(lzd)

int a;
int c=0;

int main()
{
  for(a=0; a<1000; a++)
  {
    c=lzd(c);
  }
  return 0;
}

Now the loop looks much neater:

/* 0x0014         10 */         add     %i1,1,%i1

!   11                !  {
!   12                !    c=lzd(c);

/* 0x0018         12 */         lzd     %o0,%o0
/* 0x001c         10 */         cmp     %i1,999
/* 0x0020            */         ble,pt  %icc,.L77000018
/* 0x0024            */         nop
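For anyone without a T4 to hand, here is a portable C sketch of what the instruction computes. I am assuming that LZD counts across the full 64-bit register; leading_zeros64() is a made-up name for illustration and is not how the inline template is implemented.

```c
#include <stdint.h>

/* Count the zero bits above the most significant set bit of a 64-bit
   value. An input of zero gives 64. */
int leading_zeros64(uint64_t value)
{
  int count = 0;
  for (uint64_t mask = UINT64_C(1) << 63;
       mask != 0 && (value & mask) == 0;
       mask >>= 1)
  {
    count++;
  }
  return count;
}
```

The loop version takes up to 64 iterations, which is the reason a single instruction is attractive.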

Friday Mar 23, 2012

Malloc performance

My co-conspirator Rick Weisner has written up a nice summary of malloc performance, including evaluating the mtmalloc changes that we got into S10 U10.

Webinar for developers - 27th March

There's a webinar for developers scheduled for next week. Looks an interesting agenda.

Thursday Mar 22, 2012

Tech day at the Santa Clara campus

The next OTN Sys Admin day is at the Santa Clara campus on 10th April. It has a half hour talk on Studio. More information at the OTN Garage.

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
