Friday Jun 20, 2014

What's happening

I've been isolating a behaviour difference, and used a couple of techniques to get traces of process activity. First off, tracing bash scripts by explicitly starting them with bash -x. For example, here's some tracing of xzless:

$ bash -x xzless
+ xz='xz --format=auto'
+ version='xzless (XZ Utils) 5.0.1'
+ usage='Usage: xzless [OPTION]... [FILE]...
...

Another favourite tool is truss, which does all kinds of amazing tracing. In this instance all I needed to do was to see what other commands were started, so I used -f to follow forked processes and -t execve to show calls to execve:

$ truss -f -t execve jcontrol
29211:  execve("/usr/bin/bash", 0xFFBFFAB4, 0xFFBFFAC0)  argc = 2
...

Friday Jun 13, 2014

Enabling large file support

For 32-bit apps the "default" maximum file size is 2GB. This is because the interfaces use the long datatype, which is a 32-bit signed integer for 32-bit apps and a 64-bit signed integer for 64-bit apps. For many apps 2GB is insufficient. Solaris already has a large number of large file aware commands; these are listed under man largefile.

For a developer wanting to support larger files, the obvious solution is to port to 64-bit, however there is also a way to remain with 32-bit apps. This is to compile with large file support.

Large file support provides a new set of interfaces that take 64-bit integers, enabling support of files greater than 2GB in size. In a number of cases these interfaces replace the existing ones, so you don't need to change the source. However, there are some interfaces where the long type is part of the ABI; in these cases there is a new interface to use.

The way to find out what flags to use is through the command getconf LFS_CFLAGS. The getconf command returns environment settings, and in this case we're asking it to provide the C flags needed to compile with large file support. It's useful to take a look at the other information that getconf can provide.

The documentation for compiling with large file support talks about both the flags that are needed, and what functions need to be changed. There are two functions that do not map directly onto large file equivalents because they have a long data type in their prototypes: fseek and ftell. Calls to these need to be replaced by calls to fseeko and ftello, which use off_t instead.
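
As a rough sketch of what that replacement looks like, the function below reports a file's size using fseeko and ftello. The name file_size is made up for illustration, and the two #define lines are assumed stand-ins for what getconf LFS_CFLAGS typically reports, so check the output on your own system.

```c
/* Assumed stand-ins for the flags reported by getconf LFS_CFLAGS;
   they must appear before any system header is included. */
#define _LARGEFILE_SOURCE 1
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size of the file in bytes, or -1 on
   error. off_t is 64 bits wide when large file support is enabled, so
   the result is not limited to 2GB in a 32-bit app. */
long long file_size(const char *path)
{
  FILE *f = fopen(path, "rb");
  if (f == NULL) { return -1; }

  long long size = -1;
  if (fseeko(f, 0, SEEK_END) == 0)     /* fseeko takes an off_t offset */
  {
    off_t end = ftello(f);             /* ftello returns an off_t */
    if (end != (off_t)-1) { size = (long long)end; }
  }
  fclose(f);
  return size;
}
```

Calling file_size("somefile") from a 32-bit build compiled this way should cope with files larger than 2GB, whereas the same code written with fseek and ftell would not.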

Wednesday Jun 11, 2014

Article in Oracle Scene magazine

Oracle Scene is the quarterly for the UK Oracle User Group. For the current issue, I've contributed an article on developing with Solaris Studio.

Wednesday Jun 04, 2014

Pretty printing using indent

If you need to pretty-print some code, then the compiler comes with indent for exactly this purpose!

Tuesday May 06, 2014

Introduction to OpenMP

Recently, I had the opportunity to talk with Nawal Copty, our OpenMP representative, about writing parallel applications using OpenMP, and about the new features in the OpenMP 4.0 specification. The video is available on youtube.

The set of recent videos can be accessed either through the Oracle Learning Library, or as a playlist on youtube.

Tuesday Apr 29, 2014

Unsigned integers considered annoying

Let's talk about unsigned integers. These can be tricky because they wrap-around from big to small. Signed integers wrap-around from positive to negative. Let's look at an example. Suppose I want to do something for all iterations of a loop except for the last OFFSET of them. I could write something like:

  if (i < length - OFFSET) {}

If I assume OFFSET is 8 then for length 10, I'll do something for the first 2 iterations. The problem occurs when the length is less than OFFSET. If length is 2, then I'd expect not to do anything for any of the iterations. For a signed integer 2 minus 8 is -6 which is less than i, so I don't do anything. For an unsigned integer 2 minus 8 is 0xFFFFFFFA which is still greater than i. Hence we'll continue to do whatever it is we shouldn't be doing in this instance.

So the obvious fix for this is that for unsigned integers we do:

  if (i + OFFSET < length) {}

This works over the range that we might expect it to work. Of course we have a problem with signed integers if length happens to be close to INT_MAX. At this point adding OFFSET to a large value of i may cause it to overflow (strictly, undefined behaviour in C) and typically become a large negative number, which will continue to be less than length.

With unsigned ints we encounter this same problem at UINT_MAX where adding OFFSET to i could generate a small value, which is less than the boundary.

So in these cases we might want to write:

  if (i < length - OFFSET) {}

Oh....

So basically to cover all the situations we might want to write something like:

  if ( (length > OFFSET) && (i < length - OFFSET) ) {}

If this looks rather complex, then it's important to realise that we're handling a range check - and a range has upper and lower bounds. For signed integers zero - OFFSET is representable, so we can write:

  if (i < length - OFFSET) {}

without worrying about wrap-around. However for unsigned integers we need to define both the left and right ends of the range. Hence the more complex expression.
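
To make the difference concrete, here is a small sketch of the two unsigned checks side by side. The function names naive_check and safe_check are invented for this example, and OFFSET is passed in as a parameter rather than being a macro.

```c
#include <stdbool.h>

/* Naive check: length - offset wraps around to a huge value when
   length is less than offset, so the test passes when it should not. */
bool naive_check(unsigned int i, unsigned int length, unsigned int offset)
{
  return i < length - offset;
}

/* Range check with both ends defined: the first test rules out the
   wrap-around case before the subtraction is relied upon. */
bool safe_check(unsigned int i, unsigned int length, unsigned int offset)
{
  return (length > offset) && (i < length - offset);
}
```

With length 2 and an offset of 8, naive_check(0, 2, 8) is true because 2 minus 8 wraps to 0xFFFFFFFA (for a 32-bit unsigned int), while safe_check(0, 2, 8) is false. With length 10 both agree: true for i of 0 and 1, false from 2 onwards.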

Thursday Apr 17, 2014

What's new in C++11

I always enjoy chatting with Steve Clamage about C++, and I was really pleased to get to interview him about what we should expect from the new 2011 standard.

Wednesday Apr 16, 2014

Lambda expressions in C++11

Lambda expressions are, IMO, one of the interesting features of C++11. At first glance they do seem a bit hard to parse, but once you get used to them you start to appreciate how useful they are. Steve Clamage and I have put together a short paper introducing lambda expressions.

Friday Apr 11, 2014

New set and map containers in the C++11 Standard Library

We've just published a short article on the std::unordered_map, std::unordered_set, std::multimap, and std::multiset containers in the C++ Standard Library.

Monday Apr 07, 2014

RAW hazards revisited (again)

I've talked about RAW hazards in the past, and even written articles about them. They are an interesting topic because they are a situation where a small tweak to the code can avoid the problem.

In the article on RAW hazards there is some code that demonstrates various types of RAW hazard. One common situation is writing code to copy misaligned data into a buffer. The example code contains a test for this kind of copying. The results from this test, compiled with Solaris Studio 12.3, on my system look like:

Misaligned load v1 (bad) memcpy()
Elapsed = 16.486042 ns
Misaligned load v2 (bad) byte copy
Elapsed = 9.176913 ns
Misaligned load good
Elapsed = 5.243858 ns

However, we've done some work in the compiler on better identification of potential RAW hazards. If I recompile using the 12.4 Beta compiler I get the following results:

Misaligned load v1 (bad) memcpy()
Elapsed = 4.756911 ns
Misaligned load v2 (bad) byte copy
Elapsed = 5.005309 ns
Misaligned load good
Elapsed = 5.597687 ns

All three variants of the code now produce roughly the same performance - the RAW hazards have been eliminated!
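
For anyone who hasn't seen the pattern, here is a minimal sketch of the kind of code involved. It is not the test code from the article; the function names are made up, and the shift-based version is only one assumed way of avoiding the store-then-load sequence.

```c
#include <stdint.h>
#include <string.h>

/* Copies the bytes into a temporary and then uses the temporary as a
   single word. If the compiler really does store the bytes and reload
   them as a word, that store followed by a wider load of the same
   memory is the pattern that can produce a RAW hazard. */
uint32_t load32_memcpy(const unsigned char *p)
{
  uint32_t value;
  memcpy(&value, p, sizeof(value));   /* result is in native byte order */
  return value;
}

/* Assembles the word from individual byte loads using shifts, so no
   store is needed before the value is used. Bytes are combined most
   significant first, which matches native order on big-endian SPARC. */
uint32_t load32_shifts(const unsigned char *p)
{
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

On a little-endian machine the two functions return different values for the same bytes, so treat this purely as an illustration of the access pattern.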

Friday Apr 04, 2014

Interview with Don Kretsch

As well as filming the "to-camera" about the Beta program, I also got the opportunity to talk with my Senior Director Don Kretsch about the next compiler release.

About the Studio 12.4 Beta Programme

Here's a short video where I talk about the Solaris Studio 12.4 Beta programme.

Thursday Apr 03, 2014

SPARC roadmap

A new SPARC roadmap has been published. We have some very cool stuff coming :)

Monday Mar 31, 2014

Socialising Solaris Studio

I just figured that I'd talk about Studio's social media presence.

First off, we have our own forums. One for the compilers and one for the tools. This is a good place to post comments and questions; posting here will get our attention.

We also have a presence on Facebook and Twitter.

Moving to the broader Oracle community, these pages list social media presence for a number of products.

Looking at Oracle blogs, the first stop probably has to be the entertaining The OTN Garage. It's also probably useful to browse the blogs by keywords, for example here's posts tagged with Solaris.

Thursday Mar 27, 2014

Solaris Studio 12.4 documentation

The preview docs for Solaris Studio 12.4 are now available.

Tuesday Mar 25, 2014

Solaris Studio 12.4 Beta now available

The beta programme for Solaris Studio 12.4 has opened. So download the bits and take them for a spin!

There's a whole bunch of features - you can read about them in the what's new document, but I just want to pick a couple of my favourites:

  • C++ 2011 support. If you've not read about it, C++ 2011 is a big change. There's a lot of stuff that has gone into it - improvements to the Standard library, lambda expressions etc. So there is plenty to try out. However, there are some features not supported in the beta, so take a look at the what's new pages for C++.
  • Improvements to the Performance Analyzer. If you regularly read my blog, you'll know that this is the tool that I spend most of my time with. The new version has some very cool features; these might not sound much, but they fundamentally change the way you interact with the tool: an overview screen that helps you rapidly figure out what is of interest, improved filtering, mini-views which show more slices of the data on a single screen, etc. All of these massively improve the experience and the ease of using the tool.

There's a couple more things. If you try out features in the beta and you want to suggest improvements, or tell us about problems, please use the forums. There is a forum for the compiler and one for the tools.

Oh, and watch this space. I will be writing up some material on the new features....

Friday Mar 21, 2014

JavaOne award

I was thrilled to get a JavaOne 2013 Rockstar Award for Charlie Hunt's and my talk "Performance tuning where Java meets the hardware".

Getting the award was a surprise and a great honour. It's based on audience feedback, so it's really nice to find out that the audience enjoyed hearing the presentation as much as I enjoyed giving it.

Wednesday Feb 26, 2014

Multicore Application Programming available in Chinese!

This was a complete surprise to me. A box arrived on my doorstep, and inside were copies of Multicore Application Programming in Chinese. They look good, and have a glossy cover rather than the matte cover of the English version.

Article on RAW hazards

Feels like it's been a long while since I wrote up an article for OTN, so I'm pleased that I've finally got around to fixing that.

I've written about RAW hazards in the past. But I recently went through a patch of discovering them in a number of places, so I've written up a full article on them.

What is "nice" about RAW hazards is that once you recognise them for what they are (and that's the tricky bit), they are typically easy to avoid. So if you see 10 seconds of time attributable to RAW hazards in the profile, then you can often get the entire 10 seconds back by tweaking the code.

Monday Dec 30, 2013

Privileges

Before I start, this is not about security, it's probably the antithesis of security. So I'd recommend starting by reading about how using privileges can break the security of your system.

There are three tools that I regularly use that require escalated privileges: dtrace, cpustat, and busstat. You can read up on the way that Solaris manages privileges. But if you know what you want to do, the process to figure out how to get the necessary privileges is reasonably straightforward.

To find out what privileges you have you can use the ppriv -v $$ command. This will report all the privileges for the current shell.

To find out what privileges are stopping you from running a command, you should run it under ppriv -eD command. For example:

ppriv -eD cpustat -c instruction_counts 1 1
cpustat[13222]: missing privilege "sys_resource" (euid = 84945, syscall = 128) needed at rctl_rlimit_set+0x98
cpustat[13222]: missing privilege "cpc_cpu" (euid = 84945, syscall = 5) needed at kcpc_open+0x4
...

It is also possible to list all the privileges on the system using ppriv -l. This is helpful if the privilege has a name that maps onto what you want to do. The privileges for dtrace are good examples of this:

$ ppriv -l|grep dtrace
dtrace_kernel
dtrace_proc
dtrace_user

You can then use usermod -K ... to assign the necessary privileges to a user. For example:

$ usermod -K defaultpriv=basic,sys_resource,cpc_cpu username

Information about privileges for users is recorded in /etc/user_attr, so it is possible to directly edit that file to add or remove privileges.

Using this approach you can determine that busstat needs sys_config, cpustat needs sys_resource and cpc_cpu, and dtrace needs dtrace_kernel, dtrace_proc, and dtrace_user.

Thursday Sep 12, 2013

UKOUG Conference - Three presentations

Ok, my two hour presentation at UKOUG is now split into two one hour presentations. So my schedule now looks like:

  • Monday 2nd December: Getting the most out of Oracle Solaris Studio
  • Monday 2nd December: Where code meets the processor - performance tuning C/C++ applications
  • Wednesday 4th December: Multicore, Multiprocess, Multithread

I'm very pleased that I've got three separate hour long sessions. The material better fits this distribution, plus I really don't think that people could sit comfortably for two hours.

I'll be hanging out at the conference for the entire week, so please do take the time to find me for a chat.

Wednesday Sep 11, 2013

Presenting at UK Oracle User Group meeting

I'm very excited to have been invited to present at the UK Oracle User Group conference in Manchester, UK on 1-4 December.

Currently I'm down for two presentations:

As you might expect, I'm very excited to be going over there; I've not visited Manchester in about 20 years!

Thursday Aug 29, 2013

Timezone troubles when dual booting

I have a laptop that dual boots Solaris and Windows XP. When I switched between the two OSes I would have to reset the clock because the time would be eight hours out. This has been nagging at me for a while, so I dug into what was going on.

It seems that Windows assumes that the Real-Time Clock (RTC) in the bios is using local time. So it will read the clock and display whatever time is shown there.

Solaris, on the other hand, assumes that the clock is in Coordinated Universal Time (UTC), so you have to apply a time zone transformation in order to get to the local time.

Obviously, if you adjust the clock to make one correct, then the other is wrong.
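
The size of the error is just the time zone offset. As a rough illustration (nothing to do with how either OS is actually implemented), the sketch below takes the fields a UTC clock would hold and reads them back as if they were local time. The helper name rtc_misread_seconds is invented, and "PST8" is a POSIX-style zone string standing in for Pacific time without daylight saving.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: how far out (in seconds) the displayed time is
   when a clock holding UTC is read as if it held local time in zone tz.
   Note that this changes the TZ setting of the calling process. */
long rtc_misread_seconds(const char *tz, time_t now)
{
  setenv("TZ", tz, 1);
  tzset();

  struct tm fields = *gmtime(&now);    /* what a UTC clock would hold */
  fields.tm_isdst = -1;                /* let mktime decide about DST */
  time_t misread = mktime(&fields);    /* same fields taken as local time */

  return (long)difftime(misread, now);
}
```

For "PST8" the result is 28800 seconds, which is the eight hours that the clock was out by; for "UTC0" it is zero.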

To me, it seems more natural to have the clock in a laptop work on UTC - because when you travel the local time changes. There is a registry setting in Windows that, when set to 1, tells the machine to use UTC:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal

However, it has some problems and is potentially not robust over sleep. So we have to work the other way, and get Solaris to use local time. Fortunately, this is a relatively simple change: run the following as root (pick the appropriate timezone for your location):

rtc -z US/Pacific

Monday Jul 08, 2013

JavaOne and Oracle Open World 2013

I'll be up at both JavaOne and Oracle Open World presenting. I have a total of three presentations:

  • Mixed-Language Development: Leveraging Native Code from Java
  • Performance Tuning Where Java Meets the Hardware
  • Developing Efficient Database Applications with C and C++

I'm excited by these opportunities - particularly working with Charlie Hunt diving into Java Performance.

Thursday May 30, 2013

Binding to the current processor

Just hacked up a snippet of code to stop a thread migrating to a different CPU while it runs. This should help the thread get, and keep, local memory. This in turn should reduce run-to-run variance.

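#include <stdio.h>
#include <sys/types.h>
#include <sys/processor.h>
#include <sys/procset.h>

/* Bind the calling thread (LWP) to the CPU it is currently running on. */
void bindnow(void)
{
  processorid_t proc = getcpuid();
  if (processor_bind(P_LWPID, P_MYID, proc, 0))
    { printf("Warning: Binding failed\n"); }
  else
    { printf("Bound to CPU %i\n", proc); }
}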

Thursday Nov 01, 2012

Rick Hetherington on the T5

There's an interview with Rick Hetherington about the new T5 processor. Well worth a quick read.

Thursday Oct 18, 2012

Maximising T4 performance

My presentation from Oracle Open World is available for download.

Tuesday Oct 09, 2012

25 years of SPARC

Looks like an interesting event at the Computer History Museum on 1 November. A panel discussing SPARC at 25: Past, Present and Future. Free sign up.

Wednesday Aug 29, 2012

CON6714 - Mixed-Language Development: Leveraging Native Code from Java

Here's the abstract from my JavaOne talk:

There are some situations in which it is necessary to call native code (C/C++ compiled code) from Java applications. This session describes how to do this efficiently and how to performance-tune the resulting applications.

The objectives for the session are:

  • Explain reasons for using native code in Java applications
  • Describe pitfalls of calling native code from Java
  • Discuss performance-tuning of Java apps that use native code

I'll cover how to call native code from Java, debugging native code, and then I'll dig into performance tuning the code. The talk is not going too deep on performance tuning - focusing on the JNI specific topics; I'll do a bit more about performance tuning in my OpenWorld talk later in the day.

Tuesday May 22, 2012

Square roots

If you are spending significant time calling sqrt() then to improve this you should compile with -xlibmil. Here's some example code that calls both fabs() and sqrt():

#include <math.h>
#include <stdio.h>

int main()
{
  double d=23.3;
  printf("%f\n",fabs(d));
  printf("%f\n",sqrt(d));
}

If we compile this with Studio 12.2 we will see calls to both fabs() and sqrt():

$ cc -S -O  m.c
$ grep call m.s|grep -v printf
/* 0x0018            */         call    fabs    ! params =  %o0 %o1     ! Result =  %f0 %f1
/* 0x0044            */         call    sqrt    ! params =  %o0 %o1     ! Result =  %f0 %f1

If we add -xlibmil then these calls get replaced by equivalent instructions:

$ cc -S -O -xlibmil  m.c
$ grep abs m.s|grep -v print; grep sqrt m.s|grep -v print
/* 0x0018          7 */         fabsd   %f4,%f0
/* 0x0038            */         fsqrtd  %f6,%f2

The default for Studio 12.3 is to inline fabs(), but you still need to add -xlibmil for the compiler to inline sqrt(), so it is a good idea to include the flag.

You can see the functions that are replaced by inline versions by grepping the inline template file (libm.il) for the word "inline":

$ grep inline libm.il
        .inline sqrtf,1
        .inline sqrt,2
        .inline ceil,2
        .inline ceilf,1
        .inline floor,2
        .inline floorf,1
        .inline rint,2
        .inline rintf,1
        .inline min_subnormal,0
        .inline min_subnormalf,0
        .inline max_subnormal,0
...

The caveat with -xlibmil is documented:

          However, these substitutions can cause the setting of
          errno to become unreliable. If your program depends on
          the value of errno, avoid this option. See the NOTES
          section at the end of this man page for more informa-
          tion.

An optimisation in the inline versions of these functions is that they do not set errno. This can be a problem for some codes, but most codes don't read errno.

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge