New blog location

This will be my last blog entry here. I'm leaving Oracle after 16 amazing years. I've had a fantastic time, and I'm glad to have been able to share some of that with you. If you wish to read about my continuing adventures, please head along to http://www.darrylgove.com/. All the best, Darryl.

Tuesday, August 11, 2015 | Personal | Read More

Where does misaligned data come from?

A good question about data (mis)alignment is "Where did it come from?". So here's a reasonably detailed answer to that... If the compiler has generated the code for you and you've not done anything "weird" then the data should be correctly aligned. So most apps don't have misaligned data, and most of the time you (as a developer) don't have to worry about it. For example, if you allocate a local variable, or a global variable, then the compiler will correctly align it. If you...

Thursday, May 28, 2015 | Personal | Read More

Misaligned loads profiled (again)

A long time ago I described how misaligned loads appeared in profiles of 32-bit applications. Since then we've changed the level of detail presented in the Performance Analyzer. When I wrote the article the time spent on-cpu that wasn't User time was grouped as System time. We've now started showing more detail - and more detail is good. Here's a similar bit of code:

#include <stdio.h>
static int i,j;
volatile double *d;
void main ()
{
  char a[10];
  d=(double*)&a[1];
  j=100000000;
  ...

Wednesday, May 27, 2015 | Work | Read More

Misaligned loads in 64-bit apps

A while back I wrote up how to use dtrace to identify misaligned loads in 32-bit apps. Here's a script to do the same for 64-bit apps:

#!/usr/sbin/dtrace -s

pid$1::__do_misaligned_ldst_instr:entry
{
  @p[ustack()]=count();
}

Run it as './script <pid>'

Wednesday, May 13, 2015 | Personal | Read More

C++ rules enforced in Studio 12.4

Studio 12.4 has improved adherence to the C++ standard, so some codes which were accepted by 12.3 might get reported as errors with the new compiler. The compiler documentation has a list of the improvements and examples of how to modify problem code to make it standard compliant.

Tuesday, May 5, 2015 | Work | Read More

SPARC processor documentation

The documentation for older SPARC processors has been put up on the web!

Tuesday, April 28, 2015 | Personal | Read More

Using the Solaris Studio IDE for remote development

Solaris Studio has an IDE based on NetBeans.org. One of the features of the IDE is its ability to do remote development, i.e. do your development work on a Windows laptop while doing the builds on a remote Solaris or Linux box. Vladimir has written up a nice how-to guide covering the three models that Studio supports:

Fully remote. The source and builds remain on the remote system.
Mixed/shared. The source is on a shared network...

Tuesday, March 31, 2015 | Personal | Read More

Community redesign...

Rick has a nice post about the changes to the Oracle community pages. Useful quick read.

Wednesday, March 25, 2015 | Personal | Read More

New Studio C++ blogger

Please welcome another of my colleagues to blogs.oracle.com. Fedor's just posted some details about getting Boost to compile with Studio.

Sunday, March 22, 2015 | Personal | Read More

Building xerces 2.8.0

Some notes on building xerces 2.8.0 on Solaris. You can find the build instructions on the xerces site, but there are some changes that are needed to make it work with recent Studio compilers.

Remove -ptr from Makefile.incl. This is a deprecated option and should be removed from the Makefile.

Add -library=stlport4 into Makefile.incl. This flag is needed on the compile and link lines so that stlport4 is used instead of the default libCstd. Most recent C++ codes require a more...

Monday, March 2, 2015 | Personal | Read More

Building old code

Just been looking at a strange link time error:

ld.so.1: lddstub: fatal: tr/dev/nul: open failed: No such file or directory

I got this compiling C++ code that was expecting one of the old Studio compilers (probably Workshop vintage). The clue to figuring out what was wrong was this warning in the build:

CC: Warning: Option -ptr/dev/nul passed to ld, if ld is invoked, ignored otherwise

What was happening was that the library was building using the long-since-deprecated -ptr...

Monday, March 2, 2015 | Personal | Read More

Improper member use error

I hit this Studio compiler error message, and it took me a few minutes to work out what was going wrong. So I'm writing it up in case anyone else hits it. Consider this code:

typedef struct t
{
  int t1;
  int t2;
} t_t;

struct q
{
  int q1;
  int q2;
};

void main()
{
  struct t v; // Instantiate one structure
  v.q1 = 0;   // Use member of a different structure
}

The C compiler produces this error:

$ cc odd.c
"odd.c", line 16: improper member use: q1
cc: acomp failed for odd.c

The...

Friday, February 27, 2015 | Personal | Read More

Profiling the kernel

One of the incredibly useful features in Studio is the ability to profile the kernel. The tool to do this is er_kernel. It's based around dtrace, so you either need to run it with escalated privileges, or you need to edit /etc/user_attr to add something like:

<username>::::defaultpriv=basic,dtrace_user,dtrace_proc,dtrace_kernel

The correct way to modify user_attr is with the command usermod:

usermod -K defaultpriv=basic,dtrace_user,dtrace_proc,dtrace_kernel <username>

There'...

Tuesday, February 17, 2015 | Personal | Read More

Printing out arguments

Rather unexciting application for printing out the arguments passed into an application:

#include <stdio.h>
int main(int argc, char** argv)
{
  for (int i=0; i<argc; i++)
  {
    printf(" %i = \"%s\"\n",i, argv[i]);
  }
}

Tuesday, February 10, 2015 | Personal | Read More

Digging into microstate accounting

Solaris has support for microstate accounting. This gives huge insight into where an application and its threads are spending their time. It breaks down time into the (obvious) user and system, but also allows you to see the time spent waiting on page faults and other useful-to-know states. This level of detail is available through the usage file in /proc/pid; there's a corresponding file for each lwp in /proc/pid/lwp/lwpid/lwpusage. You can find more details about the /proc f...

Thursday, February 5, 2015 | Personal | Read More

Namespaces in C++

A porting problem I hit with regularity is using functions in the standard namespace. Fortunately, it's a relatively easy problem to diagnose and fix. But it is very common, and it's worth discussing how it happens. C++ namespaces are a very useful feature that allows an application to use identical names for symbols in different contexts. Here's an example where we define two namespaces and place identically named functions in the two of them. #include <iostream>namespace ns1{...

Wednesday, February 4, 2015 | Personal | Read More

Bit manipulation: Gathering bits

In the last post on bit manipulation we looked at how we could identify bytes that were greater than a particular target value, and stop when we discovered one. The resulting vector of bytes contained a zero byte for those which did not meet the criteria, and a byte containing 0x80 for those that did. Obviously we could express the result much more efficiently if we assigned a single bit for each result. The following is "lightly" optimised code for producing a bit vector...

Tuesday, February 3, 2015 | Personal | Read More

Bit manipulation: finding a range of values

We previously looked at finding zero values in an array. A similar problem is to find a value larger than some target. The vanilla code for this is pretty simple:

#include "timing.h"
int range(char * array, unsigned int length, unsigned char target)
{
  for (unsigned int i=0; i<length; i++)
  {
    if (array[i]>target) { return i; }
  }
  return -1;
}

It's possible to recode this to use bit operations, but there is a small complication. We need two versions of the routine...

Monday, February 2, 2015 | Personal | Read More

Finding zero values in an array

A common thing to want to do is to find zero values in an array. This is obviously necessary for string length. So we'll start out with a test harness and a simple implementation:

#include "timing.h"
unsigned int len(char* array)
{
  unsigned int length = 0;
  while( array[length] ) { length++; }
  return length;
}

#define COUNT 100000
void main()
{
  char array[ COUNT ];
  for (int i=1; i<COUNT; i++)
  {
    array[i-1] = 'a';
    array[i] = 0;
    if ( i != len(array) ) { printf( "Error at...

Friday, January 30, 2015 | Personal | Read More

gedit troubles

Hit a couple of issues with gedit, just documenting them in case others hit the same problems. X11 connection rejected because of wrong authentication. This turned out to be because there was already a copy of gedit running on the system. GConf Error: Failed to contact configuration server; some possible causes are that you need to enable TCP/IP networking for ORBit, or you have stale NFS locks due to a system crash. See http://projects.gnome.org/gconf/ for information....

Thursday, January 29, 2015 | Personal | Read More

Bit manipulation: Population Count

Population count is one of the more esoteric instructions. It's the operation to count the number of set bits in a register. It comes up with sufficient frequency that most processors have a hardware instruction to do it. However, for this example, we're going to look at coding it in software. First of all we'll write a baseline version of the code:

int popc(unsigned long long value)
{
  unsigned long long bit = 1;
  int popc = 0;
  while ( bit )
  {
    if ( value & bit ) {...

Thursday, January 29, 2015 | Personal | Read More

Tracking application resource use

One question you might ask is "how much memory is my application consuming?". Obviously you can use prstat (prstat -cp <pid> or prstat -cmLp <pid>) to examine the behaviour of a process. But how about programmatically finding that information. OTN has just published an article where I demonstrate how to find out about the resource use of a process, and incidentally how to put that functionality into a library that reports resource use over time.

Wednesday, January 28, 2015 | Personal | Read More

Inline functions in C

Functions declared as inline are slightly more complex than might be expected. Douglas Walls has provided a chapter-and-verse write up. But the issue bears further explanation. When a function is declared as inline it's a hint to the compiler that the function could be inlined. It is not a command to the compiler that the function must be inlined. If the compiler chooses not to inline the function, then the function will be left with a function call that needs to be resolved,...

Wednesday, January 28, 2015 | Personal | Read More

Improving performance through bit manipulation: clear last set bit

Bit manipulation is one of those fun areas where you can get a performance gain from recoding a routine to use logical or arithmetic instructions rather than a more straight-forward code. Of course, in doing this you need to avoid the pit fall of premature optimisation - where you needlessly make the code more obscure with no benefit, or a benefit that disappears as soon as you run your code on a different machine. So with that caveat in mind, let's take a look at a simple...

Wednesday, January 28, 2015 | Personal | Read More

Hands on with Solaris Studio

Over at the OTN Garage, Rick has published details of the Solaris Studio hands-on lab. This is a great chance to learn about the Code Analyzer and the Performance Analyzer. Just to quickly recap, the Code Analyzer is really a suite of tools that do checking on application errors. This includes static analysis - ie extensive compile time checking - and dynamic analysis - run time error detection. If you regularly read this blog, you'll need no introduction to the...

Monday, January 26, 2015 | Personal | Read More

Missing semi-colon

Thought I'd highlight this error message:

class foo
{
  foo();
}

foo::foo() { }

$ CC -c c.cpp
"c.cpp", line 6: Error: A constructor may not have a return type specification.
1 Error(s) detected.

The problem is that the class definition is not terminated with a semi-colon. It should be:

class foo
{
  foo();
}; // Semi-colon

foo::foo() { }

Tuesday, January 13, 2015 | Personal | Read More

Behaviour of std::list::splice in the 2003 and 2011 C++ standards

There's an interesting corner case in the behaviour of std::list::splice. In the C++98/C++03 standards it is defined such that iterators referring to the spliced element(s) are invalidated. This behaviour changes in the C++11 standard, where iterators remain valid. The text of the 2003 standard (section, p2, p7, p12) describes the splice operation as "destructively" moving elements from one list to another. If one list is spliced into another, then all iterators and...

Wednesday, January 7, 2015 | Personal | Read More

New articles about Solaris Studio

We've started posting new articles directly into the communities section of the Oracle website. If you're not familiar with this location, it's also where you can post questions on languages or tools. With the change it should be easier to find articles relevant to developers, and it should be easy to comment on them. So hopefully this works out well. There are currently three articles listed on the content page. I've already posted about the article on the Performance Analyz...

Tuesday, January 6, 2015 | Personal | Read More

The Performance Analyzer Overview screen

A while back I promised a more complete article about the Performance Analyzer Overview screen introduced in Studio 12.4. Well, here it is! Just to recap, the Overview screen displays a summary of the contents of the experiment. This enables you to pick the appropriate metrics to display, so quickly allows you to find where the time is being spent, and then to use the rest of the tool to drill down into what is taking the time.

Tuesday, January 6, 2015 | Personal | Read More

Checking whether hardware supports crypto instructions

A quick example of how to tell if the machine that you're running on supports crypto instructions. The 2011 SPARC Architecture manual tells you to read the cfr register before using the instruction. The cfr register contains a bit for every implemented crypto instruction. However, the cfr register is not implemented on all processors. So you would need to check whether this register is implemented before reading it.... So there has to be a better way. Fortunately, Solaris...

Thursday, December 11, 2014 | Work | Read More

Writing inline templates

Writing some inline templates today... I've written about doing this kind of stuff in the past here and, in more detail, here. I happen to need to pass a bundle of parameters on to the routine. The best way of checking how the parameters will be passed is to get the compiler to provide some initial template. Here's an example routine: int parameters (int p0, int * p1, int * p2, int* p3, int * p4, int * p5, int * p6, int p7){ return p0 + *p1 + *p2 + *p3 + *p4 + ((*p5)<<2) +...

Wednesday, November 19, 2014 | Work | Read More

Software in Silicon Cloud

I missed this press release about Software in Silicon Cloud. It's the announcement for a service where you can try out a SPARC M7 processor. There's an accompanying website which has the sign up plus some more information about the service. What's particularly exciting is that it talks a bit more about Application Data Integrity (ADI). Larry Ellison called this "the most important piece of engineering we’ve done in a long, long time.". Incorrect handling of pointers is a large...

Thursday, November 13, 2014 | Personal | Read More

Oracle Solaris Studio playlist

There's an extensive list of Solaris Studio videos on youtube. In particular there's a bunch of tutorials covering the features of the IDE. The IDE doesn't often get the attention it deserves. It's based off NetBeans, and is full of useful code refactoring tools, navigation tools, etc. To find out more, take a look at some of the videos.

Wednesday, November 12, 2014 | Work | Read More

New Performance Analyzer Overview screen

I love using the Performance Analyzer, but the question I often get when I show it to people is "Where do I start?". So one of the improvements in Solaris Studio 12.4 is an Overview screen to help people get started with the tool. Here's what it looks like:

The reason this is important is that many applications spend time in various places - like waiting on disk, or in user locks - and it's not always obvious where is going to be the most effective place to look for...

Wednesday, November 12, 2014 | Personal | Read More

Performance made easy

The big news of the day is that Oracle Solaris Studio 12.4 is available for download. I'd like to thank all those people who tried out the beta releases and gave us feedback. There are a number of things that are new in this release. The most obvious one is C++11 support; I've written a bit about the lambda expression support, tuples, and unordered containers. My favourite tool, the Performance Analyzer, has also had a bit of a facelift. I'll talk about the Overview screen in...

Tuesday, November 11, 2014 | Work | Read More

SPARC Software in Silicon

Short video by Juan Loaiza about the Software in Silicon work in the upcoming SPARC processor.

Tuesday, November 4, 2014 | Personal | Read More

OpenWorld and JavaOne slides available for download

Thanks everyone who attended my talks last week. My slides for OpenWorld and JavaOne are available for download:

Best Practices for Optimizing Oracle Software for Oracle Hardware
Java Performance: Hardware, Structures, and Algorithms

Friday, October 10, 2014 | Work | Read More

SPARC Processor Documentation

I'm pretty excited, we've now got documentation up for the SPARC processors. Take a look at the SPARC T4 supplement, the SPARC T4 performance instrumentation supplement, the SPARC M5 supplement, or the familiar SPARC 2011 Architecture.

Sunday, September 28, 2014 | Work | Read More

Comparing constant duration profiles

I was putting together my slides for Open World, and in one of them I'm showing profile data from a server-style workload, i.e. one that keeps running until stopped. In this case the profile can be an arbitrary duration, and it's the work done in that time which is the important metric, not the total amount of time taken. Profiling for a constant duration is a slightly unusual situation. We normally profile a workload that takes N seconds, do some tuning, and it now takes (N-S)...

Tuesday, September 23, 2014 | Work | Read More

Fun with signal handlers

I recently had a couple of projects where I needed to write some signal handling code. I figured it would be helpful to write up a short article on my experiences. The article contains two examples. The first is using a timer to write a simple profiler for an application - so you can find out what code is currently being executed. The second is potentially more esoteric - handling illegal instructions. This is probably worth explaining a bit. When a SPARC processor hits an...

Friday, September 5, 2014 | Work | Read More

C++11 Array and Tuple Containers

This article came out a week or so back. It's a quick overview, from Steve Clamage and myself, of the C++11 tuple and array containers. When you take a look at the page, I want you to take a look at the "about the authors" section on the right. I've been chatting to various people and we came up with this as a way to make the page more interesting, and also to make the "see also" suggestions more obvious. Let me know if you have any ideas for further improvements.

Thursday, September 4, 2014 | Work | Read More

My schedule for JavaOne and Oracle Open World

I'm very excited to have got my schedule for Open World and JavaOne:

CON8108: Engineering Insights: Best Practices for Optimizing Oracle Software for Oracle Hardware
Venue / Room: Intercontinental - Grand Ballroom C
Date and Time: 10/1/14, 16:45 - 17:30

CON2654: Java Performance: Hardware, Structures, and Algorithms
Venue / Room: Hilton - Imperial Ballroom A
Date and Time: 9/29/14, 17:30 - 18:30

The first talk will be about some of the techniques I use when performance tuning...

Tuesday, August 26, 2014 | Work | Read More

Providing feedback on the Solaris Studio 12.4 Beta

Obviously, the point of the Solaris Studio 12.4 Beta programme was for everyone to try out the new version of the compiler and tools, and for us to gather feedback on what was working, what was broken, and what was missing. We've had lots of useful feedback - you can see some of it on the forums. But we're after more. Hence we have a Solaris Studio 12.4 Beta survey where you can tell us more about your experiences. Your comments are really helpful to us. Thanks.

Friday, August 15, 2014 | Work | Read More

Studio 12.4 Beta Refresh, performance counters, and CPI

We've just released the refresh beta for Solaris Studio 12.4 - free download. This release features quite a lot of changes to a number of components. It's worth calling out improvements in the C++11 support and other tools. We've had a few comments and posts on the Studio forums, and a bunch of these have resulted in improvements in this refresh. One of the features that is deserving of greater attention is default hardware counters in the Performance Analyzer. Default hardware...

Friday, July 11, 2014 | Personal | Read More

Guest post on the OTN Garage

Contributed a post on how compilers handle constants to the OTN Garage. The whole OTN blog is worth reading because as well as serving up useful info, Rick has a good irreverent style of writing.

Wednesday, June 25, 2014 | Personal | Read More

Presenting at JavaOne and Oracle Open World

Once again I'll be presenting at Oracle Open World, and JavaOne. You can search the full catalogue on the web. The details of my two talks are: Engineering Insights: Best Practices for Optimizing Oracle Software for Oracle Hardware [CON8108] Oracle Solaris Studio is an indispensable toolset for optimizing key Oracle software running on Oracle hardware. This presentation steps through a series of case studies from real Oracle applications, illustrating how the various Oracle...

Monday, June 23, 2014 | Personal | Read More

What's happening

Been isolating a behaviour difference, used a couple of techniques to get traces of process activity. First off tracing bash scripts by explicitly starting them with bash -x. For example here's some tracing of xzless:

$ bash -x xzless
+ xz='xz --format=auto'
+ version='xzless (XZ Utils) 5.0.1'
+ usage='Usage: xzless [OPTION]... [FILE]......

Another favourite tool is truss, which does all kinds of amazing tracing. In this instance all I needed to do was to see what other commands...

Friday, June 20, 2014 | Personal | Read More

Enabling large file support

For 32-bit apps the "default" maximum file size is 2GB. This is because the interfaces use the long datatype, which is a signed 32-bit integer for 32-bit apps and a signed 64-bit integer for 64-bit apps. For many apps this is insufficient. Solaris already has huge numbers of large file aware commands; these are listed under man largefile. For a developer wanting to support larger files, the obvious solution is to port to 64-bit, however there is also a way to remain with 32-bit apps. This is...

Friday, June 13, 2014 | Personal | Read More

Article in Oracle Scene magazine

Oracle Scene is the quarterly for the UK Oracle User Group. For the current issue, I've contributed an article on developing with Solaris Studio.

Wednesday, June 11, 2014 | Personal | Read More

Pretty printing using indent

If you need to pretty-print some code, then the compiler comes with indent for exactly this purpose!

Wednesday, June 4, 2014 | Personal | Read More

Generic hardware counter events

A while back, Solaris introduced support for PAPI - which is probably as close as we can get to a de-facto standard for performance counter naming. For performance counter geeks like me, this is not quite enough information, I actually want to know the names of the raw counters used. Fortunately this is provided in the generic_events man page:

$ man generic_events
Reformatting page. Please Wait... done

CPU Performance Counters Library Functions    generic_events(3CPC)

NAME
     generic_events - generic performance counter events

DESCRIPTION
     The Solaris cpc(3CPC) subsystem implements a number of predefined, generic performance counter events. Each generic...

Intel Pentium Pro/II/III Processor

Generic Event   Platform Event   Event Mask
___________________________________________
PAPI_ca_shr     l2_ifetch        0xf
PAPI_ca_cln     bus_tran_rfo     0x0
PAPI_ca_itv     bus_tran_inval   0x0
PAPI_tlb_im     itlb_miss        0x0
PAPI_btac_m     btb_misses       0x0
PAPI_hw_int     hw_int_rx        0x0
...

Friday, May 23, 2014 | Work | Read More

Introduction to OpenMP

Recently, I had the opportunity to talk with Nawal Copty, our OpenMP representative, about writing parallel applications using OpenMP, and about the new features in the OpenMP 4.0 specification. The video is available on youtube. The set of recent videos can be accessed either through the Oracle Learning Library, or as a playlist on youtube.

Tuesday, May 6, 2014 | Personal | Read More

Unsigned integers considered annoying

Let's talk about unsigned integers. These can be tricky because they wrap-around from big to small. Signed integers wrap-around from positive to negative. Let's look at an example. Suppose I want to do something for all iterations of a loop except for the last OFFSET of them. I could write something like: if (i < length - OFFSET) {} If I assume OFFSET is 8 then for length 10, I'll do something for the first 2 iterations. The problem occurs when the length is less than...

Tuesday, April 29, 2014 | Personal | Read More

What's new in C++11

I always enjoy chatting with Steve Clamage about C++, and I was really pleased to get to interview him about what we should expect from the new 2011 standard.

Thursday, April 17, 2014 | Personal | Read More

Lambda expressions in C++11

Lambda expressions are, IMO, one of the interesting features of C++11. At first glance they do seem a bit hard to parse, but once you get used to them you start to appreciate how useful they are. Steve Clamage and I have put together a short paper introducing lambda expressions.

Wednesday, April 16, 2014 | Personal | Read More

RAW hazards revisited (again)

I've talked about RAW hazards in the past, and even written articles about them. They are an interesting topic because they are a situation where a small tweak to the code can avoid the problem. In the article on RAW hazards there is some code that demonstrates various types of RAW hazard. One common situation is writing code to copy misaligned data into a buffer. The example code contains a test for this kind of copying, the results from this test, compiled with Solaris...

Tuesday, April 8, 2014 | Personal | Read More

Interview with Don Kretsch

As well as filming the "to-camera" about the Beta program, I also got the opportunity to talk with my Senior Director Don Kretsch about the next compiler release.

Friday, April 4, 2014 | Personal | Read More

About the Studio 12.4 Beta Programme

Here's a short video where I talk about the Solaris Studio 12.4 Beta programme.

Friday, April 4, 2014 | Personal | Read More

Discovering the Code Analyzer

We're doing something different with the Studio 12.4 Beta programme. We're also putting together some material about the compiler and features: videos, whitepapers, etc. One of the first videos is now officially available. You might have seen the preproduction "leak" if you happen to follow Studio on either facebook or twitter. This first video is an interview with Raj Prakash, the project lead for the Code Analyzer. The Code Analyzer is our suite for checking the correctness...

Friday, April 4, 2014 | Work | Read More

SPARC roadmap

A new SPARC roadmap has been published. We have some very cool stuff coming :)

Thursday, April 3, 2014 | Personal | Read More

Socialising Solaris Studio

I just figured that I'd talk about studio's social media presence. First off, we have our own forums. One for the compilers and one for the tools. This is a good place to post comments and questions; posting here will get our attention. We also have a presence on Facebook and Twitter. Moving to the broader Oracle community, these pages list social media presence for a number of products. Looking at Oracle blogs, the first stop probably has to be the entertaining The OTN Garage....

Monday, March 31, 2014 | Personal | Read More

Solaris Studio 12.4 Beta now available

The beta programme for Solaris Studio 12.4 has opened. So download the bits and take them for a spin! There's a whole bunch of features - you can read about them in the what's new document, but I just want to pick a couple of my favourites: C++ 2011 support. If you've not read about it, C++ 2011 is a big change. There's a lot of stuff that has gone into it - improvements to the Standard library, lambda expressions etc. So there is plenty to try out. However, there are some...

Wednesday, March 26, 2014 | Personal | Read More

JavaOne award

I was thrilled to get a JavaOne 2013 Rockstar Award for Charlie Hunt's and my talk "Performance tuning where Java meets the hardware". Getting the award was a surprise and a great honour. It's based on audience feedback, so it's really nice to find out that the audience enjoyed hearing the presentation as much as I enjoyed giving it.

Friday, March 21, 2014 | Personal | Read More

Multicore Application Programming available in Chinese!

This was a complete surprise to me. A box arrived on my doorstep, and inside were copies of Multicore Application Programming in Chinese. They look good, and have a glossy cover rather than the matte cover of the English version.

Wednesday, February 26, 2014 | Personal | Read More

Article on RAW hazards

Feels like it's been a long while since I wrote up an article for OTN, so I'm pleased that I've finally got around to fixing that. I've written about RAW hazards in the past. But I recently went through a patch of discovering them in a number of places, so I've written up a full article on them. What is "nice" about RAW hazards is that once you recognise them for what they are (and that's the tricky bit), they are typically easy to avoid. So if you see 10 seconds of time...

Wednesday, February 26, 2014 | Personal | Read More

OpenMP, macros, and #define

Sometimes you need to include directives in macros. The classic example would be putting OpenMP directives into macros. The "obvious" way of doing this is:

#define BARRIER \
#pragma omp barrier

void foo()
{
  BARRIER
}

Which produces the following error:

"test.c", line 6: invalid source character: '#'
"test.c", line 6: undefined symbol: pragma
"test.c", line 6: syntax error before or at: omp

Fortunately C99 introduced the _Pragma mechanism to solve this problem. So the functioning...

Tuesday, February 25, 2014 | Work | Read More


Before I start, this is not about security, it's probably the antithesis of security. So I'd recommend starting by reading about how using privileges can break the security of your system. There are three tools that I regularly use that require escalated privileges: dtrace, cpustat, and busstat. You can read up on the way that Solaris manages privileges. But if you know what you want to do, the process to figure out how to get the necessary privileges is reasonable...

Monday, December 30, 2013 | Personal | Read More

SPARC processor documentation

The SPARC processor documentation can be found here. What is really exciting though is that you can finally download the Oracle SPARC Architecture 2011 spec, which describes the current SPARC instruction set.

Wednesday, September 18, 2013 | Work | Read More

UKOUG Conference - Three presentations

Ok, my two hour presentation at UKOUG is now split into two one hour presentations. So my schedule now looks like:

Monday 2nd December: Getting the most out of Oracle Solaris Studio
Monday 2nd December: Where code meets the processor - performance tuning C/C++ applications
Wednesday 4th December: Multicore, Multiprocess, Multithread

I'm very pleased that I've got three separate hour long sessions. The material better fits this...

Thursday, September 12, 2013 | Personal | Read More

Presenting at UK Oracle User Group meeting

I'm very excited to have been invited to present at the UK Oracle User Group conference in Manchester, UK on 1-4 December. Currently I'm down for two presentations: a two hour slot on Monday, "Where code meets the processor...", and a one hour slot on Wednesday, "Multicore, Multiprocess, Multithread". As you might expect, I'm very excited to be over there, I've not visited Manchester in about 20 years!

Wednesday, September 11, 2013 | Personal | Read More

Timezone troubles when dual booting

I have a laptop that dual boots Solaris and Windows XP. When I switched between the two OSes I would have to reset the clock because the time would be eight hours out. This has been nagging at me for a while, so I dug into what was going on. It seems that Windows assumes that the Real-Time Clock (RTC) in the BIOS is using local time. So it will read the clock and display whatever time is shown there. Solaris on the other hand assumes that the clock is in Universal Time Format...

Thursday, August 29, 2013 | Personal | Read More

My Oracle Open World and JavaOne schedule

I've got my schedule for Oracle Open World and JavaOne:

Performance Tuning Where Java Meets the Hardware - CON3762. 9/23/13 (Monday) 1:00 PM - Hilton - Continental Ballroom 5
Developing Efficient Database Applications with C and C++ - CON9355. 9/26/13 (Thursday) 12:30 PM - Marriott Marquis - Golden Gate C2
Mixed-Language Development: Leveraging Native Code from Java - CON3408. 9/26/13 (Thursday) 2:00 PM - Hilton - Continental Ballroom 6

Note that on Thursday I have about 30...

Tuesday, August 27, 2013 | Work | Read More

How to use a lot of threads ....

The SPARC M5-32 has 1,536 virtual CPUs. There is a lot that you can do with that much resource and a bunch of us sat down to write a white paper discussing the options. There are a couple of key observations in there. First of all it is quite likely that such a large system will not end up running a single instance of a single application. Therefore it is useful to understand the various options for virtualising the system. The second observation is that there are a number...

Friday, August 9, 2013 | Work | Read More

JavaOne and Oracle Open World 2013

I'll be up at both JavaOne and Oracle Open World presenting. I have a total of three presentations:

Mixed-Language Development: Leveraging Native Code from Java
Performance Tuning Where Java Meets the Hardware
Developing Efficient Database Applications with C and C++

I'm excited by these opportunities - particularly working with Charlie Hunt diving into Java Performance.

Monday, July 8, 2013 | Personal | Read More

SPARC family

This is a nice table showing the various SPARC processors being shipped by Oracle.

Tuesday, June 11, 2013 | Work | Read More

Binding to the current processor

Just hacked up a snippet of code to stop a thread migrating to a different CPU while it runs. This should help the thread get, and keep, local memory. This in turn should reduce run-to-run variance.

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/processor.h>
    #include <sys/procset.h>

    void bindnow()
    {
      /* Find the CPU we are currently running on... */
      processorid_t proc = getcpuid();
      /* ...and bind the calling LWP to it. */
      if (processor_bind(P_LWPID, P_MYID, proc, 0))
      {
        printf("Warning: Binding failed\n");
      }
      else
      {
        printf("Bound to CPU %i\n", proc);
      }
    }

Friday, May 31, 2013 | Personal | Read More

One executable, many platforms

Different processors have different optimal sequences of code. Fortunately, most of the time the differences are minor, and we can easily accommodate them by generating generic code. If you needed more than this, then the "old" model was to use dynamic string tokens to pick the best library for the platform. This works well, and was the mechanism that libc.so used. However, the downside is that you now need to ship a bundle of libraries with the application; this can get...

Wednesday, May 29, 2013 | Work | Read More

OpenMP and language level parallelisation

The C11 and C++11 standards introduced some very useful features into the language. In particular they provided language-level access to threading and synchronisation primitives. So using the new standards we can write multithreaded code that compiles and runs on standard compliant platforms. I've tackled translating Windows and POSIX threads before, but not having to use a shim is fantastic news. There are some ideas afoot to do something similar for higher level parallelism....
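As a rough illustration of what language-level threading looks like, here is a small C11 sketch using <threads.h> and <stdatomic.h>. The names worker(), run_workers(), NTHREADS and INCREMENTS are made up for this example, and it assumes a C library that ships the optional C11 <threads.h> header (not all do).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

#define NTHREADS   4
#define INCREMENTS 1000

/* Shared counter, updated with C11 atomics rather than a lock. */
static atomic_int counter;

/* Thread body: C11 thread functions take void* and return int. */
static int worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        atomic_fetch_add(&counter, 1);
    return 0;
}

/* Start the workers, wait for them, and return the final count. */
int run_workers(void)
{
    thrd_t threads[NTHREADS];
    int started = 0;

    atomic_store(&counter, 0);
    for (int i = 0; i < NTHREADS; i++) {
        if (thrd_create(&threads[i], worker, NULL) != thrd_success)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        thrd_join(threads[i], NULL);

    return atomic_load(&counter);
}
```

If every thread starts, run_workers() returns NTHREADS * INCREMENTS, with no POSIX or Windows specific calls in sight.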

Monday, April 1, 2013 | Work | Read More

The pains of preprocessing

Ok, so I've encountered this twice in 24 hours. So it's probably worth talking about it. The preprocessor does a simple text substitution as it works its way through your source files. Sometimes this has "unanticipated" side-effects. When this happens, you'll normally get a "hey, this makes no sense at all" error from the compiler. Here's an example:

    $ more c.c
    #include <ucontext.h>
    #include <stdio.h>

    int main()
    {
      int FS;
      FS=0;
      printf("FS=%i",FS);
    }
    $ CC c.c
    "c.c", line 6:...
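The same kind of collision can be reproduced without any system header. The sketch below is an assumption about the mechanism rather than the exact contents of the header: the #define stands in for whatever the header provides, its value is invented, and use_fs() is a hypothetical name. It also shows one possible workaround, which is to #undef the macro before reusing the name (renaming the variable works just as well).

```c
/* Stand-in for a system header that defines a short, upper-case macro.
   The name matches the example above; the value is made up. */
#define FS 5

/* With the macro visible, "int FS;" would be preprocessed to "int 5;",
   which is the sort of nonsense the compiler then complains about. */

/* One possible workaround: drop the macro before reusing the name. */
#undef FS

int use_fs(void)
{
    int FS;
    FS = 0;
    return FS;
}
```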

Thursday, March 14, 2013 | Work | Read More

Compiling for T4

I've recently had quite a few queries about compiling for T4 based systems. So it's probably a good time to review what I consider to be the best practices. Always use the latest compiler. Being in the compiler team, this is bound to be something I'd recommend :) But the serious points are that (a) Every release the tools get better and better, so you are going to be much more effective using the latest release (b) Every release we improve the generated code, so you will...

Wednesday, December 12, 2012 | Work | Read More

Library order is important

I've written quite extensively about link ordering issues, but I've not discussed the interaction between archive libraries and shared libraries. So let's take a simple program that calls a maths library function:

    #include <math.h>
    int main()
    {
      for (int i=0; i<10000000; i++)
      {
        sin(i);
      }
    }

We compile and run it to get the following performance:

    bash-3.2$ cc -g -O fp.c -lm
    bash-3.2$ timex ./a.out
    real 6.06
    user 6.04
    sys 0.01

Now most people will...

Wednesday, December 5, 2012 | Work | Read More

It could be worse....

As "guest" pointed out, in my file I/O test I didn't open the file with O_SYNC, so in fact the time was spent in OS code rather than in disk I/O. It's a straightforward change to add O_SYNC to the open() call, but it's also useful to reduce the iteration count - since the cost per write is much higher:

    ...
    #define SIZE 1024
    void test_write()
    {
      starttime();
      int file = open("./test.dat",O_WRONLY|O_CREAT|O_SYNC,S_IWGRP|S_IWOTH|S_IWUSR);
    ...

Running this gave the following...
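For reference, a self-contained sketch of writing through an O_SYNC descriptor on a POSIX system. This is not the original test harness: write_synced() is a hypothetical helper, it adds O_TRUNC, and it uses owner read/write permissions rather than the mode bits shown above.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `count` blocks of `size` bytes to `path`
   through a descriptor opened with O_SYNC, so each write() is asked to
   reach stable storage before it returns.
   Returns the total number of bytes written, or -1 on error. */
long write_synced(const char *path, const char *block, size_t size, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC,
                  S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    long total = 0;
    for (int i = 0; i < count; i++) {
        ssize_t n = write(fd, block, size);
        if (n < 0) {
            close(fd);
            return -1;
        }
        total += n;
    }

    if (close(fd) != 0)
        return -1;
    return total;
}
```

Timing a loop like this against the same loop without O_SYNC is one way to see the per-write cost mentioned above.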

Wednesday, December 5, 2012 | Work | Read More

Write and fprintf for file I/O

fprintf() does buffered I/O, whereas write() does unbuffered I/O. So once the write() completes, the data is in the file, whereas for fprintf() it may take a while for the file to get updated to reflect the output. This results in a significant performance difference - the write works at disk speed. The following is a program to test this:

    #include <fcntl.h>
    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    #include <stdio.h>
    #include <sys/time.h>
    #incl...
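The buffering itself can be observed directly. The sketch below is separate from the timing program above and assumes a POSIX system; file_size() and buffered_write_demo() are hypothetical helpers. It writes a short string with fprintf(), then checks the on-disk size before and after fflush().

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: size of the file on disk, or -1 on error. */
static long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Write `msg` with fprintf() and record the on-disk size of the file
   before and after fflush(). Returns 0 on success, -1 on error. */
int buffered_write_demo(const char *path, const char *msg,
                        long *before, long *after)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    if (fprintf(f, "%s", msg) < 0) {
        fclose(f);
        return -1;
    }
    /* A short message usually still sits in the stdio buffer here. */
    *before = file_size(path);

    if (fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    /* After the flush the data has been handed to the OS. */
    *after = file_size(path);

    return fclose(f) == 0 ? 0 : -1;
}
```

For a short message the "before" size is typically zero, although the exact point at which stdio flushes is up to the implementation.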

Tuesday, December 4, 2012 | Work | Read More

Rick Hetherington on the T5

There's an interview with Rick Hetherington about the new T5 processor. Well worth a quick read.

Thursday, November 1, 2012 | Personal | Read More

Mixing Java and native code

This was a bit of surprise to me. The slides are available from my presentation at JavaOne on mixed language development. What I wasn't expecting was that there would also be a video of the presentation.

Thursday, October 18, 2012 | Work | Read More

Maximising T4 performance

My presentation from Oracle Open World is available for download.

Thursday, October 18, 2012 | Personal | Read More

25 years of SPARC

Looks like an interesting event at the Computer History Museum on 1 November. A panel discussing SPARC at 25: Past, Present and Future. Free sign up.

Tuesday, October 9, 2012 | Personal | Read More

Current SPARC Architectures

Different generations of SPARC processors implement different architectures. The architecture that the compiler targets is controlled implicitly by the -xtarget flag and explicitly by the -xarch flag. If an application targets a recent architecture, then the compiler gets to play with all the instructions that the new architecture provides. The downside is that the application won't work on older processors that don't have the new instructions. So for developers there is a...

Friday, September 14, 2012 | Work | Read More

SPARC Architecture 2011

With what appears to be minimal fanfare, an update of the SPARC Architecture has been released. If you ever look at SPARC disassembly code, then this is the document that you need to bookmark. If you are not familiar with it, then it basically describes how a SPARC processor should behave - it doesn't describe a particular implementation, just the "generic" processor. As with all revisions, it supersedes the SPARC v9 book published back in the 90s, having both corrections,...

Friday, August 31, 2012 | Work | Read More

CON6714 - Mixed-Language Development: Leveraging Native Code from Java

Here's the abstract from my JavaOne talk: There are some situations in which it is necessary to call native code (C/C++ compiled code) from Java applications. This session describes how to do this efficiently and how to performance-tune the resulting applications. The objectives for the session are:

Explain reasons for using native code in Java applications
Describe pitfalls of calling native code from Java
Discuss performance-tuning of Java apps that use native code

I'll cover how...

Wednesday, August 29, 2012 | Personal | Read More

Monday, 1st October: Presenting at JavaOne and Oracle Open World

On Monday 1 October I will be presenting at both JavaOne and Oracle Open World. The full conference schedule is available from here. The logistics for my sessions are as follows:

JavaOne: 8:30am Monday 1 October. CON6714: "Mixed-Language Development: Leveraging Native Code from Java". San Francisco Hilton - Continental Ballroom 6
Oracle OpenWorld: 10:45am Monday 1 October. CON6382: "Maximizing Your SPARC T4 Oracle Solaris Application Performance". Marriott Marquis - Golden...

Monday, August 27, 2012 | Work | Read More

Square roots

If you are spending significant time calling sqrt() then to improve this you should compile with -xlibmil. Here's some example code that calls both fabs() and sqrt():

    #include <math.h>
    #include <stdio.h>
    int main()
    {
      double d=23.3;
      printf("%f\n",fabs(d));
      printf("%f\n",sqrt(d));
    }

If we compile this with Studio 12.2 we will see calls to both fabs() and sqrt():

    $ cc -S -O m.c
    $ grep call m.s|grep -v printf
    /* 0x0018 */ call fabs ...

Tuesday, May 22, 2012 | Personal | Read More

Solaris Developer talk next week

Vijay Tatkar will be talking about developing on Solaris next week Tuesday at 9am PST.

Friday, May 18, 2012 | Work | Read More


If you are computing both the sine and cosine of an angle, then it is about twice as quick to call sincos() as it is to call cos() and sin() independently:

    #include <math.h>
    int main()
    {
      double a,b,c;
      a=1.0;
      for (int i=0;i<100000000;i++) { b=sin(a); c=cos(a); }
    }

    $ cc -O sc.c -lm
    $ timex ./a.out
    real 19.13

vs

    #include <math.h>
    int main()
    {
      double a,b,c;
      a=1.0;
      for (int i=0;i<100000000;i++) { sincos(a,&b,&c); }
    }

    $ cc -O sc.c -lm
    $ timex ./a.out
    real 9.80

Monday, April 23, 2012 | Personal | Read More

What is -xcode=abs44?

I've talked about building 64-bit libraries with position independent code. When building 64-bit applications there are two options for the code that the compiler generates: -xcode=abs64 or -xcode=abs44; the default is -xcode=abs44. These are documented in the user guides. The abs44 and abs64 options produce 64-bit applications that constrain the code + data + BSS to either 44 bits or 64 bits of address. These options constrain the addresses statically encoded in...
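To put the 44-bit figure in perspective: 2^44 bytes is 16 TiB, so the highest address that fits in 44 bits is 0xFFFFFFFFFFF. The sketch below is only an illustration of that arithmetic and has nothing to do with how the compiler or linker enforce the limit; fits_in_address_bits() is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if `addr` can be expressed in `bits`
   bits of address, 0 otherwise. */
int fits_in_address_bits(uint64_t addr, unsigned bits)
{
    if (bits >= 64)
        return 1;               /* every 64-bit value fits */
    return (addr >> bits) == 0; /* no bits set above the limit */
}
```

With bits set to 44, an address of 2^44 - 1 fits and an address of 2^44 does not.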

Friday, April 20, 2012 | Work | Read More