Thursday Sep 09, 2010
By Darryl Gove-Oracle on Sep 09, 2010
I've just handed over the final set of edits to the manuscript. These are edits to the laid-out pages. Some are those last few grammatical errors you only catch after reading a sentence twenty times. Others are tweaks to the figures. There's still a fair amount of production work to do, but my final input will be a review of the indexing - probably next week.
So it's probably a good time to talk about the cover. This is a picture that my wife took last year. It's a picture of the globe at the cliff tops at Durlston Head near Swanage in England. It's 40 tonnes and over 100 years old. It's also surrounded by stone tablets, some containing contemporary educational information, and a couple of blank ones that are there just so people can draw on them.
Monday May 17, 2010
By Darryl Gove-Oracle on May 17, 2010
I've uploaded the current table of contents for Multicore Application Programming. You can find all the detail in there, but I think it's appropriate to talk about how the book is structured.
Chapter 1. The design of any processor has a massive impact on its performance. This is particularly true for multicore processors since multiple software threads will be sharing hardware resources. Hence the first chapter provides a whistle-stop tour of the critical features of hardware. It is important to do this up front as the terminology will be used later in the book when discussing how hardware and software interact.
Chapter 2. Serial performance remains important, even for multicore processors. There are two main reasons for this. The first is that a parallel program is really a set of serial threads working together, so improving the performance of the serial code will improve the performance of the parallel program. The second reason is that even a parallel program will have serial sections of code, and the performance of those serial sections will limit the maximum performance that the parallel program can attain.
Chapter 3. One of the important aspects of using multicore processors is identifying where the parallelism is going to come from. If you look at any system today, there are likely to be many active processes. So at one level no change is necessary: systems will automatically use multiple cores. However, we want to get beyond that, so the chapter discusses approaches like virtualisation as well as the more obvious approach of multithreaded or multiprocess programming. One message that needs to be broadcast is that multicore processors do not require a rewrite of existing applications. However, getting the most from a multicore processor may well require that.
Chapter 4. The book discusses Windows native threading, OpenMP, and automatic parallelisation, as well as the POSIX threads that are available on OS-X, Linux, and Solaris. Although the details do sometimes change across platforms, the concepts do not. This chapter discusses synchronisation primitives, such as mutex locks, in general terms, which avoids having to repeat the same information in each of the implementation chapters.
Chapter 5. This chapter covers POSIX threads (pthreads), which are available on Linux, OS-X, and Solaris, as well as other platforms not covered in the book. The chapter covers multithreaded as well as multiprocess programming, together with methods of communicating between threads and processes.
Chapter 6. This chapter covers Windows native threading. The function names and the parameters that need to be passed to them are different to the POSIX API, but the functionality is the same. This chapter provides the same coverage for Windows native threads that chapter 5 provides for pthreads.
Chapter 7. The previous two chapters present low-level APIs for threading. These give very fine control, but provide more opportunities for errors, and require a considerable amount of code to be written for even the most basic parallel constructs. Automatic parallelisation and OpenMP place more of the burden of parallelisation on the compiler and less on the developer. Automatic parallelisation is the ideal situation, where the compiler does all the work. However, there are limitations to this approach, and this chapter discusses the current limitations and how to make changes to the code that enable the compiler to do a better job. OpenMP is a very flexible technology for writing parallel applications. It is widely supported and provides support for a number of different approaches to parallelism.
Chapter 8. Synchronisation primitives provided by the operating system or compiler can have high overheads, so it is tempting to write replacements. This chapter covers some of the potential problems that need to be avoided. Most applications will be adequately served by the synchronisation primitives already provided; the discussion in the chapter provides insight into how hardware, compilers, and software can cause bugs in parallel applications.
Chapter 9. The difference between a multicore system and a single core system is in its ability to simultaneously handle multiple active threads. The difference between a multicore system and a multiprocessor system is in the sharing of processor resources between threads. Fundamentally, the key attribute of a multicore system is how it scales to multiple threads, and how the characteristics of the application affect that scaling. This chapter discusses what factors impact scaling on multicore processors, and also the benefits that multicore processors bring to parallel applications.
Chapter 10. Writing parallel programs is a growing and challenging field. The challenges come from producing correct code and from getting the code to scale to large numbers of cores. Some approaches address scaling to high numbers of cores; others address the issue of producing correct code. This chapter discusses a large number of alternative approaches to programming parallelism.
Chapter 11. The concluding chapter of the book reprises some of the key points of the previous chapters, and tackles the question of how to write correct, scalable, parallel applications.
Tuesday May 11, 2010
By Darryl Gove-Oracle on May 11, 2010
I'm very pleased to be able to talk about my next book Multicore Application Programming. I've been working on this for some time, and it's a great relief to be able to finally point to a webpage indicating that it really exists!
The release date is sometime around September/October. Amazon has it as the 11th October, which is probably about right. It takes a chunk of time for the text to go through editing, typesetting, and printing, before it's finally out in the shops. The current status is that it's a set of documents with a fair number of virtual sticky tags attached indicating points which need to be refined.
One thing that should immediately jump out from the subtitle is that the book (currently) covers Windows, Linux, and Solaris. In writing the book I felt it was critical to try and bridge the gaps between operating systems, and avoid writing it about only one.
Obviously the difference between Solaris and Linux is pretty minimal. The differences with Windows are much greater, but, when writing to the Windows native threading API, the actual differences are more syntactic than functional.
By this I mean that the name of the function changes, and the parameters change a bit, but the meaning of the function call does not. For example, on POSIX platforms you might call pthread_create(), while on Windows you might call _beginthreadex(); the name of the function changes, and there are a few different parameters, but both calls create a new thread.
I'll write a follow up post containing more details about the contents of the book.
Sunday Jul 06, 2008
By Darryl Gove-Oracle on Jul 06, 2008
Spent a couple of hours on the phone to my dad, over in the UK. He had home network troubles: the wireless network wasn't working, and the USB modem was sluggish. We switched to the wireless router, which eventually came up after a combination of multiple reboots and ensuring that the cables were firmly plugged in. The PC was fixed with a (typical) reinstall of the wireless network card driver.
Once the PC was up and running, I wanted to check the performance issues. One of the things we'd been meaning to try was crossloop, which is a VNC client/server behind a user-friendly front-end. This worked very well, despite a noticeable lag between the UK and California.
I'd planned to use the Performance tool that comes with XP (Start|Control Panel|Administrative Tools|Performance). This gives a useful system-wide view of performance, but it really only indicates which component is maxed out, not what is maxing it out. I did consider getting wintop, but it no longer seems to be distributed by MS. However, I did locate Process Explorer (part of the sysinternals collection), which seems to do a much more thorough job than wintop.
There was only one background process consuming significant resources, and that was bigfix. As far as I could tell, the app was no longer supported by the company (amusingly, the link "Why is bigfix free?" explains that the free version is being EOL'd!).
I couldn't judge responsiveness over a VNC link, but apparently swapping to the wireless network rather than the DSL modem, and also removing bigfix, had made the PC more responsive. Process Explorer also reported adequate free memory and low disk activity.