Friday Jan 16, 2009

Out of memory in the Performance Analyzer

I've been working on an Analyzer experiment from a long-running multithreaded application. Because it is multithreaded, I really needed to see the Timeline view to make sense of what was happening. However, when I switched to the Timeline I got a Java out-of-memory error (insufficient heap space).

Tracking this down, I used prstat to watch the Java application run and the memory footprint increase. I'd expected it to get to 4GB and die at that point, so I was rather surprised when the process was only consuming 1.1GB when the error occurred.

I looked at the command-line options for the Java process using pargs, and spotted the flag -Xmx1024m, which sets the maximum heap size to 1GB. OK, found the culprit. You can use the -J option to analyzer to pass flags to the JVM. The following invocation of the analyzer sets the limit to 4GB:

$ analyzer -J-Xmx4096m test.1.er

If you need more memory than that, you'll have to go to the 64-bit JVM, and allocate an appropriate amount of memory:

$ analyzer -J-d64 -J-Xmx8192m test.1.er

Thursday Feb 07, 2008

Page size and memory layout

Support for large pages has been available since Solaris 9, and I've previously talked about the various ways that an application can be coaxed into using large pages. However, I wanted to quickly write up how the large pages are laid out in memory. Take the following code, which allocates a large chunk of memory and then iterates over it for long enough to run pmap -xs on it:

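#include <stdlib.h>

/* Assumed value: the outer loop bound was lost from the original listing.
   Pick one large enough to keep the process alive while running pmap. */
#define PASSES 100

int main()
{
  int x,y;
  char *c;
  c=(char*)malloc(sizeof(char)*300000000);
  if (c==NULL) { return 1; }   /* allocation failed */
  for (y=0; y<PASSES; y++)
    for (x=0; x<300000000; x++) { c[x]=c[x]+y; }
  return 0;
}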

Compiling this code to use 4MB pages and then running the resulting executable produces a pmap output like:

% cc -xpagesize=4M t.c
% a.out&
[1] 15501
% pmap -xs 15501
15501:  a.out
 Address  Kbytes     RSS    Anon  Locked Pgsz Mode   Mapped File
00010000       8       8       -       -   8K r-x--  a.out
00020000       8       8       8       -   8K rwx--  a.out
00022000    3960    3960    3960       -   8K rwx--    [ heap ]
00400000  290816  290816  290816       -   4M rwx--    [ heap ]
...

Notice that the heap starts on 8KB pages, and uses these up until the memory reaches a 4MB boundary and then starts using 4MB pages. In this case it means that nearly 4MB of the memory is not using 4MB pages - if this happens to be where the majority of the program's active data resides, then there will still be plenty of TLB misses.

Fortunately, it is possible to tell the linker where to start the heap. There are some mapfiles provided in /usr/lib/ld/ for various scenarios; the one that we need is map.bssalign. Recompiling with this produces the following memory layout:

% cc -M /usr/lib/ld/map.bssalign -xpagesize=4M t.c
% a.out&
[1] 19077
% pmap -xs 19077
19077:  a.out
 Address  Kbytes     RSS    Anon  Locked Pgsz Mode   Mapped File
00010000       8       8       -       -   8K r-x--  a.out
00020000       8       8       8       -   8K rwx--  a.out
00400000  294912  294912  294912       -   4M rwx--    [ heap ]

With this change the heap now starts on a 4MB boundary and is entirely mapped with 4MB pages.

About

Darryl Gove is a senior engineer in the Solaris Studio team, working on optimising applications and benchmarks for current and future processors. He is also the author of the books:
Multicore Application Programming
Solaris Application Programming
The Developer's Edge
