Active File Sparsing

Filesystem sparse file usage (and why you should care).

Most modern Unix systems support the notion of a "sparse file".  I'll use a core file to demonstrate the principles on HP-UX ia64 11.31.  The sample program is taken from the HP document "How To Get Valid Core Files", which I used to understand how HP-UX processes core files.

The sample program (saved here as testcore.c) simply allocates a large (2GB) anonymous memory mapping and aborts, creating a core file.  The 'core' file can be suppressed or truncated under a number of conditions that the HP document describes:

  • ulimit -c
    The maximum size of a core file, in blocks.  There are really only two useful values: unlimited or zero.
    Anything else can produce a truncated core file that cannot be analyzed, so what's the point?
    On my test system, an administrator had set these values:
    $ ulimit -c
    4194303
    $ ulimit -Hc
    unlimited

    The default core size limit was 4194303 * 512 bytes = 2147483136 bytes (just under 2GB).
    This is not helpful.  But the hard limit is unlimited, so in my login .profile I set "ulimit -c unlimited".
  • Not enough space in the filesystem ( use "bdf" to check )
  • "LargeFile" option not set in the filesystem ( no verification check provided )
  • ulimit -f
    The HP document doesn't mention it, but 'core' is a file, so ulimit -f (maximum file size) also controls whether it can be written in full.
  • "Disk quotas" may also limit what you can write in the filesystem.  These weren't mentioned either.

My objective was to determine how much filesystem space is required to write out this never-touched mmap region.

It's a perfect example of a sparse file, and can be written in a way that consumes very few filesystem blocks.  The idea is to write only the non-zero data to a file, and use filesystem seek to skip over areas of the file that are zero.   When a sparse file is read by an application, the kernel "materializes" the empty space to give the appearance that it contains zeros.

So the question now is: how many filesystem blocks does this crashing program consume?  Does the HP-UX kernel write zeros into the filesystem, or does it write a sparse file?  Process memory is often dominated by allocated space that the program has never touched and that uses no "real memory pages".  Are those "non-dirty pages" written to a core file?

Here's the test program:
--- Makefile ---
testcore:       testcore.c
        cc +DD64 testcore.c -o testcore
        ./testcore
--- testcore.c ---
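#include <stdio.h>      /* perror */
#include <stdlib.h>     /* abort */
#include <sys/mman.h>   /* mmap */
int main(void) {
  /* Reserve 2GB of anonymous memory, but never touch it. */
  if (mmap(NULL, 2*1024*1024*1024L, PROT_READ|PROT_WRITE,
           MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) == MAP_FAILED) {
    perror("mmap");     /* mmap reports failure as MAP_FAILED, not NULL */
    return 1;
  }
  abort();              /* SIGABRT: the kernel writes 'core' */
}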

Here's the test:
$ make
$ ls -l core
-rw------- 1 ddunbar staff 2147728064 Aug 22 21:12 core
$ what core
core:
         $ B.11.31  Jun 26 2008 10:06:45 $
        92453-07 linker dld HP Itanium(R) B.12.55
        92453-07 linker uld HP Itanium(R) B.12.55
$ pax -xpax -wf testcore.pax core
$ rm core
$ gzip testcore.pax
$ gzip -dc testcore.pax.gz | pax -r 
$ rm testcore.pax.gz

Here's the analysis:

 Step  FS Used (KB)  KB Used vs. start  Comments
 0     1573520       0                  Starting KB used
 1     3670911       2097391            testcore (KB used by 'core')
 2     3705190       2131670            pax archive of core (KB used by .pax)
 3     1575572       2052               gzip pax archive (KB used by .pax.gz)
 4     1573808       288                restore core, remove .pax.gz (KB used by core)
 5     1573520       0                  remove core (back to original filesystem usage)

  1. 'testcore' writes the 'core' file to the filesystem byte for byte.
    The 2GB core file consumes 2GB of filesystem space.
  2. The 'pax' program archives 'core', and the archive consumes slightly more than 2GB of space.
    I suspect the pax archive standard dictates that sparsing cannot be used, because the archive has to be readable by any conforming pax implementation.
  3. The good news is that strings of binary zeros compress very well with gzip.
    The compressed pax archive needs only about 2MB (2052 KB) of filesystem space.
  4. Now suppose we are the recipient of testcore.pax.gz.  We can pipe the gzip decompression into pax -r (read) to write 'core'.  This step shows the results of pax "active sparsing": the restored 'core' occupies only 288 KB, compared with about 2GB for the original.  Put another way, this filesystem can hold far more 'core' files if they are sparsed to skip strings of zero bytes.
  5. Removing the 'core' file returns the filesystem to its original "Used" block count.
    This exercise leaves nothing behind.

Conclusions:

  1. If HP-UX used active sparsing logic to write memory to the file 'core', the dump would complete in much less time, use far fewer CPU cycles, and consume far fewer filesystem blocks.
  2. If gdb 'packcore' used pax instead of tar to create the packcore archive, the receiving filesystem would use far fewer resources.  (Actually, it isn't necessary to create a pax file; just use pax to extract from either a tar or a pax archive.)
  3. If gdb "unpackcore" used gzip piping to restore the archive, it would also avoid consuming unnecessary filesystem blocks.
    2a, 3a. If gdb "unpackcore" piped the gzip output through pax instead of tar, it would write a sparse core file.

Bonusware: A toy program that demonstrates how to write a sparse file (13 years to the day).

--- Makefile ---
lf:     lf.c
        cc +DD64 lf.c -o lf
        ./lf
        ls -l large.1
        { bdf . ; rm -f large.1 ; bdf . ;} | grep '/'

--- lf.c ---

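/* lf:  Create a large (sparse) file Dick Dunbar 1999-08-23 */
#define Size 0x800000000ULL /* 2**35 == 0x8,00000000 == 34,359,738,368 == 32GB */
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>      /* exit */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>    /* S_IRUSR etc. */

int               mm_fh;
char              filename[] = "large.1";
#define mode664   (S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH)
/* O_DIRECT is not needed to leave a hole, and some systems (e.g. Linux)
   reject small unaligned O_DIRECT writes, so it is not used here. */
#define fileRESET (O_RDWR|O_CREAT|O_TRUNC)
/* The last 8 bytes of the file: the 64-bit value 1 in big-endian order. */
unsigned char     buf[8] = {0, 0, 0, 0, 0, 0, 0, 1};
int main(void) {
   mm_fh = open(filename, fileRESET, mode664);
   if (mm_fh < 0) { perror("lf: open large.1"); exit(1); }
   /* Seek past Size-8 bytes that are never written; they become a hole. */
   if (lseek(mm_fh, (off_t)(Size-8), SEEK_SET) == (off_t)-1) { perror("lf: lseek"); exit(1); }
   if (write(mm_fh, buf, sizeof buf) != (ssize_t)sizeof buf) { perror("lf: write"); exit(1); }
   close(mm_fh);
   return 0;
}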

A 32GB file takes essentially no space in the filesystem: bdf reports the same usage before and after large.1 is removed.

$ make lf
        cc +DD64 lf.c -o lf
        ./lf
        ls -l large.1
-rw-rw-r--   1 ddunbar    dba        34359738368 Aug 23 19:32 large.1
        { bdf . ; rm -f large.1 ; bdf . ;} | grep '/'
/dev/vg01/lvol2    102400000 1575642 94522857    2% /scratch
/dev/vg01/lvol2    102400000 1575642 94522857    2% /scratch


An archive need not be written by pax in order to use active sparsing.

Most Unix systems now support the GNU tar '-z' option, which produces a gzip-compressed tar file (.tgz).
The option doesn't appear to be available in the native tar on HP-UX ia64 11.31, but it is common to compress tar files with gzip.

$ tar -cvf testcore.tar core ; gzip -v testcore.tar

The resulting 'testcore.tar.gz' file is usually extracted by gzip | tar as in:

$ gzip -dc testcore.tar.gz | tar -xvf -

pax didn't "sparse" the file when it wrote the pax archive; it sparses file(s) when they are extracted.  Since pax understands the tar archive format, it can be used in place of tar.

$ gzip -dc testcore.tar.gz | pax -r

Processing an HP-UX gdb packcore.tar.Z file looks like this:

$ zcat packcore.tar.Z | pax -r


There is another pax trick that is useful.  Suppose you receive a tar file that was created with absolute paths:

Sending machine:

$ tar -cvf problem.tar /opt/langtools/bin/gdb

To extract this archive with tar, you would have to be root and be willing to overwrite the /opt/langtools/bin directory.

pax to the rescue:

Receiving machine transforms the absolute path to a relative one.

$ pax -rv -s ',^/,./,' -f problem.tar
 ./opt/langtools/bin 

About

Dick Dunbar
is an escalation engineer working in the Customer Engineering & Advocacy Lab (CEAL team)
for Oracle Analytics and Performance Management.
I live and work in Santa Cruz, California.
I'll share the techniques I use to detect, avoid and repair problems.
