By dap on Sep 09, 2010
What do you think happens when you do this:
# cat vmcore.4 > /dev/null
If you've used Unix systems before, you might expect this to read vmcore.4 into memory and do nothing with it, since cat(1) reads a file and "> /dev/null" sends its output to the null driver, which accepts data and does nothing with it. This looks pointless, but it can be useful, for example to bring a file into memory or to evict other files from memory (if this file is larger than the total cache size).
But here's a result I found surprising:
# ls -l vmcore.1
-rw-r--r-- 1 root root 5083361280 Oct 30 2009 vmcore.1
# time cat vmcore.1 > /dev/null

real 0m0.007s
user 0m0.001s
sys 0m0.007s
That works out to about 726GB/s, which is far too fast, even for reading from main memory. The obvious question is: how does cat(1) know that I'm sending output to /dev/null and not bother to read the file at all?
Of course, you can answer this by examining the cat source in the ON gate. There's no special case for /dev/null (though that does exist elsewhere), but rather this behavior is a consequence of an optimization in which cat(1) maps the input file and writes the mapped buffer instead of using read(2) to fill a buffer and write that. With truss(1) it's clear exactly what's going on:
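The optimization can be sketched in C roughly as follows. This is a simplified illustration, not the actual ON cat.c source: the function copy_mapped, the helper write_all, and the 8 MB CHUNK_SIZE (picked to match the mappings in the truss output) are all assumptions. Note that cat itself maps every chunk at one fixed address with MAP_FIXED, while this sketch lets mmap(2) choose the address.

```c
/*
 * Sketch only: a cat-style copy loop that maps the input file in chunks
 * and hands each mapping directly to write(2). Not the ON cat.c source.
 */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed chunk size, matching the 8 MB mappings in the truss output. */
#define CHUNK_SIZE ((size_t)8 * 1024 * 1024)

/* Write all len bytes of buf to fd, retrying short or interrupted writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Copy the regular file at path to out_fd via mmap(2) + write(2).
 * The mapped pages are only faulted in from disk if whatever receives
 * the write actually copies the data.
 * Returns the number of bytes copied, or -1 on error.
 */
long long copy_mapped(const char *path, int out_fd)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd == -1)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) == -1 || !S_ISREG(st.st_mode)) {
        close(in_fd);
        return -1;
    }

    off_t off = 0;
    while (off < st.st_size) {
        size_t len = CHUNK_SIZE;
        if ((off_t)len > st.st_size - off)
            len = (size_t)(st.st_size - off);

        /* off is always a multiple of CHUNK_SIZE, so it stays page-aligned. */
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, in_fd, off);
        if (p == MAP_FAILED) {
            close(in_fd);
            return -1;
        }
        int rc = write_all(out_fd, p, len);
        munmap(p, len);
        if (rc == -1) {
            close(in_fd);
            return -1;
        }
        off += (off_t)len;
    }
    close(in_fd);
    return (long long)off;
}
```

Passing a descriptor for /dev/null as out_fd should reproduce the surprise: each write(2) returns immediately, and the mapped pages are never touched.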
# truss cat vmcore.1 > /dev/null
execve("/usr/bin/cat", 0x08046DC4, 0x08046DD0) argc = 2
[ ... ]
write(1, ..., 8388608) = 8388608
mmap64(0xFE600000, 8388608, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 8388608) = 0xFE600000
write(1, ..., 8388608) = 8388608
mmap64(0xFE600000, 8388608, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0x01000000) = 0xFE600000
[ ... ]
mmap64(0xFE600000, 8388608, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0x000000012E000000) = 0xFE600000
write(1, ..., 8388608) = 8388608
mmap64(0xFE600000, 8253440, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0x000000012E800000) = 0xFE600000
write(1, ..., 8253440) = 8253440
llseek(3, 0x000000012EFDF000, SEEK_SET) = 0x12EFDF000
munmap(0xFE600000, 8388608) = 0
llseek(3, 0, SEEK_CUR) = 0x12EFDF000
close(3) = 0
close(1) = 0
_exit(0)
cat(1) really is issuing lots of writes from the mapped file, but the /dev/null driver returns immediately without touching the data, so the mapped pages are never even read. If you actually want to read the file (for the side effects mentioned above, for example), you can defeat this with an extra pipe:
# time cat vmcore.1 | cat > /dev/null

real 0m32.661s
user 0m0.865s
sys 0m32.127s
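That's more like it: about 155MB/s streaming from a single disk. The first cat still maps the file, but writing the mapped buffer into a pipe copies the data, which forces every page to be read from disk. The second cat invocation can't use the optimization at all, since its stdin is a pipe rather than a file it could map.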
There's another surprising result of the initial example: the file's access time actually gets updated even though it was never read:
# ls -lu vmcore.2
-rw-r--r-- 1 root root 6338052096 Nov 3 2009 vmcore.2
# time cat vmcore.2 > /dev/null

real 0m0.040s
user 0m0.001s
sys 0m0.008s
# ls -lu vmcore.2
-rw-r--r-- 1 root root 6338052096 Aug 6 15:55 vmcore.2
The cat source explains why:

/*
 * NFS V2 will let root open a file it does not have permission
 * to read. This read() is here to make sure that the access
 * time on the input file will be updated. The VSC tests for
 * cat do this:
 *      cat file > /dev/null
 * In this case the write()/mmap() pair will not read the file
 * and the access time will not be updated.
 */

if (read(fi_desc, &x, 1) == -1)
        read_error = 1;
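To observe this from a program rather than with ls -lu, you could compare the file's st_atime before and after a one-byte read like the one in cat. The helpers access_time and read_one_byte below are hypothetical names for this sketch. Whether the access time actually changes also depends on how the filesystem is mounted; a filesystem mounted with noatime, for example, won't update it.

```c
/* Sketch: observe a file's access time around a cat-style one-byte read. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the access time of path (what ls -lu
 * shows), or (time_t)-1 on error.
 */
time_t access_time(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return (time_t)-1;
    return st.st_atime;
}

/*
 * Hypothetical helper mirroring cat's read(fi_desc, &x, 1): consume one
 * byte so the kernel records an access. Returns 0 on success, -1 on error.
 */
int read_one_byte(const char *path)
{
    char x;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n = read(fd, &x, 1);
    close(fd);
    return n == -1 ? -1 : 0;
}
```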
I found this all rather surprising because I think of cat(1) as one of the basic primitives that's dead-simple by design. If you really want something simple to read a file into memory, you might be better off with dd(1M).
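For comparison, a minimal read-into-memory loop in the spirit of dd(1M) (not its actual source) might look like the C sketch below. The function name read_whole_file and the 1 MB buffer size are assumptions. Because read(2) has to copy each block into the caller's buffer, this path can't skip the I/O the way mmap plus a write to /dev/null does.

```c
/* Sketch: pull a whole file through read(2), discarding the data. */
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed buffer size; any reasonably large value works. */
#define BUF_SIZE (1024 * 1024)

/*
 * Read every byte of path with read(2) and throw it away. Each call must
 * copy data into buf, so every block is really fetched (from disk or cache).
 * Returns the number of bytes read, or -1 on error.
 */
long long read_whole_file(const char *path)
{
    static char buf[BUF_SIZE];
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    long long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;              /* end of file */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted; retry the read */
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return total;
}
```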