Recovering postgres data
By avalon on May 01, 2008
For a while now I've been toying around with postgres at home and from time to time, I seem to have reliability problems. Those problems do not appear to be postgres's, rather it is likely they are due to the operating system (NetBSD) or hardware (a small & cheap HP box.) Whatever the cause, what I see is postgres refusing to access data because an index file is borked or some other internal meta data file is borked and it refuses to load any data. My data was there but it wouldn't give me access. Grrrr....
With files of a couple of gigabytes in size (yeah, small, I know), it seemed obvious that the data was still there - somewhere. Using some standard tools such as strings and hexdump showed that the data was all there, intact, I just need to work out how to get it out of the files. With some experimentation, I had managed to work out a good amount of how it worked but still ran into a few problems. Then I saw a pointer to another utility, pg_filedump, that worked on the data/index files in postgres. The key to making real progress was looking at the header files for postgres that defined its data file formats. I had, up until this point, been refusing to do this, intent on reverse engineering as much of the data format as I could (this was to make the task challenging - just looking at the source code seemed like cheating.) After finding the header files, I started over and life was a bit easier.
The result has been something of a success. Not being content with the easy sailing, I've left priting out tuple data to what I could make sense of in the binary files. The only important types I haven't been able to decode yet are time, timestamp and date.
There is a bold warning at the top of the man page in this file for a very good reason: if the database has crashed and will not start up, there is no way of knowing what the status of your data really is. In a risk free environment, you would recover data from backups and rebuild the database from that point. This program requires you to make an educated guess about the sanity and correctness of the data that it retrieves: if you want to reuse it. i.e. no dba worth anything would ever use this with production data - that's what backups and application log files are for.http://coombs.anu.edu.au/~avalon/pg_dumpdata.tgz