"Password-protected, but not encrypted"
By davew on Nov 26, 2007
While I'm happy that my own details haven't been leaked as part of the HMRC data leak (not having children is good for my privacy as well as my bank balance and my carbon footprint, it would seem), I'm following the news closely, as more information about the leak is disclosed.
The Chancellor of the Exchequer was interviewed on Radio 4's "Today" programme on Tuesday morning last week, and said something which particularly surprised me. Specifically, he referred to the way in which the data was stored on the missing discs as "password-protected, but not encrypted".
I conjecture that you can't actually have password protection without encryption.
Consider one of these missing disks. If it were to turn up and you put it in your DVD-ROM drive, you could dd the blocks off it, to get yourself a file of anything up to 4-and-a-bit GB in size. If you then grep through it for known cleartext (such as the names of folk you know who are parents), you'll get matches unless the data is either compressed or encrypted. It's fair to assume that the files on the disks will have been generated by fresh extraction from a database of some sort, so you're going to be looking at a reasonably sequential set of blocks, without much fragmentation or indirection.
This neatly bypasses any application-layer password system.
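To make the dd-then-grep idea concrete, here's a minimal sketch in Python of the "grep" half, assuming the blocks have already been copied off the disc into an image file. The function name, the image file name and the sample names are all made up for illustration; nothing here comes from HMRC or the BBC report.

```python
import mmap


def find_cleartext(image_path, needles):
    """Return {needle: [byte offsets]} for each needle found in a raw image.

    image_path is assumed to be a non-empty file of raw blocks, e.g. one
    produced by dd; no filesystem or application is consulted at all.
    """
    hits = {}
    with open(image_path, "rb") as f:
        # Map the file read-only so a multi-gigabyte image is searched in place.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
            for needle in needles:
                raw = needle.encode("utf-8")
                offsets = []
                pos = image.find(raw)
                while pos != -1:
                    offsets.append(pos)
                    pos = image.find(raw, pos + 1)
                hits[needle] = offsets
    return hits


# Hypothetical usage:
#   find_cleartext("disc1.img", ["Jane Smith", "John Doe"])
```

If the data really were unencrypted and uncompressed, the offsets would come back non-empty, whatever password prompt the intended application might have put in front of it.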
If the files on the disks are simply compressed, you could either reconstruct the compressed data sets from the dd'ed blocks using forensic tools, or simply mount the disks, copy the files to scratch space and decompress them.
Here's where you're likely to hit password protection - at the application layer.
Thinking about what is likely to have been done when marshalling the files to burn onto the disks, it's rather probable that whatever raw data was required was put into a password-protected zip archive (in fact, http://news.bbc.co.uk/1/hi/uk_politics/7106987.stm suggests this is the case).
The zip compression standard indicates that, where password protection is applied, the password is used to unlock a soft keystore from which a symmetric key is extracted, and that key then decrypts the main body of the archive before the usual decompression takes place.
Please note the use of the word "decrypts", Chancellor :-).
Apparently, WinZip 9.x introduced AES encryption, so depending on what version of what zipping app is in use at HMRC, it may even be using a US-formally-approved algorithm.
Granted, the soft keystore needs to be bound up with the data in the file (and it's usually advisable to keep your keys somewhere where your data isn't), but encryption is still encryption. However, for earlier versions of Zip, I'm reliably informed that the PC1 encryption algorithm it uses is rather straightforward to break.
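If one of the archives did turn up, a quick way to see which flavour of protection it carries is to read the per-entry headers, which are stored in the clear. The sketch below uses Python's standard zipfile module; the helper names are my own, and it assumes two things about the format: that bit 0 of the general purpose flag marks an encrypted entry, and that WinZip marks its AES entries with compression method 99.

```python
import zipfile

ENCRYPTED_FLAG = 0x1     # general purpose bit 0: entry is encrypted
WINZIP_AES_METHOD = 99   # method number WinZip uses to mark AES entries


def describe_protection(info):
    """Classify one archive entry from its header fields alone."""
    if not info.flag_bits & ENCRYPTED_FLAG:
        return "not encrypted"
    if info.compress_type == WINZIP_AES_METHOD:
        return "encrypted (WinZip AES)"
    return "encrypted (traditional or other scheme)"


def survey(path_or_file):
    """Map each entry name in the archive to its protection description."""
    with zipfile.ZipFile(path_or_file) as zf:
        return {i.filename: describe_protection(i) for i in zf.infolist()}
```

Note that no password is needed to run this: the file names and the "encrypted" flag are readable by anyone, which is rather the point. What the password guards is the encrypted body of each entry.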
It's also possible that, rather than password-protect a zip archive, HMRC sent the data in some password-protected spreadsheet form; let's look at what happens with StarOffice Spreadsheet and Microsoft Excel, in this regard...
From the OASIS standard for ODF 1.0...
The encryption process takes place in the following multiple stages:
1. A 20-byte SHA1 digest of the user entered password is created and passed to the package component.
2. The package component initializes a random number generator with the current time.
3. The random number generator is used to generate a random 8-byte initialization vector and 16-byte salt for each file.
4. This salt is used together with the 20-byte SHA1 digest of the password to derive a unique 128-bit key for each file. The algorithm used to derive the key is PBKDF2 using HMAC-SHA-1 (see [RFC2898]) with an iteration count of 1024.
5. The derived key is used together with the initialization vector to encrypt the file using the Blowfish algorithm in cipher-feedback (CFB) mode.
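Steps 1, 3 and 4 above can be sketched with nothing more than Python's standard library. This is an illustration of the quoted text, not StarOffice's actual code: the function names are mine, the UTF-8 encoding of the password is an assumption, and I've used os.urandom for the salt and IV rather than a generator seeded from the current time as step 2 describes. Step 5 (Blowfish in CFB mode) is left out because the standard library has no Blowfish.

```python
import hashlib
import os


def new_file_parameters():
    """Step 3: a fresh 8-byte IV and 16-byte salt for one file.

    Uses the OS random source, not the time-seeded generator of step 2.
    """
    return os.urandom(8), os.urandom(16)


def derive_file_key(password, salt):
    """Steps 1 and 4: SHA1 the password, then PBKDF2-HMAC-SHA1 to 128 bits."""
    # Step 1: 20-byte SHA1 digest of the password (UTF-8 is an assumption).
    start_key = hashlib.sha1(password.encode("utf-8")).digest()
    # Step 4: 1024 iterations, 16-byte (128-bit) derived key.
    return hashlib.pbkdf2_hmac("sha1", start_key, salt, 1024, dklen=16)


# Hypothetical usage:
#   iv, salt = new_file_parameters()
#   key = derive_file_key("correct horse", salt)
```

The thing to notice is that the password never just gates access; it is the raw material for the key that the file is encrypted under.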
For Excel, here's the appropriate quote directly from Microsoft's support site:
"You can use a strong password with the Password to Open feature in conjunction with RC4 level advanced encryption to require a user to enter a password to open an Office file."
Not as explicitly defined as the ODF standard, but then, that's Microsoft for you.
Nonetheless, RC4, if correctly implemented, is Plenty Good Enough to count as "encryption".
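For the curious, the RC4 cipher itself fits in a dozen lines. The sketch below is the textbook algorithm only; how Office turns your password into the RC4 key isn't described in the quote above, so that part is deliberately not shown, and this is emphatically not something to protect real data with.

```python
def rc4(key, data):
    """Encrypt or decrypt data with RC4 (the operation is its own inverse)."""
    # Key-scheduling: permute 0..255 under control of the key bytes.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Keystream generation, XORed byte by byte with the input.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Widely published test vector:
#   rc4(b"Key", b"Plaintext").hex() == "bbf316e8d940af0ad3"
```

Without the key, the output is not something you can grep names out of, which is all I'm asking "encryption" to mean here.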
Of course, if the HMRC infrastructure had been built on top of Trusted Extensions, the "junior employee" (noting the rumours forming, that more senior staff may have been complicit) would probably not have had the label at which all this data was stored within his clearance range, or the clearance range of a role that he was allowed to assume without passing through a two-person rule; he certainly wouldn't have had the privilege to mount or burn media at that label...
Actually, it looks like "password protection without encryption" has been implemented as a feature, as "Password to modify" in Microsoft Office - but, as you might expect, it doesn't work...