SSH messages: "Bad packet length", "Corrupted MAC on input"

I'd like to explain two SSH error messages that come up from time to time. Both correspond to generic SSH protocol version 2 conditions, but the actual wording varies between implementations; the wording here comes from OpenSSH (and was thus inherited by SunSSH). Before I explain the difference between those two fatal errors, let's see what an SSH packet looks like and how it is processed, since we will need that later:

[Figure: SSH packet]

I leave the initial key exchange out of this, so let's pretend we already have all the keys we need. See RFC 4253 (The Secure Shell (SSH) Transport Layer Protocol) for the details. As we can see, almost everything is encrypted before the packet is sent over the network, including the packet length field (4 bytes). The only field that is not encrypted is the MAC (Message Authentication Code). The MAC is computed from the implicit packet sequence number, which is itself never sent, and the still-unencrypted fields pictured in light green. It is then appended after the encrypted part, finishing the SSH packet.
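The packet construction described above can be sketched in a few lines of Python. This is only an illustration and not how OpenSSH or SunSSH is written: the function names are made up, HMAC-SHA1 and a 16-byte block size are assumed, and `encrypt` is a caller-supplied placeholder for whatever cipher was negotiated.

```python
import hashlib
import hmac
import os
import struct

BLOCK_SIZE = 16  # assumption: AES, 16-byte cipher blocks


def build_plain_packet(payload: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Return packet_length || padding_length || payload || random padding."""
    # The length field (4) + padding length byte (1) + payload + padding must
    # be a multiple of the block size, with at least 4 bytes of padding.
    pad_len = block_size - ((5 + len(payload)) % block_size)
    if pad_len < 4:
        pad_len += block_size
    packet_length = 1 + len(payload) + pad_len  # excludes the length field itself
    header = struct.pack(">IB", packet_length, pad_len)
    return header + payload + os.urandom(pad_len)


def compute_mac(mac_key: bytes, seqnr: int, plain_packet: bytes) -> bytes:
    """HMAC-SHA1 over the 32-bit sequence number and the unencrypted packet."""
    data = struct.pack(">I", seqnr) + plain_packet
    return hmac.new(mac_key, data, hashlib.sha1).digest()


def seal(payload: bytes, mac_key: bytes, seqnr: int, encrypt) -> bytes:
    """Encrypt the packet and append the (unencrypted) MAC."""
    plain = build_plain_packet(payload)
    return encrypt(plain) + compute_mac(mac_key, seqnr, plain)


# Stand-in for the negotiated cipher; it does NOT encrypt anything.
identity = lambda data: data
wire = seal(b"hello", b"k" * 20, 0, identity)
print(len(wire))  # 16 bytes of packet + 20 bytes of HMAC-SHA1
```

Note that the MAC is computed over the plaintext packet, which is why the receiver has to decrypt before it can verify anything.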

The MAC is usually an HMAC based on MD5 (the result is typically 16 bytes) or SHA-1 (20 bytes). There is a draft on UMAC in SSH as well, which can have shorter outputs, but that's not relevant for now. The important thing is that the MAC field is not encrypted.

Now let's investigate those two error messages.

"Bad packet length"

This problem happens when either party decrypts the first cipher block of the SSH packet and checks the packet length. Obviously, the packet length must be at least 5 bytes. RFC 4253 specifies that any implementation must be able to process packets whose total size is 35000 bytes or less. In practice, SSH implementations usually allow for longer packets. OpenSSH/SunSSH accepts a length field of up to 256KB (256 * 1024). So the following must hold, otherwise we have a bad packet length:

5 <= length <= 256 * 1024
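As a rough sketch of that check, assuming the first cipher block has already been decrypted (the function and constant names here are mine, not OpenSSH's):

```python
import struct

PACKET_MIN_LEN = 5           # lower bound from the inequality above
PACKET_MAX_LEN = 256 * 1024  # upper bound used by OpenSSH/SunSSH per the text


def check_packet_length(first_block: bytes) -> int:
    """Read the big-endian 32-bit length from a decrypted first block."""
    (length,) = struct.unpack(">I", first_block[:4])
    if not PACKET_MIN_LEN <= length <= PACKET_MAX_LEN:
        raise ValueError(f"Bad packet length {length}.")
    return length


# A sane length passes; four bytes of rubbish almost never do.
print(check_packet_length(struct.pack(">I", 12) + bytes(12)))
try:
    check_packet_length(bytes.fromhex("fd3256c4") + bytes(12))
except ValueError as err:
    print(err)
```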

Possible reasons for "Bad packet length"

There is usually only one reason for this error message - bad encryption or decryption. In that case the peer decrypts the 1st cipher block and gets some rubbish in those 4 bytes. The probability that the rubbish falls into the valid packet length range is (256 * 1024)/2^32. That's roughly 0.006%, meaning that about 6 badly processed cipher blocks out of 100000 will pass the initial length test. And even after that, only 1 of 16 succeeds in the case of the most commonly used cipher, AES, because the encrypted part of the packet must be a multiple of the 16-byte cipher block size; all in all, on average only about 4 random cipher blocks out of 1 million pass the initial packet length field tests (1/2^18). We will talk a little bit more about that later.
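The arithmetic is easy to verify. The snippet below follows the same approximation as the text, treating all 256 * 1024 values as valid lengths and assuming a 16-byte AES block:

```python
from fractions import Fraction

MAX_LEN = 256 * 1024  # upper bound on the length field
AES_BLOCK = 16        # AES cipher block size in bytes

# Chance that 4 random bytes land in the accepted length range.
p_length = Fraction(MAX_LEN, 2**32)
# Chance that the length is also consistent with the block size.
p_both = p_length / AES_BLOCK

print(p_length, f"{float(p_length):.4%}")
print(p_both, f"{float(p_both) * 1_000_000:.1f} per million")
```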

Could it be just corruption on the wire? It could, but the chance that it hit those specific 4 bytes the first time it happened during the SSH TCP connection is also very small. Note that if the corruption happened during the initial key exchange, the connection would have been closed - the protocol is protected against such a situation. So, if the problem is the encryption or decryption, it usually shows up right after the key exchange, when the first encrypted message, SSH_MSG_SERVICE_REQUEST, is sent.

Obviously, it could be a bug in the SSH implementation itself, but that's usually not the case; with a bug like that it would probably either work or not work at all. I don't remember either of the two errors explained in this blog entry ever being a bug in SunSSH.

Example

An example that is exactly the opposite of what I concluded above, i.e. seeing that error message while encryption/decryption is working fine, is this bug in recent Nevada builds: 6777719 Fused tcp path can re-order packets. Since it can, in rare cases, reorder the packets when loopback is involved, the decrypted 1st cipher block gets rubbish in the packet length field. This is so that you can see that nothing is for certain... In a moment it should be clear that this bug practically cannot cause the "Corrupted MAC on input" error, since this problem always corrupts the beginning of the SSH packet - from the SSH application's point of view. Well, it can - with 0.0004% probability, as we already discussed.

"Corrupted MAC on input"

This situation happens when the packet is decrypted, the length field is checked, the MAC is computed over the decrypted data and then checked against the MAC field from the SSH packet (see the picture above). If those two MACs don't match, we print the "Corrupted MAC on input" error message.
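A minimal sketch of the receiving side, assuming HMAC-SHA1 and a packet that has already been decrypted. The function name is hypothetical and the real implementations are of course written in C:

```python
import hashlib
import hmac
import struct


def verify_mac(mac_key: bytes, seqnr: int, plain_packet: bytes,
               received_mac: bytes) -> None:
    """Recompute the MAC over the decrypted packet and compare."""
    data = struct.pack(">I", seqnr) + plain_packet
    expected = hmac.new(mac_key, data, hashlib.sha1).digest()
    # Constant-time comparison of the computed and received MACs.
    if not hmac.compare_digest(expected, received_mac):
        raise ValueError("Corrupted MAC on input.")


key = b"k" * 20
packet = struct.pack(">IB", 12, 6) + b"hello" + bytes(6)
mac = hmac.new(key, struct.pack(">I", 7) + packet, hashlib.sha1).digest()

verify_mac(key, 7, packet, mac)  # passes silently

# Flip a single bit in the payload: the length field is still fine,
# but the MAC no longer matches.
damaged = bytearray(packet)
damaged[6] ^= 0x01
try:
    verify_mac(key, 7, bytes(damaged), mac)
except ValueError as err:
    print(err)
```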

Possible reasons for "Corrupted MAC on input"

If you see those messages instead of the "Bad packet length" one, you can safely assume that the encryption/decryption works fine. If it didn't, the packet length check could hardly pass a few times in a row - assuming we have seen the message at least a couple of times. That means that we have data corruption somewhere. There are many ways this could happen. It could be a malfunctioning:

  • firewall, or
  • NAT, or
  • NIC device driver, or
  • NIC itself, or
  • switch/router along the way, or
  • ...something else that corrupted the data in between the two SSH parties

Again, it could also be the SSH implementation itself, but as with the "Bad packet length" problem that's usually not the case. Note that all those corruptions assume that the corrupted TCP segment still passes the checksum test, but that can easily happen. The checksum is basically a 16-bit ones' complement sum of all 16-bit words in the TCP segment; see RFC 793 (Transmission Control Protocol) for the details.
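To see why such a sum is a weak check, here is a small sketch of the ones' complement checksum. It covers the data only and leaves out the TCP pseudo-header for brevity, and the sample bytes are arbitrary. Swapping two 16-bit words, or adding to one word what is subtracted from another, changes the data but not the checksum:

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = sum(struct.unpack(f">{len(data) // 2}H", data))
    while total >> 16:   # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes.fromhex("1234abcd0001fffe")
swapped = bytes.fromhex("abcd12340001fffe")  # first two words exchanged
offset = bytes.fromhex("1235abcc0001fffe")   # +1 in one word, -1 in another

print(internet_checksum(original) == internet_checksum(swapped))
print(internet_checksum(original) == internet_checksum(offset))
```

The MAC, on the other hand, catches both kinds of damage, which is how such corruption becomes visible at the SSH level.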

Example

When working on the SunSSH with HW crypto support project, I hit this error message when testing large data transfers on UltraSPARC T2 machines. The problem was that the HW counter in the AES CTR mode in the UltraSPARC T2 CPU was just 32 bits wide while the SSH protocol requires 128 bits. The solution was to enhance the n2cp driver to cope with such a situation. The problem was tracked under 6746885 ssh error: "2: Corrupted MAC on input." when AES CTR is used with the n2cp driver. Note that this bug cannot cause the error message on Solaris 10, since hardware-accelerated SunSSH is planned for the next update, S10U7, and the fix will be there as well.
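The counter mismatch can be modelled with plain integers. This is only my simplified model of the problem, not the actual n2cp driver logic: I assume the hardware increments just the low 32 bits of the counter block and drops the carry, while the other side treats the whole block as a 128-bit number. The starting value is arbitrary, chosen so that the low 32 bits are about to wrap:

```python
MASK128 = (1 << 128) - 1
MASK32 = (1 << 32) - 1


def next_counter_128(counter: int) -> int:
    """Increment the whole counter block as a 128-bit number."""
    return (counter + 1) & MASK128


def next_counter_32(counter: int) -> int:
    """Hypothetical HW model: only the low 32 bits count, the carry is lost."""
    high = counter & ~MASK32
    low = (counter + 1) & MASK32
    return high | low


start = (0x0123456789ABCDEF << 64) | 0xFFFFFFFE
full, narrow = start, start
for step in (1, 2):
    full, narrow = next_counter_128(full), next_counter_32(narrow)
    print(step, full == narrow)
```

Once the two counters differ, the two sides generate different keystreams, so the data decrypts to something else than what was sent. It also fits the fact that I only saw the problem with large transfers: as long as the low 32 bits don't wrap, both counters agree.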

Conclusion

The two error messages explained here are the most common errors when it comes to SSH data integrity and corruption problems. They are regularly asked about on various mailing lists, and while one can find some answers through the search engines, I hope this blog entry can help to properly understand the underlying magic behind them.

Comments:

Excellent, informative explanation. Thank you for taking the time to write this.

Posted by Leonardo Boiko on February 03, 2010 at 09:52 AM CET #

Hi Jan ,

I have used sun solaris openssl pkcs11 engine and when I used this engine with openssh ,I am getting "Bad packet length" error .

This error I am getting when i run ssh hostname
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received

Upto this ,everything works fine .

This issue come after client authenticate to the server

Permission denied, please try again.
root@hostname.gmail.com's password:
debug3: packet_send2: adding 64 (len 53 padlen 11 extra_pad 64)
debug2: we sent a password packet, wait for reply
debug1: Authentication succeeded (password).
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug1: Entering interactive session.
Disconnecting: Bad packet length 4247931972.

I have traced the PKCS11# API trace but can see every thing returns sucessfully .

If the sshd is not calling any crypto operation from PKCS11 engine ,everything works fine with ssh calling crypto operation from PKCS11 engine .

If both sshd and ssh calls the crypto operation from PKCS11 engine ,this problem comes into picture.

Please direct me what could be the problem here .

Thanks
Jais

Posted by Jais michael on February 23, 2010 at 12:27 AM CET #

i seemed to run into this using aes ciphers. running ssh with a 3des-cbc seemed help get around it.

Try something like: ssh -c 3des-cbc host.domain.tld

Posted by Jim Norton on March 04, 2010 at 03:30 PM CET #

Jais, see my blog entry on HW accelerated SunSSH. OpenSSH has different privilege separation model and since fork() is involved a few times there as well, the problem you see might be connected to fork safety issues with PKCS#11 libraries in general. Please see the presentation attached to the blog entry for more information.

Posted by Jan on March 05, 2010 at 02:38 AM CET #

Thank you for the information. I am experiencing a case where the tcp packet is not passing its checksum, and when I look at the packets, I noticed that the only thing that has changes is the very last byte of the ssh's MAC field. When we receive it in our tcp dumps, it looks like that last byte is always converted to "0x08". Have you ever seen something like this?

Posted by Kevin Luc on March 02, 2011 at 09:53 AM CET #

About

Jan Pechanec
