Monday Aug 24, 2009

Using DTrace to debug encrypted protocols

UPDATED: I hadn't fully swapped in the context when I wrote this blog entry, and Jordan, the engineer working this bug, tells me that the primary problem is an incorrect interpretation of the security layers bitmask on the AD side. I describe that in detail at the end of the original post, and I've added links to the relevant RFCs.

A few months ago there was a bug report that the OpenSolaris CIFS server stack did not interop with Active Directory when "LDAP signing" was enabled. But packet captures and truss/DTrace clearly showed that smbd/idmapd were properly encrypting and signing all LDAP traffic (even when LDAP signing was disabled), and with AES too. So, what gives?

Well, in the process of debugging the problem I realized that I needed to look at the cleartext of otherwise encrypted LDAP protocol data. Normally the way one would do this is to build a special version of the relevant library (the libsasl "gssapi" plugin, in this case) that prints the relevant cleartext. But that's really obnoxious. There's got to be a better way!

Well, there is. I'd already done this sort of thing in the past when debugging other interop bugs related to the Solaris Kerberos stack, and I'd done it with DTrace.

Let's drill down the protocol stack. The LDAP clients in our case were using SASL/GSSAPI/Kerberos V5, with confidentiality protection "SASL security layers", for network security. After looking at some AD docs I quickly concluded that "LDAP signing" clearly meant just that. So the next step was to look at the SASL/GSSAPI part of that stack. The RFC (originally RFC2222, now RFC4752) says that after exchanging the GSS-API Kerberos V5 messages [RFC4121] that set up a shared security context (session keys, ...), the server sends a message to the client consisting of: a one-byte bitmask indicating what "security layers" the server supports (none, integrity protection, or confidentiality+integrity protection), and a 24-bit, network byte order maximum message size. But these four bytes are encrypted, so I couldn't just capture packets and dissect them. The first order of business, then, was to extract these four bytes somehow.
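To make the layout of those four bytes concrete, here is a minimal Python sketch of how the cleartext could be dissected once you have it. The function and constant names are mine, not from any library; the bit values (1, 2, 4) are the ones RFC4752 assigns to the three security layers.

```python
# Security layer bits as defined by RFC 4752 (names here are illustrative).
SEC_LAYER_NONE = 0x01
SEC_LAYER_INTEGRITY = 0x02
SEC_LAYER_CONFIDENTIALITY = 0x04


def parse_server_offer(cleartext: bytes):
    """Split the unwrapped server token into (offered layers, max message size)."""
    if len(cleartext) < 4:
        raise ValueError("expected at least 4 octets")
    mask = cleartext[0]
    # Octets 2-4 are a 24-bit integer in network (big-endian) byte order.
    max_size = int.from_bytes(cleartext[1:4], "big")
    names = [
        (SEC_LAYER_NONE, "none"),
        (SEC_LAYER_INTEGRITY, "integrity"),
        (SEC_LAYER_CONFIDENTIALITY, "confidentiality"),
    ]
    layers = [name for bit, name in names if mask & bit]
    return layers, max_size


# A server offering all three layers and a 10MB maximum message size.
layers, max_size = parse_server_offer(bytes([0x07, 0xA0, 0x00, 0x00]))
print(layers, max_size)
```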

I resorted to DTrace. Since the data in question is in user-land, I had to resort to using copyin() and hand-coding pointer traversal. The relevant function, gss_unwrap(), takes a pointer to a gss_buffer_desc struct that points to the ciphertext, and a pointer to another gss_buffer_desc where the pointer to the cleartext will be stored. The script:

#!/usr/sbin/dtrace -Fs

/*
 * NOTE: the probe descriptions, the braces and all but one of the
 * predicates are missing from the script as posted here.  The pid$target
 * probes and predicates below are reconstructed from the function names
 * and argument positions discussed in the text, so treat them as
 * assumptions.  The 4-byte reads assume a 32-bit target process.
 * Run with -p <pid of the LDAP client>.
 */

/*
 * If we establish a sec context, then the next unwrap
 * is of interest.
 */
pid$target::gss_init_sec_context:return
/arg1 == 0/     /* GSS_S_COMPLETE */
{
        self->trace_unwrap = 1;
}

pid$target::gss_unwrap:entry
/self->trace_unwrap/
{
        self->trace_wrap = 1;

        /* Trace the ciphertext */
        this->gss_wrapped_bufp = arg2;
        this->buflen = *(unsigned int *)copyin(this->gss_wrapped_bufp, 4);
        this->bufp = *(unsigned int *)copyin(this->gss_wrapped_bufp + 4, 4);
        this->buf = copyin(this->bufp, 32);
        tracemem(this->buf, 32);

        /* Remember where the cleartext will go */
        self->gss_bufp = arg3;
        printf("unwrapped token will be in a gss_buffer_desc at %p\n", arg3);
        this->gss_buf = copyin(self->gss_bufp, 8);
        tracemem(this->gss_buf, 8);
}

/*
 * Now grab the cleartext and print it.
 */
pid$target::gss_unwrap:return
/self->trace_unwrap && self->gss_bufp/
{
        this->gss_buf = copyin(self->gss_bufp, 8);
        tracemem(this->gss_buf, 8);
        this->buflen = *(unsigned int *)copyin(self->gss_bufp, 4);
        self->bufp = *(unsigned int *)copyin(self->gss_bufp + 4, 4);
        printf("\nServer wrap token was %d bytes long; data at %p (%p)\n",
                this->buflen, self->bufp, self->gss_bufp);
        this->buf = copyin(self->bufp, 4);
        self->trace_unwrap = 0;
        printf("Server wrap token data: %d\n", *(int *)this->buf);
        tracemem(this->buf, 4);
}

/*
 * Do the same for the client's reply to the
 * server's security layers and max message
 * size negotiation offer.
 */
pid$target::gss_wrap:entry
/self->trace_wrap/
{
        self->trace_wrap = 0;
        self->trace_unwrap = 0;
        this->gss_bufp = arg4;
        this->buflen = *(unsigned int *)copyin(this->gss_bufp, 4);
        this->bufp = *(unsigned int *)copyin(this->gss_bufp + 4, 4);
        this->buf = copyin(this->bufp, 4);
        printf("Client reply is %d bytes long: %d\n", this->buflen,
                *(int *)this->buf);
        tracemem(this->buf, 4);
}
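The two copyin() reads of 4 bytes each reflect the layout of gss_buffer_desc in a 32-bit process: a length followed by a pointer to the data. As an illustration only, here is the same traversal in Python over a fake "address space"; the helper name, the addresses, and the little-endian (x86) byte order are all assumptions of this sketch. In a 64-bit process both fields would be 8 bytes wide and the offsets would change accordingly.

```python
import struct


def read_gss_buffer(memory: bytes, desc_addr: int, byteorder: str = "<") -> bytes:
    """Follow a 32-bit gss_buffer_desc {length; value pointer} to its data."""
    # First 4 bytes: length; next 4 bytes: pointer to the value.
    length, value_ptr = struct.unpack_from(byteorder + "II", memory, desc_addr)
    return memory[value_ptr:value_ptr + length]


# Fake process memory: descriptor at address 8, token data at address 32.
memory = bytearray(64)
struct.pack_into("<II", memory, 8, 4, 32)   # length = 4, value = 32
memory[32:36] = b"\x07\xa0\x00\x00"         # the 4-byte negotiation token

token = read_gss_buffer(bytes(memory), 8)
print(token.hex())
```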

Armed with this script I could see that AD was offering all three security layer options, or only confidentiality protection, depending on whether LDAP signing was enabled. So far so good. The max message size offered was 10MB. 10MB! That's enormous, and fishy. I immediately suspected an endianness bug. 10MB flipped around would be... 40KB, which makes much more sense -- our client's default is 64KB. And what is 64KB interpreted as? All possible interpretations will surely be nonsensical to AD: 16MB, 256, or 1 byte.
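One way to reproduce those numbers is to take the 24-bit size and either reverse its three octets or read the field one octet off in either direction. The sketch below does just that; the function name is mine, and nothing here establishes which of these misreadings, if any, AD actually performs.

```python
def misreadings(size: int) -> dict:
    """Ways a 24-bit big-endian size could be misread by a confused peer."""
    wire = size.to_bytes(3, "big")           # the three octets as sent
    return {
        "reversed": int.from_bytes(wire[::-1], "big"),  # byte order flipped
        "shifted_right": size >> 8,          # field read one octet too early
        "shifted_left": size << 8,           # field read one octet too late
    }


print(misreadings(10 * 1024 * 1024))  # the 10MB that AD advertised
print(misreadings(64 * 1024))         # our client's 64KB default
```

For 10MB the "shifted_right" reading is 40960 (40KB); for 64KB the three readings are 1, 256 and 16777216 (16MB).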

Armed with a hypothesis, I needed more evidence. DTrace helped yet again. This time I used copyout to change the client's response to the server's security layer and max message size negotiation message. And lo and behold, it worked. The script:

#!/usr/sbin/dtrace -wFs

/*
 * NOTE: the probe descriptions, predicates and braces are missing from
 * the script as posted here.  The pid$target probes and predicates below
 * are reconstructed from the text, and self->modified is a flag added for
 * that reconstruction, so treat them as assumptions.  The 4-byte reads
 * assume a 32-bit target process.
 */

BEGIN
{
        self->trace_unwrap = 0;
        printf("This script is an attempted workaround for a possible interop bug in Windows Active Directory: ");
        printf("if LDAP signing and sealing is enabled and idmapd fails to connect normally but succeeds ");
        printf("when this script is used, then AD has an endianness interop bug in its SASL/GSSAPI implementation\n");
}

/*
 * We're looking to modify the SASL/GSSAPI client security layer and max
 * buffer selection.  That happens in the first wrap token sent after
 * establishing a sec context.
 */
pid$target::gss_init_sec_context:return
/arg1 == 0/     /* GSS_S_COMPLETE */
{
        self->trace_unwrap = 1;
}

/*
 * These wrap tokens will be for the security layer -- if we see these
 * then idmapd and AD are happy together.  This clause comes before the
 * one that does the modification so that it cannot fire on that same call.
 */
pid$target::gss_wrap:entry
/self->modified/
{
        printf("It worked!  AD has an endianness interop bug in its SASL/GSSAPI implementation -- tell them to read RFC4752\n");
}

/* This is that call to gss_wrap() */
pid$target::gss_wrap:entry
/self->trace_unwrap/
{
        self->trace_wrap = 0;
        self->trace_unwrap = 0;
        this->gss_bufp = arg4;
        this->buflen = *(unsigned int *)copyin(this->gss_bufp, 4);
        this->bufp = *(unsigned int *)copyin(this->gss_bufp + 4, 4);
        this->sec_layer = *(char *)copyin(this->bufp, 1);
        this->maxbuf_msb = (char *)copyin(this->bufp + 1, 1);
        this->maxbuf_mid = (char *)copyin(this->bufp + 2, 1);
        this->maxbuf_lsb = (char *)copyin(this->bufp + 3, 1);

        /* Parenthesized and masked: + binds tighter than <<, and char may be signed */
        printf("The client wants to select: sec_layer = %d, max buffer = %d\n",
                this->sec_layer,
                ((*this->maxbuf_msb & 0xff) << 16) +
                ((*this->maxbuf_mid & 0xff) << 8) +
                (*this->maxbuf_lsb & 0xff));

        /* Now fix it so it matches what we've seen AD advertise */
        *this->maxbuf_msb = 0xa0;
        *this->maxbuf_mid = 0;
        *this->maxbuf_lsb = 0;
        copyout(this->maxbuf_msb, this->bufp + 1, 1);
        copyout(this->maxbuf_mid, this->bufp + 2, 1);
        copyout(this->maxbuf_lsb, this->bufp + 3, 1);
        printf("Modified the client's SASL/GSSAPI max buffer selection\n");
        self->modified = 1;
}

Yes, DTrace is unwieldy when dealing with user-land C data (and no doubt it's even more so for high level language data). But it does the job!

Separately from the endianness issue, AD also misinterprets the security layers bitmask. The RFC is clear, in my opinion, though it takes careful reading (so maybe it's "clear"), that this bitmask is a mask of one, two or three bits set when sent by the server, but a single bit when sent by the client. It's also clear, if one follows the chain of documents, that "confidentiality protection" means "confidentiality _and_ integrity protection" in this context (again, perhaps I should say "clear"). The real problem is that the RFC is written in English, not in English-technicalese, saying this about the bitmask sent by the server:

              The client passes this token to GSS_Unwrap and interprets
   the first octet of resulting cleartext as a bit-mask specifying the
   security layers supported by the server and the second through fourth
   octets as the maximum size output_message to send to the server.

and this about the bitmask sent by the client:

   client then constructs data, with the first octet containing the
   bit-mask specifying the selected security layer, the second through
   fourth octets containing in network byte order the maximum size
   output_message the client is able to receive, and the remaining
   octets containing the authorization identity.

Note that "security layers" is plural in the first case, singular in the second.

Note too that for GSS-API mechanisms GSS_Wrap/Unwrap() always do integrity protection -- only confidentiality protection is optional. But RFCs 2222/4752 say nothing of this, so that only an expert in the GSS-API would have known this. AD expects the client to send 0x06 as the bitmask when the server is configured to require LDAP signing and sealing. Makes sense: 0x04 is "confidentiality protection" ("sealing") and 0x02 is "integrity protection" ("signing"). But other implementations would be free to consider that an error, which means that we have an interesting interop problem... And, given the weak language of RFCs 2222/4752, this mistake seems entirely reasonable, even if it is very unfortunate.
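To illustrate the difference between the two readings, here is a small Python sketch of building the client's reply. The function names and the `strict` switch are hypothetical; the bit values are those of RFC4752. A strict reader of the RFC accepts only a single selected bit, while AD, as described above, expects 0x06 when signing and sealing are required.

```python
SEC_LAYER_NONE = 0x01
SEC_LAYER_INTEGRITY = 0x02
SEC_LAYER_CONFIDENTIALITY = 0x04


def is_single_layer(mask: int) -> bool:
    """True if exactly one of the three defined layer bits is set."""
    return mask in (SEC_LAYER_NONE, SEC_LAYER_INTEGRITY, SEC_LAYER_CONFIDENTIALITY)


def build_client_reply(mask: int, max_size: int, authzid: bytes = b"",
                       strict: bool = True) -> bytes:
    """One octet of selected layer, 24-bit big-endian max size, then authzid."""
    if strict and not is_single_layer(mask):
        raise ValueError("client must select exactly one security layer")
    return bytes([mask]) + max_size.to_bytes(3, "big") + authzid


# What a strict client sends when it selects confidentiality (64KB buffer):
strict_reply = build_client_reply(SEC_LAYER_CONFIDENTIALITY, 64 * 1024)
# What AD expects instead when signing and sealing are required:
ad_reply = build_client_reply(0x06, 64 * 1024, strict=False)
print(strict_reply.hex(), ad_reply.hex())
```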

Friday Dec 12, 2008

Automated Porting Difficulties: Run-time failures in roboported FOSS

As I explained in my previous blog entry, I'm working on a project whose goal is to automate the process of finding, building and integrating FOSS into OpenSolaris so as to populate our /pending and /contrib (and eventually /dev) IPS package repositories with as much useful FOSS as possible.

We've not done a good job of tracking build failures due to missing interfaces in OpenSolaris, though in the next round of porting we intend to track and investigate build failures. But when we tested candidate packages for /contrib we did run into run-time failures that were due to differences between Linux and Solaris. These were mostly due to:

  1. FOSS expected a Linux-style /proc
  2. CLI conflicts

The first of those was shocking at first, but I quickly remembered: the Linux /proc interfaces are text-based, thus no headers are needed in order to build programs that use /proc. Applications targeting the Solaris /proc could not possibly build on Linux (aside from cross-compilation targeting Solaris, of course): the necessary header, <procfs.h>, would not exist, therefore compilation would break.

Dealing with Linux /proc applications is going to be interesting. Even detecting them is going to be interesting, since they could be simple shell/Python/whatever scripts: simply grepping for "/proc" && !"procfs.h" will surely result in many false positives requiring manual investigation.
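As a rough illustration of that grep, and of why it over-reports, here is a minimal Python sketch. The function name and the sample inputs are made up for this example; it is only the naive heuristic described above, not a tool we actually use.

```python
def looks_like_linux_proc_user(source_text: str) -> bool:
    """Naive heuristic: mentions /proc but never includes the Solaris header."""
    return "/proc" in source_text and "procfs.h" not in source_text


linux_style = 'f = open("/proc/meminfo")  # text-based Linux interface'
solaris_style = '#include <procfs.h>\nfd = open("/proc/self/psinfo", O_RDONLY);'
false_positive = "# This script does not touch /proc at all"

print(looks_like_linux_proc_user(linux_style))     # flagged, correctly
print(looks_like_linux_proc_user(solaris_style))   # not flagged
print(looks_like_linux_proc_user(false_positive))  # flagged, but wrongly
```

The third case is the kind of hit that would need manual investigation.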

The second common run-time failure mode is also difficult to detect a priori, but I think we can at least deal with it automatically. The incompatible-CLI problem results in errors like:

Usage: grep -hblcnsviw pattern file . . .

when running FOSS that expected GNU grep, for example. Other common cases include ls(1), ifconfig(1M), etcetera.

Fortunately OpenSolaris already has a way to get Linux-compatible command-line environments: just put /usr/gnu/bin before /usr/bin in your PATH. Unfortunately that's also not an option here because some programs will expect a Solaris CLI and others will expect a Linux CLI.

But fortunately, once again, I think there's an obvious way to select which CLI environment to use (Solaris vs. Linux) on a per-executable basis (at least for ELF executables): link in an interposer on the exec(2) family of functions, and have the interposer ensure that the correct preference of /usr/gnu/bin or /bin is chosen. Of course, this will be a simple solution only in the case of programs that compile into ELF, and not remotely as simple, perhaps not even feasible for scripts of any kind.
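The real interposer would be a small C shared object wrapping the exec(2) family; I haven't written it yet. The part that matters is the PATH rewrite it would do before calling the real function, which can be sketched in Python. The function name and the "gnu"/"solaris" labels are hypothetical, and I use /usr/bin as the native directory here.

```python
GNU_BIN = "/usr/gnu/bin"
NATIVE_BIN = "/usr/bin"


def prefer_cli(path: str, flavor: str) -> str:
    """Return PATH with /usr/gnu/bin placed before or after /usr/bin."""
    # Drop empty entries and any existing /usr/gnu/bin; it is re-inserted below.
    dirs = [d for d in path.split(":") if d and d != GNU_BIN]
    if NATIVE_BIN not in dirs:
        dirs.append(NATIVE_BIN)
    i = dirs.index(NATIVE_BIN)
    if flavor == "gnu":
        dirs.insert(i, GNU_BIN)        # GNU tools win
    elif flavor == "solaris":
        dirs.insert(i + 1, GNU_BIN)    # Solaris tools win
    else:
        raise ValueError("flavor must be 'gnu' or 'solaris'")
    return ":".join(dirs)


print(prefer_cli("/usr/bin:/usr/sbin", "gnu"))
print(prefer_cli("/usr/gnu/bin:/usr/bin", "solaris"))
```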

I haven't yet tried the interposer approach for the CLI preference problem, but I will, and I'm reasonably certain that it will work. I'm not as optimistic about the /proc problem; right now I've no good ideas about how to handle the /proc problem, short of manually porting the applications in question or choosing to not package them for OpenSolaris at all until the upstream communities add support for the Solaris /proc. I.e., the /proc problem is very interesting.

Wednesday Dec 10, 2008

Massively porting FOSS for OpenSolaris 2008.11 /pending and /contrib repositories

Today is the official release of OpenSolaris 2008.11, including commercial support.

Along with OpenSolaris 2008.11 we're also publishing new repositories full of various open source software built and packaged for OpenSolaris:

  • A pending repository with 1,708 FOSS pkgs today, and many more coming. This is "pending" in that we want to promote the packages in it to the contrib repository.
  • A contrib repository with 154 FOSS pkgs today, and many more coming soon.

These packages came from two related OpenSolaris projects in the OpenSolaris software porters community:

The two projects focus on different goals. Here I describe the work that we did on the PkgFactory/Roboporter project. Our primary goal is to port and package FOSS to OpenSolaris as quickly as possible. We do not yet focus very much on proper integration with OpenSolaris, such as making sure that the FOSS we package is properly integrated with RBAC, SMF, Solaris audit facilities, with manpages placed in the correct sections, etcetera, though we do intend to get to the point where we do get close enough to proper integration that the most valuable packages can then be polished off manually, put through the ARC and c-team processes, and pushed to the /dev repository.

Note, by the way, that the /pending and /contrib repositories are open to all contributors. The processes involved for contributing packages to these repositories are described in the SW Porters community pages, so if you'd like to make sure that your favorite FOSS is included you can always do it yourself!

The 154 packages in /contrib are a representative subset of the 1,708 packages in /pending, which in turn are a representative subset of some 10,000 FOSS pkgs that we had in a project-private repository. That's right, 10,000, which we built in a matter of just a few weeks. [NOTE: Most, but not all of the 1,708 packages in /pending and 154 in /contrib came from the pkgfactory project.]

The project began with Doug Leavitt doing incredible automation of: a) searching for and downloading spec files from SFE and similar from Ubuntu and other Linux packaging repositories, b) building them on Solaris. (b) is particularly interesting, but I'll let Doug blog about that. With Doug's efforts we had over 12,000 packages in a project-private IPS repository, and the next step was to clean things up, cut the list down to something that we could reasonably test and push to /pending and /contrib. That's where Baban Kenkre and I jumped in.

To come up with that 1,708 package list we first removed all the Perl5 CPAN stuff from the list of 12,000, then we wrote a utility to look for conflicts between our repository, the Solaris WOS and OpenSolaris. It turned out we had many conflicts even within our own repository (some 2,000 pkgs were removed as a result, if I remember correctly, after removing the Perl5 packages). Then we got down and dirty and did as much [very light-weight] testing as we could.

What's really interesting here is that the tool we wrote to look for conflicts turned out to be really useful in general. That's because it loads package information from our project's repo, the SVR4 Solaris WOS and OpenSolaris into a SQLite3 database, and analyzes the data to some degree. What's really useful about this is that with little knowledge of SQL we did many ad-hoc queries that helped a lot when it came to whittling down our package list and testing. For example: getting a list of all executables in /bin and /usr/sbin that are delivered by our package factory and which have manpages, was trivial, and quite useful (because then I could read the manpages in one terminal and try the executables in another, which made the process of light-weight testing much faster than it would have otherwise been). We did lots of ad-hoc queries against this little database, the kinds of queries that without a database would have required significantly more scripting; SQL is a very powerful language!
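To give a flavor of those ad-hoc queries, here is a minimal sketch using Python's sqlite3 module. The schema, package names and paths are invented for this example (our real database looks different), and "has a manpage" is simplified to "the same package delivers something under /usr/share/man".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (source TEXT, pkg TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("pkgfactory", "foo", "/usr/bin/foo"),
        ("pkgfactory", "foo", "/usr/share/man/man1/foo.1"),
        ("pkgfactory", "bar", "/usr/bin/bar"),            # no manpage
        ("pkgfactory", "grep", "/usr/bin/grep"),
        ("pkgfactory", "grep", "/usr/share/man/man1/grep.1"),
        ("opensolaris", "SUNWgrep", "/usr/bin/grep"),     # conflicting path
    ],
)

# Executables from our package factory whose package also ships a manpage.
with_manpages = [row[0] for row in conn.execute("""
    SELECT DISTINCT e.path
      FROM files e JOIN files m ON m.pkg = e.pkg AND m.source = e.source
     WHERE e.source = 'pkgfactory'
       AND (e.path LIKE '/usr/bin/%' OR e.path LIKE '/usr/sbin/%')
       AND m.path LIKE '/usr/share/man/%'
     ORDER BY e.path""")]

# Paths delivered by more than one source, i.e. conflicts.
conflicts = [row[0] for row in conn.execute("""
    SELECT path FROM files
     GROUP BY path HAVING COUNT(DISTINCT source) > 1
     ORDER BY path""")]

print(with_manpages)
print(conflicts)
```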

That's it for now. We'll blog more later. In the meantime, check out the /pending and /contrib repositories. We hope you're pleased. And keep in mind that what you see there is mostly the result of just a few weeks of the PkgFactory project work, so you can expect: a) higher quality as we improve our integration techniques and tools, and b) more, many, many more packages as we move forward. Our two projects' ultimate goal is to package for OpenSolaris all of the useful, redistributable FOSS that you can find on Sourceforge and other places.


I'm an engineer at Oracle (erstwhile Sun), where I've been since 2002, working on Sun_SSH, Solaris Kerberos, Active Directory interoperability, Lustre, and misc. other things.

