Wednesday Apr 16, 2014

Is FIPS 140-2 Actively harmful to software?

Solaris 11 recently completed a FIPS 140-2 validation for the kernel and userspace cryptographic frameworks. This was a huge amount of work for the teams, and it is something I had been pushing for since before we wrote a single line of code for the cryptographic framework back in 2000, during its initial design for Solaris 10.

So you would imagine I'm happy, right? Well, not exactly. I'm glad I won't have to keep answering questions from customers as to why we don't have a FIPS 140-2 validation, but I'm not happy with the process or what it has done to our code base.

FIPS 140-2 is an old standard that doesn't deal well with modern systems and especially doesn't fit nicely with software implementations. It is very focused on standalone hardware devices, plugin hardware security modules and similar physical devices. My colleague Josh over in Oracle Seceval has already posted a great article on why we only managed to get FIPS 140-2 at level 1 instead of level 2. So I'm not going to cover that, but instead talk about some of the technical code changes we had to make in order to "pass" our validation of FIPS 140-2.

There are two main parts to completing a FIPS 140-2 validation. The first part is CAVP (Cryptographic Algorithm Validation Program), which is about proving your implementation of a given algorithm is correct using NIST assigned test vectors. This part went relatively quickly and easily, and it has the potential to find bugs in crypto algorithms that otherwise appear to be working correctly. The second part is CMVP (Cryptographic Module Validation Program), which looks at the security model of the whole "FIPS 140-2 module"; in our case we had separate validations for the kernel crypto framework and the userspace crypto framework.

CMVP requires us to draw a boundary around the delivered software components that make up the FIPS 140-2 validation boundary - in other words, files in the file system. Ideally you want to keep this as small as possible, so that non crypto relevant libraries and tools are not part of the FIPS 140-2 boundary. We certainly made some mistakes drawing our boundary in userspace, since it was a little larger than it needed to be. We ended up with some "utility" libraries inside the boundary, so the good software engineering practice of factoring out code actually made our FIPS 140-2 boundary bigger.

Why does the FIPS 140-2 boundary matter? Unlike in Common Criteria, which has flaw remediation, in the FIPS 140-2 validation world you can't make any changes to the compiled binaries that make up the boundary without potentially invalidating the existing validation. That means having to go through some or all of the process again, and importantly this costs real money and a significant amount of elapsed time.

It isn't even possible to fix "obvious" bugs such as memory leaks, or even things that might lead to vulnerabilities, without at least engaging with a validation lab. This is bad for overall system security; after all, isn't FIPS 140-2 supposed to be a security standard? I can see, with a bit of squinting, how this might make some sense in a hardware module world, but it doesn't make any sense for software.

We also had to add POST (Power On Self Test) code that runs known answer tests for all the FIPS 140-2 approved algorithms implemented inside the boundary at "startup time", before any consumer outside of the framework can use the crypto interfaces.
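A known answer test is conceptually simple: run the algorithm over a fixed input and compare the result against a pre-computed expected output. A minimal sketch in Python, using SHA-256 as the example algorithm (the function names here are hypothetical illustrations, not the Solaris implementation; the expected digest is the published NIST example vector for the input "abc"):

```python
import hashlib

# Known answer test (KAT) vectors: input -> expected digest.
# The "abc" vector is from the published NIST SHA-256 examples.
_SHA256_KATS = {
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

class PostFailure(Exception):
    """Raised when a power-on self test fails; the module must refuse service."""

def run_sha256_post():
    """Run the SHA-256 known answer tests; raise PostFailure on any mismatch."""
    for message, expected in _SHA256_KATS.items():
        actual = hashlib.sha256(message).hexdigest()
        if actual != expected:
            raise PostFailure("SHA-256 KAT mismatch")
    return True
```

In a real FIPS 140-2 module, every approved algorithm inside the boundary gets a test of this shape, and a failure must put the module into an error state that refuses all cryptographic service.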

For our kernel framework we implemented this using the module init hooks, and we also leveraged the fact that the kcf module itself starts very early in boot (long before we even mount the root file system from inside the kernel). Since kernel modules are generally only unloaded to be updated, the impact of having to do this self test on every startup isn't a big deal.

However, in userspace we were forced by "Implementation Guidance" (I'll get back later to why it isn't really guidance) to do this in every process that directly or indirectly causes the cryptographic framework libraries to be loaded. This is really bad and is counter to sensible software engineering practice. On modern general purpose operating systems (well, anything from the last 15+ years really) like Solaris, shared library pages are mapped shared, so the same read-only pages of code are used by all the processes that start up. So this just wastes CPU resources and causes performance problems for short lived processes. We measured the impact this had on Solaris boot time and, if I'm remembering correctly, it was about a 9% increase in the time it takes to boot to multi-user.

I've actually spoken with NIST about the "always on POST", and we tried hard to come up with an alternative solution, but so far we can't seem to agree on a method that would allow this to be done just once at system boot and only done again if the on disk binaries have actually changed (which we can easily detect!).

Now let's combine these last two things: we had to add code that runs every time our libraries load, and we can't make changes to the delivered binaries without possibly causing our validation to become invalid. Solaris actually had a bug in some of the new FIPS 140-2 POST code in userspace that had a risk of a file descriptor leak (it wasn't an exploitable security vulnerability, and it was only one single fd), but we couldn't have changed that without revising which binaries were part of the FIPS 140-2 validation. This is bad for customers that are forced by their governments or other security standards to run with FIPS 140-2 validated crypto modules, because sometimes they might have to miss out on critical fixes.
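This class of bug is easy to illustrate. A sketch of the pattern in Python rather than the actual C framework code (the function names and the use of /dev/urandom are illustrative only):

```python
import os

def read_seed_leaky(path="/dev/urandom"):
    """Buggy pattern: the descriptor is never closed, so every call
    leaks one fd for the life of the process."""
    fd = os.open(path, os.O_RDONLY)
    data = os.read(fd, 16)
    # Missing os.close(fd) here is the leak.
    return data

def read_seed_fixed(path="/dev/urandom"):
    """Fixed pattern: the descriptor is released on every path,
    including when the read raises."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 16)
    finally:
        os.close(fd)
```

A one-line fix like this is exactly the kind of change that, inside a FIPS 140-2 boundary, cannot ship without re-engaging a validation lab.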

I promised I'd get back to "Implementation Guidance". This is really a roundabout way of updating the standard with new interpretations, which often look to developers like whole new requirements (that we were supposed to magically know about), without the standard being revised. While the approved validation labs do get pre-review of these new or updated IGs, the impact for vendors is huge. A module that passes FIPS 140-2 (which is a specific revision of the standard, the current one as of this time) today might not pass FIPS 140-2 in the future - even if nothing was changed.

In fact we are potentially in this situation with Solaris 11. We have completed and passed a FIPS 140-2 validation, but due to changes in the Implementation Guidance we aren't sure we would be able to submit the identical code again and pass. So we may have to make changes just to pass the new or updated FIPS 140-2 IGs, changes that have no functional benefit to our customers.

This has serious implications for software implementations of cryptographic modules. I can understand that if we change any of the core crypto algorithm code we should re-run the CAVP test vectors - and in fact we do that internally using our test suite for all changes to the crypto framework anyway (our test suite is actually much more comprehensive than what FIPS 140 requires) - but not being able to make simple bug fixes or changes to non algorithm code is not good for software quality.

So what do we do in Solaris? We make the bug fixes and add new non FIPS 140-2 relevant algorithms (such as Camellia) anyway, because most of our customers don't care about FIPS 140-2, and even many of those that do only care to "tick the box" that the vendor has completed the validation.

In Solaris the kernel and userland cryptographic frameworks always contain the FIPS 140-2 required code, but it is only enabled if you run 'cryptoadm enable fips-140'. This turns on the FIPS 140-2 POST checking and a few other runtime checks.

So should I run Solaris 11 with FIPS 140-2 mode enabled?

My personal opinion is that unless you have a very hard requirement to do so, I wouldn't - it is the same crypto algorithm and key management code you are running anyway, but you won't have the pointless POST code running that hurts the start up time of short lived processes. Having said that, my day to day Solaris workstation (which runs the latest bi-weekly builds of the Solaris 12 development train) does actually run in FIPS 140-2 mode, so that I can help detect any possible issues in the FIPS 140-2 mode of operation long before a code release gets to customers. We also run our test suites with it enabled and disabled.

I really hope that when a revision to FIPS 140 finally does come around (it is already many years behind schedule) it will deal better with software implementations. When FIPS 140-3 was first in public review I sent in a lot of comments on that area. I really hope that the FIPS 140 program can adopt a sensible approach to allowing vendors to provide bug fixes without having to redo validations - in particular it should not cost the vendor any time or money beyond what they normally spend themselves.

In the meantime the Solaris Cryptographic Framework team is hard at work: fixing bugs, improving performance, adding features and new algorithms, and (grudgingly) adding what we think will allow us to pass a future FIPS 140 validation based on the currently known IGs.

-- Darren

Wednesday Aug 24, 2011

Storing the initial (encryption) key in SMF on Solaris 11

When deploying services that use encryption as part of their network protocols, it is often desirable to protect the long term keys that these services use. For example, a web server serving TLS needs an RSA private key, and so do most SSH servers and IKE daemons. Developers, administrators and security officers have to make an appropriate trade off between the requirements of unattended reboot (for high availability, with or without clustering) and securing the long term key material from unauthorised access.

Even when using a Hardware Security Module (HSM) such as the CA-6000 card to store the RSA private keys, there is still a requirement for an "initial" key; in the case of devices like the CA-6000 this is the PKCS#11 token PIN (passphrase) that has to be provided to get access to the sensitive and non extractable keys stored in it.

So where do we get this "initial" key from? We can't prompt for it, because we want unattended restart of the services (or even the whole machine).

This is where the compromises come in: it isn't unusual to store this PIN/passphrase in a root owned and access protected (usually 0600) file in the root file system. However, that then means that the service must run as the same uid as the PIN file, or run with the privilege (say file_dac_read) necessary to read it, which might not always be appropriate.

It is possible to use the Solaris Service Management Framework (SMF) as storage for these initial keys, providing access only to authorised users for updating them and to authorised services for reading them. This allows us to easily implement a separation of duty between the user id the service runs as and the administrators that can manage the initial key, without requiring those administrators to be root, or to run any program with privilege, or to run as the same user the service runs as.

This is possible because the SMF repository is backed by an sqlite database that is stored in the file system, owned and readable/writable only by root. All access to its contents is mediated by the svc.configd daemon. Normally all properties of a given service are readable by any user (because they aren't really sensitive) but writable only by users with the specified value_authorization (which can be specified down to the granularity of an individual property group). SMF also provides the ability to require a property specific read_authorization in order to be able to view the value of a property.

We can use this functionality of SMF as an alternative to the PIN/passphrase in a file model. The risks are similar, in that the PIN/passphrase is still stored in the root file system, but it is now stored in the root owned SMF database while allowing very fine grained access from unprivileged processes (no need to have the file_dac_read privilege or run with uid root), so there is an additional little bit of protection via least privilege.
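The access model can be sketched abstractly: every read or write of a protected property goes through a mediator that checks the caller's authorisations against the property group's read_authorization and value_authorization values. A toy Python model of that check (this illustrates the concept only; it is nothing like how svc.configd is actually implemented):

```python
class PropertyGroup:
    """Toy model of an SMF property group with per-group authorisations."""

    def __init__(self, values, read_auth=None, value_auth=None):
        self._values = values          # property name -> value
        self.read_auth = read_auth     # authorisation needed to read values
        self.value_auth = value_auth   # authorisation needed to change values

    def read(self, name, caller_auths):
        if self.read_auth and self.read_auth not in caller_auths:
            raise PermissionError("read denied")
        return self._values[name]

    def write(self, name, value, caller_auths):
        if self.value_auth and self.value_auth not in caller_auths:
            raise PermissionError("write denied")
        self._values[name] = value

# The service user holds only the read authorisation; an admin role
# holds the write authorisation. Neither needs root or any privilege.
config = PropertyGroup(
    {"pin": "1234"},
    read_auth="solaris.smf.value.myserv.readpin",
    value_auth="solaris.smf.value.myserv.writepin",
)
```

The key point is that the separation of duty lives in the authorisation names, not in file permissions, so the service uid and the admin uid never need to coincide.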

For example, the following SMF manifest fragment shows a property group called 'config' that requires different authorisations for reading and updating the property 'pin':

        <property_group name='config' type='application'>
                <propval name='pin' type='astring'
                        value='1234' />
                <propval name='read_authorization' type='astring'
                        value='solaris.smf.value.myserv.readpin' />
                <propval name='value_authorization' type='astring'
                        value='solaris.smf.value.myserv.writepin' />
        </property_group>

If this were part of a service that ran as the user 'myserv', then we would need to assign the solaris.smf.value.myserv.readpin authorisation to that user (using useradd(1M)); any user that doesn't have that authorisation (or a higher level one, eg solaris.smf.value) will get permission denied when attempting to read the property. That allows the processes of the service to read, but not change, the stored PIN without requiring any privileges(5) to do so, and the pin value remains in a root 0600 file (the SMF repository). To change the PIN value we may have an admin account (or role) that runs as a different user and has the solaris.smf.value.myserv.writepin authorisation. The following shell script fragment shows how to get the value of the property in a method script:

svcprop -p config/pin $SMF_FMRI

What the service does with the pin or initial key is up to the service developer, but it could, for example, be the value passed to C_Login() to get access to sensitive keys stored in a CA-6000 card, or even those stored encrypted on disk with the pkcs11_softtoken(5) keystore.

The SMF repository database is not currently encrypted on disk, so a user that can read it can still run strings over it or use sqlite to read the raw database. If we were to encrypt the SMF repository on disk (using an sqlite extension, since it isn't supported natively) we would of course again have an "initial key" problem. So where do we get that key from? One possible place is something like the TPM (Trusted Platform Module); however, accessing the TPM currently requires the tcsd service to be running, so we have a start up ordering problem and couldn't currently attempt that. Also, we can't encrypt the root file system with ZFS (the SMF repository lives in /etc/svc/repository.db) due to similar "initial key" issues.

There are alternative ways to address the risks, for example using a network accessible keystore that hosts connect to in order to get their long term keys (the assumption here is that the server can adequately authenticate the client - but where does the client securely store that initial credential?). Solaris provides the pkcs11_kms(5) module for getting AES keys from the Oracle Key Manager product, but using that requires a C_Login() with a PIN, since the client authentication credentials used to authenticate the client to the Oracle Key Manager server instance are stored encrypted on the client.

So have we really "solved" the problem? Not really, but we have provided an alternative method of locally storing the initial key/pin/passphrase that, for some services under the control of SMF, may be more appropriate than "initial key in a file".


Tuesday Nov 16, 2010

Choosing a value for the ZFS encryption property

The 'on' value for the ZFS encryption property maps to 'aes-128-ccm', because it is the fastest of the 6 available encryption modes currently provided and is believed to provide sufficient security for many deployments. Depending on the filesystem/zvol workload, you may not be able to notice (or care if you do notice) the difference between the AES key lengths and modes. However, note that at this time I believe the collective wisdom in the cryptography community is to recommend AES128 over AES256. [Note that this is not a statement of Oracle's endorsement or verification of that research.]

Both CCM and GCM are provided so that if one turns out to have flaws (modes of an encryption algorithm sometimes do have flaws independent of the base algorithm), the other will hopefully still be available for safe use.

On systems without hardware/CPU support for Galois field multiplication (unlike, for example, Intel Westmere or SPARC T3, which have it), GCM will be slower, because the Galois field multiplication has to happen in software without any hardware/CPU assist. However, depending on your workload, you might not even notice the difference between CCM and GCM.

One reason you may want to select aes-128-gcm rather than aes-128-ccm (for example with 'zfs create -o encryption=aes-128-gcm tank/project') is that GCM is one of the AES modes in NSA Suite B, but CCM is not.

ZFS encryption was designed and implemented to be extensible to new algorithm/mode combinations for data encryption and key wrapping.

Are there symmetric algorithms, for data encryption, other than AES that are of interest?

The wrapping key algorithm currently matches the data encryption key algorithm. Is there interest in providing different wrapping key algorithms, and configuration properties for selecting which one? For example, doing key wrapping with an RSA keypair/certificate?

[Note this is not a commitment from Oracle to implementing/providing any suggested additions in any release of any product but if there are others of interest we would like to know so they can be considered.]


Darren Moffat-Oracle

