Tuesday Apr 29, 2014

Using /etc/system.d rather than /etc/system to package your Solaris kernel config

The request for an easy way to package Solaris kernel configuration (/etc/system basically) came up both via the Solaris Customer Advisory Board meetings and requests from customers with early access to Solaris 11.2 via the Platinum Customer Program.  I also had another fix for the Solaris Cryptographic Framework that I needed to implement to stop cryptoadm(1M) from writing to /etc/system (some of the background to what that is needed is in my recent blog post about FIPS 140-2).

So /etc/system.d was born.  My initial plan for the implementation was to read the "fragment" files directly from the kernel. However that is very complex to do at the time we need to read these; since it happens (in kernel boot time scales) eons before we have the root file system mounted. We can however read from a well known file name that is in the boot archive.

The way I ended up implementing this is that during boot archive creation (either manually running 'bootadm update-archive' or as a result of BE or packaging operations or just a system reboot) we assemble together the content of /etc/system.d into a single well known /etc/system.d/.self-assembly (but considered a Private interface) file.  We read the files in /etc/system.d/ in C locale collation order and ignore all files that start with a "." character, this ensures that the assembly is predictable and consistent across all systems.

I then had too choose wither /etc/system.d or /etc/system "wins" if a variable happens to get set in both.  The decision was that /etc/system is read second and thus wins, this preserves existing behaviours. 

I also enhanced the diagnostic output from when the system file parser detects duplication so that we could indicate which file it was that caused the issue. When bootadm creates the .self-assembly file it includes START/END comment markers so that you will be able to easily determine which file from /etc/system.d delivered a given setting.

So now you can much more easily deliver any Solaris kernel customisations you need by using IPS to deliver fragments (one line or  many) into /etc/system.d/ instead of attempting to modify /etc/system via first boot SMF services or other scripting.  This also means they apply on first boot of the image after install as well. 

So how do I pick which file name in /etc/system.d/ to use so that it doesn't clash with other people ? The recommendation (which will be documented in the man pages and in /etc/system itself) is to use the full name of the IPS package (with '/' replaced by ':' ) as the prefix or name of any files you deliver to /etc/system.

As part of the same change I updated cryptoadm(1M) and dtrace(1M) to no longer write to /etc/system but instead write to files in /etc/system.d/ and I followed my own advice on file naming!

Information on how to get the Solaris 11.2 Beta is available from this OTN page.

Note that this particular change came in after the Solaris 11.2 Beta build was closed so you won't see this in Solaris 11.2 Beta (which is build 37).

Solaris 11.2 Compliance Framework

During the Solaris 11 launch (November 2011) one of the questions I was asked from the audience was from a retail customer asking for documentation on how to configure Solaris to pass a PCI-DSS audit.  At that time we didn't have anything beyond saying that Solaris was secure by default and it was no longer necessary to run the Solaris Security Toolkit to get there.  Since then we have produced a PCI-DSS white paper with Coalfire (a PCI-DSS QSA) and we have invested a significant amount of work in building a new Compliance Framework and making compliance a "lifestyle" feature in Solaris core development.

We delievered OpenSCAP in Solaris 11.1 since SCAP is the foundation language of how we will provide compliance reporting. So I'm please to be able to finally talk about the first really signficant part of the Solaris compliance infrastruture which is part of Solaris 11.2.

Starting with Solaris 11.2 we have a new command compliance(1M) for running system assements against security/compliance benchmarks and for generating html reports from those.  For now this only works on a single host but the team hard at work adding multi-node support (using the Solaris RAD infrastructure) for a future release.

The much more signficant part of what the compliance team has been working on is "content".  A framework without any content is just a new "box of bits, lots of assembly required" and that doesn't meet the needs of busy Solaris administrators.  So starting with Solaris 11.2 we are delivering our interpretation of important security/compliance standards such as PCI-DSS.  We have also provided two Oracle authored policies: 'Solaris Baseline' and 'Solaris Recommended', a freshly installed system should be getting all passes on the Baseline benchmark.  The checks in the Recommended benchmark are those that are a little more controversial and/or take longer to run.

Lets dive in and generate an assesment and report from one of the Solaris 11.2 compliance benchmarks we provide:

# pkg install security/compliance 
# compliance assess
# compliance report

That will give us an html report that we can then view.  Since we didn't give any compliance benchmark name it defaults to 'Solaris Baseline', so now lets install and run the PCI-DSS benchmark. The 'security/compliance' package has a group dependency for 'security/compliance/benchmark/pci-dss' so it will be installed already but if you don't want it you can remove that benchmark and keep the others and the infrastructure.

# compliance assess -b pci-dss
Assessment will be named 'pci-dss.Solaris_PCI-DSS.2014-04-14,16:39'
# compliance report -a pci-dss.Solaris_PCI-DSS.2014-04-14,16:39

If we want the report to only show those tests that failed we can do that like this:

# compliance report -s fail -a pci-dss.Solaris_PCI-DSS.2014-04-14,16:39

We understand that many of your Solaris systems won't match up exactly to the benchmarks we have provided and as a result we have delivered the content in a way that you can customise it. Over time the ability to build custom benchmarks from the checks we provide will be come part of the compliance(1M) command but for now you can enable/disable checks by editing a copy of the XML files. Yes I know many of you don't like XML but this time it isn't too scary for just this part, crafting a whole check from scratch is hard though but that is the SCAP/XCCDF/OVAL language for you!.

So for now here is the harder than it should be way to customise one of the delivered benchmarks, using the PCI-DSS benchmark as an example:

# cd /usr/lib/compliance/benchmarks
# mkdir example
# cd example
# cp ../pci-dss/pci-dss-xccdf.xml example-xccdf.xml
# ln -s ../../tests
# ln -s example-xccdf.xml xccdf.xml

# vi example-xccdf.xml

In your editor you are looking for lines that look like this to enable or disable a given test:

<select idref="Test_1.2" selected="true" />

You probably also want to update these lines to indicate that it is your benchmark rather than the original we delivered.

<status date="2013-12-12">draft</status>
<title>Payment Card Industry Data Security Standard</title>

Once you have made the changes you want exit from your editor and run 'compliance list' and you should see your example benchmark listed, you can run run assesments and generate reports from that one just as above.  It is important you do this by making a copy of the xccdf.xml file otherwise the 'pkg verify' test is always going to fail and more importantly your changes would be lost on package update.

Note that we have plans to re-number these tests and provide a peristent unique identifier and namespace for each of the tests we deliver, it just didn't make the cut off for Solaris 11.2 release.  So for now note that the test numbering will very likely change in a future release.

I would really value feedback on the framework itself and probably even more importantly the actual compliance checks that our Solaris Baseline, Solaris Recommended, and PCI-DSS security benchmarks include.

Information on how to get the Solaris 11.2 Beta is available from this OTN page.

Wednesday Apr 16, 2014

Is FIPS 140-2 Actively harmful to software?

Solaris 11 recently completed a FIPS 140-2 validation for the kernel and userspace cryptographic frameworks.  This was  a huge amount of work for the teams and it is something I had been pushing for since before we wrote a single line of code for the cryptographic framework back in 2000 during its initial design for Solaris 10.

So you would imaging I'm happy right ?  Well not exactly, I'm glad I won't have to keep answering questions from customers as to why we don't have a FIPS 140-2 validation but I'm not happy with the process or what it has done to our code base.

FIPS 140-2 is an old standard that doesn't deal well with modern systems and especially doesn't fit nicely with software implementations.  It is very focused on standalone hardware devices, and plugin hardware security modules or similar physical devices.  My colleague Josh over in Oracle Seceval  has already posted a great article on why we only managed to get FIPS 140-2 @ level 1 instead of level 2.  So I'm not going to cover that but instead talk about some of the technical code changes we had to make inorder to "pass" our validation of FIPS 140-2.

There are two main parts to completing a FIPS 140-2 validation: the first part is CAVP  (Cryptographic Algorithm Validation Program) this is about proving your implementation of a given algorithm is correct using NIST assigned test vectors.  This part went relatively quickly and easily and has the potential to find bugs in crypto algorithms that otherwise appear to be working correctly.  The second part is CMVP (Cryptographic Module Validation Program), this part looks at the security model of the whole "FIPS 140-2 module", in our case we had a separate validation for kernel crypto framework and userspace crypto framework.

CMVP requires we draw boundary around the delivered software components that make up the FIPS 140-2 validation boundary - so files in the file system.  Ideally you want to keep this as small as possible so that non crypto relevant libraries and tools are not part of the FIPS 140-2 boundary. We certainly made some mistakes drawing our boundary in userspace since it was a little larger than it needed to be.  We ended up with some "utility" libraries inside the boundary, so good software engineering practice of factoring out code actually made our FIPS 140-2 boundary bigger.

Why does the FIPS 140-2 boundary matter ?  Well unlike in Common Criteria with flaw remediation in the FIPS 140-2 validation world you can't make any changes to the compiled binaries that make up the boundary without potentially invalidating the existing valiation. Which means having to go through some or all of the process again and importantly this cost real money and a significant amount of elapsed time. 

It isn't even possible to fix "obvious" bugs such as memory leaks, or even things that might lead to vulnerabilties without at least engaging with a validation lab.  This is bad for over all system security, after all isn't FIPS 140-2 supposed to be a security standard ?  I can see, with a bit of squinting, how this can maybe make some sense in a hardware module world but it doesn't make any sense for software.

We also had to add POST (Power On Self Test) code that runs known answer tests for all the FIPS 140-2 approved algorithms that are implemented inside the boundary at "startup time" and before any consumer outside of the framework can use the crypto interfaces. 

For our Kernerl framework we implemented this using the module init hooks and also leveraged the fact that the kcf module itself starts very early in boot (long before we even mount the root file system from inside the kernel).  Since kernel modules are generally only unloaded to be updated the impact of having to do this self test on every startup isn't a big deal.

However in userspace we were forced because of "Implementation Guidance", I'll get back to this later on why it isn't guidance, to do this on every process that directly or indirectly causes the cryptographic framework libaries to be loaded.  This is really bad and is counter to sensible software engineering practice. On general purpose modern operating systems (well anything from the last 15+ years really) like Solaris share library pages are mapped shared so the same readonly pages of code are being used by all the processes that start up.  So this just wastes CPU resources and causes performance problems for short lived processes.  We measured the impact this had on Solaris boot time and it was if I'm remebering correctly about a 9% increase in the time it takes to boot to multi-user. 

I've actually spoken with NIST about the "always on POST" and we tried hard to come up with an alternative solution but so far we can't seem to agree on a method that would allow this to be done just once at system boot and only do it again if the on disk binaries are acutally changed (which we can easily detect!).

Now lets combine these last two things, we had to add code that runs every time our libraries load and we can't make changes to the delivered binaries without possibly causing our validation to become invalid.  Solaris actually had a bug in some of the new FIPS 140-2 POST code in userspace that had a risk of a file descriptor leak (it wasn't something that was an exploitable security vulnerability and it was only one single fd) but we couldn't have changed that without revising which binaries were part of the FIPS 140-2 validation.  This is bad for customers that are forced by their governments or other security standards to run with FIPS 140-2 validated crypto modules, because sometimes they might have to miss out on critical fixes.

I promissed I'd get back to "Implementation Guidance", this is really aroundable way of updating the standard with new interpretations that often look to developers like whole new requirements (that we were supposed to magically know about) without the standard being revised.  While the approved validation labs to get pre-review of these new or updated IGs the impact for vendors is huge.   A module that passes FIPS 140-2 (which is a specific revision, the current one as of this time, of the standard) today might not pass FIPS 140-2 in the future - even if nothing was changed. 

In fact we are in potentially in this situation with Solaris 11.  We have completed and passed a FIPS 140-2 validation but due to changes in the Implementation Guidance we aren't sure we would be able to submit the identical code again and pass. So we may have to make changes just to pass FIPS 140-2 new or updated IGs that has no functional beneift to our customers. 

This has serious implications for software implementations of cryptographic modules.  I can understand that if we change any of the core crypto algorithm code we should re run the CAVP test vectors again - and in fact we do that internally using our test suite for all changes to the crypto framework anyway (our test suite is actually much more comprehensive than what FIPS 140 required), but not being able to make simple bug fixes or changes to non algorithm code is not good for software quality.

So what we do we do in Solaris ?  We make the bug fixes and and new non FIPS 140-2 relevant algorithms (such as Camellia) anyway because most of our customers don't care about FIPS 140-2 and even many of those that do they only care to "tick the box" that the vendor has completed the validation.

In Solaris the kernel and userland cryptographic frameworks always contain the FIPS 140-2 required code but it is only enabled if you run 'cryptoadm enable fips-140' .  This turns on the FIPS 140-2 POST checking and a few other runtime checks.

So should I run Solaris 11 with FIPS 140-2 mode enabled ?

My personal opinion is that unless you have a very hard requirement to do so I wouldn't - it is the same crypto algorithm and key management code you are running anyway but you won't have the pointless POST code running that will hurt the start up time of short lived processes. Now having said that my day to day Solaris workstation (which runs the latest bi weekly builds of the Solaris 12 development train) does actually run in FIPS 140-2 mode so that I can help detect any possible issues in the FIPS 140-2 mode of operating long before a code release gets to customers.  We also run our test suites with it enabled and disabled.

I really hope that when a revision to FIPS 140 finally does come around (it is already many years behind schedule) it will deal better with software implementations. When FIPS 140-3 was first in public review I sent on a lot of comments to it for that area.   I really hope that the FIPS 140 program can adopt a sensible approach to allowing vendors to provide bugfixes without having to redo validations - in particular it should not cost the vendor any time or money beyond what they normally do themselves.

In the mean time the Solaris Cryptographic Framework team are hard at work; fixing bugs, improving performance adding features new algorithms and (grudgingly) adding what we think will allow us to pass a future FIPS 140 validation based on the currently known IGs.

-- Darren




« April 2014 »