Tuesday Jul 07, 2015

Solaris new system calls: getentropy(2) and getrandom(2)

The traditional UNIX/Linux method of getting random numbers (bit streams really) from the kernel to a user space process was to open(2) the /dev/random or /dev/urandom pseudo devices and read(2) an appropriate amount from it, remembering to close(2) the file descriptor if your application or library cached it.  Solaris 11.3 adds two new system calls, getrandom(2) and getentropy(2), for getting random bit streams or raw entropy. These are compatible with APIs recently introduced in OpenBSD and Linux.

OpenBSD introduced a getentropy(2) system call that reads a maximum of 256 bytes from the kernel entropy pool and returns it to user space for use in user space random number generators (such as in the OpenSSL library). The getentropy(2) system call returns 0 on success and always returns the amount of data requested or fails completely, setting errno.  It is an error to request more than 256 bytes of entropy and doing so causes errno to be set to EIO. 

On Solaris the output of getentropy(2) is entropy and should not be used where randomness is needed, in particular it must not be used where an IV or nonce is needed when calling a cryptographic operation.  It is intended only for seeding a user space RBG (Random Bit Generator) system. More specifically the data returned by getentropy(2) has not had the required FIPS 140-2 processing for the DRBG applied to it.

Recent Linux kernels have a getrandom(2) system call that reads between 1 and 1024 bytes of randomness from the kernel.  Unlike getentropy(2) it is intended to be used directly for cryptographic use cases such as an IV or nonce.  The getrandom(2) call can be told whether to use the kernel pool usually used for /dev/random or the one for /dev/urandom by using the GRND_RANDOM flag to request the former. If GRND_RANDOM is specified then getrandom(2) will block until sufficient randomness can be generated if the pool is low, if non blocking behaviour is specified then the GRND_NONBLOCK flag can be passed. 

#include <sys/random.h>
int getrandom(void *buf, size_t buflen, unsigned int flags);
int getentropy(void *buf, size_t buflen);

On Solaris if GRND_RANDOM is not specified then getrandom(2) is always a non blocking call. Note that this differs slightly from Linux but not in a way that impacts its usage.  The other difference is that on Solaris getrandom(2) will either fail completely or will return a buffer filled with the requested size, where as the Linux implementation can return partial buffers.  In order to ensure code portability developers must check the return value of getrandom(2) every time it is called, for example:

#include <sys/random.h>
#include <stdlib.h>
size_t bufsz = 128;
char *buf;
int ret;
buf = malloc(bufsz);
errno = 0;
ret = getrandom(buf, bufsz, GRND_RANDOM);
if (ret < 0 || ret != bufsz) {
    perror("getrandom failed");

The output of getrandom(2) on Solaris has been through a FIPS 140-2 approved DRBG function as defined in NIST SP-900-90A.

In addition to the above to system calls the OpenBSD arc4random(3C), arc4random_buf(3C) and arc4random_uniform(3C) functions are also now provided from libc, these are available by including <stdlib.h>.

OpenSSH in Solaris 11.3

Solaris 9 was the first release where we included an implementation of the IETF SSH client and server protocols, I led that project and at the time I was also the document editor for the IETF standards documents. We started with OpenSSH but for various reasons it ended up over time being a Solaris specific fork called SunSSH. We have regularly resynced for features and bug fixes with OpenSSH, but SunSSH remains a fork.

Starting with Solaris 11.3 we supply OpenSSH in addition to SunSSH.  The intent is that in some future release SunSSH will be removed leaving only OpenSSH. 

For Solaris 11.3 both OpenSSH and SunSSH can be installed on a machine at the same time or the administrator can choose to install only one.  SunSSH is delivered by pkg:/network/ssh and OpenSSH by pkg:/network/openssh.

Both packages effectively deliver the same svc:/network/ssh:default, really it comes from pkg:/network/ssh-common. When both OpenSSH and SunSSH are installed an IPS package mediator is used to select which one is run by the SMF service and which one is /usr/bin/ssh.  A system with OpenSSH set as the default would look like this:

$ pkg mediator ssh
ssh          system            local      openssh

Our intent is that the OpenSSH delivered in Solaris has as few Solaris-specific changes applied as possible. We have managed to push some bug fixes upstream to the OpenSSH community but there are still some Solaris-specific changes for enhancements we felt were important to customers migrating from SunSSH. These Solaris-specific changes are applied to OpenSSH during the build process using patch(1) and are thus maintained in a directory called patches Some of the patches are purely build related and some are features. The current list of feature patches include:

  • GSS credential storage
  • PAM Service Name per SSH userauth method as per SunSSH
  • PAM can not be disabled with the UsePAM option
  • DisableBanner option for ssh client: see ssh_config(4)/li>

While the intent going forward is to keep up with the OpenSSH releases we may choose to backport a fix from a later version of OpenSSH to fix a bug or security vulnerability rather than delivering the whole release.  Often the reasons for doing this will be releated to the available Solaris release train at the time and the size of change in the later release. This means that the version of OpenSSH could change in a Solaris SRU.

The OpenSSH releases also include some very useful features that we hadn't ported over to SunSSH. The pkg mediator allows for selecting which binaries are the system default but it doesn't help with the per user configuration file. In order to use features from OpenSSH when the users home directory may also be used by a SunSSH client use of the options to ignore unknown options is needed.  This is a feature that originated in SunSSH but when the equivalent feature arrived in OpenSSH the option name was different.  To over come this I have the following in my ~/.ssh/config file:

IgnoreUnknown IgnoreIfUnknown
IgnoreIfUnknown IgnoreUnknown,ControlMaster,ControlPersist,ControlPath

That allows me to have Host blocks that use ControlMaster configuration that OpenSSH knows about but SunSSH doesn't and ensures that neither of them complains about unknown options. Some of the other important differences between SunSSH and OpenSSH are:

UseOpenSSLEngine No replacement, OpenSSL defaults to using Hardware acceleration on modern SPARC and Intel CPUs
MaxAuthTriesLog OpenSSH always logs at MaxAuthTries / 2
PreUserAuthHook Use AuthorizedKeysCommand instead.
IgnoreIfUknown IgnoreUnknown
Message Localisation Client and Server messages are no longer localised.
Use of /etc/default/login No replacement, set policy using PAM and sshd_config.

Customising Solaris Compliance Policies

When we introduced the compliance framework in Solaris 11.2 there was no easy way to customise (tailor) the policies to suit individual machine or site deployment needs. While it was certainly possible for users familiar with the XCCDF/OVAL policy language it wasn't easy to do in away that preserved your customisations while still allowing access to new and policy rules when the system was updated.

To address this a new subcommand for compliance(1M) has been added that allows creation of a tailoring.  The initial release of tailoring in Solaris 11.3 allows the enabling and disabling of individual checks, and the team is already working on enhancing it to support variables in a future release.

The default and simplest way of using 'compliance tailor' is use the interactive pick tool:

# compliance tailor -t mysite
*** compliance tailor: No existing tailoring 'mysite', initializing
tailoring:mysite> pick

The above shows the interactive mode where using 'x' or 'space' allows us to enable or disable an individual test.  Note also that since the Solaris 11.2 release all the tests have been renumbered and now have unique rule identifiers that are stable across releases of Solaris.  The same rule number always refers to the same test in all of the security benchmark policy files delivered with Solaris.

When exiting from the interactive pick mode just type 'commit' to write this out to a locally installed tailoring; that will create an XCCDF tailoring file under /var/share/compliance/tailorings.  Those tailoring files should not be copied from release to release.

There is also an 'export' action for the tailoring subcommand that allows you to save off your customisations for importing into a different system, this works similarly to zonecfg(1M) export.

$ compliance tailor -t mysite export | tee /tmp/mysite.out
set tailoring=mysite
# version=2015-06-29T14:16:34.000+00:00
set benchmark=solaris
set profile=Baseline
# OSC-16005: All local filesystems are ZFS
exclude OSC-16005
# OSC-15000: Find and list files with extended attributes
include OSC-15000
# OSC-35000: /etc/motd and /etc/issue contain appropriate policy text
include OSC-35000

The saved command file can then be used for input redirection to create the same tailoring on another system.

To run an assessment of the system using a tailoring we simply need to do this:

# compliance assess -t mysite
Assessment will be named 'mysite.2015-06-29,15:22'
Title   Package integrity is verified
Rule    OSC-54005
Result  PASS



Thursday Nov 27, 2014

CVE metadata in Solaris IPS packages for improved Compliance reporting

I'm pleased to announce that the Solaris 11 /support repository now contains metadata for tracking security vulnerability fixes by the assigned CVE number. This was a project that Pete Dennis and I have been working on for some time now.  While we worked together on the design of the metadata and how it should be presented I'm completely indebted to Pete for doing the automation of the package generation from the data in our bug databases - we don't want humans to generate this by hand because it is error prone.

What does this mean for you the Solaris system admin or a compliance auditor  ?

The intent of this is to make it much easier to determine if your system has all the known and required security vulnerability fixes.  This should stop you from attempting to derive this information based on other sources. This is critically important since sometimes in Solaris we choose to fix a bug in an upstream FOSS component by applying the code patch rather than consuming a whole new version - this is a risk/benefit trade off that we take on a case by case basis.  While all this data has been available for sometime in My Oracle Support documents it wasn't easily available in a scriptable form - since it was intended to be read by humans.

The implemenation of this is designed to be applied to any other Oracle, or 3rd party, products that are also delivered in IPS form. 

For Solaris we have added a new metadata package called:  pkg:/support/critical-patch-update/solaris-11-cpu. This new package lives above entire in the dependency hiearchy and contains only metadata that maps the CVE-IDs to the package version that contains the fix.  This allows us to retrospectively update the critical-patch-update metadata where an already shipping version contains the fix for a given CVE.

Each time we publish a new critical patch update a new version of the package is published to the /support repository along with all the new versions of the packages that contain the fixes.  The versioning for this package is @YYYY.MM.VV where VV is usually going to be '1' but this allows us to respin/republish a given critical patch update with in the same month, note the VV is not DD it is NOT the day in the month.

We can search this data using either the web interface on https://pkg.oracle.com/solaris/support or using the CLI.  So lets look at some examples of the CLI searching.  Say we want to find out which packages contain the fix for the bash(1) Shellshock vulnerability and we know that one of the CVE-IDs for that is CVE-2014-7187:

# pkg search :CVE-2014-7187:
INDEX         ACTION VALUE                                                PACKAGE
CVE-2014-7187 set    pkg://solaris/shell/bash@4.1.11,5.11- pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1
CVE-2014-7187 set    pkg://solaris/shell/bash@4.1.11,5.11- pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1

That output tells us which packages and versions contain a fix and which critical patch update it was provided in.

For another example lets find out the whole list of fixes in the October 2014 critical patch update:

pkg search -r info.cve:|grep 2014.10
info.cve   set    CVE-1999-0103 pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1
info.cve   set    CVE-2002-2443 pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1
info.cve   set    CVE-2003-0001 pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1

Note that the placement of colon (:) is very signficant in IPS and that the keyname is 'info.cve' and the values have CVE in upper case.

The way that metadata is setup allows for the cases where a given CVE-ID applies to multiple packages and also where a given package version contains fixes for multiple CVE-IDs.

If we simply want to know if the fix for a given CVE-ID is installed the using 'pkg search -l' with the CVE-ID is sufficent eg:

# pkg search -l CVE-2014-7187
info.cve   set    CVE-2014-7187 pkg:/support/critical-patch-update/solaris-11-cpu@2014.10-1

If it wasn't installed then we would have gotten no output.

If you want to have a system track only the Critical Patch Updates and not every SRU available in the /support repository then when you do an update instead of doing 'pkg update' or 'pkg update entire@<latest>' do 'pkg update solaris-11-cpu@latest'.  Note that initially systems won't have the 'solaris-11-cpu' package installed since as I mentioned previously is is "above" the entire package.  When you update to the latest version of the Critical Patch Update package it will install any intermediated SRUs that were released between this version and the prior one you had installed.

For some further examples have a look at MOS Doc ID 1948847.  Currently only the Solaris 11.2 critical patch update metadata is present but we intend to very soon backpublish the data for prior Solaris 11 critical patch updates as well.

Hope this helps with your compliance auditing.  If you are an auditor no more arguments with the sys admin teams about whither a fix is installed. If you are a system admin more time for your "real job" since you can now easily give the auditor the data they need.

The compliance team is also working on updating the Solaris and PCI-DSS security benchmarks we deliver with the compliance(1M) framework to use this new CVE metadata in determining if the system is upto date, since from a compliance view point we really want to know that all known and available security fixes are installed not that you are running the absolutely latest and greatest version of Solaris.

I have also been working with the OVAL community (part of SCAP) to define a schema for the Solaris IPS system.  This is due to be released with a future version of the OVAL standard.  That will allow us to use this same framework of metadata to also deliver OVAL xml files that map the CVE-IDs to packages.  When that is available we may choose to deliver OVAL xml files as part of the critical-patch-update package as an additional method of querying the same metadata. 

-- Darren

Thursday Jul 31, 2014

OpenStack Security integration for Solaris 11.2

As a part-time member/meddeler of the Solaris OpenStack engineering team I was asked to create some posts for the team's new OpenStack blog.

I've so far written up two short articles, one covering using ZFS encryption with Cinder, and one on Immutable OpenStack VMs.

Friday May 23, 2014

Overview of Solaris Zones Security Models

Over the years of explaining the security model of Solaris Zones and LDOMs to customers "security people" I've encountered two basic "schools of thought".  The first is "shared kernel bad" the second is "shared kernel good".

Which camp is right ?  Well both are, because there are advantages to both models. 

If you have a shared kernel there the policy engine has more information about what is going on and can make more informed access and data flow decisions, however if an exploit should happen at the kernel level it has the potential to impact multiple (or all) guests. 

If you have separate kernels then a kernel level exploit should only impact that single guest, except if it then results in a VM breakout.

Solaris non global zones fall into the "shared kernel" style.  Solaris Zones are included in the Solaris 11 Common Criteria Evaluation for the virtualisation extension (VIRT) to the OSPP.  Solaris Zones are also the foundation of our multi-level Trusted Extensions feature and is used for separation of classified data by many government/military deployments around the world.

LDOMs are "separate kernel", but LDOMs are also unlike hypervisors in the x86 world because we can shutdown the underlying control domain OS (assuming the guests have another path to the IO requirements they need, either on their own or from root-domains or io-domains).  So LDOMs can be deployed in a way that they are more protected from a VM breakout being used to cause a reconfiguration of resources.  The Solaris 11 CC evaluation still applies to Solaris instances running in an LDOM regardless of wither they are the control-domain, io-domain, root-domain or guest-domain.

Solaris 11.2 introduces a new brand of zone called "Kernel Zones", they look like native Zones, you configure them like native Zones and they actually are Solaris zones but with the ability to run a separate kernel.  Having their own kernel means that they support suspend/resume independent of the host OS - which gives us warm migration.  In particular "Kernel Zones" are not like popular x86 hypervisors such as VirtualBox, Xen or VMWare.

General purpose hypervisors, like VirtualBox, support multiple different guest operating systems so they have to virtualise the hardware. Some hypervisors (most type 2 but even some type 1) support multiple different host operating systems for providing the services as well. So this means the guest can only assume virtualised hardware.

Solaris Kernel Zones are different, the zones kernel knows it is running on Solaris as the "host" and the Solaris global zone "host" also knows that the kernel zone is Solaris.  This means we get to make more informed decisions about resources, general requirements and security policy. Even with this we can still host Solaris 11.2 kernel zones on the internal developement release of Solaris 12 and vice versa.

Note that what follows is an out line of implementation details that are subject to change at any time: The kernel of a Solaris Kernel Zone is represented as a user land process in a Solaris non global zone.  That non global zone is configured with less privilege than a normal non global zone would have and it is always configured as an immutable zone.  So if there happened to be an exploit of the guest kernel that resulted in a VM break out you would end up in an immutable non global zone with lowered privilege.

This means that we can have advantages of "shared kernel" and "separate kernel" security models with Solaris Kernel Zones and we have the management simplicity of traditional Solaris Zones (# zonecfg -z mykz 'create -t SYSsolaris-kz' && zoneadm -z mykz install && zoneadm -z mykz boot)

If you want even more layers of protection on SPARC it is possible to host Kernel Zones inside a guest LDOM.

Tuesday Apr 29, 2014

Using /etc/system.d rather than /etc/system to package your Solaris kernel config

The request for an easy way to package Solaris kernel configuration (/etc/system basically) came up both via the Solaris Customer Advisory Board meetings and requests from customers with early access to Solaris 11.2 via the Platinum Customer Program.  I also had another fix for the Solaris Cryptographic Framework that I needed to implement to stop cryptoadm(1M) from writing to /etc/system (some of the background to what that is needed is in my recent blog post about FIPS 140-2).

So /etc/system.d was born.  My initial plan for the implementation was to read the "fragment" files directly from the kernel. However that is very complex to do at the time we need to read these; since it happens (in kernel boot time scales) eons before we have the root file system mounted. We can however read from a well known file name that is in the boot archive.

The way I ended up implementing this is that during boot archive creation (either manually running 'bootadm update-archive' or as a result of BE or packaging operations or just a system reboot) we assemble together the content of /etc/system.d into a single well known /etc/system.d/.self-assembly (but considered a Private interface) file.  We read the files in /etc/system.d/ in C locale collation order and ignore all files that start with a "." character, this ensures that the assembly is predictable and consistent across all systems.

I then had too choose wither /etc/system.d or /etc/system "wins" if a variable happens to get set in both.  The decision was that /etc/system is read second and thus wins, this preserves existing behaviours. 

I also enhanced the diagnostic output from when the system file parser detects duplication so that we could indicate which file it was that caused the issue. When bootadm creates the .self-assembly file it includes START/END comment markers so that you will be able to easily determine which file from /etc/system.d delivered a given setting.

So now you can much more easily deliver any Solaris kernel customisations you need by using IPS to deliver fragments (one line or  many) into /etc/system.d/ instead of attempting to modify /etc/system via first boot SMF services or other scripting.  This also means they apply on first boot of the image after install as well. 

So how do I pick which file name in /etc/system.d/ to use so that it doesn't clash with other people ? The recommendation (which will be documented in the man pages and in /etc/system itself) is to use the full name of the IPS package (with '/' replaced by ':' ) as the prefix or name of any files you deliver to /etc/system.

As part of the same change I updated cryptoadm(1M) and dtrace(1M) to no longer write to /etc/system but instead write to files in /etc/system.d/ and I followed my own advice on file naming!

Information on how to get the Solaris 11.2 Beta is available from this OTN page.

Note that this particular change came in after the Solaris 11.2 Beta build was closed so you won't see this in Solaris 11.2 Beta (which is build 37).

Solaris 11.2 Compliance Framework

During the Solaris 11 launch (November 2011) one of the questions I was asked from the audience was from a retail customer asking for documentation on how to configure Solaris to pass a PCI-DSS audit.  At that time we didn't have anything beyond saying that Solaris was secure by default and it was no longer necessary to run the Solaris Security Toolkit to get there.  Since then we have produced a PCI-DSS white paper with Coalfire (a PCI-DSS QSA) and we have invested a significant amount of work in building a new Compliance Framework and making compliance a "lifestyle" feature in Solaris core development.

We delievered OpenSCAP in Solaris 11.1 since SCAP is the foundation language of how we will provide compliance reporting. So I'm please to be able to finally talk about the first really signficant part of the Solaris compliance infrastruture which is part of Solaris 11.2.

Starting with Solaris 11.2 we have a new command compliance(1M) for running system assements against security/compliance benchmarks and for generating html reports from those.  For now this only works on a single host but the team hard at work adding multi-node support (using the Solaris RAD infrastructure) for a future release.

The much more signficant part of what the compliance team has been working on is "content".  A framework without any content is just a new "box of bits, lots of assembly required" and that doesn't meet the needs of busy Solaris administrators.  So starting with Solaris 11.2 we are delivering our interpretation of important security/compliance standards such as PCI-DSS.  We have also provided two Oracle authored policies: 'Solaris Baseline' and 'Solaris Recommended', a freshly installed system should be getting all passes on the Baseline benchmark.  The checks in the Recommended benchmark are those that are a little more controversial and/or take longer to run.

Lets dive in and generate an assesment and report from one of the Solaris 11.2 compliance benchmarks we provide:

# pkg install security/compliance 
# compliance assess
# compliance report

That will give us an html report that we can then view.  Since we didn't give any compliance benchmark name it defaults to 'Solaris Baseline', so now lets install and run the PCI-DSS benchmark. The 'security/compliance' package has a group dependency for 'security/compliance/benchmark/pci-dss' so it will be installed already but if you don't want it you can remove that benchmark and keep the others and the infrastructure.

# compliance assess -b pci-dss
Assessment will be named 'pci-dss.Solaris_PCI-DSS.2014-04-14,16:39'
# compliance report -a pci-dss.Solaris_PCI-DSS.2014-04-14,16:39

If we want the report to only show those tests that failed we can do that like this:

# compliance report -s fail -a pci-dss.Solaris_PCI-DSS.2014-04-14,16:39

We understand that many of your Solaris systems won't match up exactly to the benchmarks we have provided and as a result we have delivered the content in a way that you can customise it. Over time the ability to build custom benchmarks from the checks we provide will be come part of the compliance(1M) command but for now you can enable/disable checks by editing a copy of the XML files. Yes I know many of you don't like XML but this time it isn't too scary for just this part, crafting a whole check from scratch is hard though but that is the SCAP/XCCDF/OVAL language for you!.

So for now here is the harder than it should be way to customise one of the delivered benchmarks, using the PCI-DSS benchmark as an example:

# cd /usr/lib/compliance/benchmarks
# mkdir example
# cd example
# cp ../pci-dss/pci-dss-xccdf.xml example-xccdf.xml
# ln -s ../../tests
# ln -s example-xccdf.xml xccdf.xml

# vi example-xccdf.xml

In your editor you are looking for lines that look like this to enable or disable a given test:

<select idref="Test_1.2" selected="true" />

You probably also want to update these lines to indicate that it is your benchmark rather than the original we delivered.

<status date="2013-12-12">draft</status>
<title>Payment Card Industry Data Security Standard</title>

Once you have made the changes you want exit from your editor and run 'compliance list' and you should see your example benchmark listed, you can run run assesments and generate reports from that one just as above.  It is important you do this by making a copy of the xccdf.xml file otherwise the 'pkg verify' test is always going to fail and more importantly your changes would be lost on package update.

Note that we have plans to re-number these tests and provide a peristent unique identifier and namespace for each of the tests we deliver, it just didn't make the cut off for Solaris 11.2 release.  So for now note that the test numbering will very likely change in a future release.

I would really value feedback on the framework itself and probably even more importantly the actual compliance checks that our Solaris Baseline, Solaris Recommended, and PCI-DSS security benchmarks include.

Information on how to get the Solaris 11.2 Beta is available from this OTN page.

Wednesday Apr 16, 2014

Is FIPS 140-2 Actively harmful to software?

Solaris 11 recently completed a FIPS 140-2 validation for the kernel and userspace cryptographic frameworks.  This was  a huge amount of work for the teams and it is something I had been pushing for since before we wrote a single line of code for the cryptographic framework back in 2000 during its initial design for Solaris 10.

So you would imaging I'm happy right ?  Well not exactly, I'm glad I won't have to keep answering questions from customers as to why we don't have a FIPS 140-2 validation but I'm not happy with the process or what it has done to our code base.

FIPS 140-2 is an old standard that doesn't deal well with modern systems and especially doesn't fit nicely with software implementations.  It is very focused on standalone hardware devices, and plugin hardware security modules or similar physical devices.  My colleague Josh over in Oracle Seceval  has already posted a great article on why we only managed to get FIPS 140-2 @ level 1 instead of level 2.  So I'm not going to cover that but instead talk about some of the technical code changes we had to make inorder to "pass" our validation of FIPS 140-2.

There are two main parts to completing a FIPS 140-2 validation: the first part is CAVP  (Cryptographic Algorithm Validation Program) this is about proving your implementation of a given algorithm is correct using NIST assigned test vectors.  This part went relatively quickly and easily and has the potential to find bugs in crypto algorithms that otherwise appear to be working correctly.  The second part is CMVP (Cryptographic Module Validation Program), this part looks at the security model of the whole "FIPS 140-2 module", in our case we had a separate validation for kernel crypto framework and userspace crypto framework.

CMVP requires we draw boundary around the delivered software components that make up the FIPS 140-2 validation boundary - so files in the file system.  Ideally you want to keep this as small as possible so that non crypto relevant libraries and tools are not part of the FIPS 140-2 boundary. We certainly made some mistakes drawing our boundary in userspace since it was a little larger than it needed to be.  We ended up with some "utility" libraries inside the boundary, so good software engineering practice of factoring out code actually made our FIPS 140-2 boundary bigger.

Why does the FIPS 140-2 boundary matter ?  Well unlike in Common Criteria with flaw remediation in the FIPS 140-2 validation world you can't make any changes to the compiled binaries that make up the boundary without potentially invalidating the existing valiation. Which means having to go through some or all of the process again and importantly this cost real money and a significant amount of elapsed time. 

It isn't even possible to fix "obvious" bugs such as memory leaks, or even things that might lead to vulnerabilties without at least engaging with a validation lab.  This is bad for over all system security, after all isn't FIPS 140-2 supposed to be a security standard ?  I can see, with a bit of squinting, how this can maybe make some sense in a hardware module world but it doesn't make any sense for software.

We also had to add POST (Power On Self Test) code that runs known answer tests for all the FIPS 140-2 approved algorithms that are implemented inside the boundary at "startup time" and before any consumer outside of the framework can use the crypto interfaces. 

For our Kernerl framework we implemented this using the module init hooks and also leveraged the fact that the kcf module itself starts very early in boot (long before we even mount the root file system from inside the kernel).  Since kernel modules are generally only unloaded to be updated the impact of having to do this self test on every startup isn't a big deal.

However in userspace we were forced because of "Implementation Guidance", I'll get back to this later on why it isn't guidance, to do this on every process that directly or indirectly causes the cryptographic framework libaries to be loaded.  This is really bad and is counter to sensible software engineering practice. On general purpose modern operating systems (well anything from the last 15+ years really) like Solaris share library pages are mapped shared so the same readonly pages of code are being used by all the processes that start up.  So this just wastes CPU resources and causes performance problems for short lived processes.  We measured the impact this had on Solaris boot time and it was if I'm remebering correctly about a 9% increase in the time it takes to boot to multi-user. 

I've actually spoken with NIST about the "always on POST" and we tried hard to come up with an alternative solution but so far we can't seem to agree on a method that would allow this to be done just once at system boot and only do it again if the on disk binaries are acutally changed (which we can easily detect!).

Now lets combine these last two things, we had to add code that runs every time our libraries load and we can't make changes to the delivered binaries without possibly causing our validation to become invalid.  Solaris actually had a bug in some of the new FIPS 140-2 POST code in userspace that had a risk of a file descriptor leak (it wasn't something that was an exploitable security vulnerability and it was only one single fd) but we couldn't have changed that without revising which binaries were part of the FIPS 140-2 validation.  This is bad for customers that are forced by their governments or other security standards to run with FIPS 140-2 validated crypto modules, because sometimes they might have to miss out on critical fixes.

I promissed I'd get back to "Implementation Guidance", this is really aroundable way of updating the standard with new interpretations that often look to developers like whole new requirements (that we were supposed to magically know about) without the standard being revised.  While the approved validation labs to get pre-review of these new or updated IGs the impact for vendors is huge.   A module that passes FIPS 140-2 (which is a specific revision, the current one as of this time, of the standard) today might not pass FIPS 140-2 in the future - even if nothing was changed. 

In fact we are in potentially in this situation with Solaris 11.  We have completed and passed a FIPS 140-2 validation but due to changes in the Implementation Guidance we aren't sure we would be able to submit the identical code again and pass. So we may have to make changes just to pass FIPS 140-2 new or updated IGs that has no functional beneift to our customers. 

This has serious implications for software implementations of cryptographic modules.  I can understand that if we change any of the core crypto algorithm code we should re run the CAVP test vectors again - and in fact we do that internally using our test suite for all changes to the crypto framework anyway (our test suite is actually much more comprehensive than what FIPS 140 required), but not being able to make simple bug fixes or changes to non algorithm code is not good for software quality.

So what we do we do in Solaris ?  We make the bug fixes and and new non FIPS 140-2 relevant algorithms (such as Camellia) anyway because most of our customers don't care about FIPS 140-2 and even many of those that do they only care to "tick the box" that the vendor has completed the validation.

In Solaris the kernel and userland cryptographic frameworks always contain the FIPS 140-2 required code but it is only enabled if you run 'cryptoadm enable fips-140' .  This turns on the FIPS 140-2 POST checking and a few other runtime checks.

So should I run Solaris 11 with FIPS 140-2 mode enabled ?

My personal opinion is that unless you have a very hard requirement to do so I wouldn't - it is the same crypto algorithm and key management code you are running anyway but you won't have the pointless POST code running that will hurt the start up time of short lived processes. Now having said that my day to day Solaris workstation (which runs the latest bi weekly builds of the Solaris 12 development train) does actually run in FIPS 140-2 mode so that I can help detect any possible issues in the FIPS 140-2 mode of operating long before a code release gets to customers.  We also run our test suites with it enabled and disabled.

I really hope that when a revision to FIPS 140 finally does come around (it is already many years behind schedule) it will deal better with software implementations. When FIPS 140-3 was first in public review I sent on a lot of comments to it for that area.   I really hope that the FIPS 140 program can adopt a sensible approach to allowing vendors to provide bugfixes without having to redo validations - in particular it should not cost the vendor any time or money beyond what they normally do themselves.

In the mean time the Solaris Cryptographic Framework team are hard at work; fixing bugs, improving performance adding features new algorithms and (grudgingly) adding what we think will allow us to pass a future FIPS 140 validation based on the currently known IGs.

-- Darren

Monday Dec 16, 2013

Kernel Cryptographic Framework FIPS 140-2 Validation

After many years of preparation work by the Solaris Crypto Team, and almost a year (just shy about two weeks) since we submitted to NIST our documentation for validation, the Kernel Crypto Framework has its first FIPS 140-2 validation completed.  This applies  Solaris 11.1 SRU 5 when running on SPARC T4, T5 (the M5 and M6 use the same crypto core and we expect to vendor affirm those), M3000 class (Fujitsu SPARC64) and on Intel (with and without AES-NI).

Many thanks to all those at Oracle who have helped make this happen and to our consultants and validation lab.

Monday Oct 21, 2013

Do your filesystems have un-owned files ?

As part of our work for integrated compliance reporting in Solaris we plan to provide a check for determining if the system has "un-owned files", ie those which are owned by a uid that does not exist in our configured nameservice.  Tests such as this already exist in the Solaris CIS Benchmark (9.24 Find Un-owned Files and Directories) and other security benchmarks.

The obvious method of doing this would be using find(1) with the -nouser flag.  However that requires we bring into memory the metadata for every single file and directory in every local file system we have mounted.  That is probaby not an acceptable thing to do on a production system that has a large amount of storage and it is potentially going to take a long time.

Just as I went to bed last night an idea for a much faster way of listing file systems that have un-owned files came to me. I've now implemented it and I'm happy to report it works very well and peforms many orders of magnatude better than using find(1) ever will.   ZFS (since pool version 15) has per user space accounting and quotas.  We can report very quickly and without actually reading any files at all how much space any given user id is using on a ZFS filesystem.  Using that information we can implement a check to very quickly list which filesystems contain un-owned files.

First a few caveats because the output data won't be exactly the same as what you get with find but it answers the same basic question.  This only works for ZFS and it will only tell you which filesystems have files owned by unknown users not the actual files.  If you really want to know what the files are (ie to give them an owner) you still have to run find(1).  However it has the huge advantage that it doesn't use find(1) so it won't be dragging the metadata for every single file and directory on the system into memory. It also has the advantage that it can check filesystems that are not mounted currently (which find(1) can't do).

It ran in about 4 seconds on a system with 300 ZFS datasets from 2 pools totalling about 3.2T of allocated space, and that includes the uid lookups and output.


for fs in $(zfs list -H -o name -t filesystem -r rpool) ; do
        for uid in $(zfs userspace -Hipn -o name,used $fs | cut -f1); do
                if [ -z "$(getent passwd $uid)" ]; then
                        unknowns="$unknowns$uid "
        if [ ! -z "$unknowns" ]; then
                mountpoint=$(zfs list -H -o mountpoint $fs)
                mounted=$(zfs list -H -o mounted $fs)
                echo "ZFS File system $fs mounted ($mounted) on $mountpoint \c"
                echo "has files owned by unknown user ids: $unknowns";
Sample output:

ZFS File system rpool/ROOT/solaris-30/var mounted (no) on /var has files owned by unknown user ids: 6435 33667 101
ZFS File system rpool/ROOT/solaris-32/var mounted (yes) on /var has files owned by unknown user ids: 6435 33667
ZFS File system builds/bob mounted (yes) on /builds/bob has files owned by unknown user ids: 101

Note that the above might not actually appear exactly like that in any future Solaris product or feature, it is provided just as an example of what you can do with ZFS user space accounting to answer questions like the above.

Thursday Sep 12, 2013

Solaris Random Number Generation

The following was originally written to assist some of our new hires in learning about how our random number generators work and also to provide context for some questions that were asked as part of the ongoing (at the time of writing) FIPS 140-2 evaluation of the Solaris 11 Cryptographic Framework.

1. Consumer Interfaces

The Solaris random number generation (RNG) system is used to generate random numbers, which utilizes both hardware and software mechanisms for entropy collection. It has consumer interfaces for applications; it can generate high-quality random numbers suitable for long term
asymmetric keys and pseudo-random numbers for session keys or other cryptographic uses, such as a nonce.

1.1 Interface to user space

The random(7D) device driver provides the /dev/random and /dev/urandom devices to user space, but it doesn't implement any of the random number generation or extraction itself.

There is a single kernel module (random) for implementing both the /dev/random and /dev/urandom devices the two primary entry points are rnd_read() and rnd_write() for servicing read(2) and write(2) system calls respectively.

rnd_read() calls either kcf_rnd_get_bytes() or kcf_rnd_get_pseudo_bytes() depending on wither the device node is an instance of /dev/random or /dev/urandom respectively.  There is a cap on the maximum number of bytes that can be transfered in a single read, MAXRETBYTES_RANDOM (1040) and MAXRETBYTES_URANDOM(128 * 1040) respectively.

rnd_write() uses random_add_entropy() and random_add_pseduo_entropy() they both pass 0 as the estimate of the amount of entropy that came from userspace, so we don't trust userspace to estimate the value of the entropy being provided.  Also only a user with uid root or all privilege can open /dev/random or /dev/urandom for write and thus call rnd_write().

1.2 Interface in kernel space

The kcf module provides an API for randomnes for in kernel KCF consumers. It implements the functions mentioned above that are called to service the read(2)/write(2) calls and also provides the interfaces for kernel consumers to access the random and urandom pools.

If no providers are configured no randomness can be returned and a message logged informing the administrator of the mis-configuration.

2. /dev/random

We periodically collect random bits from providers which are registered with the Kernel Cryptographic Framework (kCF) as capable of random number generation. The random bits are maintained in a cache and it is used for high quality random numbers (/dev/random) requests. If the cache has sufficient random bytes available the request is serviced from the cache.  Otherwise we pick a provider and call its SPI routine.  If we do not get enough random bytes from the provider call we fill in the remainder of the request by continously replenishing the cache and using that until the full requested size is met.

The maximum request size that will be services for a single read(2) system call on /dev/random is 1040 bytes.

2.1 Initialisation

kcf_rnd_init() is where we setup the locks and get everything started, it is called by the _init() routine in the kcf module, which itself is called on very early in system boot - before the root filesystem is mounted and most modules are loaded.

For /dev/random and random_get_bytes() a static array of 1024 bytes is setup by kcf_rnd_init().  

We start by placing the value of gethrtime(), high resolution time since the boot time, and drv_getparam(), the current time of day as the initial seed values into the pool (both of these are 64 bit integers).  We set the number of random bytes available in the pool to 0.

2.2 Adding randomness to the rndpool

The rndc_addbytes() function adds new random bytes to the pool (aka cache). It holds the rndpool_lock mutex while it xor in the bytes to the rndpool.  The starting point is the global rindex variable which is updated as each byte is added.  It also increases the rnbyte_cnt.

If the rndpool becomes full before the passed in number of bytes is all used we continue to add the bytes to the pool/cache but do not increase rndbyte_cnt, it also moves on the global findex to match rindex as it does so.

2.3 Scheduled mixing

The kcf_rnd_schedule_timeout() ensures that we perform mixing of the rndpool.  The timeout is itself randomly generated by reading (but not consuming) the first 32 bits of rndpool to derive a new time out of of between 2 and 5.544480 seconds.  When the timeout expires the KCF rnd_handler() function [ from kcf_random.c ] is called.

If we have readers blocked for entropy or the count of available bytes is less than the pool size we start an asynchronous task to call rngprov_getbyte() gather more entropy from the available providers.

If there is at least the minimum (20 bytes) of entropy available we wake up the threads stopped in a poll(2)/select(2) of /dev/random. If there are any threads waiting on entropy we wake those up too.  The waiting and wake up is performed by cv_wait_sig() and cv_broadcast(), this means that the pool lock will be held when cv_broadcast wakes up a thread it will have the random pool lock held.

Finally it schedules the next time out.

2.4 External caller Seeding

The random_add_entropy() call is able able to provide entropy from an external (to KCF or its providers) source of randomness.  It takes a buffer and a size as well as an estimated amount of entropy in the buffer. There are no callers in Solaris that provide a non 0 value for the estimate
of entropy to random_add_entropy().  The only caller of random_add_entropy() is actually the write(2) entry point for /dev/random.

Seeding is performed by calling the first available software entropy provider plugged into KCF and calling its KCF_SEED_RANDOM entropy function. The term "software" here really means "device driver driven" rather than CPU instruction set driven.  For example the n2rng provider
is device driver driven but the architecture based Intel RDRAND is regardedas "software".  The terminology is for legacy reasons in the early years of the Solaris cryptographic framework.  This does however mean we never attempt to seed the hardware RNG on SPARC S2 or S3 core based systems (T2 through M6 inclusive) but we will attempt to do so on Intel CPUs with RDRAND.

2.5 Extraction for /dev/random

We treat rndpool as a circular buffer with findex and rindex tracking the front and back respectively, both start at position 0 during initial initalisation.

To extract randomness from the pool we use kcf_rnd_get_bytes(); this is a non blocking call and it will return EAGAIN if there is insufficient randomness available (ie rndbyte_cnt is less than the request size) and 0 on success.

It calls rnd_get_bytes() with the rndpool_lock held, the lock will be released by rnd_get_bytes() on both sucess and failure cases.  If the number of bytes requested of rnd_get_bytes() is less than or
equal to the number of available bytes (rnbyte_cnt) then we call rndc_getbytes() immediately, ie we use the randomness from the pool.   Otherwise we release the rndpool_lock and call rngprov_getbytes() with the number of bytes we want, if that still wasn't enough we loop picking up as many bytes as we can by successive calls, if at any time the rnbyte_cnt in the pool is less than 20 bytes we wait on the read condition variable (rndpool_read_cv) and try again when we are woken up.

rngprov_getbytes() finds the first available provider that is plugged into KCF and calls its KCF_OP_RANDOM_GENERATE function.  This function is also used by the KCF timer for scheduled mixing (see later discussion).  It cycles through each available provider until either there are no more available or the requested number of bytes is available.  It returns to the caller the number of bytes it retreived from all of the providers combined.

If no providers were available then rngprov_getbytes() returns an error and logs and error to the system log for the administrator.  A default configuration of Solaris (and the one required by FIPS 140-2 security target) has at least the 'swrand' provider.  A Solaris instance running on SPARC S3 cores (T2 through M6 inclusive) will also have the n2rng provider configured and available, this is true even for Solaris instances in an LDOM.

2.6 KCF Random Providers

KCF has the concept of "hardware" and "software" providers.  The terminology is a legacy one from before hardware support for cryptographic algorithms and random number generation was available as unprivileged CPU instructions.

It really now maps to "hardware" being a provider that as a specific device driver, such as n2rng  and "software" meaning CPU instructions or some other pure software mechanism.  It doesn't mean that there is no "hardware" involved since on Intel CPUs with the RDRAND instruction calls are in the swrand provider but it is regarded as a "software" provider.

2.6.1 swrand: Random Number Provider

All Solaris installs have a KCF random provider called "swrand". This provider periodically collects unpredictable input and processes it into a pool or entropy, it implements its own mixing (distinct from that at the kcf level), extraction and generation algorithms.

It uses a pool called srndpool of 256 bytes and a leftover buffer of 20 bytes.

The swrand provider has two different entropy sources:

1. By reading blocks of physical memory and detecting if changes occurred in the blocks read.

Physical memory is divided into blocks of fixed size.  A block of memory is chosen from the possible blocks and hashed to produce a digest.  This digest is then mixed into the pool.  A single bit from
the digest is used as a parity bit or "checksum" and compared against the previous "checksum" computed for the block.  If the single-bit checksum has not changed, no entropy is credited to the pool.  If there is a change, then the assumption is that at least one bit in the block has changed.  The possible locations within the memory block of where the bit change occurred is used as a measure of entropy. 

For example,

if a block size of 4096 bytes is used, about log_2(4096*8)=15 bits worth of entropy is available.  Because the single-bit checksum will miss half of the changes, the amount of entropy credited to  the pool is doubled when a change is detected.  With a 4096 byte block size, a block change will add a total of 30 bits of entropy to the pool.

2. By measuring the time it takes to load and hash a block of memory and computing the differences in the measured time.

This method measures the amount of time it takes to read and hash a physical memory block (as described above).  The time measured can vary depending on system load, scheduling and other factors.  Differences between consecutive measurements are computed to come up with an entropy estimate.  The first, second, and third order delta is calculated to determine the minimum delta value.  The number of bits present in this minimum delta value is the entropy estimate.

3. Additionally on x86 systems that have the RDRAND instruction we take entropy from there but assume only 10% entropic density from it.  If the rdrand instruction is not available or the call to use it fails (CF=0) then the above two entropy sources are used. Initalisation of swrand

Since physical memory can change size swrand registers with the Solaris DR subsystem so that it  can update its cache of the number of blocks of physical memory when it either grows or shrinks.

On initial attach the fips_rng_post() function is run.

During initalisation the swrand provider adds entropy from the high resolution time since boot and the current time of day (note that due to the module load system and how KCF providers register these values will always be different from the values that the KCF rndpool is initalised with). It also adds in the initial state of physical memory, the number of blocks and sources described above.

The first 20 bytes from this process are used as the XKEY and are also saved as the initial value of previous_bytes for use with the FIPS 186-2 continuous test.

Only after all of the above does the swrand provider register with the cryptographic framework for both random number generation and seeding of the swrand generator. swrand entropy generation

The swrand_get_entropy() is where all the real work happens when the KCF random pool calls into swrand.  This function can be called in either blocking or non blocking mode. The only difference between blocking and non blocking is that the later will return EAGAIN if there is insufficient entropy to generate the randomness, the former blocks indefinitely.

A global uint32_t entropy_bits is used to track how much entropy is available.

When a request is made to swrand_get_entropy() we loop until we have the available requested amount of randomness.  First checking if the number of remaining entropy in srndpool is below 20 bytes, if it is then we block waiting for more entropy (or return EGAIN if non blocking mode).

Then determine how many bytes of entropy to extract, it is the minimum of the total requested and 20 bytes.  The entropy extracted from the srndpool is then hashed using SHA1 and fed back into the pool starting at the previous extraction point.  We ensure that we don't feed the same entropy back into the srndpool at the same position, if we do then the system will force a panic when in FIPS 140 mode or log a warning and return EIO when not in FIPS 140 mode.

The FIPS 186-2 Appendix 3 fips_random_inner() function is then run on that same SHA1 digest and the resulting output checked that each 20 byte block meets the continous RNG test - if that fails we panic or warn as above.

We then update the output buffer and if continue the loop until we have generated the requested about.  Before swrand_get_entropy() returns it zeros out the used SHA1 digest and any temp area and releases the srndpool mutex. lock. Adding to the swrand pool

The swrand_seed_random () function is used to add request adding entropy from an external source via the KCF random_add_entropy() call. If is called from KCF (ie something external to swrand itself) synchronously then the entropy estimate is always 0.  When called asynchronously we delay adding in the entropy until the next mixing time.

The internal swrand_add_entropy() call deals with updating srndpool it does do by adding and then mixing the bytes while holding the srndpool mutex lock.  Thus the pool is always mixed before returning. Mixing the swrand pool

The swrand provider uses the same timeout mechanism for mixing that is described above for the KCF rndpool for adding new entropy to the the srndpool using the sources described above.

The swrand_mix_pool() function is called as a result of the timeout or an explicit request to add more entropy.

To mix the pool we first add in any deferred bytes, then by sliding along the pool in 64 bit chunks hash the data from the start upto this point and  the position we are long the pool with SHA1.  Then XOR the resulting hash back into the block and move along.

2.6.2 n2rng random provider

This applies only to SPARC processors with either an S2 core (T2, T3, T3+) or to S3 core (T4, T5, M5, M6) both CPU families use the same n2rng driver and the same on chip system for the RNG.

The n2rng driver provides the interface between the hyper-privilged access to the RNG registers on the CPU and KCF.

The driver performs attach time diagnostics on the hardware to ensure it continues operating as expected.  It determines that it is operating
in FIPS 140-2 mode via its driver.conf(5) file before its attach routine has completed. The full hardware health check in conjunction with the  hypervisor only when running in the control domain. The FIPS 140 checks are always run regardless of the hypervisor domain type.  If the FIPS 140  POST checks fail the driver ensures it is deregistered with KCF.

If the driver is suspended and resumed it reconfigures and re-registers with KCF.  This would happen on a suspend/resume cycle or during live  migration or system reconfiguration.

External seeding of n2rng is not possible from outside of the driver, and it does not provide the seed_random operation to KCF.

This algorithm used by n2rng is very similar to that of swrand, if loops collecting entropy and building up the requested number of bytes checking that each bit of entropy is different from the previous one, applying the fips_random_inner() function and then checking the resulting processed bytes differs from the previous set.

The entropy collection function n2rng_getentropy() is the significant difference between it and swrand in how they provide random data requests to KCF callers.

n2rng_getentropy() returns the requested number of bytes of entropy by uses hypervisor calls to hv_rng_data_read() and providing error checking to ensure we can retry on certain errors but eventually give up after a period of time or number of failed attempts at reading from the hypervisor.   The function hv_rng_data_read() is a short fragment of assembler code that reads a 64 bit value from the hypervisor RNG register (HV_RNG_DATA_READ 0x134) and is only called by n2rng_getentropy() and the diagnostic routine called at driver attach and resume time.

3.0 FIPS 186-2: fips_random_inner()

We will discuss this function here because it is common to swrand, n2cp, the /dev/urandom implementation as well as being used in userspace function fips_get_random.

It is a completely internal to Solaris function that can't be used outside of the cryptographic framework.

    fips_random_inner(uint32_t *key, uint32_t *x_j, uint32_t *XSEED_j)

It computes a new random value, which is stored in x_j; updates XKEY.  XSEED_j is additional input.  In principle, we should protect XKEY, perhaps by placing it in non-paged memory, but we aways clobber XKEY with fresh entropy just before we use it.  And  step 3d irreversibly updates it just  after we use it.  The only risk is that if an attacker captured the state while the entropy generator was broken, the attacker could predict future  values. There are two cases:

  1. The attack gets root access to a live system.  But there is no defense against that that we can place in here since they already have full control.
  2. The attacker gets access to a crash dump.  But by then no values are being generated.

Note that XSEED_j is overwritten with sensitive stuff, and must be zeroed by the caller.  We use two separate symbols (XVAL and XSEED_j) to make each step match the notation in FIPS 186-2.

All parameters (key, x_j, XSEED_j) are the size of a SHA-1 digest, 20 bytes.

The HASH function used is SHA1.

The implementation of this function is verified during POST by fips_rng_post() calling it with a known seed.  The POST call is performed before the swrand module registers with KCF or during initalisation  of any of the libraries in the FIPS 140 boundary (before their symbols are available to be called by other libraries or applications).

4.0 /dev/urandom

This is a software-based generator algorithm that uses the random bits in the cache as a seed. We create one pseudo-random generator (for /dev/urandom) per possible CPU on the system, and use it, kmem-magazine-style, to avoid cache line contention.

4.1 Initalisation of /dev/urandom

kcf_rnd_init() calls rnd_alloc_magazines() which  setups up the empty magazines for the pseduo random number pool (/dev/urandom). A separate magazine per CPU is configured up to the maximum number of possible (not available) CPUs on the system, important because we can add more
CPUs after initial boot.

The magazine initalisation discards the first 20 bytes so that the rnd_get_bytes() function will be using that for comparisons that the next block always differs from the previous one.  It then places the next  20 bytes into the rm_key and next again 20 bytes into rm_seed.  It does this for each max_ncpus magazine.  Only after this is complete does kcf_rnd_init() return back to kcf_init().  Each of the per CPU magazines has its own state which includes hmac key, seed and previous value, each also has its own rekey timers and limits.

The magazines are only used for the pseduo random number pool (ie servicing random_get_pseduo_bytes() and /dev/urandom) not for random_get_bytes() or /dev/random.

Note that this usage is preemption-safe; a thread entering a critical section remembers which generator it locked and unlocks the same one; should it be preempted and wind up running on a different CPU, there will be a brief period of increased contention before it exits the critical section but nothing will melt.

4.2 /dev/urandom generator

At a high level this uses the FIPS 186-2 algorithm using a key extracted from the random pool to generate a maximum of 1310720 output blocks before rekeying.  Each CPU (this is CPU thread not socket or core) has its own magazine.

4.3 Reading from /dev/urandom

The maximum request size that will be services for a single read(2) system call on /dev/urandom is 133120 bytes.

Reads all come in via the kcf_rnd_get_pseduo_bytes() function.

If the requested size is considered a to be large, greater than 2560 bytes, then instead of reading from the pool we tail call the generator directly by using rnd_generate_pseudo_bytes().

If the CPUs magazine has sufficient available randomness already we use that, otherwise we call the rnd_generate_pseudo_bytes() function directly.

rnd_generate_pseduo_bytes() is always called with the cpu magazine mutex already locked and it is released when it returns.

We loop through the following until the requested number of bytes has been built up or an unrecoverable error occurs.

rm_seed is reinitialised by xoring in the current 64 bit highres time, from gethrtime() into the prior value of rm_seed.  The fips_random_inner() call is then made using the current value of rm_key and this new seed.

The returned value from fips_random_inner() is then checked against our previous return value to ensure it is a different 160bit block.  If that fails the system panics when in FIPS 140-2 mode or returns EIO if FIPS mode is not enabled.

Before returning from the whole function the local state is zero'd out and the per magazine lock released.

5.0 Randomness for key generation

For asymmetric key generation inside the kernel a special random_get_nzero_bytes() API is provided.  It differs from random_get_bytes() in two  ways, first calls the random_get_bytes_fips140() function which only returns once all FIPS 140-2 initalisation has been completed.  The  random_get_bytes() function needs to be available slightly earlier because some very early kernel functions need it (particularly setup of the VM system and if ZFS needs to do any writes as part of mounting the root filesystem).  Secondly it ensures that no bytes in the output have the 0 value, those are replaced with freshly extracted additional random bytes, it continues until the entire requested length is entirely made up of non zero bytes.

A corresponding random_get_nzero_pseduo_bytes() is also available for cases were we don't want 0 bytes in other random sequences, such as session keys, nonces and cookies.

The above to functions ensure that even though most of the random pool is available early in boot we can't use it for key generation until the full FIPS 140-2 POST and integrity check has completed, eg on the swrand provider.

6.0 Userspace random number

Applications that need random numbers may read directly from /dev/random and /dev/urandom. Or may use a function implementing
the FIPS 186-2 rng requirements.

The cryptographic framework libraries in userspace provide the following internal functions:

    pkcs11_get_random(), pkcs11_get_urandom()
    pkcs11_get_nzero_random(), pkcs11_get_nzero_urandom()

The above functions are available from the libcryptoutil.so library but are Private to Solaris and MUST not be used by any 3rd party code - see the attributes(5) man page for the Solaris interface taxonomy. Similar to the kernel space there are pkcs11_get_nzero_random() and pkcs11_get_nzero_urandom() variants that ensure none of the bytes are zero.  The pkcs11_ prefix is because these are Private functions mostly used for the implementation of the PKCS#11 API.  The Solaris Private ucrypto API does not provide key generation functions.

The pkcs11_softtoken C_GenerateRandom() function is implemented by calling pkcs11_get_urandom().

When pkcs11_softtoken is performing key generation C_GenerateKey() or C_GenerateKeyPair() it uses pkcs11_get_random for persistent (token) keys and pkcs11_get_urandom() for ephemeral (session) keys.

The above mentioned internal functions generate random numbers in  the following way.

While holding the pre_rnd_mutex (which is per userspace process) pkcs11_get_random() reads in 20 byte chunks from /dev/random and calls fips_get_random() on that 20 bytes and continutes a loop building up the output until the caller requested number of bytes are retrived or an unrecoverable error occurs (in that case it will kill the whole process using abort() when in FIPS 140-2 mode).

fips_get_random() performs a continous test by comparing the bytes taken from /dev/random. It then performs a SHA1 digest of those bytes and calls fips_random_inner().  It then again performs the byte by byte continous test.

When the caller requested number of bytes have been read and post processed the pre_rnd_mutex is released and the bytes returned to the caller
from pkcs11_get_random().

The initial seed and XKEY for fips_random_inner() are setup during the initalisation of the libcryptoutil library before the main() of the application  is called or any of the functions in libcryptoutil are available. XKEY is setup by feeding the the current high resolution time into seed48() and  drand48() functions to create a buffer of 20 bytes that is then digested through SHA1 and becomes the initial XKEY value.  XKEY is then updated when fips_random_inner() is called.

pkcs11_get_urandom() follows exactly the same algorithm as pkcs11_get_random() except that the reads are from /dev/urandom instead of /dev/random.

When a userspace program forks pthread_at_fork handlers ensure that requests to retreive randomness are locked out during the fork.

I hope this is useful and/or interesting insight into how Solaris generates randomness.

Update 2013-09-12 I was asked about how this applies to Illumos: To the best of my knowledge [ I have not read the Illumos source the following is based on what I remember of the old OpenSolaris source ] most of what I said above should apply to Illumos as well.  The main exceptions are that the fips_random_inner(), POST and some of the continuous checks don't exist, neither does the Intel RDRAND support. The source or the n2rng driver, random(7D), kcf  and swrand were available as part of OpenSolaris.  Not that Illumos may have changed some of this so please verify for yourself.

Tuesday Jun 11, 2013

Monitoring per Zone Filesystem activity

Solaris 11 added a new at fsstat(1M) monitoring command that provided the ability to view filesystem activity at the VFS layer (ie filesystem independent).  This command was available in Solaris 11 Express and the OpenSolaris releases as well.

Here is a simple example of looking at 5 one second intervals of writes to all ZFS filesystems

$ fsstat zfs 1 5
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.79M 3.39M 1.77M 1003M 1.24M  6.16G 49.9M  425M 2.01T  160M  467G zfs
    0     0     0    55     0    416     6     0     0 42.7K 21.3M zfs
    0     0     0     4     0     18     0     0     0 44.0K 22.0M zfs
    0     0     0     5     0     28     0     0     0 44.0K 22.0M zfs
    0     0     0     4     0     18     0     0     0 40.3K 20.1M zfs
    0     0     0   260     0  1.08K     0     0     0 36.7K 18.4M zfs

In Solaris 11.1 support was added for per Zone and aggregated information so now we can very quickly determine which zone it is that is contributing to the operations, for example:

$ fsstat -Z zfs 1 5
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.79M 3.39M 1.77M  998M 1.24M  6.13G 49.4M  424M 2.01T  159M  467G zfs:global
   62    55    39 4.85M   250  33.2M  575K  157K  138M 8.66K 11.6M zfs:s10u9
  114    91    33  115K   362   429K 3.80K  133K  180M  116K 87.2M zfs:ltz
    0     0     0 1.06K     0  3.44K    14    71 12.9K     6 1.43K zfs:global
    0     0     0     2     0      8     0     1    1K     0     0 zfs:s10u9
    0     0     0     0     0      0     0     0     0 41.7K 20.9M zfs:ltz
    0     0     0     4     0     18     0     0     0     0     0 zfs:global
    0     0     0    51     0    398     6     0     0     0     0 zfs:s10u9
    0     0     0     0     0      0     0     0     0 42.6K 21.3M zfs:ltz
    0     0     0     4     0     18     0     0     0     0     0 zfs:global
    0     0     0     0     0      0     0     0     0     0     0 zfs:s10u9
    0     0     0     0     0      0     0     0     0 44.0K 22.0M zfs:ltz
    0     0     0     5     0     28     0     0     0     0     0 zfs:global
    0     0     0     0     0      0     0     0     0     0     0 zfs:s10u9
    0     0     0     0     0      0     0     0     0 43.9K 22.0M zfs:ltz
    0     0     0     4     0     18     0     0     0     0     0 zfs:global
    0     0     0     0     0      0     0     0     0     0     0 zfs:s10u9
    0     0     0     0     0      0     0     0     0 38.9K 19.5M zfs:ltz

Thursday Apr 04, 2013

mdb ::if

The Solaris mdb(1) live and post-mortem debugger gained a really powerful new dcmd called ::if in the Solaris 11.1 release.

As a very quick example of how powerful it can be here is a short one liner to find all the processes that are running with their real and effective uid

> ::ptree | ::if proc_t p_cred->cr_ruid <> p_cred->cr_uid | ::print proc_t p_user.u_comm

Or a similar one to find out all the priv aware processes, this time showing some output:

> ::ptree | ::if proc_t p_cred->cr_priv.crpriv_flags & 0x0002 | ::print proc_t p_user.u_comm
p_user.u_comm = [ "su" ]
p_user.u_comm = [ "nfs4cbd" ]
p_user.u_comm = [ "lockd" ]
p_user.u_comm = [ "nfsmapid" ]
p_user.u_comm = [ "statd" ]
p_user.u_comm = [ "nfs4cbd" ]

The new ::if is very powerful and can do much more advanced things, like substring comparison, than my simple examples, but I choose examples that are useful to me and relevant to security.

Wednesday Feb 20, 2013

Generating a crypt_sha256 hash from the CLI

When doing a completely hands off Solaris installation the System Configuration profile needs to contain the hash for the root password, and the optional inital non root user.

Unfortunately Solaris doesn't currently provide a simple to use command for generating these hashes, but with a very simple bit of Python you can easily create them:


import crypt, getpass, os, binascii

if __name__ == "__main__":
    cleartext = getpass.getpass()
    salt = '$5$' + binascii.b2a_base64(os.urandom(8)).rstrip() + '$'

    print crypt.crypt(cleartext, salt)




« August 2015