Behavior Driven Infrastructure

One problem I'm wrestling with in my day job at Web Engineering is: how do you know when a system you are building is ready?

When we build a new system, it goes through the following steps:

  1. Jumpstart
    Installs the OS and sets up basic configuration, such as hostname, domainname and network.
  2. Puppet
    System-specific configuration
  3. Manual steps
    Things which are too system-dependent to automate, such as creating a separate zpool for application data on external storage

For me it has been enough to review the puppet logs to determine whether the system has been correctly configured, but for my colleagues who don't use puppet on a daily basis, it isn't. They have been asking "how do we know if a system is ready?", and I've realized that "review the puppet logs" isn't a helpful answer for most people. What if you have forgotten to add a node definition for the system, so it gets the default node configuration? Then puppet will tell you everything is configured correctly, which is only partly true: the things puppet has been told to configure are configured, but what about the stuff I forgot to tell it about?

So I've been thinking about using the same approach I use when I write code: Behavior Driven Development. That is, you start by specifying the behavior of the program you are developing, and only after that do you start to code. This has the benefit of letting you easily know when you are done: if your code passes all the behavior tests, you can release it.

Translating this to Solaris installs isn't that hard: instead of describing program behavior, you describe (operating) system behavior. You can use the same tools as you do for development. I've been using Cucumber for my Ruby on Rails projects, so that is what I picked for my initial testing. Cucumber uses natural language to describe the behavior you want, which makes it easy for non-programmers to understand what is being tested.

When you write the feature descriptions, you should not use technical language like "ssh to the host weblogs and grep for a passwd(4) entry for the user martin in /etc/passwd"; instead use something like "I should be able to ssh to weblogs, and log in as the user martin", which is the behavior you want. Cucumber then matches each of those sentences against step definitions, small pieces of Ruby code that carry out the actual check, so the description can be validated step by step.
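To make that mapping concrete, here is a toy sketch in plain Ruby of the idea behind step definitions: each definition is a regular expression plus a block, and a sentence runs the first block whose regex matches, with the captured groups passed in as arguments. It is not Cucumber's implementation or API; the `StepRegistry` class and the `ALLOWED_LOGINS` table (a stand-in for a real SSH login attempt) are hypothetical.

```ruby
# Toy illustration of Cucumber-style step matching.
# NOT Cucumber's implementation; StepRegistry is a hypothetical class.
class StepRegistry
  def initialize
    @steps = []
  end

  # Register a step definition: a regex plus the block that performs the check.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Run the first definition whose regex matches the sentence, passing the
  # captured groups to its block. Raises if no definition matches.
  def run(sentence)
    @steps.each do |pattern, block|
      match = pattern.match(sentence)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "undefined step: #{sentence}"
  end
end

# Hypothetical stand-in for a real SSH login attempt: which users may log
# in to which host. A real step would actually try to connect.
ALLOWED_LOGINS = { "weblogs" => ["martin"] }

registry = StepRegistry.new

registry.define(/^I should be able to ssh to (\S+), and log in as the user (\S+)$/) do |host, user|
  ALLOWED_LOGINS.fetch(host, []).include?(user)
end

puts registry.run("I should be able to ssh to weblogs, and log in as the user martin")
```

The point of the design is that the sentence stays readable for non-programmers, while all the technical detail lives in the block behind the regex.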

This is how it can look when you run it:

martin@server$ cucumber
Feature: sendmail configure
  Systems should be able to send mail

  Scenario: should be able to send mail                  # features/weblogs.sfbay.sun.com/mail.feature:5
    When connecting to weblogs.sfbay.sun.com using ssh   # features/steps/ssh_steps.rb:12
    Then I want to send mail to "martin.englund@sun.com" # features/steps/mail_steps.rb:1

Feature: NIS client
  Systems on SWAN should be NIS clients

  Scenario: should be able to match entries in NIS    # features/weblogs.sfbay.sun.com/nis.feature:4
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to lookup "xuan" in the passwd table   # features/steps/nis_steps.rb:1
    And I want to lookup "onnv" in the hosts table     # features/steps/nis_steps.rb:1

  Scenario: should be able to make lookups through NIS # features/weblogs.sfbay.sun.com/nis.feature:9
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to lookup "xuan" through nsswitch.conf # features/steps/nis_steps.rb:5

Feature: SSH access
  SSH should be configured

  Scenario: ssh user access                            # features/weblogs.sfbay.sun.com/ssh.feature:4
    Given a user named "martin"                        # features/steps/ssh_steps.rb:3
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should succeed                 # features/steps/ssh_steps.rb:28

  Scenario: no lingering default OpenSolaris user      # features/weblogs.sfbay.sun.com/ssh.feature:9
    Given a user named "jack" with password "jack"     # features/steps/ssh_steps.rb:7
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should fail                    # features/steps/ssh_steps.rb:32

5 scenarios (5 passed)
13 steps (13 passed)

This makes it really easy to see if the behavior of the system is what you expect. All green means it is ready!
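For colleagues who want a yes/no answer from a script, a minimal sketch could read the summary lines printed at the end of a run, in the format shown above, and call the system ready only when every scenario and every step passed. The `system_ready?` helper is hypothetical and assumes that summary format; Cucumber should also exit with a non-zero status when something fails, which may be the simpler thing to check.

```ruby
# Hypothetical helper: decide whether a system is "ready" from the summary
# lines Cucumber prints at the end of a run, e.g.
#   5 scenarios (5 passed)
#   13 steps (1 failed, 12 passed)
# Ready means every counted scenario and step passed.
def system_ready?(cucumber_output)
  summaries = cucumber_output.scan(/^(\d+) (scenarios?|steps?) \(([^)]*)\)$/)
  return false if summaries.empty? # no summary found: treat as not ready

  summaries.all? do |total, _kind, details|
    passed = details[/(\d+) passed/, 1].to_i
    passed == total.to_i
  end
end

sample = "5 scenarios (1 failed, 4 passed)\n13 steps (1 failed, 12 passed)\n"
puts system_ready?(sample) ? "ready" : "NOT ready"
```

Treating a missing summary as "not ready" is deliberate: if the run never got as far as printing results, the system has not been shown to behave correctly. Skipped or undefined steps also count against readiness, since they are not "passed".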

What I am working on at the moment is making the failures understandable to a non-programmer. For example, when a scenario fails (here it manages to log in to a system where the login should have been refused), it looks like this:

  Scenario: no lingering default OpenSolaris user      # features/weblogs.sfbay.sun.com/ssh.feature:9
    Given a user named "jack" with password "jack"     # features/steps/ssh_steps.rb:7
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should fail                    # features/steps/ssh_steps.rb:28
      expected not nil, got nil (Spec::Expectations::ExpectationNotMetError)
      ./features/steps/ssh_steps.rb:29:in `/^the connection should succeed$/'
      features/weblogs.sfbay.sun.com/ssh.feature:12:in `Then the connection should succeed'

Failing Scenarios:
cucumber features/weblogs.sfbay.sun.com/ssh.feature:9 # Scenario: no lingering default OpenSolaris user

5 scenarios (1 failed, 4 passed)
13 steps (1 failed, 12 passed)

It is not obvious that "expected not nil, got nil" means that it could log in when it shouldn't have been able to, so I am working on some custom RSpec matchers to generate better error messages.
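A sketch of what such a matcher could look like, written in plain Ruby so it runs without RSpec installed. It uses the method names RSpec 3 expects from a custom matcher (`matches?`, `failure_message`, `failure_message_when_negated`); older RSpec releases, such as the `Spec::` version in the trace above, name some of these differently. `BeAbleToLogIn`, `be_able_to_log_in` and the convention that a refused login yields `nil` are assumptions for illustration.

```ruby
# Hypothetical matcher that explains an SSH login check in plain language.
# Method names follow RSpec 3's custom matcher protocol, but RSpec itself is
# not required to run this file.
class BeAbleToLogIn
  def initialize(host, user)
    @host = host
    @user = user
  end

  # Assumption: the step passes in the result of the login attempt,
  # a session object on success and nil when the login is refused.
  def matches?(actual)
    !actual.nil?
  end

  # Used when the login was expected to work but did not.
  def failure_message
    "expected user #{@user} to be able to log in to #{@host}, but the login was refused"
  end

  # Used when the login was expected to be refused but succeeded,
  # i.e. the "no lingering default OpenSolaris user" failure above.
  def failure_message_when_negated
    "expected user #{@user} NOT to be able to log in to #{@host}, but the login succeeded"
  end
end

# Helper so a step can read naturally, e.g. with RSpec 3 syntax:
#   expect(session).not_to be_able_to_log_in("weblogs.sfbay.sun.com", "jack")
def be_able_to_log_in(host, user)
  BeAbleToLogIn.new(host, user)
end

# Standalone demonstration: pretend the login as "jack" succeeded.
matcher = be_able_to_log_in("weblogs.sfbay.sun.com", "jack")
session = Object.new
puts matcher.failure_message_when_negated if matcher.matches?(session)
```

The failure message now states the behavior that went wrong ("jack could log in") instead of the internal detail that a value was or wasn't nil.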

Once I've gotten a bit beyond playing around with this, I will publish the source if anyone is interested in it.

Comments:

Definitely interested!!!

Posted by duritong on November 05, 2009 at 10:07 PM PST #

It's great to see someone else attempting behaviour-driven systems administration with Cucumber. :-)

I've been working on a project the last few months called cucumber-nagios (http://auxesis.github.com/cucumber-nagios) that outputs the results of your Cucumber features in the Nagios plugin format, so you can essentially use your features as Nagios checks.

Funnily enough, last weekend I started expanding cucumber-nagios's library of Cucumber step definitions to cover SSH. I have a blog post lined up for the next day covering the new SSH steps I've added, but I'm not trying to steal your thunder. :-)

I'd certainly be interested in seeing your step definition source. From the features you posted above, I'm guessing your Given steps populate some sort of instance variable that gets passed around between steps?

Also, in terms of your Then steps, maybe rephrasing the question you're asking might be easier for people to understand? i.e. instead of talking about specific files such as /etc/passwd, /etc/nsswitch.conf, you say 'Then the local user "jack" should exist', and 'Then the network user "xuan" should exist'. That way people without knowledge of your operating system's configuration structure can understand what's going on.

Anyhow, thanks for the post - I hope it gets people thinking!

Posted by Lindsay Holmwood on November 05, 2009 at 10:41 PM PST #

Pushing the 'Infrastructure as Code' paradigm to the next level.

I'd made a 'Puppet == test suite' analogy before, but hadn't realised that it's more 'Puppet == build tool'.

Please push this out as a project if you can.

Posted by Dick Davies on November 06, 2009 at 12:01 AM PST #

You're probably not going to like this, but it works:

package EVERYTHING. Everything. Including system configuration.

For those systems that need special configuration that deviates from the standard, make separate packages.

Then add them with JumpStart(TM) in the postinstall.

Game over.

It works, even for a very large number of servers (tens of thousands of systems, in a LOM environment). And it's deterministic. And CMM level 2 and onwards compliant.

Posted by UX-admin on November 06, 2009 at 03:31 AM PST #

About

martin
