By martin on Nov 05, 2009
One problem I'm wrestling with in my day job at Web Engineering is: how do you know when a system you are building is ready?
When we build a new system, it goes through the following steps:
- Installs the OS and sets up basic configuration, like hostname, domain name, and network.
- System-specific configuration
- Manual steps
  This includes things which are too system-dependent to automate, like creating a separate zpool for application data on external storage.
For me it has been enough to review the puppet logs to determine whether the system has been correctly configured, but for my colleagues who aren't using puppet on a daily basis, it isn't. They have been asking "how do we know if a system is ready?", and I've realized that "review the puppet logs" isn't really a helpful answer for most people. What if you have forgotten to add a node definition for the system, and you get the default node configuration? Then puppet will tell you everything is configured correctly, which is partly true: the things puppet has been told to configure are configured, but what about the stuff I forgot to tell it about?
So I've been thinking about using the same approach as I use when I write code: Behavior Driven Development. That is, you start by specifying the behavior of the program you are developing, and only then do you start to code. This has the benefit of letting you know easily when you are done: if your code passes all the behavior tests, you can release it.
Translating this to Solaris installs isn't that hard: instead of describing program behavior, you describe (operating) system behavior. You can use the same tools as you do for development. I've been using Cucumber for my Ruby on Rails projects, so that is what I picked for my initial testing. Cucumber uses natural language to describe the behavior you want, which makes it easy for non-programmers to understand what it is testing.
When you write the definitions, you should not use technical language, like:
"ssh to the host weblogs and grep for a passwd(4) entry for the user martin in
Instead, use something like:
"I should be able to ssh to weblogs, and log in as the user martin",
which is the behavior you want.
Cucumber then takes that definition and translates it into step-by-step instructions which can be validated.
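To give a feel for that translation, here is a minimal plain-Ruby sketch of how a line of natural language can be matched against a regular expression and handed to a block of code. It is not Cucumber's implementation: `StepRegistry`, `define` and `run` are hypothetical names made up for this sketch, and the ssh login is replaced by a stub lookup (`KNOWN_USERS`) so the example runs anywhere.

```ruby
# A toy step registry, NOT Cucumber itself: it only shows the idea of
# mapping a natural-language step onto a block of Ruby code.
class StepRegistry
  def initialize
    @steps = [] # list of [pattern, block] pairs
  end

  # Register a block to run for any step text matching +pattern+.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Find the first matching definition and call it with the captured groups.
  # Returns the block's result, or raises if no definition matches.
  def run(text)
    @steps.each do |pattern, block|
      match = pattern.match(text)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "undefined step: #{text}"
  end
end

# Stub standing in for a real ssh login attempt (an assumption for this sketch).
KNOWN_USERS = { "weblogs" => ["martin"] }.freeze

registry = StepRegistry.new
registry.define(/^I should be able to ssh to (\w+), and log in as the user (\w+)$/) do |host, user|
  KNOWN_USERS.fetch(host, []).include?(user)
end

step = "I should be able to ssh to weblogs, and log in as the user martin"
puts registry.run(step) ? "passed" : "failed"
```

The point of the design is that the sentence stays readable for non-programmers, while the technical details (how the login is actually attempted) live in the block behind the pattern.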
This is how it can look when you run it:
martin@server$ cucumber
Feature: sendmail configure
  Systems should be able to send mail

  Scenario: should be able to send mail # features/weblogs.sfbay.sun.com/mail.feature:5
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to send mail to "email@example.com" # features/steps/mail_steps.rb:1

Feature: NIS client
  Systems on SWAN should be NIS clients

  Scenario: should be able to match entries in NIS # features/weblogs.sfbay.sun.com/nis.feature:4
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to lookup "xuan" in the passwd table # features/steps/nis_steps.rb:1
    And I want to lookup "onnv" in the hosts table # features/steps/nis_steps.rb:1

  Scenario: should be able to make lookups through NIS # features/weblogs.sfbay.sun.com/nis.feature:9
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then I want to lookup "xuan" through nsswitch.conf # features/steps/nis_steps.rb:5

Feature: SSH access
  SSH should be configured

  Scenario: ssh user access # features/weblogs.sfbay.sun.com/ssh.feature:4
    Given a user named "martin" # features/steps/ssh_steps.rb:3
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should succeed # features/steps/ssh_steps.rb:28

  Scenario: no lingering default OpenSolaris user # features/weblogs.sfbay.sun.com/ssh.feature:9
    Given a user named "jack" with password "jack" # features/steps/ssh_steps.rb:7
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should fail # features/steps/ssh_steps.rb:32

5 scenarios (5 passed)
13 steps (13 passed)
This makes it really easy to see if the behavior of the system is what you expect. All green means it is ready!
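The "all green" check can also be made mechanical. Here is a minimal sketch, assuming the run ends with a summary line shaped like "5 scenarios (5 passed)"; `ready?` is a hypothetical helper of my own, not something Cucumber provides.

```ruby
# Decide readiness from the scenario summary line of a cucumber run.
# Assumes the summary format "N scenarios (... M passed)".
def ready?(output)
  summary = output.lines.map(&:strip).grep(/^\d+ scenarios? \(/).last
  return false if summary.nil? # no summary found: treat as not ready

  total = summary[/^(\d+) scenarios?/, 1].to_i
  passed = summary[/(\d+) passed/, 1].to_i
  total > 0 && passed == total
end

green = "5 scenarios (5 passed)\n13 steps (13 passed)\n"
red   = "5 scenarios (1 failed, 4 passed)\n13 steps (1 failed, 12 passed)\n"

puts ready?(green) ? "ready" : "not ready"
puts ready?(red) ? "ready" : "not ready"
```

Treating a missing summary as "not ready" is a deliberate choice here: a run that never got as far as printing its summary should not count as a pass.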
What I am working on at the moment is making the failures understandable to a non-programmer. For example, when a scenario fails (because it manages to log in to a system where the login should have failed), it looks like this:
  Scenario: no lingering default OpenSolaris user # features/weblogs.sfbay.sun.com/ssh.feature:9
    Given a user named "jack" with password "jack" # features/steps/ssh_steps.rb:7
    When connecting to weblogs.sfbay.sun.com using ssh # features/steps/ssh_steps.rb:12
    Then the connection should fail # features/steps/ssh_steps.rb:28
      expected not nil, got nil (Spec::Expectations::ExpectationNotMetError)
      ./features/steps/ssh_steps.rb:29:in `/^the connection should succeed$/'
      features/weblogs.sfbay.sun.com/ssh.feature:12:in `Then the connection should succeed'

Failing Scenarios:
cucumber features/weblogs.sfbay.sun.com/ssh.feature:9 # Scenario: no lingering default OpenSolaris user

5 scenarios (1 failed, 4 passed)
13 steps (1 failed, 12 passed)
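One possible direction for friendlier failures is to pull out the "Failing Scenarios" lines and turn them into plain sentences. This is only a sketch under assumptions: it expects lines shaped like `cucumber <file>:<line> # Scenario: <name>`, it assumes the feature directory is named after the host, and `explain_failures` and the wording of the message are made up for illustration.

```ruby
# Turn the "Failing Scenarios" lines of a cucumber run into plain sentences.
# Assumes lines shaped like "cucumber <file>:<line> # Scenario: <name>".
FAILING_LINE = /^cucumber (\S+):(\d+) # Scenario: (.+)$/

def explain_failures(output)
  messages = output.lines.map(&:strip).map do |line|
    match = FAILING_LINE.match(line)
    next unless match

    file, _line_no, scenario = match.captures
    host = File.basename(File.dirname(file)) # directory assumed to be the host name
    "#{host} is not ready: the check \"#{scenario}\" failed"
  end
  messages.compact # drop the lines that were not failing-scenario lines
end

sample = <<~OUT
  Failing Scenarios:
  cucumber features/weblogs.sfbay.sun.com/ssh.feature:9 # Scenario: no lingering default OpenSolaris user
  5 scenarios (1 failed, 4 passed)
OUT

puts explain_failures(sample)
```

This keeps the scenario name, which is already written in non-technical language, and drops the stack trace that only a programmer would care about.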
Once I've gotten a bit beyond playing around with this, I will publish the source if anyone is interested in it.