New Features in XCP1050
By Bob Hueston on Nov 05, 2007
Servicetags
Servicetags is part of the Sun Connection infrastructure. The basic idea is to enable customers to better track their Sun assets, and by communicating with Sun, determine what updates are available, what needs patching, etc.
Servicetags were introduced in Solaris 10. It's essentially a piece of software that runs on a server and can communicate the list of software products installed on the server, the product versions, patch levels, and so forth. Customers can then run a Java application on their workstation to discover Sun products throughout their datacenter, and at their discretion, send that list to Sun to register the products and/or check for updates.
In XCP1050, the servicetags software now also runs on the service processor. This allows customers to discover the hardware assets in their datacenter, including the machine type, part number, and serial number.
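To make the hardware side concrete, here is a minimal Python sketch of how a discovery tool might tally the chassis it finds. The `HardwareTag` record and `summarize_by_machine_type` function are hypothetical. Only the three fields (machine type, part number, serial number) come from the description above; the real servicetags format is not shown here, and the sketch assumes a serial number identifies exactly one chassis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareTag:
    """Hypothetical record for one service processor's hardware servicetag.

    The fields mirror what the post says XCP1050 reports; this is not the
    actual servicetags schema.
    """
    machine_type: str
    part_number: str
    serial_number: str


def summarize_by_machine_type(tags: list[HardwareTag]) -> dict[str, int]:
    """Count chassis per machine type.

    Assumes a serial number identifies one chassis, so a chassis that is
    discovered more than once is only counted once.
    """
    by_serial = {tag.serial_number: tag for tag in tags}
    return dict(Counter(tag.machine_type for tag in by_serial.values()))


# Placeholder values, not real Sun machine types, part numbers, or serial numbers.
inventory = [
    HardwareTag("TYPE-A", "PN-1", "SN-1"),
    HardwareTag("TYPE-A", "PN-1", "SN-1"),  # same chassis reported twice
    HardwareTag("TYPE-A", "PN-1", "SN-2"),
    HardwareTag("TYPE-B", "PN-2", "SN-3"),
]
print(summarize_by_machine_type(inventory))  # {'TYPE-A': 2, 'TYPE-B': 1}
```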
On new machines, servicetags are enabled by default; if you upgrade from XCP1041 or earlier, you'll need to enable servicetags manually. The commands to manage servicetags on the service processor are setservicetag and showservicetags. The usage is very straightforward, for example:
XSCF> setservicetag -c disable
You can download the discovery application here.
Browser User Interface
Anyone who used the Browser User Interface (called BUI, or Web User Interface) in XCP1041 or earlier probably found that many tasks could not be accomplished through the BUI and required the command line interface instead. In XCP1050, that all changed: just about everything you can do through the command line can now be done through your web browser. A lot of hard work went into these BUI updates, and I think it really shows.
Fault LEDs and clearfault
It might be surprising, but the most difficult aspects of collaborating with another company on a new product were the things that seem most trivial: bezel color, whether buttons in the Browser UI should have square or rounded corners, and when and how LEDs should blink. Of all these, I think LEDs were the most contentious.
Sun adheres to the ANSI/VITA 40-2003 Service Indicator Standard (SIS) for most of its products. Fujitsu, however, adheres to a different standard. The differences between the two standards are minor, but when a customer is managing a large number of systems, any variation in indicator standards can be a source of confusion. In XCP1040, we shipped with a compromise, meaning both Sun and Fujitsu were unhappy with the solution.
For XCP1050, we reworked the fault indicator policies for both companies. In fact, we gave the firmware the ability to tell whether the server is Sun-branded or Fujitsu-branded, so that it follows the respective company's fault LED standard. Now, on a Sun-branded system, for example, the fault LEDs adhere to a simple policy:
- If a FRU's (field replaceable unit) fault LED is on, then there is a fault in the chassis and it has been isolated to that specific FRU with very high confidence. In other words, if the fault LED is on, then we know for a fact the FRU is broken.
- If the chassis fault LED (on the front panel) is on, then there is a fault in the chassis somewhere, and the fmadm faulty command will identify the list of suspected FRUs.
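The Sun-branded policy above can be written as a small decision function. This is a minimal Python sketch, not firmware code. `Diagnosis`, `sun_fault_leds`, and the `"CHASSIS"` key are hypothetical names, and CMU#1 is a placeholder FRU name. The sketch also assumes the rules hold in reverse, meaning a FRU LED is lit whenever a fault has been isolated to that FRU.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical diagnosis snapshot for one chassis.

    isolated:  FRUs a fault has been pinned on with very high confidence.
    suspected: FRUs that may be at fault (roughly what fmadm faulty lists).
    """
    isolated: set[str] = field(default_factory=set)
    suspected: set[str] = field(default_factory=set)


def sun_fault_leds(diag: Diagnosis, frus: list[str]) -> dict[str, bool]:
    """Return the lit/unlit state of each FRU fault LED plus the chassis LED."""
    # A FRU LED is lit only when the fault is isolated to that FRU.
    leds = {fru: fru in diag.isolated for fru in frus}
    # Any fault in the chassis, isolated or merely suspected, lights the chassis LED.
    leds["CHASSIS"] = bool(diag.isolated or diag.suspected)
    return leds


# A fault that could not be isolated: only the chassis LED comes on.
print(sun_fault_leds(Diagnosis(suspected={"CMU#0", "CMU#1"}), ["CMU#0", "CMU#1"]))
# {'CMU#0': False, 'CMU#1': False, 'CHASSIS': True}
```

Keeping "isolated" and "suspected" as separate sets is what lets the chassis LED come on without lighting any FRU LED, which is the distinction the policy depends on.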
Furthermore, on Sun-branded systems, cycling chassis power can no longer be used to clear the fault LEDs for FRUs or the chassis. FRUs don't magically become "better" just because you cycled power; if the FRU was faulty, it's still faulty, so the fault LED shouldn't turn off either. For Sun-branded systems, the fault LEDs remain on until the customer or service engineer actively clears the fault condition, either by removing/replacing the faulty FRU or by running the clearfault command.
Previously, clearfault could be used to mark a FRU as not faulty; however, almost all FRUs still required a chassis power cycle. This was so that the chassis could perform a power-on self test of the FRU before reconfiguring it into a running server. The last thing you want is for someone to manually type clearfault /CMU#0, only to discover that CMU#0 really was faulty when it brings down the server.
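On Sun-branded systems, the fault LED therefore behaves like a latch. It is set when a fault is isolated and stays set across power cycles until the fault is actively cleared. Here is a minimal Python sketch of that behavior. `FaultLed` and its methods are hypothetical names, and the model covers only the LED state, not how clearfault decides when a clear actually takes effect.

```python
class FaultLed:
    """Hypothetical model of a FRU fault LED on a Sun-branded system (XCP1050)."""

    def __init__(self) -> None:
        self.lit = False

    def fault_isolated(self) -> None:
        # Diagnosis pinned a fault on this FRU.
        self.lit = True

    def power_cycle(self) -> None:
        # Cycling chassis power does not touch the LED: a faulty FRU is still faulty.
        pass

    def replace_fru(self) -> None:
        # Removing/replacing the faulty FRU clears the condition.
        self.lit = False

    def clear_fault_condition(self) -> None:
        # An explicit clear, e.g. a successful clearfault.
        self.lit = False


led = FaultLed()
led.fault_isolated()
led.power_cycle()
print(led.lit)  # True: the power cycle did not turn the LED off
led.replace_fru()
print(led.lit)  # False
```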
XCP1050 was enhanced so that clearfault can selectively initiate self test on many FRUs without requiring a chassis power cycle. When you run clearfault now, it checks whether it is possible to run self test without disturbing the running system. Some FRUs cannot be tested during operation; other FRUs may be in use in such a way that self test cannot be performed. If the FRU is safe to test, clearfault initiates the self test, and if the test succeeds, the fault condition is cleared. If clearfault cannot test the FRU, the FRU is marked to be cleared at the next chassis power cycle.
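The decision clearfault now makes can be sketched as follows. This is a minimal Python illustration, not the XSCF implementation. `clearfault` here is a hypothetical function, and the two callbacks stand in for firmware checks that the post describes but does not name. The post also does not say what happens when self test fails; the sketch assumes the fault is simply kept.

```python
from enum import Enum
from typing import Callable


class ClearResult(Enum):
    CLEARED = "cleared now"
    STILL_FAULTY = "self test failed; fault kept"  # assumed behavior
    DEFERRED = "marked to clear at next chassis power cycle"


def clearfault(
    fru: str,
    can_test_online: Callable[[str], bool],
    run_self_test: Callable[[str], bool],
    pending_clear: set[str],
) -> ClearResult:
    """Hypothetical sketch of the XCP1050 clearfault flow described above."""
    if can_test_online(fru):
        # Safe to exercise the FRU without disturbing the running system.
        if run_self_test(fru):
            return ClearResult.CLEARED
        return ClearResult.STILL_FAULTY
    # The FRU can't be tested now (not testable during operation, or in use):
    # defer to the power-on self test at the next chassis power cycle.
    pending_clear.add(fru)
    return ClearResult.DEFERRED


pending: set[str] = set()
print(clearfault("CMU#0", lambda fru: False, lambda fru: True, pending).value)
# marked to clear at next chassis power cycle
print(pending)  # {'CMU#0'}
```

Passing the checks in as callables keeps the decision logic separate from hardware access, so each branch can be exercised on its own.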
A great deal of effort went into improving the fault detection, isolation, reporting, and service interface, to make it more accurate and more consistent with other Sun products.