Wednesday Mar 10, 2010

Monitoring the Sun Storage 7000 Appliance from Oracle Grid Control

Over the past few months I've blogged on various monitoring and alerting topics for Sun Storage 7000 Appliances. Besides my favorite of the blogs (Tweeting your Sun Storage 7000 Appliance Alerts), the culmination of this monitoring work is now available as the Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0, for use with the just-shipped 2010.Q1 software release for the Sun Storage 7000 Appliance Family. Phew, that's a bit of a mouthful for a title, so I'll just refer to it as the SS7000MPfOEMGC. Does that help? Well, maybe not ;-)

Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller creates a coupling between the enterprise-wide monitoring provided by Oracle Grid Control and the monitoring and analytics provided by Sun Storage 7000 Appliances. If you are not familiar with Oracle Grid Control, there is a nice write-up within the Installation and Configuration Guide for Oracle Grid Control. In a nutshell, Oracle Grid Control aids in monitoring your vertical data center rather than simply being an aggregation of horizontal health information. The documentation describes it as software and the infrastructure it runs on but I would simply call it a "Vertical Data Center Monitoring Environment".

The goal of the plug-in to Oracle Grid Control is to help a Database Administrator use Sun Storage 7000 Appliances without attempting to reproduce the world-class analytics available within the appliances themselves. In other words, the goal is to create a bridge between the world of Database Administration and the world of Storage Administration, with just enough information that the two worlds can have a dialog about the environment. Specifically, the Plug-in for Sun Storage 7000 Appliances is targeted at the following tasks:


  • Connecting Database deployments with Sun Storage 7000 resources that provide storage infrastructure to the database
  • Understanding the performance metrics of a Database from the perspective of the Appliance (what cache resources are being used for a database, what network resources and the performance being delivered, and how various storage abstractions are being used by the database)
  • Providing a Federated view of Sun Storage 7000 Appliances deployed in the environment (including storage profiles and capacities, network information and general accounting information about the appliances)
  • Providing detailed performance metrics for use in initial service delivery diagnostics (these metrics are used to have more detailed conversations with the Storage Administrator when additional diagnostics are required)

Let's take a look at one of the more interesting scenarios as a simple way of showing the plug-in at work rather than reproducing the entire Installation Guide in blog-form.

Download the Plug-in for Sun Storage 7000 Appliances, unzip the downloaded file, and read the Installation Guide included with the plug-in.

Follow the instructions for installing the plug-in, deploying it to agents, and adding instances of Sun Storage 7000 Appliances to the environment for monitoring. Each instance added takes about 60 minutes to fully populate with information. This is simply the nature of a polling environment: the plug-in polls data sets that rarely change every 60 minutes, and data sets that change frequently every 10 minutes.

Once data is funneling in, all of the standard appliance-centric views of the information are available (including the individual metrics that the plug-in collects) as well as a view of some of the important high-level information presented on the home page for an instance (provided you are using Oracle Grid Control 10.2.0.5). Here is a view of a single appliance instance's home page:

Looking into the Metrics collected for an appliance brings you to standard displays of single metrics (as shown below) or tables of related metrics (all standard navigation in Oracle Grid Control for plug-in components).

Included in the plug-in for Sun Storage 7000 Appliances are 5 reports. Of these reports, 3 run against a single instance of a Sun Storage 7000 Appliance and are available from both the context of the single instance and the Oracle Grid Control Reports Tab while 2 run against all monitored instances of Sun Storage 7000 Appliances and are only available from the Reports Tab. Among the 5 Reports are 2 that combine information about Databases deployed against NFS mount points and Sun Storage 7000 Appliances that export those NFS mount points. The two reports are:


  • Database to Appliance Mapping Report - Viewable from a single target instance or the Reports Tab, this report shows databases deployed against NFS shares from a single Sun Storage 7000 Target Instance
  • Federated Database to Appliance Mapping Report - Viewable only from the Reports Tab, this report shows databases deployed against NFS shares from all monitored Sun Storage 7000 Appliances

Looking at the "Master" (top-level) Database to Appliance Mapping Report (shown below) you will see a "Filter" (allowing you to scope the information in the table to a single Database SID) and a table that correlates the filtered Database SID to Network File Systems shared by specific appliances along with the Storage IP Address that the share is accessed through, the appliance's Storage Network Interface and the name that the appliance is referred to as throughout this Grid Control instance.

From the Master report, 4 additional links are provided to more detailed information that is filtered to the appliance abstraction that is used by the Database SID. The links in the columns navigate in the following way:


  • Database Link - This link takes the viewer to a drill-down table that shows all of the files deployed on the shares identified in the first table. With this detail report, an administrator can see exactly what files are deployed where. The table also contains the three links identified next.
  • Network File System - Takes the viewer down to a detailed report showing metadata about the share created on the appliance, how the cache is used (ARC and L2ARC) for this share and general capacity information for the share.
  • Storage IP Address - Takes the viewer to the Metric Details that relate to the appliance configuration (serial number, model, etc.).
  • Storage Network Interface - Takes the viewer to metadata about the network interface as well as reports on the Network Interface KB/sec and NFS Operations Per Second (combined with the NFS Operations Per Second that are allocated to serving the share that the database resides on)

The detail reports for the Network File System and Storage Network Interface (both of which are not directly accessible from the Reports Tab) use a combination of current metrics and graphical time-axis data, as shown in the following report:

Wherever applicable, the Detail Reports drill further into Metric Details (that could also be accessed through an appliance instance target home page).

It is important to note that several of these reports combine a substantial amount of data into a single page. This approach can create rather lengthy report generation times (up to 5 minutes in worst-case scenarios). It is always possible to view individual metrics through the monitoring home page. Because metric navigation is much more focused and relates to a single metric, it always performs faster and is preferred unless the viewer is looking for a more complex assembly of information. With the reports, an administrator can view network performance and storage performance side by side, which may be more helpful in diagnosing service delivery issues than navigating through single metric data points.

In addition to a substantial number of collected metrics there are several alerts that are generated on various appliance thresholds that can occur throughout the operation of target appliances.

Conclusion


Oracle Grid Control gives a fully integrated view of the "Vertical" data center, combining software infrastructure with hardware infrastructure (including storage appliances). Sun Storage 7000 Management Plug-in for Oracle Enterprise Manager 10g Grid Controller 1.0 presents Sun Storage 7000 Appliances within the vertical context and presents metrics and reports tailored specifically towards Sun Storage 7000 Appliances as viewed by a Database Administrator. For more information on the plug-in and software discussed in this entry:

Wednesday Dec 16, 2009

The SNMP Service on a Sun Storage 7000 Appliance

Without a doubt, SNMP rules the playground in terms of monitoring hardware assets, and many software assets, in a data center monitoring ecosystem. It is the single biggest integration technology I'm asked about and that I've encountered when discussing monitoring with customers.

Why does SNMP have such amazing staying power?


  • It's extensible (vendors can provide MIBs and extend existing MIBs)
  • It's simple (hierarchical data rules and really it boils down to GET, SET, TRAP)
  • It's ubiquitous (monitoring tools accept SNMP, systems deliver SNMP)
  • It operates on two models, real time (traps) and polling (get)
  • It has aged gracefully (security extensions in v3 did not destroy its propagation)

To keep the SNMP support in the Sun Storage 7000 Appliances relatively succinct, I am going to tackle this in two separate posts. This first post shows how to enable SNMP and what you get "out of the box" once it's enabled. The next post discusses how to deliver more information via SNMP (alerts with more information and threshold violations).

To get more information on SNMP on the Sun Storage 7000 and to download the MIBs that will be discussed here, go to the Help Wiki on a Sun Storage 7000 Appliance (or the simulator):


  • SNMP - https://[hostname]:215/wiki/index.php/Configuration:Services:SNMP

Also, as I work at Sun Microsystems, Inc., all of my examples of walking MIBs on a Sun Storage 7000 Appliance or receiving traps will be from a Solaris-based system. There are plenty of free / open source / trial packages for other Operating System platforms so you will have to adapt this content appropriately for your platform.

One more note as I progress in this series: all of my examples are from the CLI or from scripts, so you won't find many pretty pictures in the series :-)

Enabling SNMP on the Sun Storage 7000 Appliance gives you the ability to:


  • Receive traps (delivered via Sun's Fault Manager (FM) MIB)
  • GET system information (MIB-II System, MIB-II Interfaces, Sun Enterprise MIB)
  • GET information customized to the appliance (using the Sun Storage AK MIB)

Enabling alerts (covered in the next article) extends the SNMP support by delivering targeted alerts via the AK MIB itself.

Enable SNMP


The first thing we'll want to do is log into a target Sun Storage 7000 Appliance via SSH and check if SNMP is enabled.


aie-7110j:> configuration services snmp
aie-7110j:configuration services snmp> ls
Properties:
<status> = disabled
community = public
network =
syscontact =
trapsinks =

aie-7110j:configuration services snmp>

Here you can see it is currently disabled and that we have to set up all of the SNMP parameters. The most common community string to this day is "public", and as we will not be changing system information via SNMP we will keep it. For the "network" parameter we will use 0.0.0.0/0, which allows access to the MIB from any network. Finally, I will add a single trapsink so that any traps get sent to my management host. The last step shown is to enable the service once the parameters are committed.


aie-7110j:configuration services snmp> set network=0.0.0.0/0
network = 0.0.0.0/0 (uncommitted)
aie-7110j:configuration services snmp> set syscontact="Paul Monday"
syscontact = Paul Monday (uncommitted)
aie-7110j:configuration services snmp> set trapsinks=10.9.166.33
trapsinks = 10.9.166.33 (uncommitted)
aie-7110j:configuration services snmp> commit
aie-7110j:configuration services snmp> enable
aie-7110j:configuration services snmp> show
Properties:
<status> = online
community = public
network = 0.0.0.0/0
syscontact = Paul Monday
trapsinks = 10.9.166.33

From the appliance perspective we are now up and running!
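
To make the meaning of the "network" setting concrete, here is a small Python sketch of plain CIDR matching. It only illustrates what a value such as 0.0.0.0/0 means; it makes no claim about how the appliance itself enforces the setting, and the 10.9.166.0/24 subnet and the `allowed` helper are hypothetical.

```python
import ipaddress


def allowed(client_ip: str, network: str) -> bool:
    """Return True if client_ip falls inside the CIDR block `network`."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(network)


# 0.0.0.0/0 has a zero-length prefix, so every IPv4 address matches it.
print(allowed("10.9.166.33", "0.0.0.0/0"))      # True
print(allowed("192.0.2.7", "0.0.0.0/0"))        # True

# A narrower (hypothetical) management subnet only admits hosts inside it.
print(allowed("10.9.166.33", "10.9.166.0/24"))  # True
print(allowed("192.0.2.7", "10.9.166.0/24"))    # False
```

If you would rather not expose the MIB to every network, a narrower CIDR block in the same notation is the obvious thing to try.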

Get the MIBs and Install Them


As previously mentioned, all of the MIBs that are unique to the Sun Storage 7000 Appliance are also distributed with the appliance. Go to the Help Wiki and download them, then move them to the appropriate location for monitoring.

On the Solaris system I'm using, that location is /etc/sma/snmp/mibs. Be sure to browse the MIB for appropriate tables or continue to look at the Help Wiki as it identifies relevant OIDs that we'll be using below.
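
Once the files are copied, it can be handy to confirm that the modules are actually present before walking them. The following Python sketch is one way to do that: it scans a directory for the standard "NAME DEFINITIONS ::= BEGIN" header that opens a MIB module. The helper names are my own, and the module names SUN-AK-MIB and SUN-FM-MIB are taken from the command output later in this post; the file names on your system may differ.

```python
import re
from pathlib import Path

# A MIB module opens with "<MODULE-NAME> DEFINITIONS ::= BEGIN".
MODULE_RE = re.compile(
    r"^\s*([A-Za-z][A-Za-z0-9-]*)\s+DEFINITIONS\s*::=\s*BEGIN", re.MULTILINE
)


def installed_modules(mib_dir):
    """Map each MIB module name found in mib_dir to the file defining it."""
    root = Path(mib_dir)
    found = {}
    if not root.is_dir():
        return found
    for path in sorted(root.iterdir()):
        if path.is_file():
            match = MODULE_RE.search(path.read_text(errors="replace"))
            if match:
                found[match.group(1)] = path.name
    return found


def missing_modules(mib_dir, wanted=("SUN-AK-MIB", "SUN-FM-MIB")):
    """Return the wanted module names that no file in mib_dir defines."""
    found = installed_modules(mib_dir)
    return [name for name in wanted if name not in found]


if __name__ == "__main__":
    # The directory used in this post; adjust for your platform.
    print(missing_modules("/etc/sma/snmp/mibs"))
```

An empty list means both modules were found; anything else names the module you still need to download from the Help Wiki.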

Walking and GETting Information via the MIBs


Using standard SNMP operations, you can retrieve quite a bit of information. As an example from the management station, we will retrieve a list of shares available from the system using snmpwalk:


-bash-3.00# ./snmpwalk -c public -v 2c isv-7110h sunAkShareName
SUN-AK-MIB::sunAkShareName.1 = STRING: pool-0/MMC/deleteme
SUN-AK-MIB::sunAkShareName.2 = STRING: pool-0/MMC/data
SUN-AK-MIB::sunAkShareName.3 = STRING: pool-0/TestVarious/filesystem1
SUN-AK-MIB::sunAkShareName.4 = STRING: pool-0/oracle_embench/oralog
SUN-AK-MIB::sunAkShareName.5 = STRING: pool-0/oracle_embench/oraarchive
SUN-AK-MIB::sunAkShareName.6 = STRING: pool-0/oracle_embench/oradata
SUN-AK-MIB::sunAkShareName.7 = STRING: pool-0/AnotherProject/NoCacheFileSystem
SUN-AK-MIB::sunAkShareName.8 = STRING: pool-0/AnotherProject/simpleFilesystem
SUN-AK-MIB::sunAkShareName.9 = STRING: pool-0/default/test
SUN-AK-MIB::sunAkShareName.10 = STRING: pool-0/default/test2
SUN-AK-MIB::sunAkShareName.11 = STRING: pool-0/EC/tradetest
SUN-AK-MIB::sunAkShareName.12 = STRING: pool-0/OracleWork/simpleExport

Next, I can use snmpget to obtain a mount point for the first share:

-bash-3.00# ./snmpget -c public -v 2c isv-7110h sunAkShareMountpoint.1
SUN-AK-MIB::sunAkShareMountpoint.1 = STRING: /export/deleteme
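
If you want to feed this output into a script, the lines are easy to pick apart. Here is a minimal Python sketch, assuming the default net-snmp output form shown above (MODULE::object.index = TYPE: value); the `parse_walk` helper is hypothetical and the sample is a subset of the walk output from this post.

```python
import re

# Matches net-snmp's default output form: MODULE::object.index = TYPE: value
LINE_RE = re.compile(
    r"^(?P<module>[\w-]+)::(?P<object>\w+)\.(?P<index>\S+)"
    r" = (?P<type>[\w-]+): (?P<value>.*)$"
)


def parse_walk(output):
    """Return {index: value} for every line that looks like a walk result."""
    rows = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows[match.group("index")] = match.group("value")
    return rows


sample = """\
SUN-AK-MIB::sunAkShareName.1 = STRING: pool-0/MMC/deleteme
SUN-AK-MIB::sunAkShareName.2 = STRING: pool-0/MMC/data
SUN-AK-MIB::sunAkShareName.4 = STRING: pool-0/oracle_embench/oralog
"""

shares = parse_walk(sample)
# Keep only the shares that belong to the oracle_embench project.
oracle = {i: s for i, s in shares.items() if "/oracle_embench/" in s}
print(oracle)  # {'4': 'pool-0/oracle_embench/oralog'}
```

The index kept as the dictionary key is the same one you would append to sunAkShareMountpoint to look up that share's mount point.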

It is also possible to get a list of problems on the system, each identified by its UUID:

-bash-3.00# ./snmpwalk -c public -v 2c isv-7110h sunFmProblemUUID
SUN-FM-MIB::sunFmProblemUUID."91e97860-f1d1-40ef-8668-dc8fb85679bb" = STRING: "91e97860-f1d1-40ef-8668-dc8fb85679bb"

And then turn around and retrieve the associated knowledge article identifier:

-bash-3.00# ./snmpget -c public -v 2c isv-7110h sunFmProblemCode.\"91e97860-f1d1-40ef-8668-dc8fb85679bb\"
SUN-FM-MIB::sunFmProblemCode."91e97860-f1d1-40ef-8668-dc8fb85679bb" = STRING: AK-8000-86
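
The double quotes around the UUID are part of the OID text, which is why they have to be escaped at the shell prompt. When scripting this, one way to sidestep the escaping is to hand snmpget an argument list directly instead of going through a shell. The Python sketch below assumes snmpget is on the PATH (in my session it is run from its own directory), and both helper names are hypothetical.

```python
import subprocess


def snmpget_args(host, column, uuid, community="public", version="2c"):
    """Build an argv list for snmpget using a string-indexed OID."""
    # The double quotes belong to the OID itself. No shell parses this
    # list, so they need no backslash escaping.
    return ["snmpget", "-c", community, "-v", version, host,
            f'{column}."{uuid}"']


def problem_code(host, uuid):
    """Run snmpget and return the text after 'STRING: ', or None."""
    result = subprocess.run(
        snmpget_args(host, "sunFmProblemCode", uuid),
        capture_output=True, text=True, check=True,
    )
    _, marker, value = result.stdout.strip().partition("STRING: ")
    return value if marker else None


print(snmpget_args("isv-7110h", "sunFmProblemCode",
                   "91e97860-f1d1-40ef-8668-dc8fb85679bb"))
```

Only `snmpget_args` runs in the example; `problem_code` needs a reachable appliance and will raise if the command fails.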

The FM-MIB does not contain information on severity, but using the problem UUID I can SSH into the system and retrieve that information:

isv-7110h:> maintenance logs select fltlog select uuid="91e97860-f1d1-40ef-8668-dc8fb85679bb"
isv-7110h:maintenance logs fltlog entry-005> ls
Properties:
timestamp = 2009-12-15 05:55:37
uuid = 91e97860-f1d1-40ef-8668-dc8fb85679bb
desc = The service processor needs to be reset to ensure proper functioning.
type = Major Defect

isv-7110h:maintenance logs fltlog entry-005>
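
If you capture that CLI output in a script, the "Properties:" listing splits cleanly on the first "=" of each line. Here is a minimal Python sketch, assuming the output format shown in this post; `parse_properties` is a hypothetical helper, not something shipped with the appliance.

```python
def parse_properties(text):
    """Parse the 'name = value' lines that follow a 'Properties:' header."""
    props = {}
    in_block = False
    for line in text.splitlines():
        line = line.strip()
        if line == "Properties:":
            in_block = True
        elif in_block:
            if not line:
                break  # a blank line ends the property list
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


sample = """\
isv-7110h:maintenance logs fltlog entry-005> ls
Properties:
timestamp = 2009-12-15 05:55:37
uuid = 91e97860-f1d1-40ef-8668-dc8fb85679bb
desc = The service processor needs to be reset to ensure proper functioning.
type = Major Defect

isv-7110h:maintenance logs fltlog entry-005>
"""

entry = parse_properties(sample)
print(entry["type"])  # Major Defect
```

Because the split happens on the first "=" only, values that are empty (as with an unset property) or that contain "=" themselves still parse.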

Take time to inspect the MIBs through your MIB Browser to understand all of the information available. I tend to shy away from using SNMP for getting system information and instead write scripts and workflows, as much more information is available directly on the system. I'll cover this in a later article.

Receive the Traps


Trap receiving on Solaris is a piece of cake, at least for demonstration purposes. What you choose to do with the traps is a whole different process. Each tool has its own trap monitoring facilities that will hand you the fields in different ways. For this example, Solaris just dumps the traps to the console.

Locate the "snmptrapd" binary on your Solaris system and start monitoring:


-bash-3.00# cd /usr/sfw/sbin
-bash-3.00# ./snmptrapd -P
2009-12-16 09:27:47 NET-SNMP version 5.0.9 Started.

From there you can wait for something to go wrong with your system or you can provoke it yourself. Fault Management can be a bit difficult to provoke intentionally, since things one thinks would provoke a fault are actually administrator activities. Pulling a disk drive is very different from a SMART drive error on a disk drive. Similarly, pulling a Power Supply is different from tripping over a power cord and yanking it out. The former is not a fault since it is a complex operation requiring an administrator to unseat the power supply (or disk), whereas the latter occurs out in the wild all the time.

Here are some examples of FM traps I've received through this technique using various "malicious" techniques on a lab system ;-)

Here is an FM Trap when I "accidentally" tripped over a power cord in the lab. Be careful when you do this so you don't pull the system off the shelf if it is not racked properly (note that I formatted this a little bit from the raw output):


2009-11-16 12:25:34 isv-7110h [172.20.67.78]:
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1285895753) 148 days, 19:55:57.53
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: "2c7ff987-6248-6f40-8dbc-f77f22ce3752"
SUN-FM-MIB::sunFmProblemCode."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: SENSOR-8000-3T
SUN-FM-MIB::sunFmProblemURL."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: http://sun.com/msg/SENSOR-8000-3T

Notice again that I have a sunFmProblemUUID that I can use to shell into the system and obtain more details (similar to what was shown in the last section). Again, the next article will contain an explanation of Alerts. Using the AK MIB and Alerts, we can get many more details pushed out to us via an SNMP Trap, and we have finer granularity as to the alerts that get pushed.

Here, I purchased a very expensive fan stopper-upper device from a fellow tester. It was quite pricey; it turns out it is also known as a "Twist Tie". Do NOT do this at home. Seriously, the decreased air flow through the system can cause hiccups in your system.


DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1285889746) 148 days, 19:54:57.46
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."cf480476-51b7-c53a-bd07-c4df59030284" = STRING: "cf480476-51b7-c53a-bd07-c4df59030284"
SUN-FM-MIB::sunFmProblemCode."cf480476-51b7-c53a-bd07-c4df59030284" = STRING: SENSOR-8000-26
SUN-FM-MIB::sunFmProblemURL."cf480476-51b7-c53a-bd07-c4df59030284" = STRING: http://sun.com/msg/SENSOR-8000-26
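
For anyone who wants to do more than watch traps scroll by, the three FM varbinds can be pulled out of the snmptrapd text with a regular expression. This Python sketch assumes the textual form shown in the traps above; `parse_fm_trap` is a hypothetical helper and the sample is taken from the power cord trap.

```python
import re

# One FM varbind: SUN-FM-MIB::sunFmProblem<Field>."<uuid>" = STRING: <value>
# The value may or may not be quoted; it ends at a quote, tab or newline.
VARBIND_RE = re.compile(
    r'SUN-FM-MIB::sunFmProblem(UUID|Code|URL)\."([0-9a-f-]+)"'
    r' = STRING: "?([^"\t\n]*)"?'
)


def parse_fm_trap(text):
    """Group the FM varbinds in a trap dump by problem UUID."""
    problems = {}
    for field, uuid, value in VARBIND_RE.findall(text):
        problems.setdefault(uuid, {})[field.lower()] = value.strip()
    return problems


sample = '''\
SNMPv2-MIB::snmpTrapOID.0 = OID: SUN-FM-MIB::sunFmProblemTrap
SUN-FM-MIB::sunFmProblemUUID."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: "2c7ff987-6248-6f40-8dbc-f77f22ce3752"
SUN-FM-MIB::sunFmProblemCode."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: SENSOR-8000-3T
SUN-FM-MIB::sunFmProblemURL."2c7ff987-6248-6f40-8dbc-f77f22ce3752" = STRING: http://sun.com/msg/SENSOR-8000-3T
'''

for uuid, fields in parse_fm_trap(sample).items():
    print(uuid, fields["code"], fields["url"])
```

Keying on the UUID keeps the code and URL together with the identifier you need for the fault log lookup shown earlier.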

You will receive many, many other traps throughout the day including the Enterprise MIB letting us know when the system starts up or any other activities.

Wrap it Up


In this article, I illustrated enabling the SNMP Service on the Sun Storage 7000 Appliance via an SSH session. I also showed some basic MIB walking and traps that you'll receive once SNMP is enabled.

This is really just the "start" of the information we can push through the SNMP pipe from a system. In the next article I'll show how to use Alerts on the system with the SNMP pipe so you can have more control over the events on a system that you wish to be notified about.

About

pmonday
