Thursday Oct 01, 2015

Changing an Asset's Name

I got a question about an incorrect asset name:

"I discovered a server, but when I discovered it, it was named according to its IP address because of a DNS issue. The incorrect name in the UI angers me. How do I fix it?"

This is relatively simple. In the navigation section, select the All Assets node at the root of the asset tree, then select the Managed Assets tab. Select the incorrectly named asset, and click the pencil icon to edit its properties, including its name. Click Finish to save the changes.

Thursday Sep 24, 2015

Best Practices for EC Backups

Ops Center has a backup and recovery feature for the Enterprise Controller - you can save the current EC state as a backup file, and restore the EC to that state using the file. It's an important feature, but I've seen a few folks asking for guidelines about how to use it. Every site is different, but here are some broad guidelines that we recommend:

  • Perform a backup at least once a week, and keep at least two backup files.
  • Once you've made a backup file, store it offsite or on a NAS share - don't keep it locally on the EC.
  • You can use a cron job to automate regular backups. Here's a sample cron job to perform a backup:
    0 0 * * 0 /opt/SUNWxvmoc/bin/ecadm backup -o /bigdisk/oc_backups -l /bigdisk/oc_backups
  • Remember that some files and directories are not part of the EC backup for size reasons: ISOs, FLARs, firmware images, and Solaris 8-10 and Linux patches.
    Firmware images are automatically re-downloaded in Connected Mode. ISOs and FLARs can be re-imported. You can also do separate backups of your Ops Center libraries via NetBackup or the like.
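
If you automate backups with cron, you'll probably also want to prune old files so the share doesn't fill up. Here's a minimal sketch of one way to do that. It isn't something that ships with Ops Center: the prune_backups function name is made up for this example, and it assumes the directory you point it at holds nothing but backup files with ordinary names (no newlines).

```shell
# Hypothetical helper (not part of Ops Center):
# keep only the newest KEEP files in backup directory DIR.
# Usage: prune_backups DIR KEEP
prune_backups() {
    dir=$1
    keep=$2
    # List files newest first, skip the first KEEP entries, delete the rest.
    ls -1t "$dir" | tail -n +"$((keep + 1))" | while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}
```

For example, running prune_backups /bigdisk/oc_backups 2 after the weekly backup would keep the two most recent files, which matches the minimum suggested above.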

Some folks have also asked if there's a good way to test the backup and recovery procedure, to make sure it's working. Well, there's really only one way to do it - do an EC backup, and also back up or clone the file systems. Then, uninstall and reinstall the EC, restore from the backup, and make sure that everything looks right.

Take a look at the Backup and Recovery chapter for more information about how to perform a backup.

Thursday Sep 17, 2015

Blank Credentials and Monitoring Delays

I saw a couple of unrelated but short questions this week, so I thought I'd answer them both.

"I tried to edit the credentials used by the Enterprise Controller to access My Oracle Support, but when I open the credentials window, the password field was blank, even though there should be an existing password. What's going on?"

So, naturally, once you've entered a password, we don't want to send that password back to the UI, because it'd be a security risk. In 12.3, though, the asterisks to indicate an existing password aren't showing up. So, your credentials are still there, and they won't be changed unless you specifically enter new credentials and save them.

"I was trying to make sure that file system monitoring was working correctly on a managed system. I made a file to push utilization up past the 90% threshold, which should've generated an incident. However, the incident didn't show up for almost an hour. Why is there a delay?"

You can edit the Alert Monitoring Rule Parameters in Ops Center. However, a threshold has to be exceeded continuously for a certain amount of time before an alert is generated. For a file system utilization alert, the default value for this delay is 45 minutes, which accounts for most of the wait you saw. You can edit the monitoring rule to change this delay if needed.
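
To make the idea of a sustained threshold concrete, here's a small sketch. It is not how Ops Center implements monitoring internally; it's just an illustration of the logic. The sustained_breach function name is hypothetical, and the 5-minute sampling interval in the comment is an assumption chosen only to make the arithmetic easy.

```shell
# Illustration only (not Ops Center code).
# Usage: sustained_breach THRESHOLD REQUIRED_SAMPLES SAMPLE...
# Succeeds (exit status 0) only if the most recent REQUIRED_SAMPLES
# samples are all above THRESHOLD.
# Assumed example: a 45-minute delay with one sample every 5 minutes
# would mean REQUIRED_SAMPLES=9.
sustained_breach() {
    threshold=$1
    required=$2
    shift 2
    streak=0
    for sample in "$@"; do
        if [ "$sample" -gt "$threshold" ]; then
            streak=$((streak + 1))   # still above the threshold
        else
            streak=0                 # any dip resets the clock
        fi
    done
    [ "$streak" -ge "$required" ]
}
```

With those assumed numbers, a file system that sits at 95% for only eight samples wouldn't raise anything yet, and a single dip below 90% starts the wait over.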

Thursday Sep 10, 2015

Updating the OCDoctor on a Managed System

There was a new feature introduced in version 4.38 of the OCDoctor script which has been causing some confusion, so I thought I'd explain it a bit.

Beginning with version 4.38, when you run the OCDoctor script with the --update option on a managed system, the OCDoctor script looks for a newer version on the Enterprise Controller, rather than using external download sites. In connected mode, the Enterprise Controller runs a recurring job to download the latest OCDoctor, which the managed systems can then reach.

This makes updates more feasible if you're in a dark site, and minimizes external connections in other sites. However, if you've downloaded the OCDoctor manually on the EC, you will need to place the OCDoctor zip file in the /var/opt/sun/xvm/images/os/others/ directory on the Enterprise Controller so that managed systems can download it.

Thursday Sep 03, 2015

Installing Ops Center in a Zone

I got a question recently about an Ops Center deployment:

"I'm looking at installing an Enterprise Controller, co-located Proxy Controller, and database inside an Oracle Solaris 11 Zone. Is this doable, and are there any special things I should do to make it work?"

You can install all of these components in an S11 zone. There are a few things that you should do beforehand:

Limit the ZFS ARC size in the global zone. Without a limit, the ZFS ARC can consume memory that should be released. The recommended ZFS ARC size given in the Sizing and Performance guide is equal to (Physical memory - Enterprise Controller heap size - Database memory) x 70%. For example:

  # limit ZFS memory consumption, example (tune memory to your system):
  echo "set zfs:zfs_arc_max=1024989270" >>/etc/system
  echo "set rlim_fd_cur=1024" >>/etc/system
  # set Oracle DB FDs
  projmod -s -K "process.max-file-descriptor=(basic,1024,deny)" user.root
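
If you'd rather compute the ARC limit than eyeball it, the formula above works out like this. This is just a sketch of the arithmetic: the arc_max_bytes function name is made up, and the memory figures in the comment (32 GB physical, 8 GB EC heap, 8 GB database) are placeholder assumptions, not recommendations.

```shell
# Worked arithmetic for the ARC formula (hypothetical helper).
# Usage: arc_max_bytes PHYS_GB EC_HEAP_GB DB_GB
# Prints (physical - EC heap - database memory) x 70%, in bytes.
arc_max_bytes() {
    free_gb=$(( $1 - $2 - $3 ))
    echo $(( free_gb * 1024 * 1024 * 1024 * 70 / 100 ))
}

# Assumed example values: arc_max_bytes 32 8 8 prints 12025908428
```

You could then use the result as the value for zfs:zfs_arc_max in /etc/system, in place of the example number shown above.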

Make sure the global zone has enough swap space configured. The recommended swap space for an EC is twice the physical memory if the physical memory is less than 16 GB, or 16 GB otherwise. For example:

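  # Read the swap volume size in bytes (-p prints exact, unsuffixed values)
  volsize=$(zfs get -Hp -o value volsize rpool/swap)
  target=$((16 * 1024 * 1024 * 1024))   # 16 GB, for a system with at least 16 GB of memory
  if (( volsize < target )); then zfs set volsize=16G rpool/swap; \
  else echo "Swap size sufficient at: $((volsize / 1024 / 1024 / 1024))G"; fi
  zfs list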
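
The swap rule itself is simple enough to write down as a function, which can be handy if you're sizing several systems. This is only a sketch of the rule as stated above; recommended_swap_gb is a hypothetical name, and it works in whole gigabytes.

```shell
# Sketch of the swap sizing rule (hypothetical helper).
# Usage: recommended_swap_gb PHYS_GB
# Prints twice the physical memory if it is under 16 GB, otherwise 16.
recommended_swap_gb() {
    if [ "$1" -lt 16 ]; then
        echo $(( $1 * 2 ))
    else
        echo 16
    fi
}
```

So a system with 8 GB of memory would want 16 GB of swap, and a system with 64 GB would also want 16 GB.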

In the non-global zone that you're using for the install, set the ulimit:

  echo "ulimit -Sn 1024">>/etc/profile

Finally, run the OCDoctor to check the prerequisites before you install.

Thursday Aug 27, 2015

How Many Systems Can Ops Center Manage?

I saw a question about how many systems you can manage through Ops Center. This is an important question when you're planning a deployment, or looking at expanding an existing deployment.

In general terms, an Enterprise Controller can manage up to 3,000 assets. A Proxy Controller can manage between 350 and 450 assets, although you'll get better performance if there are fewer assets.
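
For rough planning, those numbers turn into a simple ceiling division. Here's a sketch of that arithmetic; the proxies_needed name is made up, and the result is only a starting estimate, since the real answer depends on your network layout and the guidance in the sizing documentation.

```shell
# Rough planning arithmetic (hypothetical helper).
# Usage: proxies_needed ASSETS ASSETS_PER_PROXY
# Prints ASSETS / ASSETS_PER_PROXY, rounded up.
proxies_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}
```

For a full 3,000-asset Enterprise Controller, that comes out to 7 Proxy Controllers at 450 assets each, or 9 at the more comfortable 350 each.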

The Sizing and Performance guide has more detailed information about the requirements and sizing guidelines for Ops Center.

Thursday Aug 13, 2015

Recovering After a Proxy Controller Crash

I saw a question recently about how to restore your environment if a remote Proxy Controller system fails. This is a good question, and there are a few facets to the answer, depending on your environment.

Recovery is easiest if you have recently backed up the Proxy Controller. The backup file includes asset data, so if you can restore the PC using the backup file, you should be golden.

If you don't have a backup of the Proxy Controller, it's going to take a bit more work. First, you have to migrate the dead PC's assets to another, working PC. If you have automatic failover enabled, this happens automatically (hence the name); otherwise you can do it manually.

Then, you can install a new Proxy Controller (using the Linux or Solaris procedure), and migrate the assets to that PC.

Thursday Jul 30, 2015

Recovering LDoms From a Failed Server

I saw a recent question about Logical Domain recovery: If you have a control domain installed on a server and the server goes down with a hardware fault, what options do you have for recovering the logical domains from that control domain?

The answer is that you have options depending on how your environment is configured:

  • If you have the control domain in a server pool, and you have enabled automatic recovery on the LDom, you have the option of watching as the LDom is automatically brought back up on another control domain in the server pool.
  • If the control domain is in a server pool but you didn't enable automatic recovery, you can still manually migrate the guests by deleting the control domain asset. The guests will then be put in the Shutdown Guests list for the server pool, and you can bring them up on another control domain.
  • If you want to add the failed control domain back in, before you rediscover it and put it back in the server pool, you should log in and make sure that the guest OS isn't running, to avoid split brain issues.

Take a look at the Recover Logical Domains from a Failed Server how-to for more information.

Thursday Jun 11, 2015

Providing Contact Info for ASR

Ops Center includes a feature called Auto Service Request, which can automatically file service requests for managed hardware. However, I've seen a bit of confusion about how to get it running.

First, the prereqs - to get ASR running, you need to be in connected mode, and you need to have a set of My Oracle Support (MOS) credentials entered in the Edit Authentications window. Your MOS credentials have to be associated with a customer service identifier (CSI) with rights over the hardware that you want to be enabled for ASR.

Once you've got that, you'll click the Edit ASR Contact Information action in the Administration section. This opens a window where you specify the default contact information for your assets, which is used for all ASRs by default.

If you have assets that need separate contact information, you can specify separate ASR contact information for an asset or a group of assets. That info is used in place of the default contact info.

Finally, once you've got the contact info in the system, you click Enable ASR. This action launches a job to enable the assets for ASR, and it attempts to enable new assets for ASR when they're discovered. From then on, if a critical incident occurs on the hardware, ASR should create a service request for it.

Take a look at the Auto Service Request chapter of the Admin Guide for more information.

Thursday Jun 04, 2015

Enterprise Controllers in Logical Domains

I saw a few questions about installing Enterprise Controllers in Logical Domains, and what's possible with that sort of deployment. Here are some answers:

"Is it supported to install the Enterprise Controller in a Logical Domain?"

Yep. The Certified Systems Matrix lists the supported OSes for EC installation, and Oracle VM Server for SPARC is supported (as are some Oracle Solaris Zones).

"Can you use Oracle Solaris Cluster to provide High Availability for an Enterprise Controller installed on a Logical Domain?"

Yes, this is possible. It deserves its own post, so I'll go into more detail on it soon, but yes, it works.

"If I have two Enterprise Controllers installed on Logical Domains, can I have EC 1 discover and manage the LDom for EC 2, and vice versa?"

No. The Agent Controllers installed on EC and PC systems are different from standard Agents, and if you install an Agent from one EC on another EC's system, it's going to get confused.

Thursday May 28, 2015

Uploading and Deploying Oracle Solaris 11 Files

I saw a question recently about uploading flat files, such as config files or tarballs, to an Oracle Solaris 11 library and then deploying them to Oracle Solaris 11 servers. This is an easy task for Oracle Solaris 8, 9, or 10, but it's trickier to figure out with Oracle Solaris 11.

Here are the steps to upload and deploy such files with Oracle Solaris 11 in Ops Center, using our software library for the content.

1. Create an Oracle Solaris 11 pkg which contains the config files. Here's an example for how to do so:
2. Add that pkg to the repository. (The above example also covers this step.)
3. Sync Ops Center with the repository so that the new pkg is added to Ops Center's catalog of software.
4. Create an Ops Center Oracle Solaris 11 Profile that installs the pkg created in Step 1.
5. Apply the profile in an update plan to the target systems.

For more information about OS Profiles, see the OS Updates chapter.

Thursday May 21, 2015

Special Database Options

When you're installing Ops Center, you have two options for the product database: You can use an embedded database that's automatically installed on the Enterprise Controller and managed by Ops Center, or you can use a remote database that you manage yourself.

With regard to the customer-managed database, I saw an important question recently: When you install this database, do you have to enable any of the advanced or special features? Some folks want to use the bare minimum installation for security reasons.

The answer here is that Ops Center only requires the base installation; no special features are used. As long as you're using one of the DB versions listed in the Certified Systems Matrix, you're golden.

Thursday May 14, 2015

Supported OS, LDom, and Firmware Versions

A couple of weeks ago, I did a post about the supported versions for LDoms. As of Ops Center version 12.2.2, LDoms 3.2 is not supported, although the latest Oracle Solaris 11.2 SRU 8 comes with it. Well, this raised a couple of followup questions that I thought I should answer:

"If LDoms 3.2 isn't supported, are new versions of Oracle Solaris that contain it supported?"

The OS versions themselves are supported, yes. If you have a non-virtualized S11.2 OS, you can upgrade to SRU 8 without difficulty. It's only on LDoms systems where you should avoid SRUs that contain LDoms 3.2 until it's officially supported.

"The Certified Systems Matrix recommends using the latest firmware for managed servers. Does the latest firmware have a minimum OS level?"

No, the firmware and OS levels are independent. Even if you have an LDoms system that's using an earlier version of S11.2, updating the hardware underneath it to use the latest firmware shouldn't cause problems.

EDIT: Ops Center 12.3 supports LDoms 3.2.

Thursday May 07, 2015

Clustered Ops Center Installation

Today's question from an Ops Center user:

"Can we cluster Ops Center deployments using, say, Solaris Cluster? How would we do it?"

Well, there are a couple of possible ways that you can install the Enterprise Controller so that it can fail over.

The first method is to use the documented HA installation, which uses Oracle Clusterware, two or more Enterprise Controller systems, and a remote customer-managed database. The procedures for this kind of installation are documented in the Oracle Solaris and Linux install guides.

The second is to install the Enterprise Controller in an LDom controlled by Oracle Solaris Cluster, and then have the Enterprise Controller fail over between hosts via Cluster. You can use an embedded database or a remote database with this solution.

Thursday Apr 30, 2015

Using Maintenance Mode

So, after last week's post about blacklisting assets for Ops Center, a couple of people pointed out that there's another - probably easier - way of temporarily stopping an asset from generating ASRs if you're doing maintenance on it: Putting the asset in maintenance mode.

Putting an asset in maintenance mode stops it from generating new incidents, so that when you power off or reconfigure it, Ops Center doesn't freak out. Ops Center doesn't stop managing the asset, and you can then disable maintenance mode when you're done.

Bear in mind that Ops Center will also treat the asset as though it's about to go down: If you put a Proxy Controller in maintenance mode it can't run jobs, and if you put an Oracle VM Server for SPARC in maintenance mode Ops Center will try to migrate its guests to another system in the server pool, or stop them if no other system is available.

Take a look at the Incidents chapter in the Feature Reference Guide for more information about maintenance mode.


This blog discusses issues encountered in Ops Center and highlights the ways in which the documentation can help you.

