Thursday Nov 19, 2015

Oracle VM Server for SPARC Support

I saw a couple of related questions about Oracle VM Server for SPARC 3.3 support recently:

Are T7 servers with Oracle VM Server for SPARC 3.3 supported in Ops Center?

Yep. You can take a look at the Certified Systems Matrix to see the full list of supported systems.

What if I want to downgrade systems to Oracle VM Server for SPARC 3.2?

That's not something we've tested or certified.

Are live migration and automatic recovery supported for Oracle VM Server for SPARC (OVMSS) 3.3?

Yes, the full feature set for OVMSS is supported for version 3.3. One issue worth mentioning, though, is that the option for setting the maximum bandwidth for a guest doesn't work properly in 3.3; we're working on that issue.

Thursday Nov 12, 2015

Updating an Oracle Solaris 8 system

Last week, we talked a bit about how to update branded zones. I saw a question this week about upgrading Oracle Solaris 8 systems, and branded zones figure into the answer here too.

"I've got some legacy applications running on some Oracle Solaris 8 systems. I'd like to get them running on S11 if possible. What can I do to upgrade these systems?"

Solaris 8 is old enough that there isn't an upgrade path all the way to Oracle Solaris 11. If you have legacy applications running on Solaris 8, and you want to move to a newer OS, you have a couple of options.

If you can move your applications onto S11, then you can use Ops Center to provision new S11 systems and then get the applications running on the new system.

However, some applications might not work on S11. In that case, you could create a Flash Archive (FLAR) of the existing S8 system and use that FLAR to make an S8 branded zone on an S10 system. It's not S11, but it's not too shabby.
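If you take the branded-zone route, the native Solaris steps look roughly like the sketch below. This is not an Ops Center procedure. It assumes the Oracle Solaris 8 Containers software is installed on the S10 host, and the zone name s8-zone, the zone path, and the archive path are hypothetical placeholders. The helper only prints the zonecfg commands; the install commands are left as comments so nothing changes when it runs.

```sh
#!/bin/sh
# s8_zonecfg_cmds ZONEPATH
# Print zonecfg commands for a Solaris 8 branded zone on a Solaris 10 host.
# ZONEPATH is a hypothetical placeholder; use a path on your own storage.
s8_zonecfg_cmds() {
    printf '%s\n' \
        'create -t SUNWsolaris8' \
        "set zonepath=$1" \
        'verify' \
        'commit'
}

# Example (not run here). On the Solaris 8 system, create the archive:
#   flarcreate -S -n s8-system /net/server/s8-system.flar
# Then, on the Solaris 10 host (zone name and paths are placeholders):
#   s8_zonecfg_cmds /zones/s8-zone > /tmp/s8-zone.cfg
#   zonecfg -z s8-zone -f /tmp/s8-zone.cfg
#   zoneadm -z s8-zone install -u -a /net/server/s8-system.flar
```

Generating the command file keeps the zone definition reviewable before you apply it; add network and resource settings to it as your environment requires.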

Thursday Nov 05, 2015

Branded Zones Questions

Branded Zones are handy if you want to run Oracle Solaris 10 zones on top of an S11 platform. However, they do work a bit differently from other zones in Ops Center. I've received a couple of questions about branded zones that I thought I'd try to clear up:

"I have a Control Domain with an Oracle Solaris 10 branded zone on it. Can I upgrade the control domain without Ops Center getting confused by the branded zone?"

Yes. When you upgrade a control domain or OS with a branded zone on it, the branded zone is skipped by design.

"Okay, so how do I upgrade the branded zone itself?"

Patching on a branded zone is similar to patching a global zone. If the branded zone doesn't have an agent, you can switch to agent management using the Switch Management Access option, then patch the branded zone normally.

Thursday Oct 15, 2015

Checking the Health of an Ops Center System

I saw a general question about keeping Ops Center running:

"I want to do a regular diagnostic to make sure that my Enterprise Controller and Proxy Controller systems are healthy. What tools can I use to do that?"

There are a few tools that you can use for this purpose.

First, there's the UI. The Administration section in the UI shows the status of all Proxy Controllers, and the current status of the Enterprise Controller services.

Outside of the UI, there are a number of tools bundled with the OCDoctor script that you can use. The OCDoctor and its toolbox are in the /var/opt/sun/xvm/OCDoctor/ folder on the Enterprise Controller and Proxy Controllers. Here are some of the options and tools you can use:

  • --update  - You should run this first, to make sure that you've got the latest version of the OCDoctor.
  • --troubleshoot  - This option will provide troubleshooting information for some known issues. You can also add the --fix option to apply fixes for some of these issues.
  • --check-connectivity  - This option will check for connectivity issues.
  • toolbox/ -b  - This script checks for issues with your libraries.
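To run these checks in one pass, you could wrap them in a small script like the sketch below. The OCDoctor.sh file name and the OCD_SCRIPT variable are assumptions (adjust them to what you find in /var/opt/sun/xvm/OCDoctor/), and treating a nonzero exit status as a failed check is also an assumption about how the script reports problems. The --fix option is deliberately left out because it changes the system, and the toolbox library check is omitted.

```sh
#!/bin/sh
# Run the OCDoctor health-check options listed above, in order.
# OCD_SCRIPT is an assumption: override it if your OCDoctor lives elsewhere.
OCD_SCRIPT=${OCD_SCRIPT:-/var/opt/sun/xvm/OCDoctor/OCDoctor.sh}

ocd_health_check() {
    status=0
    # --update goes first so the later checks use the newest OCDoctor.
    for opt in --update --troubleshoot --check-connectivity; do
        echo "== OCDoctor $opt"
        # Assumption: a nonzero exit status means the check found a problem.
        "$OCD_SCRIPT" "$opt" || { echo "OCDoctor $opt failed" >&2; status=1; }
    done
    return "$status"
}
```

Running ocd_health_check on the Enterprise Controller and each Proxy Controller gives you one report per system; review the output before applying any fixes with --troubleshoot --fix.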

These options should help you stay aware of how Ops Center is doing.

Thursday Oct 08, 2015

Agent versus Agentless Management

I've seen a couple of questions recently about the differences between Agent managed and Agentless assets, so I thought I'd explain the differences and the relative merits.

You can manage operating systems and virtualization technologies in one of two ways - by installing an Ops Center Agent on them, or by providing Ops Center with credentials that it can use to reach the asset. The agent is tailored to the system - there are separate types of Agents for Zones and LDoms.

If you don't want to have anything else installed on a system, agentless management can provide management and monitoring capabilities. However, there are some features that aren't available for assets that are managed agentlessly. There's a table in the OS Management chapter that explains what features are and are not available with agentless assets.

Thursday Oct 01, 2015

Changing an Asset's Name

I got a question about an incorrect asset name:

"I discovered a server, but when I discovered it, it was named according to its IP address because of a DNS issue. The incorrect name in the UI angers me. How do I fix it?"

This is relatively simple. In the navigation section, select the All Assets node at the root of the asset tree, then select the Managed Assets tab. Select the incorrectly named asset, and click the pencil icon to edit its properties (including name). Click Finish to save the changes.

Thursday Sep 24, 2015

Best Practices for EC Backups

Ops Center has a backup and recovery feature for the Enterprise Controller - you can save the current EC state as a backup file, and restore the EC to that state using the file. It's an important feature, but I've seen a few folks asking for guidelines about how to use it. Every site is different, but here are some broad guidelines that we recommend:

  • Perform a backup at least once a week, and keep at least two backup files.
  • Once you've made a backup file, store it offsite or on a NAS share - don't keep it locally on the EC.
  • You can use a cron job to automate regular backups. Here's a sample crontab entry that runs a backup every Sunday at midnight:
    0 0 * * 0 /opt/SUNWxvmoc/bin/ecadm backup -o /bigdisk/oc_backups -l /bigdisk/oc_backups
  • Remember that some files and directories are not part of the EC backup for size reasons: ISOs, FLARs, firmware images, and Solaris 8-10 and Linux patches.
    Firmware images are automatically re-downloaded in Connected Mode. ISOs and FLARs can be re-imported. You can also back up your Ops Center libraries separately with NetBackup or a similar tool.
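Keeping at least two backup files means older ones have to be pruned at some point. Below is a minimal sketch of a pruning helper; the function name is hypothetical, and it assumes a dedicated directory that holds nothing but EC backup files, with names that contain no newlines.

```sh
#!/bin/sh
# prune_backups DIR [KEEP]
# Delete all but the KEEP (default 2) most recently modified entries in DIR.
# Assumes DIR holds only EC backup files and names contain no newlines.
prune_backups() {
    dir=$1
    keep=${2:-2}
    # Refuse to delete everything by accident.
    [ "$keep" -ge 1 ] || { echo "KEEP must be at least 1" >&2; return 1; }
    # ls -t sorts newest first; tail skips the KEEP newest entries.
    ls -1t "$dir" | tail -n +"$((keep + 1))" | while IFS= read -r f; do
        rm -f -- "$dir/$f"
    done
}
```

You could call it from the same crontab shortly after the backup job, once the newest file has been copied offsite or to your NAS share.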

Some folks have also asked if there's a good way to test the backup and recovery procedure, to make sure it's working. Well, there's really only one way to do it - do an EC backup, and also back up or clone the file systems. Then, uninstall and reinstall the EC, restore from the backup, and make sure that everything looks right.

Take a look at the Backup and Recovery chapter for more information about how to perform a backup.

Thursday Sep 17, 2015

Blank Credentials and Monitoring Delays

I saw a couple of unrelated but short questions this week, so I thought I'd answer them both.

"I tried to edit the credentials used by the Enterprise Controller to access My Oracle Support, but when I open the credentials window, the password field was blank, even though there should be an existing password. What's going on?"

So, naturally, once you've entered a password, we don't want to send that password back to the UI, because it'd be a security risk. In 12.3, though, the asterisks to indicate an existing password aren't showing up. So, your credentials are still there, and they won't be changed unless you specifically enter new credentials and save them.

"I was trying to make sure that file system monitoring was working correctly on a managed system. I made a file to push utilization up past the 90% threshold, which should've generated an incident. However, the incident didn't show up for almost an hour. Why is there a delay?"

You can edit the Alert Monitoring Rule Parameters in Ops Center. However, the thresholds that you set have to be maintained for a certain amount of time before an alert is generated. For a file system utilization alert, the default value for this delay is 45 minutes. You can edit the alert rule to change this delay if needed.
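The reason for the wait is easier to see with a toy model. The sketch below is a simplified illustration of a threshold that must hold for a minimum duration, not Ops Center's actual rule engine; the function name and the five-minute sample spacing in the example are made up. It reads time-ordered "minute utilization" samples and prints the first minute at which utilization has stayed above the threshold for the whole delay.

```sh
#!/bin/sh
# first_alert_minute THRESHOLD DURATION
# Reads "minute utilization" samples (time-ordered) on stdin and prints the
# first minute at which utilization has stayed above THRESHOLD for at least
# DURATION minutes. Prints nothing if that never happens.
# Simplified model for illustration only, not Ops Center's implementation.
first_alert_minute() {
    awk -v t="$1" -v d="$2" '
        $2 > t + 0 {
            if (start == "") start = $1          # breach begins
            if ($1 - start >= d) { print $1; exit }
            next
        }
        { start = "" }                           # dropped below: reset the clock
    '
}
```

With a 90% threshold, a 45-minute delay, and samples every five minutes in which utilization first exceeds 90% at minute 5, the function prints 50. Any sample at or below the threshold restarts the clock.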

Thursday Sep 10, 2015

Updating the OCDoctor on a Managed System

There was a new feature introduced in version 4.38 of the OCDoctor script which has been causing some confusion, so I thought I'd explain it a bit.

Beginning with version 4.38, when you run the OCDoctor script with the --update option on a managed system, the OCDoctor script looks for a newer version on the Enterprise Controller, rather than using external download sites. In connected mode, the Enterprise Controller runs a recurring job to download the latest OCDoctor, which the managed systems can then reach.

This makes updates more feasible if you're in a dark site, and minimizes external connections in other sites. However, if you've downloaded the OCDoctor manually on the EC, you will need to place the OCDoctor zip file in the /var/opt/sun/xvm/images/os/others/ directory on the Enterprise Controller so that managed systems can download it.
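If you downloaded the OCDoctor by hand, staging it comes down to copying the zip into that directory. A minimal sketch follows; the function name is hypothetical, and the zip file name is whatever you downloaded.

```sh
#!/bin/sh
# stage_ocdoctor ZIPFILE [DESTDIR]
# Copy a manually downloaded OCDoctor zip into the EC directory that
# managed systems check when OCDoctor runs with --update.
stage_ocdoctor() {
    zip=$1
    dest=${2:-/var/opt/sun/xvm/images/os/others}
    [ -f "$zip" ] || { echo "no such file: $zip" >&2; return 1; }
    [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
    # -p keeps the original timestamps on the copy.
    cp -p "$zip" "$dest/" && echo "staged $(basename "$zip") in $dest"
}
```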

Thursday Sep 03, 2015

Installing Ops Center in a Zone

I got a question recently about an Ops Center deployment:

"I'm looking at installing an Enterprise Controller, co-located Proxy Controller, and database inside an Oracle Solaris 11 Zone. Is this doable, and are there any special things I should do to make it work?"

You can install all of these components in an S11 zone. There are a few things that you should do beforehand:

Limit the ZFS ARC cache size in the global zone. Without a limit, the ZFS ARC can consume memory that should be released to other processes. The recommended size of the ZFS ARC cache given in the Sizing and Performance guide is (Physical memory - Enterprise Controller heap size - Database memory) x 70%. For example:

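  # Limit ZFS ARC memory consumption (example value; tune it to your system).
  # Changes to /etc/system take effect after the next reboot.
  echo "set zfs:zfs_arc_max=1024989270" >>/etc/system
  # Raise the default soft limit on open file descriptors per process.
  echo "set rlim_fd_cur=1024" >>/etc/system
  # Set the file-descriptor limit for the Oracle DB in the user.root project.
  projmod -s -K "process.max-file-descriptor=(basic,1024,deny)" user.root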
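The 70% guideline is simple arithmetic, so you can compute the zfs_arc_max value instead of guessing. This sketch takes all three inputs in megabytes; the function name and the sample figures in the usage note are hypothetical, and the heap and database sizes have to come from your own deployment.

```sh
#!/bin/sh
# arc_max_bytes PHYS_MB HEAP_MB DB_MB
# Recommended zfs_arc_max in bytes, per the Sizing and Performance guideline:
#   (physical memory - EC heap size - database memory) x 70%
arc_max_bytes() {
    free_mb=$(( $1 - $2 - $3 ))
    [ "$free_mb" -gt 0 ] || { echo "heap + DB exceed physical memory" >&2; return 1; }
    # Integer math: 70% of the remaining MB, then convert MB to bytes.
    echo $(( free_mb * 70 / 100 * 1024 * 1024 ))
}
```

For example, a system with 32 GB of physical memory, a 4 GB heap, and 8 GB of database memory gives arc_max_bytes 32768 4096 8192, which prints 15032385536 (14 GB). On Solaris, prtconf reports physical memory in megabytes on its "Memory size" line.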

Make sure the global zone has enough swap space configured. The recommended swap space for an EC is twice the physical memory if the physical memory is less than 16 GB, or 16 GB otherwise. For example:

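  # -p prints the exact size in bytes, so no unit suffix breaks the comparison
  volsize=$(zfs get -Hp -o value volsize rpool/swap)
  min=$((16 * 1024 * 1024 * 1024))    # 16 GB in bytes
  if [ "$volsize" -lt "$min" ]; then zfs set volsize=16G rpool/swap
  else echo "Swap size sufficient at: $((volsize / 1024 / 1024 / 1024))G"; fi
  zfs list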
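The snippet above simply ensures a 16 GB swap volume. If you want the size to follow the stated rule exactly, you can compute the recommendation from physical memory. Here is a minimal sketch with a hypothetical function name that takes physical memory in megabytes:

```sh
#!/bin/sh
# recommended_swap_gb PHYS_MB
# EC swap guideline: twice physical memory when physical memory is under
# 16 GB, otherwise 16 GB. The result is rounded up to a whole GB.
recommended_swap_gb() {
    phys_mb=$1
    if [ "$phys_mb" -lt 16384 ]; then
        echo $(( (2 * phys_mb + 1023) / 1024 ))
    else
        echo 16
    fi
}
```

On Solaris you could feed it the value from prtconf, for example recommended_swap_gb "$(prtconf | awk '/^Memory size/ {print $3}')", and use the result as the rpool/swap volsize.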

In the non-global zone that you're using for the install, set the ulimit:

  echo "ulimit -Sn 1024">>/etc/profile

Finally, run the OCDoctor to check the prerequisites before you install.

Thursday Aug 27, 2015

How Many Systems Can Ops Center Manage?

I saw a question about how many systems you can manage through Ops Center. This is an important question when you're planning a deployment, or looking at expanding an existing deployment.

In general terms, an Enterprise Controller can manage up to 3,000 assets. A Proxy Controller can manage between 350 and 450 assets, although you'll get better performance if there are fewer assets.
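For rough planning, the number of Proxy Controllers you need is a ceiling division. The sketch below is a hypothetical helper. By default, it assumes 350 assets per Proxy Controller (the conservative end of the range), and it rejects inputs above the 3,000-asset guideline for a single Enterprise Controller.

```sh
#!/bin/sh
# proxies_needed ASSETS [PER_PROXY]
# Minimum number of Proxy Controllers for ASSETS managed assets, assuming
# PER_PROXY assets per Proxy Controller (default 350).
proxies_needed() {
    assets=$1
    per=${2:-350}
    if [ "$assets" -gt 3000 ]; then
        echo "exceeds the 3,000-asset guideline for one Enterprise Controller" >&2
        return 1
    fi
    # Ceiling division: round up so no Proxy Controller is over capacity.
    echo $(( (assets + per - 1) / per ))
}
```

For 1,000 assets, both proxies_needed 1000 and proxies_needed 1000 450 print 3. At 3,000 assets, the conservative figure is 9.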

The Sizing and Performance guide has more detailed information about the requirements and sizing guidelines for Ops Center.

Thursday Aug 13, 2015

Recovering After a Proxy Controller Crash

I saw a question recently about how to restore your environment if a remote Proxy Controller system fails. This is a good question, and there are a few facets to the answer, depending on your environment.

Recovery is easiest if you have recently backed up the Proxy Controller. The backup file includes asset data, so if you can restore the PC using the backup file, you should be golden.

If you don't have a backup of the Proxy Controller, it's going to take a bit more work. First, you have to migrate the dead PC's assets to another PC. If you have automatic failover enabled, this happens automatically (hence the name); otherwise, you can do it manually.

Then, you can install a new Proxy Controller (using the Linux or Solaris procedure) and migrate the assets to that PC.

Thursday Jul 30, 2015

Recovering LDoms From a Failed Server

I saw a recent question about Logical Domain recovery: If you have a control domain installed on a server and the server goes down with a hardware fault, what options do you have for recovering the logical domains from that control domain?

The answer is that you have options depending on how your environment is configured:

  • If you have the control domain in a server pool, and you have enabled automatic recovery on the LDom, you have the option of watching as the LDom is automatically brought back up on another control domain in the server pool.
  • If the control domain is in a server pool but you didn't enable automatic recovery, you can still migrate the guests manually by deleting the control domain asset. The guests are then put in the Shutdown Guests list for the server pool, and you can bring them up on another control domain.
  • If you want to add the failed control domain back in, log in to it and make sure that the guest OSes aren't running before you rediscover it and put it back in the server pool, to avoid split-brain issues.
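Before rediscovering a repaired control domain, you can check for running guests with ldm list on that control domain. The filter below is a sketch with a hypothetical function name. It assumes the default ldm list column layout (NAME, then STATE) and a control domain named primary, and it prints any other domain reported as active.

```sh
#!/bin/sh
# active_guests: read `ldm list` output on stdin and print the names of
# domains other than the control domain ("primary", assumed) that are active.
active_guests() {
    # Skip the header line; column 1 is NAME, column 2 is STATE.
    awk 'NR > 1 && $1 != "primary" && $2 == "active" { print $1 }'
}
```

If ldm list | active_guests prints anything, stop those domains (for example with ldm stop-domain) before you put the control domain back in the server pool.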

Take a look at the Recover Logical Domains from a Failed Server how-to for more information.

Thursday Jun 11, 2015

Providing Contact Info for ASR

Ops Center includes a feature called Auto Service Request, which can automatically file service requests for managed hardware. However, I've seen a bit of confusion about how to get it running.

First, the prereqs - to get ASR running, you need to be in connected mode, and you need to have a set of My Oracle Support (MOS) credentials entered in the Edit Authentications window. Your MOS credentials have to be associated with a Customer Support Identifier (CSI) with rights over the hardware that you want to be enabled for ASR.

Once you've got that, you'll click the Edit ASR Contact Information action in the Administration section. This opens a window where you specify the default contact information for your assets, which is used for all ASRs by default.

If you have assets that need separate contact information, you can specify separate ASR contact information for an asset or a group of assets. That info is used in place of the default contact info.

Finally, once you've got the contact info in the system, you click Enable ASR. This action launches a job to enable the assets for ASR, and it attempts to enable new assets for ASR when they're discovered. From then on, if a critical incident occurs on the hardware, ASR should create a service request for it.

Take a look at the Auto Service Request chapter of the Admin Guide for more information.

Thursday Jun 04, 2015

Enterprise Controllers in Logical Domains

I saw a few questions about installing Enterprise Controllers in Logical Domains, and what's possible with that sort of deployment. Here are some answers:

"Is it supported to install the Enterprise Controller in a Logical Domain?"

Yep. The Certified Systems Matrix lists the supported OSes for EC installation, and Oracle VM Server for SPARC is supported (as are some Oracle Solaris Zones).

"Can you use Oracle Solaris Cluster to provide High Availability for an Enterprise Controller installed on a Logical Domain?"

Yes, this is possible. It deserves its own post, so I'll go into more detail on it soon, but yes, it works.

"If I have two Enterprise Controllers installed on Logical Domains, can I have EC 1 discover and manage the LDom for EC 2, and vice versa?"

No. The Agent Controllers installed on EC and PC systems are different from standard Agents, and if you install an Agent from one EC on another EC's system, it's going to get confused.


This blog discusses issues encountered in Ops Center and highlights the ways in which the documentation can help you.

