Thursday Oct 08, 2015

Agent versus Agentless Management

I've seen a couple of questions recently about the differences between Agent-managed and agentless assets, so I thought I'd explain how they differ and the relative merits of each.

You can manage operating systems and virtualization technologies in one of two ways: by installing an Ops Center Agent on them, or by providing Ops Center with credentials that it can use to access them. The Agent is tailored to the system; there are separate types of Agents for Zones and LDoms.

If you don't want to install anything else on a system, agentless management still provides management and monitoring capabilities. However, some features aren't available for assets that are managed agentlessly. The OS Management chapter has a table that explains which features are and aren't available with agentless assets.

Thursday Oct 01, 2015

Changing an Asset's Name

I got a question about an incorrect asset name:

"I discovered a server, but when I discovered it, it was named according to its IP address because of a DNS issue. The incorrect name in the UI angers me. How do I fix it?"

This is relatively simple. In the navigation section, select the All Assets node at the root of the asset tree, then select the Managed Assets tab. Select the incorrectly named asset, and click the pencil icon to edit its properties, including its name. Click Finish to save the changes.

Thursday Sep 24, 2015

Best Practices for EC Backups

Ops Center has a backup and recovery feature for the Enterprise Controller - you can save the current EC state as a backup file, and restore the EC to that state using the file. It's an important feature, but I've seen a few folks asking for guidelines about how to use it. Every site is different, but here are some broad guidelines that we recommend:

  • Perform a backup at least once a week, and keep at least two backup files.
  • Once you've made a backup file, store it offsite or on a NAS share - don't keep it locally on the EC.
  • You can use a cron job to automate regular backups. Here's a sample crontab entry that performs a backup every Sunday at midnight:
    0 0 * * 0 /opt/SUNWxvmoc/bin/ecadm backup -o /bigdisk/oc_backups -l /bigdisk/oc_backups
  • Remember that some files and directories are not part of the EC backup for size reasons: ISOs, FLARs, firmware images, and Solaris 8-10 and Linux patches.
    Firmware images are automatically re-downloaded in Connected Mode. ISOs and FLARs can be re-imported. You can also back up your Ops Center libraries separately with NetBackup or a similar tool.
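To follow the keep-at-least-two guideline without letting the backup directory grow forever, you could prune old files on a schedule too. Here's a minimal sketch, assuming each backup is written as a separate .tar file in the backup directory; prune_ec_backups is a hypothetical helper, not an Ops Center command.

```shell
# Hypothetical helper: keep only the newest N EC backup files in a directory.
# Assumes backups are the only *.tar files there and that file names
# contain no newlines.
prune_ec_backups() {
    dir=$1
    keep=${2:-2}    # the guideline above is to keep at least two
    # ls -t lists newest first; tail skips the first $keep entries,
    # so only the older files reach the delete loop.
    ls -t "$dir"/*.tar 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}
```

For example, prune_ec_backups /bigdisk/oc_backups 2 keeps the two newest backups. If you schedule it from cron, run it after the backup job, and only after the files have been copied offsite.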

Some folks have also asked if there's a good way to test the backup and recovery procedure, to make sure it's working. Well, there's really only one way to do it: do an EC backup, and also back up or clone the file systems. Then, uninstall and reinstall the EC, restore from the backup, and make sure that everything looks right.

Take a look at the Backup and Recovery chapter for more information about how to perform a backup.

Thursday Sep 17, 2015

Blank Credentials and Monitoring Delays

I saw a couple of unrelated but short questions this week, so I thought I'd answer them both.

"I tried to edit the credentials used by the Enterprise Controller to access My Oracle Support, but when I open the credentials window, the password field was blank, even though there should be an existing password. What's going on?"

Naturally, once you've entered a password, we don't send it back to the UI, because that would be a security risk. Normally, asterisks indicate that a password is already saved, but in 12.3 the asterisks aren't showing up. Your credentials are still there, and they won't be changed unless you specifically enter new credentials and save them.

"I was trying to make sure that file system monitoring was working correctly on a managed system. I made a file to push utilization up past the 90% threshold, which should've generated an incident. However, the incident didn't show up for almost an hour. Why is there a delay?"

You can edit the Alert Monitoring Rule Parameters in Ops Center, but keep in mind that a threshold has to be exceeded for a certain amount of time before an alert is generated. For a file system utilization alert, the default value for this delay is 45 minutes. If you need to be alerted sooner, you can edit the monitoring rule to shorten the delay.

Thursday Sep 10, 2015

Updating the OCDoctor on a Managed System

A new feature introduced in version 4.38 of the OCDoctor script has been causing some confusion, so I thought I'd explain it a bit.

Beginning with version 4.38, when you run the OCDoctor script with the --update option on a managed system, the script looks for a newer version on the Enterprise Controller rather than using external download sites. In Connected Mode, the Enterprise Controller runs a recurring job to download the latest OCDoctor, which managed systems can then download from the EC.

This makes updates easier at dark sites and minimizes external connections at other sites. However, if you've downloaded the OCDoctor manually onto the EC, you need to place the OCDoctor zip file in the /var/opt/sun/xvm/images/os/others/ directory on the Enterprise Controller so that managed systems can download it.
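If you stage the file by hand, a quick copy-and-check helps you avoid putting it in the wrong place. This is a sketch: stage_ocdoctor is a hypothetical helper, and the destination is the directory given above.

```shell
# Hypothetical helper: copy a manually downloaded OCDoctor zip into the
# directory on the EC that managed systems fetch updates from.
OC_OTHERS_DIR=${OC_OTHERS_DIR:-/var/opt/sun/xvm/images/os/others}

stage_ocdoctor() {
    zip=$1
    dest=${2:-$OC_OTHERS_DIR}
    if [ ! -f "$zip" ]; then
        echo "No such file: $zip" >&2
        return 1
    fi
    if [ ! -d "$dest" ]; then
        echo "Directory not found: $dest" >&2
        return 1
    fi
    # -p preserves the file's timestamps and mode.
    cp -p "$zip" "$dest"/ || return 1
    echo "Staged $(basename "$zip") in $dest"
}
```

For example, run stage_ocdoctor /tmp/OCDoctor.zip on the EC, substituting the name and location of the file you downloaded. Managed systems running OCDoctor 4.38 or later can then pick it up when you run the script there with the --update option.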

Thursday Sep 03, 2015

Installing Ops Center in a Zone

I got a question recently about an Ops Center deployment:

"I'm looking at installing an Enterprise Controller, co-located Proxy Controller, and database inside an Oracle Solaris 11 Zone. Is this doable, and are there any special things I should do to make it work?"

You can install all of these components in an Oracle Solaris 11 zone. There are a few things that you should do beforehand.

First, limit the ZFS ARC size in the global zone. Without a limit, the ZFS ARC can hold on to memory that should be released for Ops Center and the database. The recommended maximum ARC size given in the Sizing and Performance guide is (Physical memory - Enterprise Controller heap size - Database memory) x 70%. This example sets the ARC limit and also raises the file descriptor limits for the database:

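  # Limit ZFS ARC memory consumption. The value is in bytes; tune it to
  # your system. /etc/system is read at boot, so these settings take
  # effect after the next reboot.
  echo "set zfs:zfs_arc_max=1024989270" >> /etc/system
  # Raise the file descriptor limits for the Oracle database.
  echo "set rlim_fd_cur=1024" >> /etc/system
  projmod -s -K "process.max-file-descriptor=(basic,1024,deny)" user.root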
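The zfs_arc_max value in /etc/system is in bytes, so it helps to turn the sizing formula into a byte count. This sketch assumes you already know your EC heap size and database memory in megabytes; zfs_arc_max_bytes is a hypothetical helper, and it rounds 70% down to whole megabytes.

```shell
# Hypothetical helper for the sizing-guide formula:
#   zfs_arc_max = (physical memory - EC heap - database memory) x 70%
# Inputs are in megabytes; the output is in bytes.
zfs_arc_max_bytes() {
    phys_mb=$1 heap_mb=$2 db_mb=$3
    avail_mb=$((phys_mb - heap_mb - db_mb))
    if [ "$avail_mb" -le 0 ]; then
        echo "EC heap plus database memory exceeds physical memory" >&2
        return 1
    fi
    # Integer arithmetic: 70% rounded down to whole megabytes.
    arc_mb=$((avail_mb * 70 / 100))
    echo $((arc_mb * 1024 * 1024))
}
```

For example, with 32 GB of physical memory, a 4 GB heap, and 8 GB of database memory, zfs_arc_max_bytes 32768 4096 8192 prints 15032385536, which you would use in place of the sample value: set zfs:zfs_arc_max=15032385536.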

Next, make sure the global zone has enough swap space configured. The recommended swap space for an EC is twice the physical memory if the physical memory is less than 16 GB, or 16 GB otherwise. This example grows the swap volume to 16 GB if it's currently smaller:

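  # Read the current swap volume size (for example "4G" or "1.50G").
  # This assumes zfs reports the size with a G suffix.
  volsize=$(zfs get -H -o value volsize rpool/swap)
  volsize=${volsize%G}      # strip the unit
  volsize=${volsize%%.*}    # drop any decimal part
  if (( volsize < 16 )); then
      zfs set volsize=16G rpool/swap
  else
      echo "Swap size sufficient at: ${volsize}G"
  fi
  zfs get volsize rpool/swap    # confirm the result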
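The example above only checks for a 16 GB minimum. To work out the recommended size from the rule itself, you can apply it to your physical memory. Here's a small sketch of the rule exactly as stated; recommended_swap_gb is a hypothetical helper, with sizes in whole gigabytes.

```shell
# Hypothetical helper: recommended EC swap size, per the rule above.
# Twice physical memory when physical memory is under 16 GB,
# otherwise 16 GB. Input and output are in whole gigabytes.
recommended_swap_gb() {
    phys_gb=$1
    if [ "$phys_gb" -lt 16 ]; then
        echo $((phys_gb * 2))
    else
        echo 16
    fi
}
```

On Oracle Solaris, prtconf reports physical memory in megabytes on its "Memory size" line, so something like recommended_swap_gb $(( $(prtconf | awk '/^Memory size/ {print $3}') / 1024 )) gives you a target to compare against the current volsize.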

In the non-global zone that you're using for the install, raise the soft limit on open file descriptors:

  echo "ulimit -Sn 1024" >> /etc/profile

Finally, run the OCDoctor to check the prerequisites before you install.

Thursday Aug 27, 2015

How Many Systems Can Ops Center Manage?

I saw a question about how many systems you can manage through Ops Center. This is an important question when you're planning a deployment, or looking at expanding an existing deployment.

In general terms, an Enterprise Controller can manage up to 3,000 assets. A Proxy Controller can manage between 350 and 450 assets, although you'll get better performance with fewer assets per Proxy Controller.
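To turn those numbers into a rough plan, divide your asset count by the per-PC figure and round up. This sketch plans against the low end of the range by default, since fewer assets per Proxy Controller performs better; proxy_controllers_needed is a hypothetical helper, and its result is only a starting point for the Sizing and Performance guide.

```shell
# Hypothetical helper: rough Proxy Controller count for a deployment,
# using the per-PC figures above (350 to 450 assets; default 350).
proxy_controllers_needed() {
    assets=$1
    per_pc=${2:-350}
    if [ "$assets" -gt 3000 ]; then
        echo "warning: $assets assets exceeds the 3,000-asset EC guideline" >&2
    fi
    # Ceiling division: round up so no PC carries more than its share.
    echo $(( (assets + per_pc - 1) / per_pc ))
}
```

For example, 2,000 assets at 350 per Proxy Controller works out to 6 Proxy Controllers, while the same assets at 450 per PC would need 5.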

The Sizing and Performance guide has more detailed information about the requirements and sizing guidelines for Ops Center.

Thursday Aug 20, 2015

Editing or Disabling Analytics

There was a recent question thread about how you can tweak the OS analytics settings in Ops Center.

"Ops Center collects analytics data every 5 minutes and retains it for 5 days. Is it possible to edit these settings?"

You can edit the retention period but not the collection interval.

To edit the retention period, log in to the UI. Click the Administration section, then click the Configuration tab for the EC, and select the Report Service subsystem.

The repsvc.daily-samples-retention-days property specifies the number of days to retain OS analytics data. You can edit this property, then restart the EC for the change to take effect.

"Can I turn off data collection for OS analytics entirely?"

Yes, you can. Bear in mind that this requires you to edit a config file, so be very careful.

Go to the /opt/sun/n1gc/lib directory on the EC and find the XVM_SATELLITE.properties file. Edit it to uncomment this line:

#report.service.disable=true

Then, restart the Enterprise Controller.
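Since hand-editing this file is where mistakes creep in, here's a sketch that keeps a backup copy and then uncomments the line. It assumes the line appears exactly as shown above; disable_os_analytics is a hypothetical helper, and it reads from the backup copy instead of relying on sed -i, which not every sed supports.

```shell
# Hypothetical helper: uncomment report.service.disable=true in the EC's
# XVM_SATELLITE.properties, saving a .bak copy of the original first.
disable_os_analytics() {
    props=${1:-/opt/sun/n1gc/lib/XVM_SATELLITE.properties}
    cp -p "$props" "$props.bak" || return 1
    # Rewrite the file from the backup, uncommenting only the exact line.
    sed 's/^#report\.service\.disable=true$/report.service.disable=true/' \
        "$props.bak" > "$props" || return 1
    # Succeed only if the setting is now active.
    grep -q '^report\.service\.disable=true$' "$props"
}
```

After it succeeds, restart the EC, for example with /opt/SUNWxvmoc/bin/ecadm stop -w followed by /opt/SUNWxvmoc/bin/ecadm start -w. To turn analytics back on, comment the line out again and restart.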

Thursday Aug 13, 2015

Recovering After a Proxy Controller Crash

I saw a question recently about how to restore your environment if a remote Proxy Controller system fails. This is a good question, and there are a few facets to the answer, depending on your environment.

Recovery is easiest if you have recently backed up the Proxy Controller. The backup file includes asset data, so if you can restore the PC using the backup file, you should be golden.

If you don't have a backup of the Proxy Controller, it's going to take a bit more work. First, the failed PC's assets have to be migrated to another Proxy Controller. If you have automatic failover enabled, this happens automatically (hence the name); otherwise, you can migrate them manually.

Then, you can install a new Proxy Controller (using the Linux or Solaris procedure) and migrate the assets to it.

Thursday Aug 06, 2015

Mixing Servers in a Server Pool

I saw a couple of questions recently about what kinds of servers you can group together in a server pool.

"Can I create a server pool with different types of servers, like T4 and T5, or T4 and M10?"

Yes, you can. As long as you're not trying to mix SPARC and x86, you can put different types of hardware in a server pool.

"Is it a good idea to make server pools like this?"

It will work, but performance won't be quite as good. To enable migration between different types of hardware, the cpu-arch property needs to be set to generic, which means that some hardware-specific CPU features aren't used. If you have the hardware to build completely homogeneous server pools, you'll get better performance.

The Server Pools chapter explains how to set up a server pool for whatever virtualization type you're using.

Thursday Jul 30, 2015

Recovering LDoms From a Failed Server

I saw a recent question about Logical Domain recovery: If you have a control domain installed on a server and the server goes down with a hardware fault, what options do you have for recovering the logical domains from that control domain?

Your options depend on how your environment is configured:

  • If you have the control domain in a server pool, and you have enabled automatic recovery on the LDom, you have the option of watching as the LDom is automatically brought back up on another control domain in the server pool.
  • If the control domain is in a server pool but you didn't enable automatic recovery, you can still migrate the guests manually by deleting the control domain asset. The guests are then put in the Shutdown Guests list for the server pool, and you can start them on another control domain.
  • If you want to add the failed control domain back in, log in to it before you rediscover it and put it back in the server pool, and make sure that the guests aren't running on it, to avoid split-brain issues.

Take a look at the Recover Logical Domains from a Failed Server how-to for more information.

Thursday Jul 23, 2015

Kernel Zones Support in 12.3

One of the new features in Ops Center 12.3 is support for Oracle Solaris Kernel Zones. I wanted to talk a bit about it, because there are some caveats, and there's a new document to help you use this type of zone.

Kernel zones differ from other zones in that they have a kernel and OS separate from the global zone, making them more independent. In Ops Center 12.3, you can discover and manage kernel zones. However, you can't migrate them, put them in a server pool, or change their configuration through the user interface.

We put together a how-to that explains how you can discover existing kernel zones in your environment. You can also take a look at the What's New doc for more information about what's changed in 12.3.

Thursday Jul 16, 2015

New Books in 12.3

One of the changes we've made in Ops Center 12.3 is to the documentation library. We've divided the old Feature Reference Guide into several smaller books so that it's easier to use:

  • Configure Reference talks about how to get the software working: discovering assets; configuring libraries, networks, and storage; and managing jobs.
  • Operate Reference talks about incidents, reports, and hardware management, as well as OS management, provisioning, and updating.
  • Virtualize Reference describes the use and management of Oracle Solaris Zones, Oracle VM Server for SPARC, and server pools.
  • Oracle SuperCluster Operate Reference covers the management of Oracle SuperCluster.

The What's New doc has more information about these new books. You can find the new books by clicking Feature Reference on the main doc site.

Thursday Jul 09, 2015

New Virtualization Icons

There's a change in the UI that I wanted to talk about, since it's been confusing some people after they upgrade to version 12.3. The icons that represent the different virtualization types, such as Oracle Solaris Zones or Logical Domains, have changed. Here are the new icons:

[Image: the new virtualization type icons]

We made this change because there were getting to be a lot of supported virtualization types, particularly now that Kernel Zones are supported. The new icons make it easier to tell the types apart, so that you know at a glance what sort of system you're dealing with.

The other new features in version 12.3 are discussed in the What's New document.

Thursday Jul 02, 2015

Upgrading to 12.3

Now that Ops Center 12.3 is out, you might be wondering how to upgrade to it. I thought I'd walk you through the major steps involved in upgrading, and direct you to the documents that go into more detail.

First off, to upgrade directly to 12.3, you have to be using some variant of 12.2: 12.2.0, 12.2.1, or 12.2.2. If you're on 12.1, you have to upgrade to 12.2 first. Here's a flowchart of the upgrade paths:

[Figure: upgrade paths to Ops Center 12.3]

The Upgrade guide also walks you through the other planning steps. It's a good idea to look at the release notes and the Oracle Solaris and Linux install guides before you upgrade, to make sure that you're aware of known issues and the latest system requirements.

Once you've done your planning, go to the Upgrade guide chapter that matches your environment. There are separate procedures for HA and non-HA environments, and separate procedures for upgrading from the command line or from the UI.

These chapters walk you through downloading the upgrade (you can get it through the UI, from the Oracle Technology Network, or from the Oracle Software Delivery Cloud) and applying the upgrade to the Enterprise Controller(s), Proxy Controllers, and Agents.

About

This blog discusses issues encountered in Ops Center and highlights the ways in which the documentation can help you.
