Tuesday Dec 06, 2011

Announcing Release of Oracle Solaris Cluster 4.0!

We are very happy to announce the release of Oracle Solaris Cluster 4.0, the first release providing High Availability (HA) and Disaster Recovery (DR) capabilities for Oracle Solaris 11. This release comes just a few weeks after the release of Oracle Solaris 11.


Oracle Solaris Cluster 4.0 offers the best availability for enterprise applications, with instant system failure detection for the fastest service recovery. It includes out-of-the-box support for the Oracle Database and for applications such as Oracle WebLogic Server, and it is pre-tested with Oracle's Sun servers, storage, and networking components. It is optimized to leverage the SPARC SuperCluster redundancy and reliability features and delivers the high availability infrastructure for the Oracle Optimized Solutions.


Oracle Solaris Cluster on Oracle Solaris 11 offers a unified installation experience by leveraging the Oracle Solaris Image Packaging System (IPS) and all the benefits it brings. These include error-free software updates, automatic dependency resolution, and an automated installer for easy multi-node installation.
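
As a rough sketch of what an IPS-based installation can look like (the publisher name, repository location, and package name below are assumptions based on the standard IPS workflow, not details from this announcement; check the 4.0 installation guide for the real ones):

  # Hypothetical sketch of installing the cluster software with IPS.
  pkg set-publisher -g file:///mnt/ha-cluster-repo ha-cluster   # register the cluster package repository
  pkg install ha-cluster-full                                   # install the full cluster package group
  pkg update                                                    # later updates resolve dependencies automatically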


For a complete list of features and benefits, see What's New in Oracle Solaris Cluster. Also, watch Bill Nesheim, VP Solaris Platform Engineering, in a webcast on Oracle Solaris Cluster 4.0 at 9 AM PST. Stay tuned for more blog articles on the features. You can try the product out for evaluation, development, and testing use at the Oracle Technology Network, or get it for production use at the Oracle Software Delivery Cloud. We look forward to your feedback and input!


- Roma Baron, Sr. Program Manager, Solaris Cluster


- Meenakshi Kaul-Basu, Director, Solaris Cluster

Monday Jul 14, 2008

LDoms guest domains supported as Solaris Cluster nodes

Folks, when we announced support for Solaris Cluster in LDoms I/O domains late last year in this blog entry, we also hinted at support for LDoms guest domains. It has taken a bit longer than we envisaged, but I am pleased to report that SC Marketing has just announced support for LDoms guest domains with Solaris Cluster!

So, what exactly does "support" mean here? It means that you can create an LDoms guest domain running Solaris, and then treat that guest domain as a cluster node by installing SC software (specific version and patch information is noted later in the blog) inside the guest domain and have the SC software work with the virtual devices in the guest domain. The technically inclined reader will, at this point, have several questions: How exactly does SC work with virtual devices? What do I have to do to make SC recognize these devices? Are there any differences between how SC is configured in LDoms guest domains versus non-virtualized environments? Read on for a high-level summary of the specifics:

  • For shared storage devices (i.e., those accessible from multiple cluster nodes), the virtual device must be backed by a full SCSI LUN. That means no file-backed virtual devices, no slices, and no volumes. This limitation exists because SC needs advanced features in the storage devices to guarantee data integrity, and those features are available only for virtual storage devices backed by full SCSI LUNs.

  • One may need to use storage which is unshared (i.e., accessed from only one cluster node), for things such as OS image installation for the guest domain. For such usage, any type of virtual device can be used, including devices backed by files in the I/O domain. However, for such virtual devices, make sure to configure them to be synchronous; check the LDoms documentation and release notes for how to do that. Currently (as of July 2008) one needs to add "set vds:vd_file_write_flags = 0" to the /etc/system file in the I/O domain exporting the file (see the sketch after this list). This is required because the Cluster stores some key configuration information on the root filesystem (in /etc/cluster) and it expects that information to be written synchronously to disk. If the root filesystem of the guest domain is on a file in the I/O domain, it needs this setting to be synchronous.

  • Network-based storage (NAS, etc.) is fine when used from within the guest domain. Check the cluster support matrix for specifics; LDoms guest domains don't change this support.

  • For the cluster private interconnect, the LDoms virtual device "vnet" can be used just fine; however, the virtual switch to which it maps must have the option "mode=sc" specified for it. So essentially, when running the ldm add-vsw subcommand to create the virtual switch that will be used for the cluster private interconnect inside the guest domains, you would add the argument "mode=sc" on the command line (see the sketch after this list). This option enables a fastpath in the I/O domain for the Cluster heartbeat packets so that those packets do not compete with application network packets in the I/O domain for resources. This greatly improves the reliability of the Cluster heartbeats, even under heavy load, leading to a very stable cluster membership for applications to work with. Note, however, that good engineering practices should still be followed when sizing your server resources (in the I/O domain as well as in the guest domains) for the application load expected on the system.

  • With this announcement, all features of Solaris Cluster supported in non-virtualized environments are supported in LDoms guest domains, unless explicitly noted in the SC release notes. Some limitations come from LDoms themselves, such as lack of jumbo frame support over virtual networks or lack of link-based failure detection with IPMP in guest domains. Check the LDoms documentation and release notes for such limitations, as support for these missing features is improving all the time.

  • For support of specific applications with LDoms guest domains and SC, check with your ISV. Support for applications in LDoms guest domains is improving all the time, so check often.

  • Software version requirements: LDoms 1.0.3 or higher, plus Solaris 10 5/08 (S10U5) with patches 137111-01, 137042-01, 138042-02, and 138056-01 or higher, are required in BOTH the LDoms guest domains and the I/O domains exporting virtual devices to the guest domains. Solaris Cluster 3.2 2/08 (SC32U1) with patch 126106-15 or higher is required in the LDoms guest domains.

  • Licensing for SC in LDoms guest domains follows the same model as that for the I/O domains. You basically pay for the physical server, irrespective of how many guest domains and I/O domains are deployed on that physical server.
  • That covers the high-level overview of how SC is deployed inside LDoms guest domains. Check out the SC release notes for additional details and some sample configurations. The whole virtualization space is evolving very rapidly, and new developments are happening all the time. Keep this blog page bookmarked and visit it frequently to find out how Solaris Cluster is evolving along with this space.
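
For the synchronous-write setting mentioned in the unshared-storage bullet above, here is a minimal sketch (as of July 2008; check the current LDoms release notes before applying it). The change goes into the I/O domain that exports the file-backed virtual disks:

  # In the I/O domain exporting file-backed virtual disks (e.g. the guest's root disk),
  # force writes to file-backed virtual disks to be synchronous.
  echo 'set vds:vd_file_write_flags = 0' >> /etc/system
  # Reboot that I/O domain for the /etc/system setting to take effect.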
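
And here is a hedged sketch of creating the private-interconnect virtual switch with "mode=sc"; the switch name, backing network device, and domain names are made up for illustration:

  # Create a virtual switch dedicated to the cluster private interconnect.
  # "mode=sc" enables the fastpath for cluster heartbeat packets.
  ldm add-vsw mode=sc net-dev=e1000g1 priv-vsw0 primary
  # Give each guest domain (cluster node) a vnet on that switch.
  ldm add-vnet priv-vnet0 priv-vsw0 ldg1
  ldm add-vnet priv-vnet0 priv-vsw0 ldg2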

    Cheers!

    Ashutosh Tripathi
    Solaris Cluster Engineering

    Friday Apr 18, 2008

    Improving Sun Java System Application Server availability with Solaris Cluster


    Introduction

    Sun Java System Application Server is one of the leading middleware products in the market, with its robust architecture, stability, and ease of use. The design of the Application Server by itself has some high availability (HA) features in the form of node agents (NA), which are spread across multiple nodes to avoid a single point of failure (SPoF). A simple illustration of the design:




    However, as we can notice from the above block diagram, the Domain Administration Server (DAS) is not highly available. If the DAS goes down, administrative tasks cannot be performed. And although client connections are redirected to other instances of the cluster in case of an instance or NA failure or unavailability, automated recovery is desirable to reduce the load on the remaining instances of the cluster. There are also hardware, OS, and network failure scenarios that need to be accounted for in critical deployments, in which uptime is one of the main requirements.

    Why is a High Availability Solution Required?

    A high availability solution is required to handle those failures from which the Application Server, or for that matter any user-land application, cannot recover by itself, such as network, hardware, and operating system failures, and human errors. Apart from these, there are other scenarios, such as providing continuous service while OS or hardware upgrades and/or maintenance are carried out.

    Apart from handling failures, a high availability solution helps the deployment take full advantage of other operating system features, such as network-level load distribution, link failure detection, and virtualization.

    How to Decide on the Best Solution?

    Once customers decide that their deployment is better served by a high availability solution, they need to decide which solution to choose from the market. The answers to the following questions will help in the decision making:

    Is the solution very mature and robust?

    Does the vendor provide an Agent that is specifically designed for Sun Java System Application Server?

    Is the solution very easy to use and deploy?

    Is the solution cost effective?

    Is the solution complete? Can it provide high availability for associated components like
    Message Queue?

    And importantly, can they get very good support in the form of documentation, customer service and a single point of support?

    Why Solaris Cluster?


    Solaris Cluster is the best high availability solution available for the Solaris platform. It offers excellent integration with the Solaris Operating System and helps customers make use of new features introduced in Solaris without modifying their deployments. Solaris Cluster supports applications running in containers and offers a wide choice of file systems and processor architectures. Some of the highlights include:

    Kernel-level integration to make use of Solaris features like containers, ZFS, FMA, etc.

    A wide portfolio of agents to support the most widely used applications in the market.

    A very robust and quick failure detection mechanism, with stability even under very high loads.

    IPMP-based network failure detection and load balancing.

    The same agent can be used for both Sun Java Application Server and Glassfish.

    Data Services Configuration Wizards for most common Solaris Cluster tasks.

    Sophisticated fencing mechanism to avoid data corruption.

    Detection of loss of access to storage by monitoring the disk paths.

    How does Solaris Cluster Provide High Availability?

    Solaris Cluster provides high availability by using redundant components: the storage, servers, and network cards are all redundant. The following figure illustrates a simple two-node cluster with the recommended redundant interconnects, storage accessible from both nodes, and redundant public network interfaces on each node. It is important to note that this is the recommended configuration; a minimal configuration can have just one shared storage device, one interconnect, and one public network interface. Solaris Cluster even provides the flexibility of a single-node cluster, based on individual needs.

    LH = logical hostname, a type of virtual IP address used for moving IP addresses across NICs.

    RAID = any suitable software- or hardware-based RAID mechanism that provides both redundancy and performance.

    One can opt to provide high availability for the DAS alone or for the node agents as well; the choice depends on the environment. Scalability of the node agents is not a problem with high availability deployments, since multiple node agents can be deployed on a single Solaris Cluster installation. These node agents are configured in multiple resource groups, with each resource group containing a single logical hostname resource, an HAStoragePlus resource, and a node agent resource. Since node agents are spread over multiple nodes in a normal deployment, there is no need for additional hardware just because a highly available architecture is being used. Storage can be made redundant with either software- or hardware-based RAID.
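
    As a hedged illustration of what such a resource group might look like with the Solaris Cluster 3.2 command-line interface (the resource, resource group, logical hostname, and mount point names below are made up, and the SUNW.jsas extension properties are deliberately omitted; follow the data service guide for the exact properties):

      # Register the resource types once per cluster.
      clresourcetype register SUNW.HAStoragePlus SUNW.jsas

      # Failover resource group for the Domain Administration Server.
      clresourcegroup create das-rg

      # Logical hostname the DAS listens on (hostname is hypothetical).
      clreslogicalhostname create -g das-rg -h das-lh das-lh-rs

      # Failover file system holding the domain directory (mount point is hypothetical).
      clresource create -g das-rg -t SUNW.HAStoragePlus \
          -p FilesystemMountPoints=/global/appserver das-hasp-rs

      # The DAS resource itself; its required SUNW.jsas extension properties
      # (installation and domain locations, etc.) are listed in the data service guide.
      clresource create -g das-rg -t SUNW.jsas \
          -p Resource_dependencies=das-lh-rs,das-hasp-rs das-rs

      # Bring the group online under cluster control.
      clresourcegroup online -M das-rg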

    Solaris Cluster Failover Steps in Case of a Failure

    Solaris Cluster provides a set of sophisticated algorithms that are applied to determine whether to restart an application or to fail it over to the redundant node. Typically the IP address, the file system on which the application binaries and data reside, and the application resource itself are grouped into a logical entity called a resource group (RG). As the name implies, the IP address, file system, and application itself are viewed as resources, and each of them is identified by a resource type (RT), typically referred to as an agent. The recovery mechanism, i.e., restart or failover to another node, is determined based on a combination of timeouts, number of restarts, and history of failovers. An agent typically has start, stop, and validate methods that are used to start, stop, and verify prerequisites whenever the application changes state. It also includes a probe, which is executed at a predetermined interval to determine application availability.
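
    Those decisions are driven by standard resource properties; the values and the resource name (das-rs) below are only illustrative:

      # Allow up to two local restarts within the retry interval before failing over.
      clresource set -p Retry_count=2 -p Retry_interval=300 das-rs
      # Run the fault probe every 60 seconds and give it 90 seconds to complete.
      clresource set -p Thorough_probe_interval=60 -p Probe_timeout=90 das-rs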

    Solaris Cluster has two RTs, or agents, for the Sun Java System Application Server: the resource type SUNW.jsas is used for the DAS, and SUNW.jsas_na for the node agents. The probe mechanism involves executing the “asadmin list-domains” and “asadmin list-node-agents” commands and interpreting their output to determine whether the DAS and the node agents are in the desired state. The Application Server software, file system, and IP address are moved to the redundant node in case of a failover. Please refer to the Sun Cluster Data Service guide (http://docs.sun.com/app/docs/doc/819-2988) for more details.
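
    Conceptually, the DAS probe boils down to something like the following. This is a simplified sketch, not the agent's actual code; the install path and domain name are hypothetical, and the exact asadmin subcommand spelling may differ between Application Server releases:

      #!/bin/sh
      # Simplified idea of the SUNW.jsas probe: ask the DAS whether the domain is running.
      ASADMIN=/opt/SUNWappserver/bin/asadmin   # hypothetical install location
      DOMAIN=domain1                           # hypothetical domain name

      if $ASADMIN list-domains 2>/dev/null | grep -w "$DOMAIN" | grep -iw running > /dev/null
      then
          exit 0     # healthy: the probe reports success
      else
          exit 100   # non-zero: probe failure; the RGM then restarts or fails over the RG
      fi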

    The following is a simple illustration of a failover in case of a server crash.
     

    In the previously mentioned setup, the Application Server is not failed over to the second node if only one of the NICs fails. The redundant NIC, which is part of the same IPMP group, takes over the logical hostname that the DAS and NA make use of. Only a temporary network delay will be noticed while the logical host address is moved from nic1 to nic2.
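
    A hedged example of such a group on Solaris 10, using link-based IPMP (the interface names and group name are made up, and the interfaces are assumed to be plumbed already; persistently, the same parameters would go into the corresponding /etc/hostname.* files):

      # Put both public NICs in one IPMP group so the logical hostname can move between them.
      ifconfig e1000g0 group sc_ipmp0               # NIC currently hosting the data and logical addresses
      ifconfig e1000g1 group sc_ipmp0 standby up    # redundant NIC in the same group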

    The global file system (GFS) is recommended for Application Server deployments, since there is very little write activity other than logs on the file system in which the configuration files (and, in specific deployments, the binaries) are installed. Because a global file system is always mounted on all nodes, it results in better failover times and quicker startup of the Application Server in case of a node crash or similar problems.
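
    For reference, a global mount is just an /etc/vfstab entry (identical on every node) that carries the "global" mount option; the device and mount point below are placeholders:

      #device to mount        device to fsck           mount point        FS   fsck  mount  options
      /dev/md/appds/dsk/d100  /dev/md/appds/rdsk/d100  /global/appserver  ufs  2     yes    global,logging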

    Maintenance and Upgrades

    The same properties that help Solaris Cluster provide recovery during failures can be used to provide service continuity in case of maintenance and upgrade work. 

    During any planned OS maintenance or upgrade, the RGs are switched over to the redundant node and the node that needs maintenance is rebooted into non-cluster mode. The planned actions are performed and the node is then rebooted into the cluster.  The same procedure can be repeated for all the remaining nodes of the cluster.
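
    With the Solaris Cluster 3.2 CLI, that procedure looks roughly like this (node and resource group names are hypothetical):

      # Move all resource groups and device groups off the node to be maintained.
      clnode evacuate node1

      # Reboot node1 into non-cluster mode by passing -x through to boot.
      reboot -- -x

      # ...perform the OS maintenance or upgrade on node1, then reboot normally
      # so the node rejoins the cluster, and optionally switch the groups back...
      reboot
      clresourcegroup switch -n node1 das-rg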

    How Application Server maintenance or upgrades are handled depends on the way in which the binaries, data, and configuration files are stored.

    1.) Storing the binaries on each node's internal hard disk and storing the domain and node agent related files on the shared storage. This method is preferable for environments in which frequent updates are necessary. The downside is the possibility of inconsistency in the application binaries, due to differences in patches or upgrades.

    2.) Storing both the binaries and the data on the shared storage. This method keeps the binaries and data consistent at all times but makes upgrades and maintenance without outages difficult.

    The choice has to be made by taking into account the procedures and processes followed in the organization.

    Other Features

    Solaris Cluster also provides features that can be used for co-locating services, based on the concept of affinities. For example, you can use a negative affinity to evacuate the test environment when a production environment is switched to a node, or use a positive affinity to move the Application Server resources to the same node on which the database server is hosted, for better performance.
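
    Those affinities are expressed through the RG_affinities resource group property; here is a hedged example with made-up resource group names:

      # Strong negative affinity: push the test RG off any node the production RG comes to.
      clresourcegroup set -p RG_affinities=--prod-rg test-rg

      # Strong positive affinity: keep the Application Server RG with the database RG.
      clresourcegroup set -p RG_affinities=++db-rg appserver-rg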

    Solaris Cluster has an easy-to-use and intuitive GUI management tool called Sun Cluster Manager, which can be used to perform most management tasks.

    Solaris Cluster has a built-in telemetry feature that can be used to monitor the usage of resources such as CPU and memory.


    The Sun Java System Application Server doesn't require any modification to work with Solaris Cluster, as the agent is designed with this scenario in mind.

    The same agent can be used for Glassfish as well.

    The Message Queue Broker can be made highly available as well with the HA  for Sun Java Message Queue agent.

    Consistent with Sun's philosophy, the product is being open sourced in phases and the agents are already available under the CDDL license.

    An open-source product based on the same code base, called Open High Availability Cluster, is available for OpenSolaris releases. For more details on the product and community, please visit http://www.opensolaris.org/os/communities/ohac .

    The open-source product also has a comprehensive test suite that helps users test their deployments satisfactorily. For more details, please read http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/Tests/.


    Summary

    For mission-critical environments, availability in the face of all types of failures is a very important criterion. Solaris Cluster is well positioned to provide the highest availability for the Application Server by virtue of its integration with the Solaris OS, its stability, and its agent specifically designed for Sun Java System Application Server.
     

    Madhan Kumar
    Solaris Cluster Engineering

    Tuesday Nov 21, 2006

    Sun Cluster HA Sun Java System Application Server - Configuration made easy

    The Sun Cluster 3.2 agent for the Sun Java System Application Server (version 8.2) enables two of its important components to be made highly available. The application server's configuration files have a number of settings that are inter-dependent and require careful editing when changed by hand to make them work in a Sun Cluster environment. This blog begins by outlining the key components of the application server and then provides the source for a tool designed to simplify its configuration.

    The two important components of the Sun Java System Application Server which are made HA are:

    1. Domain Administration Server (DAS)
    The domain administration server (DAS) is the single process that manages application server entities such as node agents, standalone instances, application server clusters, and their configurations.

    2. Node agents (NA)
    This is the component that makes spanning a domain across machines possible. It is a standalone process that can be started with or without manual intervention, and it controls the life cycle of the server instances it is responsible for. The server instances host the applications.

    A typical example configuration for the Sun Cluster HA Sun Java System Application Server would be as follows:

    To configure the above setup, you would need to modify the following files:

    1. domain.xml
    In this file, you would need to modify the http-listeners, IIOP-listeners, and the client-hostname under the server-config tag. This makes the DAS listen on the failover IP. In addition, you would need to modify the client-hostname of the respective node agents to make them listen on their respective failover IPs; this makes the DAS aware of the IPs to which the NAs are bound. (A rough sketch of these edits follows after this list.)

    2. nodeagent.properties
    In this file, you would need to modify the agent.client.host entry to make each node agent listen on its respective failover IP. Make sure that it is the same value as specified in domain.xml for the respective node agent.

    3. das.properties (optional)
    This file needs to be changed only if the value of the attribute "agent.das.host" is not the same as the value specified in domain.xml in the client-hostname tag of the DAS.
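
    As a rough illustration of edits 1 and 2 above (the hostnames and paths are made up; follow the data service guide, or use the script below, for the complete set of changes):

      # nodeagent.properties: bind the node agent to its failover (logical) hostname.
      cd /opt/SUNWappserver/nodeagents/na1/agent/config    # hypothetical path
      cp nodeagent.properties nodeagent.properties.orig
      sed 's/^agent.client.host=.*/agent.client.host=na1-lh/' \
          nodeagent.properties.orig > nodeagent.properties

      # domain.xml: the http-listener and IIOP-listener addresses and the client-hostname
      # under server-config must likewise point at the DAS failover hostname (e.g. das-lh);
      # editing that by hand, or letting the script below do it, avoids XML mistakes.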

    The Sun Cluster Data Service configuration guide for Sun Java System Application Server provides manual instructions for running the Application Server in a Sun Cluster environment. The same configuration can be done using the script below. The script requires Perl v5.8.2 or later and the XML::XPath module, which can be downloaded from http://search.cpan.org/CPAN/authors/id/M/MS/MSERGEANT/XML-XPath-1.13.tar.gz.

    Click this link to view and save the perl script.

    This script makes the modifications to the above configuration files and creates a backup of the original configuration files. If you run the script more than once, the backed-up files will be overwritten, so it is suggested that you keep a separate copy of the originals.

    Swathi,
    - Sun Cluster Engineering
