Sunday May 17, 2015

New white paper available: Providing High Availability to the OpenStack Cloud Controller on Oracle Solaris with Oracle Solaris Cluster

Oracle Solaris delivers a complete OpenStack distribution that is integrated with its core technologies, such as Oracle Solaris Zones, the ZFS file system, and the Image Packaging System (IPS). OpenStack in Oracle Solaris 11.2 helps IT organizations create an enterprise-ready Infrastructure as a Service (IaaS) cloud, so that users can quickly deploy virtual networking and compute resources through a centralized web-based portal.

Of course any enterprise-class OpenStack deployment requires a highly available OpenStack infrastructure that can sustain individual system failures.

Oracle Solaris Cluster is deeply integrated with Oracle Solaris technologies and delivers high availability to Oracle Solaris based OpenStack services through a rich set of features.

The primary goals of the Oracle Solaris Cluster software are to maximize service availability through fine-grained monitoring and automated recovery of critical services, and to prevent data corruption through proper fencing. The services covered include the networking, storage, and virtualization components used by the OpenStack cloud controller, as well as the cloud controller's own components.

Our team has created a new white paper to specifically explain how to provide high availability to the OpenStack cloud controller on Oracle Solaris with Oracle Solaris Cluster.

After describing an example of a highly available physical-node OpenStack infrastructure deployment, the paper provides a detailed and structured walk-through of the highly available cloud controller configuration. It discusses each OpenStack component that runs on the cloud controller, with explicit steps on how to create and configure these components under cluster control. The deployment example achieves secure isolation between services, while defining all the dependencies required for proper startup and shutdown ordering between services, orchestrated by the cluster framework.

The white paper is linked from the OpenStack Cloud Management page as well as from the Oracle Solaris Cluster Technical Resources page on the Oracle Technology Network portal.

Thorsten Früauf
Oracle Solaris Cluster Engineering

Thursday Oct 22, 2009

IDC white paper: Addressing Virtualization and High-Availability Needs with Sun Solaris Cluster

A white paper titled "Addressing Virtualization and High-Availability Needs with Sun Solaris Cluster", written by IDC analyst Jean Bozman, is now available!

What you will find in the paper includes:
- Businesses' High Availability needs in the Virtualized IT Environment and how Solaris Cluster addresses these requirements
- Why Virtualization Software and High Availability Software are being used to protect applications
- Worldwide Availability and Clustering Software Revenue, 2008-2013
- Integration of High Availability and Virtualization Use Cases
- Solaris Cluster Customer Snapshots

We look forward to your comments on the product.


Director, Solaris Cluster

Monday Mar 02, 2009

Why is a logical IP marked as DEPRECATED?

Reported Firewall Problems

We've had many questions, comments, and complaints about IP address "problems" when using highly available services in a Sun Cluster environment. We found that most, if not all, of these were related to configurations where firewalls sit between the service running on the cluster and the clients connecting to it.

So, what is the problem? Firewall administrators often assume that a packet sent from a client to the logical IP address of an HA service will generate a response IP packet with exactly that logical IP address as its source address. So they configure an appropriate firewall rule and wonder why it does not work, i.e., why IP packets coming back from the HA service do not match the rule.

Then they start researching the network configuration on the cluster node that hosts the HA service and find out that the logical IP address used by that service was set to a state called "DEPRECATED". And they think this is the root cause of their problem - which (we think) is not the case.

How does Address Selection really work?

Address selection can become very complicated in complex network setups, so the following applies to the typical simple network setup found at most installations.

Let's look at the address selection for an outgoing packet a bit more closely. First we must make a distinction between TCP (RFC 793) and UDP (RFC 768). TCP is a connection-oriented protocol, i.e. a connection is established between a client and a service. Using this connection, source and target addresses are always used appropriately; in a Sun Cluster environment the source address of a packet sent by the service to a client will usually be the logical IP address of that HA service - but only if the client used the logical service address to send its request to the service.

So, TCP will not cause any problems with firewalls, because you know exactly which IP addresses will be used as source addresses for outgoing IP packets.

Let's look into UDP now. UDP is a connectionless protocol, i.e., there is no established connection between a client and a server (service). A UDP-based service can choose its source address for outgoing packets by binding itself to a fixed address, but most services don't do this. Instead, they accept incoming packets from all network addresses configured. For those readers who are familiar with network programming, the typical code segment has the following lines in it:

struct sockaddr_in address;
memset(&address, 0, sizeof (address));
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;   /* accept on all configured addresses */
if (bind(sock, (struct sockaddr *) &address, sizeof (address)) == 0)

Using this typical piece of code, the UDP service listens on all configured IP addresses, and the outbound source address is set by the IP layer; the selection algorithm is complex and cannot be influenced. Details can be read in Infodoc 204569 (access on SunSolve for SPECTRUM contract holders only), but we think they are not that relevant here, except for this quote: "IP addresses associated with interfaces marked as DEPRECATED will not normally be used as source addresses by IP unless deprecated interfaces are all that is available, in which case they will be used."


So, now DEPRECATED comes into play. A DEPRECATED address will - normally - not be used as a source address! Why does Sun Cluster set HA IP addresses, i.e., logical or floating addresses, into the DEPRECATED state? Because they are floating addresses - there is no guarantee that they will stay on one node. In failure situations, an HA IP address floats to another node together with its service. The same happens when the administrator decides to migrate a service; and when a service is stopped, its logical IP address disappears from the node.

Let's have a look at services where IP communication is initiated from a cluster node. For example, a cluster node might temporarily mount an external NFS share. Whether this is UDP- or TCP-based NFS does not matter in this case! The IP layer would choose a source address; it could be the logical IP address of an HA service that happens to run on the same system - if it were not DEPRECATED. Now, imagine the NFS mount is successful, uses the logical IP address, and NFS transfers work fine. Then the HA service that owns the HA IP address is switched to another node in the cluster, and its IP address switches with it. What would happen to the NFS traffic between this node and the external NFS server? It would fail. Packets coming from the NFS server would now reach a different node, namely the one the HA service switched to, taking its IP address with it. (And the NFS client on the cluster node would fail as well.)

So, that is the reason for setting the DEPRECATED flag on HA IP addresses; remember the quote above: "...marked as DEPRECATED will not normally be used...". Although not setting the DEPRECATED flag would increase the probability that the address is used by the IP layer as a source address, there is no guarantee, and in the end this would not help. The DEPRECATED flag, however, helps to prevent major problems on cluster nodes.

The Solution

Back to the original question: how can I make my firewall rules work? There are four possibilities - in prioritized order, best practice first:

  1. change your firewall rules to accept all possible source addresses of the nodes from which packets could originate;
  2. change your service by binding it only to the HA service IP address - which is only possible if its configuration lets you do this or if you have access to the source code;
  3. move your HA service into a Solaris 10 container that uses only the logical IP address; in this configuration the logical IP address will always be used as the source address, even though it is in the DEPRECATED state;
  4. try to manipulate the decision process of the IP layer - which is a very bad idea.

To summarize

Sun Cluster sets the DEPRECATED flag on HA service IP addresses by design, and this is a good thing: it prevents strange problems with IP-based clients on cluster nodes. Not setting the flag would not solve the reported firewall problems.

Hartmut Streppel
Principal Field Technologist
Systems Practice

Wednesday Dec 03, 2008

Open HA Cluster with HA-xVM demo

Following our recent Open CLARC inception review of the Open HA Cluster (OHAC) agent for xVM Hypervisor guest domains, I thought a small 5 minute demo would be of interest.

To stream the demo, hit the download key from the demo link at the HA-xVM project page.

The demo is based on Solaris Express Community Edition build 86 and Solaris Cluster Express 2/08 simply to reflect the cheat sheet from the link above. The purpose of the demo was to show the following:

  • Show that an OHAC resource group managing an xVM guest domain via an OHAC resource can switch the domain from one node to another using live migration. What's not shown is that the OHAC interconnects are used for the live migration.
  • Show that an OHAC managed xVM guest domain can survive a node crash.

In particular, within the demo, RG1 manages xVM domain domu1. RG1 is switched from node podio2 to node podio1, and domu1 is live migrated between the two nodes. While RG1/domu1 is online on podio1, that node is crashed via "uadmin 2 1". OHAC automatically detects the failure and restarts domu1 on podio2.

Neil Garthwaite
Solaris Cluster Engineering

Monday May 12, 2008

Invitation to LinuxTag 2008 in Berlin

From May 28th to 31st, 2008, Berlin will become the center of Europe's Linux and open source movement. LinuxTag attracts over 10,000 IT decision-makers, trade visitors, developers, media representatives, and other interested visitors from 30 countries.

With over 80 exhibitors of free projects, LinuxTag is the world's largest platform for open-source software projects, offering you the chance to meet directly with their developers and talk about the latest trends.

For more than a decade, LinuxTag has featured its proven crowd magnets, the high-caliber lectures, conferences and tutorials, and the Business and Government Conference.

Hartmut Streppel and Thorsten Frueauf will give a full-day tutorial about Open HA Cluster and Flying Containers on Wednesday, May 28th, 2008.

Thorsten Frueauf will give a presentation about "Hochverfügbarkeit mit Open HA Cluster" (High Availability with Open HA Cluster) on Saturday, May 31st, 2008, within the OpenSolaris track. Check the agenda.

Come by the Sun booth and talk to us, we will present and demonstrate Open HA Cluster on a single-node cluster, and discuss any cluster related subject of interest to you!

Here is your personal invitation in German (thanks to Deirdré Straughan for her patience when recording me):

Looking forward to meeting you in Berlin!

Thorsten Früauf
Solaris Cluster Engineering


Oracle Solaris Cluster Engineering Blog

