Converged Load-Balancer (CLB)


Brief Overview of Converged Load-Balancer

The Converged Load-Balancer is designed to provide high availability to applications deployed on Sun Communication Application Server, also known as SailFin. It distributes requests across the instances in a cluster to increase the throughput of the system, and it fails over requests from unhealthy or inaccessible instances to healthy and available ones.

Features of Converged Load-Balancer

  1. It can handle both SIP(S) and HTTP(S) requests.
  2. It can load-balance converged applications (applications having both web and SIP components) as well as pure web applications.
  3. It distributes the load across the healthy and available instances in the cluster, enabling higher throughput from the system.
  4. It maintains stickiness of requests in a session to a single instance as long as the serving instance is healthy, available and enabled.
  5. It fails over requests from unhealthy, unavailable or disabled instances to instances which are healthy, available and enabled.
  6. It supports round-robin and consistent-hash algorithms for load-balancing.
  7. It supports Data-Centric Rules (DCR) to extract the hash-key used by the consistent-hash algorithm.

Deployment Topology

The converged load-balancer currently supports only a self load-balancing cluster. The figure below illustrates the self load-balancing topology.

Converged Load-Balancer Deployment

The above deployment contains:
  • Hardware IP sprayer : Distributes requests evenly across all instances in the SailFin cluster.
  • SailFin cluster : A SailFin cluster with the converged load-balancer component.
If no hardware IP sprayer is available, all requests can be forwarded to any one instance in the SailFin cluster; the converged load-balancer component on that instance will ensure that requests are distributed across the cluster. However, that instance then becomes a single point of failure. The presence of a hardware IP sprayer ensures high availability.

Note : SailFin does not support a two-tier deployment as of now. However, no such restriction is enforced by the admin commands, so a user can still create a two-tier deployment with them. A two-tier deployment may not function correctly.

Functioning of Converged Load-Balancer

The illustration below depicts the functioning of the converged load-balancer.

CLB Functionality
  • Step 1 : The client request arrives at the IP sprayer.
  • Step 2 : The IP sprayer selects any one of the SailFin instances in the cluster and forwards the request to that instance, which in the illustration above is instance X.
  • Step 3 : The CLB on instance X, based on the configured algorithm, selects an instance to service the request and forwards the request to it, which in the illustration above is instance Y. The selected instance can also be instance X itself, in which case steps 3 and 4 are bypassed.
  • Step 4 : The CLB on instance Y receives the request and finds that it has already been proxied from another instance. Without any further processing, it passes the request to the container to be serviced. The CLB then sends the response back to the CLB on instance X.
  • Step 5 : The CLB on instance X in turn sends the response back to the IP sprayer.
  • Step 6 : The IP sprayer sends the response back to the client.
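The steps above can be sketched as a small routing function. This is an illustrative sketch, not SailFin code: the `Instance` class and the `select_instance` callback are hypothetical stand-ins for the servlet/SIP container and the configured load-balancing algorithm.

```python
class Instance:
    """Hypothetical stand-in for a SailFin instance and its container."""
    def __init__(self, name):
        self.name = name

    def service(self, request):
        # Stand-in for the container actually servicing the request.
        return "response to %s from %s" % (request, self.name)


def handle_request(receiving, request, select_instance):
    """Steps 2-6: the receiving instance either serves locally or proxies."""
    target = select_instance(request)      # step 3: algorithm picks a target
    if target is receiving:
        # Steps 3 and 4 bypassed: serviced locally on the receiving instance.
        return receiving.service(request)
    # Step 4: the selected instance services the request; steps 5-6: the
    # response travels back through the receiving instance to the sprayer.
    return target.service(request)


x, y = Instance("X"), Instance("Y")
print(handle_request(x, "INVITE", lambda r: y))  # proxied from X to Y
print(handle_request(x, "INVITE", lambda r: x))  # serviced locally on X
```

Because every instance runs the same logic, it does not matter which instance the IP sprayer happens to pick in step 2.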

Algorithms

The converged load-balancer currently supports two load-balancing algorithms:
  1. Round-robin : The instance to service a new request is selected in round-robin fashion.
  2. Consistent-hash : The instance to service a new request is selected based on a hash-key extracted from the request.

In both cases, once a session is established, sticky information is set on the response. Subsequent requests carry that sticky information, so all requests belonging to the same session stick to the same instance.

There are two possible configurations:

  • Configuration 1: This should be used when only pure web applications are deployed. The user does not provide a dcr-file.
    • Round-robin algorithm for all HTTP requests.
    • Consistent-hash algorithm for all SIP requests, in case converged or pure-SIP applications are deployed. The hash-key used to select the instance is extracted from the from-tag, to-tag and call-id parameters of the request.
  • Configuration 2: This should be used when converged or pure-SIP applications are deployed on the application server. The user must provide a dcr-file in this case, to extract the hash-key from SIP and HTTP requests. If a dcr-file is not provided, SIP and HTTP requests belonging to the same session may be serviced by different instances.
    • Round-robin algorithm for HTTP requests belonging to pure web applications.
    • Consistent-hash algorithm for HTTP requests belonging to converged applications:
      • If any DCR rule matches the HTTP request, the hash-key is extracted using that rule.
      • If none of the rules matches the HTTP request, the hash-key is extracted from the remote host and port of the HTTP request.
    • Consistent-hash algorithm for SIP requests belonging to converged or pure-SIP applications:
      • If any DCR rule matches the SIP request, the hash-key is extracted using that rule.
      • If none of the rules matches the SIP request, the hash-key is extracted from the from-tag, to-tag and call-id parameters of the SIP request.
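For illustration, here is a minimal sketch of consistent-hash selection. The real CLB implementation is internal to SailFin, so the ring layout and the key format in `sip_hash_key` below are assumptions, not the actual algorithm.

```python
import bisect
import hashlib

def _hash(value):
    # Stable hash (Python's built-in hash() varies between runs).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHash:
    """Sketch of a consistent-hash ring over cluster instances."""
    def __init__(self, instances, replicas=100):
        # Place several points per instance on the ring to even out load.
        self._ring = sorted(
            (_hash("%s#%d" % (inst, i)), inst)
            for inst in instances
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def select(self, hash_key):
        # Walk clockwise from the key's position to the next instance point.
        idx = bisect.bisect(self._points, _hash(hash_key)) % len(self._ring)
        return self._ring[idx][1]

def sip_hash_key(from_tag, to_tag, call_id):
    # Assumed key format built from the parameters named above.
    return "%s:%s:%s" % (from_tag, to_tag, call_id)
```

Because the mapping depends only on the key and the set of instances, every instance in the cluster computes the same target for a given request, which is what lets stickiness work without shared routing state.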

Note : For complete details of the converged load-balancer, please go through its functional specification.

How to configure converged load-balancer

Some common points to remember before starting with the configuration of the converged load-balancer:
  1. The converged load-balancer reads cluster information from an XML file, the converged load-balancer xml. If this file is not present, the CLB will not initialize and will return an error; requests will not be serviced.
  2. The converged load-balancer xml is generated when the auto-commit attribute of the converged-load-balancer element under availability-service is set to true. While it is set to false, no XML file is generated.
  3. It is recommended to set the auto-commit attribute to true only once the user is done with the complete configuration of the cluster, including instance creation and application deployment. Otherwise, every change to the cluster results in unnecessary reconfiguration.
  4. It is recommended to start the cluster only after auto-commit is set to true and all cluster configuration is done. Please refer to point 1 above to understand why.
  5. Only instances with the lb-enabled attribute set to true participate in load-balancing; disabled instances are not considered. A dedicated command to enable or disable an instance is not yet available and will be provided shortly; until then, the user can use the set command. Any new instance created in the cluster has lb-enabled set to false, so the user has to set this attribute to true.
  6. The cluster heartbeat-enabled attribute must be set to true. It is set to true by default when a user creates a cluster.

Using default-cluster

A default-cluster already exists in a domain created using the cluster profile. The default-cluster is pre-configured as a self load-balancing cluster: the user just has to create instances and deploy applications, and the self load-balancing cluster is ready for use. Please follow the steps below:
  1. Ensure that all instances are created in the cluster and all applications are deployed to the cluster. This step is not mandatory, but it is recommended.
  2. Set the lb-enabled attribute of all instances of default-cluster to true
    • Command : asadmin> enable-converged-lb-server default-cluster
  3. Set the auto-commit attribute to true
    • Command : asadmin> set default-cluster.availability-service.converged-load-balancer.auto-commit=true
  4. Start default-cluster. If it was already running, it will now have a working converged load-balancer; there is no need to restart the cluster.

Creating converged-load-balancer on already existing cluster

If the converged-load-balancer element does not exist under the cluster config and the user is starting afresh with converged load-balancer configuration, please follow the steps below:

Option 1 : A single step process
  1. Ensure that all instances are created in the cluster and all applications are deployed to the cluster. This step is not mandatory, but it is recommended.
  2. Create converged load-balancer using a single command
    • Command : asadmin > create-converged-lb --target <cluster-name> --autocommit=true --selfloadbalance=true --lbenableallinstances=true --configfile <converged-load-balancer-xml> <converged-load-balancer-name>
  3. Start the cluster. If the cluster was already running, please restart it.
Option 2 : A multi-step process, showing exactly what happens in the background with Option 1
  1. Ensure that all instances are created in the cluster and all applications are deployed to the cluster. This step is not mandatory, but it is recommended.
  2. Create a converged load-balancer config
    • Command : asadmin> create-converged-lb-config <clb-config-name>
    • For example : asadmin> create-converged-lb-config test-clb-config
  3. Create a converged load-balancer reference to the cluster
    • Command : asadmin> create-converged-lb-ref --clbconfig  <clb-config-name>  --selfloadbalance=true --lbenableallinstances=true <cluster-name>
    • For example : asadmin> create-converged-lb-ref --clbconfig  test-clb-config  --selfloadbalance=true --lbenableallinstances=true test-cluster
  4. Create converged load-balancer for the cluster
    • Command : asadmin> create-converged-lb --clbconfig <clb-config-name> --configfile <converged-load-balancer-xml-for-cluster> --autocommit=true --target <cluster-name> <converged-load-balancer-name>
    • For example : asadmin> create-converged-lb --clbconfig test-clb-config --configfile converged-load-balancer.xml --autocommit=true --target test-cluster test-cluster-clb
  5. Start the cluster. If the cluster was already running, please restart it.
Note:
  1. The commands above do not show the default options to be provided with each command, for example user name, password file, etc.
  2. The configuration described above is just one way of configuring the converged load-balancer. The commands used have many more options; please look at the man page of each command for all possible options.


Data Centric Rules (DCR)

Data centric rules are used to extract the hash-key from a request. The extracted hash-key is used by the consistent-hash algorithm to select an instance to service the request.

Sample DCR file

Below is a sample DCR file :
<?xml version="1.0" encoding="ISO-8859-1"?>
<user-centric-rules>
    <sip-rules>
        <if>
            <session-case>
                <equal>ORIGINATING</equal>
                <if> 
                    <header name="ConferenceName"
                            return="request.ConferenceName">
                        <exist/>
                    </header>  
                </if>
            </session-case>
        </if>
    </sip-rules>
    <http-rules> 
        <request-uri parameter="ConferenceName" return="parameter.ConferenceName">
            <exist/>
        </request-uri>
    </http-rules>  
</user-centric-rules>

The sample DCR file above defines the following rules:
  1. For SIP(S) requests: for an originating request, look for a header named ConferenceName; the value of that header is used as the hash-key.
  2. For HTTP(S) requests: look for a request-uri parameter named ConferenceName; the value of that parameter is used as the hash-key.
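A sketch of how rule-based key extraction could behave. The function and request shapes here are illustrative assumptions, not the actual CLB rule engine: each rule is tried in order, the first match supplies the hash-key, and a protocol-specific fallback applies when no rule matches.

```python
def extract_hash_key(request, rules, fallback):
    """Return the value from the first matching rule, else the fallback key."""
    for rule in rules:
        value = rule(request)
        if value is not None:
            return value
    return fallback(request)

def conference_name_rule(request):
    # Mirrors the sample DCR file: use the ConferenceName header if present.
    return request.get("headers", {}).get("ConferenceName")

def sip_fallback(request):
    # Default SIP behaviour: key built from from-tag, to-tag and call-id.
    return "%s:%s:%s" % (request["from-tag"],
                         request["to-tag"],
                         request["call-id"])
```

With a matching header the rule's value becomes the key; without one, the default from-tag/to-tag/call-id key keeps requests of the same SIP session on the same instance.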

Configuring DCR 

A user can set up DCR in any of the following ways:
  1. At the time of converged load-balancer creation using a single command.
    • Command : asadmin > create-converged-lb --target <cluster-name> --autocommit=true --selfloadbalance=true --lbenableallinstances=true --configfile <converged-load-balancer-xml> --dcrfile <dcr-file-name> <converged-load-balancer-name>
  2. At the time of converged load-balancer config creation
    • Command : asadmin> create-converged-lb-config --dcrfile <dcr-file-name> <clb-config-name>
  3. After creation of converged load-balancer config
    • Command :  asadmin> set-dcr-file --dcrfile <dcr-file-name> <clb-config-name>
Note:
  1. The commands above do not show the default options to be provided with each command, for example user name, password file, etc.

Comments:

Hi,

I took a very quick look at this post. Just curious. Are you using Shoal for your clustering?

https://shoal.dev.java.net/

Paul

Posted by Paul Sterk on March 26, 2008 at 04:56 AM PDT #

Hi Kshitiz,

Looks great! Two quick questions...

1) On step 4, why are you going back to "Instance X" from "Instance Y"? Wouldn't it be better to respond directly from "Instance Y"?

2) How does sticky loadbalancing work in CLB?

Nazrul

Posted by Nazrul on March 26, 2008 at 12:20 PM PDT #

Paul, Yes the CLB uses Shoal for cluster member discovery and health monitoring.

Cheers
Shreedhar

Posted by Shreedhar on March 27, 2008 at 06:58 AM PDT #

This is pretty cool. Being some what familiar with the SIP protocol I am curious to know how the following is handled?
The IP Sprayer receives an invite from alice. The Sprayer forwards that to Instance X. What happens when alice sends the ACK and the Sprayer forwards it to Instance Y?

I assume this is where you are using Shoal to replicate session information across the cluster?

Posted by James Lorenzen on March 27, 2008 at 02:55 PM PDT #

Hi Nazrul,

1. If the connection is over TCP, the same connection needs to be used to send the response back to the client. Yes, in case of UDP or a broken TCP connection, responses can be sent back directly to the client. However, to keep it uniform, the response follows the same path as the incoming request.

2. Stickiness to an instance is maintained using cookies.

2.1 If the algorithm is round-robin, CLB adds a header proxy-beroute to the request. This is added to the response as cookie BEROUTE. Subsequent requests will have this cookie, and its value will be used to maintain stickiness.

2.2 If the algorithm is consistent-hash, CLB adds a header proxy-bekey to the request. This is added to the response as cookie BEKEY. Subsequent requests will have this cookie, and its value will be used to maintain stickiness.
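[Editorial sketch] The round-robin case could be sketched like this; the class and request/response shapes are illustrative assumptions, not the CLB's actual cookie handling:

```python
import itertools

class RoundRobinClb:
    """Sketch of sticky round-robin routing via a BEROUTE-style cookie."""
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def route(self, request):
        cookies = request.get("cookies", {})
        if "BEROUTE" in cookies:
            # Sticky path: subsequent requests carry the cookie.
            return cookies["BEROUTE"], {}
        # New session: pick the next instance in round-robin order; the
        # cookie is set on the response so the client sends it back later.
        target = next(self._cycle)
        return target, {"Set-Cookie": "BEROUTE=%s" % target}
```

The first request in a session picks a fresh instance and sets the cookie; every later request in that session is routed by the cookie instead of the round-robin counter.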

Hope this answers your query.

Thanks,
Kshitiz

Posted by Kshitiz Saxena on March 31, 2008 at 09:23 PM PDT #

Hi James,

As you mentioned, IP sprayer can send it to any instance.

Let's assume that, in your case, Instance X will service the request based on the consistent-hash algorithm.

Instance X receives the INVITE request, processes it and sends the response back to the user. This response will have BEKEY set. Subsequent requests, the ACK request in your case, will carry that BEKEY. Now say the IP sprayer sends the request to Instance Y. Instance Y will extract that BEKEY and, using that key, figure out that this request needs to be serviced by Instance X. It will forward the request to Instance X, which will service the request and send the response back to the client.

No session replication is needed for the CLB decision, as the CLB is stateless. Its decision is based on the hash key, and the instance selected for a given hash key is the same whichever instance in the cluster receives the request. If the selected instance is the same as the receiving instance, the request is serviced locally and the response is sent back to the client. If the selected instance is different from the receiving instance, the receiving instance forwards the request to the selected instance, where it is serviced and the response is sent back to the client.
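[Editorial sketch] The stateless decision described here can be illustrated as follows; a simple modulo mapping stands in for the actual consistent-hash algorithm, and all names are assumptions:

```python
import hashlib

def select_instance(bekey, instances):
    # Deterministic: every receiving instance computes the same target
    # for the same BEKEY, so no shared routing state is required.
    digest = int(hashlib.md5(bekey.encode()).hexdigest(), 16)
    return sorted(instances)[digest % len(instances)]

def handle(receiving, bekey, instances):
    target = select_instance(bekey, instances)
    if target == receiving:
        return "serviced locally on %s" % receiving
    return "proxied from %s to %s" % (receiving, target)
```

Whichever instance the sprayer hits, the same BEKEY yields the same target, so the ACK ends up on the instance that handled the INVITE.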

Thanks,
Kshitiz

Posted by Kshitiz Saxena on March 31, 2008 at 09:36 PM PDT #

@Kshitiz
I had never heard of BEKEY. Is it a header identified in the RFC?
Then the assumption is that third-party clients must copy BEKEY into the ACK. Out of curiosity, has this been tested with the JAIN SIP API?

Posted by James Lorenzen on April 01, 2008 at 12:33 AM PDT #

Hi James,

BEKEY is not added as a separate header. It is added as part of the Record-Route header, which is sent back as the Route header on subsequent requests. For the complete set of headers where this value can be set, please refer to the appendix section of the CLB functional spec.

Thanks,
Kshitiz

Posted by Kshitiz Saxena on April 01, 2008 at 12:51 AM PDT #

This is interesting. Is it correct to assume this works only when the protocol uses a response to be sent back? (and hence would work for HTTP)?

The first time a message arrives, and its response sent back, IP sprayer extracts the sticky information from the response's "header". This sticky information is then "somehow" added to the subsequent messages. That "somehow" could be a http DCR. Did I understand that correct?

In the case when the first message's response suggests which instance the subsequent messages need to go to, for example in your example steps 3 and 4. what happens when instance y goes down while instance x decides to send the message to Y?

thanks,
Kiran B.

Posted by Kiran Bhumana on April 29, 2008 at 03:11 PM PDT #

Hi Kiran,

I am not sure what you mean by "this works only when the protocol uses a response to be sent back". In any case, HTTP is a supported protocol by CLB.

IP sprayer does not play any role in maintaining stickiness. This role is played by CLB.

When a new request comes in, the CLB will extract a key based on the algorithm used and add it as a header to the request. Say it is an HTTP request with round-robin as the load-balancing algorithm: a proxy-beroute header is added as a request header whose value identifies the selected instance. If this HTTP request creates a session, that proxy header is added as a cookie to the response, so subsequent requests will carry this cookie. That cookie value is used by the CLB to maintain stickiness.

If instance Y goes down when instance X has already selected instance Y to service a request, that request will fail. However, after some time, instance X will become aware that instance Y is down and will stop selecting it to service requests. Until instance Y is detected as failed, there will be a few failures.

Thanks,
Kshitiz

Posted by Kshitiz Saxena on April 29, 2008 at 09:56 PM PDT #

Very interesting !!!!

Please clarify my doubt on the network topology mentioned above. I found it a bit confusing that one instance of CLB passes the request to another CLB (instance X to Y). Am I correct that this is multi-level load balancing? Otherwise the request would have gone to the request processor straight from CLB instance X?

Also, is it mandatory that all nodes in a cluster should be running on sailfin? What if I want to add any 3rd party application server in cluster as leaf node i.e. request processor?

Thanks
/Raj

Posted by Raj on May 14, 2008 at 10:41 AM PDT #

Hi Raj,

The topology deployed above is a self load-balancing cluster. It is a single tier where each instance acts as a CLB as well as a request processor. The presence of the CLB ensures that requests are routed to the correct instance even if the IP sprayer routes a request to an incorrect instance. This topology provides high availability of the CLB as well as the request processor, i.e., there is no single point of failure. It may also provide better utilization of system resources.

Two-tier topology is also possible, however it is not thoroughly tested.

CLB has dependencies on SailFin's admin framework, GMS, Grizzly, etc. Hence, as of now, it will work only in a SailFin deployment.

In future, CLB may be available as independent component so that it can service any application server.

Thanks,
Kshitiz

Posted by Kshitiz Saxena on May 14, 2008 at 01:34 PM PDT #

[Trackback] JSR 289 went final just about a month back. And we have the first implementation of JSR 289 at alpha stage available now. SailFin V1 Alpha has been released.

Posted by Binod's Blog on August 13, 2008 at 05:45 AM PDT #

[Trackback] Today we are releasing our first Grizzly 2.0 release, which we are working on since almost 2 years. Get ready for the revolution! But how this Grizzly things started?

Posted by Jean-Francois Arcand's Blog on April 21, 2009 at 07:54 AM PDT #

Kshitiz, the design of CLB is pretty good. Do you have benchmark data about CLB?

Posted by David zhang on January 20, 2010 at 11:28 AM PST #

Hi David,

CLB benchmark data is only available internally. It can also be made available to customers buying support from Sun.

Thanks,
Kshitiz

Posted by kshitiz on January 20, 2010 at 12:00 PM PST #

Hello Kshitiz,

I have a converged application to deploy that needs message broadcasting by using Comet techniques (Reverse Ajax). Those techniques rely on keeping HTTP requests pending on the server side, until the server has something to notify, only then the response is sent (long polling scenario).

For those techniques to succeed in massive scalability, an asynchronous server is needed (Servlet 3.0), or using an equivalent substitute framework, like Grizzly, for example. Basically I need a non-blocking server, that would not apply a thread-per-request model.

Now, finally comes the question :-) : Imagining I get Grizzly to run on my clustered SGCS, and my Comet app to run there... Is CLB a non-blocking server ? Is it long-lived HTTP connections friendly as to cluster Comet apps ? In case YES, where I can find docs. with specs. stating this ?

Sorry for the longish question, and thank you very much in advance for your patience!!!

Posted by guest on June 30, 2011 at 02:20 AM PDT #

CLB is not a server. It is just a module which runs on top of Grizzly, so you get all the benefits of Grizzly when CLB is enabled on a SailFin cluster. Hope this answers your query.

Thanks,
Kshitiz

Posted by Kshitiz on July 06, 2011 at 05:38 PM PDT #
