OIM Clustering: Keeping separate environments separate
By Kenn Jones-Moderator-Oracle on Apr 22, 2014
Oracle Identity Manager 11g incorporates several clustering technologies in order to ensure high-availability across its different components. Several of these technologies use multicast to discover other cluster nodes on the same subnet. For testing and development purposes, it is common to have multiple distinct OIM environments co-existing on the same subnet. In that scenario, it is essential that the distinct environments utilise separate multicast addresses, so that they do not talk to each other – if they do, they will confuse one another, and many things can go wrong. This problem is less common with production environments, since best practice dictates that the production environment should be on a separate subnet from development and test, and multicast traffic cannot transverse subnet boundaries without special configuration.
Overview of OIM Clustering
Here’s a rough diagram of the clustering components inside OIM:
Quartz Scheduler Cluster
Data Caching Cluster
Application Server Cluster
There are three basic layers of clustering in OIM:
- Application Server Clustering: This is the clustering layer of the underlying Java EE Application Server (Oracle WebLogic or IBM WebSphere). This is responsible for replication of the JNDI tree, EJBs, HTTP sessions, etc.
- Data Caching: This provides in-memory caching of data to improve performance, while ensuring that database updates made on one node are propagated promptly to the others. OIM uses OSCache (OpenSymphony Cache) as the underlying technology for this.
- Scheduler Clustering: This is used to ensure that in a cluster each execution of a scheduled job only runs on one node. Otherwise, if a job is scheduled to start at 9am, every node in the cluster might try to start it at the same time, resulting in multiple simultaneous executions of that job
Clustering layers present in older versions only
- In OIM 11gR1, and 11gR2 base release, OIM used EclipseLink data caching, which included its own multicast clustering layer. From OIM 220.127.116.11.0 onwards, while EclipseLink is still being used for data access, its caching features are no longer used, so this form of multicast clustering is no longer present.
- As well as using JGroups for OSCache, OIM 9.x also used JGroups for a couple of additional functions (forcibly stopping scheduled tasks and diagnostic dashboard JMS test.) In OIM 11g, JGroups is now used for OSCache only.
Underlying technologies used
Different clustering components in OIM use different technologies:
|Application Server Cluster||Unicast or Multicast||Consult Application Server documentation:|
(OIM 18.104.22.168.x and earlier only)
||Multicast is only used to find other nodes in the cluster. With WLS, JNDI connections are opened between the nodes for the cache coordination traffic. On WebSphere, RMI is used instead.|
||Unlike other clustering components, Quartz does not use direct network communication between the nodes. Database tables are used for inter-cluster communication|
Relevant Configuration Settings
I’m only going to talk about the OIM-specific clustering settings here. So I won’t go into the configuration of the WebLogic/WebSphere clustering layer, only the data cache and scheduler clustering layers. All configuration relevant to these can be found in the /db/oim-config.xml file in MDS. So let’s discuss the settings in this file which are relevant to clustering.
|<cacheConfig clustered=”…”>||Must be set to true in a clustered install, and false for a single-instance install. This controls whether OSCache operates in a clustered mode.|
|<cacheConfig>/<xLCacheProviderProps multicastAddress=””>||Multicast address which is used for OSCache. (Also used by EclipseLink in versions 22.214.171.124.x and earlier; the same address is used for both.) Make sure this address is unique for each distinct OIM environment on the same subnet.|
|<xLCacheProviderProps>/<properties>||Can be used to manually override JGroups configuration used by OSCache. Not recommended.|
|<schedulerConfig clustered=”…”>||Must be set to true in a clustered install, and false for a single-instance install.|
|<schedulerConfig multicastAddress=”…”>||In OIM 9.x, JGroups was used to forcibly stop jobs. In OIM 11g, a different mechanism is used instead. This configuration setting is a left-over from OIM 9.x, and is now ignored. However, to avoid confusion, it is recommended to set this to the same multicastAddress as the xLCacheProviderProps above.|
|<deploymentConfig>/<deploymentMode>||In a clustered install, should be set to clustered; in a single instance, should be set to simple. This is used to control whether EclipseLink operates in a clustered mode.|
|<SOAConfig>/<username>||As its name implies, this is the username used by OIM to login to SOA. However, in OIM 126.96.36.199.0 and earlier, it also serves an additional purpose – on WebLogic, this username is used by EclipseLink clustering for inter-node communication. By default, this is weblogic; if you have renamed the weblogic user, you must change it; you are free to use another user if you wish, so long as they are a member of the Administrators group. (On WebSphere, this user is used for OIM-SOA integration only, not for EclipseLink clustering.)To change this, see “2.6 Optional: Updating the WebLogic Administrator Server User Name in Oracle Enterprise Manager Fusion Middleware Control (OIM Only)”. (If step 11 in those steps gives you a permissions error, just skip that step.)|
|<SOAConfig>/<passwordKey>||This is the name of the CSF Credential which stores the password for the <SOAConfig> user. You should never change this setting in oim-config.xml from its default of SOAAdminPassword, but you will need to change the corresponding CSF entry whenever you change that user’s password.|
What can go wrong
As I’ve mentioned, it is important that you have the correct clustering configuration for your environment. If you do not, many things can go wrong. I don’t propose to provide an exhaustive list of potential problems in this blog post, but just give one example I recently encountered at a customer site.
This customer was preparing to go live with Oracle Identity Manager 188.8.131.52. As part of their pre-production activities, they needed to document and test the procedure for periodic change of the weblogic password. They began by their testing by changing the weblogic password in one of their development environments. Restarting the OIM managed server, they saw this message multiple times in their WebLogic log: <Authentication of user weblogic failed because of invalid password>. They also found that the WEBLOGIC user in OIM was locked.
What went wrong here? Well, several things were wrong in this environment:
- They had <SOAConfig>/<username> set to weblogic, but they had not updated the SOAAdminPassword credential in CSF to the new weblogic password. This customer does not currently use any of the OIM functionality which requires SOA, so they normally leave their SOA server down, including for this test. You would think therefore that the <SOAConfig> would not be relevant to them; but, as I have pointed out above, it is also used for EclipseLink clustering.
- Even though their development environments were single instance installs, they all had <deploymentConfig>/<deploymentMode> set to cluster instead of simple. As a result, EclipseLink clustering was active even though it did not need to be.
- <cacheConfig>/<xLCacheProviderProps multicastAddress=””> was set to the same address in multiple development environments on the same subnet. As a result, even though these environments were meant to be totally separate, they were formed into a single EclipseLink cluster.
So, what would happen, was that this environment (let’s call it DEV1) at startup would initialise EclipseLink clustering (since <deploymentConfig>/<deploymentMode> is set to cluster.) It would then add itself to the multicast group configured in <cacheConfig>/<xLCacheProviderProps multicastAddress=””>. At this point, DEV1 becomes visible to the other development environments (say DEV2 and DEV3). DEV2 tries to login to DEV1 over T3, using the <SOAConfig>/<username> user (weblogic) and the SOAAdminPassword password from CSF. However, the weblogic password having changed, both DEV2 and DEV3 will receive an invalid credential error, and DEV1 will experience <Authentication of user weblogic failed because of invalid password>. Setting <deploymentConfig>/<deploymentMode> to simple resolved this.
All site content is the property of Oracle Corp. Redistribution not allowed without written permission