Tuesday Oct 22, 2013

Minimum percentage of free physical memory that Linux requires for optimal performance

Recently, we have been getting questions about the percentage of free physical memory that the OS requires for optimal performance, mainly on physical compute nodes.

Under normal conditions you may see that, on nodes with no application running, the OS takes (for example) between 24 and 25 GB of memory.
However, Linux reports free memory differently: the "used" figure includes buffers and cache, and most of those 25 GB (in this example) are actually available to user processes.
For example:
Mem: 99191652k total, 23785732k used, 75405920k free, 173320k buffers

The MOS Doc Id. 233753.1 - "Analyzing Data Provided by '/proc/meminfo'" - explains it (section 4 - "Final Remarks"):
Free Memory and Used Memory
Estimating resource usage, especially the memory consumption of processes, is far more complicated than it looks at first glance. The philosophy is that an unused resource is a wasted resource. The kernel therefore will use as much RAM as it can to cache information from your local and remote filesystems/disks. This builds up over time as reads and writes are done on the system trying to keep the data stored in RAM as relevant as possible to the processes that have been running on your system. If there is free RAM available, more caching will be performed and thus more memory 'consumed'. However this doesn't really count as resource usage, since this cached memory is available in case some other process needs it. The cache is reclaimed, not at the time of process exit (you might start up another process soon that needs the same data), but upon demand.

That said, focusing specifically on the percentage question: apart from the memory that the OS takes, what is the minimum free memory that must be available on every node for it to operate normally?
The answer is: as a rule of thumb, 80% memory utilization is a good threshold; anything higher than that should be investigated and remedied.
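The 80% rule of thumb only makes sense if buffers and page cache are not counted as "used". Below is a minimal sketch that assumes the standard Linux /proc/meminfo layout. It prefers the kernel's MemAvailable estimate (present on kernels 3.14 and later) and otherwise falls back to MemFree + Buffers + Cached. That fallback is a simplification, because not every cached page (shared memory, for instance) can be reclaimed. The function names and the THRESHOLD_PCT constant are illustrative.

```python
THRESHOLD_PCT = 80.0  # rule-of-thumb threshold from the text above


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_utilization(info):
    """Percentage of RAM in use, treating reclaimable buffers/cache as available."""
    total = info["MemTotal"]
    available = info.get("MemAvailable")
    if available is None:
        # Older kernels: approximate with free memory plus buffers and page cache.
        available = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * (total - available) / total


def check_node(path="/proc/meminfo"):
    """Return (utilization %, needs_investigation) for the local node."""
    with open(path) as f:
        pct = memory_utilization(parse_meminfo(f.read()))
    return pct, pct > THRESHOLD_PCT
```

On a compute node, call `check_node()`. A second element of True means utilization is above 80% and worth investigating.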

Wednesday Sep 25, 2013

Adding NTP Servers to Exalogic Switches

Exalogic machines are sometimes misconfigured so that, for whatever reason, the compute nodes have more NTP server addresses configured than the switches. Here are a few steps to add the missing servers to the switches.
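Before you change the switches, it can help to list the NTP servers that a compute node uses but the switches do not. Below is a minimal sketch. It assumes the compute node runs ntpd with `server` lines in /etc/ntp.conf, and that you copy the switch's NTP addresses by hand from the ILOM Clock subtab. The function names are illustrative. The sketch skips addresses starting with 127.127. because ntpd uses them for reference clocks (such as the local clock) rather than real servers.

```python
def ntp_servers_from_conf(text):
    """Return the server addresses declared in ntp.conf-style text, in order."""
    servers = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2 and fields[0] == "server":
            addr = fields[1]
            if not addr.startswith("127.127."):  # ntpd reference-clock pseudo-addresses
                servers.append(addr)
    return servers


def missing_on_switch(node_servers, switch_servers):
    """NTP servers configured on the node but absent from the switch."""
    switch = set(switch_servers)
    return [s for s in node_servers if s not in switch]
```

For example, read /etc/ntp.conf on a compute node, pass its text to `ntp_servers_from_conf`, and compare the result with the switch's list using `missing_on_switch`. Any addresses it returns are candidates to add in the steps below.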

To add another NTP server to the Infiniband/Gateway switches:
- Log in to the switch ILOM BUI (preferably using MSIE)
- Select the Configuration tab
- Select the Clock subtab
- Enter the NTP server IP addresses as appropriate

To configure two NTP servers on the Cisco Ethernet Switch, please see http://docs.oracle.com/cd/E18476_01/doc.220/e18478/spreadsheet.htm#BIIGDJEA - "5.4.1 Configuring the Cisco Ethernet Switch".

Note that, besides adding servers, you can also modify these configurations at your convenience.

Wednesday Aug 21, 2013

Resource allocation on vServers

A few facts to keep in mind.

The number of vCPUs of an existing vServer cannot be modified. Currently, changing vDC resources (vCPUs, memory, etc.) after a vServer has been created is not supported in Exalogic virtual environments. However, a custom vServer type with the required memory can be created and used instead of the default VM types.

Note that server pool management is handled by Exalogic; it cannot be managed manually. Also, keep in mind that there is no way to control which physical host a given vServer is assigned to. The scheduler algorithms are sophisticated and fine-grained, and there is no reason to assume that the scheduler will not be maximally efficient. EMOC looks at the resources (CPUs/memory) allocated to the vServer that is about to be created, then looks at what is available on each of the compute nodes, and decides where to place the vServer. System administrators can use distribution groups to spread the vServers they need to run across different Oracle VM Servers.
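A toy model can illustrate the kind of placement decision described above. It is not EMOC's actual scheduler, whose algorithm is not documented here. This hypothetical sketch first keeps only the compute nodes that have enough free vCPUs and memory. It then applies a distribution-group rule, so that at most one vServer of a given group lands on each node. Finally, it picks the node with the most free memory. `ComputeNode`, `place_vserver`, and the tie-breaking rule are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    name: str
    free_vcpus: int
    free_mem_gb: int
    groups: set = field(default_factory=set)  # distribution groups already on this node


def place_vserver(nodes, vcpus, mem_gb, group=None):
    """Pick a node for a new vServer and reserve its resources; None if nothing fits."""
    candidates = [
        n for n in nodes
        if n.free_vcpus >= vcpus
        and n.free_mem_gb >= mem_gb
        and (group is None or group not in n.groups)  # one group member per node
    ]
    if not candidates:
        return None
    # Hypothetical heuristic: prefer the node with the most free memory, then vCPUs.
    best = max(candidates, key=lambda n: (n.free_mem_gb, n.free_vcpus))
    best.free_vcpus -= vcpus
    best.free_mem_gb -= mem_gb
    if group is not None:
        best.groups.add(group)
    return best.name
```

Suppose two vServers are placed in the same group on two nodes. The first goes to the node with more free memory. The second is forced onto the other node. A third member of the group cannot be placed at all. This is the separation that distribution groups provide.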

If you need to increase the root file system and swap space of an Exalogic guest virtual machine, MOS Note 1575790.1 describes how to do it.

Monday Jul 29, 2013

Coherence on Exalogic: dealing with the multiple network interfaces

Recently, we worked on an incident where error messages like the following were being thrown when starting the Coherence servers after an upgrade of EECS:
Oracle Coherence GE (thread=Thread-3, member=n/a): Loaded Reporter configuration from "jar:file:/u01/app/fmw_product/wlserver_103/coherence_3.7/lib/coherence.jar!/reports/report-group.xml"
Exception in thread "Thread-3" java.lang.IllegalArgumentException: unresolvable localhost at
Caused by: java.rmi.server.ExportException: Listen failed on port: 8877; nested exception is:
java.net.SocketException: Address already in use ...
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: is not a local address
at com.tangosol.net.InetAddressHelper.getLocalAddress(InetAddressHelper.java:117)

It is well known that Exalogic has several network interfaces (bond/eth 0, 1, 2, etc.). Coherence's logic for deciding which interface to use has been enhanced, specifically to support machines with multiple network interfaces. It also now allows the local address to be specified as a netmask, which makes configuration across larger clusters easier. As a result, it is even more important than in previous Coherence releases to set the tangosol.coherence.localhost parameter appropriately. From that IP address (or a properly mapped hostname), the desired network interface can easily be found, and the Coherence cluster then works fine on it.
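Python's standard ipaddress module can illustrate how a single address or a subnet selects one interface among several. This sketch mirrors the matching idea conceptually and is not Coherence's implementation. The exact subnet notation that Coherence accepts for tangosol.coherence.localhost depends on the release, so check the documentation for your version. `pick_local_address` and `localhost_flag` are hypothetical helpers.

```python
import ipaddress


def pick_local_address(local_addrs, spec):
    """Return the first local address matching spec, or None.

    spec may be a single address ("192.168.10.5") or a subnet ("192.168.10.0/24").
    """
    network = ipaddress.ip_network(spec, strict=False)  # a bare address becomes a /32 (or /128)
    for addr in local_addrs:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


def localhost_flag(addr):
    """Build the JVM system property that pins Coherence to one address."""
    return f"-Dtangosol.coherence.localhost={addr}"
```

On a host with several interfaces, a subnet that covers only one of them makes the selection unambiguous. Without it, the runtime has to guess among bond and eth addresses.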

Thursday Jun 20, 2013

Exachk 2.2.2 released - Now Includes Support for Exalogic Solaris Environments

This new Exachk 2.2.2 release includes several new improvements and features.

The following checks for Solaris have been added in this new 2.2.2 release:
Compute Nodes
- Hardware and Firmware Profile
- Software Profile
- NTP Synchronization
- DNS Setup
- Correct Slot Installation of IB Card for Solaris
- Subnet Manager
- Root Partition Usage Limit for Solaris
- Lockd Configuration for Solaris Compute Node
- ib_ipoib Module for Solaris
- ib_sdp Module for Solaris
- IP Configuration - net0 and bond0
- Recent Reboot Info for Solaris
- Probe Based IPMP for Solaris
- Swap Space for Solaris
- Free Physical Memory for Solaris
- MTU for Solaris
- IPMP Configuration for Solaris
- Fault Management Log for Solaris
- BIOS Settings
- NFS Mount Point - Version for Solaris
- Hostname Consistency with DNS on the Physical Compute Node
- NFS Mount Point - Attribute Caching for Solaris
- NFS Mount Point - Rsize Wsize for Solaris
- NIS domain (YPBind) for Solaris

Also, the following checks have been enhanced as part of this new 2.2.2 release:
Compute Nodes
- Connectivity To OVMM
- MTU Value for Infiniband Interface
- Hostname Matches the DNS on the Physical Compute Node
- Non-sequential Even-numbered Gateway Instance
- NTP Configuration for Switch Nodes Matches Physical Compute Nodes
- NTP Configuration for Switch Nodes Matches Oracle VM Servers
- Hostname Matches the DNS on Oracle VM Server
- Hostname Matching with DNS on Switches
- NTP Configuration for ZFS Matches Oracle VM Servers
- NTP Configuration for ZFS Matches Physical Compute Nodes
Multiple Components
- MTU for InfiniBand Link in Control vServers

Exachk is available via MOS Doc Id. 1449226.1

Thursday Mar 28, 2013

ClassCastException thrown when running Coherence with Exabus IMB enabled

Today I worked on a Service Request about a Coherence issue on an Exalogic platform. It is a very interesting issue (at least for me).

An exception message like the following is thrown when running Coherence with IMB on a WLS server:
java.lang.ClassCastException: com.oracle.common.net.exabus.util.SimpleEvent
at com.tangosol.coherence.component.net.MessageHandler$EventCollector.add(MessageHandler.CDB:6)
at com.oracle.common.net.infinibus.InfiniBus.emitEvent(InfiniBus.java:323)
at com.oracle.common.net.infinibus.InfiniBus.handleEventOpen(InfiniBus.java:502)
at com.oracle.common.net.infinibus.InfiniBus.deliverEvents(InfiniBus.java:468)
at com.oracle.common.net.infinibus.EventDeliveryService$WorkerThread.run(EventDeliveryService.java:126)

The cause of this problem is that Coherence runs into a classloading issue when:
- using to enforce the child-first classloading
- coherence.jar is both in the system classpath and application classpath
- and Exabus IMB is enabled
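The second condition, coherence.jar on both classpaths, can be checked mechanically before deployment. Below is a minimal sketch. It assumes you can get the server's system classpath string (for example, from the server start script or log) and a listing of the application's WEB-INF/lib. `duplicated_jars` is a hypothetical helper, and it compares jar file names only, not versions or contents.

```python
import os


def jar_names(classpath, sep=os.pathsep):
    """File names of the entries on a classpath string."""
    return {os.path.basename(p) for p in classpath.split(sep) if p}


def duplicated_jars(system_classpath, app_lib_files, watch=("coherence.jar",), sep=os.pathsep):
    """Watched jars that appear both on the system classpath and in the application."""
    system = jar_names(system_classpath, sep)
    app = {os.path.basename(p) for p in app_lib_files}
    return sorted(j for j in watch if j in system and j in app)
```

A non-empty result only shows that the duplication is present. Whether the ClassCastException occurs also depends on the child-first setting and on whether IMB is enabled.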

In newer versions of WLS (12c), coherence.jar is in the system classpath, so by default Coherence classes are loaded from the system classpath. For situations where child-first class loading semantics are required, and should be specified in the configuration inside weblogic.xml to change the class loading order.

To solve this, add the following into weblogic.xml:

Friday Jan 18, 2013

Two New Exalogic ZFS Storage Appliance MOS Notes

This week I closed two Service Requests related to the ZFS Storage Appliance and created My Oracle Support (MOS) notes from both of them. Although they were not complicated issues and both SRs were closed in less than a week, these procedures were not yet formally documented on MOS. Information about the new documents follows.

MOS Doc Id. 1519858.1 - Will The Restart Of The NIS Service On The ZFS Storage Appliance Affect The Mounted Filesystems?

In this case, for a particular reason, it was necessary to restart the NIS service. So, if the NIS service needs to be restarted on the ZFS Storage Appliance for any reason, will the mounted filesystems be affected during the restart?

The default cluster configuration of the ZFS Storage Appliance is active-passive, and the storage nodes are supposed to be mirrored, so restarting NIS should not cause any issues; it can be done.

Note that NIS should be restarted on the active storage head. Restarting NIS itself will not cause a ZFS failover from active to passive.

In general terms, even in the event of a storage node failure, the appliance will automatically fail over to the other storage node. Under that condition, an initial degradation in performance can be expected because all of the cached data on the failed node is gone, but this effect decreases as the new active storage node begins caching data in its own SSDs.

MOS Doc Id. 1520223.1 - Exalogic Storage Nodes Hostnames Are Displayed Incorrectly

This was not the first time I had seen something like this, so I decided to create a note, because this is clearly a problem that may affect more than one Exalogic user.

The Exalogic storage node hostnames displayed on the BUI were different from the ones displayed when accessing the node through SSH or ILOM.

This happens when, for whatever reason, the hostname is misconfigured on the ZFS Storage Appliance.

To solve this problem, it is necessary to set the system name and location accordingly on the Storage Appliance nodes BUI:
1. Log in to the ZFS Storage Appliance BUI
2. Go to the "Configuration" tab, and select the "Services" subtab
3. Under the "Systems Settings" section, click on "System Identity"
4. Set the system name and location accordingly

Monday Jan 07, 2013

Exalogs - Exalogic Diagnostic Collection Tool now available on My Oracle Support

What is Exalogs?
Exalogs is a command-line tool for gathering logs, diagnostics, environment/configuration information, and other data from the following components in an Exalogic virtual rack:
- Enterprise Manager Ops Center Enterprise Controller
- Enterprise Manager Ops Center Proxy Controllers
- Oracle VM Manager
- Database VM (RDBMS)
- Compute nodes (Domain0/Dom0's)
- ZFS storage appliance
- InfiniBand switches

You can target Exalogs at either the entire rack or individual components.

Note that Exalogs is not a health-check or monitoring/reporting tool, and it is not a replacement for Exachk health-check tool.

Exalogs is available for download now from My Oracle Support via MOS Note 1512323.1, where further information (including a very complete README file) can be found.

Wednesday Dec 26, 2012

New Oracle Enterprise Manager 12c Book Now Available

We are pleased to announce that this new Oracle Enterprise Manager 12c book has been published and is now available, both as a printed book and as an e-book.

This is the first published EM12c book in the world.

It is a solid introduction to EM12c and is written in easy to understand English.

For further information and purchase, please go to http://www.packtpub.com/oracle-enterprise-manager-12c-cloud-control/book

Enjoy the world of Enterprise Manager!

Wednesday Dec 12, 2012

Managed servers being brought down regularly by Node Manager. WAD?

Recently I have been working on a service request where several instances were running, and several technologies were being used, including SOA, BAM, BPEL and others.

At first glance, this may seem to be a Node Manager problem. In this situation, however, the problem was actually at the JMS persistent store level. Node Manager can automatically restart Managed Servers that are in the "failed" health state or that have shut down unexpectedly due to a system crash or reboot. In fact, the provided log files clearly showed that the instance was becoming unhealthy because of a persistent store problem.

So, in the end, the problem was not Node Manager, which was working as designed; the restarts were caused by the persistent store. After the persistent store problem was fixed, everything worked fine.

This particular issue occurred on an Exalogic machine, but note that it may happen on any hardware running WebLogic.

Thursday Aug 16, 2012

Oracle Exalogic Elastic Cloud Handbook on sale

The Oracle Exalogic Elastic Cloud Handbook from Oracle Press is now on sale.

Below there are some links with further information about the book and purchasing:
- Oracle Press
- Amazon.com
- Amazon Kindle Edition
- Safari Books Online

Monday Jun 11, 2012

Exalogic Exachk - Exalogic Health Check Tool now available on My Oracle Support

This is a new tool for Exalogic which automates the process of determining whether an Exalogic system is in a healthy state.

Exachk is available via MOS Doc Id. 1449226.1

Check the Quick Start Guide (available in the MOS Doc Id. mentioned above) for further information.

Information about the Exachk health-check tool for Exadata can be found in MOS Doc Id. 1070954.1

Friday May 18, 2012

CONNECTION_REFUSED messages on load balancing in Weblogic with OHS or Apache

In recent months I have had to work on several issues related to load balancing. It is very important to understand how the layers interact with each other and where specific settings must be made.

Some people get upset that OHS/Apache load-balance requests even to servers that are shut down, which may cause transactions to be lost.

This document provides very good tips on how many production-critical issues can be resolved just by setting appropriate values for some parameters.

Personally, I think the DynamicServerList parameter (in fact the first one mentioned in the document linked above) is particularly important to understand. As this Oracle documentation explains:
In a clustered environment, a plug-in may dispatch requests to an unavailable WebLogic Server instance because the DynamicServerList is not current in all plug-in processes.
DynamicServerList=ON works with a single Apache server (httpd daemon process), but for more than one server, such as StartServers=5, the dynamic server list will not be updated across all httpd instances until they have all tried to contact a WebLogic Server instance. This is because they are separate processes. This delay in updating the dynamic server list could allow an Apache httpd process to contact a server that another httpd process has marked as dead. Only after such an attempt will the server list be updated within the proxy. One possible solution if this is undesirable is to set the DynamicServerList to OFF.
In a non-clustered environment, a plug-in may lose the stickiness of a session created after restarting WebLogic Server instances, because some plug-in processes do not have the new JVMID of those restarted servers, and treat them as unknown JVMIDs.
To avoid these issues, upgrade to Apache 2.0.x and configure Apache to use the multi-threaded and single-process model, mpm_worker_module.
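A toy simulation can make the quoted behaviour concrete: each httpd child learns that a server is dead only after trying it itself. This is not the plug-in's code. The hypothetical `PluginProcess` class keeps a private copy of the server list and dispatches requests round-robin. After a refused connection, it drops that server from its own list only. With several processes, each one pays for one refused connection. With a single process, as in the mpm_worker model, the dead server is discovered once.

```python
from itertools import count


class PluginProcess:
    """One httpd child process holding its own copy of the dynamic server list."""

    def __init__(self, servers):
        self.servers = list(servers)  # private copy: not shared between processes
        self._rr = count()            # simple round-robin counter

    def dispatch(self, up):
        """Send one request; return (server_used, refused_attempts)."""
        refused = 0
        while self.servers:
            target = self.servers[next(self._rr) % len(self.servers)]
            if target in up:
                return target, refused
            refused += 1                 # CONNECTION_REFUSED from a dead server
            self.servers.remove(target)  # only THIS process learns it is dead
        return None, refused


def total_refusals(num_processes, servers, up, requests_per_process):
    """Count refused connections across independent plug-in processes."""
    procs = [PluginProcess(servers) for _ in range(num_processes)]
    refused = 0
    for p in procs:
        for _ in range(requests_per_process):
            refused += p.dispatch(up)[1]
    return refused
```

Suppose one of two managed servers is down and each of five processes handles two requests. The simulation then counts five refused connections. A single process handling ten requests counts only one.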

Also, this Oracle documentation provides important information about "Failover, Cookies, and HTTP Sessions", and "Tuning to Reduce Connection_Refused Errors".

As this Apache document explains, the MaxRequestsPerChild directive sets the limit on the number of requests that an individual child server will handle during its life.

Note that mod_proxy and related modules implement a proxy/gateway for Apache HTTP Server, supporting a number of popular protocols as well as several different load balancing algorithms. Third-party modules can add support for additional protocols and load balancing algorithms.

On Oracle Forums I also found a very interesting thread:
The error you are getting is a common one, which can be fixed by increasing the "AcceptBackLog" value by 25% until the error disappears from the WebLogic console (Path: Servers => => Configuration tab => Tuning sub-tab) and by setting "KeepAlive" to ON in httpd.conf, which should take care of your issue.
Topic: Tuning Connection Backlog Buffering
Search for "KeepAliveEnabled":
Also, here is a link that should help in understanding some common issues that occur when using a plug-in, along with their solutions:

Can transactions be affected by this?
Certainly, but it depends on how your application is developed. A good practice is to create a batch of transactions and track them to check whether any are missed. This Transaction and redelivery in JMS article may be helpful.
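To track a batch of transactions, you can tag each test transaction with a unique ID and then compare what was sent with what the back end recorded. Below is a minimal sketch. How you submit the transactions and read back the recorded IDs is application-specific and left out; it could be a database query, a JMS browser, or a log scan. `new_batch` and `find_missing` are hypothetical helpers.

```python
import uuid


def new_batch(n, prefix="lbtest"):
    """Generate n unique transaction IDs to embed in test requests."""
    return [f"{prefix}-{uuid.uuid4()}" for _ in range(n)]


def find_missing(sent_ids, recorded_ids):
    """IDs that were sent but never recorded by the back end, in send order."""
    recorded = set(recorded_ids)
    return [tx for tx in sent_ids if tx not in recorded]
```

Running such a batch while a managed server is being shut down shows directly whether any requests were lost in transit.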

Wednesday Apr 18, 2012

REP-0178 - "Reports Server cannot establish connection" Error Message

Last week I saw an interesting thread in Oracle Forums about this error message, and I wanted to share the findings I used to answer it:
REP-0178: Reports Server [server_name] cannot establish connection

This problem may occur when the wrong rwclient is picked up. Perhaps the environment is not set appropriately before rwclient is called, or the rwclient.bat/rwclient.sh in /bin is found and used. That file is only a template, used during installation (when the instance was actually configured) to create the valid rwclient.bat/rwclient.sh in /config/reports/bin.

So try using the appropriate rwclient.bat/rwclient.sh, because it calls rwclient.exe/rwclient after setting the environment. Either put the /config/reports/bin directory before /bin in the PATH, or specify the full path to rwclient.bat/rwclient.sh.
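To confirm which rwclient script a shell would actually run, Python's shutil.which can resolve a command against an explicit PATH. Below is a minimal sketch. The directory arguments stand for the instance's config/reports/bin and the installation's bin, whose full prefixes are elided above. `resolve_rwclient` is a hypothetical helper. Whichever directory comes first in the PATH wins.

```python
import os
import shutil


def resolve_rwclient(path_dirs, name="rwclient.sh"):
    """Return the full path of the rwclient script found first in the given PATH order."""
    # shutil.which searches the directories left to right, like the shell does.
    return shutil.which(name, path=os.pathsep.join(path_dirs))
```

If the result points at the template under the installation's bin, the PATH order is the likely cause of REP-0178.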

Another possibility is that the services were started as root, so some log files were created as root. As a result, the basic Oracle user (the owner of the installation) no longer has write access to these log files:

If that is the case, change the owner of the following log files to the ORACLE user (the owner of the installation):
Then restart the Reports Server and run the rwclient.sh command again to generate the reports.
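To find root-owned files in the Reports log directories, compare each file's owner with the installation owner's UID. Below is a minimal sketch for Linux/UNIX. It assumes you pass the log directory and the UID of the Oracle installation owner, for example `pwd.getpwnam("oracle").pw_uid` (the user name "oracle" is an assumption). `files_not_owned_by` is a hypothetical helper. It only lists the files; changing their ownership back still has to be done as root, for example with chown.

```python
import os


def files_not_owned_by(directory, uid):
    """Paths of files under directory whose owner UID differs from uid."""
    offenders = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # lstat: check the entry itself, without following symlinks.
            if os.lstat(path).st_uid != uid:
                offenders.append(path)
    return sorted(offenders)
```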

Monday Mar 05, 2012

Exalogic Elastic Cloud Software Version 2.0 Released

The Oracle Engineered Systems Community is pleased to announce the availability of Exalogic Elastic Cloud Software (EECS) version 2.0, offering the following new features and enhancements:
  • A Layer 7 application traffic management software component called Oracle Traffic Director, which features extremely high performance HTTP load balancing (reverse proxy), HTTPS termination via Intel AES cryptography support, load balancing, rate throttling, connection limiting, logging and other advanced features
  • Support for secure application isolation using InfiniBand Partitions, a technology that allows the implementation of virtual firewalls on Exalogic in combination with Oracle Traffic Director
  • Improved Exabus implementation - which greatly improves system performance and provides new optimized integration with Oracle Coherence and Oracle Tuxedo, as well as performance enhancements for WebLogic Server and all other Oracle Linux and Solaris applications

  • As of February 16, 2012 all new Exalogic X2-2 configurations are being shipped with the EECS 2.0 firmware, device drivers, operating system images and utilities loaded on shared storage. Customers running existing Exalogic X2-2 systems at any previous EECS patch level will also be able to update their systems to EECS 2.0 via an upgrade kit expected to be released shortly.

Principal Technical Support Engineer in the Engineered (Systems) Enterprise Support Team - EEST.
Former member of the Coherence and Java Technologies Support Teams.

