jeudi févr. 21, 2008

why the Common Agent Container (CACAO) take a long time to start or stop ?

One question we have time to time is about the time taken by the container to start.

The answer in 99.9 % of the case is : because of one module deployed inside.

0.1 % percent left are due to sub-processes as explain here.

The Common Agent Container only loaded with its core modules never take more then

few seconds to start (include the start of the underlying jvm ).

For instance on a host (2 x UltraSPARC-IIIi  (1280 MHz) - 2.50GB Mem)

the container initialisation takes 4~5 secondes. A big part of this is taken by the creation

and the start of modules.

As an example here are statistics displayed using a Dtrace script.

                   modules - time (secs/milliseconds)
    startAdaptorsAndConnectors -             312
 registerAdaptorsAndConnectors -            1375
          register all modules -           82003
               container start -           86893

In this case DTrace is really not the recommended way to take stats

as the jvm is really slowed down but it gives you an idea of how the time is dispatched.

Everything in the container start sequence (this is also true for the shutdown) is done sequentially:

container initialisation

-> connectors/adaptors creation

-> start modules in order (dependency order)

-> start connectors (connections are then allowed)

-> stop modules in order

-> stop connectors

container finalization


The benefit of this is simplicity and to ease the life of module developers. A module cannot (may not)

be disturbed while it is performing tricky actions like initislisation and cleanup. As connectors

are not opened, no request can be received.

The disadvantage of this is if somebody take a long time to complete, everything is blocked waiting.

 Initialization and finalisation of a module (start and stop method) are supposed to be as short

as possible and bounded in time. If a module has to do perform some actions which may block

(db connection, remote host connection, execution of interactive process ...). The developper should

implement timeouts and/or use additional thread dedicated to "dangerous" phases.

By doing this there will be no risk to block or slow down the container. And all deployed modules

will enjoy beeing started on time.


Knowing that, here is what you can do to identify the root cause of a slow start or stop of the container. 

When this happen on Solaris 10 and above, the cli command "/usr/sbincacaoadm start" will take

a long time to return. On other systems (Solaris9, Linux, hp-ux, windows) the CLI will return but

the container will take a long time to be ready to serve. issuing the "cacaoadm status" command

you will see messages like

default instance is DISABLED at system startup.
Daemon is running but not available. Try again as it might be starting.


Just having a look to the log file should help you to find out what happened.

First set a high level on the following filters.

cacaoadm set-filter --persistent com.sun.cacao.element=ALL

cacaoadm set-filter --persistent com.sun.cacao.container.impl=ALL


The container initialization would be logged but also (the most interesting part)  the beginning

and the end of each modules initialisation. You have to look for entries from unsynchronizedDeployModule method

and entries from the ElementSupport support class. Here is an example with the RMI module

 Feb 21, 2008 6:08:49 PM com.sun.cacao.container.impl.ContainerPrivate unsynchronizedDeployModule
FINER: Add the descriptor Module name: com.sun.cacao.rmi
Descriptor full path:/usr/lib/cacao/lib/tools/template/modules/com.sun.cacao.rmi.xml
 Description: This module manages the secure and unsecure RMI connectors.
 Version: 1.0
 Module class: com.sun.cacao.rmi.impl.RMIModule
 Initial admin State: UNLOCKED
 Heap: 100
 Instances Descriptor:
 Dependencies: []
 Parameters: {}
 Cacao version supported: 2.1
 Private paths: []
 Public paths: [file:../../../../lib/cacao_rmi.jar]
 Library paths: []
 Ignored at startup: false
 Enable on demand: false
Feb 21, 2008 6:08:50 PM com.sun.cacao.element.ElementSupport setAdministrativeState
FINE: Administrative state change to UNLOCKED : com.sun.cacao:type=ModuleManager,instance="com.sun.cacao.rmi"
Feb 21, 2008 6:08:50 PM com.sun.cacao.element.ElementSupport setAdministrativeState
FINE: Administrative state change to UNLOCKED : com.sun.cacao:type=module,instance="com.sun.cacao.rmi"

As you can see the start of this module begun at 6:08:49 to end more or less at 6:08:50.

You know now that the RMI module took around 1 second to start.

An administrative state set to UNLOCK for a module means 'started' 

Looking at the log file like this should give you the name of the culprit.

High level filter can be time and disk consuming. When your investigation is completed

you should set them back to their default value (in our case null as only com.sun.cacao is set by default).

cacaoadm set-filter --persistent com.sun.cacao.element=null

cacaoadm set-filter --persistent com.sun.cacao.container.impl=null


mardi févr. 05, 2008

How to get Common Agent Container/detached jvm dump

One of the frequently asked question is : My Jvm just crash but I cannot find any core dump of log.

All this may depend on host and/or Jvm configuration. The Common Agent Container is a service and

then run in background. It is not as easy as with console application to get any traces.

There are several methods I've used, here they are, just pick the one you prefer (some are not working on all jdks).

Note that if you are used to use java debugger like the one in NetBeans it may be useless
for you to read this page :-)

  • getting Jvm stack output using truss/strace.
Just go here.

  • enabling core file dump on the host.
By default on systems like Solaris, the dump of core file is disabled.
Use this with caution, enabling core file can cause security issue and/or fill up your disk.
It is recommended to set everything back to default after investigation is done.
  • On Solaris :
See coreadm(1M) man page for details.
    #coreadm -e log
    #coreadm -e process
    #coreadm -e global
    #coreadm -G all
    #coreadm -i /var/tmp/core.%f.%p.%t
    #coreadm -g /var/tmp/core.%f.%p.%t
    #coreadm -u

It is even better to only change the configuration for the running jvm.
    #coreadm -p /var/tmp/core.%f.%p.%t -P all  <jvm pid>
  • On Linux :
  • set core file limit in /etc/profile.
Replace any line like
ulimit -S -c 0 > /dev/null 2>&1 
by something like
[ `/usr/bin/id -u` -eq 0 ] && ulimit -c unlimited
  • set core file pattern (cf proc(5) fro details)
echo 1 > /proc/sys/kernel/core_uses_pid
echo /var/tmp/core.%f.%p.%t > /proc/sys/kernel/core_pattern
you can also use the systctl command :
sysctl -w kernel.core_pattern=/var/tmp/core.%f.%p.%t

  • Setting Jvm error file path in the command line.
Starting from jdk 6 , you are allowed to set the path of the dump file. This is done with the
XX:ErrorFile option. You can then add this to the Common Agent Container java's flags.

#cacaoadm set-param java-flags=-XX:ErrorFile=/var/tmp/myDump `cacaoadm get-param java-flags --value`

  • Using DTrace to catch Jvm fault.

Using the fault probe of the Proc provider you can print the stack trace each time the jvm
triggers one. I've wrote a small DTrace script that you can use. For all running Jvms it will
print output like the following on case of failure. In that example I've hacked a jni call to trigger
a segmentation fault.  (get the script)

[java:  4213/  2] experiencing fault <address not mapped to object>, signal 11

              StubRoutines (1)



Emmanuel Jannetti blog


« juillet 2016