In my previous article "Java Caching for Oracle Applications 11i: Part 1" I talked about the basics of Java Caching and how it works conceptually.

My original intention with "Part 2" was to discuss how to diagnose issues with Java Caching, but I got a bit carried away with scripts and the like, so I ended up creating some Metalink notes instead.
Diagnosing Database Invalidation Issues
Here's the first one:
- Diagnosing database invalidation issues with Java Cache for eBusiness Suite (Metalink Note 455194.1)
As it says on the tin, this note covers diagnosing issues with Database Invalidation. The classic symptom is that when Responsibilities are added to a user, they do not appear immediately (but do after Apache is bounced).
Although the scripts are listed in the note, you can also download a soft copy via the link mentioned in the note.
Diagnosing Issues with Responsibility Assignments
This issue is not actually a Java Caching problem, but the symptoms
initially look similar to those described above. The key difference
here is that the affected responsibilities still do not appear after
Apache is bounced. For details, see:
- Existing Responsibility Does Not Appear after modifying the effective date (Metalink Note 458869.1)
More on NoClassDefFoundErrors
- Investigating NoClassDefFoundError in eBusiness 11i when users login (Metalink Note 455366.1)
Hopefully you will find these notes useful towards understanding and diagnosing any Java Cache issues you find yourself facing.
Related
- Java Caching for Oracle Applications 11i: Part 1
- Latest JVM Tuning Recommendations for Apps 11i
- Using JConsole to monitor Apps 11i JVMs
- Investigating java.lang.OutOfMemoryError with Apps 11i Middle Tier JVMs
Comments (35)
Ok.
So I have created a Bayesian network to diagnose Java caching problems and it is working remarkably well (the ROC / AUC are looking good for it).
Let's exchange notes if you are interested.
Sincerely,
-Mohsin
Posted by Mohsin Beg | September 21, 2007 2:11 PM
Mike/Steve, Thanks for this valuable information. Hope to see some tips/insiders in SSO registration information in Apps (Things hidden behind SSOSDK) and mystery of txkrun.pl
As always this blog is my inspiration.
Atul Kumar
Posted by Atul Kumar | September 21, 2007 5:32 PM
Very interesting notes!
We actually hit both "issues" with responsibilities and logon issue in our test environments. Usually the NoClassDefFoundError comes after cloning but since Apache bounce resolves it and it doesn't reappear after that we haven't been looking deeper into it.
Posted by Simon | September 22, 2007 1:07 AM
Hi Mike/Steve,
We occasionally hit NoClassDefFound errors with our environments. To fix this, we've found that if we remove the port values in the context file (s_fnd_cache_port_range) and leave it null, allowing the system to dynamically select ports, the problem vanishes. (We run the application with a default of 1 core JVM, i.e. an OACoreGroup of 1.)
Would you have a clue on this?
Thanks,
Rakesh.
Posted by Rakesh Tripathi | September 22, 2007 12:22 PM
Hello Mohsin, yes please. I will discuss with you offline.
Atul, thanks for your thoughts. SSO registration/integration is certainly an area that needs to be unpicked a little, I think.
Simon and Rakesh, useful information, so I will make a note and look into it further.
Posted by Mike Shaw | September 24, 2007 1:50 AM
Excellent notes. I have hit almost all of these problems before. Even though I had resolved the problems, I did not have the whole picture. These notes definitely help me understand the problems.
Posted by Tianhua Wu | September 25, 2007 10:30 AM
We had NoClassDefFoundError on the first login to almost every newly cloned environment we created. What I do is remove $COMMON_TOP/_pages/* and bounce Apache, and the problem goes away for good. The interesting thing is that even if I clean $COMMON_TOP/_pages/* before starting Apache for the very first time, I still have the problem. I have to remove $COMMON_TOP/_pages/_oa__html again and bounce Apache; then the problem goes away for good.
Posted by Tianhua Wu | September 25, 2007 10:43 AM
Hello Mike!
It seems that Java caching problems remain in oAS 10g. We have missing responsibilities issues with OEBS Release 12 and similar problems with OCS 10.1.2, both based on oAS 10.1.2. I've read your Metalink note 455194.1 "Diagnosing database invalidation issues with Java Cache for eBusiness Suite" but could not work out what I should do to avoid Java caching problems in OEBS Release 12.
Regards,
Viacheslav Leichinsky
Posted by Viacheslav | September 28, 2007 3:35 AM
Hello Tianhua,
So far as I know, cloning does not clear out the JTF_PREFAB_WSH_POES_B table, so if this has entries relating to your source system, this could be causing temporary confusion for Java Cache. Next time you do a clone, try truncating the JTF_PREFAB_WSH_POES_B table (with all eBiz middle tier services shut down). The JTF_PREFAB_WSH_POES_B table is repopulated when the services start up, so there is no problem doing this on a shut-down system.
Let me know if this works and we can get that fixed.
Posted by Mike Shaw | October 4, 2007 2:21 AM
Hello Viacheslav,
Release 11i and Release 12 both use pretty much the same underlying code with respect to Java Caching, so the diagnostic techniques will be very similar in both cases.
I would recommend you raise an SR with Oracle Support to have your specific case investigated. To give the investigations a head start, upload the data/script output from parts 1 through 4 of the "Detailed diagnostic steps" in Note 455194.1, as most of this information is the same for R12. The main differences are the config file names/locations and the specific patch numbers that are being listed.
Posted by Mike Shaw | October 4, 2007 2:30 AM
Thanks Mike, I will give it a shot and let you know.
Posted by Tianhua Wu | October 5, 2007 10:56 AM
I just ran into the NoClassDefFoundError for the first login issue this morning. I have cleaned up _pages and bounced Apache. It seems to work now.
Thanks,
Sriram
Posted by Sriram | October 6, 2007 4:05 PM
Hi Mike,
I have tried "truncating JTF_PREFAB_WSH_POES_B" in two clone envs, and it works. Thanks for the tip.
T Wu
Posted by Tianhua Wu | November 1, 2007 3:38 PM
Hi Mike,
I have encountered a problem lately and I suspect it might be related to Java Cache.
Basically, once a user updates a profile, we do not see the changes even after bouncing Apache. However, the change eventually shows in the apps after another period of time. If I bounce all applications, it shows immediately.
Unfortunately this happens only in production, so I could not experiment to decide which apps I should bounce at the same time. We are at 11.5.10.2 (ATG.RUP5) and 10.2.0.2 (DB).
I just want to know what your thoughts are.
Thanks,
T Wu
Posted by Tianhua Wu | March 26, 2008 1:18 PM
Hello Tianhua,
Your issue doesn't seem to relate directly to Java Cache and sounds more like an issue with the processing of the records through the Workflow Java Deferred Agent Listener. Probably the best plan is to go through Note 455194.1 and then raise an SR with Oracle Support.
Hope this helps
Mike
Posted by Mike Shaw | March 28, 2008 2:41 AM
Thanks again, Mike! I will look into it.
T Wu
Posted by Tianhua Wu | March 28, 2008 12:53 PM
I am trying to understand the steps by which membership is attained in a distributed cache and the process by which a coordinator for JOC is appointed.
Is there any document out there that explains this?
Thanks
Ravi
Posted by Ravi Swaminathan | November 1, 2008 6:42 PM
Hello Ravi,
Ebiz utilizes the underlying technology delivered by the AS10g team, so at the level of detail you are looking into you can refer to the standard documentation set for the Java Object Cache.
This is described in the "Oracle Containers for J2EE Services Guide 10g Release 3 (10.1.3)" at http://download-west.oracle.com/docs/cd/B25221_03/web.1013/b14427/joc.htm
Hope this helps
Mike
Posted by Mike Shaw | November 2, 2008 11:53 PM
Hi Mike
I've only recently come across your articles. We have had an ongoing problem with our HTML forms since implementing 11i CU2. We are using HTML Quoting with the side bar navigation links. The problem is that links go missing from that side bar navigation menu and we cannot pinpoint why they disappear. The only thing that we have found to correct this is an Apache bounce. If the Apache bounce is not performed correctly, the HTML links do not reappear. This has been happening for the past 4 years. We have had numerous SRs and OWCs but, to date, there has been no correction to the problem. Is there any assistance that you can provide?
Thanks
Linda
Posted by linda | June 22, 2009 6:18 AM
Hello Linda,
I am not a Quoting expert, and I believe (from memory) that it uses its own product-specific mechanism on top of the underlying Java technology, so I may not be the best person to help.
Can you email me an SR number (or two) to take a look at, and I will see how best to approach the problem.
regards
Mike
Posted by Mike Shaw | June 22, 2009 6:47 AM
Hi Mike,
One of our customers has around 50 JVMs running. Enabling Distributed Caching is causing a performance issue, since each JVM needs to communicate with the other 50 JVMs.
Do we have any option in 11.5.10 to group JVMs, let us say into 5 groups of about 10 JVMs each? If we can do that, then each JVM needs to communicate with only 10 JVMs.
Regards
Santhosh
Posted by Santhosh | July 24, 2009 1:47 AM
Hello Santhosh,
You can only have one group, but 50 JVMs isn't an unusually large number of JVMs for a PROD instance, so I am not sure why you would see an appreciable performance hit.
Probably best to raise a Service Request with ATG Support so we can take a look at the specifics of your case.
regards
Mike
Posted by Mike Shaw | July 24, 2009 5:18 AM
Hi Mike,
We have 3 OACore JVMs per test instance, all on the same server, with only one appstier per instance. They all have their databases on a separate server with no firewall in between, so we have s_fnd_cache_port_range set to nothing.
Does leaving s_fnd_cache_port_range unset mean that this port is allocated dynamically?
We seem to get intermittent but frequent problems with some of the test environments taking the ports which are defined for other environments in the xml file.
This issue never occurs in production (dedicated server, not shared with any other envs).
When I run a netstat -anp | grep - the output shows that one of the other environments is listening on the port in question.
Do you think having s_fnd_cache_port_range set to nothing may be causing this?
Thanks
Oli
Posted by oli | October 14, 2009 2:46 AM
Hello Oli,
Yes, if "s_fnd_cache_port_range" is not set then the port number is random.
From your description of the issue, setting "s_fnd_cache_port_range" to a specific port range for each test environment would likely resolve that problem.
Hope this helps
Mike
Posted by Mike Shaw | October 14, 2009 2:52 AM
Thanks very much Mike - we will try to implement this to see if it helps.
On the Server we only have Oracle Apps running.
cat /proc/sys/net/ipv4/ip_local_port_range gives
1024 65000
Is there a recommended/standard port number to use for the port range?
I saw some articles giving examples of (portpool + 3000-portpool+3000+Max_num_JVMs)
Thanks very much for your help
Oli
Posted by Oli | October 14, 2009 3:50 AM
s_fnd_cache_port_range is discussed in Note 287176.1 "Oracle E-Business Suite 11i Configuration in a DMZ" but we do not suggest any specific port range to use, so you can come up with your own conventions.
Hope this helps
Mike
Posted by Mike Shaw | October 14, 2009 4:13 AM
Hi Mike, thanks for the info.
In section
5.9: Enable Distributed Oracle Java Object Cache Functionality
How do you do this: "Identify the number of java processes spawned by the concurrent manager tier. For eg: if there are 3 JVMs spawned by the ICM, take the number as 3."
Thanks
Oli
Posted by oli | October 14, 2009 6:38 AM
5.9: Enable Distributed Oracle Java Object Cache Functionality
Identify the number of java processes spawned by the concurrent manager tier. For eg: if there are 3 JVMs spawned by the ICM, take the number as 3.
Add this to the number of oacore JVMs. In the example given above, the total number of JVMs becomes 6. So, six ports need to be opened in the firewall.
Do you know how to identify the number of java processes spawned by the concurrent manager tier?
Thanks
Oli
Posted by oli | October 14, 2009 6:57 AM
Hello Oli,
I was sure we had documented how to calculate this number, but had to dig hard to find it !
The technique is mentioned in Note 380490.1 "Oracle E-Business Suite Release 12 Configuration in a DMZ" which suggests:
You can use the 'pstree' command to check the number of java processes spawned by the concurrent manager parent process. For eg: pstree -p 26258 where 26258 is the process ID of the FNDSM process
Hope this helps
Mike
Posted by Mike Shaw | October 14, 2009 11:47 PM
Thanks very much for this Mike - one last thing...
pstree output loads of java processes. I piped it through wc -l and got 222 processes. I was expecting it to be a smaller number. Any advice?
Thanks. Here's a small sample of the output:
$ pstree -p 1103
FNDSM(1103)─┬─FNDCRM(1365)
            ├─FNDLIBR(1372)
            ├─FNDLIBR(1373)
            ├─FNDLIBR(1374)
            ├─FNDLIBR(1375)
            ├─FNDLIBR(1376)
            ├─FNDLIBR(1384)
            ├─FNDLIBR(1388)
            ├─FNDLIBR(1389)
            ├─FNDLIBR(1391)
            ├─FNDLIBR(1392)
            ├─FNDLIBR(1393)
            ├─FNDLIBR(1394)
            ├─FNDSCH(1369)
            ├─POXCON(1242)
            ├─POXCON(1243)
            ├─POXCON(1244)
            ├─POXCON(1245)
            ├─POXCON(1246)
            ├─POXCON(1247)
            ├─RCVOLTM(1239)
            ├─RCVOLTM(1240)
            ├─RCVOLTM(1241)
            ├─java(1248)─┬─{java}(1249)
            │            ├─{java}(1250)
            │            ├─{java}(1251)
            │            ├─{java}(1252)
            │            ├─{java}(1253)
            │            ├─{java}(1254)
            │            ├─{java}(1255)
            │            ├─{java}(1256)
Posted by oli | October 15, 2009 12:56 AM
Hello Oli,
By the looks of it, the output is showing each thread as a sub-process of the main java process. So PID 1248 is the "real" process and 1249->1256 are threads, which are shown as children.
In other words, you should be able to just look at the "java" processes at the top of the tree and ignore the rest.
Hope this helps
Mike
Posted by Mike Shaw | October 16, 2009 3:10 AM
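Editor's note: the counting approach Mike describes above can be sketched as a small shell filter. This is only a sketch under stated assumptions, not something taken from the Metalink notes: the function name count_java_processes is hypothetical, and it assumes pstree prints threads in braces, as {java}(PID), which is how they appear in Oli's sample.

```shell
# Hypothetical helper: count "real" java processes in `pstree -p` output
# read from standard input.
# Threads are printed as {java}(PID); the closing brace sits between
# "java" and "(", so the pattern below never matches them.
count_java_processes() {
    grep -oE '(^|[^[:alnum:]_{])java\([0-9]+\)' | wc -l | tr -d ' '
}
```

Usage would look like `pstree -p 1103 | count_java_processes`, where 1103 is the FNDSM process ID from the sample above. Piping the raw pstree output through wc -l counts every thread line as well, which would explain a figure as large as 222.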
Hi Mike, thank you, this helps a lot.
I identified that the FNDSM process was spawning 27 real java processes, so I added the 3 for the OACORE JVMs and updated the context file for a port range of 30501-30530. I ran autoconfig and checked that the profile options were updated. Then I restarted all the apps services.
It seems the java processes are now using the ports I defined, but not exclusively. netstat shows connections between java processes through the ports I defined to other random ports. I increased the port range from 30 up to 250 (30750) to see if this allowed enough ports for the processes to just use the ones in the range, but it gave the following:
>>
>> Edited to remove this section
>>
Is there something I'm missing?
Thanks very much
Oli
Posted by oli | October 16, 2009 7:03 AM
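Editor's note: the arithmetic Oli used above (27 concurrent manager JVMs plus 3 OACore JVMs, starting at port 30501) can be written as a short shell function. This is an illustrative sketch only: the function name cache_port_range and the START-END output format are assumptions, and Oracle does not recommend any particular starting port.

```shell
# Hypothetical helper: print a port range just wide enough for every JVM.
# Arguments: start port, concurrent manager JVM count, oacore JVM count.
cache_port_range() {
    start=$1
    cm_jvms=$2
    oacore_jvms=$3
    total=$((cm_jvms + oacore_jvms))
    # The range is inclusive, so the last port is start + total - 1.
    end=$((start + total - 1))
    echo "${start}-${end}"
}

cache_port_range 30501 27 3   # prints 30501-30530
```

Giving each test environment on a shared server its own non-overlapping starting port keeps the ranges from colliding, which is the point of Mike's earlier suggestion to set a specific range per environment.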
Hello Oli,
Sorry you are having these problems with your setup. It seems you are at the stage where you need to raise a Service Request (SR) for Oracle Support to take a more detailed look into your specific configuration. Please go ahead and raise an SR, and by all means let me know the SR number.
As an aside, you will notice I took out site-specific data from your last update, as I didn't think it prudent to publish.
regards
Mike
Posted by Mike Shaw | October 16, 2009 7:33 AM
Thanks Mike - SR number 7756431.994
Regards
Oli
Posted by oli | October 19, 2009 12:16 AM
Hi Mike,
After the changes the issues went away for nearly a month, but the problem has now cropped up again (SR 7756431.994). Any ideas?
Thanks
Oli
Posted by oli | November 4, 2009 4:18 AM