obiee memory usage

Heap memory is a frequent customer topic.

This is a quick refresher, oriented toward AIX, but the principles apply to other Unix implementations.

Another aspect of system memory consumption on unix systems is described here.

1. 32-bit processes have a maximum addressability of 4GB, and a usable application heap size of 2-3 GB. 
On AIX the heap size is controlled by an environment variable:
export LDR_CNTRL=MAXDATA=0x080000000   # 2GB ( the leading zero (0x08) is deliberate, not required )
   1a. It is possible to get a 3.25GB heap size for a 32-bit process using @DSA (Discontiguous Segment Allocation):
    export LDR_CNTRL=MAXDATA=0xd0000000@DSA  # 3.25 GB, 32-bit only
        One side-effect of using AIX segments "c" and "d" is that shared libraries will be loaded privately, and not shared.
        If you need the additional heap space, this is worth the trade-off.  This option is frequently used for 32-bit java.
   1b. 64-bit processes have no need for the @DSA option.

2. 64-bit processes can double the 32-bit heap size to 4GB using:
export LDR_CNTRL=MAXDATA=0x100000000  # 1 with 8 zeros
    2a. But this setting would place the same memory limitations on obiee as a 32-bit process
    2b. The major benefit of 64-bit is to break the bonds of 32-bit addressing.  At a minimum, use 8GB:
export LDR_CNTRL=MAXDATA=0x200000000  # 2 with 8 zeros
    2c. Many large customers provide extra safety for their servers by using 16GB:
export LDR_CNTRL=MAXDATA=0x400000000  # 4 with 8 zeros
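The arithmetic behind those comments is easy to sanity-check with a small shell function before exporting a value (a sketch; `maxdata_gb` is an illustrative helper, not an AIX tool):

```shell
# Sketch: convert a MAXDATA hex value to gigabytes.
# Shell arithmetic accepts 0x-prefixed hex, leading zero included.
maxdata_gb() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

maxdata_gb 0x080000000   # 2   (the 32-bit setting above)
maxdata_gb 0x100000000   # 4   (1 with 8 zeros)
maxdata_gb 0x200000000   # 8   (2 with 8 zeros)
maxdata_gb 0x400000000   # 16  (4 with 8 zeros)
```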

There is no performance penalty for providing virtual memory allocations larger than required by the application.

 - If the server only uses 2GB of space in 64-bit ... specifying 16GB just provides an upper bound cushion.
    When an unexpected user query causes a sudden memory surge, the extra memory keeps the server running.

3.  The next benefit of 64-bit is that you can provide huge thread stack sizes for
    strange queries that might otherwise crash the server.
    nqsserver uses fast recursive algorithms to traverse complicated control structures,
    which means lots of thread space to hold the stack frames.
    3a. Stack frames mostly contain register values; 64-bit registers are twice as large as 32-bit.
         At a minimum, you should quadruple the size of the server thread stacks in NQSConfig.INI
         when migrating from 32-bit to 64-bit, to prevent a rogue query from crashing the server.
        Allocate more than is normally necessary, for safety.
    3b. There is no penalty for allocating more stack size than you need ...
          it is just virtual memory;   no real resources  are consumed until the extra space is needed.
    3c. Increasing thread stack sizes may require the process heap size (MAXDATA) to be increased.
          Heap space is used for dynamic memory requests, and for thread stacks.
          No performance penalty to run with large heap and thread stack sizes.
          In a 32-bit world, this safety would require careful planning to avoid exceeding 2GB usable storage.
     3d. Increasing the number of threads also may require additional heap storage.
          Most thread stacks on obiee are allocated when the server is started,
          and real memory usage increases as the threads run work.
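The heap reserved for stacks alone is just threads times stack size, which shows why more threads (3d) or bigger stacks (3c) both push MAXDATA up.  A sketch with illustrative numbers, not obiee defaults:

```shell
# Sketch: heap reservation for thread stacks alone (illustrative values).
threads=100        # hypothetical upper bound on server threads
stack_kb=8192      # hypothetical 8 MB per-thread stack

echo "$(( threads * stack_kb / 1024 )) MB of heap reserved for stacks"   # 800 MB
```

In a 32-bit world that 800 MB would eat nearly half of a 2GB heap; under 64-bit with MAXDATA=0x200000000 it is a rounding error.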

Does 2.8GB sound like a lot of memory for an AIX application server?

- I guess it is what you are accustomed to seeing from "grandpa's applications".
- One of the primary design goals of obiee is to trade memory for services ( db, query caches, etc)
- 2.8GB is still well under the 4GB heap size allocated with MAXDATA=0x100000000
- A 2.8GB process size is possible even for 32-bit Windows applications
- It is not unusual to receive a sudden request for 30MB of contiguous storage on obiee.
- This is not a memory leak;  eventually the nqsserver storage will stabilize, but it may take days to do so.

vmstat is the tool of choice to observe memory usage.  On AIX, vmstat will show something that may be
startling to some people: available free memory ( the 2nd memory column, fre ) is always trending toward zero ... no available free memory.
Some customers have concluded that "nearly zero memory free" means it is time to upgrade the server with more real memory.
After the upgrade, the server again shows very little free memory available.

Should you be concerned about this?   Many customers are !!  Here is what is happening:

- AIX filesystems are built on a paging model.  
If you read/write a  filesystem block it is paged into memory ( no read/write system calls )
- This filesystem "page" has its own "backing store" on disk, the original filesystem block.
   When the system needs the real memory page holding the file block, there is no need to "page out".
   The page can be stolen immediately, because the original is still on disk in the filesystem.
- The filesystem  pages tend to collect ... every filesystem block that was ever seen since
   system boot is available in memory.  If another application needs the file block, it is retrieved with no physical I/O.

What happens if the system does need the memory ... to satisfy a 30MB heap
request by nqsserver, for example?

- Since the filesystem blocks have their own backing store ( not on a paging device ),
  the kernel can just steal any filesystem block ... on a least-recently-used basis ...
  to satisfy a new real memory request for "computation pages".

No cause for alarm.   vmstat is accurately showing that the filesystem blocks touched since boot now reside in memory.

Back to nqsserver:  when should you be worried about its memory footprint?
Answer:  Almost never.   Stop monitoring it ... stop fussing over it ... stop trying to optimize it.
This is a production application, and nqsserver uses the memory it requires to accomplish the job, based on demand.

C'mon ... never worry?   I'm from New York ... worry is what we do best.

Ok, here is the metric you should be watching, using vmstat:

- Are you paging?  There are several relevant columns of vmstat output:

bash-2.04$ vmstat 3 3

System configuration: lcpu=4 mem=4096MB

kthr    memory              page              faults        cpu
----- ------------ ------------------------ ------------ -----------
 r  b    avm fre re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 208492  2600   0   0   0   0    0   0  13   45  73  0  0 99  0
 0  0 208492  2600   0   0   0   0    0   0   9   12  77  0  0 99  0
 0  0 208492  2600   0   0   0   0    0   0   9   40  86  0  0 99  0

fre  is the "free memory" indicator that trends toward zero
re   is "re-page".  The kernel steals a real memory page from one process, and immediately repages it back to the original process
pi   is "page in".  A process memory page previously paged out, now paged back in because the process needs it
po   is "page out".  A process memory page was paged out, because the memory was needed by some other process

Light paging activity ( re, pi, po ) is no cause for worry.   Processes get started, need some memory, go away.

Sustained paging activity is cause for concern.   obiee users are having a terrible day if these counters are always changing.
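A quick way to watch just the paging columns is to pull them out with awk.  A sketch against one data line from the output above (field positions match the AIX layout shown: re is field 5, pi is field 6, po is field 7):

```shell
# Sketch: flag paging activity in a single vmstat data line.
# Fields, per the header above: r b avm fre re pi po fr sr cy in sy cs us sy id wa
line=" 0  0 208492  2600   0   0   0   0    0   0  13   45  73  0  0 99  0"

echo "$line" | awk '{ if ($5 + $6 + $7 > 0) print "paging"; else print "quiet" }'   # quiet
```

Against a live system you could pipe `vmstat 3` through the same awk, skipping the header lines.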

Hang on ... if nqsserver needs that memory and I reduce MAXDATA to keep the process under control, won't the nqsserver process crash when the memory is needed?

Yes, it will.
  It means that nqsserver is configured to require too much memory, and there are lots of options to reduce the real memory requirement:
 - number of threads
 - size of the query cache
 - size of the sort memory
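In NQSConfig.INI terms, those knobs look roughly like this (a sketch only; parameter names, sections, and defaults vary by release, so check the NQSConfig.INI shipped with your install before editing):

```ini
[ SERVER ]
SERVER_THREAD_RANGE = 40-100 ;     # fewer threads -> fewer stacks backed by real memory
SERVER_THREAD_STACK_SIZE = 0 ;     # 0 = platform default; larger values consume heap

[ CACHE ]
MAX_CACHE_ENTRIES = 1000 ;         # a smaller query cache shrinks the footprint

[ GENERAL ]
SORT_MEMORY_SIZE = 4 MB ;          # working memory per sort
```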

But I need nqsserver to keep running.

Real memory is over-committed.  Many things can cause this:

- Running all application processes on a single server
   ... DB server, web servers, WebLogic/WebSphere, sawserver, nqsserver, etc.
  You could move some of those to another host machine and communicate over the network.
  The need for real memory doesn't go away; it's just distributed to other host machines.
- The AIX LPAR is configured with too little memory.
  The AIX admin needs to provide more real memory to the LPAR running obiee.
- More memory for this LPAR affects other partitions.
  Then it's time to visit your friendly IBM rep and buy more memory.



Dick Dunbar
is an escalation engineer working in the Customer Engineering & Advocacy Lab (CEAL team)
for Oracle Analytics and Performance Management.
I live and work in Santa Cruz, California.
I'll share the techniques I use to detect, avoid and repair problems.

