FAQ for Windchill on Solaris

Introduction

I work in Sun's "ISV Engineering" team. Our responsibilities include working with Sun's key ISV partners to port, tune and optimize industry-leading applications on Sun's hardware and software stack, and to ensure that the latest solutions from the ISVs are certified on the latest products from Sun. One of the applications that I focus on is Windchill from PTC. As such, I am in frequent contact with PTC's R&D, Global Services, QA and customers, all of whom bring varying degrees of Solaris expertise. This is a collection of frequently asked questions and answers.

Section I: HARDWARE CONFIGURATION

1.  What server configurations are recommended?

Windchill will run on hardware ranging from laptops to large servers. Solaris is typically used for installations with anywhere from 50 to several thousand engineers. While you can run a full Windchill installation on a single server, typical installations place the Oracle database and the Windchill application tier on separate servers.

2. Does Sun recommend horizontal or vertical scaling for Windchill?

"Vertical scaling" is typically used at the database tier. This strategy uses multiple CPUs in a single server. For example, if four SPARC64 VI processors are required at the database tier but you would like room for future expansion, a Sun SPARC Enterprise M5000 with eight CPU slots could be used: four slots would be populated, leaving four available for future expansion.

"Horizontal scaling" is often used at the application tier. This strategy combines multiple servers to handle a larger workload. Often, one Sun Fire T2000 is sufficient to meet the current requirements at the application tier, even though it leaves no room for future expansion. This is OK because the Windchill application tier scales horizontally: more servers can be added later. A drawback is that a "Windchill cluster" is somewhat more difficult to administer, so it works best with an experienced IT staff. If you are using an ASP with limited Windchill experience, vertical scaling at the application tier may be a better approach.


3.  Is a "highly available" solution recommended?

Yes, the cost of engineering talent is too high to allow for long downtimes. A "Windchill cluster" is highly available at the application tier: if one node fails, users who were logged into Windchill on the failed node will lose their Windchill sessions, and when they reconnect, they will establish a session with one of the other nodes.

In addition to the Windchill cluster, we recommend an "active/active" Sun Cluster installation, as follows:

  • Oracle standalone is a potential single point of failure, and therefore we recommend Sun Cluster HA Oracle for failover at the database tier. (Oracle RAC is an alternative, but it is quite expensive and is not expected to scale well beyond two nodes.) With Sun Cluster HA Oracle, one node actively runs Oracle while the other node is passive; if the active node fails, Oracle is started on the passive node.
  • There are several Windchill components that are single points of failure, including the Windchill master cache server, Aphelion and the background method server.  We recommend that these services be run on the passive Oracle node. On failure, a Sun Cluster agent can launch the Windchill services on the active Oracle node.

4. Does Sun publish server sizing recommendations for Windchill?

Yes, see https://www.sun.com/third-party/srsc/resources/ptc/PTCWindchill8.0T2000SizingGuide.pdf

5. How can you view the hardware configuration?

# prtdiag
# prtconf -pv | more


6. How can the disk devices and disk partition table be viewed?

# format
# prtvtoc /dev/dsk/c3t1d0s2


7. How much free disk space is available?

# df -h
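To spot file systems that are nearly full without reading the whole listing, the df output can be filtered. A minimal sketch, assuming the Solaris `df -k` column layout (capacity in the fifth column, mount point in the sixth) with each file system on a single line; `fs_over` is a hypothetical helper name, not a Solaris command:

```sh
#!/bin/sh
# fs_over THRESHOLD  (hypothetical helper)
# Reads `df -k` output on stdin and prints "capacity mountpoint" for each
# file system whose capacity is at or above THRESHOLD percent.
# Assumes the Solaris df -k layout: capacity = $5, mount point = $6.
fs_over() {
    awk -v t="$1" '
        NR == 1 { next }                 # skip the column header
        {
            cap = $5
            sub(/%$/, "", cap)           # "93%" -> "93"
            if (cap + 0 >= t + 0) print $5, $6
        }'
}
```

Usage: `# df -k | fs_over 90`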


8. How much disk space is a file/directory using?

# du -sh my_dir


9. Where did all of the space on this disk go?

# cd mountpoint      (identify the mount point with df -h)
# du -sk * | sort -n
# cd biggest_dir
(Repeat, working your way down to the offending directories.)
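The manual loop above can be automated. Here is a sketch in POSIX shell. It assumes that following the single largest subdirectory at each level is good enough, although a directory holding many medium-sized subdirectories can hide a large total. `biggest_path` is a hypothetical name:

```sh
#!/bin/sh
# biggest_path DIR  (hypothetical helper)
# Starting at DIR, repeatedly descend into the largest subdirectory and
# print "size_in_KB<TAB>path" for each step, stopping at a directory with
# no subdirectories. Like the `*` glob above, dot directories are skipped;
# symbolic links are not followed.
biggest_path() {
    dir=$1
    while :; do
        biggest=$(for d in "$dir"/*/; do
                      [ -d "$d" ] || continue        # glob matched nothing
                      [ -L "${d%/}" ] && continue    # skip symlinked dirs
                      du -sk "$d"
                  done | sort -n | tail -1)
        [ -z "$biggest" ] && break
        # du prints "size<TAB>path"; split the two parts.
        size=$(printf '%s\n' "$biggest" | cut -f1)
        dir=$(printf '%s\n' "$biggest" | cut -f2-)
        dir=${dir%/}
        printf '%s\t%s\n' "$size" "$dir"
    done
}
```

Usage: `# biggest_path /export/home`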


10. What Fibre Channel devices are online?

# luxadm probe



Section II: SOLARIS KERNEL SETTINGS FOR WINDCHILL

1. What version of Solaris is running?

# cat /etc/release 

2. What kernel tuning is necessary for Windchill?

Very little; Solaris 10 is well tuned out of the box.

3. Any tweaks?  Add this to /etc/system:

* slow down the fsflush daemon
set autoup=600


4. Are there any T2000-specific settings?  Add this to /etc/system:

* Sun recommended settings for T2000s running S10u3
set pcie:pcie_aer_ce_mask=0x1
set segkmem_lpsize=0x400000


5. Are there any Oracle kernel parameters?  Add this to /etc/system:

* Oracle 10g Settings
set noexec_user_stack=1


6. Sun Cluster kernel parameters?  Add this to /etc/system:

* Start of lines added by SUNWscr
set rpcmod:svc_default_stksize=0x6000
set ge:ge_intr_mode=0x833
* Disable task queues and send all packets up to Layer 3
* in interrupt context.
* Uncomment the appropriate line below to use the corresponding
* network interface as a Sun Cluster private interconnect. This
* change will affect all corresponding network-interface instances.
* For more information about performance tuning, see
* http://www.sun.com/blueprints/0404/817-6925.pdf
* set ipge:ipge_taskq_disable=1
set ce:ce_taskq_disable=1
* End of lines added by SUNWscr

* When you use the ce Sun Ethernet driver for public network connections
set ce:ce_reclaim_pending=1
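/etc/system is only read at boot, and a duplicated or mistyped line is easy to miss, so it can help to append the settings above with a small guard. This is a sketch that assumes a POSIX shell run as root. `add_system_setting` is a hypothetical helper, not a Solaris command. It refuses to add a parameter that is already set and keeps a dated backup copy:

```sh
#!/bin/sh
# add_system_setting FILE "set name=value" ["* comment"]  (hypothetical helper)
# Appends a tunable to an /etc/system-style file unless an uncommented
# "set name=" line already exists. A timestamped backup is made first.
# A reboot is still required for /etc/system changes to take effect.
add_system_setting() {
    file=$1 setting=$2 comment=$3
    # Parameter name: the text between "set " and "=" (e.g. "autoup").
    name=$(printf '%s\n' "$setting" |
           sed -e 's/^set[[:blank:]]*//' -e 's/[[:blank:]]*=.*$//')
    # "*" starts a comment in /etc/system, so only match live "set" lines.
    if grep "^[[:blank:]]*set[[:blank:]][[:blank:]]*$name[[:blank:]]*=" \
            "$file" >/dev/null 2>&1; then
        echo "$name is already set in $file; not changed" >&2
        return 1
    fi
    cp -p "$file" "$file.$(date +%Y%m%d%H%M%S)" || return 1
    {
        [ -n "$comment" ] && printf '%s\n' "$comment"
        printf '%s\n' "$setting"
    } >> "$file"
}
```

Usage: `# add_system_setting /etc/system 'set autoup=600' '* slow down the fsflush daemon'`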


7. What about the recommended Oracle /etc/system changes, such as shmsys:shminfo_shmmax?

These are no longer required with Solaris 10: most are obsolete or dynamically adjustable, and shared memory and semaphore limits are now set with resource controls. Instead, use:
   # projadd -c "Oracle Project" -U oracle,root -K \
   "project.max-shm-memory=(priv,6GB,deny)" -K \
   "process.max-sem-nsems=(priv,2048,deny)" user.oracle



Section III: PATCH LEVELS

1. What patches are on the system?

# showrev -p


2. How can the Solaris patch level be kept up to date using a GUI?

# updatemanager


3. How can the Solaris patch level be kept up to date using a CLI?

# smpatch



Section IV: SAR

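1. How do you set up sar for one-minute samples and long-term logging?

# su - sys
# crontab -l > /tmp/crontab.txt
# vi /tmp/crontab.txt

0 * * * 0-6 /usr/lib/sa/sa1 60 60
# 20,40 8-17 * * 1-5 /usr/lib/sa/sa1
5 18 * * 1-5 /usr/lib/sa/sa2 -s 8:00 -e 18:01 -i 1200 -A
0 1 * * 2 /opt/sar_bk/mk_sar_bk.sh

(The first entry runs sa1 at the top of every hour, taking 60 samples at 60-second intervals. The commented-out line is one of the default entries, left disabled. The last entry runs the backup script below every Tuesday at 01:00.)

# crontab /tmp/crontab.txt
# vi /opt/sar_bk/mk_sar_bk.sh

mkdir /opt/sar_bk/sar_`date +%Y%m%d`
cp /var/adm/sa/* /opt/sar_bk/sar_`date +%Y%m%d`

Make the script executable (chmod +x /opt/sar_bk/mk_sar_bk.sh) and make sure /opt/sar_bk is writable by the sys user.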
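The two-line backup script fails with an error if it is run twice on the same day, because mkdir will not create an existing directory. It also carries on to the cp even if the directory could not be created. Here is a slightly more defensive sketch that assumes a POSIX shell. The SRC and DEST variables and the `backup_sar` function are hypothetical additions; SRC and DEST default to the paths used above:

```sh
#!/bin/sh
# Hypothetical body for mk_sar_bk.sh. SRC/DEST default to the paths above
# and can be overridden (e.g. for testing).
SRC=${SRC:-/var/adm/sa}
DEST=${DEST:-/opt/sar_bk}

# Copy every sar data file into a directory named after today's date.
# mkdir -p makes a second run on the same day harmless.
backup_sar() {
    target=$DEST/sar_$(date +%Y%m%d)
    mkdir -p "$target" || return 1
    cp -p "$SRC"/* "$target"/ || return 1
}
```

The script would then end with `backup_sar || exit 1`. Any error message from mkdir or cp is mailed to the sys user by cron.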


Section V: MONITORING CPU ACTIVITY

1. What tools are used to monitor the current CPU activity level?  All of these work:

# prstat
# top
# xcpustate -disk &
# vmstat 10 10
# mpstat 10 10


2. How busy was the CPU after lunch?

# sar -u -s 13:00 -e 13:30
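For a longer window, it can help to list only the intervals where the CPU was really busy. A sketch assuming the Solaris `sar -u` layout (timestamp, %usr, %sys, %wio, %idle). `busy_intervals` is a hypothetical name, and treating %usr + %sys as "busy" is my simplification:

```sh
#!/bin/sh
# busy_intervals THRESHOLD  (hypothetical helper)
# Reads Solaris `sar -u` output on stdin and prints "time usr+sys" for
# every interval where %usr + %sys is at or above THRESHOLD.
busy_intervals() {
    awk -v t="$1" '
        # Data lines start with hh:mm:ss followed by numbers; this skips
        # the banner, the column header and the Average line.
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $2 ~ /^[0-9.]+$/ {
            busy = $2 + $3               # %usr + %sys
            if (busy >= t + 0) print $1, busy
        }'
}
```

Usage: `# sar -u -s 13:00 -e 13:30 | busy_intervals 90`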


3. Which cores/CPUs are busy?

# mpstat 10 10  (the first report is the average since boot; compare usr, sys and idl for each CPU)



Section VI: PROCESSORS

1. Are the processors online?

# psrinfo


2. How can you take a processor offline for performance analysis?

# psradm -f 16 (where "16" is the processor ID)


3. How can you put the processor back online?

# psradm -n 16 (where "16" is the processor ID)
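It is easy to forget the `psradm -n` step after an experiment. The sketch below brackets a test run so the processors are restored when the command finishes or the script is interrupted. It assumes a POSIX shell run as root. `with_cpus_offline` and the PSRADM override (used to try the logic with a stub) are hypothetical, and the function replaces any INT/TERM traps the calling script has set:

```sh
#!/bin/sh
# with_cpus_offline "IDS" command [args...]  (hypothetical helper)
# Takes the listed processors offline, runs the command, then brings them
# back online. PSRADM is a hypothetical override; it defaults to psradm.
PSRADM=${PSRADM:-psradm}

with_cpus_offline() {
    ids=$1; shift
    $PSRADM -f $ids || return 1          # ids is split into words on purpose
    trap '$PSRADM -n $ids' EXIT          # restore if the shell exits early
    trap 'exit 130' INT TERM             # make ^C run the EXIT trap
    "$@"
    status=$?
    $PSRADM -n $ids                      # normal path: restore now
    trap - EXIT INT TERM
    return $status
}
```

Usage: `# with_cpus_offline "16 17" my_benchmark`, where my_benchmark stands for your own load test.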



Section VII: RUN QUEUE

If there are more requests for processing than available compute cycles, runnable processes wait in the run queue. A consistently large run queue indicates that you need more compute cycles (e.g., more or faster CPUs) or a smaller workload (e.g., through application tuning).

1. How can you see the current run queue length?

# vmstat 10 10 (Watch the "r" column.)
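A common rule of thumb is to compare the "r" column with the number of virtual processors reported by psrinfo. A run queue that stays above that number suggests CPU saturation. The sketch below applies this rule to vmstat output. It assumes the Solaris vmstat layout, where the run queue is in the first column and the first report is the average since boot. `runq_check` is a hypothetical name:

```sh
#!/bin/sh
# runq_check NCPU  (hypothetical helper)
# Reads Solaris `vmstat INTERVAL COUNT` output on stdin, flags samples
# whose run queue (r, first column) exceeds NCPU, and prints the average.
runq_check() {
    awk -v n="$1" '
        $1 !~ /^[0-9]+$/ { next }   # skip header lines
        ++sample == 1    { next }   # first report = average since boot
        {
            total += $1
            if ($1 + 0 > n + 0) {
                over++
                print "sample " (sample - 1) ": r=" $1 " exceeds " n " CPUs"
            }
        }
        END {
            if (sample > 1)
                printf "average r=%.1f over %d samples, %d above CPU count\n",
                       total / (sample - 1), sample - 1, over
        }'
}
```

Usage: `# vmstat 10 10 | runq_check "$(psrinfo | wc -l)"`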


2. How can you see the historical run queue length?

# sar -q  (look at the runq-sz and %runocc columns)



Section VIII: DISK ACTIVITY

1. Which disk is currently busy?

# iostat -mxPzn 10 10


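2. Which disks were historically busy?

# sar -f /var/adm/sa/sa13 -s 20:19 -e 21:19 -d -i 3400 | grep -v , | grep -v .fp | grep -v md | more
(sa13 holds the data for the 13th day of the month; the grep filters hide disk slices, fp devices and md metadevices.)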
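To rank devices by their worst interval instead of paging through the report, the sar -d output can be summarized. This sketch assumes the Solaris `sar -d` layout, where only the first device line of each interval carries a timestamp and %busy follows the device name. `sar_d_peak` is a hypothetical name:

```sh
#!/bin/sh
# sar_d_peak  (hypothetical helper)
# Reads Solaris `sar -d` output on stdin and prints "peak_%busy device",
# busiest first. Lines under "Average" without that label are averages,
# which can never exceed the peak, so they do not change the result.
sar_d_peak() {
    awk '
        $1 == "Average" { next }
        {
            if ($1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) { dev = $2; busy = $3 }
            else                                          { dev = $1; busy = $2 }
        }
        NF < 2 || busy !~ /^[0-9.]+$/ { next }   # blank, banner, header
        !(dev in peak) || busy + 0 > peak[dev] { peak[dev] = busy + 0 }
        END { for (d in peak) print peak[d], d }' | sort -rn
}
```

Usage: `# sar -f /var/adm/sa/sa13 -d | grep -v , | sar_d_peak | head`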


Section IX: PROCESSES, THREADS, SYSTEM CALLS AND LOCKS

1. Which processes are busy?

# prstat

2. Which processes are making a lot of system calls?

# prstat -m (watch the "SCL" system call column)

3. Which threads inside of a process are busy?

# prstat -L -p 1332

4. Which processes can benefit from a multi-core server like the T2000?

Applications with many threads or many processes may benefit. Windchill method servers and Tomcat both scale well with a large number of cores and run well on the T2000. (In contrast, Pro/E is not a good match for the T2000.)

5. Which processes have many threads (LWPs)?

# ps -e -o"nlwp,pid,args" | sort -n

6. What system calls is a process making?

# truss -c -p 1332  (press Ctrl-C to stop tracing and print the counts)

7. The "ps" command truncates the Java arguments.  How do you see the full list? 

# pargs 1332

8. How can you see how much CPU time each thread has used?

# ps -o"lwp,time,args" -L  -p 2282

9. Is there lock contention?

# plockstat -C -p 5992
# lockstat -CPD 5 sleep 10

10. If the process is contending for locks in malloc(), how can you use a multithread-scalable allocator?

# LD_PRELOAD=/usr/lib/libumem.so; export LD_PRELOAD
(then start the application from the same shell)
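Setting LD_PRELOAD in an interactive shell affects every command started from it. A wrapper keeps the preload scoped to the one process. This is a sketch under stated assumptions. `run_with_umem` and the UMEM_LIB override are hypothetical, and the default path is the 32-bit library shown above, so a 64-bit JVM would need the 64-bit copy of libumem instead:

```sh
#!/bin/sh
# run_with_umem command [args...]  (hypothetical helper)
# Runs one command with libumem preloaded. UMEM_LIB is a hypothetical
# override; the default is the 32-bit library path used above.
run_with_umem() {
    lib=${UMEM_LIB:-/usr/lib/libumem.so}
    if [ ! -f "$lib" ]; then
        echo "run_with_umem: $lib not found" >&2
        return 1
    fi
    # The assignment prefix exports LD_PRELOAD to this command only.
    LD_PRELOAD=$lib "$@"
}
```

Usage: `$ run_with_umem ./my_server`, where my_server stands for the program to be started.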



11. How can you see the current stack trace of a process? 

# pstack 1332



Section X: MEMORY, VIRTUAL MEMORY AND SWAP SPACE

1. What swap devices are configured?

# swap -l


2. How much swap space is used/remaining?

# swap -s
# vmstat 10 10

3. Is there pressure on the virtual memory system, currently?

# vmstat 10 10 (watch the "sr" scan rate column.)

4. How much free memory and swap have been available, historically?

# sar -r  (look at the freemem and freeswap columns)

5. Was there pressure on the virtual memory system, historically?

# sar -g  (watch pgscan/s)


6. Which processes are using the most RAM? Sort the processes by resident set size (RSS, in KB):

# ps -e -o"rss,pid,args" | sort -n
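With several method servers running, the per-command total is often more useful than individual processes. A sketch, assuming a POSIX shell and awk. `rss_by_command` is a hypothetical name. Shared pages (libraries, shared memory) are counted once per process, so the totals overstate real memory use:

```sh
#!/bin/sh
# rss_by_command  (hypothetical helper)
# Reads "rss command" pairs on stdin (e.g. from ps -e -o rss= -o comm=)
# and prints total RSS in KB and process count per command, largest last.
# Shared pages are counted in every process, so totals are upper bounds.
rss_by_command() {
    awk '
        $1 ~ /^[0-9]+$/ { kb[$2] += $1; n[$2]++ }   # skips any header line
        END { for (c in kb) printf "%d KB\t%d proc\t%s\n", kb[c], n[c], c }' |
        sort -n
}
```

Usage: `# ps -e -o rss= -o comm= | rss_by_command`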


7. Which processes are using the swap space? As an approximation, sort the processes by virtual size (VSZ, in KB):

# ps -e -o"vsz,pid,args" | sort -n



Section XI: IO

1. What files does a process have open?

# pfiles 4514 | grep /


2. Which disks are busy?

# iostat -mxPzn 10 10
# xcpustate -disk &



Section XII: NETWORK STATUS

1. Overall network status?

# netstat -i
# netstat -sPtcp


2. What ports are processes listening on?

# netstat -a | grep LISTEN


3. What sockets does a process have open?  Here is an example that shows that Apache has port 80 open.

# pfiles 4514
   3: S_IFSOCK mode:0666 dev:310,0 ino:59012 uid:0 gid:0 size:0
      O_RDWR
        SOCK_STREAM
        SO_REUSEADDR,SO_KEEPALIVE,SO_SNDBUF(49152),SO_RCVBUF(49152),IP_NEXTHOP(0.0.192.0)
        sockname: AF_INET6 ::  port: 80


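To check whether a particular port, such as 1158 (the Enterprise Manager console port in Section XIV), has a listener:

# netstat -an | grep LIST | grep 1158

To watch the packets themselves (as root):

# snoop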
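`grep 1158` also matches ports such as 11580 and any line that happens to contain those digits. This sketch makes the check exact. It assumes the Solaris `netstat -an` TCP layout, where the local address (written host.port, e.g. *.1158) is the first column and the state is the last. `is_listening` is a hypothetical name:

```sh
#!/bin/sh
# is_listening PORT  (hypothetical helper)
# Reads Solaris `netstat -an` output on stdin; exits 0 if a socket in
# LISTEN state has PORT as the last component of its local address.
is_listening() {
    awk -v p="$1" '
        $NF == "LISTEN" {
            addr = $1
            sub(/.*[.:]/, "", addr)      # keep the text after the last "." or ":"
            if (addr == p) found = 1
        }
        END { exit !found }'
}
```

Usage: `# netstat -an | is_listening 1158 && echo "EM console is listening"`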

In my environment, running "netstat -s" on the Windchill
application tier reported thousands of tcpListenDrop events, so
network tuning was required:

     # cat /etc/init.d/network-tuning
     /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q 2048
     /usr/sbin/ndd -set /dev/tcp tcp_conn_req_max_q0 8192
     /usr/sbin/ndd -set /dev/udp udp_smallest_anon_port 8192
     /usr/sbin/ndd -set /dev/tcp tcp_smallest_anon_port 8192

     # ln -s /etc/init.d/network-tuning /etc/rc2.d/S99network-tuning

  I also needed to increase maxSockets in wt.properties:
    wt.method.rmi.maxSockets=800
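tcpListenDrop is a cumulative counter, so the useful question after tuning is whether it is still growing. This sketch reads the counter and measures its growth over an interval. It assumes the Solaris `netstat -s -P tcp` format, which prints several "name = value" pairs per line and can let large values touch the "=". `listen_drops` and `listen_drop_rate` are hypothetical names:

```sh
#!/bin/sh
# listen_drops  (hypothetical helper)
# Reads `netstat -s -P tcp` output on stdin and prints tcpListenDrop.
# A pattern is used instead of a field number because Solaris packs
# several counters per line; the pattern does not match tcpListenDropQ0.
listen_drops() {
    awk '
        match($0, /tcpListenDrop[ ]*=[ ]*[0-9]+/) {
            v = substr($0, RSTART, RLENGTH)
            sub(/.*=[ ]*/, "", v)
            print v
            exit
        }'
}

# listen_drop_rate [SECONDS]  (hypothetical helper; Solaris only)
# Prints how many listen drops occurred during the interval.
listen_drop_rate() {
    interval=${1:-60}
    before=$(netstat -s -P tcp | listen_drops)
    sleep "$interval"
    after=$(netstat -s -P tcp | listen_drops)
    echo "tcpListenDrop: $((after - before)) in ${interval}s"
}
```

Usage: `# listen_drop_rate 60`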




Section XIII: RANDOM CRASHES

1. How can you detect random application crashes?

Use the AppCrash tool; see
http://blogs.sun.com/gregns/



Section XIV: ORACLE CONFIGURATION AND TUNING

The following settings worked well for the test database and usage patterns provided by PTC. Your mileage will vary.

    a) A 5 GB SGA was required for the test database:

        ALTER SYSTEM SET sga_max_size=5g SCOPE=spfile;
        ALTER SYSTEM SET sga_target=5g SCOPE=spfile;

    After the instance was restarted, it reported:

        Total System Global Area 5368709120 bytes
        Fixed Size 2037688 bytes
        Variable Size 939526216 bytes
        Database Buffers 4412407808 bytes
        Redo Buffers 14737408 bytes


    b) Gather schema statistics:

        exec DBMS_STATS.GATHER_SCHEMA_STATS ( OWNNAME=>'DUBLIN80M010',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, CASCADE=>TRUE );


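    c) Gather system statistics (start, run a representative workload, then stop):

        execute dbms_stats.gather_system_stats('Start');
        execute dbms_stats.gather_system_stats('Stop');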


    d) Verify system statistics:

        select pname, pval1 from sys.aux_stats$ where sname =
        'SYSSTATS_MAIN';


    e) Increase cursors:

        ALTER SYSTEM SET open_cursors = 2500 SCOPE=SPFILE;


    f) Use push join union:

         alter system set "_push_join_union_view"=true scope=spfile;


    g) Add some indexes recommended by Oracle Enterprise Manager Advisors:

        CREATE INDEX "DUBLIN80M010"."MILESTONE_IDX$$_012C000B"
        ON "DUBLIN80M010"."MILESTONE"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."DELIVERABLE_IDX$$_012C000C"
        ON "DUBLIN80M010"."DELIVERABLE"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."PROJECTACTIVITY_IDX$$_012C000D"
        ON "DUBLIN80M010"."PROJECTACTIVITY"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."MANAGEDBASELINE_IDX$$_012C000E"
        ON "DUBLIN80M010"."MANAGEDBASELINE"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."WTORGANIZATION_IDX$$_012C000F"
        ON "DUBLIN80M010"."WTORGANIZATION"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."PROJECTPLAN_IDX$$_012C0010"
        ON "DUBLIN80M010"."PROJECTPLAN"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."REPORTTEMPLATE_IDX$$_012C0011"
        ON "DUBLIN80M010"."REPORTTEMPLATE"
        ("MARKFORDELETEA2",UPPER("NAME"))
        COMPUTE STATISTICS;

        CREATE INDEX "DUBLIN80M010"."WTPRODUCT_IDX$$_00740001"
        ON "DUBLIN80M010"."WTPRODUCT"
        ("IDA3CONTAINERREFERENCE","LATESTITERATIONINFO","MARKFORDELETEA2")
        COMPUTE STATISTICS;

    h) Analyze Oracle with Enterprise Manager Database Control:


        emctl start dbconsole
        http://scnode2:1158/em

    i) AWR and ASH reports:

        sqlplus sys/manager as sysdba
        @$ORACLE_HOME/rdbms/admin/awrrpt.sql
        @$ORACLE_HOME/rdbms/admin/ashrpt.sql
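The SGA breakdown in item a) can be checked with shell arithmetic. The four components add up exactly to the Total System Global Area, 5368709120 bytes. That equals 5 × 1024³, so sga_target=5g means 5 GiB. It also shows that the buffer cache received about 82% of the SGA. `sga_total` is a hypothetical helper, and the check assumes a shell with 64-bit arithmetic:

```sh
#!/bin/sh
# sga_total BYTES...  (hypothetical helper)
# Adds up SGA component sizes given in bytes (needs 64-bit shell arithmetic).
sga_total() {
    total=0
    for b in "$@"; do
        total=$((total + b))
    done
    echo "$total"
}
```

Usage: `$ sga_total 2037688 939526216 4412407808 14737408` prints 5368709120, and `echo $((4412407808 * 100 / 5368709120))` prints 82, the integer percentage of the SGA used for database buffers.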


Section XV: METHOD SERVER LOG

1.  Any fancy Unix commands to summarize the Method Server log?

# cat M*log | grep Exception | cut -d: -f 5- | sed -e 's/[0-9]*_OutdoorProducts_Org1_Admin_ActionItem[-_0-9]*/XXXX/' | sort | uniq -c | sort -n
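The same pipeline can be kept as a reusable function. It reads from stdin and takes the starting cut field as a parameter, because log prefixes vary. This is a sketch: `summarize_exceptions` is a hypothetical name, and the sed pattern is the one from the one-liner, which normalizes object names from the OutdoorProducts sample data. Adapt it to your own object names:

```sh
#!/bin/sh
# summarize_exceptions [FIELD]  (hypothetical helper)
# Counts distinct exception messages read from stdin. FIELD (default 5,
# as in the one-liner) is the first ":"-separated field kept, which drops
# the leading timestamp fields. Volatile object names are replaced with
# XXXX so that identical errors are counted together.
summarize_exceptions() {
    field=${1:-5}
    grep Exception |
        cut -d: -f "$field"- |
        sed -e 's/[0-9]*_OutdoorProducts_Org1_Admin_ActionItem[-_0-9]*/XXXX/' |
        sort | uniq -c | sort -n
}
```

Usage: `# cat M*log | summarize_exceptions`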




Comments:

Interesting post. Do you know which J2EE version uses WindChill on Windows platform? Thank

Posted by Luigi BELLI on June 14, 2007 at 10:50 PM EDT #

This is a very good post!
Is there any stuff to check if Tomcat is behaving correctly? A count of netstat threads does give an indication of how busy it is. Anything else that comes to mind?

Posted by Prashant Lele on October 23, 2007 at 05:17 AM EDT #

This is great information. Thank you for sharing it.

Is similar information available for the T5210/T52209 and Windchill 8.0/9.0? We're considering moving from 4 and 8 CPU V880 which is 4 or 8 CPU sockets, and going to a single socket T5220 seems scary even though the M-Values and other information say the T5220 should fly.

Thanks again for information.

Posted by Joe Ward on April 11, 2008 at 01:39 AM EDT #

Nice Post....

In our case.. CPU utilization is ok...Medhode server(Quantity 2) max and Min heap size is 1GB, background server(Quantity 3) max and min heap is 1 GB.. I started load of 300 users but it was performing ok for 150 users but after that it allowed to 50 more users but response time for each hit was near to 2 minute.rest user was not able to login as i kept timeout for login as 1000 sec.Database is having 8GB RAM but fully occupied as SGA is set to that extand.. need help to set configuration to support more then 300 users.

Posted by sumit arneja on October 02, 2008 at 08:50 AM EDT #

I am working with Windchill PLM_7.0 please provide me some stuff that could help me in windchill -7 (PLM) installation.

Posted by Anuranjan Kumar on November 18, 2008 at 12:18 AM EST #

Hi Jeff,

Really good write-up with valuable insights. Request you to kindly clarify a small doubt;

Is the WindChill 9.0 Certified to be deployed on x86 Servers running Solaris 10 x86 version of the OS?

regards,
Sachin

Posted by Sachin Bhat on August 04, 2009 at 12:15 AM EDT #

About

Jeff Taylor-Oracle
