Monday Mar 14, 2011

New Role

I've been a bad person for not updating my blog for so long, but a heavy workload prevented it. However, I've now left the VOSJEC and moved into engineering to support Exadata amongst other things. I shall therefore be blogging about my encounters on that platform and the others I'll be supporting... so watch this space... if anyone actually reads my blog... ;)

Friday Oct 23, 2009

OpenSolaris and a V100

Buried in the depths of my loft is a well used and abused SunFire V100 which I've been meaning to try getting working with OpenSolaris. It was fine and dandy with the last build of Nevada I tried (125), so technically it should work with OpenSolaris. However, there is one problem: OpenSolaris SPARC is only bootable via WAN boot... which unfortunately the V100 doesn't support. So I thought I could be a little sneaky here (and totally unsupported of course) and hack OpenSolaris onto it. :) Knowing that you can do ZFS backup and restore, my plan was to install OpenSolaris 0906 on a lab system at work, take a backup, and restore it onto my home V100. All sounds feasible according to the OpenSolaris administration guide.

So from my lab box I took a snapshot and then dumped it down to a remote NFS server:

jack@v4u-v480b-gmp03:~# zfs snapshot -r rpool@0804

jack@v4u-v480b-gmp03:~# zfs list -t snapshot
NAME                             USED  AVAIL  REFER  MOUNTPOINT
rpool@0804                          0      -    65K  -
rpool/ROOT@0804                     0      -    19K  -
rpool/ROOT/opensolaris@install  58.5M      -  3.96G  -
rpool/ROOT/opensolaris@0804      218K      -  4.05G  -
rpool/dump@0804                     0      -  2.53G  -
rpool/export@0804                   0      -    21K  -
rpool/export/home@0804              0      -    21K  -
rpool/export/home/jack@0804         0      -  21.5K  -
rpool/swap@0804                     0      -  25.3M  -

jack@v4u-v480b-gmp03:~# zfs send -Rv rpool@0804 > /mnt/rpool.0804
sending from @ to rpool@0804
sending from @ to rpool/swap@0804
sending from @ to rpool/export@0804
sending from @ to rpool/export/home@0804
sending from @ to rpool/export/home/jack@0804
sending from @ to rpool/ROOT@0804
sending from @ to rpool/ROOT/opensolaris@install

On the remote NFS server I compressed the rpool dump down to about 2GB, as I then had to rsync the image home.
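The compress-and-verify step is easy to sketch. The commands below demonstrate it on a dummy file, since the real input was the multi-gigabyte zfs send stream (file names here are illustrative):

```shell
# Create a dummy stand-in for the zfs send stream (the real file
# was the multi-GB dump written to the NFS server).
dd if=/dev/zero of=/tmp/rpool.0804 bs=1024 count=64 2>/dev/null

# Compress it before shipping it over a slow home link.
bzip2 -9 -f /tmp/rpool.0804        # produces /tmp/rpool.0804.bz2

# Verify the archive decompresses cleanly before trusting it
# as the only copy of your root pool.
if bzcat /tmp/rpool.0804.bz2 > /dev/null; then
    echo "archive OK"
fi
```

After that it's just an rsync (or scp) of the .bz2 file home, which is why getting it down to around 2GB mattered.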

On the home system I then booted from the network:

boot net -s

# mkdir /var/tmp/a

# zpool create -f -o version=14 -O version=3 -o failmode=continue -R /var/tmp/a -m legacy -o cachefile=/etc/zfs/zpool.cache rpool mirror c0t0d0s0 c0t2d0s0

NOTE: If you happen to boot from an OS which has a newer version of ZFS and create the pool with that newer version, then you may hit a compatibility issue, as I did, and end up with a panic as follows:

Boot device: /pci@9,600000/SUNW,qlc@2/fp@0,0/disk@w21000004cf9973a1,0:a  File and args:
SunOS Release 5.11 Version snv_111b 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: zfs_parse_bootfs: error 48
Cannot mount root on rpool/79 fstype zfs

panic[cpu3]/thread=180e000: vfs_mountroot: cannot mount root

000000000180b950 genunix:vfs_mountroot+320 (200, 1876000, 18a3800, 18d3800, 12588f8, 1876000)
  %l0-3: 0000000000000064 000000000001d4c0 0000000000001d4c 0000000001876100
  %l4-7: 0000000001258800 0000000000000600 0000000000000200 0000000000000800
000000000180ba10 genunix:main+c4 (1815000, 180bec0, 1815388, 1838440, 1, 18393d8)
  %l0-3: 000000000180e000 0000000070002000 0000000000000000 000000000183d800
  %l4-7: 0000000000000003 000000000180c000 0000000070002000 0000000000000003

skipping system dump - no dump device configured
rebooting...

Resetting ...

So the ENOTSUP (48) error from zfs_parse_bootfs() is telling us that the pool version is too new, hence when you create the pool you need to specify the correct version for the OS being restored.

Let's mount the remote image:

# mount xxx.xxx.xxx.xxx:/backup /mnt

cd /mnt

and then restore the pool data from the compressed backup:

bzcat rpool.0804.bz2 | zfs receive -Fdu rpool &

then set the bootfs flag to the correct root location and install the boot blocks (on both mirrors):

zpool set bootfs=rpool/ROOT/opensolaris rpool

installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t2d0s0

Finally, we can init 6 and boot our system into OpenSolaris 0906.

Yay (in a sad and geeky way).

Saturday Oct 17, 2009

Solaris resource controls and Oracle CRS/RAC

In an earlier blog entry about performance considerations when upgrading Solaris, I discussed using Oracle with resource controls instead of /etc/system settings, but forgot to include details about using them with Oracle CRS (Cluster Ready Services). Oracle CRS is required for Oracle RAC configurations, and you can run into bug CR 5000940 - "project 0's rctls are not applied from /etc/project" - when migrating from /etc/system entries to resource controls. The impact is that Oracle won't start and you'll get something like the following in the alert log:

WARNING: EINVAL creating segment of size 0x000000021e000000

fix shm parameters in /etc/system or equivalent

There are workarounds described both in the bug report and in metalink document 435464.1 - "ENABLING SOLARIS PROJECT SETTINGS FOR CRS".

Monday Sep 28, 2009

Problems with Solaris and a DISM enabled Oracle Database

Resolving problems becomes ever more challenging in today's complex application environments, and sometimes error messages can be misleading or confusing to the system administrator / DBA trying to resolve the issue. Through my involvement with the VOSJEC (Veritas Oracle Sun Joint Escalation Center), we've seen a steady increase in the number of cases raised about customers using Oracle with DISM enabled on Solaris systems. In the majority of cases disabling DISM appears to resolve the issue, but with DISM use becoming more popular and making DBAs' lives easier, I personally don't feel that's the right course of action. DISM essentially gives the DBA the ability to resize the SGA on the fly without taking down the database, maximizing availability and avoiding downtime. See the following page from the Administering Oracle Database on Solaris documentation for further details.

For background on other DISM-related issues, have a read of Solution 230653: "DISM Troubleshooting For Oracle9i and Later Releases", as it gives some background on DISM and what can go wrong.

How do I know if the customer is using DISM or not?

* There should be an ora_dism process running on the system, so you can check the ps listing (explorer / guds).

* Check the Oracle alert log, as that should also mention the DISM process starting. It should also log when the DISM process has died, which means your Oracle shared memory segments aren't locked and can cause other problems... see Solution 230653: "DISM Troubleshooting For Oracle9i and Later Releases".
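A minimal sketch of that first check, assuming a hypothetical instance name of ORCL (substitute your own SID):

```shell
# Look for the DISM master process for a given Oracle instance.
# A DISM-enabled database runs a process named ora_dism_<SID>.
SID=ORCL
if pgrep -f "ora_dism_${SID}" > /dev/null 2>&1; then
    echo "DISM appears to be in use for ${SID}"
else
    echo "no ora_dism_${SID} process found"
fi
```

If the process existed at instance startup but has since vanished, that's the "DISM process has died" case above, and your shared memory segments may no longer be locked.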

So, what's the issue, I hear you asking? Well, several customers complained about Oracle batch jobs crashing or the entire database falling over with something like:

Sun Mar 15 14:53:28 2009
Errors in file /u01/app/oracle/admin/JSPSOL0/udump/jspsol03_ora_23220.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
SVR4 Error: 12: Not enough space
Additional information: 4
Additional information: 938866512
Additional information: -1

OR

Mon Aug 24 09:01:17 2009
Errors in file /opt/oracle/admin/MW/bdump/mw1_lgwr_18124.trc:
ORA-00340: IO error processing online log 1 of thread 1
ORA-00345: redo log write error block 167445 count 11
ORA-00312: online log 1 thread 1: '/dev/vx/rdsk/mwdbdg/redo11'
ORA-27070: async read/write failed
SVR4 Error: 12: Not enough space
Additional information: 1
Additional information: 167445
LGWR: terminating instance due to error 340

The first jump-to cause would be some kind of I/O issue: perhaps we've run out of space on the backing storage layer. Unfortunately that wasn't the reason in this case, as all filesystems had enough free space available. The next candidate cause is that we don't have enough swap space available: when using DISM you must have as much swap space configured as there are DISM segments in use, see Solution 214947: "Solaris[TM] Operating System: DISM double allocation of memory" for details. Again, the customer appeared to have the correct swap space configuration for the DISM segment currently in use.
Now, what I actually suspected was that CR 6559612 - "multiple softlocks on a DISM segment should decrement availrmem just once" - was coming into play here, which, if you read engineering's comments on it, can only happen if DISM isn't locked properly and we end up going down a different code path due to seg_pdisable being set to 1. There is another reason for seg_pdisable being set, and that's when we're trying to handle page retirement. Coincidentally, the two situations mentioned above both had pending page retirements happening due to CR 6587140 - "page_retire()/page_trycapture should not try to retire non relocatable kernel pages" (fixed in KJP 141414-06, the latest Solaris 10 kernel patch), so there was a window of opportunity for CR 6559612 to happen every once in a while.

From what I can currently work out, CR 6559612 - "multiple softlocks on a DISM segment should decrement availrmem just once" - is a duplicate of CR 6603296 - "Multiple writes into dism segment reduces available swap", which also mentions duplicate CR 6644570 - "Application using large page sizes with DISM on Huron got pread to return error ENOMEM".

There seems to be a family of related issues which are all fixed as part of CR 6423097 - "segvn_pagelock() may perform very poorly", which includes the above:

6526804 DR delete_memory_thread, AIO, and segvn deadlock
6557794 segspt_dismpagelock() and segspt_shmadvise(MADV_FREE) may deadlock
6557813 seg_ppurge_seg() shouldn't flush all unrelated ISM/DISM segments
6557891 softlocks/pagelocks of anon pages should not decrement availrmem for memory swapped pages
6559612 multiple softlocks on a DISM segment should decrement availrmem just once
6562291 page_mem_avail() is stuck due to availrmem overaccounting and lack of seg_preap() calls
6596555 locked anonymous pages should not have assigned disk swap slots
6639424 hat_sfmmu.c:hat_pagesync() doesn't handle well HAT_SYNC_STOPON_REF and HAT_SYNC_STOPON_MOD flags
6639425 optimize checkpage() optimizations
6662927 page_llock contention during I/O

These are all fixed in the development release of Solaris Nevada build 91 but haven't been backported yet. From looking at the bug reports it seems likely that CR 6423097 and associated CRs will be fixed in Solaris 10 U8 (scheduled for release 9th October 2009).

So, that would lead us to the following suggestions:

1/ CR 6423097 mentions disabling large page support completely to work around the issue, which you could do and still continue to use DISM:

Workaround for nevada and S10U4 is to add to /etc/system:

set max_uheap_lpsize = 0x2000
set max_ustack_lpsize = 0x2000
set max_privmap_lpsize = 0x2000
set max_shm_lpsize = 0x2000
set use_brk_lpg = 0
set use_stk_lpg = 0
and reboot.

For S10U3 and earlier S10 releases, the workaround is to add to /etc/system:

set exec_lpg_disable = 1
set use_brk_lpg = 0
set use_stk_lpg = 0
set use_zmap_lpg = 0
and reboot.

NOTE: this is *not* the same as disabling large page coalescing, as this completely disables large page support. Disabling large page coalescing is sometimes required depending on the type of application workload; see Solution 215536: "Oracle(R) 10g on Solaris[TM] 10 may run slow" for further details.

2/ Disable DISM entirely in the short term and use ISM with Oracle whilst waiting for the release of Solaris 10 update 8.

Conclusion / Actions
-------------------------------------

The above is essentially a hit list of things to be aware of when seeing errors of that nature. If the customer isn't using DISM at all then that might take you down another avenue. If they are, remember that "Applications Drive System Behavior": you need to understand the architecture, when and where things run, to tie the loose ends of any problem together and be able to prove root cause or not. See my previous blog entry about dealing with performance issues, as it describes essentially the same troubleshooting process I use all the time in diagnosing complex problems.

Tuesday Aug 11, 2009

Multiple Oracle instances performance issue

Finally closed a VOS case that had been open for over a year, related to high system CPU consumption caused by running multiple Oracle RDBMS instances on a single system. The observation was 80-90% system cpu in mpstat 1 and the following from lockstat profiling:

Profiling interrupt: 67240 events in 2.168 seconds (31017 events/sec)

Count indv cuml rcnt nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
40920 61% ---- 0.00 987 cpu[7] fop_ioctl
40920 61% ---- 0.00 987 cpu[7] ioctl
40880 61% ---- 0.00 986 cpu[7] read_kstat_data
40248 60% ---- 0.00 1077 cpu[7] syscall_trap
38780 58% ---- 0.00 947 cpu[2] mutex_vector_enter
32478 48% ---- 0.00 947 cpu[5] kstat_hold_bykid
32477 48% ---- 0.00 947 cpu[5] kstat_hold
13516 20% ---- 0.00 1845 cpu[102] (usermode)
6466 10% ---- 0.00 1904 cpu[423] syscall_trap32
6169 9% ---- 0.00 926 cpu[3] kstat_rele
4738 7% ---- 0.00 1626 cpu[96] thread_start
2420 4% ---- 0.00 1359 cpu[135]+11 idle
2317 3% ---- 0.00 1464 cpu[135]+11 disp_getwork
2122 3% ---- 0.00 2764 cpu[101] fop_read
1388 2% ---- 0.00 2510 cpu[101] vx_read
1379 2% ---- 0.00 2509 cpu[101] vx_read1
1352 2% ---- 0.00 2503 cpu[101] vx_cache_read
1267 2% ---- 0.00 2459 cpu[418] trap
1215 2% ---- 0.00 3059 cpu[128] fop_write
1082 2% ---- 0.00 2339 cpu[418] utl0
-------------------------------------------------------------------------------

Originally I raised CR 6734910 - "kstat_hold doesn't scale well on large systems" to track this, but it seemed that Oracle could do a better job of utilizing the kstat interface, which resulted in Oracle addressing the issue in:

Oracle RDBMS Versions >= 10.2.0.1 but < 11.2 - Bug/Patch 8531434 - "Solaris: Excessive CPU by MMNL/CJQ0 when running multiple instances and cpus" (Doc ID 8531434.8)

Oracle RDBMS Versions >= 11.1.0.6 but < 11.1.0.7 -> Bug/Patch 8777336 - "multiple kstat calls while getting socket count and core count for v$osstat" (Doc ID 8777336.8)

11gR2 doesn't appear affected by this issue. So, to avoid a performance hit whilst running multiple Oracle instances on a single host, I'd recommend you patch an affected database, or, if that's not possible, use the workaround of tuning the Oracle parameter _job_queue_interval (e.g. from 5 to 30) at the cost of potentially losing granularity of performance statistics.
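For reference, the workaround amounts to an init.ora / spfile change along these lines (the value is illustrative, and as _job_queue_interval is a hidden underscore parameter you should agree it with Oracle Support before setting it):

```
# init.ora fragment: widen the job queue wakeup interval
# from its default of 5 seconds to reduce kstat polling
_job_queue_interval=30
```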

Monday Apr 20, 2009

Observability and Diagnosis Techniques are the way forward

I've just closed down another escalation involving a performance comparison of an application (Finacle in this case) running on a 6 board SunFire E6900 and a 6 board SunFire E25K. Obviously we're not comparing apples with apples here, but nevertheless the customer found huge performance differences during core business hours with the *same* application (and I use that term loosely, as nothing can entirely occupy the same physical space in time or be made up of identical particles... you know what I mean... although perhaps some clever physicists can argue against that?!). Anyhow, the customer made the argument that they were using the same application binaries on both platforms and delivering the same user load (and they still couldn't confirm that!) to both, yet were seeing a huge performance degradation on the E25K domain: slow interactive response from the users' perspective and high load. That didn't actually surprise me, as the 25K is a larger system and will have larger memory latencies due to its larger interconnect compared to the E6900. So, what we observed was multiple application processes consuming userland cpu resources and no userland lock contention in the prstat micro-state accounting output. If we'd suspected a scalability issue with the application then we'd more than likely have observed userland lock contention; one of the frequent flyers we see in the performance team is caused by applications using malloc in multi-threaded SMP environments, see this interesting read for further details. Unfortunately that wasn't to be the case, so we needed to take another direction in the observability game to prove to the customer the fundamental differences between the two platforms. The answer here was to use cpustat and the ability to show the difference between the two platforms using cycles-per-instruction calculations.

The following chapter from Solaris™ Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris explains the methodology (or read from Safari Books if you've an account or buy it from Amazon as the entire book is well worth a read).

8.2.7. Cycles per Instruction

The CPC events can monitor more than just the CPU caches. The following example demonstrates the use of the cycle count and instruction count on an Ultra-SPARC IIi to calculate the average number of cycles per instruction, printed last.

# cpustat -nc pic0=Cycle_cnt,pic1=Instr_cnt 10 1 | \
awk '{ printf "%s %.2f cpi\n",$0,$4/$5; }'
10.034 0 tick 3554903403 3279712368 1.08 cpi
10.034 1 total 3554903403 3279712368 1.08 cpi


This single 10-second sample averaged 1.08 cycles per instruction. During this test, the CPU was busy running an infinite loop program. Since the same simple instructions are run over and over, the instructions and data are found in the Level-1 cache, resulting in fast instructions.

Now the same test is performed while the CPU is busy with heavy random memory access:

# cpustat -nc pic0=Cycle_cnt,pic1=Instr_cnt 10 1 | \
awk '{ printf "%s %.2f cpi\n",$0,$4/$5; }'
10.036 0 tick 205607856 34023849 6.04 cpi
10.036 1 total 205607856 34023849 6.04 cpi


Since accessing main memory is much slower, the cycles per instruction have increased to an average of 6.04.

--
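To apply the same arithmetic to your own captures: the cpi figure is simply column 4 (cycles) divided by column 5 (instructions) of the cpustat line, for example:

```shell
# Recompute cycles-per-instruction from a cpustat "total" line.
# Fields: interval, cpu, event, pic0 (Cycle_cnt), pic1 (Instr_cnt).
echo "10.034 1 total 3554903403 3279712368" | \
    awk '{ printf "%s %.2f cpi\n", $0, $4/$5 }'
# prints: 10.034 1 total 3554903403 3279712368 1.08 cpi
```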

So, looking at the customer's data:

[andharr@node25k]$ grep total cpu*
cpu2.out: 30.429 48 total 1453222115333 358247394908 4.06 cpi
cpu22.out: 30.523 48 total 1463632215285 347056691816 4.22 cpi
cpu222.out: 30.367 48 total 1395799585592 423393952271 3.30 cpi

[andharr@node6900]$ grep total cpu*
cpu1.out: 31.038 48 total 1209418147610 522125013039 2.32 cpi
cpu11.out: 30.311 48 total 1194302525311 573624473405 2.08 cpi
cpu111.out: 30.408 48 total 1105516225829 552190193006 2.00 cpi

So the 25K is showing a higher cycles-per-instruction average than the 6900, which does show a difference in performance between the two systems, more than likely due to the memory latency difference on the 25K. If we look at the raw data for the busy times, sorted by size over the sample period, you can see some very big differences in the largest cpi values between the systems:

[andharr@node6900]$ cat cpu1.out | awk '{print $6}' |sort -n | tail
6.30
6.37
7.14
7.16
7.26
7.35
8.17
8.27
8.36
8.91
[andharr@node6900]$ cat cpu11.out | awk '{print $6}' |sort -n | tail
6.67
6.71
6.77
6.80
7.21
7.70
7.72
8.40
9.21
11.92
[andharr@node6900]$ cat cpu111.out | awk '{print $6}' |sort -n | tail
6.26
6.39
6.65
6.93
6.99
7.25
7.81
8.65
8.81
9.32

[andharr@node25k]$ cat cpu2.out | awk '{print $6}' |sort -n |tail
26.65
26.86
26.99
28.71
29.48
30.06
30.87
32.93
34.05
34.36
[andharr@node25k]$ cat cpu22.out | awk '{print $6}' |sort -n |tail
31.35
31.82
32.32
33.16
34.03
35.00
38.51
47.69
50.19
51.04
[andharr@node25k]$ cat cpu222.out | awk '{print $6}' |sort -n |tail
26.03
26.71
26.90
27.31
27.45
28.29
28.42
29.28
32.30
35.40

Conclusion

So the fact that the E25K showed higher cpi numbers didn't really indicate a good match with the Finacle application running on it. It would seem to suggest that Finacle generates a memory latency sensitive workload which might not be entirely suited to the 25K platform. Binding the Finacle application into specific processor sets might alleviate thread migration and some non-local memory accesses, which may improve performance somewhat but won't entirely eradicate the memory latency issue. The reason it's more noticeable on the E25K than the E6900 is that there is a greater distance to travel across the E25K interconnect, i.e. a greater length of copper for the electrical signals to traverse. MPO (Memory Placement Optimization) was introduced in Solaris 9 to help alleviate latency in large scale NUMA configurations by attempting to keep LWPs local to their "home" cpu/memory board, but in some cases, such as this one, it cannot eliminate the latency for all workloads.

See the following documents for background information on MPO:

Solution 214954 : Sun Fire[TM] Servers: Memory Placement Optimization (MPO) and Solution 216813 : Sun Fire[TM] Servers: Memory Placement Optimization (MPO) Frequently Asked Questions (FAQ)

As my esteemed colleague Clive said, "Applications drive system behaviour", so we need to look at the application and how it interacts with the system rather than the other way round, which always points me back to one of my first blog entries on the importance of understanding the application architecture and starting with a top down approach. :)

Saturday Mar 28, 2009

Don't forget truss for diagnosing issues

Oracle came to us via the VOSJEC channel for help in diagnosing a customer who was getting the following errors in the alert log during various DML queries:

ORA-27505: IPC error destroying a port
ORA-27300: OS system dependent operation:close failed with status: 9
ORA-27301: OS failure message: Bad file number
ORA-27302: failure occurred at: skgxpdelpt1
ORA-03135: connection lost contact

and

ORA-27509: IPC error receiving a message
ORA-27300: OS system dependent operation:recvmsg failed with status: 95
ORA-27301: OS failure message: Socket operation on non-socket
ORA-27302: failure occurred at: sskgxp_rcvms
Non critical error ORA-00001 caught while writing to trace file
"/opt/oracle/diag/rdbms/tacor3p/tacor3p2/trace/tacor3p2_ora_5758.trc"
Error message: SVR4 Error: 9: Bad file number

Interestingly enough, Oracle commented that the customer was not running out of file descriptors and also had enough UDP ports available for their load. Oracle were also unable to replicate this in house, but it appeared to be consistent at the customer site and only happened when the customer enabled Oracle auditing with the syslog audit trail option. So why did Oracle believe this was an interoperability issue? Well, they provided the following truss data:

27873: open("/var/run/syslog_door", O_RDONLY) = 0
->the bad code now opens the syslog file again, at free fd 0
<snip>
27873: door_info(0, 0xFFFFFFFF7FFF78D8) = 0
27873: getpid() = 27873 [27340]
27873: door_call(0, 0xFFFFFFFF7FFF78A8) = 0
27873: close(0) = 0
->the bad code now closes 0 since it opened the file at 0
<snip>
27873: close(0) Err#9 EBADF
->again, closes fd 0 incorrectly, it is no longer open to anyone
<snip>
27873: recvmsg(0, 0xFFFFFFFF7FFF6F88, 0) Err#95 ENOTSOCK
->SKGXP now tries to use its socket without knowing someone else closed it and fails
<snip>

So Oracle believed that the open("/var/run/syslog_door") call seemed suspicious, as it didn't originate from the rdbms code, and hence some help was required from Sun to progress. Unfortunately that limited truss output didn't actually show us where in the process lifecycle we were going wrong, so we needed to collect some more data from the customer to work it all out. With truss you can use the -u option to do user-level function call tracing, i.e.:

truss -faeo /var/tmp/out -u a.out:: -u libc:: ./a.out

So, what did this new data actually show us:

5389: execve("/opt/oracle/product/11.1.0/bin/sqlplus", 0xFFFFFFFF7FFFFB38, 0xFFFFFFFF7FFFFB48) argc = 1
5389: argv: sqlplus
6085/1@1: -> libc:_open(0xffffffff7b3e7498, 0x0, 0x0, 0x0)
6085/1@1: -> libc:__open(0xffffffff7b3e7498, 0x0, 0x0, 0x0)
6085/1: open("/var/run/syslog_door", O_RDONLY) = 0
6085/1@1: <- libc:__open() = 0
6085/1@1: <- libc:_open() = 0
6085/1@1: -> libc:_door_info(0x0, 0xffffffff7fff7978, 0x0, 0x2000)
6085/1: door_info(0, 0xFFFFFFFF7FFF7978) = 0
6085/1@1: <- libc:_door_info() = 0
6085/1@1: -> libc:getpid(0x0, 0xffffffff7fff7978, 0x0, 0x2000)
6085/1: getpid() = 6085 [5389]
6085/1@1: <- libc:getpid() = 6085
6085/1@1: -> libc:_door_call(0x0, 0xffffffff7fff7948, 0x0, 0x2000)
6085/1: door_call(0, 0xFFFFFFFF7FFF7948) = 0
6085/1@1: <- libc:_door_call() = 0
6085/1@1: -> libc:_close(0x0, 0xffffffff7fff7948, 0x0, 0x2000)
6085/1: close(0) = 0
6085/1@1: <- libc:_close() = 0
6085/1@1: <- libc:syslogd_ok() = 1
6085/1@1: <- libc:vsyslog() = 0xffffffff7fff83c8
6085/1@1: -> libc:gettimeofday(0xffffffff7fff8540, 0x0, 0x0, 0x0)
6085/1@1: -> libc:_cancelon(0x0, 0x0, 0x0, 0x0)
6085/1@1: <- libc:_cancelon() = -19
6085/1@1: -> libc:_close(0x0, 0x1, 0x0, 0x0)
6085/1: close(0) Err#9 EBADF
6085/1@1: -> libc:_cerror(0x9, 0x1, 0x0, 0x0)
6085/1@1: -> libc:___errno(0x0, 0x0, 0x0, 0x0)
6085/1@1: <- libc:___errno() = 0xffffffff7b4fc9bc
6085/1@1: <- libc:_close() = -1
6085/1@1: -> libc:_canceloff(0xffffffffffffffff, 0x1, 0x0, 0x0)
6085/1@1: <- libc:_canceloff() = 0
6085/1@1: -> libc:times(0xffffffff7fff9460, 0x108c65000, 0x108c65, 0x108c00)
6085/1: times(0xFFFFFFFF7FFF9460) = 86657683
6085/1@1: -> libc:gethrtime(0x0, 0x108c00, 0x10a0c7, 0x400)
6085/1@1: -> libc:setjmp(0xffffffff7fffa7e0, 0x10a1025b0, 0x10a0bc4b0, 0x46100)
6085/1@1: -> libc:setjmp(0xffffffff7fffa5f0, 0x10a1025b0, 0x10a0bc4b0, 0x46100)
6085/1@1: -> libc:setjmp(0xffffffff7fffa5f0, 0x10a1025b0, 0x10a0bc4b0, 0x46100)
6085/1@1: -> libc:setjmp(0xffffffff7fffa5f0, 0x10a1025b0, 0x10a0bc4b0, 0x46100)
6085/1@1: -> libc:setjmp(0xffffffff7fffa5f0, 0x10a1025b0, 0x10a0bc4b0, 0x46100)
6085/1@1: -> libc:setjmp(0xffffffff7fffa5f0, 0x10a1025b0, 0x10a0bc4b0, 0x46100)
6085/1@1: -> libc:gettimeofday(0xffffffff7fffa620, 0x0, 0xffffffffffffffff, 0xfffffffffffffff8)
6085/1@1: -> libc:localtime_r(0xffffffff7fffa618, 0xffffffff7fffa5f4, 0x0, 0xec23d)
6085/1@1: -> libc:getsystemTZ(0xffffffff7fffa528, 0x0, 0x0, 0x0)
6085/1@1: -> libc:getenv(0xffffffff7b3e5b68, 0x0, 0x0, 0x0)

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/syslog.c

384 /*
385  * Use a door call to syslogd to see if it's alive.
386  */
387 static int
388 syslogd_ok(void)
389 {
390 int d;
391 int s;
392 door_arg_t darg;
393 door_info_t info;
394
395 if ((d = open(DOORFILE, O_RDONLY)) < 0)
396 return (0);
397 /*
398  * see if our pid matches the pid of the door server.
399  * If so, syslogd has called syslog(), probably as
400  * a result of some name service library error, and
401  * we don't want to let syslog continue and possibly
402  * fork here.
403  */
404 info.di_target = 0;
405 if (__door_info(d, &info) < 0 || info.di_target == getpid()) {
406 (void) close(d);
407 return (0);
408 }
409 darg.data_ptr = NULL;
410 darg.data_size = 0;
411 darg.desc_ptr = NULL;
412 darg.desc_num = 0;
413 darg.rbuf = NULL;
414 darg.rsize = 0;
415 s = __door_call(d, &darg);
416 (void) close(d);
417 if (s < 0)
418 return (0); /* failure - syslogd dead */
419 else
420 return (1);
421 }

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/syslog.c

340 /* output the message to the local logger */
341 if ((putmsg(LogFile, &ctl, &dat, 0) >= 0) && syslogd_ok())
342 return;
343 if (!(LogStat & LOG_CONS))
344 return;

So syslogd_ok() returns 1, and then we return out of vsyslog():

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/syslog.c

158 void
159 syslog(int pri, const char *fmt, ...)
160 {
161 va_list ap;
162
163 va_start(ap, fmt);
164 vsyslog(pri, fmt, ap);
165 va_end(ap);
166 }


So we've returned from vsyslog(), then gone into gettimeofday(), and then our second close() returns EBADF:

6085/1@1: -> libc:gettimeofday(0xffffffff7fff8540, 0x0, 0x0, 0x0)
6085/1@1: -> libc:_cancelon(0x0, 0x0, 0x0, 0x0)
6085/1@1: <- libc:_cancelon() = -19
6085/1@1: -> libc:_close(0x0, 0x1, 0x0, 0x0)
6085/1: close(0) Err#9 EBADF
6085/1@1: -> libc:_cerror(0x9, 0x1, 0x0, 0x0)
6085/1@1: -> libc:___errno(0x0, 0x0, 0x0, 0x0)
6085/1@1: <- libc:___errno() = 0xffffffff7b4fc9bc
6085/1@1: <- libc:_close() = -1
6085/1@1: -> libc:_canceloff(0xffffffffffffffff, 0x1, 0x0, 0x0)
6085/1@1: <- libc:_canceloff() = 0
6085/1@1: -> libc:times(0xffffffff7fff9460, 0x108c65000, 0x108c65, 0x108c00)
6085/1: times(0xFFFFFFFF7FFF9460) = 86657683

So we've returned from syslog() and are now somewhere else in the code, so this has nothing to do with the door calls as far as I understand it. With that, Oracle development did some further digging and then announced they'd found a new defect in the Oracle code, logged as Oracle bug 7519558. If Oracle hadn't come back then I would probably have used the following DTrace script to confirm where the EBADF was actually being returned from close():

root@hippy-1 # more badclose.d
syscall::close:return
/pid == $target && errno == EBADF/
{
        @badf_ustack[execname, ustack(20)] = count();
        stop_pids[pid] = 1;
}

syscall::rexit:entry
/stop_pids[pid] != 0/
{
        printf("stopping pid %d", pid);
        stop();
        system("prun %d", pid);
        stop_pids[pid] = 0;
}

END
{
        printa(@badf_ustack);
}

root@hippy-1 # dtrace -ws badclose.d -c ./p
dtrace: script 'badclose.d' matched 3 probes
dtrace: allowing destructive actions
andy1
CPU ID FUNCTION:NAME
0 28735 rexit:entry stopping pid 27953
dtrace: pid 27953 exited with status 6
1 2 :END
p libc.so.1`_close+0x4
p`poo+0x8
p`main+0x4c
p`_start+0x5c
1

So, another happy ending for all, in that VOSJEC managed to get to the bottom of it using plain old truss rather than DTrace. It's worth remembering that although DTrace is a fantastic diagnosis tool, don't forget the other great tools available in Solaris for observing and diagnosing the problems you're investigating! :)
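On that note, a plain grep is often all you need for a first pass over a large truss log. A sketch against a tiny sample log in the same format (in practice you'd point it at the file written by truss -faeo):

```shell
# Build a small sample log in truss output format, then triage it.
cat > /tmp/truss.out <<'EOF'
27873: close(0)                                  = 0
27873: close(0)                                  Err#9 EBADF
27873: recvmsg(0, 0xFFFFFFFF7FFF6F88, 0)         Err#95 ENOTSOCK
EOF

# Show every call that failed with EBADF...
grep 'Err#9 EBADF' /tmp/truss.out

# ...and count just the failing close(2) calls.
grep -c 'close(.*Err#9' /tmp/truss.out
# prints: 1
```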

Monday Jan 19, 2009

Performance considerations when upgrading Solaris

The biggest piece of advice I can give those of you about to upgrade with lots of custom tunables in /etc/system... read the manual (RTFM if you're feeling particularly vocal). No, seriously, I mean it! :) You only have to read the Solaris Tunable Parameters Reference Manual, as it actually discusses upgrading to newer releases with older /etc/system tunables:

"We recommend that you start with an empty /etc/system file when moving to a new Solaris
release. As a first step, add only those tunables that are required by in-house or third-party
applications. Any tunables that involve System V IPC (semaphores, shared memory, and
message queues) have been modified in the Solaris 10 release and should be changed in your
environment. For more information, see “System V IPC Configuration” on page 21. After
baseline testing has been established, evaluate system performance to determine if additional
tunable settings are required."


So, that's a "move it out of the way and start from scratch". :) Obviously speak to your application vendors about anything that's required to run the application, but other than that, see how things go and only change tunables when and where necessary, otherwise you could run into other problems.

The only application I'll make specific points about is Oracle, as Solaris 10 introduced resource controls, so the shared memory / semaphore settings no longer need to be defined in /etc/system. See the Oracle installation guide or Solution 208623: "Solaris[TM] 10 Operating System: System V Inter-Process Communication (IPC) resource controls" for further details.
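As a sketch of what that looks like in practice, an /etc/project entry for the oracle user might read something like the following (the project name, id and limits are hypothetical examples, not recommendations; the real values should come from the Oracle installation guide for your release):

```
user.oracle:100:Oracle default project:oracle::project.max-shm-memory=(priv,8589934592,deny);process.max-sem-nsems=(priv,512,deny)
```

The same resource controls can also be applied to a running project with prctl, but the /etc/project entry is what makes them stick across logins.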

Friday Jan 16, 2009

Capturing data on Oracle "WARNING: aiowait timed out" events

Oracle "WARNING: aiowait timed out" error messages in the alert log always need further investigation, as they can indicate some kind of I/O or system resource issue that is stopping valuable I/Os getting down to storage. Now, there could be a whole bunch of reasons why our asynchronous I/O has got stuck. Oracle reports "WARNING: aiowait timed out" messages after 10 minutes of non-response since our I/O was queued by aioread(3) or aiowrite(3). It's also worth being aware of the two methods by which asynchronous I/O is handled by Solaris: either kernel asynchronous I/O (KAIO), if the underlying filesystem supports it (raw devices, or filesystems with added software giving raw access), or the standard userland library implementation. For further background reading, have a look at the Solaris™ Internals: Core Kernel Components book by Jim Mauro and Richard McDougall, chapter 11.4, Asynchronous I/O.
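As a rough first check of which path a given database process is actually taking, truss can watch for the relevant calls (a sketch only; the syscall name assumes a Solaris truss, and <oracle_pid> is whichever shadow process you want to observe):

```shell
# kaio() calls indicate the kernel AIO path; no kaio() activity, with
# the aio work instead done by extra LWPs, suggests the userland
# library implementation.
truss -f -t kaio -p <oracle_pid>
```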

So, to help understand where in the I/O lifecycle we're stuck, we'll need to collect some data. This might also help us define a better problem statement, which is obviously vitally important in our rational troubleshooting process. See my previous post about the importance of defining a good problem statement.

Enabling the trigger environment

A full crash dump is much preferred over a live core, as things tend to change on the fly whilst a live dump is being taken, which corrupts some of the structures in the dump and can give strange results. In this example I'm going to assume that we don't really want to take down the entire box, as that would affect all services, not just Oracle. Obviously, try the live method first and if that doesn't yield results then try a full dump. So, here are the steps for enabling a triggered live savecore on an aiowait timed out error message in an Oracle alert log.

1/ You'll need to setup a dedicated dump device to collect a live savecore. You'll either need a spare raw partition or have to create a large file using mkfile, see the man page dumpadm(1M) for details on how to do this.
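As a sketch of the spare raw partition route (the slice name below is an assumed example; double-check dumpadm(1M) for your release before repointing the dump device):

```shell
# Point the dump configuration at a spare raw slice, then verify it
# and take a trial live dump.
dumpadm -d /dev/dsk/c0t1d0s1   # assumed spare slice with no data on it
dumpadm                        # show the resulting dump configuration
savecore -L                    # trial run: capture a live dump now
```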

2/ Download livecore_aio (or livecore_aio.x86) and the guds script.

3/ Create collect_data.sh:

root@hippy-1 # more collect_data.sh
#!/bin/sh
/var/tmp/guds_2.8.4 -q -c30 -i1 -n5  -w0 -s<case ref> -X2 -D /var/tmp &
/usr/bin/savecore -L
echo "aiowait coredump event - please send guds output and crash dump to Sun" | mailx -s "AIOWAIT TIMED OUT EVENT" root

In my case I'm going to fire off guds to gather some performance stats on the box, but you could add anything else you want to run here, including a "reboot -d" to take a full crash dump before rebooting the box.

4/ Change permissions on the binary and script before copying both to /var/tmp:

chmod +x livecore_aio collect_data.sh
cp livecore_aio collect_data.sh /var/tmp

note: livecore_aio expects collect_data.sh to be in /var/tmp for it to work correctly

Test run

Test the program and script (as root):

# touch dummy_file
# ./livecore_aio dummy_file &
# echo "WARNING: aiowait timed out" >> dummy_file

This should produce a live savecore dump and kick off guds.

Deploy

Execute the livecore_aio binary against the real alert log:

# ./livecore_aio <full path of alert_log> &

When the issue happens, upload the live core dump and guds data to Sun for analysis.

Monday Dec 22, 2008

Diagnosing a VxFS filesystem performance issue

I recently got involved in an interesting performance problem in which the customer was experiencing differences of up to a factor of 4 when copying a 500MB file to 2 different VxFS filesystems on the same box, i.e. "timex cp <500mb file> /vxfs_filesystem". The only things which had changed (I'm actually smiling at the word *only*) between the 2 dates were:

\* Various OS Patches
\* Symantec Veritas Volume Manager Patch
\* Upgraded LPFC Driver Servers
\* Upgraded firmware
\* Upgraded Power Path Servers
\* Upgraded SYMCLI

which doesn't immediately help in thinking about probable or possible causes, due to far too many things changing! So how does one actually track down what's happening in the lifecycle of the "cp" on a Solaris 9 box...... TNF to the rescue again. Using the TNF wrapper script from one of my previous posts, we could enable and disable TNF whilst the cp was happening. So once we'd got that data from the customer and extracted / processed the TNF buffers, I first looked at the TNF truss outputs:

[andharr@exdev:~/tnf/tnf/notworking]$ more tnf.truss
looking for pid  12045
Now(ms)          duration(ms)   pid     thread             call           return       errno
48335.67488000   15.83081400    12045    0x30101ab09a0    mmap64         0x7fffffff     0
48351.53089800   254.21403800   12045    0x30101ab09a0    write          0x800000       0
48605.75773800   16.39489300    12045    0x30101ab09a0    mmap64         0x7fffffff     0
48622.17487400   244.79986500   12045    0x30101ab09a0    write          0x800000       0
48866.98866100   16.51242700    12045    0x30101ab09a0    mmap64         0x7fffffff     0
48883.52309100   263.08493800   12045    0x30101ab09a0    write          0x800000       0
49146.62003100   16.96265000    12045    0x30101ab09a0    mmap64         0x7fffffff     0
49163.60188400   262.34315500   12045    0x30101ab09a0    write          0x800000       0
49425.95560000   13.35143400    12045    0x30101ab09a0    mmap64         0x7fffffff     0
49439.32559600   256.47619000   12045    0x30101ab09a0    write          0x800000       0
49695.81298800   13.16556800    12045    0x30101ab09a0    mmap64         0x7fffffff     0
49708.99975900   246.71828900   12045    0x30101ab09a0    write          0x800000       0
49955.73060900   15.93346900    12045    0x30101ab09a0    mmap64         0x7fffffff     0
49971.68440100   254.45911200   12045    0x30101ab09a0    write          0x800000       0
50226.15543500   16.82535100    12045    0x30101ab09a0    mmap64         0x7fffffff     0
50242.98686700   262.94091700   12045    0x30101ab09a0    write          0x800000       0
50505.93946600   15.95571100    12045    0x30101ab09a0    mmap64         0x7fffffff     0
50521.91406000   254.57880900   12045    0x30101ab09a0    write          0x800000       0
50776.50887100   16.54139300    12045    0x30101ab09a0    mmap64         0x7fffffff     0
50793.06922600   263.14118400   12045    0x30101ab09a0    write          0x800000       0
51056.22369200   13.62531200    12045    0x30101ab09a0    mmap64         0x7fffffff     0
51069.86692700   265.74658200   12045    0x30101ab09a0    write          0x800000       0
51335.62423100   14.17114700    12045    0x30101ab09a0    mmap64         0x7fffffff     0
51349.81458000   266.32114200   12045    0x30101ab09a0    write          0x800000       0
51616.14980400   14.28020100    12045    0x30101ab09a0    mmap64         0x7fffffff     0
<snip>

[andharr@exdev:~/tnf/tnf/working]$ more tnf.truss
looking for pid  16545
<snip>
Now(ms)          duration(ms)   pid     thread             call           return       errno
74.48367000   0.09929400     16545    0x30129365ce0    mmap64         0x7fffffff     0
74.60224600   145.95268900   16545    0x30129365ce0    write          0x800000       0
220.56909700   15.92858800    16545    0x30129365ce0    mmap64         0x7fffffff     0
236.51624700   145.57631700   16545    0x30129365ce0    write          0x800000       0
382.11048600   17.13331300    16545    0x30129365ce0    mmap64         0x7fffffff     0
399.25084000   148.40430500   16545    0x30129365ce0    write          0x800000       0
547.67306700   15.89682400    16545    0x30129365ce0    mmap64         0x7fffffff     0
563.59125400   143.89640600   16545    0x30129365ce0    write          0x800000       0
707.50126200   16.40793300    16545    0x30129365ce0    mmap64         0x7fffffff     0
723.91583600   144.97527400   16545    0x30129365ce0    write          0x800000       0
868.90455200   15.84185600    16545    0x30129365ce0    mmap64         0x7fffffff     0
884.76377000   145.10353100   16545    0x30129365ce0    write          0x800000       0
1029.88114300   15.60142300    16545    0x30129365ce0    mmap64         0x7fffffff     0
1045.50632900   145.67129000   16545    0x30129365ce0    write          0x800000       0
1191.19362100   16.41825600    16545    0x30129365ce0    mmap64         0x7fffffff     0
1207.63660000   144.74260200   16545    0x30129365ce0    write          0x800000       0
1352.39696400   17.08674700    16545    0x30129365ce0    mmap64         0x7fffffff     0
1369.49027200   143.29896300   16545    0x30129365ce0    write          0x800000       0
1512.80107700   14.10729700    16545    0x30129365ce0    mmap64         0x7fffffff     0
1526.91309500   144.05898900   16545    0x30129365ce0    write          0x800000       0
1670.98672600   16.63788500    16545    0x30129365ce0    mmap64         0x7fffffff     0
1687.63237200   141.60897100   16545    0x30129365ce0    write          0x800000       0
1829.25798500   15.96867300    16545    0x30129365ce0    mmap64         0x7fffffff     0
1845.23249900   144.46080400   16545    0x30129365ce0    write          0x800000       0
1989.70738500   15.94683000    16545    0x30129365ce0    mmap64         0x7fffffff     0
<snip>
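With listings like the two above, awk can summarise the difference quickly (this assumes the processed tnf.truss layout shown, with duration(ms) in column 2 and the call name in column 5):

```shell
# Average the write() durations in a processed tnf.truss listing.
awk '$5 == "write" { sum += $2; n++ }
     END { if (n) printf "avg write: %.2f ms over %d calls\n", sum/n, n }' tnf.truss
```

Run against each capture, it shows roughly 255ms per write() for the bad filesystem versus roughly 145ms for the good one.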

Now, that's interesting, in that we see a difference in the time taken for write(), so what's happening in the write() lifecycle?

<not working>
    48545.230384         0.006560 12045     1 0x30101ab09a0 224 strategy                   device: 0x11100000faf  block: 39690624  size: 0x10000  buf: 0x3004185e058  flags: 558353 0,0
    48545.277271         0.000080 12045     1 0x30101ab09a0 224 strategy                   device: 0x1240000007a  block: 39692928  size: 0x10000  buf: 0x312eec85688  flags: 1289 0,0
    48545.416170         0.000800 12045     1 0x30101ab09a0 224 strategy                   device: 0x11100000faf  block: 39690752  size: 0x10000  buf: 0x327286721b0  flags: 558353 0,0
    48545.454655         0.002640 12045     1 0x30101ab09a0 224 strategy                   device: 0x1240000007a  block: 39693056  size: 0x10000  buf: 0x506b36947c8  flags: 1289 0,0
    48545.544348         0.008562 12045     1 0x30101ab09a0 224 strategy                   device: 0x11100000faf  block: 39690880  size: 0x10000  buf: 0x3004185f458  flags: 558353 0,0
    48545.580753         0.012962 12045     1 0x30101ab09a0 224 strategy                   device: 0x1240000007a  block: 39693184  size: 0x10000  buf: 0x312ead7be08  flags: 1289 0,0
    48545.642601         0.004721 12045     1 0x30101ab09a0 224 strategy                   device: 0x11100000faf  block: 39691008  size: 0x10000  buf: 0x30070b4af30  flags: 558353 0,0
    48545.676366         0.003361 12045     1 0x30101ab09a0 224 strategy                   device: 0x1240000007a  block: 39693312  size: 0x10000  buf: 0x3006ca9bb20  flags: 1289 0,0
    48545.735814         0.003681 12045     1 0x30101ab09a0 224 strategy                   device: 0x11100000faf  block: 39691136  size: 0x10000  buf: 0x327cd4835c8  flags: 558353 0,0
<snip>

So our I/O appears to be issued in 64K (0x10000) chunks, but how long did one I/O actually take?

48545.230384  -  12045 1 0x30101ab09a0 224 strategy device: 273,4015  block: 39690624 size: 0x10000 buf: 0x3004185e058 flags: 0x88511
48549.243496  -  0 0 0x2a100477d20 64 biodone device: 273,4015  block: 39690624 buf: 0x3004185e058

4.013112ms

<working>
      236.516247         0.000720 16545     1 0x30129365ce0  65 syscall_start              sysnum: 4 write
      236.972870         0.004481 16545     1 0x30129365ce0  65 address_fault              address: 0xfe800000  fault_type: 2  access: 1
      238.215601         0.068810 16545     1 0x30129365ce0  65 strategy                   device: 0x11100000fa9  block: 24838144  size: 0x100000  buf: 0x300359fa3a8  flags: 289 0,0
      238.247765         0.002240 16545     1 0x30129365ce0  65 strategy                   device: 0x12400000022  block: 24840448  size: 0x100000  buf: 0x30038cd24a0  flags: 297 0,0
      238.461314         0.002240 16545     1 0x30129365ce0  65 thread_block               reason: 0x300359fa468  stack:  4 0x114190801048bb0 0x10f28ec01080b30 0x78a4d834789aa770 0x78a7dd6078a798b0
<snip>

238.215601  -  16545 1 0x30129365ce0 65 strategy device: 273,4009  block: 24838144 size: 0x100000 buf: 0x300359fa3a8 flags: 0x121
254.402304  -  0 0 0x2a100471d20 64 biodone device: 273,4009  block: 24838144 buf: 0x300359fa3a8

16.186703ms
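Pairing each strategy with its biodone by buffer address can be scripted too, rather than matched by eye (assuming the processed TNF layout above, with the millisecond timestamp in column 1):

```shell
# For each buf address, remember the strategy timestamp and print the
# elapsed time when the matching biodone arrives.
awk '/strategy/ { for (i = 1; i <= NF; i++) if ($i == "buf:") start[$(i+1)] = $1 }
     /biodone/  { for (i = 1; i <= NF; i++)
                      if ($i == "buf:" && $(i+1) in start)
                          printf "%s %.6f ms\n", $(i+1), $1 - start[$(i+1)] }' tnf.out
```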

So the time for 8MB in 1MB chunks:

16.186703 * 8 = 129.493624ms

and for 8MB in 64K chunks:

4.013112 * 128 = 513.678336ms (in practice it will be less, due to the fact that we can queue I/Os and have them complete in parallel)
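The same sums as a one-liner, using the measured per-I/O latencies (a serial estimate only, for the reason just given):

```shell
# 8MB per write(), as 128 x 64K I/Os versus 8 x 1MB I/Os.
awk 'BEGIN {
    printf "64K chunks: %.6f ms\n", 128 * 4.013112
    printf "1MB chunks: %.6f ms\n", 8 * 16.186703
}'
```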

So it would seem that the difference in time can be explained by the way the I/O is broken up on each filesystem. Perhaps the way in which the filesystems have been configured can explain the difference? Unfortunately, I didn't see any differences between the filesystem configurations themselves, but some of the configuration details did match some of our physical write sizes:

Filesystem i/o parameters for /filesystem
read_pref_io = 65536
read_nstream = 1
read_unit_io = 65536
write_pref_io = 65536
write_nstream = 1
write_unit_io = 65536
pref_strength = 10
buf_breakup_size = 1048576
discovered_direct_iosz = 262144
max_direct_iosz = 1048576
default_indir_size = 8192
qio_cache_enable = 0
write_throttle = 0
max_diskq = 1048576
initial_extent_size = 1
max_seqio_extent_size = 2048
max_buf_data_size = 8192
hsm_write_prealloc = 0
read_ahead = 1
inode_aging_size = 0
inode_aging_count = 0
fcl_maxalloc = 8130396160
fcl_keeptime = 0
fcl_winterval = 3600
oltp_load = 0

Now our write pattern ties in for our non-working cp, in that our preferred write size is 65536 (as above), but for the working one we're using 1048576, which could be due to discovered direct I/O. Now, write() is trying to complete a size of 8388608 (8MB), which we can see from the truss:

29986:   0.0146 mmap64(0xFE800000, 8388608, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0x07000000) = 0xFE800000
29986:   0.2394 write(4, "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0".., 8388608) = 8388608
29986:   0.0107 mmap64(0xFE800000, 8388608, PROT_READ, MAP_SHARED|MAP_FIXED, 3, 0x07800000) = 0xFE800000
29986:   0.2599 write(4, "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0".., 8388608) = 8388608

so why is our I/O being broken into smaller chunks on one filesystem and not on the other? Ah, what if one of the filesystems was more fragmented than the other? Yes, that could explain the difference if VxFS internally makes the choice of using one or the other depending on the state of the filesystem. It's interesting reading the VxFS administration guide, as it clearly documents the possible performance implications of fragmentation (yes, RTFM is applicable here):

Fragmentation reduces performance and availability. Regular use of fsadm's fragmentation reporting and reorganization facilities is therefore advisable. The easiest way to ensure that fragmentation does not become a problem is to schedule regular defragmentation runs using the cron command. Defragmentation scheduling should range from weekly (for frequently used file systems) to monthly (for infrequently used file systems). Extent fragmentation should be monitored with fsadm or the df -o s commands. There are three factors which can be used to determine the degree of fragmentation:

    \* Percentage of free space in extents of less than eight blocks in length
    \* Percentage of free space in extents of less than 64 blocks in length
    \* Percentage of free space in extents of length 64 blocks or greater

An unfragmented file system will have the following characteristics:

    \* Less than 1 percent of free space in extents of less than eight blocks in length
    \* Less than 5 percent of free space in extents of less than 64 blocks in length
    \* More than 5 percent of the total file system size available as free extents in lengths of 64 or more blocks

A badly fragmented file system will have one or more of the following characteristics:

    \* Greater than 5 percent of free space in extents of less than 8 blocks in length
    \* More than 50 percent of free space in extents of less than 64 blocks in length
    \* Less than 5 percent of the total file system size available as free extents in lengths of 64 or more blocks

The optimal period for scheduling of extent reorganization runs can be determined by choosing a reasonable interval, scheduling fsadm runs at the initial interval, and running the extent fragmentation report feature of fsadm before and after the reorganization. The “before” result is the degree of fragmentation prior to the reorganization. If the degree of fragmentation is approaching the figures for bad fragmentation, reduce the interval between fsadm runs. If the degree of fragmentation is low, increase the interval between fsadm runs. The “after” result is an indication of how well the reorganizer has performed. The degree of fragmentation should be close to the characteristics of an unfragmented file system. If not, it may be a good idea to resize the file system; full file systems tend to fragment and are difficult to defragment. It is also possible that the reorganization is not being performed at a time during which the file system in question is relatively idle.

---

Conclusion


As in all performance related issues, you really need to understand the lifecycle of what you're interested in, so you can ideally observe what's happening or not happening! In this case I used mainly truss and TNF to drill down to see what's happening. However, what should happen way before that is to actually understand what the impact is, otherwise who actually cares if there is a difference if it's not causing a problem? In this case it was the DBAs who noticed the difference, but in all honesty it wasn't actually causing a business impact! :) Typical! Still, it's been another learning curve in using TNF, in the odd case where Dtrace (and in this case scsi.d) might also have helped if the system had been running Solaris 10.

If you are using VxFS then you might want to look at this script to automatically check if a filesystem is fragmented or not. If it is then it'll automatically kick off a defrag........ aren't automatic scripts great! :)
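If you want to roll your own instead, the skeleton is only a few lines (the mount point and schedule below are assumed examples, and the fsadm flags are worth checking against your VxFS release):

```shell
#!/bin/sh
# Hypothetical weekly job: report extent fragmentation, then reorganise.
FS=/u01                                  # assumed VxFS mount point

/opt/VRTS/bin/fsadm -F vxfs -E $FS       # -E: extent fragmentation report
/opt/VRTS/bin/fsadm -F vxfs -e $FS       # -e: extent reorganisation (defrag)
/opt/VRTS/bin/fsadm -F vxfs -E $FS       # report again to gauge the effect
```

Driven from root's crontab, e.g. a Sunday 2am entry: 0 2 * * 0 /usr/local/bin/vxfs_defrag.sh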

Saturday Oct 25, 2008

Oracle parallel query performance on a T5140

Being an engineer, it's always a good feeling getting to the bottom of a problem, and none more so than this one. Take a T5140, create a 3-way RAID 0 LUN using the internal disks and stick Oracle on top so you can do something useful with it for your application...........and what do you get......... a problem. I suspect after that opening some of you are thinking "where's he going with this?" .... the answer: nowhere, I'm not picking holes in either the T5140 or Oracle.....good, I'm glad we got that one clear! :)

Anyhow, a customer came to us complaining that their application wasn't running as expected on this platform, and they really wanted to know if there was a hardware fault or bug with the platform or the operating system running on it. From the tests the customer had been doing themselves, they believed the bottleneck was the underlying I/O subsystem, in this case the LSI H/W RAID. Essentially, the customer had configured a 3 disk RAID 0 stripe using the default 64k stripe width, like this:

bash-3.00# raidctl -l c1t1d0
Volume                  Size    Stripe  Status   Cache  RAID
       Sub                     Size                    Level
               Disk
----------------------------------------------------------------
c1t1d0                  409.9G  64K     OPTIMAL  OFF    RAID0
               0.1.0   136.6G          GOOD
               0.2.0   136.6G          GOOD
               0.3.0   136.6G          GOOD

They had then created a single slice on which Oracle was installed and configured for Direct I/O (a good thing anyway if you've a UFS filesystem), so we were avoiding the filesystem buffer cache and double buffering. So a 64k stripe unit per disk across three disks gives us a total stripe width of 192k. The throughput of each of these disks is between 50-60MB per second, which gives us a theoretical read throughput across all stripes of 150-180MB per second. We can forget writes, as Oracle is pwrite()'ing in 8k synchronous chunks to a volume with the write cache disabled, and each write only hits one disk (because 8k is less than the 64k stripe unit), hence why we saw a 1GB tablespace creation take 18 seconds at an average throughput of 56MB per second, which is what we would expect from a single disk.

SQL> set timing on
SQL> create tablespace andy3
 2  datafile '/u01/oracle/oradata/SUN/andy03.dbf'
 3  size 1g;

Tablespace created.

Elapsed: 00:00:18.12

and iostat -xnz 1 shows us

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   13.0  128.0  104.0 56369.0  0.0  1.6    0.0   11.2   1  93 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   14.0   78.0  112.0 56281.3  0.0  1.2    0.1   13.5   0  93 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   13.0   93.0  112.0 53734.0  0.0  1.4    0.0   13.4   1  93 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   13.0   95.0  104.0 58397.6  0.0  1.4    0.0   12.7   1  92 c1t1d0

This result was the same as the customer's, but things then get interesting when we start looking at full table scan parallel queries. The customer ended up with these results:

Parallelism   Time (s)   Throughput (Mb per second)   db_file_multiblock_read_count
0             719        13.8                         16 (16 x 8k = 128k)
2             195        50.9                         16
4             208        47.7                         16
0             420        23.6                         32 (32 x 8k = 256k)
2             163        60.9                         32
4             208        53.9                         32

Now, those look bad, especially when you think that theoretically we should be able to achieve 150-180MB per second based on a three disk stripe (3 x 60MB).
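The expectations in that statement come straight from the stripe arithmetic, which can be jotted down as:

```shell
# Stripe sums from the post: 64K stripe unit, 3 disks,
# 50-60MB/s per disk (measured figures from earlier).
stripe_unit=64    # KB per disk
disks=3
echo "full stripe width: $(( stripe_unit * disks ))K"
echo "theoretical read throughput: $(( 50 * disks ))-$(( 60 * disks ))MB/s"
echo "disks hit by one 8K Oracle write: 1 (8K < ${stripe_unit}K stripe unit)"
```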

Using the same parallel test plan as the customer:

oracle@v4v-t5140a-gmp03~$more para.sql

set timing on;

select /\*+ FULL(t) \*/ count(\*) from contact_methods t;

select /\*+ FULL(t) PARALLEL(t,2) \*/ count(\*) from contact_methods t;

select /\*+ FULL(t) PARALLEL(t,4) \*/ count(\*) from contact_methods t;

select /\*+ FULL(t) PARALLEL(t,64) \*/ count(\*) from contact_methods t;

oracle@v4v-t5140a-gmp03~/oradata/SUN$ls -alh test01.dbf
-rw-r-----   1 oracle   dba         9.7G Oct 24 08:25 test01.dbf

I got these:

SQL> @para

 COUNT(\*)
----------
 15700000

Elapsed: 00:00:47.85

 COUNT(\*)
----------
 15700000

Elapsed: 00:00:32.53

 COUNT(\*)
----------
 15700000

Elapsed: 00:00:34.68

 COUNT(\*)
----------
 15700000

Elapsed: 00:00:42.17

Whilst the first full table scan is running, I see the following in iostat -xnz 1:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  108.0    0.0 93122.4    0.0  0.0  0.4    0.1    4.0   1  35 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  151.0    3.0 95796.9   48.0  0.0  0.5    0.1    3.0   1  34 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  115.0    0.0 103367.6    0.0  0.0  0.3    0.1    2.6   1  28 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  116.0    0.0 102232.7    0.0  0.0  0.3    0.1    3.0   1  29 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  122.0    3.0 105326.4   48.0  0.0  0.3    0.1    2.5   1  29 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  116.0    0.0 96467.2    0.0  0.0  0.5    0.1    4.1   1  34 c1t1d0

and then

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  193.0    0.0 159383.3    0.0  0.0  8.0    0.1   41.4   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  195.5    3.0 163681.0   48.1  0.0  8.1    0.1   40.8   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  220.1    0.0 188770.3    0.0  0.0  7.7    0.1   34.8   3 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  192.0    0.0 168156.9    0.0  0.0  7.2    0.1   37.8   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  191.0    3.0 162361.2   48.0  0.0  7.4    0.1   38.1   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  190.0    0.0 162776.0    0.0  0.0  7.3    0.1   38.7   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  192.0    0.0 162737.6    0.0  0.0  6.9    0.1   35.9   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  186.0    3.0 153754.2   48.0  0.0  8.4    0.1   44.4   1 100 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  191.0    1.0 160412.4    8.0  0.0  7.7    0.1   40.1   1 100 c1t1d0

when the parallel jobs are running.

This is because I changed db_file_multiblock_read_count to 128 (128 * 8k = 1MB), and in fact I saw improvements using 192k (64k * 3) to match the stripe width. I also went with some recommendations from here, which helped, along with running the latest T5140 firmware and the latest KJP 137111-08 to avoid some known performance issues.

It's amazing that tuning one little option can have such a dramatic effect on the results, and it also shows that just because you don't get the results you're expecting, you shouldn't assume there is a fault with the hardware or software. For me personally, it's always good to understand the results and how you got there, although as in all performance related issues you can sometimes get sidetracked by what's gone on before and end up going down the wrong path. To avoid that, make sure you chat with your colleagues when you feel like you're not getting anywhere, as a fresh set of eyes can bring you back on the path and closer to resolution.




Wednesday Oct 01, 2008

Capturing Snoop Data On An Oracle Event

Following on from my previous article about capturing TNF data on an Oracle event, I thought I'd also mention that you can do something similar by capturing snoop data. This really came about through trying to troubleshoot ORA-29740 issues with Oracle RAC. Most of the time you really want to have a look at these Sunsolve documents for hints:

  • Solution 230303: SunCluster[TM] 3.x: Troubleshooting ORA-29740 in a RAC environment
  • Solution 205145: Oracle error ORA-29740 (reason 3) resolved with network tuning

But if you think the network is perhaps worth chasing, and the physical switches, hubs etc. don't hint at a fault, then we can enable the lantracer script to snoop on an interconnect and terminate when Oracle receives an ORA-29740. To do this you'll need to do the following:

1) in $ORACLE_HOME/bin, place a script called oradbg containing:

#!/bin/sh
echo "stopping Lantracer" >> /tmp/oradbg.$$
# removing the flag file tells lantracer to stop
/usr/bin/rm /var/tmp/lan/running
echo `/bin/date` >> /tmp/oradbg.$$
echo "done\\." >> /tmp/oradbg.$$

2) in an sqlplus "/ as sysdba" session run the following:

alter system set "_oradbg_pathname"="<$ORACLE_HOME>/bin/oradbg" scope=spfile;
alter system set event="29740 debug" scope=spfile;

where $ORACLE_HOME is the expanded path of the $ORACLE_HOME env var

3) add execute to the oradbg script

4) restart the instance

5) Create /var/tmp/lan and copy the lantracer script into it, then chmod 755 lantracer

6) Start lantracer:

cd /var/tmp/lan; ./lantracer >/dev/null 2>&1 & 

7) Check Oracle can stop lantracer by using it to detect a 942 event:

 sqlplus <user>/<password>
 alter session set events '942 debug';
 select \* from <a table that doesn't exist>;

This should remove /var/tmp/lan/running and leave us with some snoop capture files which should hopefully have caught data during the event.

Capturing TNF Data On An Oracle Event

Recently, I got involved in an interesting VOSJEC problem in which Oracle was core dumping at intermittent times with ORA-7445 on a large Solaris 8 system, with no known root cause. Unfortunately, neither the core dump from Oracle nor the trace files helped either Oracle or us to work out the issue, so a colleague of mine, Tim Uglow, managed to create a TNF package to collect TNF data constantly and then stop/dump the TNF kernel buffer out to disk at the time the Oracle event happened. That way we'd be able to see what led up to the event.

To allow Oracle processes running as user "oracle" to stop the TNF kernel tracing, we need to change the permissions on the tnf driver. Currently you have:

# ls -alL /dev/tnf\*
crw-------   1 root     sys      114,  0 Apr 10 22:02 /dev/tnfctl
crw-------   1 root     sys      114,  1 Apr 10 22:02 /dev/tnfmap

and we need to change it to ...

# ls -alL /dev/tnf\*
crw-rw-rw-   1 root     sys      114,  0 Jun 27 15:57 /dev/tnfctl
crw-rw-rw-   1 root     sys      114,  1 Jun 27 15:57 /dev/tnfmap

This is done by the following two commands...

# rem_drv tnf
# add_drv -m '\* 0666 root sys' tnf

The install script for the JAVAtnfd.datastream.pkg package will do this for you, and the uninstall script will put it back to its original state. This change allows ordinary users to affect the kernel tracing, but most users don't know about the subsystem, and the only information they can gather is kernel events; there is no actual data visible.

Installing the scripts


1) copy JAVAtnfd.datastream.pkg to /tmp on your machine
2) pkgadd -d  /tmp/JAVAtnfd.datastream.pkg

You may get warning messages like ..

"Device busy
Cannot unload module: tnf
Will be unloaded upon reboot."

These can be ignored.. devfsadm will have changed the permissions.

To start the tnf tracing..

as root please run ...

# cd /opt/JAVAtnfd/data 
# /usr/bin/nohup /opt/JAVAtnfd/bin/tnfscript &

then check in /opt/JAVAtnfd/data/nohup.out for errors, every time you start it it will append to nohup.out.

If you needed to stop the tracing ..

look in /opt/JAVAtnfd/data/nohup.out for the last line like "tnfscript pid XXXX starting", and then running ...

# kill XXXX

should stop it.

When the oracle debug executes it will touch /opt/JAVAtnfd/data/stop and stop the tnf tracing and that will cause the tnfscript to exit at the end of its 10 minute loop - look at the end of /opt/JAVAtnfd/data/nohup.out for "tnfscript pid XXXX exiting as requested by Oracle". Before it exits it will extract the required files into /opt/JAVAtnfd/data/\*.'date' and email root the location of the four files that need to be sent to support together with the Oracle alert log and the trace file that will be mentioned in the alert.log file, something like....

Errors in file <path to trace file>
ORA-7445...
<path to trace file> is also required.

Instructions from Oracle to get Oracle processes to stop the tnf on a ORA-7445 error.


1) in init.ora define the following 2 parameters:

  event="7445 debug"
  _oradbg_pathname=<$ORACLE_HOME>/bin/oradbg

or you can set via and sqlplus session:

SQL> alter system set "_oradbg_pathname"="/u01/oracle/product/10.2.0/db/bin/oradbg" scope=both;
System altered.
SQL> alter system set event="7445 debug" scope=spfile;
System altered.

where $ORACLE_HOME is the expanded path of the $ORACLE_HOME environment variable

2) in $ORACLE_HOME/bin, place a script called oradbg containing:

  #!/bin/sh
  echo "stopping TNF collection using prex" >> /tmp/oradbg.$$
  echo "ktrace off" | /usr/bin/prex -k
  echo $$ > /opt/JAVAtnfd/data/stop
  echo `/bin/date` >> /tmp/oradbg.$$
  echo "done." >> /tmp/oradbg.$$

3) make the oradbg script executable: chmod +x $ORACLE_HOME/bin/oradbg

4) restart the instance

Upon a 7445 error being encountered (whereby Oracle's signal handler spots a sigsegv/core dump), the built-in Oracle 'debug' event will disable prex, and a file will be written to /tmp containing the date and time and a note that we stopped prex. It will also touch the file /opt/JAVAtnfd/data/stop, which signals the Sun tnf script to stop. Obviously, if the customer has any path differences for any of the binaries, they should modify accordingly. This can easily be tested after restarting the instance by running:

sqlplus <user>/<password>
 alter session set events '942 debug';
 select * from <a table that doesn't exist>;

This will generate an error along the lines of:

ERROR at line 1:
ORA-00942: table or view does not exist

and the debug event will fire on the temporary session-level dummy version of the event and run the prex command. They can then check /tmp for a new oradbg file with the details. Obviously you can trace any other Oracle event by just changing the event number in the sqlplus 'debug' setting, which might make problem resolution a little easier on Solaris releases prior to Solaris 10 and the world of DTrace! :)
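Since each invocation of the oradbg script writes its own /tmp/oradbg.<pid> file, a quick way to confirm the event actually fired is just to list the newest ones (the path is taken from the script above):

```shell
# List the most recent oradbg output files, newest first; no output
# simply means the debug event hasn't fired yet.
ls -lt /tmp/oradbg.* 2>/dev/null | head -3
```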

Monday Sep 22, 2008

Jumpstarting problems

I ran into this problem a while back whilst trying to jumpstart one of my VOSJEC lab systems which caused a little bit of grief for a while:

Sun Fire V210, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.33, 2048 MB memory installed, Serial #73138818.
Ethernet address 0:14:4f:5c:2:82, Host ID: 845c0282.



ChassisSerialNumber UNDEFINED
Rebooting with command: boot net -sv                                  
Boot device: /pci@1f,700000/network@2  File and args: -sv
100 Mbps FDX Link up
Requesting Internet Address for 0:14:4f:5c:2:82
Requesting Internet Address for 0:14:4f:5c:2:82
100 Mbps FDX Link up
boot: cannot open kernel/sparcv9/unix
Enter filename [kernel/sparcv9/unix]:
boot: cannot open kernel/sparcv9/unix
Enter filename [kernel/sparcv9/unix]:
boot: cannot open kernel/sparcv9/unix


[andharr@e2big:~]$ snoop -i snoop_v210.out
  1   0.00000 OLD-BROADCAST -> (broadcast)  RARP C Who is 0:14:4f:5c:2:82 ?
  2   0.01679        e2big -> vos-v210-n0-gmp03 RARP R 0:14:4f:5c:2:82 is 129.156.214.198, vos-v210-n0-gmp03
  3   0.00025 vos-v210-n0-gmp03 -> BROADCAST    TFTP Read "819CD6C6" (octet)
  4   0.00169 vos-v210-n0-gmp03 -> (broadcast)  ARP C Who is 129.156.214.22, vos-x4100-gmp03 ?
  5   0.00479 vos-v210-n0-gmp03 -> (broadcast)  ARP C Who is 129.156.215.14, vos-jet-gmp03 ?
  6   0.00297        e2big -> vos-v210-n0-gmp03 TFTP Data block 1 (512 bytes)
  7   0.00182 vos-v210-n0-gmp03 -> e2big        TFTP Error: unknown transfer ID

Bummer. So what if we do it again, except this time we specify the boot server:

Sun Fire V210, No Keyboard
Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.22.33, 2048 MB memory installed, Serial #73138818.
Ethernet address 0:14:4f:5c:2:82, Host ID: 845c0282.



ChassisSerialNumber UNDEFINED
Rebooting with command: boot net:129.156.210.175 -sv                  
Boot device: /pci@1f,700000/network@2:129.156.210.175  File and args: -sv
100 Mbps FDX Link up
Requesting Internet Address for 0:14:4f:5c:2:82
Requesting Internet Address for 0:14:4f:5c:2:82
100 Mbps FDX Link up
module /platform/sun4u/kernel/sparcv9/unix: text at [0x1000000, 0x10c8b2d] data at 0x1800000
module /platform/sun4u/kernel/sparcv9/genunix: text at [0x10c8b30, 0x12b9b17] data at 0x1867440
module /platform/SUNW,Sun-Fire-V210/kernel/misc/sparcv9/platmod: text at [0x12b9b18, 0x12ba84f] data at 0x18bbc60
module /platform/sun4u/kernel/cpu/sparcv9/SUNW,UltraSPARC-IIIi: text at [0x12ba880, 0x12cd2d7] data at 0x18bc340
SunOS Release 5.11 Version snv_90 64-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Ethernet address = 0:14:4f:5c:2:82
<snip>

So some other tftp server is responding. Even though I've disabled tftp & bootparams on the other host, and it does look like the correct server is answering, I'm still picking up that "unknown transfer ID" error during the tftpboot. Hmm, something is still responding.... :( so I'll have to work around the issue by specifying the jumpstart server explicitly with "boot net:129.156.210.175".
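As an aside, the filename the client requests in frame 3 of the snoop output ("819CD6C6") is nothing mysterious: it's just the client's RARP-assigned IP address (129.156.214.198) rendered in hex, which is the bootfile naming convention Solaris network boot uses. You can reproduce it with printf:

```shell
# 129.156.214.198 -> 81 9C D6 C6
printf '%02X%02X%02X%02X\n' 129 156 214 198
# prints: 819CD6C6
```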

Tuesday Apr 29, 2008

What's the answer to life the universe and everything?

"42"

For those of you that have read or listened to The Hitchhiker's Guide to the Galaxy, the above question and answer will have more meaning than for those that haven't. Essentially, how can you have a literal answer to such an undefined question? The suggestion, on an allegorical level, is that it is more important to ask the right questions than to seek definite answers.

I sometimes think of just saying "42" to the question of "What's the answer to our performance problem?", which is usually supplied with some kind of data, either in the form of GUDS (a script which collects a whole bunch of Solaris OS output) or some other spreadsheet or application output. This data usually comes with no context, and with nothing other than "the customer has a performance problem", which of course makes things slightly difficult for us to answer, unless the customer will accept "42".

Investigating performance related issues is usually very time consuming due to the difficulty in defining the problem, so it stands to reason that it's a good idea to approach these types of problems with a structured method. Sun has been using an effective troubleshooting process from Kepner-Tregoe for a number of years, which defines a problem as follows:

"Something has deviated from the normal (what you should expect) for which you don't know the reason and would like to know the reason"

Still don't get it? Well, if you're driving, walking, running, or hopping (you get my point) from point A to point B and have somehow ended up at X21 without knowing why, you'd probably want to know why. You'd have a problem, because you were expecting to end up at point B but have ended up at point X21.

Ok, so how does this relate to resolving performance issues then? Well, in order for Sun engineers to progress performance related issues within the services organization, we need to understand the problem, the concerns around it and how it fits into the bigger picture. By this I mean looking at the entire application infrastructure (a top-down approach) rather than examining specific system or application statistics (a bottom-up approach). This can help us identify a possible bottleneck or specific area of interest, on which we can then use any number of OS or application tools to focus in and identify root cause.

So perhaps we should start by informing people what performance engineers CAN do:

1/ We can make "observations" from statically collected data or via an interactive window into the customer's system (Shared Shell). No, that doesn't mean we can provide root cause from this, but we can comment on what we see. Observations mean NOTHING without context.

2/ We can make suggestions based on the above information, which might lead to further data collection, but again these mean NOTHING without context.

Wow, that's not much is it....so what CAN'T we do?

1/ We can't mind read - Sorry, we can't possibly understand your concerns, application, business, or users unless you provide USEFUL information. So what is useful information? Well, answers to these might help get the ball rolling:

* What tells you that you have a performance issue on your system? e.g. users complaining that the "XYZ" application is taking longer than expected to return data/reports, a batch job taking longer to complete, etc.

* When did this issue start happening? This should be the exact date & time the problem started or was first noticed.

* When have you noticed the issue since? Again, the exact date(s) and time(s).

* How long should you expect the job/application to take to run/complete? This needs to be based on previous data runs or on when the system was specified.

* What other systems also run the job/application but aren't affected?

* Supply an architecture diagram if applicable, describing how the application interfaces with the system, e.g.:

user -> application X on client -webquery-> application server -sqlquery-> Oracle database backend server

2/ We can't rub a bottle and get the answer from a genie, nor wave a magic wand for the answer - No, it's not as simple as supplying a couple of OS outputs and getting an answer from us. We'll need to understand the "bigger" picture and make observations before suggestions can be advised.

3/ We can't fix the problem in a split second nor can applying pressure help speed up the process - Again we need to UNDERSTAND the bigger picture before suggestions and action plans can be advised.

So what kind of data can we collect to observe?

Probably one of the quickest ways of allowing us to observe is via Shared Shell. This gives us a direct view onto a system and lets us see what the customer actually sees. Again, we'll need to discuss with the customer what we're looking at and UNDERSTAND the "bigger" picture before making suggestions or action plans. If Shared Shell isn't available, then we'll need to collect GUDS data, usually in its extended mode. This collects various Solaris outputs over a series of time snapshots which we can view offline; however, we do need baseline data along with bad data to make any useful observations. One snapshot isn't much help, as high values could be normal! Just because you see high userland utilization doesn't necessarily mean it's bad or indicates a performance problem. It could just be the system being utilized well, processing those "funny" accounting beans for the business. Again, and I've said this a few times: data is USELESS without CONTEXT.

If Oracle is involved, then you could get the Oracle DBA to provide statspack data or AWR reports for when you see the problem and when you don't, as that might indicate whether Oracle is a bottleneck in the application environment.

Other application vendors might have similar statistics-generating reports which show what their software is waiting for, which might help identify a potential bottleneck.

The "Grey" area

The grey area is a term used by many for an issue which breaks the mold of conventional break-fix work and starts entering the performance tuning arena. Break fix usually means something is clearly broken, such as a customer hitting a bug in Solaris, or helping a customer bring up a system which has crashed or needs to be rebuilt and requires Sun's assistance and expertise to resolve. Performance tuning usually happens because, for example, a customer's business has expanded and their application architecture can't cope with the growth. It's a little difficult to gauge when a situation starts to go down that path, as most application architectures are very complex and involve lots of vendors. I also happen to work in the VOSJEC (Veritas Oracle Sun Joint Escalation Centre) and deal with quite a few interoperability issues, so I know things can get pretty complex when trying to find the problematic area of interest. For some reason some people term this the blame game or finger pointing, terms I personally hate to use. In fact, I'd rather it be a Sun issue from my perspective, as we can then take the necessary action in raising bugs and getting engineering involved to provide a fix and ultimately resolve the customer's issue. Thankfully my Symantec and Oracle counterparts also take this approach, which makes problem resolution a little easier.

Conclusion

I think the real point of this is that you should really grasp a problem before asking for assistance. If you understand the problem, then your colleagues understand the problem, and more importantly we (Sun), or I, understand the problem, and that's half the battle. The rest is so much easier...... :)

About

I'm an RPE (Revenue Product Engineering) Engineer supporting Solaris on Exadata, Exalogic and Super Cluster. I attempt to diagnose and resolve any problems with Solaris on any of the Engineered Systems.
