Tuesday May 26, 2009

Why everyone should be using ZFS

It is at times like these that I'm glad I use ZFS at home.


  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c20t0d0s7  ONLINE       6     0     4
            c21t0d0s7  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c21t1d0    ONLINE       0     0     0
            c20t1d0    ONLINE       0     0     0

errors: No known data errors
: pearson FSS 14 $; 

The drive with the errors was also reporting errors that iostat could see and, judging from its performance, was trying heroically to give me back data. However, it had failed. Its performance was terrible and then it failed to return the right data on 4 occasions. Any other file system would, if that was user data, just have delivered it to the user without warning. That bad data could then propagate from there, probably into my backups. No good could come from that. ZFS, however, detected and corrected the errors.


Now that I have offlined the disk, the performance of the system is better, but I have no redundancy until the new disk I have just ordered arrives. Now it is time to check out Seagate's warranty return system.

Tuesday Feb 05, 2008

Good Morning Build 82, or not.

I did not even get a chance to log in to the Sun Ray server running build 82 before it had crashed twice. So all was not well. A bit of digging suggested a problem somewhere in portfs, with kmem corruption. Since the problem was easily reproducible (boot the system, log in and use it for a few hours) I got the lab staff to set kmem_flags to 0xf in /etc/system and boot again.

Sure enough this morning there were two more crash dumps with variations of this in the message buffers:

kernel memory allocator: 
duplicate free: buffer freed twice
buffer=60063bfed60  bufctl=300f08886b8  cache: kmem_alloc_32
previous transaction on buffer 60063bfed60:
thread=300f43dac60  time=T-0.000269600  slab=300f08761e0  cache: kmem_alloc_32
kmem_cache_free+30
port_pcache_remove_fop+44
port_pfp_setup+198
port_associate_fop+2b8
portfs+2c8

panic[cpu512]/thread=300f43dac60: 
kernel heap corruption detected

> $c
vpanic(12ac480, 5, 2c8, 1, 18de000, 12ac400)
kmem_error+0x4e8(18de000, 3000005ae08, 60063bfed60, 12ac400, 12ac478, 2afdfbc8220)
port_associate_fop+0x408(16, 7, 4a330, 16, 4a330, 2a10424d968)
portfs+0x2c8(1, 0, 7, 2a0, 0, 4a330)
syscall_trap32+0xcc(1, a, 7, 4a330, 10000006, 4a330)
> 

Looking at the code, it appears that if port_pfp_setup encounters an error, some kernel memory is freed twice. Specifically, the memory pointed to by the cname local variable in port_associate_fop is freed twice. Hence the random panics. The diffs for the fix are:


*** port_fop.c  Fri Oct 26 08:58:01 2007
--- /tmp/cg13442/port_fop.c     Tue Feb  5 14:04:21 2008
***************
*** 1306,1311 ****
--- 1306,1312 ----
                if (error = port_pfp_setup(&pfp, pp, vp, pfcp, object,
                    events, user, cname, clen, dvp)) {
                        mutex_exit(&pfcp->pfc_lock);
+                       cname = NULL;
                        goto errout;
                }
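
The ownership pattern behind that one-line fix can be sketched in user-space C. This is only an illustration, not the kernel code: struct buf, buf_free, setup and associate are hypothetical stand-ins, and the sketch assumes (as the fix implies) that the callee has already freed the name when it returns an error and that the caller's error path frees cname if it is not NULL. Frees are counted rather than performed so that the buggy path can be observed safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer that records frees instead of releasing memory. */
struct buf {
	int frees;
};

/* Like a real free routine, a NULL pointer is a no-op. */
static void
buf_free(struct buf *b)
{
	if (b != NULL)
		b->frees++;
}

/* Stand-in for port_pfp_setup(): on error it frees the name itself. */
static int
setup(struct buf *name, int fail)
{
	if (fail) {
		buf_free(name);
		return (-1);
	}
	return (0);	/* success: the name is now owned elsewhere */
}

/* Stand-in for port_associate_fop(); with_fix selects the patched path. */
static int
associate(struct buf *name, int fail, int with_fix)
{
	struct buf *cname = name;
	int error;

	assert(name != NULL);
	if ((error = setup(cname, fail)) != 0) {
		if (with_fix)
			cname = NULL;	/* callee already freed it */
		goto out;
	}
	cname = NULL;		/* ownership handed over on success */
out:
	buf_free(cname);	/* second free when cname was left set */
	return (error);
}
```

Calling associate(&b, 1, 0) on a zeroed struct buf leaves b.frees at 2, which is the double free; associate(&b, 1, 1) on a fresh buffer leaves it at 1.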

I have just filed this bug:

6659309: port_associate_fop frees a buffer twice if port_pfp_setup returns an error.

What I don't know is why we suddenly started seeing the bug. Is it that build 82 exercises event ports more, or that the bug has been revealed by some other change? Either way it makes me nervous for my home server running, you guessed it, build 82! At least next time someone asks why we bother running a Sun Ray server on the latest and greatest Nevada bits I have a pre-prepared place to send them. It is here.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
