Solaris ufs bug in Month of Kernel Bugs
By Alan Hargreaves-Oracle on Nov 12, 2006
While I agree that we have an issue that needs looking at, I also believe that the contributor is making much more of it than it really deserves.
First off, to paraphrase the issue:
If I give you a specially massaged filesystem and can convince someone with the appropriate privilege to mount it, it will crash the system.
I'd hardly call this a "denial of service", let alone an exploitable one.
To begin with, mounting a ufs filesystem requires the sys_mount privilege. In Solaris we currently run under the principle of "least privilege": a process is given only the privileges it needs to run. So, to exploit this, you need to convince someone with the appropriate level of privilege to mount your filesystem. That also involves a bit of social engineering, which went unmentioned.
That being said, the system should not panic on this filesystem, and I will log a bug to that effect. It is a shame that the contributor did not make the crashdump files available, as they would certainly have sped up any analysis.
One other thing that I should add is that anyone who tries to mount an unknown ufs filesystem without at least running "fsck -n" over it probably deserves what they get.
OK, I have copied it to a relatively current nevada system and attached it as the lofi device /dev/lofi/1. Running "fsck -n" over it, we see:
```
** /dev/rlofi/1 (NO WRITE)
BAD SUPERBLOCK AT BLOCK 16: BAD VALUES IN SUPER BLOCK
LOOK FOR ALTERNATE SUPERBLOCKS WITH MKFS? no
LOOK FOR ALTERNATE SUPERBLOCKS WITH NEWFS? no
SEARCH FOR ALTERNATE SUPERBLOCKS FAILED.
USE GENERIC SUPERBLOCK FROM MKFS? no
USE GENERIC SUPERBLOCK FROM NEWFS? no
SEARCH FOR ALTERNATE SUPERBLOCKS FAILED. YOU MUST USE THE -o b OPTION TO FSCK TO SPECIFY THE LOCATION OF A VALID ALTERNATE SUPERBLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck(1M).
```
In the normal course of events, would you mount this filesystem? I certainly would not. This, however, is not the normal course of events, and I'm playing on a lab system.
```
v40z-c# uname -a
SunOS v40z-c 5.11 snv_46 i86pc i386 i86pc
```
Let's try a read-only mount first.
```
v40z-c# mount -r /dev/lofi/1 /mnt
v40z-c# ls /mnt
lost+found
v40z-c# umount /mnt
```
OK, the read-only mount is fine. Now for the read/write mount... Bingo!
```
v40z-c# mount /dev/lofi/1 /mnt
panic[cpu3]/thread=ffffffff9a6974e0: BAD TRAP: type=e (#pf Page fault) rp=fffffe8000c7f2c0 addr=fffffe80fe39d6c4
mount: #pf Page fault
Bad kernel fault at addr=0xfffffe80fe39d6c4
pid=2170, pc=0xfffffffffbb70950, sp=0xfffffe8000c7f3b0, eflags=0x10286
cr0: 8005003b cr4: 6f8
cr2: fffffe80fe39d6c4 cr3: 1ff76b000 cr8: c
...
fffffe8000c7f1b0 unix:die+b1 ()
fffffe8000c7f2b0 unix:trap+1528 ()
fffffe8000c7f2c0 unix:_cmntrap+140 ()
fffffe8000c7f440 ufs:alloccgblk+42f ()
fffffe8000c7f4e0 ufs:alloccg+473 ()
fffffe8000c7f560 ufs:hashalloc+50 ()
fffffe8000c7f600 ufs:alloc+14f ()
fffffe8000c7f6c0 ufs:lufs_alloc+f3 ()
fffffe8000c7f770 ufs:lufs_enable+261 ()
fffffe8000c7f7e0 ufs:ufs_fiologenable+63 ()
fffffe8000c7fd60 ufs:ufs_ioctl+3e0 ()
fffffe8000c7fdc0 genunix:fop_ioctl+3b ()
fffffe8000c7fec0 genunix:ioctl+180 ()
fffffe8000c7ff10 unix:sys_syscall32+101 ()
```
OK, so we should now have a crashdump to look at.
While the machine is rebooting, it occurs to me that if we put this ufs filesystem onto an external USB device, we might actually have an exploitable issue here once the new hal/rmvolmgr framework is in place (nv_51), if that framework automatically mounts ufs devices.
```
core file:      /var/crash/v40z-c/vmcore.0
release:        5.11 (64-bit)
version:        snv_46
machine:        i86pc
node name:      v40z-c
domain:         aus.cte.sun.com
system type:    i86pc
hostid:         69e47dae
dump_conflags:  0x10000 (DUMP_KERNEL) on /dev/dsk/c1t1d0s1(517M)
time of crash:  Sun Nov 12 13:08:18 EST 2006
age of system:  34 days 1 hours 42 minutes 34.95 seconds
panic CPU:      3 (4 CPUs, 7.56G memory)
panic string:   BAD TRAP: type=e (#pf Page fault) rp=fffffe8000c7f2c0 addr=fffffe80fe39d6c4

sanity checks: settings...vmem...sysent...clock...misc...done

-- panic trap data  type: 0xe (Page fault)
  addr: 0xfffffe80fe39d6c4  rp: 0xfffffe8000c7f2c0
  savfp 0xfffffe8000c7f440  savpc 0xfffffffffbb70950
  %rbp  0xfffffe8000c7f440  %rsp  0xfffffe8000c7f3b0
  %rip  0xfffffffffbb70950  (ufs:alloccgblk+0x42f)

  0%rdi 0xffffffff8d60b000  1%rsi 0xffffffff8930c308  2%rdx 0xb5
  3%rcx 0xb5  4%r8 0xfffffe80fe39d6c0  5%r9 0x12f0

  %rax 0x8  %rbx 0x361005a8
  %r10 0  %r11 0xfffffffffbcd9ff0  %r12 0x5a8
  %r13 0xffffffff8930c000  %r14 0xffffffff8d60b000  %r15 0xffffffff99656c00

  %cs 0x28 (KCS_SEL)  %ds 0x43 (UDS_SEL)  %es 0x43 (UDS_SEL)
  %fs 0 (KFS_SEL)  %gs 0x1c3 (LWPGS_SEL)  %ss 0x30 (KDS_SEL)
  trapno 0xe (Page fault)
  err 0x2 (page not present,write,supervisor)
  %rfl 0x10286 (parity|negative|interrupt enable|resume)
  fsbase 0xffffffff80000000 gsbase 0xffffffff8901c800

ufs:alloccgblk+0x42f()
ufs:alloccg+0x473()
ufs:hashalloc+0x50()
ufs:alloc+0x14f()
ufs:lufs_alloc+0xf3()
ufs:lufs_enable+0x261()
ufs:ufs_fiologenable+0x63()
ufs:ufs_ioctl+0x3e0()
genunix:fop_ioctl+0x3b()
genunix:ioctl+0x180()
unix:_syscall32_save+0xbf()
-- switch to user thread's user stack --
```
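Both the err and %rfl values in the trap data are plain bit fields, so SolarisCAT's decoding is easy to check by hand. Here is a minimal C sketch that does so. The bit values are the architectural x86 ones; the helper names, and the labels for flags SolarisCAT did not print here (carry, zero and so on), are my own choices.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Architectural x86 #PF error-code bits (pushed by the CPU on vector 0xe). */
#define PF_ERR_PROT  0x1u /* clear: page not present; set: protection violation */
#define PF_ERR_WRITE 0x2u /* clear: read access;      set: write access */
#define PF_ERR_USER  0x4u /* clear: supervisor mode;  set: user mode */

/* Render a #PF error code in the same style SolarisCAT uses. */
static void describe_pf_err(uint32_t err, char *buf, size_t len)
{
    snprintf(buf, len, "%s,%s,%s",
             (err & PF_ERR_PROT) ? "protection violation" : "page not present",
             (err & PF_ERR_WRITE) ? "write" : "read",
             (err & PF_ERR_USER) ? "user" : "supervisor");
}

/*
 * RFLAGS status/control bits. "parity", "negative", "interrupt enable" and
 * "resume" follow SolarisCAT's output above; the other labels are mine.
 * Bit 1 (0x2) is reserved and always set, so it is not listed.
 */
struct flag_name {
    uint64_t bit;
    const char *name;
};

static const struct flag_name rflags_names[] = {
    { 0x00001, "carry" },
    { 0x00004, "parity" },
    { 0x00010, "adjust" },
    { 0x00040, "zero" },
    { 0x00080, "negative" },
    { 0x00100, "trap" },
    { 0x00200, "interrupt enable" },
    { 0x00400, "direction" },
    { 0x00800, "overflow" },
    { 0x10000, "resume" },
};

/* Join the names of the set flags with '|'; a short buf just truncates. */
static void describe_rflags(uint64_t rfl, char *buf, size_t len)
{
    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof rflags_names / sizeof rflags_names[0]; i++) {
        if ((rfl & rflags_names[i].bit) == 0)
            continue;
        size_t used = strlen(buf);
        /* snprintf never writes past buf[len - 1]. */
        snprintf(buf + used, len - used, "%s%s",
                 used ? "|" : "", rflags_names[i].name);
    }
}
```

Fed the values from the dump, it gives "page not present,write,supervisor" for err 0x2 and "parity|negative|interrupt enable|resume" for %rfl 0x10286, agreeing with SolarisCAT: a kernel-mode write to an address with no page behind it.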
The trap has occurred in alloccgblk+42f() in the ufs code.
```
ufs:alloccgblk+0x410    call   +0xe0a4  (ufs:clrblock+0x0)
ufs:alloccgblk+0x415    decl   0x1c(%r13)        ; cgp->cg_cs.cs_nbfree--
ufs:alloccgblk+0x419    decl   0xc4(%r14)        ; fs->fs_cstotal.cs_nbfree--
ufs:alloccgblk+0x420    movslq 0xc(%r13),%r8     ; %r8 <- cgp->cg_cgx
ufs:alloccgblk+0x428    addq   0x2d8(%r14),%r8   ; %r8 <- fs->fs_u.fs_csp[%r8]
ufs:alloccgblk+0x42f    decl   0x4(%r8)          <-- panic here
```
We've just made a call to ufs:clrblock() and are decrementing something at the end of a chain of pointer dereferences. We only call clrblock() once in this routine, so that puts us at:
```
 428 #define fs_cs(fs, indx) fs_u.fs_csp[(indx)]

1238     clrblock(fs, blksfree, (long)blkno);
1239     /*
1240      * the other cg/sb/si fields are TRANS'ed by the caller
1241      */
1242     cgp->cg_cs.cs_nbfree--;
1243     fs->fs_cstotal.cs_nbfree--;
1244     fs->fs_cs(fs, cgp->cg_cgx).cs_nbfree--;
```
We are panicking on line 1244.
I should note at this point that the source I am quoting is from the same source tree as OpenSolaris.
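After macro expansion, line 1244 becomes fs->fs_u.fs_csp[(cgp->cg_cgx)].cs_nbfree--; that is, the cg_cgx value read from the cylinder group on disk is used directly as an index into the fs_csp summary array.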
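For anyone who wants to convince themselves of what the preprocessor does with line 1244, here is a compilable sketch. The structures are hypothetical, heavily simplified stand-ins (only the members line 1244 touches, in made-up layouts); the macro and the statement are as quoted from the source.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins: only the members used on line 1244 are
 * modelled, and none of the real ufs layouts are reproduced.
 */
struct csum {
    int32_t cs_ndir, cs_nbfree, cs_nifree, cs_nffree;
};
struct cg {
    int32_t cg_cgx;            /* cylinder group index, as read from disk */
};
struct fs {
    union {
        struct csum *fs_csp;   /* per-cylinder-group summary array */
    } fs_u;
};

/* The macro as quoted above (line 428). */
#define fs_cs(fs, indx) fs_u.fs_csp[(indx)]

/* Line 1244 verbatim: the preprocessor turns fs_cs(...) into an fs_csp[] subscript. */
static void decrement_cg_nbfree(struct fs *fs, struct cg *cgp)
{
    fs->fs_cs(fs, cgp->cg_cgx).cs_nbfree--;
}
```

Note that the macro's fs argument never appears in the expansion; the subscript is simply whatever cg_cgx happens to hold, and nothing on this line checks it.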
So what is cgp->cg_cgx?
```
SolarisCAT(vmcore.0/11X)> sdump 0xffffffff8930c000 cg cg_cgx
   cg_cgx = 0x6f0000
```
That is cylinder group number 7,274,496, which is probably a trifle on the largish side and would explain how we have ended up in unmapped memory.
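The address we end up with for the (struct csum *) is 0xfffffe80fe39d6c0 (%r8 in the trap data), and the faulting address, 0xfffffe80fe39d6c4, is 4 bytes into it: the cs_nbfree counter that decl 0x4(%r8) is trying to decrement.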
If we go back and look at fs->fs_ncg, we see that this filesystem has only two cylinder groups, so a cg_cgx of 0x6f0000 is an obvious inconsistency.
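To put some numbers on that inconsistency, here is a small C sketch. It assumes struct csum is four 32-bit counters with cs_nbfree second, which agrees with the decl 0x4(%r8) above; the helper names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* struct csum as four 32-bit counters: 16 bytes, cs_nbfree at offset 4. */
struct csum {
    int32_t cs_ndir, cs_nbfree, cs_nifree, cs_nffree;
};

/* Is cgx a usable index for a filesystem with ncg cylinder groups? */
static int cgx_in_range(int32_t cgx, int32_t ncg)
{
    return cgx >= 0 && cgx < ncg;
}

/*
 * Byte offset from the start of the fs_csp array to the cs_nbfree counter
 * that fs_cs(fs, cgx).cs_nbfree-- writes to. C array indexing scales cgx
 * by sizeof(struct csum).
 */
static uint64_t nbfree_offset(int32_t cgx)
{
    assert(cgx >= 0);
    return (uint64_t)cgx * sizeof(struct csum) + offsetof(struct csum, cs_nbfree);
}

/* Bytes at the start of fs_csp that describe real cylinder groups. */
static uint64_t csum_array_bytes(int32_t ncg)
{
    assert(ncg >= 0);
    return (uint64_t)ncg * sizeof(struct csum);
}
```

With cg_cgx at 0x6f0000, nbfree_offset() comes to 0x6f00004 bytes, just over 111MB past the start of an array whose meaningful part, for two cylinder groups, is only 32 bytes.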
Also, interestingly, this is not dying in the mount(2) system call. It's dying in a subsequent ioctl(2), which, judging by ufs_fiologenable() in the stack, is the one enabling ufs logging.
So how might we handle this?
Now, as the filesystem is already mounted and we are in a subsequent ioctl(), we can't fail the mount. Could we instead fail in alloccgblk() and have the error propagate back up to the process making the ioctl()?
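A minimal sketch of what failing in alloccgblk() could look like, assuming we simply sanity-check cg_cgx against fs_ncg before touching any counters and return 0. The structures are hypothetical simplified stand-ins, and alloccgblk_tail_sketch() is an illustrative name, not the real code or a proposed patch.

```c
#include <stdint.h>

/*
 * Hypothetical, heavily simplified stand-ins for the ufs structures; only
 * the counters touched at the end of alloccgblk() are modelled.
 */
struct csum { int32_t cs_ndir, cs_nbfree, cs_nifree, cs_nffree; };
struct cg   { int32_t cg_cgx; struct csum cg_cs; };
struct fs   { int32_t fs_ncg; struct csum fs_cstotal; struct csum *fs_csp; };

/*
 * Sketch of the tail of alloccgblk() with a sanity check on cg_cgx added
 * before any state is modified. Returns bno on success, or 0, which the
 * callers already interpret as "no block available in this cylinder group".
 */
static int64_t alloccgblk_tail_sketch(struct fs *fs, struct cg *cgp, int64_t bno)
{
    if (cgp->cg_cgx < 0 || cgp->cg_cgx >= fs->fs_ncg)
        return 0;   /* corrupt cylinder group: fail the allocation, don't panic */

    /* (the real routine clears the block in the free map with clrblock() here) */
    cgp->cg_cs.cs_nbfree--;
    fs->fs_cstotal.cs_nbfree--;
    fs->fs_csp[cgp->cg_cgx].cs_nbfree--;   /* i.e. fs->fs_cs(fs, cgp->cg_cgx) */
    return bno;
}
```

Doing the check before anything is modified (in the real routine, that would mean before clrblock() as well) avoids leaving the free-block counts out of step with the bitmap when we bail out.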
Walking back up the stack, we see that alloccg() handles alloccgblk() returning 0. If alloccg() then returns 0 to hashalloc(), in this instance we'll first try a quadratic rehash, which will also fail, so we'll fall through to the brute-force search. As this starts at cylinder group 2 and there are only 2 cylinder groups, this will call alloccg() once and fail, falling through to return 0 to alloc(). Note that no matter how many times we end up in alloccgblk(), it has the same arguments, so it would fail the same way.
In alloc(), the code notes that we did not get a block back and assumes this is because some other thread grabbed the last block. It then treats the whole thing as if we had run out of space and returns ENOSPC to lufs_alloc(). lufs_alloc() catches this, frees everything up and returns the error (ENOSPC) to lufs_enable(), which in turn catches it, cleans up and returns it to ufs_fiologenable(), and the error is eventually passed back to user space. While it's not exactly the error we would have hoped for, the end result would be that logging is not turned on and the system does not panic because of this corrupted filesystem.
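The error path just described can be modelled in a few lines. This is a hypothetical sketch: each *_model function keeps only the error-handling shape of the ufs routine it is named after, hashalloc_model() simply tries each cylinder group once rather than reproducing hashalloc()'s probing order, and the two allocator stubs are made up for illustration.

```c
#include <errno.h>
#include <stdint.h>

/* A per-cylinder-group allocator: returns a block number, or 0 for "none". */
typedef int64_t (*cg_allocator_t)(int cg);

/* alloccg() when the (guarded) alloccgblk() refuses the corrupt cg: always 0. */
static int64_t alloccg_rejects_all(int cg)
{
    (void)cg;
    return 0;
}

/* For contrast: a made-up allocator that finds a block in cylinder group 1. */
static int64_t alloccg_cg1_has_space(int cg)
{
    return cg == 1 ? 4242 : 0;
}

/*
 * hashalloc(): the real code tries the preferred cg, a quadratic rehash and
 * then a brute-force scan. This model just tries each cg once; either way it
 * returns 0 if the allocator never succeeds.
 */
static int64_t hashalloc_model(int ncg, cg_allocator_t allocator)
{
    for (int cg = 0; cg < ncg; cg++) {
        int64_t bno = allocator(cg);
        if (bno != 0)
            return bno;
    }
    return 0;
}

/* alloc(): getting no block back is treated as running out of space. */
static int alloc_model(int ncg, cg_allocator_t allocator, int64_t *bnp)
{
    *bnp = hashalloc_model(ncg, allocator);
    return *bnp == 0 ? ENOSPC : 0;
}

/*
 * lufs_alloc()/lufs_enable()/ufs_fiologenable(): on error, clean up and hand
 * the same errno back toward the ioctl caller; logging stays off.
 */
static int fiologenable_model(int ncg, cg_allocator_t allocator)
{
    int64_t bno;
    int error = alloc_model(ncg, allocator, &bno);

    if (error != 0)
        return error;   /* the real code frees what it allocated first */
    return 0;           /* a log block was found; logging could be enabled */
}
```

With an allocator that always returns 0, the model ends in ENOSPC rather than a panic, which is the outcome described above.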
I'll log this as a bug against ufs for Solaris 10 and nevada.
I have logged CR 6492771 against this issue. The link for the CR should work sometime in the next 24 hours, but the content as logged is pretty much a cut and paste of this blog entry.
The bug I logged has been closed as a duplicate of 4732193, "ufs will attempt to mount filesystem with blatantly-bad superblock".