Kprobes in Linux vs DTrace

An article on OSNews points to an IBM article about Kprobes on Linux.

While this is probably a step in the right direction, I still have some concerns. I would encourage the author to look at adding some more protection, i.e. always practice safe probing.

  1. I don't see any checking for NULL pointer dereferences in the printk calls. If this is the case, then a poorly written kprobe can still take out a production box. In fact, any bad piece of code in a probe handler could take it out.
  2. It's still rather clunky to get simple probes inserted. The article shows a lot of work is required to get the probes in. The equivalent probe in DTrace would be
    #!/usr/sbin/dtrace -s
    
    syscall::fork1:entry, syscall::forkall:entry, syscall::vfork:entry {
        printf("\n\tpid=%d kthread=0x%llx\n", pid, (long long)curthread);
        printf("\tt_state=0x%x cpu=%d\n",
            curthread->t_state, curthread->t_cpu->cpu_id);
        printf("\n\tCaller program is \"%s\"\n\n", execname);
        printf("\tUser Space stack\n");
        ustack();
        printf("\n\tKernel Space Stack\n");
        stack(10);
    }
    
    
    which gives us the following results:
    # ./fork.d
    dtrace: script './fork.d' matched 3 probes
    CPU     ID                    FUNCTION:NAME
      2    207                      vfork:entry 
            pid=1443 kthread=0x300056a3c60
            t_state=0x4 cpu=2
    
            Caller program is "csh"
    
            User Space stack
    
                  libc.so.1`vfork+0x20
                  csh`execute+0xcbc
                  csh`process+0x360
                  csh`main+0xe94
                  csh`_start+0x108
    
            Kernel Space Stack
    
                  unix`syscall_trap32+0xcc
    
    
    Alternatively, knowing that in Solaris each of these three system calls calls cfork() (which you could also determine with DTrace), we could simply do:
    #!/usr/sbin/dtrace -s
    
    fbt::cfork:entry {
        printf("\n\tpid=%d kthread=0x%llx\n", pid, (long long)curthread);
        printf("\tt_state=0x%x cpu=%d\n",
            curthread->t_state, curthread->t_cpu->cpu_id);
        printf("\n\tCaller program is \"%s\"\n\n", execname);
        printf("\tUser Space stack\n");
        ustack();
        printf("\n\tKernel Space Stack\n");
        stack(10);
    }
    
    This gives us the same information, since the calls to cfork() are tail calls (so no extra frame shows up in the kernel stack). On an x86 box it would look something like:
    # ./fork.d
    dtrace: script './fork.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0   3882                      cfork:entry 
            pid=669 kthread=0xffffffffd5ae6000
            t_state=0x4 cpu=0
    
            Caller program is "csh"
    
            User Space stack
    
                  libc.so.1`vfork+0x45
                  csh`execute+0x12f
                  csh`process+0x24b
                  csh`main+0xa25
                  80580ea
    
            Kernel Space Stack
    
                  unix`sys_call+0xda
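
    As for finding out that vfork() ends up in cfork(), one way is to follow the kernel function flow of a single system call. The following is only a sketch: it assumes the syscall and fbt providers are available as in the scripts above, the thread-local variable name traced is made up for illustration, and it enables every fbt probe, so it is heavier than the scripts above.

```dtrace
#!/usr/sbin/dtrace -s

/* Indent output to show function entry and return nesting. */
#pragma D option flowindent

syscall::vfork:entry
{
	/* Mark this thread so that only its kernel calls are shown. */
	self->traced = 1;
}

fbt:::entry,
fbt:::return
/self->traced/
{
}

syscall::vfork:return
/self->traced/
{
	/* Stop after the first traced vfork() completes. */
	self->traced = 0;
	exit(0);
}
```

    Running this and then triggering a vfork() (for example by running a command from csh) should list the kernel functions entered on the way through the system call.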
    
    

Now there are also a couple of other nice things to consider here.

  1. No need to register and unregister the probe. If I'm not running the DTrace script, then the probe does not exist.
  2. If I want to change the query, I just edit the script.
  3. This one is actually a pretty basic probe. I can get much more complex information with very little effort, and as I have already stated, it's just a matter of modifying the script and the probe does not exist unless I am running the script.
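
To illustrate the point about changing the query by editing the script, here is a small variation on the first script. It is a sketch rather than anything from the IBM article: instead of printing every event, it uses an aggregation (the name forks is arbitrary) to count fork-family calls by program and by system call.

```dtrace
#!/usr/sbin/dtrace -s

syscall::fork1:entry,
syscall::forkall:entry,
syscall::vfork:entry
{
	/* Count fork-family calls, keyed by program name and system call. */
	@forks[execname, probefunc] = count();
}
```

Let it run for a while and press Ctrl-C; dtrace prints the aggregation when it exits, and the probes are gone again.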

But the most important thing to remember is that we have protection against the probe taking out the system. That means that we have no hesitation in running DTrace probes on production boxes, where outage time is measured in thousands of dollars per second (yes, we have such customers).
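
A quick way to see this protection at work is to do on purpose the thing I was worried about in the first point above. This sketch follows the pattern of the ERROR probe example in the Solaris Dynamic Tracing Guide: a probe action dereferences a NULL pointer, and the message text in the ERROR clause is my own.

```dtrace
#!/usr/sbin/dtrace -s

BEGIN
{
	/* Deliberately dereference a NULL pointer inside a probe action. */
	*(char *)NULL;
}

ERROR
{
	/* Fires when another probe's action faults. */
	printf("probe action faulted, system still running\n");
}
```

Rather than panicking the box, DTrace catches the bad load, fires the ERROR probe and reports an invalid address error for the BEGIN probe. Press Ctrl-C to stop the script.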


Update

Dan Price made a suggestion which tidies the script up even more, meaning that even if we change the way that we do fork(), the script will remain working. This gives us stability with kernel releases as well. To see the new script, look at the comments for this entry.
About

* - Solaris and Network Domain, Technical Support Centre


Alan is a kernel and performance engineer based in Australia who tends to have the nasty calls gravitate towards him.
