Removes are failing, time to debug

Removes are failing and I don't think this is the classic issue we have had with this operation. I know that at the last BAT, the removes were succeeding enough to cause a panic. And we don't seem to be getting to that code even.

Snoop shows the client sending the remove, but the MDS is not sending anything to the DS. Barring the MDS sending to the wrong DS, I've put together a simple DTrace script to see what is going on:

[root@pnfs-17-24 ~]> more remove.d
#!/usr/sbin/dtrace -s

::do_ctl_mds_remove:return
{
        printf("rc = %d", args[1]);
}

::ctl_mds_clnt_remove_file:return
{
        printf("rc = %d", args[1]);
}

:::nfss-e-vop_fid_pseudo_failed
{
}

:::nfss-i-layout_is_null_cannot_remove
{
}

::mds_op_remove:entry
{
        self->resop = (nfs_resop4 \*)arg1;
        self->resp = self->resop->nfs_resop4_u.opremove;
}

::mds_op_remove:return
{

        printf("rc = %d", self->resp.status);
}

I don't like the hoops I have to go through to get the status on mds_op_remove, but I have to do that because that function is of type void. I'm going to change them to return a status. It doesn't cost much and makes simple DTrace scripting easier.

Anyway, here is our result:

[root@pnfs-17-24 ~]> ./remove.d
dtrace: script './remove.d' matched 6 probes
CPU     ID                    FUNCTION:NAME
  0  30503 do_ctl_mds_remove:nfss-e-vop_fid_pseudo_failed 
  0  62050         do_ctl_mds_remove:return rc = 28
  0  62052             mds_op_remove:return rc = 0
\^C

What is that 28?

[thud@ultralord nfssrv]> grep 28 /usr/include/sys/errno.h 
#define	ENOSPC	28	/\* No space left on device		\*/

Now I want to know what vop_fid_pseudo() is returning, but only when called in this context. I can use the thread properties like this:

::vop_fid_pseudo:entry
/self->live == 1/
{
        self->vp = (vnode_t \*)arg0;
        self->fidp = (fid_t \*)arg1;
}

::vop_fid_pseudo:return
/self->live == 1/
{
        printf("fid len = %d\\trc = %d",
            self->fidp->un._fid.len, args[1]);
}

Only if self->live is set will I see output:

[root@pnfs-17-24 ~]> ./remove.d
dtrace: script './remove.d' matched 8 probes
CPU     ID                    FUNCTION:NAME
  0  62468            vop_fid_pseudo:return fid len = 10        rc = 28
  0  30503 do_ctl_mds_remove:nfss-e-vop_fid_pseudo_failed 
  0  62050         do_ctl_mds_remove:return rc = 28
  0  62052             mds_op_remove:return rc = 0
\^C

And now I need to know where that error is set:

  49 vop_fid_pseudo(vnode_t \*vp, fid_t \*fidp)
  50 {
  51         struct vattr va;
  52         int error;
  53 
  54         error = VOP_FID(vp, fidp, NULL);
  55 
  56         /\*
  57          \* XXX nfs4_fid() does nothing and returns EREMOTE.
  58          \* XXX nfs3_fid()/nfs_fid() returns nfs filehandle as its fid
  59          \* which has a bigger length than local fid.
  60          \* NFS_FH4MAXDATA is the size of
  61          \* fhandle4_t.fh_xdata[NFS_FH4MAXDATA].
  62          \*
  63          \* Note: nfs[2,3,4]_fid() only gets called for diskless clients.
  64          \*/
  65         if (error == EREMOTE ||
  66             (error == 0 && fidp->fid_len > NFS_FH4MAXDATA)) {
  67 
  68                 va.va_mask = AT_NODEID; 
  69                 error = VOP_GETATTR(vp, &va, 0, CRED(), NULL);
  70                 if (error)
  71                         return (error);
  72 
  73                 fidp->fid_len = sizeof (va.va_nodeid);
  74                 bcopy(&va.va_nodeid, fidp->fid_data, fidp->fid_len);
  75                 return (0);
  76         }
  77 
  78         return (error);
  79 }

And that means I need to know what VOP_GETATTR() is returning. I'll pull the same trick as before, but I'll need to be looking at, well, I have to figure that out now, don't I?

The trick is to let the vnode tell me:

int
fop_getattr(
        vnode_t \*vp,
...
        err = (\*(vp)->v_op->vop_getattr)(vp, vap, flags, cr, ct);

So I need to find the pointer to the function:

        printf("vop_getattr = %p", self->vp->v_op->vop_getattr);

And then I need to find out what that is:

  0  62467             vop_fid_pseudo:entry vop_getattr = fffffffff883c180
...
[root@pnfs-17-24 ~]> mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs mpt sd sockfs ip hook neti sctp arp usba stmf fctl nca lofs idm cpc random zfs nfs fcip logindmux ptm sppp ]
> fffffffff883c180::dis
zfs_getattr:                    pushq  %rbp

Okay, we at least know which filesystem is complaining. I probably could have done this differently, but this way works. The only error in that function comes from zfs_zaccess():

  0  53373               zfs_zaccess:return rc = 0
  0  53125               zfs_getattr:return rc = 0
  0  53125               zfs_getattr:return rc = 0

Which means I was reading the code incorrectly! Note that these are being called by the code, my check forces that to be true, but the error must be elsewhere. It must be the VOP_FID() which is causing the issue!

4323 static int
4324 zfs_fid(vnode_t \*vp, fid_t \*fidp, caller_context_t \*ct)
4325 {
4326         znode_t         \*zp = VTOZ(vp);
4327         zfsvfs_t        \*zfsvfs = zp->z_zfsvfs;
4328         uint32_t        gen;
4329         uint64_t        object = zp->z_id;
4330         zfid_short_t    \*zfid;
4331         int             size, i;
4332 
4333         ZFS_ENTER(zfsvfs);
4334         ZFS_VERIFY_ZP(zp);
4335         gen = (uint32_t)zp->z_gen;
4336 
4337         size = (zfsvfs->z_parent != zfsvfs) ? LONG_FID_LEN : SHORT_FID_LEN;
4338         if (fidp->fid_len < size) {
4339                 fidp->fid_len = size;
4340                 ZFS_EXIT(zfsvfs);
4341                 return (ENOSPC);
4342         }

Some quick DTrace will tell me I'm correct:

::zfs_fid:entry
/self->live == 1/
{
        zp = (znode_t \*)(self->vp)->v_data;
        zfsvfs = zp->z_zfsvfs;

        lfid = sizeof (zfid_long_t) - sizeof (uint16_t);
        sfid = sizeof (zfid_short_t) - sizeof (uint16_t);

        size = (zfsvfs->z_parent != zfsvfs) ? lfid : sfid;

        printf("fidlen %s size! (%d, %d, %d, %d)",
            self->fidp->un._fid.len < size ? "<" : "not <",
            self->fidp->un._fid.len, size, lfid, sfid);
}
...
  0  53162                    zfs_fid:entry fidlen < size! (0, 10, 20, 10)
  0  53163                   zfs_fid:return rc = 28

So is this new? Or something I introduced? I don't see how the code ever worked here. We pass a fid_t off the stack, i.e., uninitialized, into vop_fid_pseudo() and when we call zfs_fid(), that check should trigger every time. The comments on lines 57-61 lead me to think that vop_fid_pseudo() is not intended for types other than NFS. In any event, the other callers of it do:

makefh4(nfs_fh4 \*fh, vnode_t \*vp, struct exportinfo \*exi)
...
        fid_t fid;
...
        bzero(&fid, sizeof (fid));
        fid.fid_len = MAXFIDSZ;

And the file should have been removed!

  0  67453                   zfs_fid:return rc = 0
  0  81693             vop_fid_pseudo:entry vop_getattr = fffffffff8088180
  0  67452                    zfs_fid:entry fidlen not < size! (64, 10, 20, 10)
  0  67453                   zfs_fid:return rc = 0
  0  81694            vop_fid_pseudo:return fid len = 10        rc = 0

From the snoop we can both see the MDS communicating with the DS and that I need to go teach snoop about a remove!

CTL-MDS:  ----- Sun CTL-MDS -----
CTL-MDS:
CTL-MDS:  Proc = 10 (Remove object(s) or entire fsid at the DS)
CTL-MDS:

While I fixed a valid bug (and we are past the code that was panicking at the BAT, I doubt I fixed the problem we had been seeing:

Before a huge copy:

[root@pnfs-17-22 ~]> zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
pnfs1                       9.35G  6.28G    23K  /pnfs1

After a huge copy:

[root@pnfs-17-22 ~]> zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
pnfs1                       12.5G  3.09G    23K  /pnfs1

And after the delete:

[root@pnfs-17-22 ~]> zfs list
NAME                         USED  AVAIL  REFER  MOUNTPOINT
pnfs1                       12.5G  3.09G    23K  /pnfs1

Note, the DS did report that it tried to delete the file:

[root@pnfs-17-22 ~]> ./remove.d 
dtrace: script './remove.d' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  62434        ctl_mds_srv_remove:return status = 0
  1  62434        ctl_mds_srv_remove:return status = 0
  1  62434        ctl_mds_srv_remove:return status = 0
  0  62434        ctl_mds_srv_remove:return status = 0
  1  62434        ctl_mds_srv_remove:return status = 0
  1  62434        ctl_mds_srv_remove:return status = 0
  0  62434        ctl_mds_srv_remove:return status = 0

Amazingly enough, I had forgotten I was running that DTrace script. So the issue is that ctl_mds_srv_remove() reports success, but the file contents are still there.

I'm starting to wonder if I haven't hit my head on a known ZFS bug that is fixed in 118. The pNFS code is based on 117. I'll investigate.


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily
Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

tdh

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today