X

News, tips, partners, and perspectives for the Oracle Solaris operating system

From slow ls(1) in my home dir to standards bug

Darren Moffat
Senior Software Architect

I noticed that ls of my home directory can sometimes be quite slow. Now I'm not one of those people with lots of files in my home directory. Of the non dot files and dot directories this home directory (my one in the US) has 6 directories (bin, lib, share, tmp, public_html and wx.backup) and one symlink that points to my UK home directory.

It is only slow if that symlink is present and not otherwise. Quick "fix" move the uk symlink to be .uk since my default shell aliase for ls(1) doesn't pass the -a flag.

Thats all very well but why is this happening ?

It turns out there are two completely separate things going on here. My shell, zsh, has autocorrection turned on and was thus doing an access(2) on the file arguments for ls(1). I discovered this, and the fact that it was the access(2) of the symlink that was taking the time by using truss(1) like this:

$ truss -D -f -p 391463
391463: read(10, 0xFFBFF843, 1) (sleeping...)
391463: 1.2128 read(10, "\\n", 1) = 1
391463: 0.0100 pollsys(0xFFBFF9F0, 1, 0xFFBFF988, 0x00000000) = 0
391463: 0.0005 getuid() = 35661 [35661]
391463: 0.0006 write(10, "\\r\\n", 2) = 2
391463: 0.0003 alarm(0) = 0
391463: 0.0003 access("/bin/ls", F_OK) = 0
391463: access(".uk", F_OK) (sleeping...)
391463: 12.2743 access(".uk", F_OK) Err#13 EACCES
391463: access("./.uk", F_OK) (sleeping...)
391463: 1.9904 access("./.uk", F_OK) Err#13 EACCES
391463: 0.0016 openat(AT_FDCWD, ".", O_RDONLY|O_NDELAY|O_LARGEFILE) = 3
391463: 0.0003 fcntl(3, F_SETFD, 0x00000001) = 0
391463: 0.0002 fstat64(3, 0xFFBFEAF0) = 0
391463: 0.0002 getdents64(3, 0xFF364000, 8192) = 5776
391463: 0.0003 close(3) = 0
391463: 0.0003 ioctl(10, TCSETSW, 0x000A16D8) = 0
391463: 0.0002 time() = 1136470377
391463: 0.0001 lwp_sigmask(SIG_SETMASK, 0x00020000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
391463: 0.0002 lwp_sigmask(SIG_SETMASK, 0x00020000, 0x00000000) = 0xFFBFFEFF [0x0000FFFF]
391463: 0.0001 pipe() = 3 [4]
391463: 0.0008 fork1() = 895324
....

That first access call took a long time. Note also that when the access(2)
does return it is failing with EACCESS - this is because this lab machine isn't in the correct netgroup to NFS mount my home directory and was expected.

Okay so lets run it as

nocorrect ls -la .uk
. It is still slow,
so back to the truss output.

...
949565: 0.0003 acl("./lib", ACE_GETACL, 7, 0x000315F8) = 7
949565: 0.0002 lstat64("./uk", 0xFFBFF6F8) = 0
949565: 0.0003 readlink("./uk", "/net/enospc.uk/export/home/.sys/darrenm", 4096) = 39
931411: sigsuspend(0xFFBFFA30) (sleeping...)
949565: stat64("./uk", 0xFFBFE660) (sleeping...)
949565: 4.4851 stat64("./uk", 0xFFBFE660) Err#13 EACCES
949565: 0.0006 lstat64("./wx.backup", 0xFFBFF6F8) = 0
...

Hmn this time it is ls(1) that is making the system call responsible for
the slow down, stat(2). So why is ls(1) doing a stat on the symlink,
after all it already knows it is a symlink. A very quick search of
the OpenSolaris source show this code fragment in ls.c where we call
stat(2) just after calling readlink(2).

1232 

/\*
1233

\* follow the symbolic link
1234

\* to generate the appropriate
1235

\* Fflg marker for the object
1236

\* eg, /bin -> /sym/bin/
1237

\*/
1238

if ((Fflg || pflg) &&
1239

(stat(file, &statb1) >= 0)) {
1240

switch (statb1.st_mode &
1241

S_IFMT) {
1242

case S_IFDIR:
1243

buf[cc++] = '/';
1244

break;
1245

case S_IFSOCK:
1246

buf[cc++] = '=';
1247

break;
1248

default:

Yes my default shell alias for `ls` does indeed pass the `-F` flag. So
what I need is a variant of `-F` that doesn't follow the symlink unless
I also pass the `-L` flag.

According to ls(1) that is the behaviour
of /usr/xpg6/bin/ls but not of /usr/bin/ls. So `nocorrect /usr/xpg6/bin/ls -F uk` should do what I want, however it doesn't. There is no #define XPG6 bracketing around that stat(2) call, there is an #define XPG6 bracket around a stat(2) just a few lines further down though. I believe there should be an #ifndef XPG6 around the code fragment I posted above. I not standards expert so I'm going to ask a colleague who is.

This certainly wasn't where I expected ls being slow to end up. If this turns out to be the bug I think I have identified it will be my first standards compliance bug I've found. It is the first time I've used one of the new xpg6 variants of the Solaris commands, I've used /usr/xpg4/bin/ variants before though.


Technorati Tags:

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.