Thursday Jul 30, 2009

Hung system, need core

I've got a hung v20z and I want to get a core. Luckily I had it configured to drop down to kmdb and I had a chatline available to people in my group. :-<

The trick is to get the LOM to work with you. In this case, CTRL-e c l 0 (that is an 'L' and not a '1') got the LOM console to force the system to drop down into kmdb. And from there, it was easy to generate the core:

Jul 29 12:35:41 pnfs-17-22 last message repeated 1 time
[halt aborted]
[ignored]
[halt sent]

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ scsi_vhci crypto mac cpc uppc neti sd ptm ufs unix 
cpu_ms.AuthenticAMD.15 sv mpt zfs krtld s1394 sppp rdc nca uhci ii hook lofs 
genunix idm ip nsctl logindmux nsmb sdbc usba specfs pcplusmp nfs md random 
cpu.generic sctp arp stmf sockfs smbsrv ]
[1]> $

Now I need to figure out why the system was wedged!

And the core was corrupt!


Originally posted on Kool Aid Served Daily
Copyright (C) 2009, Kool Aid Served Daily

Friday Apr 03, 2009

Just got an Ultra 24

I had an Ultra 24 waiting for me at the Tulsa offices, and really, I never knew we had a Tulsa office and that it was a quarter mile from my house.

The machine was dead quiet when I turned it on and then it started to roar. The fans would not go off. The machine would actually not boot at all. All I could find were these forum posts - Ultra 24 FAN speed. I downloaded the 1.5 version of the BIOS and that did not help. After the reboot, the fans were still going. I then turned the machine off as I checked to make sure I hadn't disconnected anything when I added RAM. I hadn't, but the machine started to work after that. :->. I believe what was effectively a cold reboot enabled that.

The other thing I did notice is that the system came with Solaris 10. I went through the initial install screens and they appeared to be as sluggish as I remembered. I then installed OpenSolaris on the box (onnv_109) and the install screens there screamed. The UI was the same, the exact same series of screens, but the overall response level was much faster.

I've now recommissioned the machine under the name ultralord in my home domain.


Thursday Mar 05, 2009

Next Oklahoma City OpenSolaris User Group meeting

On March 10th, I'll be heading out to Oklahoma City for the next OKCOSUG meeting: OpenSolaris Project: Oklahoma City OpenSolaris User Group. I find the topic to be interesting and I like meeting people who use our product.

You can click on the link above for directions and such. Also, please be sure to register so that Bryan knows how many people to plan refreshments for. A quick summary of the agenda:

Agenda 

5:00PM to 5:30PM Meet and Greet
5:30PM to 7:00PM Sun Unified Storage 7000 Technical Overview
7:00PM to 7:15PM Summary and open for questions

Thursday Oct 30, 2008

Said Syed's OKCOSUG talk

Just got back from my first Oklahoma City OpenSolaris User Group meeting. It was fun. Said gave a presentation on sizing provisioning for attaching storage to VMware's ESX. It provided a good overview on VMotion and SVMotion.

But of more interest to me was Said's interest in serving the customer. Not only did he try to shape the presentation to those in the audience, he wanted to learn how to convey information better in his blog. I hope I was able to help.

One thing that I learned about customer service and blogging from him was that he flat out told the audience, if you have a question, post a comment in one of my blog entries and I'll get back to you. I.e., he is flipping the push model of the author delivering in blogging to a pull model of the reader driving content. I was floored by this and came away wondering how to have a free floating request section such that comment fields stay true to the blog article and new content can be driven.

Also of interest was the audience member who basically asked when Sun's xVM was going to support NFSv4.1. We don't even have it close to shipping and already people want it in configurations we aren't thinking about!


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

Monday Oct 13, 2008

I've been meaning to learn more about RBAC

If you also have been meaning to learn more about RBAC, a good start would be: Introducing pfexec, a Convenient Utility in the OpenSolaris OS By Joerg Moellenkamp, with contributions from Marina Sum, October 13, 2008.


Wednesday Oct 08, 2008

Getting around a tool repository which is not updating

With the introduction of Mercurial, we have a need to keep our tools directory up to date. We could simply NFS mount the one in Menlo Park, but for WAN and build performance, that sucks. So, the Austin Labs have a local copy. And it is not being kept up to date. We've all been bitten by an old copy of the BFU script.

To get around this, we've built our own local repository and made sure that our paths all take this into account. Well, that just failed for me:

[th199096@jhereg mms]> hg outgoing -v
running ssh onnv.eng "hg -R /export/onnv-clone serve --stdio"
comparing with ssh://onnv.eng//export/onnv-clone
searching for changes
abort: style not found: /ws/onnv-tools/onbld/etc/hgstyle

I know the 'hgstyle' stuff is new, I saw Flag Day info on it. And sure enough:

[th199096@jhereg mms]> df -k /ws/onnv-tools/onbld/etc
Filesystem            kbytes    used   avail capacity  Mounted on
mool-ha1-nfs.central:/export/ds01/d531/tools/01/elpaso.eng/opt/onbld
                     140454588 109105801 29944242    79%    /ws/onnv-tools/onbld

I don't want to hack on the script, which I think shouldn't be using the full path. So I'll have to change where I'm getting my copy of the tools in /ws.

Okay, I don't have permissions on the NIS server, but I can get the map:

[th199096@jhereg ~]> ypcat -k auto.ws | grep onnv-tool
onnv-tools /SUNWspro   -ro mool-ha1-nfs.central:/export/ds01/d531/tools/01/slug-17.eng/export/$CPU/opt/SUNWspro    /teamware   -ro mool-ha1-nfs.central:/export/ds01/d531/tools/01/slug-17.eng/export/$CPU/opt/SUNWspro/SOS8    /onbld      -ro mool-ha1-nfs.central:/export/ds01/d531/tools/01/elpaso.eng/opt/onbld

And I can add it to my local /etc/auto_ws:

#
# Local copies of /ws workspaces
#
# For /ws/on10-clone use:
# /ws/on10-patch-clone-auspen or on10-feature-clone-auspen
#
on10-clone-aus          iquad:/pool/ws/on10-clone
on10-patch-clone-aus    iquad:/pool/ws/on10-patch-clone
onnv-clone-aus          iquad:/pool/ws/onnv-clone
on10-test-aus           iquad:/pool/ws/on10-test
onnv-test-aus           iquad:/pool/ws/onnv-test
onnv-stc2-aus           iquad:/pool/ws/onnv-stc2
on10-tools-aus  -ro     iquad:/pool/ws/on10-tools-$CPU
onnv-tools-aus  -ro     aus1500-home:/pool/ws/onnv-tools-$CPU
onnv-tools      /SUNWspro       -ro     /opt/SUNWspro /teamware       -ro     /opt/SUNWspro/SOS8    /onbld     -ro     /opt/onbld

And no go:

[th199096@jhereg /etc]> sudo svcadm restart autofs
...
[th199096@jhereg th199096]> ls -la /ws/onnv-tools
/ws/onnv-tools: Permission denied
total 1
[th199096@jhereg th199096]> dmesg
...
Oct  8 16:54:23 jhereg automountd[883428]: [ID 406441 daemon.error] parse_entry: mapentry parse error: map=auto_ws key=onnv-tools
Oct  8 16:55:55 jhereg automountd[883477]: [ID 406441 daemon.error] parse_entry: mapentry parse error: map=auto_ws key=onnv-tools

I turn spaces into tabs, no luck. I check other machines and they do the hierarchy locally for other things. Well, I then convert the pathnames from /opt/SUNWspro to localhost:/opt/SUNWspro. And that turns the trick:

[th199096@jhereg th199096]> ls -la /ws/onnv-tools
total 5
dr-xr-xr-x   4 root     root           4 Oct  8 17:04 .
dr-xr-xr-x   2 root     root           2 Oct  8 17:04 ..
dr-xr-xr-x   1 root     root           1 Oct  8 17:04 SUNWspro
dr-xr-xr-x   1 root     root           1 Oct  8 17:04 onbld
dr-xr-xr-x   1 root     root           1 Oct  8 17:04 teamware

I probably need to put a real fix into our jumpstart servers and make the path dependent on $CPU, but I think I was doing something when this happened.


Tuesday Oct 07, 2008

Oklahoma City OpenSolaris Users Group (OKCOSUG) October 30th 2008 Meeting

The next meeting of the Oklahoma City OpenSolaris Users Group (OKCOSUG) will be on Thursday, October 30th at Oklahoma City University. Said Syed, Sun Staff Engineer, will be the featured speaker. The meeting will run from 5:30 to 7:30PM, with light refreshments beginning at 5:00PM.

Pizza and refreshments will be provided, so please RSVP by pointing your favorite browser to OKCOSUG Event to help us get an estimate of attendance. It will also help speed up the sign-in process.

!! FREE OpenSolaris Back to School Kits CDs and Giveaways. !!

Agenda

  • 5:00PM to 5:30PM Meet and Greet
  • 5:30PM to 5:45PM VMware Overview
  • 5:45PM to 7:15PM Sizing your storage needs
    • Why storage is such a big deal in Virtualized environments
    • How to appropriately size Storage arrays for virtualized environments
    • Real life examples
    • Summary
    • Where to get more information
  • 7:15PM to 7:30PM Questions

And as always for the latest updates don't forget to check our web page at: http://opensolaris.org/os/project/okcosug/. Our Users Group Email is: ug-okcosug@opensolaris.org.

Thank you for your time and looking forward to seeing you at this and any future OpenSolaris meetings.

Sincerely,

Bryan Boden

Friday Oct 03, 2008

Building OpenSolaris inside SWAN

Okay, building OpenSolaris with the opensolaris.sh environment inside SWAN is different. I first tried it with:

% ws cleanroom
% nightly opensolaris.sh

And got garbage. I tried playing with some environment variables and didn't get anywhere. I then tried it with bldenv:

% exit
% cd cleanroom
% bldenv -d opensolaris.sh
% nightly opensolaris.sh

That went fast:

/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 36

No 32-bit compiler found
*** Error code 1
The following command caused the error:
if /builds/th199096/cleanroom/usr/src/tools/proto/opt/onbld/bin/i386/cw -_cc -_versions >/dev/null 2>/dev/null; then \

Finally, I went back to the ws approach and with the following opensolaris.sh diffs:

[th199096@jhereg cleanroom]> diff opensolaris.sh usr/src/tools/env/opensolaris.sh 
45c45
< GATE=cleanroom;                       export GATE
---
> GATE=testws;                  export GATE
48c48
< CODEMGR_WS="/builds/th199096/$GATE";                  export CODEMGR_WS
---
> CODEMGR_WS="/export/$GATE";                   export CODEMGR_WS
91c91
< STAFFER=th199096;                             export STAFFER
---
> STAFFER=nobody;                               export STAFFER
157c157
< #BUILD_TOOLS=/opt;                            export BUILD_TOOLS
---
> BUILD_TOOLS=/opt;                             export BUILD_TOOLS
159,161c159,160
< #SPRO_ROOT=/opt/SUNWspro;                     export SPRO_ROOT
< #SPRO_VROOT=$SPRO_ROOT;                               export SPRO_VROOT
< #__SSNEXT="";                                 export __SSNEXT
---
> SPRO_ROOT=/opt/SUNWspro;                      export SPRO_ROOT
> SPRO_VROOT=$SPRO_ROOT;                                export SPRO_VROOT
186d184
< export CW_NO_SHADOW=1

That seems to have worked. Now I need to test a pNFS community setup and run cthon.


Remember to read the README

So I have the closed binaries which correspond to the new nfs41-gate up on osol. I grabbed a copy of that source and started a build up. And it failed.

My thoughts were that either:

  1. I hosed the push to osol and thus all of the source was not there.
  2. The recent switch to Sun Studio 12 is impacting me.

The first is justifiable paranoia and the second has happened to me before. So, I searched my blog (more than 51% of why I blog is to have an easy to search repository of tips, tricks, and efdups.) and found this tidbit: RTFR - Or make sure you do read all of the README. Now it wasn't a direct hit, but what the hey, while I'm here I should read that README.

And sure enough, it has something on the compiler switch:

   Please note that the compiler that comes with the Solaris Developer
   Express release is Studio 12, which is not the standard compiler
   for OpenSolaris code.  If you use Studio 12, you will need to set
   __SSNEXT to the null string in your environment file.  Please do
   report problems with Studio 12, particularly if the problem goes
   away when you use Studio 11 (the current standard compiler).

I'll rebuild with that change and see if it is a hit or the paranoia is justifiable after all.
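
For reference, the change the README is asking for is a one-liner in the environment file. The sketch below assumes the env file is plain sh syntax like opensolaris.sh, and that nothing later in the file resets the variable:

```shell
# Studio 12 only: per the README, set __SSNEXT to the null string.
# With Studio 11 (the standard compiler at the time) leave this line out.
__SSNEXT="";    export __SSNEXT
```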


Wednesday Oct 01, 2008

One code review out, another to come shortly

I just put a code review request out for 6751438 mirror mounted mountpoints panic when umounted on nfs-discuss (see [nfs-discuss] Code reviewers wanted for 6751438 mirror mounted mountpoints panic when umounted ).

The hardest part was finding time to test. This resulted from a fix made a couple of months ago. And at that time, both unit and mini-PIT testing showed no panics. And now the mirrormount test suite inside mini-PIT could reliably trigger a panic. Luckily, I understand what the bug is and the panics have stopped.

I'm also about to ask for a code review for 6738223 Can not share a single IP address, which is quite simple to fix and we probably never would have fixed it except for:

  1. I saw someone copying it over to the CIFS code.
  2. We've had a couple of people ask about it on nfs-discuss at OpenSolaris.

The basic issue is that you can not share to a single IP without explicitly mentioning a netmask. I go on about it in these old blog entries: [Open]Solaris and sharing subnets and single machines and Checking a host entry - some code analysis.

The fix is easier than the testing, but I'll do that in the morning after a fresh build and ask for the code review later in the day.


Bugs from this weekend's installfest

So I filed two bugs from my install experience this weekend (see Time to update my w2100z, First reboot after install of w2100z, One of our boxes is missing, and finally Finding that missing box):

6754049 eeprom is rewriting menu.lst with bad information
When I used eeprom(1M) to change the console value, grub's menu.lst got hosed and the default set to the bad entry.
6754052 mounting either a blank dvd or a usb drive caused a panic
I went to burn a new DVD for a headed system and mounted both a USB drive (which had the ISO image) and a DVD drive by inserting a blank DVD. One of those actions caused a panic.

Tuesday Sep 30, 2008

Build turd issues you may not see in a Flag Day or a heads up...

Sometimes when you do an incremental build, you run into some fluff that kills your build:

==== cpio archives build errors (DEBUG) ====

Failed to create generic kernel archive:	200550 blocks
cpiotranslate: kernel/misc/amd64/sysinit: no packaging info
cpiotranslate: kernel/misc/sysinit: no packaging info

And everyone is supposed to know how to handle these:

[th199096@aus-build-x86 mms]> ls -al proto/root_i386/kernel/misc/amd64/sysinit
-rwxr-xr-x   1 th199096 staff       4200 Sep 25 21:22 proto/root_i386/kernel/misc/amd64/sysinit
[th199096@aus-build-x86 mms]> rm proto/root_i386/kernel/misc/amd64/sysinit proto/root_i386/kernel/misc/sysinit
[th199096@aus-build-x86 mms]> `which nightly` -in nightly.env
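
For anyone who does not already know the drill, the cleanup can be sketched as a small filter. stale_proto_files is a hypothetical helper name, and the loop only prints the rm commands rather than running them; it assumes the cpiotranslate messages look exactly like the two above and that $ROOT is the proto area:

```shell
#!/bin/sh
# List proto-area files that cpiotranslate flagged as having no packaging info.
# Reads nightly output on stdin; prints one proto-relative path per line.
stale_proto_files() {
    sed -n 's/^cpiotranslate: \(.*\): no packaging info$/\1/p'
}

# Dry run against the two messages shown above: print the rm commands
# instead of executing them. ROOT is the proto area (assumed default below).
ROOT=${ROOT:-proto/root_i386}
stale_proto_files <<'EOF' | while read -r f; do echo "rm $ROOT/$f"; done
cpiotranslate: kernel/misc/amd64/sysinit: no packaging info
cpiotranslate: kernel/misc/sysinit: no packaging info
EOF
```

Feeding it the real nightly log instead of the here-document, and reviewing the list before deleting anything, would be the cautious way to use it.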

Hey, what do you know, Ken Erickson did have a Flag Day for those who maintain private copies of bfu; Heads up for everyone else, but it still does not mention cleaning up the turd on your own.

After all, everyone knows how to deal with these turds.


Monday Sep 29, 2008

Really loving that upgrade from snv85 to snv99

I love hacks which make your life easier, but I also love an evolving OS. It used to be that, to do ssh-agent management, I had the following in my .dtprofile (and I think dt is no longer being invoked):

###
if whence ssh-agent > /dev/null && [[ ${SSH_AGENT_PID:-0} -eq 0 ]]
then
        eval $(ssh-agent) > /dev/null
        trap "kill $SSH_AGENT_PID" EXIT
fi
(xterm -e ssh-add &)
###

I'd get a little X window and have to manually enter my pass phrases every time I rebooted.

I don't know when it was introduced, but we now have a proper keychain manager and I'm loving it.


cdrw output looks more userfriendly

At last, a bright spot:

[root@warlock archives]> cdrw -l
Looking for CD devices...
    Node                   Connected Device                Device type
----------------------+--------------------------------+-----------------
 cdrom0               | AOPEN    COM5232/AAH PRO  1.04 | CD Reader/Writer
 cdrom1               | AOPEN    DUW1608/ARR      A04b | CD Reader/Writer

instead of ... ahh, I don't have a capture of it

Much easier to remember which is which now for me...


Saturday Sep 27, 2008

Builds are too slow...

Okay, I've got a brand new Sun Fire X4150 Server and it is geeked out with processors and memory. When I installed SUNWonbld, it said that I should use 36 for dmake concurrency. So, let's set our .make.files and let a build rip.

I'm going to modify usr/src/tools/env/developer.sh with the following:

[th199096@jhereg spe-build]> diff nightly.env  $SRC/tools/env/developer.sh
41c41
< NIGHTLY_OPTIONS="-aFCDlmprn";  export NIGHTLY_OPTIONS
---
> NIGHTLY_OPTIONS="-aCDlmpr";           export NIGHTLY_OPTIONS
194d193
< export CW_NO_SHADOW=1

I cut out the $STAFFER and such. The main differences are that I am not doing the gcc shadow building and I am not doing a non-DEBUG build. This should blaze, but it doesn't:

==== Nightly distributed build started:   Fri Sep 26 21:09:37 CDT 2008 ====
==== Nightly distributed build completed: Fri Sep 26 22:17:58 CDT 2008 ====

==== Total build time ====

real    1:08:21
...
/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 4

Okay, I just took the hit to Studio 12, so maybe there is a bit more time for that. And I think I have everything local, but perhaps I am hitting the network. But let's focus on dmake telling me it will be using 4 concurrent jobs. That is by no stretch 36.

[th199096@jhereg spe-build]> grep jhereg ~/.make.machines 
jhereg   max=36
jhereg.central.sun.com   max=36

I invoke the build like this:

[th199096@jhereg spe-build]> printenv | grep DMAKE
DMAKE_MODE=parallel
DMAKE_MAX_JOBS=36
[th199096@jhereg spe-build]> env -i `which nightly` nightly.env
Time spent in user mode   (CPU seconds) : 10608.23s
Time spent in kernel mode (CPU seconds) : 6272.44s
Total time                              : 1:08:21.75s
CPU utilisation (percentage)            : 411.5%

I use env -i because someone told me that it makes sure I have just the right things in my environment. How can I tell that I'm getting the right number?

I can copy `which nightly` and hack it to just report the dmake concurrency.

[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env 
Testing DMAKE, quick exit
number of concurrent jobs = 4

Okay, pretty clear that I am only getting 4, but why? Add some more debugging in the main DMAKE processing code:

hostname=`uname -n`
if [ ! -f $HOME/.make.machines ]; then
        echo "No $HOME/.make.machines found!"
        DMAKE_MAX_JOBS=4
else
        echo "Grepping for $hostname in $HOME/.make.machines"
        DMAKE_MAX_JOBS="`grep $hostname $HOME/.make.machines | \
            tail -1 | awk -F= '{print $ 2;}'`"
        if [ "$DMAKE_MAX_JOBS" = "" ]; then
                echo "Nothing in that file!"
                DMAKE_MAX_JOBS=4
        fi
fi
DMAKE_MODE=parallel;
export DMAKE_MODE
export DMAKE_MAX_JOBS

And run it:

[th199096@jhereg spe-build]> env -i ./nightly.tst -i nightly.env
Grepping for jhereg in /.make.machines
Nothing in that file!
Testing DMAKE, quick exit
number of concurrent jobs = 4

Hey, why is it looking in /.make.machines and not in my homedir? [1]

[th199096@jhereg spe-build]> echo $HOME
/home/th199096
[th199096@jhereg spe-build]> more home.tst 
#!/bin/ksh -p
#

echo "My home is $HOME"
[th199096@jhereg spe-build]> env -i ./home.tst 
My home is 
[th199096@jhereg spe-build]> ./nightly.tst -i nightly.env 
Grepping for jhereg in /home/th199096/.make.machines
Testing DMAKE, quick exit
number of concurrent jobs = 36

Okay, env is hosing me.

[th199096@jhereg spe-build]> env -i HOME=/home/th199096 ./home.tst 
My home is /home/th199096

And crap, env spells it out for me:

OPTIONS
     The following options are supported:

     -i | -        Ignores the environment that  would  otherwise
                   be  inherited  from  the  current shell.  Res-
                   tricts the environment  for  utility  to  that
                   specified by the arguments.

So, another quick test:

[th199096@jhereg spe-build]> env ./home.tst
My home is /home/th199096

I know I was told to invoke my builds this way to speed them up - i.e., to grab the correct paths. I also know I've been battling this $HOME issue the whole time.
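
A quick way to see what a launcher hands to its child is to ask env(1) itself, which avoids any shell that might quietly fill in HOME on its own. This is only a sketch: home_seen_by is a hypothetical helper, and /usr/bin/env is assumed to exist at that path.

```shell
#!/bin/sh
# Report the HOME value a child process would inherit when started through
# the given launcher (for example "env -i"). Prints nothing if HOME is absent.
home_seen_by() {
    "$@" /usr/bin/env | sed -n 's/^HOME=//p'
}

home_seen_by env                      # inherited environment: current $HOME
home_seen_by env -i                   # scrubbed environment: no output
home_seen_by env -i HOME="$HOME"      # scrubbed, with HOME passed back in
```

If the clean environment is still wanted for the build, passing HOME through explicitly, as in env -i HOME=$HOME `which nightly` nightly.env, would be one option. Whether nightly needs other variables passed through as well is not something tested here.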

I wonder how long the build will take now?

[th199096@jhereg th199096]> zfs clone pool/builds/th199096/spe-gate@fresh pool/builds/th199096/spe-build2
[th199096@jhereg th199096]> ws spe-build2

Workspace                    : /builds/th199096/spe-build2
Workspace Parent             : ssh://aus1500-home//pool/ws/th199096/spe-gate
Proto area ($ROOT)           : /builds/th199096/spe-build2/proto/root_i386
Root of source ($SRC)        : /builds/th199096/spe-build2/usr/src
Root of test source ($TSRC)  : /builds/th199096/spe-build2/usr/ontest
Current directory ($PWD)     : /builds/th199096/spe-build2

[th199096@jhereg spe-build2]> cp ../spe-build/nightly.env  .
[th199096@jhereg spe-build2]> vi nightly.env 
[th199096@jhereg spe-build2]> rm ../spe-build/nightly.tst 
[th199096@jhereg spe-build2]> `which nightly` nightly.env 

Yeah, zfs clone is sweet for rapid testing of a baseline!
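
The whole baseline-and-clone cycle is only a few commands. The sketch below prints them instead of running them; fresh_clone_cmds is a hypothetical helper, the snapshot step is an assumption (the @fresh snapshot already existed above), and the clone is assumed to live next to the baseline dataset:

```shell
#!/bin/sh
# Print (rather than run) the zfs commands for a throwaway build workspace.
# $1 = baseline dataset, $2 = snapshot name, $3 = name for the new clone,
# which is created as a sibling of the baseline.
fresh_clone_cmds() {
    parent=${1%/*}
    echo "zfs snapshot $1@$2"
    echo "zfs clone $1@$2 $parent/$3"
    echo "zfs destroy $parent/$3    # when the experiment is over"
}

fresh_clone_cmds pool/builds/th199096/spe-gate fresh spe-build2
```

The second line it prints is the same clone command used above.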

And we get such a big savings, not!

[th199096@jhereg spe-build2]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 10624.32s
Time spent in kernel mode (CPU seconds) : 7579.56s
Total time                              : 1:04:35.29s
CPU utilisation (percentage)            : 469.7%

The concurrency was correct:

/opt/SUNWspro/bin/dmake
dmake: Sun Distributed Make 7.7 2005/10/13
number of concurrent jobs = 36

All of the important tools are local:

[th199096@jhereg spe-build2]> df -h /opt/SUNWspro/bin/dmake /opt/onbld/bin/nightly /opt/onbld/bin/i386/cw /usr/java/bin/javac /usr/ccs/bin/as 
Filesystem             size   used  avail capacity  Mounted on
pool/tools             134G   9.6G    38G    21%    /pool/tools
pool/tools             134G   9.6G    38G    21%    /pool/tools
pool/tools             134G   9.6G    38G    21%    /pool/tools
/dev/dsk/c0t0d0s0       44G    11G    32G    27%    /
/dev/dsk/c0t0d0s0       44G    11G    32G    27%    /

Ok, the next thing will be to check if there is a difference between working with a clone (which has to copy-on-write) and a fresh dataset.

[th199096@jhereg spe-build3]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 10634.44s
Time spent in kernel mode (CPU seconds) : 9678.18s
Total time                              : 1:08:42.11s
CPU utilisation (percentage)            : 492.7%

No. I'll have to think on this. The other option available is to reimage the system with all 3 disks in the pool:

[root@jhereg ~]> zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool        96.9G  39.1G      1      6  61.4K   448K
  c0t1d0    48.5G  19.5G      0      3  30.8K   224K
  c0t2d0    48.5G  19.5G      0      3  30.5K   224K
----------  -----  -----  -----  -----  -----  -----

Not sure how much one more spindle will reduce the build. [2]

Okay, last test is to remove the following options:


And these yield the biggest savings to date [3]:

[th199096@jhereg spe-build4]> `which nightly` nightly.env 
Time spent in user mode   (CPU seconds) : 7993.05s
Time spent in kernel mode (CPU seconds) : 4818.01s
Total time                              : 45:44.46s
CPU utilisation (percentage)            : 466.7%

The hole we as developers tend to fall into is to want to rebuild everything. We don't always need to rebuild the BFU archives if we are just changing a kernel module. At the BAT, I was rebuilding just the nfs or nfssrv modules and scp'ing them over (I might have hosed NFS don't ya know). My "build" times were a matter of seconds. I spent more time moving the mouse and worrying about whether or not I had changed a header which needed to be installed in my proto area.

And in the end, before I can integrate my changes, I'll need to be lint and cstyle clean, I'll need to build non-DEBUG versions, and I'll need to build for sparc. And I'll need to retest then.

I started off with a moral about questioning advice given to you versus actual experience, but it turns out the increase in dmake concurrency didn't really help, now did it?
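
The utilisation lines in the timing summaries tell the same story. The percentage is just (user + kernel CPU seconds) divided by wall-clock seconds, so it can be recomputed from the first two builds. cpu_util is a hypothetical helper, and the wall-clock times are the reported totals converted to seconds by hand:

```shell
#!/bin/sh
# CPU utilisation as reported in the timing summary:
# 100 * (user + kernel CPU seconds) / wall-clock seconds.
# $1 = user CPU seconds, $2 = kernel CPU seconds, $3 = wall-clock seconds
cpu_util() {
    awk -v u="$1" -v s="$2" -v r="$3" \
        'BEGIN { printf "%.1f%%\n", 100 * (u + s) / r }'
}

cpu_util 10608.23 6272.44 4101.75   # 4 jobs,  1:08:21.75 -> 411.5%
cpu_util 10624.32 7579.56 3875.29   # 36 jobs, 1:04:35.29 -> 469.7%
```

Going from 4 to 36 concurrent jobs only moved the machine from roughly 4.1 to 4.7 busy CPUs on average, which fits with the wall-clock time barely changing.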

Notes

/.make.machines

Going back, I wondered why my test did not complain about not finding /.make.machines:

[root@jhereg scripts]> ls -la /.make.machines 
lrwxrwxrwx   1 root     other         27 Sep 26 12:32 /.make.machines -> opt/onbld/gk/.make.machines
[root@jhereg scripts]> more !$
more /.make.machines
elpaso max=20

So there is a default installed by SUNWonbld.
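
That explains the silent fallback: the file was there, it just had no line for jhereg. The lookup from the debugging code can be replayed against a scratch copy to see both outcomes. max_jobs_for is a hypothetical wrapper around the same grep, tail, and awk pipeline, and the scratch file path is made up for the example:

```shell
#!/bin/sh
# Same lookup as the debugging code, wrapped so it can be pointed at any file.
# $1 = host name, $2 = path to a .make.machines file. Falls back to 4.
max_jobs_for() {
    max=`grep "$1" "$2" 2>/dev/null | tail -1 | awk -F= '{print $2}'`
    echo "${max:-4}"
}

tmp=${TMPDIR:-/tmp}/make.machines.$$
echo "elpaso max=20" > "$tmp"            # contents of the SUNWonbld default
max_jobs_for jhereg "$tmp"               # no jhereg line, so the fallback: 4
printf 'jhereg   max=36\njhereg.central.sun.com   max=36\n' > "$tmp"
max_jobs_for jhereg "$tmp"               # both lines match, last one wins: 36
rm -f "$tmp"
```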

Broken disk?

Hey, wait, don't I really have four disks and not three?

[th199096@jhereg th199096]> iostat
   tty        sd0           sd1           sd2           sd3            cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0  113   0   0    0   66   2   40  304   5   28  303   5   27    3  3  0 93
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@0,0
       1. c0t1d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@1,0
       2. c0t2d0 
          /pci@0,0/pci8086,25f8@4/pci1000,3150@0/sd@2,0

I saw some message before the last jumpstart about taking some disk offline. And I've never really seen jhereg. It is in a lab in Austin.

Okaay, that missing disk is the DVD drive: :->

[root@jhereg ~]> iostat -En
c1t0d0           Soft Errors: 0 Hard Errors: 11 Transport Errors: 6 
Vendor: TSSTcorp Product: CD/DVDW TS-T632A Revision: SR03 Serial No:  
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 11 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DAELAA 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t1d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DAG92A 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c0t2d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: HITACHI  Product: H101473SCSUN72G  Revision: SA25 Serial No: 0827DA6AWA 
Size: 73.41GB <73407865856 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 

Groan, I messed up the 4th build

I got my mail message for that fast lint build and it stated that build3 had finished. I got the wrong directory! I had copied over the nightly.env, fixed the path, and then made an error. So I copied the file over again. Except this time I forgot to change the path!

So the savings may have been false. Another build has been kicked off!

[th199096@jhereg spe-build4]> `which nightly` nightly.env
Time spent in user mode   (CPU seconds) : 7965.57s
Time spent in kernel mode (CPU seconds) : 4818.72s
Total time                              : 46:52.02s
CPU utilisation (percentage)            : 454.6%

So the savings were real.

