Thursday Apr 07, 2005

Sun Ray Audio

The one thing about Sun Ray that I really hate is its audio support. The problem is that utaudio devices are not multiplexing audio devices, so having more than one application accessing audio requires multiple instances of utaudio, and then you have to tell each application which one to use. Yuck.

Life would be worse if it were not for this script, which I use to kick off applications:

#!/bin/ksh

command=$(basename "$1")

case ${UTAUDIODEV:-noset} in
noset)  exec ${1+"$@"}          # Not on a Sun Ray session, just run the application
        # Not reached
        exit 1
        ;;
*)      AUDIODEV=$(/opt/SUNWut/bin/utaudio $$ "${command}")
        trap 'pkill -f "utaudio.*$$"; exit 1' EXIT
        LD_PRELOAD=libc_ut.so           # In case the application opens /dev/audio
        export AUDIODEV LD_PRELOAD      # so the application actually sees them
        ${1+"$@"}               # Run the application
esac



I use it to wrap the various audio commands. Not perfect, but better than the application hanging when it blocks trying to open the utaudio device. The only problem left is that utaudio will start and set the device before it is ready to accept sounds, so this can't be used with audioplay. Grrr.
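For what it is worth I keep the script under a name of my own choosing, utwrap here purely as an example, and kick off anything that wants audio through it:

utwrap realplay         # or whatever audio application you fancy

Nothing clever; it just means each application gets its own utaudio instance without me having to think about it.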

Tuesday Apr 05, 2005

truss output is not the documented interface.

This comes up quite often. If you truss an application that does asynchronous I/O you can get something like this:

/1:     kaio(AIOWRITE, 3, 0x080623A0, 67108864, 0x0000003F0C000200, 0x08062350) Err#48 ENOTSUP
/1:     lwp_unpark(10)                                  = 0
/9:     lwp_park(0x00000000, 0)                         = 0
/10:    lwp_park(0x00000000, 0)                         = 0
/10:    pwrite64(3, "\0\0\0\0\0\0\0\0\0\0\0\0".., 67108864, 0x0000003F0C000200) 


However, none of the aio routines (aiowrite, aioread and aiowait, or aio_write, aio_read, aio_suspend, aio_waitn and lio_listio) can return ENOTSUP. The point, and it seems obvious once you write it down, is that the system call, kaio, is not the published interface; the ENOTSUP return is a private interface with the library, which the library uses to decide to give you aio via threads instead.


The two most common reasons for kaio to return ENOTSUP are that the I/O you are attempting is too big or that the device or file system does not support kaio. While this gives the application a common programming interface, it does have the slight drawback that there is no supported way for the application to know whether the device supports kaio or not.
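The closest you can get is to watch from the outside with truss, something like this sketch, where the pid is whatever your aio-using process happens to be:

# Trace just the kaio calls of a running process; ENOTSUP returns mean
# the library is quietly falling back to doing the aio with threads.
truss -t kaio -p <pid>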

Monday Apr 04, 2005

Tip of the day.

My tip of the day. Probably only applies to folks who use lab systems but could conceivably be useful elsewhere.

If you want to move the root device on your system to another disk or partition, use lucreate and then luactivate to do the job.

Here is one from a Solaris 8 box I had to do some testing on (sorry the output is a bit long):

v4u-80c-gmp03 123 # lucreate  -n qus_root -m /:/dev/dsk/c2t9d0s0:ufs
Please wait while your system configuration is determined.
Determining what file systems should be in the new BE.

Searching /dev for possible BE filesystem devices
                             
Please wait while the configuration files are updated.
Please wait. Configuration validation in progress...

********************************************************************************
Beginning process of creating Boot Environment <qus_root>.
No more user interaction is required until this process is complete.
********************************************************************************

Setting BE <qus_root> state to Not Complete.
Creating file systems on BE <qus_root>.
Creating <ufs> file system on </dev/dsk/c2t9d0s0>.
Cylinder groups must have a multiple of 2 cylinders with the given parameters
Rounded cgsize up to 230
/dev/rdsk/c2t9d0s0:     34840528 sectors in 7394 cylinders of 19 tracks, 248 sectors
        17012.0MB in 337 cyl groups (22 c/g, 50.62MB/g, 6208 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 103952, 207872, 311792, 415712, 519632, 623552, 727472, 831392, 935312,
 1039232, 1143152, 1247072, 1350992, 1454912, 1558832, 1662752, 1766672,

Snip

 34107792, 34211712, 34315632, 34419552, 34523472, 34627392, 34731312,
 34835232,
Mounting file systems for BE <qus_root>.
Calculating required sizes of file systems for BE <qus_root>.
Populating file systems on BE <qus_root>.
Copying file system contents to BE <qus_root>.
INFORMATION: Setting asynchronous flag on ABE <qus_root> mount point </.alt.4145/> file system type <ufs>.
Copying of file system / directory </> is in progress...
Copying of file system / directory </> completed successfully.
Creating compare database for file system </>.
Updating compare database on other BEs.
Updating compare database on BE <qus_root>.
Compare databases updated on all BEs.
Making Boot Environment <qus_root> bootable.
Making the ABE bootable.
Updating ABE's /etc/vfstab file.
The update of the vfstab file on the ABE succeeded.
Updating ABE's /etc/mnttab file.
The update of the mnttab file on the ABE succeeded.
Updating ABE's /etc/dumpadm.conf file.
The update of the dumpadm.conf file on the ABE succeeded.
Updating partition ID tag on boot environment <qus_root> device </dev/rdsk/c2t9d0s2> to be root slice.
Updating boot loader for <SUNW,Ultra-80> on boot environment <qus_root> device </dev/dsk/c2t9d0s0> to match OS release.
Making the ABE <qus_root> bootable succeeded.
Setting BE <qus_root> state to Complete.
Creation of Boot Environment <qus_root> successful.
Creation of Boot Environment <qus_root> successful.
v4u-80c-gmp03 124 # lustatus
BE_name                     Complete  Active  ActiveOnReboot  CopyStatus
------------------------------------------------------------------------
orig                        yes       yes     yes             -         
qus_root                    yes       no      no              -         
v4u-80c-gmp03 125 # luactivate qus_root

********************************************************************************

The target boot environment has been activated. It will be used when you 
reboot. NOTE: You must use either init or shutdown when you reboot.  If 
you do not use one of these commands, the system will not boot using the 
target BE.

********************************************************************************

In case of a failure while booting to the target BE, the following process 
needs to be followed to fallback to the currently working boot environment:

1. Enter the PROM monitor (ok prompt).

2. Change the boot device back to the original boot environment by typing:

     setenv boot-device disk:a

3. Boot to the original boot environment by typing:

     boot

********************************************************************************

Activation of boot environment <qus_root> successful.
v4u-80c-gmp03 126 # init 6
v4u-80c-gmp03 127 # 
INIT: New run level: 6
The system is coming down.  Please wait.
System services are now being stopped.
Print services stopped.
Apr  4 11:04:29 v4u-80c-gmp03 syslogd: going down on signal 15
umount: /usr/local/share busy
nfs umount: /usr/local: is busy
nfs umount: /usr/local: is busy
Live Upgrade: Deactivating current boot environment <orig>.
Live Upgrade: Executing Stop procedures for boot environment <orig>.
Live Upgrade: Current boot environment is <orig>.
Live Upgrade: New boot environment will be <qus_root>.
Live Upgrade: Activating boot environment <qus_root>.
Live Upgrade: Updating partition ID tag on boot environment <qus_root> 
device </dev/rdsk/c2t9d0s2> to be root slice.
fmthard:  New volume table of contents now in place.
Live Upgrade: Updating boot loader for <SUNW,Ultra-80> on boot environment 
<qus_root> device </dev/rdsk/c2t9d0s0> to match OS release.
Live Upgrade: The boot device for boot environment <qus_root> will be 
</dev/dsk/c2t9d0s0>.
Live Upgrade: Changing primary boot device to boot environment <qus_root>.
Live Upgrade: The current boot environment <orig> boots from device 
<disk:a>.
Live Upgrade: The new boot environment <qus_root> boots from device 
</pci@1f,2000/pci@1/scsi@5/sd@9,0:a disk:a>.
Live Upgrade: Activation of boot environment <qus_root> completed.
umount: /usr/local/share busy
umount: /usr/local busy
nfs umount: /usr/local: is busy
The system is down.
syncing file systems... done
rebooting...
Resetting ... 

screen not found.
Can't open input device.
Keyboard not present.  Using ttya for input and output.

Sun Ultra 80 UPA/PCI (4 X UltraSPARC-II 450MHz), No Keyboard
OpenBoot 3.31, 1024 MB memory installed, Serial #15730881.
Ethernet address 8:0:20:f0:8:c1, Host ID: 80f008c1.



Rebooting with command: boot                                          
Boot device: /pci@1f,2000/pci@1/scsi@5/sd@9,0:a  File and args: 
SunOS Release 5.8 Version Generic_108528-29 64-bit
Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
configuring IPv4 interfaces: hme0.
configuring IPv6 interfaces: hme0.
Hostname: v4u-80c-gmp03Configuring /dev and /devices
Configuring the /dev directory (compatibility devices)
The system is coming up.  Please wait.
Live Upgrade: Synchronizing new boot environment.
Live Upgrade: Previous boot environment was <orig>.
Live Upgrade: Current boot environment is now <qus_root>.
NIS domainname is eu.cte.sun.com
Starting IPv6 neighbor discovery.
Setting default IPv6 interface for multicast: add net ff00::/8: gateway fe80::a00:20ff:fef0:8c1
starting rpc services: rpcbind keyserv nis_cachemgr done.
Setting netmask of hme0 to 255.255.248.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway v4u-80c-gmp03
syslog service starting.
Print services started.
volume management starting.
The system is ready.

v4u-80c-gmp03 console login: 

Friday Apr 01, 2005

When to run fsck

Not when the file system is mounted!

I've been banging my head against this one off and on for a few weeks. I got an email from an engineer who was talking to a customer (who is, of course, always right) saying that when they ran fsck on a live file system it would report errors:

    # fsck /
    ** /dev/vx/rdsk/rootvol
    ** Currently Mounted on /
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    UNREF DIRECTORY I=5522736 OWNER=root MODE=40755
    SIZE=512 MTIME=Mar 31 13:07 2005
    CLEAR? y

    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups

    67265 files, 1771351 used, 68625795 free (14451 frags, 8576418 blocks, 0.0% fragmentation)

    ***** FILE SYSTEM WAS MODIFIED *****

I kept telling them that running fsck on a live file system can, and probably will, generate these “errors”. The kernel's in-memory copy of the file system is correct and it will eventually bring the on-disk copy back into line. However, by answering yes they have now corrupted the on-disk copy of the file system, and to make things worse the kernel does not know this, so it may not run fsck when the system boots. The warnings section of the fsck and fsck_ufs manual pages gives you a hint that this is a bad thing to do.

The reason they were running fsck was to check the consistency of the file system prior to adding a patch. The right way to do that would be to run pkgchk.
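Something along these lines does the job without going anywhere near the raw device of a mounted file system (a sketch, not a recipe):

# Verify the contents and attributes of every installed package,
# or use -p to restrict the check to the paths you care about.
pkgchk
pkgchk -p /usr/sbin/fsck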

There are times when it is safe to run fsck on a live file system, but they are rare and involve lockfs. Before you do, make sure you really understand what you are doing; my bet is that if you do understand, you won't really want to.

I believe the message is now understood by all involved, but I'm trying to make sure by adding it to the blogosphere.

grep piped into awk

This has been one of my bugbears for years and it comes up in email all the time. I blame it on the VAX I used years ago, which was so slow that this really mattered at the time, but now it is mostly just because it is wrong. This is what I mean, grep piped into awk:

grep foo | awk '{ print $1 }'



Why? Because awk can do it all for you, saving a pipe and a fork and exec:

nawk '/foo/ { print $1 }' 



does exactly the same thing; I use nawk as it is just better. It gets worse when I see:

grep foo | grep -v bar | awk '{ print $1 }'

which should be:

nawk '/foo/ && ! /bar/ { print $1 }'

Mostly these just come about when typed on the command line, but when I see them in scripts it makes me roll my eyes. They led me to come up with these short rules for pipes:

If your pipe line contains      Use
------------------------------  ----
grep and awk                    nawk
grep and sed                    nawk
awk and sed                     nawk
awk, sed and grep               nawk
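To make the grep and sed row concrete, here is the sort of rewrite I mean, with entirely made-up patterns:

# before: two processes where one will do
grep foo file | sed 's/old/new/'
# after: nawk matches the lines and does the substitution itself
nawk '/foo/ { sub(/old/, "new"); print }' file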



Like the pirates' code, these are not so much rules as guidelines.

Thursday Mar 31, 2005

Fair Sun Ray

Yesterday we got another build of Nevada up and running on our Sun Ray server, so the plan seems to be working. My colleague took the opportunity of fewer users on the upgraded server to explore whether the fair share scheduler was doing what it was supposed to be doing. He ran up more looping processes than we have CPUs on the system, and using prstat -J we could see that he had nearly 100% of the system, but my session still worked just fine. I did not perceive any slowdown at all. Then, just to prove the point, I started up a load of CPU-bound jobs (actually one process with lots of threads) and sure enough the system balanced out at 50% each; this time my session became slightly sluggish. The other users either did not notice, did not care, or just thought it was best to leave us both alone as we are mad.
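For anyone wanting to repeat the experiment, it amounted to roughly this, sketched from memory rather than copied from what was actually typed:

# Start one CPU-bound loop per CPU (8 here is just a placeholder),
# then watch the CPU usage summarised by project.
i=0
while [ $i -lt 8 ]; do
        while :; do :; done &
        i=`expr $i + 1`
done
prstat -J 5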

We did have a slight mishap, though, with the system livelocking, which appears to be the fault of NFS. That is unfortunate, but hey, that is why we are living on the edge.

Thursday Mar 24, 2005

More internal aggregation

Robin responded to my last entry by asking if anyone would want to aggregate my blog with theirs. This is an interesting question, but not the problem I feel that internal blogging could solve.

For the last 8 years or so the group I have been working for has been dispersed around the globe, and this is good. However, there are significant problems that face teams who are geographically dispersed, like how do I know what my colleagues in Singapore or Bangalore or in the Valley are doing today? We tried to have regular meetings and conf calls, but at what time do you hold these? We could all send a short email every day, but I know I would junk them when busy and not have them when I need them.

However, an internal blog aggregation would allow me to see what my colleagues are doing now, at a time of day that suits me.

If others want to read them, good, but that is not the audience I am after.

Wednesday Mar 23, 2005

Group Aggregation

I got a bit sidetracked today. There has been much talk about whether group blogs would be a good idea or not, and it boiled down to the concept of having internal blogs that are aggregated by team, and then those teams aggregated by org, until you get an aggregation of all internal blogs.

Seems like a cracking idea to me, so after getting planetcycling.org up and running I thought it would be simple to get the planet software running here to aggregate the internal blogs for my team, to start with. Not quite. It gets some strange errors when running with the Solaris Python:

Traceback (most recent call last):
  File "planet.py", line 240, in ?
    template = TemplateManager().prepare(template_file)
  File "/net/blahblahblah/planet-pts/htmltmpl.py", line 204, in prepare
    precompiled = self.load_precompiled(file)
  File "/net/blahblahblah/planet-pts/htmltmpl.py", line 341, in load_precompiled
    self.lock_file(file, LOCK_UN)
  File "/net/blahblahblah/planet-pts/htmltmpl.py", line 274, in lock_file
    fcntl.fcntl(fd, fcntl.LOCK_UN)
IOError: [Errno 22] Invalid argument


A bit odd. After a bit of investigating I discovered that fcntl.LOCK_UN is set to 8 and not 3. Change that and all is well. I logged the bug and will see what happens, but if you are trying to get planet running on Solaris this is for you.


So now my team has an aggregation of all our internal blogs. The trouble is there is only one of them and that is mine, and who would be interested in what I'm doing, which was not supposed to be playing with blog aggregators, at least not today. Now that I have it working in its basic form, building a tree of blogs should be easy.


Tuesday Mar 22, 2005

Another reason to like 10

Today's reason for liking Solaris 10 is the lack of a 32-bit SPARC kernel. It cuts the time to test my driver in half. Pity I have still had to port back to 8 and 9, so I have spent the last two days testing them. I guess this will not last, as eventually an x86 and AMD64 port of the driver will be done and the test time will increase again, but today it is just SPARC.

Friday Mar 18, 2005

Instant Messaging

I now know why my teachers were always trying to get me to write out the question before answering it. They were preparing me for instant messaging. This is why I fell out of love with IRC many years ago: it is so easy for the conversation to become multi-threaded, even when only two people are “chatting”, and thus the answer to one question does not end up next to the original question but instead next to an irrelevant, or worse, question later on. You get the same potential problem with all written forms, but at least with email you can easily quote the original question.

As Peter pointed out, instant messaging is back and increasingly in use within corporations, including Sun, but none of the IM systems have solved this threading problem.

Are there any threaded IM solutions out there that have been tried? I'm not sure it would work but equally I'm not sure it would not.

Thursday Mar 17, 2005

Alan is number 1

I'm sure Alan would not be happy about this.







Tuesday Mar 15, 2005

Ultra 60, two careful owners, never thrashed.

I finally EOL'd the Ultra 60 at home, as since getting the Sun Ray working I had booted it just twice: once to warm my feet, I kid you not, and once to wipe my home directory off it so I could transfer it to a colleague.

So I am now at the whim of the staff who manage my Sun Ray server and as such I'm running on a system with the latest development build of Solaris on it, which was booted on Monday morning by the folks in Singapore while we were sleeping, or in my case probably not.

: estale.eu FSS 1 $; uname -a
SunOS estale 5.10.1 snv_10 sun4u sparc SUNW,Sun-Fire
: estale.eu FSS 2 $; uptime
  5:26pm  up 1 day(s), 12:11,  19 users,  load average: 1.52, 1.71, 1.88
: estale.eu FSS 3 $;

Not bad for a release that only came out at the end of last week.

Tuesday Mar 08, 2005

Disk Scrubbing

I wrote and maintain a disk test tool that gets used for all sorts of purposes for which it was never intended. This week the question of disk scrubbing came up and I was surprised to see just how fast it could scrub a minnow. Disk scrubbing is a uniquely strange form of I/O, as it is all about writing as much data as possible and you never need to read the data back. But being able to scrub at a rate of 170 Mbytes/sec to a single device via a single connection was pleasantly surprising. I guess I just have not been looking that hard before. Anyway, it gave me an excuse to try the USCSI option to the disk tester in anger, so it was worth writing after all!

What is odd is the misconceptions about what is needed, and when, from a disk scrubber. If you are using a disk in a secure data centre then there is no need to scrub the disk with multiple passes, which prevents, or at least hinders, techniques like magnetic force microscopy, since those techniques require physical access to the disk drive. You have already accepted that the data centre is physically secure, otherwise your data should never have gone there.

The issue is now just a question of how your data can be protected from over-the-network access. In that case, if the disks are going to be reused by another user they need to be scrubbed, just once, to prevent the data being read remotely by a user who now has read/write access to the disk but does not have physical access. If the disk were to accept a firmware download then it might still be possible to do harm, but if the disk in question is in fact an array that cannot be configured from the host computer then just a single write will do.
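If you don't have a dedicated tool to hand, a single pass is easy enough with dd; a sketch only, and the device name is a placeholder for whichever disk you really mean to destroy:

# One pass of zeros over the whole disk; s2 is the traditional
# whole-disk slice. Double-check the device name before pressing return.
dd if=/dev/zero of=/dev/rdsk/cXtYdZs2 bs=1024k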



Thursday Mar 03, 2005

All my files belong to nobody

So the new AMD64 build system was ready to go; all that was required was to bring the old one down, change the new one's name to pod5 and bring it up. It was not quite that simple. The old LX50 powered off nicely and the new system, running build 09 of Nevada, was then sys-unconfiged so that when it booted I could give it its new identity. Again all this went apparently smoothly; on booting up I configured DNS and NIS+ and I could log in as a normal user. However, all was not quite right. All my files were owned by the user “nobody”, so I suspected a problem with nfsmapid, as our server is running 10 and so has NFS v4. After some fiddling about I found Eric's blog, which pointed me at the /var/run/nfs4_domain file, which confirmed that the domain name was correct.

A pstack of nfsmapid showed it stuck trying to call into NIS+, which I found odd as all other commands were O.K. However, checking smf showed that nis_cachemgr was not enabled. Starting that and restarting nfsmapid cured the problem: user error.
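For the record, the poking about amounted to roughly this; a sketch, and the service names are the usual Solaris 10 ones rather than anything from my scrollback:

cat /var/run/nfs4_domain              # confirm the NFSv4 domain is right
pstack `pgrep -x nfsmapid`            # see where nfsmapid is stuck
svcs -x network/rpc/nisplus           # the service that runs nis_cachemgr
svcadm enable network/rpc/nisplus     # enable it...
svcadm restart network/nfs/mapid      # ...then restart nfsmapid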

Slightly disappointed that nfsmapid did not bleat more publicly that something was not right before handing all remote files to nobody.



Friday Feb 18, 2005

Unexpected benefits

All our labs have remote-controlled power so that systems can be power cycled remotely if needed. However, there is an added, unexpected benefit to this. We can power off unused systems safe in the knowledge that if they are needed, even remotely, they can be powered back on again. Even if you don't believe in global warming, at least we are saving some cash by not heating a lab and therefore not having to cool it quite so much.

Thursday Feb 17, 2005

Daring to be the same

Today was different. A customer came to see us, which in my job is rare, too rare. The high spot for me was watching a colleague do his Dtrace demo, which is highly interactive with the audience, not on his laptop but on our production Sun Ray server. None of the users noticed, even when he traced every call to mutex_enter, which, even as it was being typed, made me wonder whether it was wise.

The other thing to come out was the question of when to be different. The customer has a highly customised environment with minimal packages installed and no “unneeded” daemons running. That is fine, there are good reasons to do this, but I just had to ask: why? What was the business reason for not running each daemon and not installing each package? Well, it was to reduce risk: the risk of a security breach, and the amount of stuff they had to support. However, in reducing risk they had opened up a number of problems with the system; it ran, but there were problems with a number of subsystems that would not have been seen had the system been left alone. What, in my opinion, was missing from the risk assessment was the risk of being different. By heavily customising the systems down to a minimal system they became so different from any other system that they were almost guaranteed to find problems not seen by anyone else.

It is not to say that this should not work in an ideal world, but this is not an ideal world. Running large systems is about risk management and reducing the unexpected. I hope that the customer went away with a better understanding of the risks that are exposed by being different. If being different does not get you an edge over your competitors then dare to be the same.

Thursday Feb 10, 2005

On the bleeding edge with Sun Ray

Running Sun on Sun means using the latest and greatest software on your desktop, day in, day out. For us that means we need a way to upgrade our Sun Ray server every fortnight with a new build.

The problem is that since the Sun Ray server is used by a lot of people, down time is not an option, yet at the same time the reason we are doing this is to flush out any bugs, and this can occasionally impact the system. To balance these things we have a Sun Fire 4900 split into two domains: one with two system boards, the other with only one.

We then upgrade the domain with one board to the new release and, when it is working, move a system board from the other domain into it and advise the users to migrate. When the next release comes along we again upgrade the single-board domain and move the board back. I hope this table makes it clear.

Current Build   Enoexec                      Estale
                Build   Number of Boards     Build   Number of Boards
-------------   -----   ----------------     -----   ----------------
1               1       2                    1       1
2               1       1                    2       2
3               3       2                    2       1
4               3       1                    4       2
5               5       2                    4       1



This allows us to always be on the bleeding edge, exercising dynamic reconfiguration, Live Upgrade and Sun Ray, while still having systems up for, on average, just under four weeks.

Tuesday Feb 08, 2005

smf meets nis_cachemgr

If you use NIS+ and reboot a system you will know that occasionally the files in /var/nis get corrupted and nis_cachemgr will dump core. So many people opt for starting nis_cachemgr with the “-i” flag so that it does not use the cache at start time and instead goes and gets a new one.

So how do you do this with smf? Oddly there is no option in the manifest to set this:

# svccfg export nisplus
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
  <service name='network/rpc/nisplus' type='service' version='0'>
    <dependency name='keyserv' grouping='require_all' restart_on='none' type='service'>
      <service_fmri value='svc:/network/rpc/keyserv'/>
    </dependency>
    <exec_method name='start' type='method' exec='/lib/svc/method/nisplus' timeout_seconds='60'>
      <method_context/>
    </exec_method>
    <exec_method name='stop' type='method' exec=':kill' timeout_seconds='60'>
      <method_context/>
    </exec_method>
    <instance name='default' enabled='true'>
      <property_group name='application' type='application'>
        <stability value='Unstable'/>
        <propval name='emulate_yp' type='boolean' value='false'/>
      </property_group>
    </instance>
    <stability value='Unstable'/>
    <template>
      <common_name>
        <loctext xml:lang='C'>NIS+</loctext>
      </common_name>
      <documentation>
        <manpage title='rpc.nisd' section='1M' manpath='/usr/share/man'/>
      </documentation>
    </template>
  </service>
</service_bundle>



However looking in “/lib/svc/method/nisplus”, there is a property that would be used if set:

        cache=`/usr/bin/svcprop -p application_ovr/clear_cache $SMF_FMRI \
            2>/dev/null`
        if [ $? != 0 ]; then
                cache=`/usr/bin/svcprop -p application/clear_cache $SMF_FMRI \
                    2>/dev/null`
        fi

        [ "$cache" = "true" ] && cachemgr_flags="$cachemgr_flags -i"

So if you set “application_ovr/clear_cache” or “application/clear_cache” to true you will get the -i option.

# pgrep -fl nis_cache
  260 /usr/sbin/nis_cachemgr
# svccfg -s svc:/network/rpc/nisplus:default \
    setprop application/clear_cache = boolean: "true"
# svcadm refresh  svc:/network/rpc/nisplus:default
# svcprop -p  application/clear_cache svc:/network/rpc/nisplus:default
true
# svcadm restart svc:/network/rpc/nisplus
# pgrep -fl nis_cach  
1788 /usr/sbin/nis_cachemgr -i



I'm sure this is all crystal clear in the docs.

Wednesday Feb 02, 2005

Stick to the published interfaces

Today's question was one that comes up periodically: how can you determine whether dbm_store has failed due to a hash collision as opposed to some other failure, like out of memory or disk space or something else? The answer is that you can't. However, it did get me thinking what a bad thing it would be to read the source and then develop a program based on the implementation of dbm_store rather than the documented interfaces. Now this is not new or rocket science, but it is something people will have to be aware of if they use OpenSolaris to find out how things work. It is good to know how things work, but the interfaces are still just the ones that are documented.

The bit that made me smile was that the test program I was sent dutifully checked errno after dbm_store failed and then reported it, as if dbm_store had set it. Changing the test to set errno before the call to dbm_store allowed me to choose the error I would see.

Monday Jan 31, 2005

Solaris 10.

Solaris 10 is go. It is hard to think of a previous minor release that had so much. Solaris 2.3, 2.4, 2.5 and 2.6 were all incrementally great improvements in quality. Not that they lacked features, but having worked in the Answer Centre at the time of them all, I just recall how the problems we got kept getting harder, a trend that continues to this day. That is to say, simple problems were being ironed out, drivers hardened, the OS made more stable, and the remaining bugs harder to find.

Solaris 7 was of course 64-bit for SPARC, and the “2” was lost, so that was a pretty big change, the 64-bit part, not the 2. 8 and 9 again were incremental improvements; not that there were not massive improvements in them.

10 really has extra functionality. Zones offer a new technology that will allow customers to consolidate and provide new services in new ways. Dtrace is already allowing visibility into the system that offers an enormous opportunity for debugging all sorts of issues, including tuning systems and applications, in a way that has never been seen before. SMF is a whole new way of managing system services, and it allows FMA to handle a large number of hardware problems without system down time. I'm really looking forward to seeing it deployed on customer systems and watching the new things our customers will do with things like zones and dtrace.

About

This is the old blog of Chris Gerhard. It has mostly moved to http://chrisgerhard.wordpress.com
