Sunday Oct 21, 2007

splitting my blog

So I have decided to split my blog: personal stuff goes to http://samoyed.org.uk/blog and Sun-related technical stuff goes here. That way all the stuff about samoyeds and cycling will be separate, and I might just post technical things more often. It's been maniacally busy since we came back from our summer holidays in Scotland, or at least that is my excuse for not posting more often.

Wednesday Nov 01, 2006

b51 on my laptop

So I have upgraded my laptop (Toshiba Satellite A40) from nv build 27 right up to date at b51.

There was a small hiccup with my chipset and some new hardware graphics accelerators, but a bit of /etc/driver_aliases fiddling cured that. It seems much faster, but that could just be later versions of software.

I like the new firefox and the new music player.

I just installed gnokii - it took a bit of hacking but most of it is working now, just need a bit more work on the ringtone editor.

So the next job is to get the sunray server up to b51 for s10 pre fcs! without losing my wife's home directory..

Sunday Apr 24, 2005

too smart

0 miles on the bike - so embarrassing.

15 miles in the smart.

It managed 185 miles on 15 litres of unleaded = 56 mpg in the last two weeks.
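
A quick check of that figure, as a minimal sketch. It assumes imperial gallons (4.54609 litres each), which is what UK mpg figures normally use; the function name is just for illustration.

```python
# Assumption: "mpg" here means miles per imperial gallon.
LITRES_PER_IMPERIAL_GALLON = 4.54609


def mpg_imperial(miles, litres):
    """Convert a distance and the fuel used (in litres) to miles per imperial gallon."""
    gallons = litres / LITRES_PER_IMPERIAL_GALLON
    return miles / gallons


print(round(mpg_imperial(185, 15)))  # 56
```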


Tuesday Apr 19, 2005

as expected..

At Home.

Life continues as normal, work, school, dog walking, homework, sleep.

At Work.

As predicted, the folks who are complaining that their sunray session is slow have all had spinning applications using up their cpu shares, as we use the Fair Share Scheduler on the box. I suspect that it is there to protect everyone else from my testcase programs that sometimes escape, but in these cases it stops them wasting my cpu cycles. I find it slightly disappointing that the quality of userland code is so low when you compare it to the Solaris kernel.

In Between.

15 miles in the smart - it was raining and hailing today

0 miles on the bike.. lance retires - maybe
tim


Monday Apr 18, 2005

better late than never...

At Home.

We had a good family walk on Puttenham Common whilst the weather was warm. We saw lots of wildlife like swimming grass snakes. I think we tired the dogs - they seemed to slow down a little. Here is where we went - walk

Doctor Who was very good, the sort of programme the BBC should concentrate on, it brought back lots of memories of hiding behind the sofa on a Saturday night.

At Work.

My PCI firmware download problem is finally solved. The PCI card's firmware requires no less than 70 usecs between writes, but the PCI analyser trace showed two writes very close together, with the first one being delayed. This is a case of a posted write being held in a bridge and only flushed by the next write.  Luckily the clever folks who designed PCI thought of this and made it a strict rule that a bridge has to complete posted writes before it can satisfy a read. So the solution is for the software to read a memory location on the card straight after a write and before starting the 70 usec timer.  I wonder how many folks realise the asynchronous nature of PCI writes, with data held in the bridges and the host unaware.

A colleague has been trying to get the POSIX aio routines working, this is aio_read() and friends. They seem so much more difficult to use than the native Solaris aio routines like aioread(). I keep meaning to explore the new event framework that arrived quietly in Solaris 10, see man port_create()/port_get().
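
The effect of the read-back can be sketched with a toy model. Everything here is an assumption made for illustration: the ToyBridge class, the 200 usec hold on the first write and the 1 usec cost of a read are invented, and time is simulated rather than measured. Only the 70 usec requirement and the rule that a read forces posted writes out first come from the description above.

```python
MIN_GAP_US = 70  # spacing the card's firmware needs between writes


class ToyBridge:
    """Toy bridge: a write may sit in the bridge until later traffic pushes it out."""

    def __init__(self):
        self.pending = []   # posted writes: (scheduled_arrival_us, value)
        self.arrivals = []  # (arrival_us, value) as seen by the card

    def _flush(self, now_us):
        # Everything still held reaches the card no later than now_us.
        for due_us, value in self.pending:
            self.arrivals.append((min(due_us, now_us), value))
        self.pending = []

    def write(self, now_us, value, hold_us):
        self._flush(now_us)  # earlier posted writes go out ahead of this one
        self.pending.append((now_us + hold_us, value))

    def read(self, now_us):
        self._flush(now_us)  # posted writes must complete before a read is satisfied

    def settle(self):
        self._flush(float("inf"))


def download(values, read_back, first_hold_us=200):
    """Return the gaps (in usecs) between writes as the card sees them."""
    bridge = ToyBridge()
    now = 0
    for i, value in enumerate(values):
        # Hypothetical: only the first write gets stuck in the bridge.
        bridge.write(now, value, hold_us=first_hold_us if i == 0 else 0)
        if read_back:
            now += 1          # assumed cost of the read round trip
            bridge.read(now)
        now += MIN_GAP_US     # the driver's timer starts after the write (or read)
    bridge.settle()
    times = [t for t, _ in bridge.arrivals]
    return [b - a for a, b in zip(times, times[1:])]


print(download([1, 2, 3], read_back=False))  # first gap collapses
print(download([1, 2, 3], read_back=True))   # every gap is at least MIN_GAP_US
```

Without the read-back the host's timer runs from the moment it issued the write, not from the moment the card received it, which is how two writes can land back to back.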

In Between.

15 miles smart - its very wet here in the UK.

15 miles on the bike, it was very wet!


tim

Tuesday Apr 12, 2005

two / filesystems on one machine!

At Home.

Nice walk in Farnham Park this evening, whole family seemed to enjoy it. We disturbed some of the deer; luckily the samoyeds didn't spot them.

At Work.

Customer installed QFS, rebooted and then QFS wouldn't start; it complained it could not get a major number for qfsioc. Remove and reload the package and it would work until the next reboot. Much debugging had gone on to no avail. The answer came from the explorer run that had been requested. The system's OBP boot device pointed towards the /devices path for c0t0d0s0, but the system had a triple mirrored root device using disksuite. The explorer showed that the mirror that occupied c0t0d0s0 was in maintenance state. So the booting machine accesses the physical c0t0d0s0 device and reads the bootblock, /etc/system and several other kernel files, and then loads the required drivers and mounts the pseudo sds root device. From then on any updates go to the surviving two mirrors and not to the c0t0d0s0 copy. So the QFS filesystem drivers had been applied to the pseudo sds root filesystem but not to the frozen c0t0d0s0 filesystem. Hopefully all should be well when the needs maintenance state is resolved.

Making progress on the syseventd escalation, two days reading the code has allowed me to propose a simple experiment to see if the problem is a simple race between syseventd starting devfsadmd and devfsadmd getting to the point where it can subscribe to get events. Let's see what is in the output of the ..

truss -d -D -fl -o /var/tmp/out /usr/lib/sysevent/syseventd

In Between.

15 miles in the smart - legs ached.

0 miles on the bike.


tim

Monday Apr 11, 2005

no excuses today.


At Home.

School starts again. Yesterday we went for a nice 5 mile walk around the Basingstoke Canal at Dogmersfield: really nice day, dogs behaved, amazing scenery - ancient houses in traditional maintained English land
 the walk.


At Work.

I had a realtime discussion today, one I have had many times. Running a process in the Realtime scheduling class does not make it a realtime process! You need to follow the realtime rules to ensure your application doesn't do anything that might not have a bounded response time. Disk drives can take seconds to do recovery of a bad block, which is bad for your pseudo realtime application if it is paging in from that block. There are cluster implementations that have had problems over the years from not following the rules; obviously, as the Sun Cluster sustainers sit near me, Sun Cluster does not have that failing.

Syseventd is doing my head in! How does the devfsadm_mod channel subscribe to EC_devfs events... more debug code required. Like many technical subjects that you think you understand, the more you look the more you realise you don't understand it at all..


In Between.

0 miles in the smart.

15 miles on the bike - 180 beats/minute dance music seems to up the speed a bit, but my legs do ache now, any recommendations for good music to listen to?

Friday Apr 08, 2005

its friday, its 19:28 , its home time!

At Home.

Helped one child with a model volcano, it's got a chamber full of sodium bicarbonate and a squeezy bottle of vinegar built in - should make his science project go with a fizz. Had my curry a day early as I was out and about shopping last night. I tend to go to the Gulshan in Farnham as the food is very good, just spiced how I like it. I usually take one of the children as it is a good time to sit and talk whilst we wait for it to cook.

Our Ford Galaxy car insurance is due and the current insurer wanted about 700 pounds for a year, so I used the internet and got several quotes for half that amount, the cover and excesses seemed similar so I picked one, so it pays to shop around, beat the lethargy.

At Work.

One lingering escalation I have involves clustered machines intermittently producing "devfsadmd not responding" messages in /var/adm/messages. Syseventd is a really useful but little known collection of code. It handles events from the kernel and farms them out to modules that have registered interest. It is written in a way using threads/doors and event queues to avoid a client module being able to hang the main thread. There are four modules that are loaded from /usr/lib/sysevent/modules.

  • picl_slm.so - to handle sending events to picld, This registers to get passed EC_DEVFS/EC_DR events so it can respond to changes in /devices land.

  • sysevent_conf_mod.so - This is the user extensible part. The module looks in /etc/sysevent/config for files that end ",sysevent.conf". It expects them to have a format that describes a program to invoke when incoming events from the kernel match some tokens. The README in that directory has some hints.  Several software manufacturers use this to allow their software to be poked when something changes in /devices land so they can see what changed. Solaris 10 introduces syseventadm to manage these config files and the man page describes the tokens.

  • sysevent_reg_mod.so - this module supplies the infrastructure for other programs to register themselves as event consumers using libsysevent, posting events goes via the kernel. This is my current suspect..

  • devfsadmd_mod.so - this module gets events from the syseventd core but only wants EC_DEVFS type events. Each event gets sent through a door to devfsadmd, if there are problems then it forks and execs devfsadmd. So that is how devfsadmd gets started!

In this escalation there is a problem contacting devfsadmd so we start a new one  only to discover that the old one is still there. The code loops a couple of times trying this before producing.. "/devices or /dev may not be current (devfsadmd not responding) Run devfsadm". What I need is a testcase that leads to the devfsadmd_mod deciding that no one has subscribed to EC_DEVFS events so not sending the event. Any suggestions gratefully accepted..
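
The suspected failure mode can be sketched with a toy dispatcher. This is not syseventd's code or API: the class, its methods and the drop-when-nobody-subscribed behaviour are assumptions made only to illustrate how an event posted before the consumer has subscribed could go missing.

```python
from collections import defaultdict


class ToyEventDaemon:
    """Hypothetical stand-in for an event daemon that farms events out by class."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # event class -> list of handlers
        self.dropped = []                     # events nobody was listening for

    def subscribe(self, event_class, handler):
        self.subscribers[event_class].append(handler)

    def post(self, event_class, payload):
        handlers = self.subscribers.get(event_class, [])
        if not handlers:
            # Assumed behaviour: no subscriber means the event is not sent.
            self.dropped.append((event_class, payload))
            return 0
        for handler in handlers:
            handler(payload)
        return len(handlers)


daemon = ToyEventDaemon()
received = []

# The race: an event arrives before the consumer has subscribed.
daemon.post("EC_devfs", "early event")
daemon.subscribe("EC_devfs", received.append)
daemon.post("EC_devfs", "late event")

print(daemon.dropped)
print(received)
```

A testcase along these lines would need some way to delay the consumer's subscription relative to the first event, which is the part the truss experiment is meant to expose.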

In Between.

15 miles in the smart, it was snowing and sleeting at 2 degrees centigrade this morning!
0 miles on the bike, no excuse unlike chrisg.
tim

Wednesday Apr 06, 2005

time for bed.

At Home.

We have visitors as it is the Easter school break, I hope the recycling bin is big enough for all the wine bottles.
I discovered that my father did his Sapper basic training in the army camp that was flattened to make our campus by Hawley Lake - small world.

I worked out how to rip (as I think the young folks say these days) from a cd using cdrw and then convert that wav file into mp3 using lame from http://www.mp3dev.org/ then just mount the mp3 player and cp the mp3's onto it - fab.

At Work.

Gosh it is busy, I much prefer that to quiet periods. Apart from the normal boring escalations I have a couple of interesting ones..

  • An empty directory that rmdir can't remove. truss shows rmdir() failing with EEXIST as the errno, just as though the directory wasn't empty. ls -ali shows that the inodes are unique (so no hard links) but the link count on this empty directory is > 2. So how is this directory manipulated? sftp-server! That is a file transfer protocol over ssh with an ftp like command line. As it was being run as root, the rm implementation in sftp-server just applies unlink() to whatever it is told to; its rmdir implementation uses rmdir() to remove directories. To protect against erring clients/users I logged a bug against the sftp-server Sun ships in Solaris 10 so that it will check if the target of an rm command is a directory and fail if it is.
  • A machine with pcisch host/pci bridge was showing one cpu taking 200k interrupts/sec. mpstat shows the interrupts in the "intr" column, which counts all interrupts, and not in the "ithr" column where interrupts at a pil < 10 are recorded. lockstat profiling shows nothing on that cpu apart from some interesting latency issues with the level 14 profiling interrupt. But other cpus seem to have a lot of stacks in pci_cdma_sync(). The new dma sync mechanism bites back: scarily, in modern computers dma's can finish before the data really arrives, so device driver writers must use ddi_dma_sync() to synchronise things. The new hi-performance mechanism relies on the host bridge having a rule that all outstanding dma must be completed before an interrupt can be asserted, so the ddi_dma_sync() call makes the pcisch chip send a level 14 interrupt for every ddi_dma_sync() call. The interrupt rate can be very high but the handler is very quick so causes little load, but many tools that watch systems find the interrupt numbers disconcerting and trigger alarms...
  • truss with the -d and -D options records the times, both absolute and deltas, when events happen. These events are usually the return from syscalls, so the delta time reported includes userland processing before the current syscall as well as the time in the syscall. One of the little published but very important changes in Solaris 10 is to introduce the -E flag, which times the syscall from start to finish, so truss -d -D -E -fl will be very useful, as it will allow us to differentiate user and system time.
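
The rmdir behaviour in the first item is easy to reproduce for the ordinary (non-empty) case. A minimal sketch, using a scratch directory and hypothetical names: rmdir on a directory that still has a child fails with ENOTEMPTY on some systems and EEXIST on others (POSIX allows either), and succeeds once the child has gone. It does not reproduce the corrupted link count itself.

```python
import errno
import os
import tempfile


def rmdir_errno(path):
    """Try to remove a directory; return None on success or the errno name on failure."""
    try:
        os.rmdir(path)
        return None
    except OSError as err:
        return errno.errorcode.get(err.errno, str(err.errno))


def demo():
    with tempfile.TemporaryDirectory() as top:
        victim = os.path.join(top, "victim")
        child = os.path.join(victim, "child")
        os.makedirs(child)
        # On traditional Unix filesystems each subdirectory adds one link to its
        # parent via "..", so an empty directory normally shows 2. Other
        # filesystems may report something different.
        print("link count with one child:", os.stat(victim).st_nlink)
        nonempty = rmdir_errno(victim)  # fails: the directory has a child
        os.rmdir(child)
        empty = rmdir_errno(victim)     # succeeds once the child is gone
        return nonempty, empty


print(demo())
```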

In Between.

15 miles in the smart as it was raining heavily
15 miles on the bike, kosheen on the mp3 ,but it was very cold - numb feet both ways.

tim

Monday Apr 04, 2005

sorry for the delay.

At Home.

No more death threats, but I had a thought that it might be as the mutts start on their daily walk they bark for the first hundred yards - maybe we have a night shift worker nearby.  Visited my parents, very relaxing in their rural back garden, shame it's a nine hour round trip.

At Work.

Closed lots of escalations that timed out whilst I was on vacation, I always find it slightly disappointing when escalations sort of peter out rather than getting a technical solution. Our sunray server runs us under the Fair Share Scheduler ( man FSS) which is rather good as I spend a lot of time writing testcases to exhibit customer problems. One program I call the "thingy scheduler": it has multiple processes that share a MAP_SHARED segment, in that segment are a load of stacks and contexts, so using setcontext/getcontext and alarm() the contexts get timesliced between the various processes without a hint of libthread. It is all run from a config file and I mistakenly started it with 200 processes and ... no one complained. Instead of swamping the machine I had 29% of the cpu resources.

This morning was taken debating pci bus standards with a third party vendor. The customer has done an excellent job of managing all the vendors and gathering data.  I don't think it is our problem as the defect is observed deep down on a pci bus in a third party pci expansion unit with third party cards and drivers, but it is interesting helping them move to a quick solution as well as stretching my meagre knowledge of pci standards.

In Between.

Lots of miles in the smart car. ABS worked well when someone failed to stop at a T junction.

Many miles in the Bus, 42 mpg with 6 of us and luggage travelling in comfort.

26 miles on the bike, now we are in GMT I have ditched all the lights but the LED front and back lights.

We are going camping this summer, so with the dogs we need a trailer to hold all the stuff, I need to choose a big suitable trailer - will it be an Ifor Williams one?


tim

Tuesday Mar 22, 2005

usb tuesday

At Home.

Nothing much to report, my eldest asked for a chemical brothers CD for his birthday - pretty cool Huh!

At Work.

Spent half the day trying to decode what a customer's problem really was. We had several iterations of decoding techno babble until a clear problem statement emerged. With that we have a plan to work to a solution so let's wait and see what happens.

Connected the webcam to a sunblade 2000 in our lab, pulled down the usbskel driver from http://docs.sun.com/app/docs/doc/816-4854/6mb1o3ah2?q=usbskel&a=view the only thing missing is a Makefile. Here is what I used..

# Build the usbskel sample driver as a 64-bit SPARC kernel module.
all: usbskel

usbskel: usbskel.c usbskel.h
	cc -D_KERNEL -xarch=v9 -c usbskel.c
	ld -dy -r -o usbskel usbskel.o -N misc/usba

Not the most elegant but enough to get a driver to put in /kernel/drv/sparcv9 together with this in /kernel/drv/usbskel.conf

usbskel_dumptree=1;
usbskel_parse_level=1;

and when I add_drv the usbskel driver to own interface 0 of the device thus..

add_drv -m '* 0666 bin bin' -i  'usbif46d,8f0.100.config1.0' usbskel

I get this in /var/adm/messages...

 usbskel0: parse_level set to dump specific interface
 Port2:      USB descriptor tree for  Camera
 Port2:      highest configuration found=0
 Port2:      Configuration #0 (Addr= 0x300027f11b8)
 Port2:      String descr=<null string>
 Port2:      config descr: len=9 tp=2 totLen=173 numIf=3 cfgVal=1 att=0xa0 pwr=50
 Port2:      usb_cfg_data_t shows max if=2 and 0 cv descr(s).
 Port2:               interface #0 (0x300025068b0)
 Port2:              Alt #0 (0x300027ff0b0)
 Port2:              String descr=<none>
 Port2:              if descr: len=9 type=4 if=0 alt=0 n_ept=2 cls=255 sub=255 proto=255
 Port2:              usb_alt_if_data_t shows max ep=1 and 0 cv descr(s).
 Port2:                  endpoint[0], epaddr=0x81 (0x30003f02918)
 Port2:                  len=7 type=5 attr=0x1 pktsize=0 interval=1
 Port2:                  endpoint[1], epaddr=0x82 (0x30003f02930)
 Port2:                  len=7 type=5 attr=0x3 pktsize=1 interval=16
 Port2:              Alt #1 (0x300027ff0e8)
 Port2:              String descr=<none>
 Port2:              if descr: len=9 type=4 if=0 alt=1 n_ept=2 cls=255 sub=255 proto=255
 Port2:              usb_alt_if_data_t shows max ep=1 and 0 cv descr(s).
 Port2:                  endpoint[0], epaddr=0x81 (0x3000250b0b0)
 Port2:                  len=7 type=5 attr=0x1 pktsize=1023 interval=1
 Port2:                  endpoint[1], epaddr=0x82 (0x3000250b0c8)
 Port2:                  len=7 type=5 attr=0x3 pktsize=1 interval=16
 Port2:               interface #1 (0x300025068c0)
 Port2:               interface #2 (0x300025068d0)
 Port2:      Returning dev_curr_cfg:0x300027f11b8, dev_curr_if:0

I just need to work out what that means...
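
Some of it can be decoded from the standard USB descriptor layout. A minimal sketch, with a hypothetical helper name: in an endpoint descriptor the top bit of the address gives the direction and the low four bits the endpoint number, and the low two bits of the attributes give the transfer type. The tp/type values in the dump are descriptor types (2 = configuration, 4 = interface, 5 = endpoint) and cls=255 is the vendor specific class.

```python
# Transfer types from bits 1..0 of an endpoint descriptor's bmAttributes.
TRANSFER_TYPES = {0: "control", 1: "isochronous", 2: "bulk", 3: "interrupt"}


def decode_endpoint(epaddr, attr):
    """Decode the epaddr and attr values printed for each endpoint in the dump."""
    return {
        "number": epaddr & 0x0F,                         # bits 3..0: endpoint number
        "direction": "IN" if epaddr & 0x80 else "OUT",   # bit 7: 1 = device to host
        "type": TRANSFER_TYPES[attr & 0x03],
    }


# The two endpoints listed under interface #0.
print(decode_endpoint(0x81, 0x1))
print(decode_endpoint(0x82, 0x3))
```

Read this way, endpoint 0x81 is an isochronous IN endpoint whose packet size is 0 in alternate setting 0 and 1023 in alternate setting 1, and endpoint 0x82 is an interrupt IN endpoint.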

In Between.

15 miles in the bus ( ford galaxy - one of the best buys I have ever made).

0 miles on the bike, but I will ride it tomorrow as the bus gets serviced.

200 pounds on a set of front tyres for the bus, mind you they have done 35000 miles.

tim

Monday Mar 21, 2005

busy busy Monday.

At Home.

A busy weekend as ever, ferrying children, tidying up, gardening. Saturday was a nice day, 22 degrees, so I walked the mutts early before it got warm; we went through the lanes by the deer park then back through Farnham Park, really nice. I got the PIR activated camera all linked up to the video, kids enjoyed watching themselves.  No news on our dog lover, although I have eliminated all but 3 houses nearby. The camera is on the wall with a 10 metre cable to a little control box, then there is a scart cable from the control box to the video's in port, and an infrared tx/rx that you position in front of the video unit's IR receiver. Then using the simple menu you teach it video commands like on/record/stop/standby and that's it, really easy to use. My mp3 stick's usb interface stopped working so that is useless now - I've been listening to Yes and Placebo on the ride to work, guess which one in which direction?

At Work.

Lots of escalation work, nothing memorable, just more cases of picking the right debugging tool for the problem. I wanted to see if I could use the fabulous libusb that is there in Solaris 10 to access my Logitech webcam, so I read the libusb guide /usr/sfw/share/doc/libusb/libusb.txt and wrote some code that just walks the buses /devices /interfaces /configurations printing all the data structure elements as it goes. Running that with the webcam inserted allowed me to see that it had 3 usb interfaces, one is the usb video chip, the second is the mute button and the third is a raw audio stream.
On a sunray you plug it in and the system creates your own /dev structures under $UTDEVROOT, but on a non sunray device you have to use add_drv to bind the usb vendor id/product id to the ugen driver. The flaw in all of this is that all 3 usb interfaces on the web cam claim to be ISOCHRONOUS which is the one interface type that ugen/libusb don't support. Time to write a driver then.

I have a theory about troubleshooting that I describe as the "meta information" contained in a seemingly useless error message. There is the obvious information contained in the message, which if it is from a glob of software is probably of little use unless it was from my code ;-) but there is much more information to be gained. The fact that the code emitted the message means that you can identify a point in the code and probably a series of conditionals that lead to that point, and if the error message has printed out values then there is much value in understanding what these are, how they are gathered and used. The theory of course would be of no use if we had a messaging standard. When I write code I try to produce meaningful messages that tell you what component is producing the message, why you are getting the message, the severity of the message and more importantly what the reader can do about the message.
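
Those four ingredients can be sketched as a tiny formatter. The function name, field order and layout are invented for illustration and are not any existing messaging standard; the example text reuses the devfsadmd message quoted in an earlier post, with syseventd assumed as the component.

```python
def format_message(component, severity, reason, action):
    """Build a message stating who is speaking, how bad it is, why, and what to do."""
    return f"{component}: {severity.upper()}: {reason}. Action: {action}"


print(format_message(
    component="syseventd",
    severity="warning",
    reason="/devices or /dev may not be current (devfsadmd not responding)",
    action="run devfsadm",
))
```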

In Between.

30 miles in the smart, another tank of petrol, still only 175 miles on 15 litres of petrol.

15 miles on the bike, Friday last week was so nice I went home at lunchtime and went back to work on the bike. Now it's warm I ditched the waterproofs and crud catcher style mudguards, roll on the time change in a week or so as it will become lighter in the morning an hour later ( 6am) and it will be lighter for an hour longer ( 7 pm) so no lights needed!

Why a smart car is dangerous! Never ride anything with a brain is one principle I live my life by, but it really should include don't drive anything with a computer near the drivetrain.  The smart car in auto gearbox mode does a good job of changing up and down its 6 speed box, with just one fault. If you are slightly too slow for a gear it will change down as you apply throttle. So as you roll up to a roundabout, you look right - it's clear so you cross the white line onto the roundabout. Just at that moment a fastard flies onto the roundabout ( forgetting the bit in the highway code about giving way to people already on it) so in fear you apply throttle and oh NO! it changes down, and that in a smart car is a slow operation... You have to drive it like a motorbike, change down early and go through the manoeuvre at higher than expected revs.  Twice now it has scared me by changing down slowly just at a bad moment.


tim

Thursday Mar 17, 2005

death to the dogs.

At Home

Today we had our first ever death threat! Six months ago we got an anonymous letter with a stencilled address through the post, in it was a post card asking us to stop our dog barking - but we do that already! We have a two barks and you're inside rule and they are never left outside.  We checked with our neighbours and they don't have a problem.  So this morning the post arrives with another letter and this one contained ... a black spot. So we have an educated enemy, I had to check on google and it says ....

Black Spot - Tipping the black spot was a way pirates gave a death threat.

A bit more serious so time to call the police. What would be more useful would be dates and times when they barked, maybe the dogs have worked out how to open the door and bark outside when they are left for twenty minutes whilst my wife collects the kids. If I don't know when they bark I can't do anything about it so all that subterfuge is wasted! Mind you I did go buy a PIR camera for the back garden connected to a video recorder just in case the death threat is for the dogs... An amazingly good picture given the simple technology.

At Work

Someone asked today what the waitq column in iostat means for an nfs mountpoint. When the filesystem generates asynchronous requests (like read ahead) to the server they go onto a queue where a pool of threads takes them off and services them. If there are more requests than threads you have a positive waitq and delayed response; the number of threads ( and effectively the throttle for that mount point) can be tuned in /etc/system with ...

set nfs:nfs3_max_threads=25
set nfs:nfs4_max_threads=25

In Between

30 miles in the smart car, had to take eldest to dentist yesterday to get his front teeth repaired after a bike off and today it is dog sitting Thursday and I did not feel much like cycling after that stupid letter.

0 miles on the bike.


tim

Tuesday Mar 15, 2005

nearly a smear!

At Home

Oh no! Clea has just started shedding. Samoyeds don't moult much, unlike other dog breeds but to make up for that once or twice a year they shed all their inner coat. This is a very fine short thermal layer and it gets everywhere for a few days. The outer coat is like teflon, designed to insulate, if they get dirty you let it dry, give them a brush and they should be back to being white.  So now I have two carrier bags of fine dog fur, one more bag and I will have enough for some exceptionally warm gloves, Finn should lose his puppy coat in a few weeks.

Made no progress on the bathroom although my wife and I have worked out what we want. Just got to find it somewhere. Trying to persuade my eldest that revision is a necessary evil, not sure I succeeded.

At Work

Spent the day trying to understand a customer's problem, eventually at 6pm this evening I worked it out. They were using the fact that "sar 15" sometimes has 16 seconds between the timestamps as indicative of a system slowdown or a scheduling problem. A quick read of the source code shows that sar when run live and not from a data file just forks and exec's sadc and keeps a pipe in between. sadc does all the kstat gathering and timing thus..

while (1) {
    read the kstats
    get the current time.
    pass the time and kstats down the pipe with a series of reads and writes to sar.
    sleep(15)
}

So if kstat gathering and the data passing take 0.2 of a second then the whole loop takes 15.2 seconds, but the time is printed out with a granularity of 1 second, so after 5 times round the loop it will look like sar was delayed by an extra second and the reported time will be 16 seconds not 15. Whilst not the most important thing, it has misled the customer and me. So now I have written a test program that brackets a call to sleep(1) with gethrtime() calls; subtracting these allows us to see exactly the nanoseconds spent in the sleep() call. Let's see if that shows any irregularities in the kernel realtime timer subsystem. To minimise scheduling issues it should be bound to a non interrupting cpu and run in the RT scheduling class.
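
The arithmetic can be replayed with a few lines. A minimal sketch: the 15.2 second loop is the figure assumed above, timestamps are truncated to whole seconds as sar prints them, and the function names are hypothetical. The second function is a rough stand-in for the sleep test, using Python's monotonic clock rather than gethrtime().

```python
import time


def reported_deltas(loop_ms, iterations):
    """Seconds between successive timestamps when each is truncated to 1 second."""
    stamps = [(k * loop_ms) // 1000 for k in range(iterations + 1)]
    return [b - a for a, b in zip(stamps, stamps[1:])]


def timed_sleep_ns(seconds):
    """Nanoseconds actually spent in one sleep call, by a monotonic clock."""
    start = time.monotonic_ns()
    time.sleep(seconds)
    return time.monotonic_ns() - start


# A 15.2 s loop: every fifth interval is reported as 16 s.
print(reported_deltas(15200, 10))  # [15, 15, 15, 15, 16, 15, 15, 15, 15, 16]

# How long does a nominal 1 second sleep really take on this machine?
print(timed_sleep_ns(1))
```

The periodic 16 is a rounding effect of the loop overhead, so on its own it says nothing about scheduling delays.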

In Between

0 miles in the smart.

15 miles on the bike, damp this morning. It has really warmed up here, from snow last week to ten degrees this week. I was too warm even in a short sleeve jersey, time for shorts - look out world. One problem with this time of year is that I have a lead acid battery in the water bottle holder for the lights but I could do with some water. You spend a lot on an ultra light bike and then you load it up with lights until it weighs as much as your mother's old sit-up-and-beg bike.. Roll on the long summer days.

Today's title is "nearly a smear". As you approach the Sun campus at junction 4a of the M3 there is a roundabout, straight on into the campus, left to Reading, right onto the M3 north bound. So it is rush hour this morning, every one is intent on getting to work and damn the road traffic act. I am coming down the hill to the roundabout, it is clear so let's accelerate. 40 yards to go and a car seems to be trying to overtake, or maybe they are turning right. Strange, the car has its left indicator on but it is on my right and I am going straight on.... So the driver of the red Vauxhall Astra does turn left across me. Lots of Anglo Saxon expletives from me and surprisingly they give me that lovely gesture to tell me I am Lance (number one in the world) Armstrong!

What were they thinking? I was thinking they need to redo their driving test before they kill someone!

And just to avoid being sexist the journey home was nearly ruined by a BMW driver passing me and then drifting toward the kerb whilst alongside... whilst he read a map on his lap...


tim

Monday Mar 14, 2005

Tired Monday...

At Home

Tired dogs, tired people after a busy weekend.  I spent Saturday taking my daughter to all her activities, ballet, horse riding. Horse riding is such a technical hobby, how she manages to remember all the details and not fall off (too often) is amazing.

At Work

I am always astounded at the varying quality of "escalations" I get to interfere in. They vary from works of art where the problem is presented in fine detail with reasoned debate and supporting material, to throw away statements like "it goes slow" - what is it? How is it measured? Does it ever go fast? When did it start to go slow - what changed at that date?

Over the last XX years I keep coming across the Kepner Tregoe rational process. Our part of Sun has adopted this methodology as an excellent way of documenting problems as well as speeding up finding the solution.  I always think of this as procedural common sense: the format is logical and the answers steer you either to relevant questions or towards the answer, and as a by product irrelevant information can be tested and discarded.

The number of times you get information like "the machine started to panic on the 15th November and has panic'ed since" and you just have to ask "what happened on the night of the 14th November?", ah we applied a patch! Or you ask the question "do you have a similar machine with a similar workload that does not have the fault?" "yes the other twelve work perfectly", and you have to ask "so what is the difference between the working twelve and the ailing one?".

If there is one way to get your call into Sun dealt with quickly then consider this methodology as an excellent way of interaction with our service organisation. There must be other similar methodologies so my apologies to their supporters.

I stared at a crash dump today that hits in Sunsolve with a few reports each year from the customer base, they all report one off panics on a variety of h/ware, all alleged to be fixed by a hardware intervention. This brings up the subject of how long do you need to wait after a fix has been attempted before you declare success?  With my meagre grip of statistics the time is reduced the greater the number of failures, something to do with standard deviation? So for these customers having one failure after 3 years we probably have to wait about 10 or 12 years before we say that the problem is fixed with a h/ware change. A mentor of mine had an interesting theory that it might be better to take a few more hits so as to reduce the standard deviation of the fault interval, This would then reduce the post fix testing time dramatically allowing success to be declared much earlier. I suspect a software race condition myself...
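
One way to put rough numbers on this, under a strong assumption: treat the panics as random events arriving at a constant average rate (a Poisson process), so that if the fix did nothing, the chance of seeing no panic for t years is exp(-t / T), where T is the mean interval between panics. Waiting until that chance falls below 1 - confidence gives t = T * ln(1 / (1 - confidence)). The model and the function name are assumptions, not something taken from the crash dump.

```python
import math


def wait_years(mean_interval_years, confidence):
    """Failure-free time needed before an ineffective fix would be that unlikely.

    Assumes failures follow a Poisson process with the given mean interval.
    """
    return mean_interval_years * math.log(1.0 / (1.0 - confidence))


for confidence in (0.95, 0.98):
    print(confidence, round(wait_years(3, confidence), 1))
```

With one failure in 3 years this gives about 9 years for 95% confidence and just under 12 years for 98%, in the same region as the 10 or 12 years guessed above. The catch is that the mean interval itself is poorly known after a single failure.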

In Between

15 miles in the smart car, today's excuse - too tired.

0 miles on the bike, I'll try really hard tomorrow.


tim

About

timatworkhomeandinbetween
