Tuesday Jun 05, 2007

getting all your heap into 4mbyte pages using mpss.so.1 or ppgsz

Using mpss.so.1 or ppgsz to get all your heap into 4Mbyte pages with the help of a little preload library.

Monday Nov 06, 2006

libusb and Velleman k8055 just worked!

I bought myself a USB card from maplin.co.uk - a Velleman k8055 card, 2 analog inputs, 2 analog outputs, 5 digital inputs, 8 digital outputs. I bought it to experiment with libusb.

It took about an hour to solder together, and as soon as I plugged it into my Solaris laptop (nv_b51) it was autodetected and HID attached to it - odd to have it export the HID class, but easy to fix.

I took the usb vid and pid from /var/adm/messages and used update_drv -a to force a binding to the ugen driver. Unplugged and replugged the card and the ugen driver bound to the card.

Then via Google I found a k8055 Linux libusb application that built without error using the Studio 11 compilers (free from www.sun.com). The k8055 works perfectly: I can read the sensors and write values to the ports - which has taken all the fun out of it...

I think I'll turn it into a kernel driver to give me some practice.

Wednesday Nov 01, 2006

b51 on my laptop

So I have upgraded my laptop (Toshiba Satellite A40) from nv build 27 right up to date at b51.

There was a small hiccup with my chipset and some new hardware graphics accelerators, but a bit of /etc/driver_aliases fiddling cured that. It seems much faster, but that could just be later versions of software.

I like the new firefox and the new music player.

I just installed gnokii - it took a bit of hacking but most of it is working now, just need a bit more work on the ringtone editor.

So the next job is to get the sunray server up to b51 for s10 pre fcs! without losing my wife's home directory..

Wednesday Oct 25, 2006

What is the format specifier to print a stack from a dtrace aggregation?

It is "%k" as in

/usr/sbin/dtrace -n 'BEGIN{ @foo[ cpu, stack(3)] = count();} END{ printa("cpu %d stack %k count %@d\n", @foo);}'

Thursday Oct 19, 2006

more about setting the stack_size rlimit in a running process

Several folks have asked when a program should set the stack size rlimit. Just before exec() is the only sensible point.

Once your process has started up, things have already been mapped just below the reserved stack space, whose size is the value of the stack size resource limit at the time the program assembled its address space (i.e. during exec).

Let's use pmap and have a look..

 ulimit -S -s 20000
ulimit -S -s
20000
 sleep 20 & pmap $!
2460:   sleep 20
00010000       8K r-x--  /usr/bin/sleep
00022000       8K rwx--  /usr/bin/sleep
00024000       8K rwx--    [ heap ]
FE700000     864K r-x--  /lib/libc.so.1
FE7E8000      32K rwx--  /lib/libc.so.1
FE7F0000       8K rwx--  /lib/libc.so.1
FE810000       8K r-x--  /platform/sun4u-us3/lib/libc_psr.so.1
FE820000      24K rwx--    [ anon ]
FE830000     184K r-x--  /lib/ld.so.1
FE86E000       8K rwx--  /lib/ld.so.1
FE870000       8K rwx--  /lib/ld.so.1
FFBFE000       8K rw---    [ stack ]
 total      1168K

mdb

> (FFBFE000-FE870000)%0t1024=D
                20024           
> 

So the first shared library, ld.so.1, has been mapped just below the reserved stack space: the 20024K gap is roughly the 20000K soft limit.

ulimit -S -s 200000
ulimit -S -s
200000
sleep 20 & pmap $!
[1]     2463
2463:   sleep 20
00010000       8K r-x--  /usr/bin/sleep
00022000       8K rwx--  /usr/bin/sleep
00024000       8K rwx--    [ heap ]
F3700000     864K r-x--  /lib/libc.so.1
F37E8000      32K rwx--  /lib/libc.so.1
F37F0000       8K rwx--  /lib/libc.so.1
F3840000       8K r-x--  /platform/sun4u-us3/lib/libc_psr.so.1
F3850000      24K rwx--    [ anon ]
F3860000     184K r-x--  /lib/ld.so.1
F389E000       8K rwx--  /lib/ld.so.1
F38A0000       8K rwx--  /lib/ld.so.1
FFBFE000       8K rw---    [ stack ]
 total      1168K

 mdb

> (FFBFE000-F38A0000)%0t1024=D
                200056          
> 
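
The two mdb expressions above are just a subtraction and an integer divide by 1024. As a quick cross-check, here is the same arithmetic as a small Python sketch (the function name is mine, not part of any tool), using the addresses from the two pmap listings:

```python
def reserved_gap_kb(stack_segment_base: int, highest_mapping_base: int) -> int:
    """Distance in KB between the stack segment and the highest library mapping."""
    return (stack_segment_base - highest_mapping_base) // 1024


# First run: ulimit -S -s 20000, last ld.so.1 segment at FE870000.
print(reserved_gap_kb(0xFFBFE000, 0xFE870000))

# Second run: ulimit -S -s 200000, last ld.so.1 segment at F38A0000.
print(reserved_gap_kb(0xFFBFE000, 0xF38A0000))
```

In both runs the gap tracks the soft limit that was in force when the process was exec'ed.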


So if I use setrlimit to change the current stack space setting to a bigger number then all future mappings will be pushed down below that reserved space but existing mappings won't move, and if your stack tries to grow over them you will get a segv signal. So you should only ever increase the stack size rlimit just before a call to exec().

This stack size only affects the default stack for the main thread in a process. The stacks for other threads are sized at thread creation time (thr_create() or pthread_create()), either using the default 1MB or a program specified amount.
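
The "only just before exec()" advice can be sketched in Python, assuming a Unix system with the standard resource and subprocess modules. The helper name and the 4MB value are illustrative choices of mine. The limit is changed in the forked child, between fork() and exec(), so the parent's existing mappings are never at risk:

```python
import resource
import subprocess
import sys


def run_with_stack_limit(argv, soft_bytes):
    """Run argv with the soft stack rlimit changed only in the child."""
    def set_limit():
        # Called in the child after fork() and before exec(); the new
        # program assembles its address space around this limit.
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK, (soft_bytes, hard))

    return subprocess.run(argv, preexec_fn=set_limit,
                          capture_output=True, text=True, check=True)


# Child that simply reports the soft stack limit it was started with.
CHILD = [sys.executable, "-c",
         "import resource; print(resource.getrlimit(resource.RLIMIT_STACK)[0])"]

if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_STACK)
    result = run_with_stack_limit(CHILD, 4 * 1024 * 1024)
    print("child soft limit:", result.stdout.strip())
    print("parent unchanged:", resource.getrlimit(resource.RLIMIT_STACK) == before)
```

This assumes the hard limit is at least 4MB; if it is lower, setrlimit() fails in the child and subprocess raises an error.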

Tuesday Oct 17, 2006

How to halve your 32 bit process's address space accidentally

My customer was complaining that his server process was running out of memory: malloc() was returning NULL. A pmap of the process showed it was a 32 bit application, so limited to a touch less than 4GB of address space. The pmap showed it had only 600MB of space used: a small stack section, lots and lots of shared libraries and a 500MB heap (malloc stuff) that was right up to the base of the shared libraries, so had no room to grow.
A bit of careful looking and there was an approximately 2GB hole in the address space starting at about 2GB - how odd!

After getting a truss of the application starting up it became obvious: it was performing a setrlimit() of RLIMIT_STACK to RLIM_INFINITY just before the hole appeared. That call sets the stack size to 2GB (the stack starts out way up at the top of the address space, near 4GB on a 32 bit application). When handing out user address space the kernel has to avoid the area reserved for the stack, so all future mmaps are located below 2GB, halving the process's available address space.

Tuesday Sep 05, 2006

getting warm again.

Just as I thought it was cooling down enough for the dogs to start exercising again it goes and warms up again! Oh well I can put off getting the kickbike serviced for another couple of weeks.

What we need on the kickbike is a better rear brake. It's got a 10 or 12 inch rear wheel and mountain bike v brakes. I find I have to squeeze the brakes really hard and we are getting through brake blocks at an amazing rate - maybe two samoyeds are too much.




Thursday Aug 31, 2006

Don't change the signs, change the units!

As the 2nd car sped past me at 60 mph a few inches from my elbow I had this thought..
Let's just change the speed limit units and keep all the existing signs. All we would have to do is buy some 180 kph signs from the continent for motorways and the whole country would be safer - 30 kph is about 20 mph, 60 kph is about 40 mph. Let's do it tomorrow!



holiday report

Took the kids and dogs to Cornwall on holiday this year. We stayed in the village of Newtown-in-St.Martin near Goonhilly, a nice little place, but the cars do speed through there, especially at night. Lots of small lanes to walk the dogs and children. Local pub with good beer, but sadly it never did any food whilst we were there.

Things we did..
Took the ferry from Falmouth up to the National Trust's Trelissick Gardens, even got the dogs on the boat (Fal River Links).
Went crabbing on the inner pier at Porthleven. I say the inner pier as we read a warning about freak waves on the outer pier and Looe bar. The dogs got a bit bored but the kids got 20 crabs each time.
Went to Mylor key crabbing off the quay at the back of the car park, got loads of large crabs there. The only down side was a man's dog attacked Finn and then got beaten in front of the children, upsetting the younger one - he should have bought the dog a lead rather than hit it.
We went to the East side of Kennack Sands a few times, slightly ruined by other people's dogs being off their leads roaming the beach, causing our pair to bark a lot. Oh yes and they hate kites..
We went to Hayle beach but had to leave when a kite flyer decided to practise on the empty beach right over the dogs.
We went to Chysauster ancient settlement on a wet day.
We went to a dog allowed beach near Church cove a few times as the sand had some streams that were ideal for damming and the rock pools were interesting, also a good walk along the headland to the Halzephron Inn and back.
Ate lots of pasties and I drank some beer.
Had lunch in the Blue Anchor in Helston a few times.
Cycled up the 1 in 5 from Helford village a few times.


Wednesday Aug 30, 2006

tracking why an application got a SIGFPE divide by zero.

Back from a long holiday, a colleague asked me to look at why a small C++ application was dying with SIGFPE on x86 boxes running Solaris 10. They had run dbx and truss and had worked out that it was taking a SIGFPE divide by zero trap on an idivl instruction deep in the flush of an i/o stream. The truss showed the fault as

    Incurred fault #8, FLTIZDIV  %pc = 0x0805065E
      siginfo: SIGFPE FPE_INTDIV addr=0x0805065E
    Received signal #8, SIGFPE [default]
      siginfo: SIGFPE FPE_INTDIV addr=0x0805065E

So that would look like a divide by zero. dbx showed that the instruction was an idivl, but the divisor register was not zero!

After a bit of looking at the AMD instruction documents we see that the idiv instruction can generate a "divide error" exception for two reasons: a divide by zero and an integer overflow of the quotient. The Solaris kernel maps the "divide error" exception onto the FPE_INTDIV trap which truss reports, so the trap alone does not tell you which of the two it was. In this case we had an integer overflow, as the result exceeded the capacity of a signed int. Now the folks who maintain the library that made the stream know to go look at their code.
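
To make the two causes concrete, here is a small Python model of a signed 32 bit divide. It is only a sketch (the function and its return strings are mine, and Python itself never raises this trap), but it shows the non-zero-divisor case that caught us out. The classic example is the most negative int divided by -1:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def idivl_outcome(dividend: int, divisor: int) -> str:
    """Classify a signed divide whose quotient must fit in 32 bits."""
    if divisor == 0:
        return "divide by zero"
    # Hardware division truncates toward zero, unlike Python's // operator.
    magnitude = abs(dividend) // abs(divisor)
    quotient = magnitude if (dividend < 0) == (divisor < 0) else -magnitude
    if not INT32_MIN <= quotient <= INT32_MAX:
        return "quotient overflow"
    return "ok"


print(idivl_outcome(10, 0))          # first cause: divisor is zero
print(idivl_outcome(INT32_MIN, -1))  # second cause: divisor is not zero
print(idivl_outcome(7, -2))
```

Both of the first two cases would arrive in userland as the same SIGFPE with FPE_INTDIV.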

Monday Jun 05, 2006

never use a tilde character as the first letter in your password.

I thought I was being clever with my algorithm for choosing new passwords. For reasons I can't remember this had "~" as the first character - BIG mistake. "~" is the escape character for lots of things like ssh and our service processor to host protocols.. good job I did not have "~#" or "~." as the first two characters! Something best avoided.

Monday Apr 24, 2006

dma_attr_sgllen can make your i/o look slow

I was asked to look at a slow i/o performance problem using solaris 10 on our fabulously fast
AMD64 boxes. The iostat command was reporting a very slow active service time (asvc_t) when the memory supplying the data to the large(ish) i/o was not allocated from large pages.

Dtrace showed the large page based i/o going out in one chunk, with one call to sdintr() at the end of the i/o before the buf was returned, but the 4k page based i/o was going out in a number of chunks. Each chunk of 128k was terminating in a call to sdintr(), and only after the last chunk returned was the buf returned. The important part of the stacktrace that dtrace or lockstat profiling will show is the calls to ddi_dma_nextcookie() as each chunk is initialised.

For the i/o kstat the service time runs from when the buf is sent to the HBA for transport to the disk and ends when the buf is returned. For the 4k page based i/o the buf only comes back after the last chunk, so the reported service time is a multiple of the real service time.

So what was causing the i/o to be broken up...  the sd target driver relies on the underlying HBA driver to
do the scsi packet and DMA initialisation via the scsi_init_pkt() call. For this particular HBA the
ddi_dma_attr structure ( man ddi_dma_attr) had 32 in the dma_attr_sgllen field.  This field
describes the number of scatter gather segments that the dma engine built into the HBA card can
deal with per i/o request. If an i/o requires more than 32 scatter gather list elements then it will be broken
up into multiple i/o requests.

The large page buffer is allocated out of 2MB pages of contiguous virtual memory addresses, but more importantly each large page is made from contiguous physical memory, which is what the dma engines use, so a buffer allocated from these large pages occupies just one DMA scatter gather list element.

The small page buffer is allocated out of 4K pages of contiguous virtual memory addresses, but each 4k page can be mapped into its virtual address from any physical address, so in the worst case each 4k page takes one DMA scatter gather list element. A large i/o can therefore take more than 32 elements and so be chunked into N i/os of 128k, making the iostat active service time look N times worse than it really is.
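
The 128k chunk size falls straight out of the numbers above: 32 scatter gather elements times 4k per element. A rough Python sketch of the worst case follows. The function name is mine, and it ignores buffer alignment, which can cost one extra page:

```python
PAGE_SIZE = 4 * 1024   # small page size in bytes
SGL_LEN = 32           # dma_attr_sgllen for this particular HBA


def worst_case_chunks(io_bytes, page_size=PAGE_SIZE, sgllen=SGL_LEN):
    """Number of separate i/o requests if every page needs its own element."""
    max_chunk = page_size * sgllen          # 32 * 4k = 128k per request
    return -(-io_bytes // max_chunk)        # integer ceiling division


print(worst_case_chunks(1024 * 1024))                               # 4k pages
print(worst_case_chunks(1024 * 1024, page_size=2 * 1024 * 1024))    # 2MB pages
```

So a 1MB write from 4k pages can go out as 8 requests, and iostat reports roughly 8 times the real service time, while the same write from a 2MB page goes out as one.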

So using large pages can have hidden benefits.  (man ppgsz)

Wednesday Mar 22, 2006

something new and very useful from kernel tnf

Turning on all the kernel's TNF probes gathers you a big blob of data about what is going on inside the kernel. Prior to Solaris 10 this was the only way to get accurate timing information for system calls. Recently, as in last night, I was trying to work out from one of these blobs why a write into a ufs filesystem might take a long time. I had the pid of the writing process so I could find all its threads. I could see one of them issue the write() system call, then I kept seeing the thread block and almost wake up before blocking again. It did this a number of times, so obviously it was competing for a resource like a semaphore or a condvar or a mutex and not getting it. All tnfdump gives you is the address of the resource.

But if you use tnfdump -rx you see a bit more.

Here is what tnfdump gives you..

1995.449700  0.010800 10768     3 0x3002ec894a0   4 thread_block  reason: 0x3002ec89632 stack: 

but here is what tnfdump -rx gives on a solaris 8 system
0x63f8e8  : {
                 tnf_tag 0x22f8     thread_block
             tnf_tag_arg 0x63f840   
              time_delta 0x4fd198
                  reason 0x3002ec89632
                   stack 0x63f900   
        }
0x63f900  : {
                 tnf_tag 0x2358     tnf_symbols
           tnf_self_size 0x38
                       0 0x1011c800
                       1 0x100448cc
                       2 0x1007d5f8
                       3 0x1007d9e4
                       4 0x100adaa4
                       5 0x100adc68
        }

On Solaris 9 the stack is compressed two addresses to a line. That looks like a kernel stack trace to me, but the challenge is to turn that into a symbolic stack trace. On the machine where the prex was run you could do
echo "0x1011c800/i" | mdb -k
But to be portable I think you should gather, from the machine where the prex was run, either a live crashdump or the output of
/usr/ccs/bin/nm -fx /dev/ksyms > file

So time for a modified tnfdump or another awk script to glue these things together so we can see why threads might block in the kernel.
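
Until that exists, the glue itself is a nearest-symbol-below lookup. Here is a Python sketch of the idea, assuming the nm output has already been parsed into sorted (address, name) pairs. The symbol names and start addresses below are invented placeholders, not real kernel symbols; only the stack addresses come from the dump above:

```python
import bisect


def symbolize(addr, symbols):
    """Map addr to 'name+offset' using (start_address, name) pairs sorted by address."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1   # last symbol starting at or below addr
    if i < 0:
        return hex(addr)                        # below every known symbol
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


# Hypothetical symbol table; a real one would come from nm of /dev/ksyms.
SYMBOLS = [
    (0x10044800, "example_func_a"),
    (0x1007D500, "example_func_b"),
    (0x1011C000, "example_func_c"),
]

for pc in (0x1011C800, 0x100448CC, 0x1007D5F8):
    print(f"{pc:#x}  {symbolize(pc, SYMBOLS)}")
```

This ignores symbol sizes, so an address past the end of the last function below it is still attributed to that function.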

So more on that when I have it working, but until then if you have to gather kernel prex data, send in the raw output file from tnfxtract plus a live dump or the nm of /dev/ksyms so we don't lose any information from the data.

Monday Mar 20, 2006

I was in her lane - obvious really.

This morning I got squeezed by a new grey Saab estate with child onboard approaching the QueensGate roundabout in Farnborough, she moved left into my lane and into my space even though I shouted a warning, she kept going until I had to brake to avoid a crash. Of course she had to stop at the traffic ahead so we had words.

She had to get into my space as she was in the wrong lane ....&**&^^%

Then on the way home in Ewshot village by the excellent Windmill pub a 4x4 cut the corner across the junction and nearly hit me head on - thanks.

Oh yes and before I forget Hampshire County Council's response to my complaint about the roundabout outside the Sun camps was that it was difficult and they would think about it some more but in the mean time if I wore something bright drivers would notice me better - very patronising, very helpful. Prompted me to send my MP an email.

I used the find your MP page on the House of Commons website but it errored most appropriately with "connection reset by peer", most amusing.

lrand48() is an excellent function for generating testcases.

Spending a lot of time writing test cases to try and reproduce system panics has led me to use an interesting(ish) methodology. You stare at the data structures in the dump, you stare at the code and see if you can work out how to get things into a similar state. Then comes the difficult bit, I've taken to working out what operations are possible from the userland code and then randomising them using lrand48().

So this week's exercise has been to reproduce a panic in the poll() code that from code inspection is impossible. The file table (indexed by file descriptor) is per process but the poll structures are per lwp, so there is a linked list of interested threads attached to a file entry if any threads are polling on that file entry. In the dump there are 3 threads chained off one file entry. So we know that we have a multi threaded process performing poll() on a single file descriptor from several threads at once. We panicked in close as we traversed that list, as one of the threads had been reused by a process that does no polling, so its per thread poll structures were null. So now we know that it has threads exiting and threads closing the file that we are polling on.

So I wrote a threaded program that opened a net connection and then went into a loop, it randomly started a new thread, those threads then possibly polled on that connection, or possibly exited, or possibly closed the connection. The main thread dealt with all of this, re-opening the connection if it got closed, starting new threads as ones exited - all under the choice of lrand48().

Did this reproduce it? No, so then I randomised the number and contents of the pollfd array passed to poll() and suddenly the machine panicked with an identical stack trace to the customer's machine - the power of lrand48().
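
The shape of that approach can be sketched in Python. This is not the original test program. It only generates a reproducible schedule of operations, using a re-implementation of the 48 bit linear congruential generator that lrand48() is documented to use. The class, function and operation names are mine:

```python
class Lrand48:
    """Same recurrence as srand48()/lrand48(): X = (a*X + c) mod 2^48."""
    A = 0x5DEECE66D
    C = 0xB
    MASK = (1 << 48) - 1

    def __init__(self, seed):
        # srand48() puts the seed in the high 32 bits and 0x330E in the low 16.
        self.state = ((seed & 0xFFFFFFFF) << 16) | 0x330E

    def next(self):
        self.state = (self.A * self.state + self.C) & self.MASK
        return self.state >> 17      # lrand48() returns the high 31 bits


# The operations userland can perform on the connection being polled.
OPERATIONS = ("start_thread", "poll", "exit_thread", "close_connection")


def schedule(seed, steps):
    """Pick a pseudo-random sequence of operations, repeatable from the seed."""
    rng = Lrand48(seed)
    return [OPERATIONS[rng.next() % len(OPERATIONS)] for _ in range(steps)]


print(schedule(seed=1, steps=10))
```

Keeping the seed means that once a run panics the box, the same sequence of choices can be replayed, although thread timing will still vary from run to run.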

The good news is that it is fixed in solaris 10 already...

About

timatworkhomeandinbetween
