Friday Dec 14, 2007

I'm annoyed at SES2 section

One of the things I'm working on at the moment is a firmware flashing utility. We've got an existing one in Solaris, called fwflash(1m), and one thing that PSARC made very clear is that they don't want a proliferation of firmware flashing utilities inside Solaris. So I'm working on making fwflash(1m) pluggable.

There's a good deal of work required to make this succeed, mostly in implementing a plugin interface and a specific plugin for the area where I have a requirement to meet.

That requirement pretty much mandates the use of SCSI Enclosure Services-2 (SES2), which is all good and well except when we get to section which deals with the Additional Element Status descriptor protocol-specific information for SAS. I'm particularly annoyed at sections (SAS Expanders) and (SCSI Initiator Port, SCSI Target Port, Enclosure Services Controller Electronics).

The problem is that - as far as I can see, after about a week's worth of serious and detailed investigation - these sections overlap in how they deliver a data payload to you. So figuring whether you've got a SAS expander, or one of a SCSI Initiator Port, SCSI Target Port or Enclosure Services Controller Electronics is actually incredibly difficult.

I could punt and look at the size of the data payload, except that there'll be cases where Expanders vs (the rest) will coincide in terms of payload size. Or I could assume that everything I see there is an Expander - which would be wrong. Or I could do a massive amount of extra engineering in order to approximate what is probably the answer. Or I could use a lookup table to match against the devices which I really want and need to get access to. Right now, the lookup table is winning - a fact about which I am *not* happy.
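To make the lookup-table option concrete, here is a rough sketch of the idea in Python rather than the C that fwflash is written in. It assumes the match is done on the SCSI INQUIRY vendor and product strings, which is my guess at a sensible key and not something the plugin is confirmed to do. The table entries and the classify() helper are entirely made up for illustration.

```python
from typing import Optional

# Hypothetical table: (vendor id, product id) -> element type.
# None of these entries correspond to real hardware.
KNOWN_DEVICES = {
    ("ACME", "EXP-24"): "sas_expander",
    ("ACME", "ESC-100"): "enclosure_services_controller",
}


def classify(vendor: str, product: str) -> Optional[str]:
    """Return the element type for a listed device, or None if it is unknown."""
    # INQUIRY strings are space-padded to a fixed width, so strip before matching.
    key = (vendor.strip(), product.strip())
    return KNOWN_DEVICES.get(key)


print(classify("ACME    ", "EXP-24          "))
print(classify("OTHER", "THING"))
```

The obvious cost is the one that makes this ugly: every new device needs a new table entry, and anything not in the table comes back as unknown.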

So, what used to be elegant code in my first prototype is now quite ugly. I'm not happy about it: this vagueness in SES2 has kicked my schedule around and has caused sleepless nights while trying to figure out a way forward.

The SCSI family of standards is normally very well defined, very clear, and precise. I'm not impressed with SES2, that's for sure.


Thursday Jun 15, 2006

Performancing v1.2 -- works better now with roller

The state of blog editors really annoys me. Roller's editor options don't quite give me what I need and I find the interface annoying. (There's nothing that I can really put my finger on, I just don't like it, sorry!) I've tried a few others (including a not-yet-released addon for StarOffice which manages to trip over 6431036 just about every time.... weird!), and I've tried just crafting my own html.

I've got better things to do with my time (like my day job for a start!)

So I figured I'd see whether Performancing for Firefox had improved its support for Roller. (I still think the name is silly).

Turns out that with the current version (v1.2), it has.

After a few config back-n-forths getting the xmlrpc api path correct, I managed to not only compose the previous post (on ::findleaks) but to also publish it, with tags, too.

For BSC:

The blog type is "custom", in the next dialog select "Roller", and add "" as the API path. You need to hit the "use boolean for publish" option.

For JRoller:

The blog type is "custom", in the next dialog select "Roller", and add "" as the API path. You need the "use boolean for publish" option too.
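My reading of the "use boolean for publish" option is that the publish flag goes over the wire as an XML-RPC boolean rather than an integer; that is an assumption on my part, not something I've checked in the Performancing source. The sketch below uses Python's standard xmlrpc.client to build (not send) a metaWeblog.newPost request so you can see what that looks like. The blog id, username, password and post contents are all placeholders.

```python
import xmlrpc.client

# Placeholder values only; substitute whatever your own blog server expects.
post = {"title": "Hello", "description": "<p>Body text</p>"}
publish = True  # marshalled as an XML-RPC <boolean>, not an <int>

# dumps() only serialises the call; nothing is sent over the network here.
payload = xmlrpc.client.dumps(
    ("my-blog-id", "username", "password", post, publish),
    methodname="metaWeblog.newPost",
)
print(payload)
```

In the printed XML the last parameter shows up as a boolean element, which is presumably what the server wants to see when that option is ticked.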


On ::findleaks -- wot is it?

A few days ago I was asked by a partner engineer:

When we were looking at a core file we ran ::findleaks and it came back with some buffers from our driver. However, the buffers are from our buffer pool. They are allocated at driver attach and will be freed at driver detach. The core file was from a Leadville stack panic during device discovery at boot, so I expected the driver to be fully present. The ::findleaks is identifying 5 buffers.

We actually have allocated and pooled 512 of these buffers. I am wondering why ::findleaks considers these buffers as "leaked". Could you help me understand what this means?

I'm glad you asked.

To start with, the ::findleaks dcmd (manual page entry) has a ::help entry within mdb:

> ::help findleaks

findleaks - search for potential kernel memory leaks
[ addr ] ::findleaks [-dfv]
   Does a conservative garbage collection of the heap in order to find potentially leaked buffers. Similar leaks are coalesced by stack trace, with the oldest leak picked as representative. The leak table is cached between invocations.
   addr, if provided, should be a function or PC location. Reported leaks will then be limited to those with that function or PC in their stack trace.
   The 'leak' and 'leakbuf' walkers can be used to retrieve coalesced leaks.
    -d    detail each representative leak (long)
    -f    throw away cached state, and do a full run
    -v    report verbose information about the findleaks run

   Target: kvm
   Module: genunix
   Interface Stability: Unstable
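The "conservative garbage collection" in that help text is worth a small illustration. What follows is a toy model in Python, not the genunix implementation: it assumes a heap described as a dictionary of buffers, treats every word it finds as a possible pointer, and reports any buffer that nothing appears to point into. The find_leaks() name, the heap layout and the rule that interior pointers count are all my own simplifications.

```python
def find_leaks(buffers, roots):
    """Toy conservative scan.

    buffers: {start_addr: (size, [words stored in the buffer])}
    roots:   words found outside the heap (globals, stacks, and so on)
    Returns the sorted start addresses of buffers that nothing points into.
    """
    def owner(word):
        # Conservative: any word that lands inside a buffer counts as a pointer.
        for start, (size, _) in buffers.items():
            if start <= word < start + size:
                return start
        return None

    reachable = set()
    pending = list(roots)
    while pending:
        start = owner(pending.pop())
        if start is not None and start not in reachable:
            reachable.add(start)
            # A reachable buffer may itself hold pointers, so scan its contents.
            pending.extend(buffers[start][1])
    return sorted(set(buffers) - reachable)


heap = {
    0x1000: (64, [0x2010]),  # referenced from a root, points into 0x2000
    0x2000: (64, []),        # reachable only via 0x1000
    0x3000: (64, [0x1000]),  # nothing points here, so it is reported
}
print([hex(addr) for addr in find_leaks(heap, roots=[0x1008, 42])])
```

The point to take away is the word "potentially": a buffer is reported because the scan found no reference to it, which is not quite the same thing as the buffer having been lost.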

You can follow along with the findleaks code via
but be warned --- the code is more than just a tad tricky to follow.

Ok, as to how you really make use of ::findleaks, have a look through this transcript:

$ mdb -k 12
mdb: warning: dump is from SunOS 5.11 onnv-gate:2006-01-12; dcmds and macros may not match kernel implementation
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp usba s1394 fctl nca lofs random nfs fcip cpc ipc ptm sppp ]
> ::status
debugging crash dump vmcore.12 (64-bit) from doppio
operating system: 5.11 onnv-gate:2006-01-12 (i86pc)
panic message:
BAD TRAP: type=e (#pf Page fault) rp=fffffffffbc58a90 addr=a0 occurred in module "" due to a NULL pointer
dump content: kernel pages only
> ::msgbuf
### snippage ###
kernel memory allocator:
invalid free: buffer not in cache
buffer=ffffffff87a8cda0 bufctl=ffffffff879d70d8 cache: streams_mblk
previous transaction on buffer ffffffff87a8cda0:
thread=ffffffff94795c00 time=T-15569.454683674 slab=ffffffff879b2780 cache: streams_mblk

kernel heap corruption detected

fffffe800000baa0 genunix:kmem_error+4ab ()
fffffe800000baf0 genunix:kmem_slab_free+dd ()
fffffe800000bb50 genunix:kmem_magazine_destroy+127 ()
fffffe800000bb90 genunix:kmem_depot_ws_reap+9d ()
fffffe800000bbc0 genunix:kmem_cache_reap+35 ()
fffffe800000bc60 genunix:taskq_thread+200 ()
fffffe800000bc70 unix:thread_start+8 ()

#### snippage ####

> ::findleaks
16729 1 ffffffff848bf528 AcpiOsTableOverride+0x15f
Total 1 kmem_oversize leak, 16729 bytes

ffffffff82a1d008 55 fffffed4cc4c89e0 allocb+0x65
ffffffff82a1d008 128 ffffffff879ecab0 allocb+0x65
ffffffff80040008 128 ffffffff879e6578 dblk_constructor+0x57
ffffffff80040008 55 ffffffff935f9040 dblk_constructor+0x57
ffffffff8003a748 1 ffffffff93a5de80 devi_attach+0x94
ffffffff8002f748 1 ffffffff89c92678 kobj_alloc+0x88
ffffffff8002e008 2 ffffffff8a188ce8 kobj_alloc+0x88
ffffffff80039008 1 ffffffff8970c4c0 kobj_alloc+0x88
ffffffff84896748 116 ffffffff983da998 rootnex_dma_allochdl+0x5a
ffffffff84896748 128 ffffffff879ec9d8 rootnex_dma_allochdl+0x5a
Total 615 buffers, 997408 bytes
> ffffffff879e6578::bufctl -v
ffffffff879e6578 ffffffff87a8a4e0 b7d45a53f ffffffff82aefe80
ffffffff80040008 ffffffff808dc340 0

So what we do is take the bufctl address (the third column) for each of the leaked bufs, and run it through the ::bufctl dcmd with the "-v" option, which allows us to see the stack trace of the thread which allocated that particular buf. From that we can see which function (and where in it) leaked the memory.
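If there are a lot of leaks, typing those commands by hand gets old quickly. Here is a small Python sketch that reads saved ::findleaks output and prints the matching ::bufctl -v commands, optionally limited to one caller. It assumes the four-column layout seen in the transcript above (cache, leak count, bufctl, caller); bufctl_commands() and its caller_prefix argument are names I made up for this example.

```python
import re

# Columns: cache address, leak count, bufctl address, caller.
LEAK_LINE = re.compile(r"^([0-9a-f]{16})\s+(\d+)\s+([0-9a-f]{16})\s+(\S+)$")


def bufctl_commands(findleaks_output, caller_prefix=""):
    """Build one '<bufctl>::bufctl -v' command per leak line that matches."""
    commands = []
    for line in findleaks_output.splitlines():
        match = LEAK_LINE.match(line.strip())
        # Summary lines ("Total ...") and other layouts simply fail to match.
        if match and match.group(4).startswith(caller_prefix):
            commands.append(f"{match.group(3)}::bufctl -v")
    return commands


sample = """\
ffffffff82a1d008 55 fffffed4cc4c89e0 allocb+0x65
ffffffff80040008 128 ffffffff879e6578 dblk_constructor+0x57
ffffffff80040008 55 ffffffff935f9040 dblk_constructor+0x57
Total 615 buffers, 997408 bytes
"""
for command in bufctl_commands(sample, "dblk_constructor"):
    print(command)
```

Note that the kmem_oversize line in the transcript starts with a byte count rather than a cache address, so this pattern skips it; you would need a second pattern if you care about those.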

In the msgbuf for a panic like the one above you see the address of the suspect bufctl, so you can do this:

> ffffffff879d70d8::bufctl -v
ffffffff879d70d8 ffffffff87a8cda0 6c5b3b526b33 ffffffff94795c00
ffffffff80040008 ffffffff80b770c0 ffffffff820514c8

Note that we have two thread addresses (right hand column)

> ffffffff820514c8::thread
ffffffff820514c8 inval/badd cafe badd cafe -13570 -17699 0 ffffffff91af94e0
> ffffffff94795c00::thread
ffffffff94795c00 run 1002 104 3 59 0 0 n/a

Ok, we know that we're going through streams code, and that the panic stack went through dblk_destructor -- therefore it's reasonable to assume that we should look at the two leaks from dblk_constructor as a first point:

> ffffffff879e6578::bufctl -v
ffffffff879e6578 ffffffff87a8a4e0 b7d45a53f ffffffff82aefe80
ffffffff80040008 ffffffff808dc340 0

> ffffffff935f9040::bufctl -v
ffffffff935f9040 ffffffff934fe7c0 7a840278b8ac fffffe80000b3c80
ffffffff80040008 ffffffff805b7e00 ffffffff8268c6e8

Which for me means logging a call with since SkGe is their code.

You can find out more about ::findleaks by reading Jonathan Adams' blog entry on the implementation, and reading manpages for umem_debug --- you can use ::findleaks on userland stuff too, which is a real boon.

I hope the above - although rambling - gives you enough to get started with on the magic of ::findleaks.


Sunday May 07, 2006

linux kernel bug count increasing, apparently

I just read an article pointed to from OSNews regarding Andrew Morton's gut feel about the linux kernel bug count.

I'll happily admit to not being too keen on linux. That's because my employer pays me to work on Solaris and I see vast differences between the philosophies of the two OSes, and Solaris is more my personal style.


But look, I don't care whether you run or develop for Solaris or Linux or any of the BSDs or MacOSX or even MS-Windows. What I care about is quality. If you're writing code for an OS, for cryin' out loud make that code as bugfree as possible! Do your users, your customers and your bosses a favour by building in quality from the get-go. That means taking care when you design your product, documenting as you go, making sure that you don't just arbitrarily change the way that widget X works but that if you do make changes you do so for a justifiable reason.

What I want to see in linux is the sort of stability that I see in Solaris. And since I write drivers, that means I want to see stable in-kernel interfaces. I find it incredibly frustrating to read comments in the linux communities that boil down to "no, *nobody* needs a stable in-kernel interface in linux, all you need to worry about is kernel->userspace." My business is inside the kernel. If you want me to support linux, then do yourself a favour and make it easier for me to justify doing so.

The Solaris Writing Device Drivers guide (which is the basis for the SunEd course as well) has been used by thousands of developers all over the world for many, many years. It's one of the reasons --- apart from the Binary Compatibility guarantee --- (in my completely unhumble opinion) for the success of Solaris. ISVs and IHVs know that they can depend upon the DDI/DKI interfaces being stable. That means that their development and support costs are lower because they don't need to keep re-certifying their widget for a new release of the kernel, whether that release is from a new version of Solaris (eg 2.6, 7, 8, 9, 10, ....) or from a kernel patch.

What I see in linux is that the kernel is growing organically. Organic growth is generally a good thing to have. However, what I would like to see is the linux kernel having some semblance of design in its interfaces, so that the need to go and change those interfaces is minimised. That makes it easier for ISVs and IHVs to justify supporting linux, and easier to justify supporting OSes other than MS-Windows and MacOSX. It's win-win for everybody.

Finally, I find Greg Kroah's comments

...remember we are talking about GPL released drivers here, if your code doesn't fall under this category, good luck, you are on your own here, you leech

to be quite inflammatory and unnecessary. Not every company on the planet likes GPLv2, and when one is fighting a battle to provide support for non-MS/non-Mac OSen in the first place such comments do not make it easy to win the battle with management.

Wednesday Apr 26, 2006

Coding hilarities

One of the feeds I subscribe to is The DailyWTF. It's often light-weight, but from time to time it comes up with something truly horrendous. Like today's entry, for example. Who would have thought that C could be an interpreted language? (Ok, so apparently this one was in php, but still!)
On reading the comments I saw a reference to something from Damian Conway which had me rolling on the floor: Lingua::Romana::Perligata --- Perl for the XXI-imum Century.
See, geeks can be language nerds in more ways than one!

Wednesday Mar 29, 2006

Been there, done that....

It's been an intense week so far... trying to figure out a 32bit crash (the same code works just fine in 64bit), visiting LinuxWorld Conference+Expo to help out on the Sun stand (more on that later), and getting some urgent, stopper bugs finally diagnosed. So then I catch up with UserFriendly and see today's strip. Been there, done that.... haven't bought the tshirt though because I'd have to wander off to CafePress and use some other braincells which right now I can't afford to re-target!

Thursday Feb 23, 2006

NWS Consolidation is now searchable

I just discovered that OpenGrok has been updated, with a gui amongst other things. Chandan's blog entry has the goods on it. What I found really neat though was that the OpenSolaris Searchable source code (powered by OpenGrok) has been updated to let you search for the NWS Consolidation code as well. I'll have to update my presentation.....

Friday Feb 03, 2006

Netbeans IDE 5 released

With the news that NetBeans IDE 5.0 has been released, I figure it's time to mention just how impressed I have been with it. I kept in touch with the q-builds, RC-builds and betas... and I was pretty impressed with features like the waaaaaaaaaaaaaaay improved CVS support. I was pushing for Teamware support... that's on the list of source-code management tools to rejig the support for, so it's a "not just yet." One of the features that I really enjoy about the Sun Studio workshop environment is the C, C++ and Fortran code recognition. Now that's made up of modules which the compiler team have written and integrated on top of NetBeans IDE 4.1 .... I'm eagerly awaiting an update on those modules so that they work with the 5.0 release. But! the really really big feature for me is the incredible speed that this release runs at. I thought it was pretty hot with jdk1.5, then I pulled down the latest Mustang release (that's Java6 for the uninitiated) and I was simply blown away at how much faster NetBeans IDE 5.0 was. So do yourself a favour and grab the new NetBeans IDE 5.0 release, and if you want that extra jolt of speed, grab Mustang as well.

Monday Oct 31, 2005

keyboard shortcuts in netbeans

For my OOD subject this semester we've had to write a fair chunk of Java code. Now at the start of August I was somewhat rusty on my Java coding skills, not really having written much of substance in the past three years. That soon changed! One thing that really, really helped me was (and is) netBeans. I've grown quite fond of the code completion facility, the "select all these files and create JUnit test classes" facility, and most importantly, the "sorry, this code is broken and here's why" which comes with a whole slew of suggestions on how to fix it. What I've not found until today is any doco about the default keybindings. Since I'm a longtime (emacs family) user, this can be a little dangerous to the code in the buffer. So a quick search resulted in some very obvious links --- and I'm feeling kinda silly that I didn't think to search earlier :-) Here are two hits --- one in PDF and the other in HTML. And of course I do know that I can set my netBeans editor personality to be Emacs too. I've just done that and now I'm gonna have to unlearn a few things :-)

Thursday Sep 29, 2005

We promise to not shoot first

Got up early this morning for my team concall (it was cancelled 6 hours ago, after I'd gone to bed last night... hmmm) and while in the mental bringup process I found an email from Hal Stern (amazingly cool bloke) about OpenDocument. The actual announcement itself is at, but the gist of it is this:
[Sun] gave a non-assertion covenant to [Sun's] patents for the OpenDocument file format standard, which is developed under the aegis of OASIS. A simple paraphrase of the covenant is: "You can infringe on [Sun's] patents in order to implement the X standard/specification, as long as you don't sue [Sun] or anybody else for infringing on your patents while implementing that standard/specification."
As another person expressed it, that means that Sun won't shoot first. I don't see this as a giveaway of technology or patents or rights. I see it as a belief that implementing this standard by as many [people,products] as possible is too important to humanity to let it be screwed up by legal issues. I also like that it's a complete and public response to Microsoft's fud (see Simon Phipps and Tim Bray for more on this). We're making this clear and simple. Don't get distracted.

Tuesday Sep 27, 2005

Remote X11 displays and unpainted Java windows

Over the past few years there have been a few times that I've alpha- and beta-tested java applications. Now of course I like to try applying tests outside of what I think the authors have thought of, and surprisingly enough, a lot of these applications fail to understand the concept of the remote X11 display. You know, where you use ssh, telnet, rsh or rlogin to connect to a machine which does not provide your primary display, and you set the DISPLAY environment variable. I recall one particular application's authors told me that using a remote display was explicitly not supported. This for a multi-user, networked backup system. Of course the (now-EOL'd) Motif version worked without a problem when I ran the gui on the server and displayed it on my workstation. Ahem. Anyway, I finally got a few spare cycles to google this problem. I searched for java x11 display hang and found a link to Elliot Hughes' blog where this issue was discussed. Elliot did a bit of investigation and discovered that if he set
and then re-ran his java application, everything displayed just fine. So I tried that fix, and it works for me too. I'm happy now!

Microsoft now doing nightly builds

While on irc over the weekend (avoiding writing code for uni), a friend pointed me to this WSJ article about Microsoft, testing and software quality. (Here's google's Cache of the article in case you're not a subscriber). I was struck by the implied cowboy attitude that existed within Microsoft. I know that their software is buggy --- as far as I'm aware there is no bug-free software, but surely a company as well-established in the software-engineering game as they are would be able to at least build their software on a nightly basis?
...with 4,000 engineers writing code each day, testing the build became a Sisyphean task. When a bug popped up, trouble-shooters would often have to manually search through thousands of lines of code to find the problem.
Apparently not!
...Microsoft would have to throw out years of computer code in Longhorn and start out with a fresh base. It would set up computers to automatically reject bug-laden code...By late October, Mr. Srivastava's team was beginning to automate the testing that had historically been done by hand. If a feature had too many bugs, software "gates" rejected it from being used in Longhorn. If engineers had too many outstanding bugs they were tossed in "bug jail" and banned from writing new code. The goal, he says, was to get engineers to "do it right the first time."
You've got to be kidding -- where was the quality control before this point? Something I've always been very conscious of here at Sun is that our code review and integration processes are heavy in order to prevent messes such as what Microsoft came up against in Longhorn. And despite the desire to have free and easy commit privileges to our source trees (for me ON and the SAN gates), I really do not want to be in a position where I get blacklisted by the gatekeepers and test staff because my code has to be backed out. That would actually make me unemployable! Finally, there's this quote from Mr Gates himself:
Hours after showing off Windows Vista to software makers this month, Mr. Gates in an interview noted how Microsoft's Office group is now using some of Mr. Srivastava's tools to improve its code. "It's amazing the invention those guys have brought forward," he said. "I wish we'd done it earlier."
You know, Bill, the software industry would probably be in a much better and more innovative shape right now if you'd driven software quality requirements into everybody that Microsoft hired. Welcome to the new, unit-tested and nightly-built world. We've been waiting a long time for this!

Saturday Sep 10, 2005

Leaving PTS

Well my time in PTS is just about up, as is my time in Sun's support organisation. From this Thursday my boss will be a bloke in Burlington and I'll be part of the SAN Engineering group (in our newly named Data Management Group) working on new features in the fibrechannel protocol stack. I'm really looking forward to this change --- I've always wanted to be part of Sun's Solaris development community, and doing storage drivers is a logical extension of what I've spent all my time doing so far at Sun. I'm not going to be moving to the USA --- there's no need because we're a global company which knows how to get teams that span the planet working together effectively. And it certainly helps that my team leader will be down in Melbourne :-) So my three main weapons of development will be broadband, a (mobile) phone, and instant messaging (definitely not the Spanish Inquisition!). I'll mostly be working from home -- so I can maximise my contact with the rest of the team at hours which are mutually acceptable. So, three working days left, no more new escalations for me to handle --- I just have to close or hand off the ones that I own at present. I can't wait!

Tuesday Aug 23, 2005

I love browsing bookshops

Went in to uni yesterday for my Circuit Analysis lecture, but had a few minutes spare beforehand. Since I made a trip to Borders on the weekend and they were all out of the engineering books I was after, I was needing a good pick-me-up book. I found a book that I first purchased in 1992 when I was studying at ANU: Kingston's Algorithms and Data Structures: Design, Correctness, Analysis. Back in 1992 I think I paid about AUD50 or so for it ... and loaned it to a friend who forgot to return it. They're such a good friend I've forgotten who they are! :-) Anyway, it was definitely a pick-me-up book, especially since I realised a little while ago that for all its supposed real world credibility, the UTS Engineering degree appears to have almost no instruction on algorithms for those doing Computer Systems Eng or Software Eng. Sure, there's a subject in the IT faculty, but its description is minimal and it's not purely about algorithms. One of my Sydney-based coworkers told me a few months ago that when he did his degree (cue four computer scientists from Yorkshire....) everything was about algorithms. I really wish that there was more emphasis on algorithms (the good, the bad, the ugly, the insane, the unmaintainable....) these days. Well with Kingston's book back in my hot little hands I'll have a good opportunity to make my own path.

Sunday Jul 17, 2005

Exam result is in...

I've stayed up a little later than normal (oooh, midnight.... watch me turn into a pumpkin!) so I could get my exam and subject result. I'm very pleased to say that I've got a high distinction (89/100, known as a "7" in the Queensland universities GPA scheme) for my Embedded C subject @ UTS. Since I was sitting on 40/40 for the assignments, if we assume no scaling took place, then I guess I lost 11 marks on the exam. It was probably for all the stuff which you kinda depend on the compiler catching for you, so you get lazy and fail to memorise it. Bummer! Ok, so this coming semester I can look forward to Circuit Analysis and Object-Oriented Design. That's going to be a lot of fun both mathematically and programmatically. ODEs and PDEs again, then there's the java.... that's soooo high-level compared to the level that I normally work at. It's all fun and I'm feeling quite happy now. ciaociao!

I work at Oracle in the Solaris group. The opinions expressed here are entirely my own, and neither Oracle nor any other party necessarily agrees with them.

