Swapping drives between Solaris machines
By PotstickerGuru on Dec 06, 2006
In the constant search for more low-power home system deals, another processor I've had good success with is the AMD Geode NX. The Geode is a line of x86-compatible 32-bit processors AMD offers for various embedded applications, usually extremely low-power and space-constrained ones. I'm not sure of the roots of the Geode line, but at the higher end of the spectrum is the Geode NX, and from what folks are saying on the web, it appears to derive from the Athlon XP line and may actually be cream-of-the-crop CPUs that are down-clocked and can run stably at 1.2V (rather than the usual 1.6V) to reduce power. I've only seen the Geode NX 1750 1.4 GHz CPU sold retail in combos with motherboards, but AMD advertises multiple speed grades of the Geode NX, probably for volume OEMs.
About two months ago, Newegg.com had a sale on a particular Geode NX/motherboard combo for $69.99. The deal was for a PCChips M863G (v7.0) micro-ATX Socket A board with the SiS 741GX/964L chipset, integrated graphics, AC'97 audio, and SiS900 fast ethernet. The combo included a factory-mounted Geode NX 1750, which supposedly idles at 14 Watts and goes up to 20 Watts. It also came with a generous heat sink and a super loud 70x15mm fan that ate 7.3 Watts by itself! (Fan and heat sink were replaced with a much quieter CoolerMaster unit running at 7V and 2200 rpm.) Rev 7.0 of this board does support a 1.2V CPU core voltage setting, which lets the Geode NX achieve its low-power status. Newegg seems to be sold out of this combo, but they restock once in a while. (Picture courtesy of Newegg.com.)
Fig. 1. PCChips M863G board with bundled AMD Geode NX 1750 CPU from Newegg.COM
Directron.com is also sold out of a similar combo with a Biostar M7VIG 400 board using the all-in-one VIA KM266 Pro chipset. The Biostar should have been the better board, and I actually bought it first, paying a few bucks more because I knew the chipset better and thought it was well supported. But after buying it, I wasn't fully happy: the board revision and BIOS didn't support the Geode NX's low-voltage mode. The CPU ran cooler and at the same clock rate, but 1.4V was the lowest core voltage it would accept, so I was basically running the equivalent of an Athlon XP at 30 - 35 Watts when the CPU should be able to operate at 14 - 20 Watts. I also had some issues with Xorg on the Unichrome graphics, which I was able to work around, but it wasn't a clean install. The Xorg workaround is explained a little further down.
So with the Biostar-VIA board not being as low-power as I'd hoped, and Newegg having a similar combo with the SiS chipset, I didn't hesitate; I bought one. As luck would have it, not 10 days after I received the first board, Newegg added a $10 rebate on the same item, so I had to take advantage of that deal too and get a second combo to score the rebate. I did this pretty much without having checked whether the SiS graphics, network, and audio would work; the deal was just too good to pass up.
Sad days; Retirement of two old friends coming soon.
The goal of buying those Geode NX combos was to begin retiring a pair of 8-year-old boxes that have served me well over the years. These were proprietary BookPCs I bought back in 1998, with a tiny Super Socket 7 mobo, a special 95 Watt power supply, and a short-depth chassis. They had the VIA Apollo MVP3 chipset with ECC SDRAM support, and I maxed each one out with 512 MB of branded ECC memory and an AMD K6-2 450 MHz. These were tough boxes; the weight alone on these small but heavy units was definitely old-school manufacturing. They had Davicom 9102 NICs, and I only put a disk drive into each, removing the optical drive and FDD after installation for reduced cabling and improved airflow. I've upgraded these boxes over the years: first 6.4 GB IDE drives in both, then 40 GB ultra-quiet drives. Today, one still runs Linux kernel 2.4 and the other runs Solaris Nevada. Both have run so long that the power supply and CPU fans have clogged with dust and seized, and the power supplies (which are very hard to find) have blown. I spent a couple of days back in 2001 relearning basic circuit analysis and researching power supplies. After a couple of trips to Halted Supply Co. (HSC) near Lawrence Expressway and Central in Santa Clara, I finally got a bag of high-frequency switching capacitors in various sizes at about $0.35 each. So for about $1.50 per power supply plus $7 for a new fan, I had some quiet, good-as-new power supplies and was back up. Since 2001, I take the servers down at least every 100 days, or more often during spring and summer, to dust off the systems, replace noisy fans, etc. I still have a spare set of capacitors out there for one more repair.
It's amazing how much dust can accumulate inside a server box when you run it at home. With humans shedding skin flakes, plus pollen and dust from the garden outside, those servers were actually like air filters for the house. After cleaning, I was shocked at how warm the side of the case was; then it hit me that the dust buildup had been preventing heat from radiating out the sides of the case, which sent more heat out the back through the power supply. So almost every 2 to 3 months, the servers get taken down on a Saturday evening at 1 am or so, the cases are cracked open, and the units go outside to the patio, where I try to stay barefoot and grounded and blow off the boards with some type of compressed air. Some stubborn soot gets caught between pins or in nooks and crannies like the CPU heat sink, and I use a soft toothbrush with Swiffer dust cloths to clean and wipe. The units go back inside and get tested to see if the CPU, case, and P/S fans are noisy or wobbling after service. I stock 50mm, 60mm, and 80mm fans in 10mm and 15mm thicknesses (and other sizes too) for this purpose. They get swapped if noisy, and I usually switch the pin order to bias the fans at 7V instead of 12V to reduce noise. Not all fans and motherboards support this, so you need to make sure the fans you run this way don't smoke or fail to start. Then the systems get closed up and put back into service.
It'll be sad retiring these old friends, since they handle email, web, Java servlets and JSPs, firewalling, and database for close to 10 domains without any real performance issues so far. But each consumes about 44 Watts when I have other servers that are 1.5 times faster at 19 Watts, which could tremendously improve battery life on the two 1500VA UPS units I own and let me consolidate all the switches and routers onto a shared UPS rather than adding separate smaller UPS units for those. With newer, even faster chips using about the same or less power, I may even be able to use just one box for all services, consolidate completely, and really save money, power, the environment, and all that good stuff.
Installation Dilemma - Slim Drive or No Drive?
Ever since Sun produced the Netra X1 line, I've loved the low 1U, shallow 13-inch-depth case. Those boxes sort of epitomize what a small, cool-looking but industrial server should look like. I wished someone made an affordable case in a similar form factor that was a bit quieter and equally attractive. About the closest thing I can find today is a SuperMicro SC513 or SC512 1U chassis. But at close to $180, the case isn't cheap, and to add cost, it requires expensive slim optical and floppy drives. It's certainly too rich for my tastes, and I've never investigated the acoustics. Those old Netras, though, were pretty quiet, if I recall.
A compromise of sorts is to go with a BookPC form factor chassis. The first generation of these cases were like the ones I described above; they came with small proprietary boards and power supplies and were actually the size of a large telephone book. The whole barebones kit could be had for under $100 plus shipping. A good feature was that they took standard-sized optical and floppy drives, usually mounted over the motherboard, but it got cramped inside pretty fast. Back in the late 1990s, not all DIMMs were low-profile. Some were 1.4 inches tall, too tall to allow sufficient clearance between the drive bracket above and the cabling that ran over DIMMs situated underneath or partially below the drive bays. Subsequent revisions of BookPC cases have gotten longer and deeper so the board is entirely clear of the drive bays. Prices have grown along with the cases: they start around $60 and go up to $100 or $200 for some sleek all-aluminum cases. (If I'm gonna pay $200 for an all-aluminum case, it better protect the board from EMP from the next nuclear detonation in my neighbourhood! LOL!)
And the peeve I've had with cheaper BookPC cases is that they're really noisy, due to all the cheap small fans, because the makers know there will be folks out there who think they can save a few bucks and get a small case for that quad-core/quad-GPU gaming system. So, yes, that's why they have a bunch of fans and that's why they're noisy. And that's why I focus on finding low-power processors and motherboards to reduce volumetric heat generation in these small boxes. (Note: before video games, kid brothers would watch their 2nd-grade sisters use a Hasbro Easy-Bake Oven with its 100 Watt light bulb heat source; it bakes -real- cookies. Understanding the heat generated by a little box might be something to bring back into 2nd-grade education, so big boys don't grow up and try to shove a 200 Watt heat source into a small case. Note 2: we could educate the manufacturers too... only, most aren't socialized in America with Easy-Bake Ovens; they're just putting in noisy fans to cover themselves and their distributors against too many RMAs on melted chassis.)
For all my complaining, the compact BookPC size and shape does appeal to my sense of aesthetics, and with some re-wiring and soldering skills, I can usually lower the voltage on case and power supply fans to make them slower and quieter, yet still sufficient to cool a lower-power system. And that's what I did to a couple of Enlight 7396AM1 low-profile cases. These have a high-quality, sound-insulated chassis and front USB, come with a fairly quiet power supply, and Directron.com has them on clearance for $19.99 with $13.99 shipping, which gets cheaper if you buy more than one case. (See figure below; courtesy of Directron.COM):
Fig. 2. Enlight 7396AM1 micro-ATX case for $19.99 + S/H at Directron.COM
I bought two of these last month, and then realized that the drive bays take a normal floppy drive but only a -slim- CD/DVD drive. I was in a bit of a dilemma. I didn't want to shell out $75 each for two slim DVD burners when I had perfectly good regular NEC 3550As in stock; that would sort of defeat the whole idea of buying these cases for $19.99 plus shipping. Then I asked myself whether I really needed an optical drive, or even a floppy, on the box, since the reason for these systems was to replace those two 8-year-old AMD K6-2 450 MHz BookPC servers. Neither has any optical or floppy drives in them; those were removed after OS installation, leaving just the hard drive.
Chicken or Egg Solaris Install? How about transplants?
So I don't have slim optical drives to do the Solaris install in the small Enlight cases. In addition, the PCChips M863G boards with the SiS chipset support Novell IPX Netware boot but not PXE, so a DHCP boot of the system isn't going to be easy to implement. And even if the board supported PXE boot, the default Solaris install doesn't have the SiS900 network driver. If we had PXE, I could disassemble the x86miniroot in Jumpstart to add Murayama's sfe driver (howto provided in a previous blog), and that would give us an active network interface to complete the Jumpstart install. But the boards have only IPX Netware boot, so thinking about PXE boot without a PXE-capable NIC is moot. I checked my inventory of Intel and 3COM ethernet cards and none had the optional PXE boot ROM; the only card I have in stock with the PXE boot option ROM is a PCI-e Intel e1000g, which these boards can't take. Great. More shopping, but that would take a bit of time, and I wanted to install the systems there and then.
So the best solution I initially thought of was to temporarily attach a standard DVD-ROM drive, perched outside the propped-open case, and do the install once on each disk. That would expose the system for just an hour or two, and then we could close them up. But that's still an hour or two, or more. Plus the whole hokey setup, with a cable/ribbon hanging out and a bare drive sitting there spinning loudly for 2 hours while I'm trying to watch a remake of Van Helsing with Hugh Jackman, isn't what I had in mind. And looking over my desk, I had a bunch of IDE Seagate Barracuda IV ST380021A drives in clear plastic clamshells, just sitting there, recently swapped out of a couple of test boxes with Solaris Nevada b52, which got newer SATA drives.
A conversation with a colleague earlier in the spring of this year came to mind. We were sitting outside the Sun Santa Clara Auditorium after a Silicon Valley OpenSolaris Users Group (SVOSUG) meeting, and a bunch of folks were heading over to Denny's or IHOP for a late bite. Dan Price had just given an S10 Next Gen overview covering a lot of things, and our SATA team had given a talk on the new SATA framework. One of my colleagues who works on x86 boot and ACPI was there. While we waited for a couple of guys still inside, probably cleaning up, I asked my colleague why, if I switch disks from one box to another, I can't get Solaris to boot, except in Safeboot.
My colleague gave me that funny look, like, "Why would anyone wanna do that?" I explained that from a customer support perspective, it'd be cool to flash a Solaris image to disk, ship it, and let it boot up and self-configure. He countered that, no, this isn't a big feature demand for Solaris, and asked rhetorically how many folks would ever use it anyway. Well, I wasn't sure. But on Linux I do this all the time: preflash a disk, and when I need a quick build, I slap the drive in, Kudzu kicks in, and voila... the system is configured in a couple of minutes. Solaris gets stuck in reboot hell if we try this. So after some debate, which went nowhere, I brought up the Seinfeld episode that got me to start watching that sitcom. It was the first episode I ever chanced upon, one of those where Seinfeld has his little comedy clips at the open and close. He's standing there talking about the black box.
"Ya know, when a plane crashes... the only thing that survives is the BLACK BOX... Ever wonder why they don't just make the WHOLE plane.... out of THE black box???" (laughter).
In all seriousness, we know that Solaris safeboot has the hooks to rebuild the boot-archive and device trees, and the installer figures out the devices and puts a permanent map of that onto the filesystem somewhere. So why can't we put those same hooks into the regular Solaris boot or multi-boot? That's what I asked my colleague.
He gave me that 'James, you're naive and uninformed about Solaris x86 boot' look, told me it's not a widely used feature and not what multi-boot does, and started down the path of how GRUB works with multi-boot and the whole secondary boot blah, blah, blah. Which I took as obfuscating the issue with detailed specs that don't matter to end users. Anyway, to make a long story short, I was miffed, decided against Denny's, and went home to eat a cold supper made by my wife, who's always been a lot warmer and at least seems more understanding.
Fast forward 9 or 10 months, and here I have two perfectly good disks, almost new and pre-installed. I searched the web and managed to find a few Sun FAQ/developer discussion board topics. It was clear that at least a couple of other folks wanted to do exactly this and actually had the gumption to ask us how. One guy got pretty far in the boot but still didn't get it going. Our standard answer was to boot the kernel in debug mode and look at the output. That's fine if you're a Sun engineer and know how to read the messages and hex scrolling off the console. But for most folks, it's nonsense and just shows a lack of empathy on our part.
From those threads, I pieced the various tips together and gave it a try; magically, 5 minutes later, it was all working on the new SiS chipset motherboard. Here are my steps:
- Boot into Solaris Safeboot mode. You can select it from the GRUB menu, usually the 2nd option.
- Mount the found Solaris partition on /a. Safeboot will usually find the slice on the disk with Solaris and ask if you want to mount it on /a. Select Yes.
- Move /a/dev, /a/devices, and /a/etc/path_to_inst aside (I just append .orig), then create new /a/dev and /a/devices directories with mkdir, and touch an empty /a/etc/path_to_inst.
- Run "devfsadm -r /a" to rebuild the device tree
- Edit /a/boot/solaris/bootenv.rc and modify the "setprop bootpath '/pci@0,0...'" line to match the device mounted on /a. Run 'df -k' and you should see /a mounted from /dev/dsk/c1d0s0 or similar; then run 'ls -l /dev/dsk/c1d0s0' (or whatever your device was) and you'll see the link points to ../../devices/pci@0,0/... The bootpath you want is that expanded /devices/pci@0,0/... path for the hard disk mounted as /a, written into the bootenv.rc on the disk's Solaris root filesystem (sans the /devices prefix, of course).
- Now run "bootadm update-archive -v -R /a" to rebuild the boot-archive on /a
- run a 'touch /a/reconfigure'
- Run "cd /; sync; sync; sync; umount /a"
- and finally reboot.
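The steps above could be scripted roughly as follows. Treat this as an illustrative sketch, not a supported tool: you'd run it from Safeboot with the disk already mounted on /a, the c1d0s0 device and pci@ paths are examples from my setup, and the function names are mine.

```shell
#!/bin/sh
# Sketch only -- run from Safeboot with the transplanted disk on /a.
# Device names (c1d0s0) and pci@ paths below are examples.

# Given the symlink target of a /dev/dsk entry (from 'ls -l'),
# strip everything through "/devices" to get the bootpath value.
bootpath_from_link() {
    echo "$1" | sed 's|^.*/devices||'
}

transplant_prep() {
    ROOT=/a

    # Set aside the old device trees and instance map.
    mv $ROOT/dev $ROOT/dev.orig
    mv $ROOT/devices $ROOT/devices.orig
    mv $ROOT/etc/path_to_inst $ROOT/etc/path_to_inst.orig
    mkdir $ROOT/dev $ROOT/devices
    touch $ROOT/etc/path_to_inst

    # Rebuild the device tree for the new hardware.
    devfsadm -r $ROOT

    # Work out the new bootpath from the /dev/dsk symlink,
    # e.g. ../../devices/pci@0,0/pci-ide@11,1/ide@0/cmdk@0,0:a
    DISK=`df -k $ROOT | awk 'NR==2 {print $1}'`
    LINK=`ls -l $DISK | awk '{print $NF}'`
    echo "Set bootpath in $ROOT/boot/solaris/bootenv.rc to:"
    bootpath_from_link "$LINK"

    # Rebuild the boot archive, force reconfiguration, unmount.
    bootadm update-archive -v -R $ROOT
    touch $ROOT/reconfigure
    cd /; sync; sync; umount $ROOT
}
```

After pasting the printed path into bootenv.rc by hand, reboot as usual.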
There may still be issues if you have changed the IDE priority (master/slave) of the hard drive or moved it to a different SATA socket. In these cases, the system will probably boot but then fail to mount filesystems it can't find; boot into Safeboot again and edit /a/etc/vfstab to correct the device names.
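For example, if the new box enumerates the disk as c1d0 instead of c0d0, the root entry in /a/etc/vfstab would change along these lines (device names are illustrative):

```
# before (disk was c0d0 on the old box):
/dev/dsk/c0d0s0   /dev/rdsk/c0d0s0   /   ufs   1   no   -
# after (new box sees the same disk as c1d0):
/dev/dsk/c1d0s0   /dev/rdsk/c1d0s0   /   ufs   1   no   -
```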
Once the system boots, it retains its legacy settings for network and naming services, which may be totally bogus. In that case, I check for any missing drivers (like an SiS900 Fast Ethernet module) and transfer the source/binaries for the modules via CD/DVD media or USB, which usually works. I'll then delete any /etc/hostname.[NIC#] files, run sys-unconfig on the system, and reboot again.
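That cleanup step can be sketched as a tiny function. The sfe0 interface name in the comment is an example from my SiS900 setup, and note that sys-unconfig prompts and then halts the box for a reconfiguration boot:

```shell
# Sketch: reset system identity after a drive transplant.
cleanup_identity() {
    rm -f /etc/hostname.*   # remove stale per-NIC files, e.g. /etc/hostname.sfe0
    sys-unconfig            # wipes host/network identity; halts for a reconfig boot
}
```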
Yes, it's not the most elegant of solutions, but it could be scripted, and one of my colleagues down the hallway thought we could do our customers right by putting that script into safeboot and documenting that we put it there. For now, I hope the instructions help others who might be in the same boat. It takes about 5 - 10 minutes to swap the drive, reboot, and reconfigure the system if you know what you're doing. That's a lot better than an hour or two re-installing or upgrading, and it also means I can keep pre-flashed drives sitting around to save time on installs and testing.
BTW, Solaris b52 runs well on the PCChips M863G motherboard with the AMD Geode NX 1750. The SiS 741GX/964L chipset functions normally, and Xorg even finds and configures the SiS onboard graphics with no work required. The onboard graphics, while supported in Solaris, are really grainy, and the graininess is noticeably worse on one of the systems than on the other. I guess I could stick an AGP card into the slot (I did try an older ATI Radeon unit and it looked beautiful), but an optional graphics card eats more wattage, and since my plan is to run these mostly headless, I really shouldn't care. This isn't the first time I've had graininess issues with integrated graphics, especially on these small form factor boards. I suspect it has something to do with the number of layers in the board (cheap boards usually use fewer layers) and the integrity of the analog VGA signal, which has to travel through the motherboard, where there may be a lot of RF interference, to reach the back I/O plate. I've used the same graphics chip and software driver on different boards, and sometimes the image is crystal clear. And of course, with an optional AGP card in the slot, the card is raised and separate from the board, and less likely to suffer interference from wires in close proximity. I get similar graininess with the VIA mini-ITX systems, but it's not as bad as the SiS chipset on this particular board; plus an older Celeron system I used to have with an SiS chipset and SiS Mirage graphics looked great, so it isn't the chipset itself.
The goal is to run them as network servers, so I plan to disable the graphical login and run only in text mode anyway. Next, I compiled and installed the sfe-2.2.0 gldv3-nemo driver from Masayuki Murayama's free Solaris NIC collection, and it just works. Honto ni, arigatou gozaimasu, Murayama-sama! (Someone send this guy a case of Sapporo nama biru and Pizza Hut vouchers, or a free Shinkansen ticket to Sapporo where he can pig out at the beer factory's "Genghis Khan" Mongolian BBQ tabe/nomihoudai [all-you-can-eat-and-drink]!)
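For reference, the build-and-install sequence looks roughly like this. It's a sketch from memory, so the exact make targets and the add_drv PCI alias (pci1039,900 should be the SiS900's vendor/device id) ought to be checked against the README shipped in the sfe tarball:

```shell
# Sketch: build and attach Murayama's sfe driver; details vary by release.
install_sfe() {
    gunzip -c sfe-2.2.0.tar.gz | tar xf -   # unpack the driver source
    cd sfe-2.2.0
    make                                    # build the kernel module
    make install                            # copy sfe into /kernel/drv
    add_drv -i '"pci1039,900"' sfe          # bind to the SiS900's PCI id
}
```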
Note about Xorg on older Unichrome (not Pro) Graphics
I mentioned above that I had a graphics issue installing Solaris on VIA Unichrome graphics. This was on the Biostar M7VIG 400 board. Back in the Nevada b30-something timeframe, more than a year ago, I was having graphics issues on my EPIA mini-ITX boxes that only impacted the graphical installer and the VGA text modes; Xorg itself worked fine. But if you selected Text Console as the login option, you'd get a dark and illegible screen, or sometimes a pastel screen. The version of Xorg did work, and very stably, if you always used graphics. We had a few talks with some of the VIA folks, who tried to reproduce this on newer mini-ITX boards and didn't see the same problem. The funny thing was we knew this was partly a problem with the Solaris driver and partly the hardware rev, because in going from an older rev of the PCChips M789CG (v2) to a newer rev (v3), both with VIA Unichrome graphics, the problems suddenly went away in the same build of Solaris, and all the modes (VGA text, console login, and X graphics) worked fine with no driver changes. A friend also mentioned that a 1.3 GHz Nehemiah on the CN400 chipset with Unichrome worked fine, but I saw weird pastels or garbled text consoles on my older EPIA ME6000, M10000, and EPIA 800 systems.
Some time around the build 48 timeframe, I was going through and upgrading a bunch of my systems when suddenly all of my VIA Unichrome systems would power up in a blanked-out graphics mode (i.e., there was no VGA signal coming from the graphics port, and the monitor would blank with a yellow blinking standby-mode light). The behaviour was very peculiar, and nothing was showing up in the logs. I tried to log in, and it was as if X thought it was running: I could see the disk spinning as if I had succeeded in logging in, and ssh'ing in from another machine showed that I had active shell processes on the console, even though the console was blanked (as if someone had turned off the graphics or blanked the screen in a low-power mode).
Back in the old XFree86-to-Xorg transition days of Linux, I used to run a bunch of Biostar M6VLR boxes with the old Trident Cyberblade graphics embedded in the VIA PLE133T chipset. Fedora Core 1 and 2 had fits with the Trident Cyberblade, and even FC3 still had some issues. A more stable driver binary was available, and the trick was to use it to replace the default driver module used by the X server. Using the same trick, I loaded a Solaris DVD from a previous build, found the ./Solaris_11/Product/SUNWxorg-graphics-ddx package, copied the ./archive/none.bz2 archive to /tmp, and unpacked it using:
# bzcat none.bz2 | cpio -C 512 -idukm
This created /tmp/X11 and inside /tmp/X11/lib/modules/drivers/ was the via_drv.so file.
I renamed the old VIA graphics driver in /usr/X11/lib/modules/drivers to via_drv.so.orig and then copied the one unpacked under /tmp/X11/lib over to /usr/X11/lib, clobbering the old version. Because the sizes differ slightly with each build of Solaris, I couldn't tell by size alone which versions were the same or different. So I methodically went back through each build of Solaris until about build 42 or 43, when I found a version of via_drv.so that worked with the graphics and didn't blank the screen. It did cause the Biostar board with the Geode NX CPU to revert to bad console text, though, so it seems the Biostar boards use an old revision of the VIA Unichrome hardware (reaffirming my disappointment with that first Geode NX/Biostar combo). But I had a workaround to get X graphics working.
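Pulling those steps together, the whole driver swap looks roughly like this. The DVD mount point is an assumption and will vary with how your media gets mounted:

```shell
# Sketch: swap in via_drv.so from an older build's SUNWxorg-graphics-ddx
# package. The /cdrom path is illustrative.
swap_via_driver() {
    PKG=/cdrom/cdrom0/Solaris_11/Product/SUNWxorg-graphics-ddx
    DRV=/usr/X11/lib/modules/drivers

    cp $PKG/archive/none.bz2 /tmp
    cd /tmp
    bzcat none.bz2 | cpio -C 512 -idukm    # unpacks to /tmp/X11/...

    # Keep the current driver, then clobber it with the older one.
    cp $DRV/via_drv.so $DRV/via_drv.so.orig
    cp /tmp/X11/lib/modules/drivers/via_drv.so $DRV/via_drv.so
}
```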
I know build 53 put back some big changes to graphics and improved AGPgart. I don't know if it fixes other graphics issues, but it's worth a try. And b54 just got posted internally. Unfortunately, that Geode NX box is up near Vancouver, Canada right now, in a powered-off state with no LOM (lights-out management). I'll be up there for Christmas in a couple of weeks and I'll give it a try then, unless the weather warms up enough to let me get out and go fishing for some winter steelhead. I'm usually more worry-free fishing in winter; the black bears are hibernating and not likely to be up stalking me. I'll save that for another blog.