Tuesday Mar 24, 2009

Adobe Reader 9.1 for [Open]Solaris on x86 finally here!

Sweet! Adobe finally released Adobe Reader 9.1 for [Open]Solaris on x86!

Get it from http://www.adobe.com/go/getreader

Wednesday Sep 26, 2007

Solaris Wireless on MacBook Pro

I've been playing with a nice MacBook Pro the past couple of weeks in an effort to evaluate whether it would be suitable for Field Service engineers. Basically, the model I have is this:
  • MacBook Pro
  • 2.2GHz Intel Core 2 Duo
  • 2GB of RAM
  • 120GB Harddisk

Mac OS X is an excellent OS and for the most part, it "Just Works". However, Solaris is an absolute requirement for Field Service. So, to that end, we've been looking at Parallels and VMware as possible options for running Solaris. Those are fine and dandy too, but why should Solaris be limited to a virtual machine?

For the most part, of course, a virtual machine would be just fine. However, when troubleshooting customer systems, the virtual machine would just add another level of "could be this problem." I don't like the idea of having to troubleshoot 3rd-party software on my laptop when I should be troubleshooting customer systems.

That being said, the bare metal option is the most interesting to me.

After downloading Boot Camp from Apple and going through the steps on Paul Mitchell's blog about dual-partitioning the MacBook Pro, I was up and running with Solaris Nevada Build 72. I updated to Build 73 shortly afterwards, since it was out before I got to any serious playing.

Getting the wired network going required adding in the Marvell Yukon driver and uttering this:

# update_drv -a -i '"pciex11ab,436a"' yukonx
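From there, bringing the interface up should be the usual routine. A minimal sketch, assuming the driver attaches as instance 0 and that you're on a network with DHCP:

# ifconfig yukonx0 plumb
# ifconfig yukonx0 dhcp start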

The wireless card is an Atheros card according to scanpci:

pci bus 0x000b cardnum 0x00 function 0x00: vendor 0x168c device 0x0024
 Atheros Communications, Inc.  Device unknown

Unfortunately, the Atheros driver that ships with Build 73 doesn't support that chip. Luckily, that driver has been updated to version 0.6, and the OpenSolaris website has it. A simple pkgrm of the old package and a pkgadd of the new one and I was in business, talking with my Airport Extreme Base Station!
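For the record, the swap and reconnect went roughly like this. This is a sketch from memory: the package name SUNWatheros is my assumption (check the name of the package you actually downloaded), and see wificonfig(1M) for the exact profile syntax on your build:

# pkgrm SUNWatheros
# pkgadd -d . SUNWatheros
# wificonfig createprofile Applesauce essid=Applesauce encryption=wep
# wificonfig setprofilewepkey Applesauce
# wificonfig -i ath0 connect Applesauce

Here's what things looked like once connected: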

# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000 
ath0: flags=201004843 mtu 1500 index 2
        inet 192.168.10.38 netmask ffffff00 broadcast 192.168.10.255
        ether 0:1c:b3:b8:a6:c1 
lo0: flags=2002000849 mtu 8252 index 1
        inet6 ::1/128 
# wificonfig -i ath0 showstatus
        linkstatus: connected
        active profile: [Applesauce]
        essid: Applesauce
        bssid: 00:0a:95:f3:3d:f2
        encryption: wep
        signal strength: medium(10)

Score another one for Solaris on the bare metal!

The remaining things that I'd like to have working are sound through the built-in speakers and some better power management.

Tuesday Aug 22, 2006

Well it SHOULD just work....

I need to remind myself that I put alpha/beta software on my systems and such software may or may not mess things up at a later date.

In getting my Palm to sync with my laptop, I found that I had to change the permissions on the devices created by the ugen driver. I've been told that this shouldn't be necessary.

Being a troubleshooter, I took a system in my lab, did a fresh load of Solaris 10 6/06 on it, and created a regular user. From there, I logged in, connected my Palm, and used the pilot-xfer command (specifically, "/usr/sfw/bin/pilot-xfer -p usb: -l") to talk to it while it was trying to sync. It worked without an issue.

Tracking this down further, I found that the devices that are being created on my laptop when the Palm is trying to sync are owned by root. The ones created on my lab system are owned by the user that is logged into the console. Specifically:

pwags@coredump:/devices/pci@0,0/pci1179,1@1d% ls -l
total 2
drwxr-xr-x   2 root     sys          512 Feb  6  2006 device@1/
crw-rw-rw-   1 root     sys       35,  2 Aug 22 11:09 device@1:830.61.cntrl0
crw-rw-rw-   1 root     sys       35,  3 Aug 22 11:09 device@1:830.61.cntrl0stat
crw-rw-rw-   1 root     sys       35,  1 Aug 22 11:09 device@1:830.61.devstat
crw-rw-rw-   1 root     sys       35,  4 Aug 22 11:09 device@1:830.61.if0in1
crw-rw-rw-   1 root     sys       35,  5 Aug 22 11:09 device@1:830.61.if0in1stat
crw-rw-rw-   1 root     sys       35,  8 Aug 22 11:09 device@1:830.61.if0in6
crw-rw-rw-   1 root     sys       35,  9 Aug 22 11:09 device@1:830.61.if0in6stat
crw-rw-rw-   1 root     sys       35,  6 Aug 22 11:09 device@1:830.61.if0out2
crw-rw-rw-   1 root     sys       35,  7 Aug 22 11:09 device@1:830.61.if0out2stat
crw-rw-rw-   1 root     sys       35, 10 Aug 22 11:09 device@1:830.61.if0out7
crw-rw-rw-   1 root     sys       35, 11 Aug 22 11:09 device@1:830.61.if0out7stat

Device permissions are controlled by /etc/logindevperm. These files match between my two systems, so that's obviously not the problem.
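For reference, the format of /etc/logindevperm is console device, mode, and a colon-separated device list; entries look something like this (a sketch from memory, so the exact device lists may differ on your release):

# /etc/logindevperm - device permissions for the console user
# console       mode    devices
/dev/console    0600    /dev/sound/*            # audio devices
/dev/console    0600    /dev/usb/hid*           # USB HID devices

Whoever logs in on the console is supposed to get ownership of the listed devices, which would explain why the lab system's ugen nodes end up owned by the console user.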

Looks like I need to track things down some more here.

Wednesday Aug 09, 2006

Syncing Palms with Solaris via USB

Way back when, in the days when I had a nice Palm III, I did a bunch of syncing with my CDE calendar using Pilot Manager (at least that's what I think it was called). It eventually morphed into PDASync, which can still be found in Solaris 10.

The problem, though, was that it required you to be attached via a serial port. When I got my Palm m505 after my Palm III experienced a sudden stop on the floor one day, life was still good because I bought a serial cradle to go with it. When I recently retired my m505 and got a Tungsten E2 to replace it, I didn't get a serial cradle to go with it. Instead, I started to use the built-in Bluetooth interface to sync it with my Mac mini.

That, of course, was fine until I realized that I had no way to get my Mac to talk to Sun's EdgeCal (which is our calendar server that is accessible from the internet). iCal on the Mac isn't designed to talk to a server, and the Palm Desktop doesn't talk to a calendar server. To top it off, even if I did get a serial cradle for my Tungsten, PDASync won't talk to EdgeCal.

So I was in a bind. I couldn't use the USB interface on the Tungsten to talk to Solaris. I couldn't use the software for the Mac to talk to EdgeCal, and a serial cradle wouldn't buy me anything either. Not good.

Enter Solaris 10 6/06 (Update 2). It has the necessary glue to talk to Palms via USB. This glue is provided by the SUNWpltlk (pilot-link, Palm Handheld Glue) and SUNWgnome-pilot (PalmPilot Link Utilities) packages, the ugen driver in SUNWugen (USB Generic Driver), and Evolution.

I suppose that you could just pkgadd the packages to a Solaris 10 install prior to Update 2, but it's much easier to just upgrade to Update 2. Plus, you'd get ZFS as a bonus.
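If you do go the pkgadd route, a sketch (assuming the Update 2 media is mounted at /cdrom/cdrom0) would be:

# pkgadd -d /cdrom/cdrom0/Solaris_10/Product SUNWugen SUNWpltlk SUNWgnome-pilot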

To start with, configure Evolution (located in the Java Desktop under "Launch" -> "Applications" -> "Internet" -> "Email and Calendar") to talk to your calendar server, or just use your local files.

Also, with Evolution, you'll need to set your default folder to point to your Calendar, otherwise the Palm will always sync with the locally stored calendar. You can set this in Evolution by going to "Tools" -> "Settings" -> "Folder Settings" -> "Default Folders" and selecting the correct entry from "Calendar."

Next, you'll need to change the properties of the ugen driver so that an unprivileged user can read and write the Palm device. By default, all ugen device entries that are generated have permissions of 0644. Without changing these to 0666, I've only been able to get root to talk to the Palm. So, become root and invoke the following:

# update_drv -a -m '* 0666 root sys' ugen

Now, changing the permissions to rw for everyone could be a security risk, so use your best judgment. If there is a better way of accomplishing this, I'd like to know.
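You can double-check that the entry took by looking in /etc/minor_perm, which is where update_drv -a -m records it, and back it out later with update_drv -d. A sketch (the grep output is what I'd expect to see; verify on your own system):

# grep ugen /etc/minor_perm
ugen:* 0666 root sys
# update_drv -d -m '* 0666 root sys' ugen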

From that point, all you need to do in Evolution is click on "Tools" -> "Pilot Settings" and follow the instructions there.

From there on out, you should be in sync. :)

Monday Feb 13, 2006

Installfest on Thursday!

For those that are in the Minneapolis area, you might be interested in the InstallFest that I'll be running ...

Here's the registration link:

http://www.suneventreg.com/cgi-bin/pup_registration.pl?EventID=854

Wednesday Dec 21, 2005

Solaris 10 1/06 available!

Solaris 10 1/06 is now available for download.

Tuesday Jun 14, 2005

Adobe Reader 7.0 for Solaris on SPARC

Adobe has updated their Adobe Reader software for Solaris on SPARC to the latest version, version 7.0.

Now, if we can just get them to recompile it for Solaris on x86. That's all that it should take.

Last I checked, you could still get version 4.0.5 of Adobe Reader for Solaris on x86 from their FTP server. They had pulled it from the main download page due to a security bug that they didn't want to fix. At the time, not updating the Solaris on x86 binary probably made sense to them, because I think it was around the same time that Sun announced that Solaris 9 on x86 wasn't going to happen. The 4.0.5 version was still on their FTP server as of my last look, but for some reason I can't seem to find where that is at the moment.

Maybe OpenSolaris will encourage them to put up 7.0 for Solaris on x86.

Monday May 09, 2005

Certified! (Certifiable?)

Back in the last week of January and the first week of February, I was involved in a little project to write questions for the Solaris Operating System Certification exams for Solaris 10.

What I remember of the certification exams from when I first became Solaris Certified is that they asked a lot of arcane questions about things that you didn't do very often. Things like setting up a printer or a modem from the command line. I hated the exam, but somehow passed it. Since then, seven years have passed and I'd like to think that I know quite a bit more about Solaris administration than I did before.

In any case, I was asked to join the group of people writing the questions for the three exams involved in Solaris 10 Certification. Specifically, I was asked to join for my experience with Solaris on x86-based platforms. I've run a webpage internally for the past few years called Lapland (located at http://webhome.central/lapland for Sun-internal people). Lapland is more of a clearing house, with various links to engineering groups and useful tools and packages both internal and external to Sun.

The experience of writing the exam was much different than what I had thought it was going to be. We had a wide group of people with many different focuses: security, service, engineering, presales, professional services, and education. We also had practically every English accent you could imagine, because we had people from Italy, the Netherlands, England, Canada, the US, South Korea, and Australia. A lot of experience, and experiences, to say the least.

We had to be taught how to write the exam questions. Basically, it boiled down to writing questions that followed the Solaris administration courses. We couldn't come up with obscure questions just to stump people. We couldn't write man page questions. (So asking what each option of the 'ls' command means was out of the question.) The hardest requirement was creating answers that were right all of the time, as well as answers that were wrong all of the time, regardless of the situation. Finally, we couldn't put our group leader's name into questions. (Apparently she gets enough strange calls as it is.)

So, for a week, we churned out some 900 questions total, 300 for each exam. The week after that was the technical review of those 900 questions. We had to rewrite many of them, and throw out a bunch as well. Spending between 8 and 10 hours a day with the same group of people crafting unambiguous questions that had only one right answer, no matter what, was one of the more technically challenging things I have done.

The three exams went out for Beta testing in March. There were two versions of each exam, each with about 160-180 questions. I took each of the exams myself (they were free for anyone taking the Betas). That was challenging as well. Though I recognized many of the questions and had seen the answers, I still had to think about each one. Four hours for 180 questions is not easy.

The exams that we came up with were challenging, but not impossible.

After the exam questions were written, I had no other input into the whole exam process. However, I'm told that after the number crunching based on how the Betas were answered, a final set of exam questions was selected. Hard questions that no one was able to answer were thrown out, and easy ones that everyone answered were thrown out too. The final exam was whittled down to 60 questions.

I just got my final exam results from the Admin I and Admin II exams this past week. I had hoped that I'd get something north of 90%, so I was surprised that I got a score in the mid-80s. It was enough to pass, though.

My official certificate stating that I am a "Sun Certified System Administrator for Solaris 10 OS" arrived in the mail. I'd frame it if I had a permanent office assigned to me. Since I don't anymore, it'll just have to live in the folder with the rest of my certifications.

Thursday Sep 23, 2004

Capturing corefiles

One of the things that I run into occasionally is kernel panics. Kernels panic for one reason and one reason only: TO PROTECT YOUR DATA! It seems like a real annoyance to have your big box crash, but it's not nearly as bad as the box not crashing and cheerfully scribbling all over your company's payroll database.

So what is a panic? Well, basically it's the kernel throwing up its hands and saying "I give up!" due to some kind of error that it can't recover from in a way that will guarantee integrity. It will stop whatever it's doing, attempt some cleanup by flushing some filesystem buffers, and then dump all of the kernel-owned memory to disk (dumping core) for a human to look at and figure out what went wrong.

A panic could be caused by a software issue: a buffer overrun, a null pointer, or a mangled data structure. Really, anything that doesn't make sense to the kernel will cause a panic.

The panic could also be caused by a hardware issue. If a page in memory gets an uncorrectable ECC error (two bits getting flipped), for example, the OS can't correct the bit flips because the ECC algorithms don't have enough information to determine which bits are incorrect. (A single bit flip would be OK, but correcting more than one bit flip takes more information and more work, and is more expensive than is practical to implement.) As the system now has data that it knows is questionable, it really can't continue running.

Actually, it may not panic due to an uncorrectable ECC error, depending on where the page was. If the page was in kernel space, we need to panic. The kernel knows that it can't trust itself (for lack of a better word), so it's going to dump core and reboot.

However, if the page with the uncorrectable error is in user space, in other words owned by a process (say, Mozilla or Oracle), then we can do something a little bit different. The kernel knows that the rest of the system is OK and that only that process can't continue, so the kernel will kill that process. Obviously, if this is a big database running month-end batch jobs, simply killing Oracle and going on our merry way isn't a good thing. So the system will then do a graceful reboot. This, of course, assumes that the system administrator has those processes configured to automatically start up and recover on a system boot.

In any case, no matter the cause, if we're going to panic, the system is going down. It has to write its kernel corefile out somewhere that it won't be messed with and won't risk corrupting other data.

The place the corefile is written out to is called the dumpdevice. On most systems, the dumpdevice is set to the swap device. Swap provides a nice place to write it out because it's stable (on disk), and you're not going to corrupt anything by writing to it during the crash, because swap is re-initialized whenever it's used.

With Solaris 2.6 and earlier, the dumpdevice was always set to the first swap device that was added into the system. Unfortunately, with those earlier releases of Solaris, once the dumpdevice was set, it stayed set to that device until the system was rebooted and set again.

As of Solaris 7 and later, the dumpdevice is set according to whatever the administrator defines via the "dumpadm" command. By default, this is set to "swap" which gives the same behavior as Solaris 2.6 systems.

A good dumpdevice is a physical device as opposed to a pseudo device. This means that you'll want to use a single physical disk slice if at all possible. Mirrored devices don't necessarily make good candidates and often will not work as a dump device. Examples would be mirroring of swap using Veritas Volume Manager (VxVM) or Solaris Logical Volume Manager (a.k.a. DiskSuite). With Solaris 8 and greater, as DiskSuite is integrated into the OS, the metadevice for swap is an acceptable dump device.

If all of your swap is mirrored, which it should be if your boot disk is mirrored, then you'll want to set your dumpdevice to the physical slice of one side of the mirror. Normally, directly accessing one of the underlying slices of a mirror is a bad thing and can corrupt data. However, as swap always writes data before reading it, this is an OK thing to do.
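A sketch of checking and then pointing the dumpdevice at one submirror's underlying slice with dumpadm (the device paths here are made up; substitute your own):

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/md/dsk/d1 (swap)
Savecore directory: /var/crash/dazzle
  Savecore enabled: yes
# dumpadm -d /dev/dsk/c0t0d0s1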

Okay, so at this point, we know we have to panic. We also know where we're going to dump our corefile to. So how do we do it?

Well, we start by dumping memory out at the end of the dump device and work our way toward the beginning. We start at the end to give us some reasonable assurance that the corefile will not be overwritten by swapping when the system starts up, before we can copy it to the filesystem. This was a very real problem early on with systems that did not have a lot of memory. Obviously, a system with several gigabytes of memory isn't likely to swap at boot anymore, if it ever swaps.

So we write out the contents of physical memory to the dumpdevice. With Solaris 8 and 9, this dump is compressed on the fly to save on disk space. Once the dump is completed, we reboot.

Upon reboot, the savecore program is run. Savecore is run from /etc/rc2.d/S75savecore. The default for both Solaris 8 and 9 is to run savecore at every boot to check for a new panic dump to save. With Solaris 7 and earlier, you had to manually configure savecore to run, which caused many missed corefiles.

Savecore will read from the dumpdevice and look for the magic numbers and dump headers that indicate there is a new corefile to be written out. Upon finding one, it will save it out, by default to /var/crash/<hostname>/vmcore.X, where X is determined by the contents of the "bounds" file. The bounds file contains a number that determines what X will be, in order to keep new corefiles from overwriting older ones. Savecore will also write out a file called unix.X, which is the namelist for the kernel. The namelist is used to make sense of the different modules and structures that appear in the corefile. Both files are very important and they should be sent in together for analysis. One is almost worthless without the other.
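So the crash directory of a machine that has panicked a few times might look like this (a hypothetical listing):

# ls /var/crash/dazzle
bounds     unix.0     unix.1     vmcore.0   vmcore.1
# cat /var/crash/dazzle/bounds
2

Here the bounds file says the next dump saved will become unix.2 and vmcore.2.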

The final thing that savecore does is update the dumpheader for the dump that is on the dumpdevice to indicate that it was written out. This is necessary to prevent savecore from writing out the same corefile over and over again upon each reboot.

Contrary to popular belief, the corefile that is written to the dumpdevice is not overwritten or erased by savecore. It'll stay intact until something overwrites it (usually by being swapped over). It can stay there for days, weeks, or even years. Thus you can get it even if savecore fails to write it out to a file due to lack of filesystem space. Just run "savecore" and it will attempt to save it out to the default location. Running "savecore /directory/with/alot/of/space" will save the corefile to a location that you choose.

If a corefile has been written out already, you can still retrieve it. I've had many cases in which /var/crash was getting full and the sysadmin accidentally deleted the corefile he wanted to send out. You can use "savecore -d", which tells savecore to disregard the dump headers. Some say the -d stands for "dangit!" (They actually said something else that starts with a "d", but this is a family show.) For example, here's a box that I crashed a while ago myself by doing bad things to it. The corefile had already been saved off to disk and erased, but the dump still lives on the dump device.

pwags@dazzle # savecore
pwags@dazzle # savecore -d
System dump time: Thu Aug  5 14:28:05 2004
Constructing namelist /var/crash/dazzle/unix.2
Constructing corefile /var/crash/dazzle/vmcore.2
100% done: 73282 of 73282 pages saved
The first savecore didn't work because the corefile had already been written. The -d option was then needed.

So, after you have the corefiles written to /var/crash/<hostname>, you'll need to tar up vmcore.X and unix.X together and compress the tarball. PLEASE compress the tarball. Corefiles have recently gotten very large (several gigabytes), and transferring them up to the internet and then down internally within Sun for analysis would take hours or days. Compressing them is essential.
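Something like this will do it (the hostname and dump number are from my example above; use your own):

# cd /var/crash/dazzle
# tar cf - unix.2 vmcore.2 | gzip > dazzle.vmcore.2.tar.gz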

That's my take on capturing corefiles. Hope that you found it a bit informative.

Friday Sep 03, 2004

Ethernet Blueprint

I just came across a nice Sun Blueprint that addresses an issue that is probably more commonly seen than most people realize, but isn't really all that well understood.

The Ethernet Autonegotiation Best Practice Blueprint, just released in July, explains in an easy-to-read form (with some excellent technical details for those that care for them) why it's no longer the recommended practice to force Ethernet interfaces. This is a problem that I run into more often than I'd like to.

I'd typically run into it when a customer forces the eri interfaces in /etc/system on the SC of a Starcat and then wonders why they have domain communication problems through the SCs. As the blueprint explains, the SC-to-domain communication is only 100 Mbps half-duplex. Forcing it to 100-full makes it look OK until you try to put some stress on it with, say, JumpStart. All sorts of bad things can then happen.
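A quick way to sanity-check what an interface actually negotiated, sketched here for eri (the ndd parameter names vary somewhat between drivers, so verify against yours):

# ndd -get /dev/eri link_speed
1
# ndd -get /dev/eri link_mode
0

For eri, a link_speed of 1 means 100 Mbps and a link_mode of 0 means half-duplex, which is exactly the situation described above.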

Anyways, it's a good quick article to read.

Friday Jul 02, 2004

Simple fixes to make life easier (ufs logging)

So I figure that I might start to ramble about the things that I see causing the most problems for admins.

I'm simply amazed at how many things in Solaris that I think are pretty obvious, and that have been around for a while, go unused.

UFS logging is not the least of them.

UFS logging has been around since Solaris 7. Considering that Solaris 7 has been out since pre-2000 days, and that there aren't a lot of people still using Solaris 2.6, you'd figure that we'd have these kinds of features turned on on all of the systems out there.

Nope.

So what is logging? From the mount_ufs man page:

               logging | nologging
                      If logging is specified,  then  logging  is
                      enabled  for  the  duration  of the mounted
                      file system.  Logging  is  the  process  of
                      storing  transactions (changes that make up
                      a complete UFS operation) in a  log  before
                      the  transactions  are  applied to the file
                      system. Once a transaction is  stored,  the
                      transaction can be applied to the file sys-
                      tem later. This prevents file systems  from
                       becoming  inconsistent, therefore eliminat-
                      ing the need to  run  fsck.   And,  because
                      fsck  can  be bypassed, logging reduces the
                      time required to  reboot  a  system  if  it
                      crashes, or after an unclean halt.

                      The default behavior is nologging for  file
                      systems  less  than 1 terabyte. The default
                      behavior is logging for file systems greater
                      than   1  terabyte  and  for  file  systems
                      created with the -T  option  of  the  newfs
                      command or the mtb=y option to the mkfs_ufs
                      command.

                      The log is allocated from  free  blocks  on
                      the file system, and is sized approximately
                      1 Mbyte per 1 Gbyte of file system, up to a
                      maximum   of  64  Mbytes.  Logging  can  be
                      enabled on any UFS, including root (/). The
                      log  created  by UFS logging is continually
                      flushed as it fills up. The log is  totally
                      flushed  when  the file system is unmounted
                      or as a result of the lockfs -f command.

Clear as mud, right? Let's put it in simpler terms. If the system crashes and you have logging on, you don't need to fsck the entire filesystem. All you need to do is look at the log. The log will tell you what was being changed from a filesystem metadata perspective. (There is no way to guarantee the actual data was written completely; you'd need a different type of filesystem for that.) This makes your fsck take seconds instead of minutes (or hours for really big filesystems).

This is even more important given that Solaris fsck will stop and await input if it finds an inconsistency in the filesystem that would cause data to be changed or files to be removed. So your big, busy E25K crashes due to something (bad driver, hardware problem, voodoo), and you're possibly going to have lots of temporary files lying around. The fsck might encounter them and determine that the way to make the filesystem consistent is to remove them, so it sits and prompts the user. As this is done before the filesystem is mounted, you're sitting there with your E25K in single-user mode waiting for input while your business is at a standstill. Not good.

Enabling logging is simple. Just add "logging" to the options field of the /etc/vfstab entry for the filesystem that you want. All UFS filesystems can do this.
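A sketch of a vfstab entry with logging turned on (the device names here are hypothetical):

/dev/dsk/c0t0d0s7  /dev/rdsk/c0t0d0s7  /export/home  ufs  2  yes  logging

The option takes effect the next time the filesystem is mounted. On recent releases you may also be able to enable it on a live filesystem with "mount -F ufs -o remount,logging /export/home", but verify that on your release before relying on it.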

What about performance?

From everything I've read, all of the old performance bugs for UFS logging have long since been fixed. Performance should be the same or better with logging turned on.

With Solaris 10, logging will be the default.

About

Phil is an Area Technical Engineer in the Central Area of Oracle's Field Service in North America. He has 15 years of experience supporting Sun's entire product line.
