sirius - OpenSolaris on System z - an update
By jsavit on Feb 16, 2009
Some months ago I reported on the prototype port of OpenSolaris to the IBM mainframe under z/VM ("sirius"). Some things have transpired since then, and I figure it's time for an update (also, some people asked me off-line, so why answer separately?)
Why we're helping - it's a Community thing
First, let me make Sun's position clear, as there's been a little confusion. We are obviously not advocating that people do more computing on mainframes! (This doesn't come as a surprise, right guys? :-) ) Remember, we make our own computers which we believe are far superior computing platforms, hosting the fully-supported, complete implementations of Solaris and OpenSolaris with a massive installed base and ISV partner product portfolio. We have our own highly effective, scalable, widely adopted virtualization, providing the best possible platforms for virtualized, consolidated Solaris operation. Solaris also runs on many SPARC, Intel and AMD systems from other vendors, insulating customers from lock-in or inflated prices.
Instead, our purpose is to grow the OpenSolaris community, and if that includes people who are traditionally mainframe customers, that's fine too. Maybe those folks will get experience with Solaris using excess capacity on their mainframes, see the light :-) and then deploy Solaris on SPARC and x86/x64 platforms, where the implementations are complete, fully supported, and widely deployed.
To that end, we've loaned SPARC server hardware to SNA for their cross-compile port work, set up an OpenSolaris project page, and contributed time installing and testing the prototype and providing bug reports and feedback. The last part is where I come in (in addition to having made introductions within Sun).
Note: You can get started with OpenSolaris right now on your own Intel or AMD desktop or laptop computer: download it at http://www.opensolaris.com/get/index.jsp right now! You can run it from a LiveCD image so you don't have to overlay the OS you have installed on your computer, or you can run it in a virtual machine: download VirtualBox from the same URL above, and have your cake and eat it too!
Status update - things fixed and pending
So, that out of the way, some notes on the prototype. A lot of work has been done since my last post on this topic, though the implementation is still far from complete.
Last time, for example, there were no
prstat commands, both of which are now available.
There was no hostid command, but one has now been written (it didn't make the image I'm now testing, but I'm told it will be on the next one). Problems that made getconf -a and 64-bit ls fail have been resolved. There was a bug where you couldn't change your password - that's been resolved too.
One step in the right direction relates to the Solaris Containers (frequently referred to as "zones") feature of Solaris 10 and OpenSolaris (also see Wikipedia). Back in autumn, attempting to create a zone produced the following error:
    # zonecfg -z zone1
    zone1: No such zone configured
    Use 'create' to begin configuring a new zone.
    zonecfg:zone1> create
    Segmentation Fault (core dumped)

The good news is that zonecfg now apparently works. Unfortunately, zone installation fails with a complaint that a script related to Solaris Live Upgrade (LU) is missing, saying

    /usr/lib/lu/lucreatezone: not found

This is confusing: LU doesn't actually exist per se on OpenSolaris, and that file shouldn't be present! OpenSolaris makes use of a completely different way to manage boot environments, so there is an issue about implementing Solaris Containers in a way that is compatible with the OpenSolaris boot and package systems - both of which have yet to be ported. Still, that's a step in the right direction, and one that I'm sure will be straightened out in due time.
This does open the big issue of differences between OpenSolaris and Solaris 10. For example, the prototype boots off the Unix File System (UFS), whereas OpenSolaris boots off ZFS (see also ZFS at Wikipedia). This is a substantial difference, because OpenSolaris uses ZFS snapshots and clones for managing boot environments. This entire infrastructure is part of what makes OpenSolaris, and needs to be implemented.
A number of important features are still missing: DTrace (see also DTrace at Wikipedia), for example, is not implemented. That's a key feature of Solaris 10 and OpenSolaris, and is even available on FreeBSD and Mac OS X. There's also a deep need for a package/patch management system - currently there's no way to apply a patch, though the Image Packaging System is now going to be ported over.
One of the obvious things you do with a prototype system is you debug and diagnose things.
Annoyingly, if you use commands like
to diagnose what a program is up to, it can die or kill the target process, as that set of tools
isn't quite working yet (Guess what happened to me one time when I tried to run
sshd, while logged in via ssh! One time is all it took to teach me to not do that again!)
More frustrating, using one of those commands to look at a core file could have the tool crash and overlay the core file with its own dump. Ow.
Other bits and pieces are missing:
If you issue the format command, it still says "No disks found" (even though there are definitely disks available and mounted). The prtconf -vp commands don't work.
In general, there are things you bump into now and then. This should be expected - this is a prototype, and that's what prototypes are like. The developer is outstanding, but this is a Herculean task (there's a pun here - and I'll mention Hercules again in a moment) and things like this should be expected. It's been my task to look for errata - so it shouldn't be a surprise that I find them.
In the queue now
In my latest testing I came up with new problems - or at the very least, newly discovered ones.
ping didn't work from a non-root userid, which was quickly solved.

    $ ping 10.80.63.130
    ping: socket Permission denied
    $ ppriv -De ping 10.80.63.130
    ping: missing privilege "net_icmpaccess" (euid = 103812, syscall = 230) for "devpolicy" needed at common_specvp+0x5e
    ping: socket Permission denied

Adding the 'setuid' bit to the binary fixed that problem. Ain't Solaris privilege bracketing neat?
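When chasing several failures like the ping one, it can help to pull just the privilege names out of saved ppriv -De output. The following is a minimal sketch, not part of Solaris: missing_privs is a hypothetical helper name, and it assumes the message format matches the line shown above.

```shell
#!/bin/sh
# Hypothetical helper (not a Solaris command): extract the privilege name
# from "missing privilege" lines in the format shown in the ping example.
missing_privs() {
    # Print the quoted name that follows 'missing privilege', one per line,
    # sorted and de-duplicated. Lines without that phrase are ignored.
    sed -n 's/.*missing privilege "\([^"]*\)".*/\1/p' | sort -u
}

# Sample line copied from the ping failure above.
sample='ping: missing privilege "net_icmpaccess" (euid = 103812, syscall = 230) for "devpolicy" needed at common_specvp+0x5e'
printf '%s\n' "$sample" | missing_privs
```

Fed the sample line, the helper prints only the privilege name, which is the piece you need when deciding what a non-root user is lacking.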
Right now, there are a few new problems that have been head-scratchers: one is that programs like ftp dump core if I use DNS name resolution (they worked fine if you just put in the target IP address). Previously I hadn't set up /etc/nsswitch.conf to use DNS so the issue never arose.
This provoked a problem where sshd would refuse to accept new sessions: any previously established session was fine, but attempts to establish new connections failed. Eventually (smack forehead!) I figured out that the forked instance of the ssh daemon was dying while trying to do a reverse-DNS lookup of the client, so I just turned off DNS resolution altogether. This is important, so I'll be looking into this again soon.
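Since both failures only show up once /etc/nsswitch.conf is set to use DNS, it is worth confirming that setting before trying to reproduce them. Here is a rough sketch under stated assumptions: hosts_uses_dns is a hypothetical helper name, and it assumes the usual "hosts: files dns" style of entry. The demonstration runs against a throwaway file rather than the live configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the "hosts" entry in an
# nsswitch.conf-style file lists dns as a lookup source, else exit 1.
hosts_uses_dns() {
    # $1 = path to the file (defaults to /etc/nsswitch.conf)
    awk '
        /^[ \t]*#/ { next }                 # skip whole-line comments
        $1 == "hosts:" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == "dns") found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    ' "${1:-/etc/nsswitch.conf}"
}

# Demonstration against a temporary file.
tmp=$(mktemp)
printf 'passwd: files\nhosts: files dns\n' > "$tmp"
if hosts_uses_dns "$tmp"; then
    echo "hosts lookups consult DNS"
else
    echo "hosts lookups are files-only"
fi
rm -f "$tmp"
```

Called with no argument, the function reads /etc/nsswitch.conf, so the same check can be run on the system under test before and after switching DNS resolution off.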
Around this time I started having recursive crashes in the Service Management Facility (SMF) daemons; every time I looked, svc.startd were running and chewing up as much z9 CPU time as they could get. Back in the summer, I was getting error messages about a corrupt SMF repository - maybe that error persists and has just bitten me.
Even more odd, I was getting crashes rebooting the system!
So, for the sake of starting afresh, I'm reloading the test system from the disk restore image (you install this by doing an image restore of a disk volume and manually configuring the network identity). That way, we can start debugging again (or see if the above problems never recur) from a system where the tracks haven't been muddied.
Please take note: As I said before, this is a prototype. Errors and odd situations are what happens in the laboratory when working with experimental software, in this case a port of a complex operating environment. Please have expectations that are consistent with reality and not expect this software to have the properties of the production grade Solaris OS, which has the benefit of development teams, test organizations, QA groups, and years of baking in.
Speaking of Hercules
I previously alluded to the Hercules System/370, ESA/390, and z/Architecture Emulator. Early in the sirius project I advocated that sirius run only under VM, as that simplifies the porting effort.
A consequence of that, which I regret, is that it makes it impossible to run sirius on commodity, developer personal computers. Hercules provides a virtual mainframe, but sirius makes use of hypervisor functions provided by VM. Fortunately, the Hercules community is looking into fixing this - either by modifying the sirius port code so it can run without the hypervisor functions, or by adding them to the Hercules emulator. That will make it possible for a lot more people to use this prototype port, as not everybody has access to a mainframe! That will help grow the OpenSolaris community, which is why Sun is participating.