Tuesday Jun 29, 2010

Solaris, seriously

Seen this one before?

Bet you have!

Ever since the acquisition closed, many end-users, customers and some industry analysts have voiced their angst over figuring out exactly what Oracle's OS strategy WILL shape up to be. If the Sun/Oracle welcome events held globally, the numerous webcasts with Oracle executives and their public statements are any indication, along with the progress made since the acquisition was first announced, then it appears self-evident that things are looking really good, and that the OS strategy comprises 2 operating systems: Oracle Enterprise Linux and Oracle Solaris, each of which has registered its place in the appliance segment of the market.  Having just completed yet another whirlwind set of meetings with some of our key financial customers across the country, I see a number of parallels in the common challenges they face: aggressive strides toward reducing power consumption, whilst needing to consolidate diverse workloads, all while handling datacenter moves/consolidations from coast to coast (this is happening globally; as we have learned, European banks are facing similar types of problems). We like to think of these as opportunities.

All of this brings about questions of 3-5 year investment roadmaps, OS strategy and platform decisions that come up for review only so often - and the eye is really laser focused on what Oracle is doing with Sun's assets, as well as what Oracle is doing in terms of delivering on the benefits of stack integration à la database appliance, network attached storage appliance, and others in the works.  I did manage to squeeze in a few recaps of some of the highlights from recent World Cup matches; enough to make me wonder whether FIFA can begin to accelerate its acceptance of the notion of appliances and leveraging available technology to avoid repeating judgement mistakes (see my November 2009 blog entry about FIFA's lack of using technology for the benefit of sportsmanship and accuracy) - just this past week we witnessed 2 more World Cup matches falling victim to errors made on the field (Argentina vs. Mexico and England vs. Germany).

When it comes to our "playing field", however, things aren't always cookie-cutter - or at least they ought to be looked at through a chronological lens, exercising restraint in jumping to judgements whilst taking into account the order of events that have taken place over the past few years.  Without pretending to be an oracle, I would like to offer some thoughts on what a general investor would consider to be an important set of avenues that lie ahead for operating systems.  In the mid-90's, Oracle and Sun went separate ways in their strategy. Sun went on to focus exclusively on SPARC (practically abandoning most of the x86-specific bits in Solaris development) and Oracle went on, almost orthogonally, to leverage the commodity platforms available to run its business software with better price/performance ratios.  It made business sense where and when it did. Moreover, for Oracle, it clearly wasn't enough to simply rely and depend on the OS variants being made available by open-source OS suppliers, so Oracle differentiated itself by (1) innovating with Oracle Enterprise Linux, adding the necessary and unique ingredients to Linux and giving back to the community where the community would take it - to ensure that customers choosing Oracle software would benefit by having it run (function and perform) better on Oracle's distribution of Linux - AND - (2) creating a solid, enterprise-grade support system known as Oracle Unbreakable Linux.  From there one can begin to see the idea of datacenter appliances beginning to emerge:  Oracle software, Oracle operating system - not as a full/complete stack (yet), but the foundation and business benefits were clearly there! One of the challenges that this approach brings about is the notion of ownership and control (think as an investor now: if operating systems are an asset (and they are), where are you likely to get the most benefit from your assets: when you own them, when you control them - or both?).
Now, I'll admit that things aren't always "black and white" and that there are a few other ingredients that make for a tasty recipe - such as decisions, budgets, changes in market demands, time lines, etc. Here's one way to see what OS makes sense for your business challenges: OEL is owned by Oracle, but because it's based on (and tracks) RHEL, effectively it is not really controlled by Oracle.

Fast forward to 2010 - Oracle buys Sun Microsystems - and obtains ownership and control of the Solaris ecosystem - an operating system that Sun had continued to innovate in. Even though there were elements of defocus from commodity platform support in the mid-90's (Solaris 8 had been the last Solaris distribution available for x86), it didn't take Sun engineers too long to realize the need for, and ramp up support for, Solaris on x86 platforms with the release of Solaris 10. The key fact that needs to be pointed out to those who may not appreciate this is that Solaris is derived from the same source tree that OpenSolaris comes from and is (roughly) 95% the same across all platforms it runs on: Intel/AMD/SPARC/virtual machines/hypervisors, etc. - the small differences reside in the low-level architecture and driver implementations (or elements of paravirtualization, as is the case in VMs).  Further on deriving bits from the Solaris source tree - the notion of storage appliances begins to get productised by Sun - brought to life by the same engineers that brought (and continue to enhance) DTrace - in the form of a network attached storage appliance (hint: DTrace, itself, came about as a solution to the whole notion of making it easier to troubleshoot systems holistically, by asking questions of the system and formulating a hypothesis that leads you down the path of understanding systems better, thus reducing the time it takes you to get to the resolution of your problem).  Sun had started building appliances even prior to that.  Have you heard of the desktop appliance à la Sun Ray and the business benefits it gets its users in terms of desktop security, mobility and reduction of licensing and desktop power consumption? (Yep, we've got banks, telcos, universities, and other industries using these devices to save money).  And while Oracle had begun building appliances, in parallel, with its Exadata machine, Sun and Oracle continued to leverage their strengths - and now these strengths are working cross-organizationally, accelerating development and value to clients.  Sun brought Java, hardware and Solaris to Oracle's existing portfolio of business software. So - putting our investment cap back on: Oracle now owns and controls Solaris and hardware systems.  Oracle has talked about investing in these MORE than Sun had - have a look at Oracle's job postings, particularly in TX!
See some of the recent benchmarks before you bait and switch away from Solaris "just because everyone claims it's the good-enough thing to do" (and in a few ISV-led cases it might be today - as an artifact of where Sun had been).  Don't judge Sun on where it had been, but rather on where it is now and where Oracle is taking it; pause and ask yourself whether taking that turn makes sense for you today and, if you do take it, whether it will continue to make sense for you tomorrow.  Solaris has a few gaps and those are being addressed; there is intent and accelerated development to ensure that, over time, the software portfolio available for Solaris is on par (where it may not be today) - that the software portfolio is a 1st class citizen of Solaris, that the stack gets integrated in a performant manner with Solaris, and that elements that go into making it better for middleware or databases will also benefit other 3rd party and open-source workloads as well ... and that it will continue to give you rich platform choice, continuing to be the world's #1 enterprise OS that lends itself to Very Many flavours of situational computing out there.  Would love to hear your thoughts, your successes or challenges.

I look forward to seeing you at Oracle OpenWorld 2010 in San Francisco this September! Duncan Hardie, Jeff Victor and I are hosting a session on the 21st of September, titled "Optimize Legacy/Modern Application Environments with Oracle Solaris Containers" (part of the "Oracle Solaris" track), at 2pm in Moscone South, Room 301.

'till then!

Saturday May 31, 2008

GRUB me the wrong way and I just may lose some data

Earlier this month we released OpenSolaris 2008.05 - a distribution of OpenSolaris that, so far, runs on any modern Intel/AMD/VIA processor.  This past weekend I decided to enter the fray into the world of multiboot-ability with GRUB and see just how fun prowling the multiboot world can get!

Thanks go to lots of good documentation floating on the 'net (that our friends at google make so easily accessible at the dire time of need) - and most importantly to our very own Partition Magician, good colleague Bob Netherton, for offering some very inquisitive pointers at just the right time ... thus saving me from myself on more than one occasion. Bob - you r0ck!  Additionally, this is one good GRUB reference I would recommend to anyone beginning to tinker with GRUB.

So - what have I learned through it all?   Don't use parttype in grub without having first jotted down what your disk geometry looks like - for starters.  Grub's parttype command allows you to specify the type of the partition.  Given the original PC's limitation of 4 primary partitions - and only one Solaris partition being available at any given time - it took some whimsical thinking to plow through some of the challenges that I wedged myself into right off the bat ...

What did I start off with ?

I started off with a fairly common entry-level environment - a PC that has 1 physical disk with the following partitions:

WinXP, Recovery Partition (provided by the PC vendor), Solaris Express Developer Edition 1/08 (that I had previously installed), and a FAT32 partition that I keep data on and use for sharing across the Windows and SXDE partitions.  Without knowing any better, right off the bat, I managed to wedge my PC into thinking that the FAT32 partition is now an ext3 partition...or was it ext2... or was it fat.... and, not having known to be careful about plowing in head-first, I almost ran for the recovery DVDs to restart the entire process from scratch.

Scratch is what I did - not the PC, but my own head. It hurt quite a bit to have to realise that the only way forward was to succumb to the fact that, with an innocently executed parttype command, a partition that was formerly known to be of one type could now no longer be seen by anything.  OUCH. I guess I could've anticipated that - and in retrospect I am not surprised by my actions.  Thank G-D for Backups! :-)

Now, what did I set out to achieve ?

I wanted to have a partition running OpenSolaris 2008.05 in such a way that would still allow me to boot into SXDE and Windows XP.  There can only be one Solaris partition on a disk, and so the trickery involved (although well documented by Bob here) still needs to take place. And with the version of GRUB that OpenSolaris comes with, relying on the previously only-supplied-by-Fedora-Core version of Grub on this machine was not necessary.

I ended up deleting (in a planned fashion) the Recovery Partition and shrinking the Windows XP partition, to free up a combined 11GB worth of disk space for OpenSolaris. Now, it doesn't really need that much - only about 3GB or so - but I know I'll be installing various types of packages over time, so I thought I may as well have the space available now.  I formatted that newly created partition as ext3 (Linux) for the purposes of then relying on Grub to act as a switch between ext3 and Solaris.  Because of the limitation that there can only be one Solaris partition (not sure where that comes from - if anyone knows, please feel free to comment - I haven't googled this yet :-), the boot process entails making the Solaris partition that you want to boot into the active one, anytime you want to boot into it.

To install OpenSolaris 2008.05, what I had to do, prior to loading the OpenSolaris LiveCD into memory, was jump out at the GRUB screen and instruct GRUB to take the ext3 partition and convert it into a Solaris partition. At the same time, I had to instruct GRUB to convert the existing Solaris partition (that has my SXDE OS image loaded on it) into an ext3 partition.  The syntax for that (as is on my PC) looks like this:

grub> hd0, <HIT TAB>

<you get a screenfull of currently defined partitions, then:>

parttype (hd0,1) 0xbf
parttype (hd0,3) 0x83

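What this does is take partition 1 on disk 0 - (hd0,1), the ext3 partition - and mark it as type 0xbf (Solaris), then take partition 3 - (hd0,3), the existing SXDE partition - and mark it as type 0x83 (Linux). Note that GRUB counts both disks and partitions from zero, so (hd0,1) is really the second partition on the first disk (everyone starts their counting differently, Grub too - so don't grub it the wrong way else you're in for a treat!)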
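Since the zero-based counting is what bit me, here is a minimal sketch of the translation, assuming the classic GRUB "(hdN,M)" naming used above. The function name grub_to_human is my own invention for illustration, not a GRUB command.

```shell
# Translate a GRUB legacy device name such as "(hd0,1)" into one-based wording.
# grub_to_human is a hypothetical helper name, not part of GRUB.
grub_to_human() {
    spec=${1#\(}        # strip the leading "("
    spec=${spec%\)}     # strip the trailing ")"
    disk=${spec%%,*}    # everything before the comma, e.g. "hd0"
    disk=${disk#hd}     # drop the "hd" prefix, leaving "0"
    part=${spec#*,}     # everything after the comma, e.g. "1"
    echo "disk $(($disk + 1)), partition $(($part + 1))"
}

grub_to_human "(hd0,1)"
grub_to_human "(hd0,3)"
```

The sketch only handles the plain two-part form; anything fancier is out of its scope.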

The geometry (hd0) command comes in handy, too.  My map looked like the following, prior to installing OpenSolaris:

hd0,0 - 0x7 (NTFS) 
hd0,1 - 0x83 (ext2fs)
hd0,3 - 0xbf (Solaris/ufs)
hd0,4 - 0xb (FAT)

After the installation of OpenSolaris into this multiboot environment, what I ended up with is a map that essentially looks like this:

hd0,0 - 0x7 (NTFS)
hd0,1 - 0xbf (OpenSolaris/zfs)
hd0,3 - 0x83 (SXDE/ufs, dressed up as Linux)
hd0,4 - 0xb (FAT)

A Grub menu.lst file lives in each of the Solaris partitions, and whichever partition belongs to the environment I want to boot into needs to be made active.  Making that happen entails modifying each of the menu.lst files so that its entry reverses the parttype step executed by the other.

title OpenSolaris 2008.05 Release - snv_86
parttype (hd0,1) 0xbf
parttype (hd0,3) 0x83
root (hd0,1)
chainloader +1

Right now the user experience looks something like this. By default the system comes up and gives me a choice whether to boot into OpenSolaris, Windows or Solaris Express Developer Edition.

If I opt to boot into the Solaris Express Developer Edition, the following command is executed behind the scenes and I am presented with the following screen - which is being read from the other Solaris partition that I have on my PC:

title Solaris Express Developer Edition 3 snv79
parttype (hd0,1) 0x83
parttype (hd0,3) 0xbf
root (hd0,3)
chainloader +1
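The two Solaris entries above are mirror images of each other, which can be sketched as a tiny generator. This is only an illustration of the pattern, assuming the same five-line stanza layout shown above; solaris_entry is a made-up helper name, and the output would still have to be pasted into menu.lst by hand.

```shell
# Print a menu.lst stanza that marks one partition as Solaris (0xbf),
# disguises the other as Linux (0x83), and chainloads the first.
# solaris_entry is a hypothetical helper, not a GRUB or Solaris tool.
solaris_entry() {
    # $1 = menu title, $2 = partition to boot, $3 = partition to disguise
    printf 'title %s\n' "$1"
    printf 'parttype %s 0xbf\n' "$2"
    printf 'parttype %s 0x83\n' "$3"
    printf 'root %s\n' "$2"
    printf 'chainloader +1\n'
}

solaris_entry "OpenSolaris 2008.05 Release - snv_86" "(hd0,1)" "(hd0,3)"
```

Swapping the last two arguments gives the stanza for the other Solaris partition.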

Of course, bootability into Windows looks familiar and is not modified:

title Windows XP
rootnoverify (hd0,0)
chainloader +1

The unlucky partition that I managed to blow away was that very shareable FAT32 partition. Of course, not having figured out exactly what syntax applies to FAT32 partitions, it took some thinking to get it back. Turns out, the help parttype syntax in Grub does not really reveal the acceptable types - a Hex range is the best I could find in the documentation - and that didn't help me.  At the end of it all, though, parttype (hd0,4) 0xb is what dresses up a partition in the FAT32 armor.  Betcha didn't know that. I didn't either... and I'm still not fully convinced I understand the differences between 0x6, 0x1b and 0xc - all of which appear as FAT partitions in GRUB.  But I'll leave that for another day, as that lawn that I haven't touched for 2 weeks (having been out of the country on biz travel) is just beggin' for some mowin'...
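For what it's worth, the type codes that came up in this post can be collected into a small lookup. The descriptions follow the conventional MBR partition ID assignments as I understand them (they are not taken from GRUB's own help), and part_type_name is a hypothetical helper name.

```shell
# Map the MBR partition type IDs mentioned in this post to a description.
# Labels reflect the commonly published ID list; treat them as a cheat
# sheet, not as GRUB output.
part_type_name() {
    case "$1" in
        0x6|0x06) echo "FAT16" ;;
        0x7|0x07) echo "NTFS/HPFS" ;;
        0xb|0x0b) echo "FAT32 (CHS addressing)" ;;
        0xc|0x0c) echo "FAT32 (LBA addressing)" ;;
        0x1b)     echo "hidden FAT32" ;;
        0x83)     echo "Linux native (ext2/ext3)" ;;
        0xbf)     echo "Solaris" ;;
        *)        echo "unknown" ;;
    esac
}

for id in 0x6 0xb 0xc 0x1b; do
    printf '%s -> %s\n' "$id" "$(part_type_name "$id")"
done
```

So the three look-alikes differ in FAT width (0x6), addressing mode (0xc) and the hidden flag (0x1b).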

Saturday Jan 12, 2008

Sharing The Wealth (as an atomic operation)

Knowledge wealth is not a measure of material quantity; it is a relative measure of metadata's meaning, begging to be set free.

I often feel that people can be much more successful and productive if they share the information they know - openly.  So to do my part, I gladly welcomed an opportunity to participate in Sun's Tech Days 2008 WorldTour (details) and traveled to Atlanta, Georgia, this week, to present on some of Sun's very cool technologies (our Solaris operating environment and ZFS - which have recently been recognized as the Best Server OS and Best Filesystem, respectively, by InfoWorld in their annual technology review).

While there, I joined a dozen other Sun engineers, colleagues and partners in a 2-day event focusing on OpenSolaris, Solaris and Sun's developer-centric technologies and tools (and we've got quite a community). The entire event took place at the Cobb Galleria Center; and the agenda was packed!

If you're a developer, an existing customer or a prospect, please do take the time to participate in these free events when they come to your town.  There were hundreds of attendees in Atlanta (I don't have the actual numbers yet) - and they all came to hear about cool Sun technologies: development tools, operating system features and services - and most importantly (I think), to interact with and hear directly from Sun engineers involved in the development of (and day-to-day activities with) these technologies.

Speakers included (in random order) Jeet Kaul, Ian Murdock, Michael Ingrassia, Valerie Fenwick, Sowmini Varadhan, Scott Dickson, Don Deal and many others whose sessions I did not get a chance to attend. You can see the entire list of speakers and presentations from the OpenSolaris day.  I led a session on OpenSolaris (A Definition) and on ZFS (with a focus on why developers should think of using it). Would love to hear your thoughts about these technologies and what creative uses you've come up with to entrust your business to them.

I also dived into examples of what I've dubbed Solaris Multiplicity - a practice of using various Solaris technologies jointly to come up with an economically malleable index representative of deriving increased levels of value for your enterprise, rather than using only subsets of these technologies and coming up short of the full potential.

More on this soon...

1/2/2010 Update - I organized my thoughts on this further into a slide set forming a presentation.  With examples, it takes about 1.5 hours to go through the entire deck.  I presented it at Immersion Week (a Sun conference) in 2008. If <PG DOWN> is your key of the day, it is conceivable that you might be able to go through the slides much faster ... Your thoughts are welcomed!

Sunday Nov 18, 2007

High Performance on Wall Street

What is Sun doing in the High Performance Computing space?  The answer to that question is likely to surprise you, especially if you have not followed Sun for the last few quarters.  Generally, what contributes to the answer are components that help enable technology solutions that meet specific requirements of latency-sensitive workloads. But the answer does not stop there.  A solution typically spans various parts of the architecture stack, from the hardware, through the operating system software and up into the end-user's application. Availability of the operating system's libraries and the operating system's intrinsic observability, virtualization and high availability capabilities become essential elements of an overall solution that business depends on.  To that end, I had an opportunity to share in the fun of hosting a Sun solutions exhibit at this past September's High Performance on Wall Street event, at the Roosevelt Hotel in New York City.  There were lots of interesting keynotes from customers who are actually facing these challenges and are working through them. Additionally, there were a number of interesting exhibitors sharing the exhibit halls with us.  I, of course, talked about successes seen by our customers through the use of the Solaris DTrace facility, our rich x64 AMD/Intel product line and various developer tools and solutions that have HPC requirements and capabilities inherent in them, and examples of our joint work with Cisco, Intel and Reuters that has allowed for ... very interesting low-latency performance solutions for financial services customers.  There's a global group of customer-facing architects and performance engineers (led by Ambreesh Khanna) that is actually leading the charge on a number of solutions geared toward aiding our global financial customer base.
Overall, you can read more about what Sun is doing in the High Performance Computing space by following this link:  http://www.sun.com/servers/hpc/index.jsp

(Photograph courtesy of Dov Friedmann. More photographs from the event are available here)

Thursday Oct 11, 2007

OpenSolaris IS a glimpse into Solaris futures

Have you caught the OpenSolaris wave yet?  Back in the Summer, Brian Gupta and I kicked off a New York OpenSolaris User Group for the purposes of educating the community and soliciting input from the community on how people use OpenSolaris and Solaris technologies. This is meant as a two-way street. Check out some of our previous meeting pages, as well as an upcoming meeting on November 15th.   Why is all of this important? If you are not yet aware of how Solaris development works, you may be pleasantly surprised that all of the development of new technologies goes into the next release of Solaris that is currently being developed in the open - dubbed Nevada, i.e. SunOS 5.11. You can actually get a taste for some of the technologies that have been open-sourced as part of that process and experience them before SunOS 5.11 is released commercially.  If you look back at the calendar, even though Solaris 10 came out commercially in January of 2005, our engineers had already been running builds of it back in 2002.  That means enhancements to (or new) features like CIFS, ZFS, FMA, enhancements to crypto technologies, DTrace, Intel-specific enhancements ... are all available today, via OpenSolaris distributions, such as the SXDE 9/07 release.  Check out the free SXDE release...

Monday Apr 30, 2007

Unplanned Uptime of Loco Zones

Last week I had been in Mexico City, presenting at Sun's Immersion Week 2007 conference.  Speaking on 2 topics, twice each in one day, does pose its challenges.  As one example from this presentation, I recall catching myself uttering phrases  like "Unplanned Uptime" (in the context of Solaris features like SMF, FMA, Zones).  Of course, my intent was to elaborate on how Solaris 10 helps minimize unplanned downtime.... but jinxing myself with phrases like unplanned uptime was certainly a jaw-dropper, to say the least.  

Another, as shared over dinner with Scott, Antonio, Mauricio, Jazmin and Enrique later in the day, was to maximize unplanned uptime of non-global (loco) zones.  But you'll haveta buy me a drink to get the full story. :-)


Saturday Apr 14, 2007

Solaris 10 Adoption and Minimization

Solaris 10, as well as OS minimization in general, still seems like a hot topic these days.  Following my Customer Engineering Conference '06 blog entry from San Francisco, I've uploaded  a Solaris 10 Adoption and Minimization presentation [pdf] that I gave there last October.

Would love to hear how you're doing in this space and if you need help, would love to come and talk with you and your organization about lessons learned in the field on this very interesting subject.

Friday Mar 30, 2007

Florida's OpenSolaris User Group Meetings

    Last month I was invited to help kick-off a monthly set of OpenSolaris User Group meetings in Florida. We had started with Ft. Lauderdale and Tampa, and in the months ahead are planning to be adding Orlando. We had a number of participants and, as promised, I wanted to post the presentation that I had given. The topic was OpenSolaris and, as we'd say in Russian, "what would you eat it with", or more precisely: What it is, how it works and why it may be of interest to you - the community.

 The slides for Ft. Lauderdale's discussion are here whilst the presentation deck for Tampa is located here. The content was pretty much the same. Both are in .PDF.   You can use Evince or gpdf to view these (in OpenSolaris).

Big kudos to Forsythe, Inc. for helping sponsor this event, and in particular to John McLaughlin, who runs SystemNews e-newsletter.


OS Ambassadors Spring 2007

Today was the conclusion of yet another successful OS Ambassadors conference in Menlo Park, CA. OS Ambassadors is a Sun internal program that exists to foster cross-pollination of ideas and trends between R&D and field/customer-facing engineers. Over the years, there have been many different Ambassadorship programs, focusing on topics such as Operating Systems, Security, Data Center, Commercial Frameworks & Tools, Networks, Cluster, Campus (universities), etc.
For those that may not know what this is, the OS Ambassador program consists of roughly 100 deeply skilled technical architects and customer-facing engineers with membership spanning the entire globe and representation from various organizations such as Global Sales, SunService and Professional Services Delivery. We help shape the trends in Solaris by being a consistent conduit between our engineering organisation and the customers we work with on a daily basis. There are many interesting perspectives that get voiced during such meetings from people facing a myriad of unique customers' business problems and technology challenges.
Topics discussed this week included: ...And I had a relatively seamless installation of Solaris Express Developer Release (based on Solaris Next, or as we call it "Nevada" build 60). I did find and file a bug with GNOME, but a day later thought about it some more and figured out what was causing the failure. (The failure, by the way, is apparently present in some of the previous builds as well).
So stay tuned. And if you're not on OpenSolaris yet, please take a look and become a member of this rapidly growing community. That is the one place where you can see and experience new features in the land of Solaris development and get a sense for some future directions!

Wednesday Jan 31, 2007

Thinking in terms of Dtrace

Ever feel like your actions are setting off a number of events in motion ?

We do this every day, I suppose -but w/out getting too philosophical (or geeky) here's a way I think of this now:

If I were dtrace, I'd be witnessing probes getting fired.


Monday Jan 15, 2007

Processor bindings in Solaris - quick and dirty

A note I had written to...someone...very...close... and - thought would be worthwhile to convert to a blog :-)
Ok, so I remembered our conversation from last week about binding processes to CPUs and seeing which processors these processes are actually bound to.

I wanted to refresh this in my memory and went through this last night and here are my notes.

I am doing this on a Solaris 10 server with 2 CPUs, but am assuming that this should work on Solaris 8 as well. The only caution I have is that I am not completely sure whether the 'ps' command had already been enhanced to take '-o psr' as one of its 'format' arguments back in Solaris 8 or not. (You can validate this just by running '/usr/bin/ps -o' alone on the command line and looking for a line that includes 'psr' and/or 'pset' in the output.)

Before I start, the output should look like the following:

root@ultra2cd:/export/home/isaac: 23:35 > /usr/bin/ps -o
/usr/bin/ps: option requires an argument -- o
usage: ps [ -aAdeflcjLPyZ ] [ -o format ] [ -t termlist ]
        [ -u userlist ] [ -U userlist ] [ -G grouplist ]
        [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ]
  'format' is one or more of:
        user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
        pri opri pcpu pmem vsz rss osz nice class time etime stime zone zoneid
        f s c lwp nlwp psr tty addr wchan fname comm args projid project pset

Now, onto the experiment. The goal: given a certain running process (whose PID you know), find out which CPU it is bound to (if any, in fact).

On my system, I have 2 CPUs: CPU 0 and CPU 1, as seen from the output of 'mpstat 1' below:

root@ultra2cd:/export/home/isaac: 23:29 > mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0  112   401  301   49   11    2    0    0    16   34   0   0  65
  1    0   0    0   114  102   54   12    2    0    0    15   34   0   0  65
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  140   0  273   623  524   90    2   10    5    0   374    2   4   0  95
  1  489   3  215   112  106   57    4    4    6    0   364    2  14   0  84
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  381  11  183   590  490  106    5    8   63    0  1318   14  21   0  66
  1   83   0  186   110  105   98    2   12   64    0   365    1  20   0  79
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  209   7  366   596  490  110    8   10  121    0   864    5  10   0  85
  1  601  11  100   116  111   99    4   11  122    0  2132   23  23   0  54
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   41   1  290   638  535   99    2    9   73    0   263    1   5   0  94
  1  407   8  103   117  108   59    7    6   68    0  2467   18  34   0  49

Now, in my environment, I don't YET have a process bound to a CPU, so I will pick a random process and bind it to CPU #0. I pick my 'tcsh' process, and note its PID (16029).

root@ultra2cd:/export/home/isaac: 23:29 > ps
   PID TTY         TIME CMD
 16014 pts/3       0:00 sh
 16029 pts/3       0:00 tcsh
 16130 pts/3       0:00 ps

Then, I run 'pbind -b 0 16029' (after glancing over 'man pbind')

root@ultra2cd:/export/home/isaac: 23:30 > pbind -b 0 16029
process id 16029: was not bound, now 0

This tells me that process was not bound and now is bound to CPU 0.

I can check that with 'pbind -Q', which will tell me all bindings.

root@ultra2cd:/export/home/isaac: 23:30 > pbind -Q
process id 16029: 0
process id 16331: 0

Notice that now I have another process, 16331. What is that ?

That's the 'pbind -Q' itself, which the shell forks to execute the query.

Notice if I keep doing it, the PID of this "mysterious" process will change. That is because there's a new process ID being created every time I invoke 'pbind -Q'.

root@ultra2cd:/export/home/isaac: 23:30 > pbind -Q
process id 16029: 0
process id 16353: 0
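If you want to post-process that output, the 'process id N: C' lines are easy to pick apart. A small sketch, assuming the output format is exactly as shown in my transcript above; parse_pbind is a made-up name.

```shell
# Turn "process id 16029: 0" lines (as printed by 'pbind -Q' above)
# into plain "PID CPU" pairs; lines in any other shape are dropped.
# parse_pbind is a hypothetical helper, not part of Solaris.
parse_pbind() {
    sed -n 's/^process id \([0-9][0-9]*\): \([0-9][0-9]*\)$/\1 \2/p'
}

# Intended use on Solaris:  pbind -Q | parse_pbind
printf 'process id 16029: 0\nprocess id 16331: 0\n' | parse_pbind
```

Keep in mind that the short-lived 'pbind -Q' process itself will still show up in the list.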

So now I have a process bound to a CPU, which sets up the conditions of your original question: 'how can I see what CPU a process is bound to?'.
Armed with a PID of a process of interest, I feed it to 'ps' and ask 'ps' to shape its output format to include the actual processor, and I get my answer below:

root@ultra2cd:/export/home/isaac: 23:31 > ps -p 16029 -o psr

Looking at the various additional formats that 'ps' is capable of arranging, I can include things like the pid itself, or the fname, or even the memory address of the process (see '/usr/bin/ps -o' for the additional types of formats supported).

root@ultra2cd:/export/home/isaac: 23:31 > ps -p 16029 -o psr,pid,fname
  0 16029 tcsh

root@ultra2cd:/export/home/isaac: 23:31 > ps -p 16029 -o psr,pid,fname,pset,addr
  0 16029 tcsh       -      300021ab7f8

Now, what if I had a number of processes that I kicked off from the current shell, and the list was too numerous to provide ? I could simply omit the '-p PID' option to 'ps' and here's what I'd get.

root@ultra2cd:/export/home/isaac: 23:31 > ps -o psr,pid,fname,pset
  - 16014 sh         -
  0 16029 tcsh       -
  0 16527 ps         -

(Note that 'pset' is a processor set, something that I have not yet configured on this system, hence the corresponding output is simply a hyphen)
What if I wanted the address ?

root@ultra2cd:/export/home/isaac: 23:31 > ps -o psr,pid,fname,pset,addr
  - 16014 sh         -      3000219afc0
  0 16029 tcsh       -      300021ab7f8
  0 16558 ps         -      300021bb350
root@ultra2cd:/export/home/isaac: 23:32 >

What if I wanted more things, including user, time, scheduling class, priority ?

root@ultra2cd:/export/home/isaac: 23:48 > ps -p 16029 -o psr,pid,fname,pset,addr,time,class,pri,user
PSR   PID COMMAND  PSET             ADDR        TIME  CLS PRI     USER
  0 16029 tcsh       -      300021ab7f8       00:00  FSS  59     root
root@ultra2cd:/export/home/isaac: 23:48 >

root@ultra2cd:/export/home/isaac: 23:48 > ps  -o psr,pid,fname,pset,addr,time,class,pri,user
PSR   PID COMMAND  PSET             ADDR        TIME  CLS PRI     USER
  - 16014 sh         -      3000219afc0       00:00  FSS  11     root
  0 16029 tcsh       -      300021ab7f8       00:00  FSS  52     root
  0 18453 ps         -      300021c1348       00:00  FSS  59     root
root@ultra2cd:/export/home/isaac: 23:49 >

I think you get the point.

One last comment: notice how the 'sh' process has a hyphen in the PSR column. That is because the 'sh' process is the parent of 'tcsh' and I only bound 'tcsh' itself. Processor bindings are inherited by children from their parent, so only tcsh's children will be bound, not its parents.
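
Since an unbound process shows up with a hyphen in the PSR column, the same 'ps' output can be filtered to list whatever is not bound. Here is a minimal sketch, assuming PSR is requested as the first column as in the listings above; 'unbound_procs' is a name I made up for illustration, not a Solaris command:

```shell
#!/bin/sh
# unbound_procs: hypothetical helper. Reads 'ps -o psr,pid,fname' output on
# stdin and prints "PID COMMAND" for every process whose PSR column is a hyphen.
unbound_procs() {
    # The header line has "PSR" in column 1, so it never matches "-".
    awk '$1 == "-" { print $2, $3 }'
}

# Intended use on a Solaris host (not executed here):
#   ps -o psr,pid,fname | unbound_procs
```

The filter only looks at the first three columns, so adding more format names after 'fname' should not disturb it.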

Hope this was interesting and helpful.

Wednesday Aug 31, 2005

Backing up via a one-liner shell alias

In the never-ending saga of comparing disk vs. tape backup technologies there is clearly space for both. Rather than turn this into a rant on why one is better than the other (with the ever-recurring conclusion that the right answer is always: "it depends"), I figured I'd just blog a self-serving response to a long-outstanding request I've made to the customer I'm supporting: getting my lab machine backed up to tape while I am doing some development work for them.

I am currently working on a project that involves defining a Solaris 10 core image, based on the Solaris JumpStart framework. Customizing the JumpStart environment requires file systems of a respectable size. One of the things that I've requested is for a backup agent to be attached to this host in the lab, so that these file systems would be backed up on a nightly basis.

In the meantime, since there's plenty of spare disk that's already available on this machine, I challenged myself with writing a shell alias to do a backup routine on a daily basis. (A shell script just didn't seem that challenging anymore). So here I was, trying to see how much shell-execution I can really fit inside of an alias. Took some testing and at the end here is what I came up with. (I use tcsh as my shell and put this one-liner in my .cshrc file)

alias backup 'cd /export/gsjumpstart; mkdir /backups/export/gsjumpstart_`date | cut -b5-10 | sed "s/ /_/"`; find . -print | cpio -pmdv /backups/export/gsjumpstart_`date | cut -b5-10 | sed "s/ /_/"`'

(No big deal, really.)

This allows me to ensure that a new backup image consisting of the same file system and directory structure is created (whenever I kick off 'backup') and is dumped into its own directory marked with a date. You can easily expand it to cover time of day as well.
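
For what it's worth, here is a rough sketch of what covering the time of day could look like. It is written as Bourne shell functions rather than as a tcsh alias, and the names 'backup_dest' and 'backup_tree' are hypothetical; the date format (month, day, hour and minute) is my own choice:

```shell
#!/bin/sh
# backup_dest: hypothetical helper. Prints a destination name such as
# /backups/export/gsjumpstart_Aug_31_2315 for the prefix given as $1.
backup_dest() {
    # %b = abbreviated month, %d = day of month, %H%M = 24-hour time
    printf '%s_%s\n' "$1" "$(date +%b_%d_%H%M)"
}

# backup_tree: hypothetical helper. Copies the tree under $1 into a freshly
# created, time-stamped directory whose name starts with $2.
# $2 should be an absolute path, because the copy runs from inside $1.
backup_tree() {
    src=$1
    dest=$(backup_dest "$2") || return 1
    mkdir -p "$dest" || return 1
    # cpio pass-through mode, as in the alias: -p pass, -m keep mtimes, -d make dirs
    (cd "$src" && find . -print | cpio -pmd "$dest")
}
```

With those defined, 'backup_tree /export/gsjumpstart /backups/export/gsjumpstart' would do roughly what the alias does, with the time appended to the directory name.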

'till next time.

Monday Feb 14, 2005

[!out]sourcing .cshrc

Unsurprisingly, in this day and age of Googlemania(tm), I came across a Tcsh built-in variable set in a recent quest for leveraging the most out of my shell prompt. Not having touched this in ages makes you take too many basic capabilities for granted...and so as Cupid's arrow hit me I, in turn, hit Google.

Since I use Tcsh and already have a reasonably useful prompt, I wanted to add a time stamp to it.

Previously looking like this:


...and being defined as:

set prompt="`whoami`@`hostname`:`pwd`> "
.. I really did want a time stamp in there...somewhere.

Giving a few whirls with the /bin/date command made me realize that there must be a better way.

And as it turns out there is, and it's only 2 bytes long: %t

What I really wanted was something that looks like this:

isaac@unixman:/export/home/isaac: 12:20pm

So, I've added %B and %b around %t, and now my prompt in my .cshrc file is defined as:

set prompt="`/usr/ucb/whoami`@`hostname`:`pwd`:%B %t %b> "

You can also use %T instead of %t to print the time in 24-hour format rather than 12-hour AM/PM format.

Cool, huh ?

Dtracing your rootless self...

The Solaris Dynamic Tracing Guide is an excellent starting point for learning about the dynamic tracing framework for Solaris production (yes!) environments.

Of special interest is Chapter 35, on Security (see details on Casper Dik's weblog). In this chapter, we can familiarize ourselves with the 3 dtrace_* privileges that a non-root user needs in order to dtrace their own processes (or possibly even processes that are not their own).

Here is an example of what needs to be configured, and how to verify the configuration.

In my case, as root, I executed the following to give user isaac dtrace_proc and dtrace_user privileges.

usermod -K defaultpriv=basic,dtrace_proc,dtrace_user isaac

which updated /etc/user_attr with the following line:


You could also add 'dtrace_kernel' to these privileges to be able to dtrace the kernel.

You have to re-login for these settings to take effect.
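
If you want to double check what got recorded for an account, something along these lines can pull the 'defaultpriv' value back out of the file. This is only a sketch: 'user_defaultpriv' is a hypothetical helper name, and it assumes the usual user_attr layout of colon-separated fields with semicolon-separated key=value pairs in the fifth field:

```shell
#!/bin/sh
# user_defaultpriv: hypothetical helper. Prints the defaultpriv value recorded
# for user $1 in a user_attr-style file ($2, defaulting to /etc/user_attr).
user_defaultpriv() {
    user=$1
    file=${2:-/etc/user_attr}
    # Field 1 is the user name; field 5 holds the key=value attributes.
    awk -F: -v u="$user" '
        $1 == u {
            n = split($5, kv, ";")
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^defaultpriv=/) {
                    sub(/^defaultpriv=/, "", kv[i])
                    print kv[i]
                }
        }' "$file"
}
```

Running 'user_defaultpriv isaac' should then print the comma-separated privilege list, or nothing at all if the account has no defaultpriv entry.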

Verify against one of your own running processes, as such:

$ /usr/sbin/dtrace -n 'pid$pid:libc.so.1::entry'

Additional references that might be of interest are:

  1. DTrace Case Study For Developers
  2. DTrace (for those of you who read Russian)

Thursday Feb 10, 2005

Onward to Zonedtrace land ...

So you heard about zones and you'd like to tinker with them a bit. Well, here are my notes from what I recently did on my workstation. Hopefully you'll benefit from what you read about here; let me know. The things used here are results of my own observations. Definitely consider visiting Sun's online documentation site for more specific details.

Let's configure zones. I have a recently released version of Solaris 10 (build 74L2a), hostname: unixman, and the 3 zones I'm creating are called: zunixman, zunixman1, zunixman2. One of the commands that we need to become familiar with is 'zonecfg'. For example, 'zonecfg -z zone_name', where zone_name could be, say, zunixman, will cause a new prompt to appear, like so, allowing you to start populating characteristics for a zone:

Use 'create' to begin configuring a new zone.

Now, for those of you who have tinkered with computers for a while, this is a comfortable prompt that will take, as input, various commands pertaining to sculpting a zone as per your desire and available system resources. If you are unsure of what to do, type "?" at the prompt and you'll get an equivalent of paging a SysOp and getting some guidance as to what your options are. Very similar to the way Cisco IOS, and other configurable tools (SMC, LDAP, Sun's Application server, MySQL) and the like, work.

Please note: There are things here that may not be necessary for you to begin with, such as resource controls. Our searchable website over at http://docs.sun.com has the zones documentation available, and you should check it out for the complete list of things that might be of interest in your environment. Essentially, you start by feeding in the attributes that describe how you want your zone configured, one command at a time:

set zonepath=/export/allzones_configs/zunixman
Note, this directory must be created manually.
set autoboot=true
add fs
          set dir=/export/home
          set special=/export/allzones_data/zunixman/export/home
Note, this directory also must be created manually. You continue ...
          set type=lofs
          add options [rw,nodevices]
add net
          set address=
          set physical=eri0
add rctl
          set name=zone.cpu-shares
          add value (priv=privileged,limit=20,action=none)
add attr
          set name=comment
          set type=string
          set value="This is zone named zunixman"

At the end, type:

...to ensure that the configuration syntax is verified and saved. See ? This *does* have a database-like approach to things. Whoever said that "commit" was just an Oracle database thing? :-) Verify that everything was fine:
zoneadm -z zunixman verify
If errors appear, fix them.

Install a zone:

zoneadm -z zunixman install
Check the status based on the response to the above command:
zoneadm -z zunixman list -v
(Note, in my case the zone is already running. At this point in your own setup, the STATUS will be different, for example 'installed'.)

root@unixman:/: 4:15pm > zoneadm -z zunixman list -v
  ID NAME             STATUS         PATH                         
  17 zunixman         running        /export/allzones_configs/zunixman

If the status says "incomplete", there was a problem. After you fix the problem, un-install the zone first:
zoneadm -z zunixman uninstall
Then make the corrections specified in the message, and try the 'zoneadm -z zunixman install' command again. Once any errors are fixed and the zone is installed, make it ready and boot it:
zoneadm -z zunixman ready
zoneadm -z zunixman boot
You will see a process that copies the necessary file structures for the zone (root and dev directories) into the zonepath directory tree we had specified above. This process may take a few minutes, so while it's happening, here's a trick that will help you ease the deployment of zones. You can save all of the zone attribute commands into a file, and then just pass the name of that file to 'zonecfg' (using its '-f' option) every time you're building a new zone, remembering, of course, to modify the IP/hostname/zonepath/other values as necessary to reflect the new zone you're building. On my machine, it looks like this for the last zone I built, called zunixman2. Note that the indentation is not necessary, but it helps make the file more readable, as it groups the pieces pertaining to the various objects.
set zonepath=/export/allzones_configs/zunixman2
set autoboot=true
add fs
          set dir=/export/home
          set special=/export/allzones_data/zunixman2/export/home
          set type=lofs
          add options [rw,nodevices]
add net
          set address=
          set physical=eri0
add rctl
          set name=zone.cpu-shares
          add value (priv=privileged,limit=20,action=none)
add attr
          set name=comment
          set type=string
          set value="This is zone named zunixman2"
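
Since only the zone name and the address really change from one zone to the next, the command file itself can be generated. Below is a minimal sketch; 'zone_cmdfile' is a hypothetical helper, and the 'create', 'end', 'verify' and 'commit' lines are my own additions to make the file complete for 'zonecfg', they are not part of the listing above:

```shell
#!/bin/sh
# zone_cmdfile: hypothetical helper that prints a zonecfg command file.
#   $1 = zone name, $2 = IP address for the zone, $3 = NIC (defaults to eri0)
# The create/end/verify/commit lines are additions, not part of the listing.
zone_cmdfile() {
    zone=$1 addr=$2 nic=${3:-eri0}
    cat <<EOF
create
set zonepath=/export/allzones_configs/$zone
set autoboot=true
add fs
    set dir=/export/home
    set special=/export/allzones_data/$zone/export/home
    set type=lofs
    add options [rw,nodevices]
end
add net
    set address=$addr
    set physical=$nic
end
add rctl
    set name=zone.cpu-shares
    add value (priv=privileged,limit=20,action=none)
end
add attr
    set name=comment
    set type=string
    set value="This is zone named $zone"
end
verify
commit
EOF
}
```

Something like 'zone_cmdfile zunixman3 192.0.2.10 > /tmp/zunixman3.cfg' followed by 'zonecfg -z zunixman3 -f /tmp/zunixman3.cfg' would then configure the new zone in one shot. The zone name, address and file path in that example are placeholders, and the two directories still have to be created by hand first.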
When all of your zones are built and booted, you can run 'zoneadm' to see all of them, and their state.
root@unixman:/: 4:15pm > zoneadm list -vi
  ID NAME             STATUS         PATH                         
   0 global           running        /                            
  14 zunixman1        running        /export/allzones_configs/zunixman1
  15 zunixman2        running        /export/allzones_configs/zunixman2
  17 zunixman         running        /export/allzones_configs/zunixman
root@unixman:/: 4:25pm >
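
With more than a handful of zones, it can be handy to pick out only the ones that are not running. A small sketch, assuming the four-column layout shown above (ID, NAME, STATUS, PATH); 'zones_not_running' is a made-up helper name:

```shell
#!/bin/sh
# zones_not_running: hypothetical helper. Reads 'zoneadm list -vi' style output
# on stdin and prints "NAME STATUS" for every zone whose STATUS is not "running".
zones_not_running() {
    # Skip the header line (first column "ID") and any blank lines.
    awk 'NF >= 3 && $1 != "ID" && $3 != "running" { print $2, $3 }'
}

# Intended use on a Solaris host (not executed here):
#   zoneadm list -vi | zones_not_running
```

No output means every zone in the listing is running.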
To login to the zone, use 'zlogin', and in fact you should do it as root for the first time because that will allow you to create further accounts and ensure the run-time environment is ok. Sorta like an old MUD game. Login, look around, see if there's anything you don't like.... Of course, there's no one to talk to, so you're on your own. :-) To login to the console of a zone, use: 'zlogin -C zone_name'

The first thing you'll notice is that the zone is sharing the kernel space with the global zone (the zone with the ID of 0 in the output of 'zoneadm list -vi'). One thing you can check to validate whether you are in a zone or not is the 'zsched' process. If it's in your process table (run: pgrep zsched), you're likely in a private (non-global) zone. The 'zonename' command gives a more direct answer: it prints the name of the zone you are in, which is 'global' for the global zone.

So you've configured the zones and now you want to run things inside the zones. In this example, we'll assume that you are using a binary called 'nspin', which was bundled in the previous release of Solaris Resource Manager (that was version 1.3 on Solaris 8 environments). Now, you don't have to use that binary; I am only using it as a simple cpu hog. So, I copy that binary to each of my zones from within the global zone. The nice thing is that I don't have to "go through" the global zone. I could scp or ftp (use scp if you can) the binary directly into the zone. Once the binary ends up in the zone, it can be seen from the global zone by running a "find ..." command with the name of the binary, but that's not our goal here. We *know* that the binary is in the zone and now we want to execute it.

So, we execute it just by running "nspin &". This default behaviour creates one cpu-bound thread. We do this in 2 of our zones (we don't have to, but I did for the fun of it).

Then, in our global zone, we can observe the usage between zones by using 'prstat' with '-Z' argument.

 26037 root     1136K  760K run     10    0   0:15:07  52% nspin/1
 25987 root     1136K 1072K run     20    0   0:14:26  44% nspin/1
   535 root       74M   63M sleep   59    0   1:53:37 0.5% Xsun/1
 26148 root     4992K 4512K sleep   49    0   0:00:08 0.4% prstat/1
 26244 root     4992K 4688K cpu0    59    0   0:00:00 0.4% prstat/1
 26177 root     4768K 4336K sleep   59    0   0:00:01 0.1% prstat/1
   721 root     7888K 2296K sleep   59    0   0:16:07 0.1% sdtperfmeter/1
   728 root     2112K  800K sleep   59    0   0:16:42 0.1% rpc.rstatd/1
  4559 root       77M   54M sleep   59    0   0:20:33 0.0% mozilla-bin/4
  4750 root     7864K 3336K sleep   49    0   0:00:09 0.0% dtterm/1
 21723 root     3768K 2760K sleep   49    0   0:00:00 0.0% xterm/1
 26204 yan      8160K 2584K sleep   59    0   0:00:00 0.0% sshd/1
   867 root     4064K 2880K sleep   59    0   0:01:26 0.0% nscd/24
   777 root     9688K 6048K sleep   49    0   0:03:07 0.0% dtterm/1
 23056 root     3736K 2960K sleep   59    0   0:00:04 0.0% nscd/25
  2636 root     5440K 1152K sleep   59    0   0:02:22 0.0% ssh/1
 26215 root     2944K 2368K sleep   59    0   0:00:00 0.0% tcsh/1
 24596 root     9936K 6952K sleep   59    0   0:00:08 0.0% svc.startd/12
ZONEID    NPROC  SIZE   RSS MEMORY      TIME  CPU ZONE                
    14       31  111M   71M   7.0%   0:15:41  53% zunixman1           
    17       30  106M   67M   6.7%   0:14:57  44% zunixman            
     0      136  656M  319M    32%   3:15:25 1.7% global              
    15       36  138M   86M   8.6%   0:00:32 0.0% zunixman2           
Notice how there are 2 nspin's being reported, each with 1 LWP/thread (the top-most 2 lines).

We can also see the usage across the various zones. The bottom-most 4 lines show the distribution of CPU% among the zones, based on the work each zone is doing at this time.
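
If all you care about is which zones are busy, the per-zone summary can be filtered down. The following is a sketch only, assuming the eight-column summary layout shown above; 'busy_zones' and its default threshold of 25 are hypothetical:

```shell
#!/bin/sh
# busy_zones: hypothetical helper. Reads 'prstat -Z' style output on stdin and
# prints "ZONE CPU" for zones whose CPU column exceeds a threshold ($1, default 25).
busy_zones() {
    awk -v limit="${1:-25}" '
        $1 == "PID"    { inzones = 0; next }   # per-process section starts
        $1 == "ZONEID" { inzones = 1; next }   # per-zone summary starts
        # "53%" + 0 evaluates to 53, so the percent sign does not get in the way
        inzones && NF == 8 && $7 + 0 > limit { print $8, $7 }
    '
}

# Intended use on a Solaris host (not executed here):
#   prstat -Z 1 1 | busy_zones 40
```

Fed the listing above with a threshold of 40, this would leave only the two zones that are running nspin.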

What else can we do ? Assuming we DID NOT KNOW what nspin is, or where it is located.... and we were called in to troubleshoot "performance problems" on this system. What would we do ?

Well, we can use the traditional means of trying to find what 'nspin' is. Nothing wrong with that. Go off and run "find / -name nspin -print" and you'll see where 'nspin' lives on the file systems. That may not be enough though. What would be nice is to get an answer to the question of what is REALLY being done by nspin... ?

Any guesses as to what new facility in Solaris 10 we could use ?

Those of you who thought about Dtrace should pat yourselves on the back. Yes, the Dynamic Tracing framework that is introduced in Solaris 10 is built to instrument the kernel, and the processes running on top of it, on "live" systems. Armed with this, we can identify an offending process (say, 'nspin' from our process table) and run the following:

# dtrace -n 'pid$target:::entry{ @[probefunc] = count() }' -p 25987 

You can substitute the process ID of your own process that you see fit using "-p" above. Here, what I've done is asked the kernel to fire (show) probes as various events occur by printing the name of the function being entered and the quantity of times this takes place.

After I type that command, almost immediately the following is printed back:

dtrace: description 'pid$target:::entry' matched 2945 probes

This says that, given the conditions specified, the dtrace consumer (the command called "dtrace" up above) has matched 2945 probes, one for each function entry point in this process. Now, after this line is printed, nothing happens. But why ? Well, the fact that nothing is printed does not mean that nothing is happening. What is in fact happening is that the various probes have been enabled and, based on your command above, the kernel is working to count how many times this pid enters each function. You can think of this as allowing the kernel to collect statistics about what is being done on behalf of this process, and to maintain those statistics that filter through the rule in your command line.

When some time has gone by (a few seconds or minutes is usually enough) you can hit 'ctrl-c' to stop the 'dtrace' consumer. What you'll get on STDOUT is something like the following:

  docpu                                                          4206

Which tells you that the process you were interested in called the 'docpu' routine 4206 times. The complete picture of this Dtrace example looks like this on my workstation:
root@unixman:/export/home/isaac> dtrace -n 'pid$target:::entry{ @[probefunc] = count() }' -p 25987
dtrace: description 'pid$target:::entry' matched 2945 probes
  docpu                                                          4206

Just for grins, imagine that you were interested in another process that runs in the zone, say a 'tcsh' process, with a pid of 25941, like so: (from another shell window)

root@unixman:/: 3:22pm > ps -fe | grep 25941
    root 25987 25941  44 14:28:11 zoneconsole   20:22 ./nspin
    root 25941 24696   0 14:27:37 zoneconsole    0:00 tcsh
    root 26275 26215   0 15:22:11 pts/17      0:00 grep 25941

Now, you want to run the same Dtrace command, only on this process. Run the command below, then switch to your other shell (tcsh) and invoke some activity in it. You have to generate some activity in the process you're looking at if you really want to see what it does when it does what it should :-) So here goes:

root@unixman:/: 3:22pm > dtrace -n 'pid$target:::entry{ @[probefunc] = count() }' -p 25941
dtrace: description 'pid$target:::entry' matched 5980 probes
  _sbrk_unlocked                                                    2
  sbrk                                                              2
  _brk_unlocked                                                     2
  tcsetpgrp                                                         4
  __setcontext                                                      4
  setcontext                                                        4
  enthist                                                           4
  __sighndlr                                                        4
  unsleep_self                                                      4
  pchild                                                            4
  sigacthandler                                                     4
  __sigsuspend                                                      4
  sigsuspend                                                        4
  call_user_handler                                                 4
  _sigpause                                                         4
  sigpause                                                          4
  pjwait                                                            4
  sigdelset                                                         4
  __schedctl                                                        4
  setup_schedctl                                                    4
  copylex                                                           4
  pwait                                                             4
  srchx                                                             4
  setpgid                                                           4
  cond_signal                                                       4
  palloc                                                            4
  _libnsl_parent_atfork                                             4
  atexit_unlocks                                                    4
  mutex_held                                                        4
  fork_lock_exit                                                    4
  stdio_unlocks                                                     4
  libc_parent_atfork                                                4
  _postfork_parent_handler                                          4
  pthread_rwlock_unlock                                             4
  __fork1                                                           4
  suspend_fork                                                      4
  Dfix                                                              4
  continue_fork                                                     4
  isbfunc                                                           4
  rw_read_held                                                      4
  rw_wrlock_impl                                                    4
  rwlock_lock                                                       4
  stdio_locks                                                       4
  atexit_locks                                                      4
  libc_prepare_atfork                                               4
  pthread_rwlock_wrlock                                             4
  _prefork_handler                                                  4
  lastchr                                                           4
  unparse                                                           4
  _libnsl_prefork                                                   4
  fork_lock_enter                                                   4
  fork                                                              4
  pfork                                                             4
  putn                                                              4
  getsid                                                            4
  tcgetsid                                                          4
  job_cmd                                                           4
  GetSize                                                           5
  Refresh                                                           5
  tty_getchar                                                       5
  tty_gettabs                                                       5
  tty_cooked_mode                                                   5
  tty_geteightbit                                                   5
  tcgetattr                                                         5
  tty_getty                                                         5
  ResetInLine                                                       5
  Inputl                                                            5
  btell                                                             5
  lex                                                               5
  sched_next                                                        5
  atoi                                                              5
  short2str                                                         5
  setalarm                                                          5
  precmd                                                            5
  period_cmd                                                        5
  sched_run                                                         5
  watch_login                                                       5
  pendjob                                                           5
  check_window_size                                                 5
  postcmd                                                           5
  rmstar                                                            5
  continue_jobs                                                     5
  alias                                                             5
  savehist                                                          5
  unreadc                                                           5
  memcpy                                                            5
  memmove                                                           5
  Cookedmode                                                        5
  PastBottom                                                        5
  e_newline                                                         5
  times                                                             8
  _waitid                                                           8
  _waitpid                                                          8
  waitpid                                                           8
  set_parking_flag                                                  8
  rw_write_held                                                     8
  tglob                                                             8
  ___errno                                                          9
  execute                                                           9
  syntax                                                            9
  _cerror                                                           9
  tputs                                                            10
  Itoa                                                             10
  printprompt                                                      10
  alarm                                                            10
  tcsetattr                                                        10
  so_write                                                         10
  MoveToChar                                                       10
  freelex                                                          10
  StrQcmp                                                          10
  any                                                              10
  setjmp                                                           10
  ClearDisp                                                        10
  tty_setty                                                        10
  s_strlen                                                         12
  s_strcpy                                                         12
  _syscall6                                                        12
  sigon                                                            12
  trim                                                             13
  freesyn                                                          13
  setq                                                             13
  set1                                                             13
  set                                                              13
  blkfree                                                          13
  findenv                                                          15
  getenv                                                           15
  assert_no_libc_locks_held                                        15
  getsystemTZ                                                      15
  localtime_r                                                      15
  pthread_getspecific                                              15
  tsdalloc                                                         15
  localtime                                                        15
  tprintf                                                          15
  e_insert                                                         15
  sigismember                                                      15
  __sigaction                                                      15
  sigaction                                                        15
  sigset                                                           15
  MoveToLine                                                       15
  offtime_u                                                        15
  str2short                                                        15
  RefPlusOne                                                       15
  c_insert                                                         15
  ltzset_u                                                         15
  set_zone_context                                                 15
  cfgetispeed                                                      15
  tty_getspeed                                                     15
  calloc                                                           16
  time                                                             19
  __time                                                           19
  _read                                                            20
  mutex_unlock                                                     20
  GetNextChar                                                      20
  mutex_lock_impl                                                  20
  Load_input_line                                                  20
  read                                                             20
  mutex_lock                                                       20
  readc                                                            25
  write                                                            25
  Rawmode                                                          25
  _write                                                           25
  strcmp                                                           30
  ioctl                                                            33
  flush                                                            40
  lmutex_lock                                                      44
  lmutex_unlock                                                    44
  s_strsave                                                        49
  value1                                                           53
  sighold                                                          55
  sigrelse                                                         55
  _save_nv_regs                                                    61
  free                                                             83
  malloc                                                          101
  pthread_sigmask                                                 129
  sigprocmask                                                     129
  sigaddset                                                       129
  __systemcall6                                                   133
  __lwp_sigmask                                                   133
  sigemptyset                                                     135
  block_all_signals                                               137
  adrof1                                                          140
  s_strcmp                                                        144
  sigvalid                                                        148
  putraw                                                          240
  SetAttributes                                                   245
  putpure                                                         275
  memset                                                         4100

Now, this is quite an extensive list, but what you can gather from it is that 'memset' was the function that was entered the most. If you are a developer or a Systems Administrator trying to troubleshoot this process, you should ask yourself why this is happening. This is one way to troubleshoot performance problems in Solaris; it is a small example and by no means an exhaustive one.
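
With a list this long, you usually only want the top few entries. One way to get there is to save the output to a file (dtrace's '-o' option will do that) and trim it afterwards. A minimal sketch; 'top_funcs' is a hypothetical helper name and the default of 10 entries is arbitrary:

```shell
#!/bin/sh
# top_funcs: hypothetical helper. Reads "function count" lines, as printed by
# the aggregation above, on stdin and prints the N most frequently entered
# functions as "count function" ($1 = N, default 10).
top_funcs() {
    # Keep only two-field lines whose second field is a number, count first.
    awk 'NF == 2 && $2 ~ /^[0-9]+$/ { print $2, $1 }' |
        sort -rn |
        head -n "${1:-10}"
}

# Intended use, with the dtrace output saved to a file first (placeholder path):
#   top_funcs 5 < /tmp/dtrace.out
```

Reading from a saved file, rather than piping dtrace straight into the helper, means the ctrl-c that stops dtrace does not also interrupt the filter.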

Nice, huh ? There's plenty more. In fact, plenty is not even enough to describe this. Go ahead, have fun. Start making accounts and doing all other sorts of things. Read up on zones at docs.sun.com and our other public forums, as well as the BigAdmin site and the developer.sun.com site.

'till next time.


Isaac Rozenfeld is a Product Manager for Oracle Solaris; current responsibilities include the portfolio of networking and installation technologies in Solaris, with a focus on easing the overall application deployment experience

You can follow Isaac on Twitter @izfromsun

