Wednesday Dec 14, 2005

Xeon is just as Good as Opteron (says HP!)

Thanks to the folks at theInq for calling our attention to this one. Aside from the Inq's warning that you need an HP signon, also be prepared for a Microsoft Word source document. It's actually a rather nice writeup, and its conclusion seems to be "true": if your workload is bottlenecked on I/O, having a faster processor with a faster memory interface won't help.

Of course, that really is pretty obvious and doesn't take 21 pages to justify now does it?

In a variety of places, their detailed analysis favors the AMD processor, for example from page 14:
1. The more content that can be cached in memory the greater the Opteron advantage. This performance difference is tied to the different designs of the two processors... detailing the FSB vs. HT issues

"Countered with"
2. If any of the server sub-components become a bottleneck, the Opteron memory access speed advantage is negated.

So the obvious solution is to favor systems with fast processors, fast and ample memory bandwidth and fast I/O subsystems.

Unfortunately, that doesn't result in "all processors being equal", at least not if you buy the right subsystems ;>

Monday Dec 12, 2005

The sad saga of xemacs vs. gnu emacs

I'd been a longtime user of the Xemacs that came packaged with the Sun Studio tools years ago. I knew there was a split, and a dreamed-of merge ... but I'd never really quizzed Ben Wing or Martin Buckholtz (sp?) (two of the Sun engineers who contributed the linkage code, and were Xemacs maintainers) about the how and why of the split and fork.

Here is a pointer to at least one side's worth of the sad saga.

Thursday Dec 08, 2005

Caches Considered Harmful

For what seems like forever, designers have been adding more and more cache to systems to reduce latency to memory. This has been successful, and while it hasn't been the only approach, it has been the most typical.

But has it been Good?
  • Caches are very energy intensive (essentially a large amount of SRAM close to the CPU). The larger they are, the more energy wasteful they are.
  • Caches, on average, produce a benefit on the order of sqrt(size), so the heat outpaces the benefit.
    • Of course, with heat you pay several times. You pay for the electricity to create the heat, you pay in the system design to cool the device, you pay in the data center to cool the entire system, and you typically pay a price in RAS because heat kills.
    • Notably, adding cores (provided enough memory bandwidth is available) provides nearly linear improvement in throughput.
    • And for cache experts, increasing the associativity increases their effectiveness.
  • Caches help us avoid dealing with the underlying issue of doing useful work while waiting for memory. Putting off the harder work of innovation, or at least limiting the innovation to the process level rather than the architecture level, is a form of laziness.
  • When the data one wants isn't in the cache, it's often worse than it would have been had there been no cache (so fancy non-cache polluting loads and stores may be added to the ISA, the compiler, etc.).
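
A back-of-the-envelope sketch of the scaling argument in the list above. It takes the sqrt(size) rule of thumb and the "nearly linear" core scaling at face value, and it additionally assumes (a simplification for illustration, not a measurement) that energy grows linearly with whatever resource is being multiplied.

```python
import math


def cache_benefit(size_factor: float) -> float:
    """Relative benefit of growing the cache by size_factor (sqrt rule of thumb)."""
    return math.sqrt(size_factor)


def core_benefit(core_factor: float) -> float:
    """Relative throughput of multiplying the core count.

    Idealized as exactly linear, which presumes memory bandwidth keeps up.
    """
    return core_factor


def benefit_per_unit_energy(benefit: float, resource_factor: float) -> float:
    """Assumption: energy cost grows linearly with the resource added."""
    return benefit / resource_factor


for factor in (1, 4, 16):
    cache = benefit_per_unit_energy(cache_benefit(factor), factor)
    cores = benefit_per_unit_energy(core_benefit(factor), factor)
    print(f"{factor:>2}x resource: cache {cache:.2f}, cores {cores:.2f} (benefit per unit energy)")
```

Under these assumptions the cache's return per unit of energy keeps shrinking as it grows, while the cores' return stays flat.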
So what are the alternatives to caching?

As in the citations above, the key observation is that if one has additional "threads" ready to do useful work, that work can be done while awaiting the data to be returned (from memory, from cache, from disk, ... wherever) rather than keeping all that possibly expensive iron (silicon) hot while it does nothing. And that's precisely what UltraSPARC T1 does.
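
To make the "do other work while waiting" idea concrete, here is a toy model. It is an illustration only, not a description of how the UltraSPARC T1 pipeline is actually scheduled: every thread is assumed to alternate a fixed burst of compute with a fixed memory wait, and switching between ready threads is assumed to be free. The cycle counts are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles in which the core does useful work.

    Toy model: each thread is busy for compute_cycles out of every
    (compute_cycles + stall_cycles); other threads fill the gaps until
    the core is saturated.
    """
    busy_fraction_per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * busy_fraction_per_thread)


# Hypothetical workload: 25 cycles of work for every 75 cycles spent waiting.
for n in (1, 2, 4, 8):
    print(f"{n} thread(s): {core_utilization(n, 25, 75):.0%} busy")
```

In this model a single thread leaves the core idle most of the time, and adding threads fills the idle cycles without touching the cache size at all.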

So when you hear someone making a spurious claim about the UST1 being cache starved, ask them how big a cache they think it should have, and why? What level of associativity? What's the downside? And, of course, point out that the application performance is what counts, and it doesn't support the contention that the UltraSPARC T1 family systems are cache starved.

NB: of course, caches aren't all bad. If you are focused on minimum latency (fastest response time for a single thread) they can be very effective. But if your goal is the most aggregate work for the least power, they are certainly not your friend.

To learn more about caches

Wednesday Dec 07, 2005

With All Due Respect Jonathan

9.6GHz is not the clock of our UltraSPARC T1 processor. As any hardware engineer knows, clock speed is a simply measurable entity: you stick test leads on the appropriate wires and count the phase transitions. The correct number is 1.2GHz.

Now, as software engineers we know that folks like to compute theoretical maximum operation rates, and 1.2GHz * 8 cores does yield 9.6 GOPS as a theoretical limit. And thanks to many years of industry confusion, probably due to the infamous Dhrystone benchmark, clock rate and operations per second have become hopelessly confused in the minds of many.
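
The arithmetic, spelled out as a minimal sketch. The one-operation-per-core-per-cycle default is an assumption implied by the 9.6 figure, not a statement about the actual pipeline.

```python
def peak_gops(clock_ghz: float, cores: int, ops_per_cycle_per_core: int = 1) -> float:
    """Theoretical peak operation rate in GOPS.

    This is an upper bound on operations issued, not a clock rate and
    not a measure of useful work done.
    """
    return clock_ghz * cores * ops_per_cycle_per_core


print(f"{peak_gops(1.2, 8):.1f} GOPS theoretical peak at a 1.2GHz clock")
```

The result carries units of operations per second; the clock is still 1.2GHz.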

But fundamentally there is no link between clock and work done: you can have really, really simple instructions (the limiting case is a single instruction), but then you need a very large number of instructions per useful work done ;>

As our Opteron based systems should have demonstrated, clock speed isn't a solid predictor of performance (Intel has a faster clock, and poorer performance).

I suppose one can argue that it is "pleasing that in lots of scenarios we do see scaling that is in fact more or less linear with the core count". See a benchmark focused blog for examples.

Tuesday Dec 06, 2005

What I want to work with next.

Typically one blogs about what one has done, or has discovered. In this entry, I'll talk about an area I want to spend some time working in RealSoonNow.

As a performance analyst, much of my work has been reductionist. That is, I take some application and make it go faster. Step 0 is to measure it before doing anything to it, followed by figuring out little bits that don't go as fast as they ought (or determining that the wrong algorithms were used, and doing some wholesale surgery ;>) and iterating. The key has always been to isolate the smallest bits possible (faster turnaround, better leverage, etc.). And such work has been rewarding in many different ways. But most of the time, my computers are not focused on running just one application.

My laptop, for example, currently has over 300 threads in 80 processes running. I'm not even driving it very hard. If I want to focus on any of the specific processes or threads, tools such as Performance Analyzer (or its earlier, more primitive predecessors, such as gprof) are fine.

But if what I really want to do is to maximize the performance of the overall system (throughput) I've largely been toolless.

Worse, everything that my friends at Intel (see their last couple of Intel Developer Fora) have been saying is that they are going to move to a strongly multicore strategy (Justin Rattner spoke of hundreds to thousands of cores, and ElReg reported this as 

With the DProfile utility (keyword dataspace if you want to search for it), developed by Nicolai Kosche and friends, it's now possible to see how all the various threads and processes actually interact inside the memory hierarchy.

Of course, this took a lot of infrastructure: SPARC needed to supply enough runtime instrumentation, Solaris the APIs (including DTrace), the compilers instrumentation (for optimal results), and the Performance Analyzer extensions to collect and display the appropriate information (this is where that keyword dataspace comes in handy for searches).

No doubt Intel and Microsoft will eventually have as many threads in a chip as Sun does today with Niagara (2010++??) No doubt, someday Windows+++ will have mature support for highly threaded applications (in addition to robustly supporting heavily loaded systems). Intel has, of course, purchased several suppliers of threaded tools so ... and to be fair, the hardware threads only have to be on a single board to provide much of the same software opportunity (of course, the RAS is much better with just one chip ;>)

But why wait? Clearly such "complicated" environments are no longer the sole province of supercomputer users and major IT departments (and a power desktop user probably has a lot more challenging apps than I have on my laptop, visual processing is easily parallelized....) so getting started now with the next generation of tools is going to be a lot like it must have been for the first radiologists. Lots to learn, with brand new shiny technology!

So keep your eyes peeled for anything from * with words like DProfile or dataspace and dig in!

Amazingly stupid competitor quote

You have to wonder if they've been misquoted:

Don't they even bother to read the literature?

UltraSPARC T (nee Niagara) does break a lot of ground for a microprocessor. But effectively reducing latency (which caches are intended to do) is something that multithreading is known to be good for. So megacaches aren't required.

Dec. 6th notable events

Of course, today is the big announcement of the first UltraSPARC T based systems (nee Niagara).

It is also the 2nd birthday of Jerry Sandor Bierman. When available, pictures from his birthday party will be located on Flickr

Which Evolves Faster: Hardware or Software?

Conventional wisdom has always had it that hardware is the "long pole"
in system design. Software can be changed up until the last moment (and
even beyond via patches). So the conventional answer is, of course, that
software evolves faster.

But, for large complex software is it really true? Let's consider the
new UltraSPARC T chips (formerly known as Niagara). As can be found
#link to hw_blog (anyone know the best pointer?) there are 32 hardware
threads per chip.

Given that these hardware threads are quite fundamentally different than
having 32 separate cores, just how does an OS such as Solaris deal with
them? The answer is by ignoring the differences and to a first
approximation treating them as 32 "CPUs".

This mostly works well, but it's the interesting performance corner
cases that cause great confusion ... because the tools (e.g. mpstat)
haven't really evolved to keep up with the hardware.

Sometimes the hardware does evolve faster.

Of course, the point of software layers such as an Operating System is to provide abstraction of hardware details. Deciding just which hardware details need to be directly exposed is a deliberate process.

Sunday Nov 13, 2005


In HP's latest bit of FUD they say:
Sun has stated that Niagara will be binary compliant with previous SPARC designs but this fact does not tell the entire story regarding how well a SPARC binary compatible program will run on Niagara. It's not enough to simply run - the program must also run well to be of actual value. Significant software optimization may be required to ensure that software will work well with Sun's Niagara.
As Niagara is not yet a released product, a detailed rebuttal is not yet "kosher". However, HP's warning makes it sound like Niagara is unique in providing a new pipeline; but almost every new generation of SPARC processor has had a new pipeline, and compiler optimizations to exploit them fully. That has seldom meant more than a small amount of work for most developers and it has never meant anything generally disruptive. Why anyone would think that this would be different in this generation requires more foundation than HP has bothered to provide.

That Niagara will support simultaneous execution of many threads is clear from the presentations that HP references. But HP goes on to claim that

....To fully exploit Sun's Niagara systems, developers may have to change how applications are architected... Sun has stated that Niagara changes the minimum application scalability demands from 1-4 threads to 32 concurrent threads...

which strikes me as pretty absurd. My laptop (a PowerBook G4) currently has 295 threads running (according to Apple's accounting). I've often had more than 500. Perhaps the authors of HP's FUD believe that servers customarily run a single instance of a single application (and that this is a Best Practice of some sort). It's not the way I've run most of my computing systems over the years; nor am I likely to start ;>

That HP can use so many words, and so many "bullet points" to say the same thing is a tribute to something.

I think it's also worth noting that HP's primary CPU technology supplier (with the impending end of the PA family, and the already buried Alpha family; both RIP) is busily crafting large multi-core chips (which translates into "threads" more or less, in this context) and is saying things such as:
So to the extent that HP's contention is correct, that folks should be concerned about making their applications scale to large numbers of hardware threads, it would seem that such tinkering will bear fruit (albeit further in the future) on x64 chips from Intel as well as "Today" with SPARC.

Consolidation (running multiple instances, or different applications) on a single Niagara chip doesn't require any application changes... and any application changes made for Niagara scaling are very likely to stand one in good stead for future x64 chips. So why not go with the future today?

Friday Feb 04, 2005

On Trust

Over on groklaw, there's the usual Sun bashing.

webmink has an excellent writeup on the IP issues that pre-CDDL licenses fail to deal with (and, despite the current wailing, by some, I bet GPL3 addresses the problem ... either in a fashion akin to the CDDL or at least inspired by it).

But putting aside all the confusion about patents, GPL and Open being synonyms, and the like, one particular quote on groklaw caught my attention:

That's what I'd say. Use it only if you trust implicitly in Sun

This immediately reminded me of the classic Turing Award lecture by Ken Thompson, "Reflections on Trusting Trust" (1983).

When programmers build on top of a system, they exhibit trust. Any system with hundreds of thousands of lines of code (or worse, millions) is simply too complex for nearly any programmer to individually inspect each line for subtle security traps (and if the system is still evolving, how would they have any time to develop their application?). Open source may make it possible for someone to do their own proofs, but it's computationally infeasible.

Nor, of course, is trust limited to programming. When we get on an elevator, we exhibit trust in the manufacturer of the elevator, in the installer, in the maintainer, in the government body which audits them, etc.

In my limited experience dealing with corporate lawyers, their focus is not on "how can we cheat" or "how can we plant trapdoors in a contract" but on "how can we ensure that both sides understand what's expected of them and write it down in a mutually agreeable fashion" (no doubt, there exist organizations that have other ethics; Enron comes to mind).

The CDDL seems, to this reader, to make it pretty explicit that all contributors have to not only put in code, but put into the "common" pot the appropriate rights to use and protections for the code. That strikes me as fundamentally fair and useful.

Those that think that being precise about IP issues is somehow indicative of poor ethical behavior, and think that the GPL is the superior approach (in this regard) are exhibiting an incredible amount of trust ... in everyone that holds any software patents ... that no one will take them to task for patent infringement. When the code in question is simply shared among a small body of students that's a pretty safe bet. But folks building multi-billion dollar businesses ought to assume that someone might not see their efforts in the same noble light.

It's sad that pointing this out, and trying to do something about it is seen as an attack or a threat.

Wednesday Nov 17, 2004

An amazing floating point misoptimization

My thanks to David Hough for bringing this gem from Microsoft to my attention. For those not interested in slogging through the entire page; here are my favorite bits; verbatim, although elided and colorized by me. Red for their most amazing decisions, and blue for my commentary.

I pray that no application whose results matter to anyone use this flag!

This is from the Visual C++ 2005 compiler, and the flag in question is "fp:fast", specifically regarding sqrt (but I think it's a more generic problem in their thinking).

Under fp:fast, the compiler will typically attempt to maintain at
least the precision specified by the source code. However, in some
instances the compiler may choose to perform intermediate expressions
at a lower precision than specified in the source code. For example,
the first code block below calls a double precision version of the
square-root function. Under fp:fast, the compiler may choose to
replace the call to the double precision sqrt with a call to a single
precision sqrt function. This has the effect of introducing additional
lower-precision rounding at the point of the function call.

Original function
double sqrt(double)...
. . .
double a, b, c;
. . .
double length = sqrt(a*a + b*b + c*c);

Optimized function
float sqrtf(float)...
. . .
double a, b, c;
. . .
double tmp[0] = a*a + b*b + c*c;
float tmp[1] = tmp[0]; // round of parameter value
float tmp[2] = sqrtf(tmp[1]); // rounded sqrt result
double length = (double) tmp[2];

Although less accurate, this optimization may be especially beneficial
when targeting processors that provide single precision, intrinsic
versions of functions such as sqrt. Just precisely when the compiler
will use such optimizations is both platform and context dependant.

Furthermore, there is no guaranteed consistency for the precision of
intermediate computations, which may be performed at any precision
level available to the compiler. Although the compiler will attempt to
maintain at least the level of precision as specified by the code,
fp:fast allows the optimizer to downcast intermediate computations in
order to produce faster or smaller machine code. For instance, the
compiler may further optimize the code from above to round some of the
intermediate multiplications to single precision.
float sqrtf(float)...
. . .
double a, b, c;
. . .
float tmp[0] = a*a; // round intermediate a*a to single-precision
float tmp[1] = b*b; // round intermediate b*b to single-precision
double tmp[2] = c*c; // do NOT round intermediate c*c to single-precision
float tmp[3] = tmp[0] + tmp[1] + tmp[2];
float tmp[4] = sqrtf(tmp[3]);
double length = (double) tmp[4];

This kind of additional rounding may result from using a lower
precision floating-point unit, such as SSE2, to perform some of the
intermediate computations. The accuracy of fp:fast rounding is
therefore platform dependant; code that compiles well for one
processor may not necessarily work well for another processor. It's
left to the user to determine if the speed benefits outweigh any
accuracy problems.

khb: Unfortunately, this would require the user to read the disassembled code, do a rigorous numerical analysis, and redo it every time the code is modified or the compiler updated (and the code then recompiled). This is, of course, totally impractical.

If fp:fast optimization is particularly problematic for a specific
function, the floating-point mode can be locally switched to
fp:precise using the float_control compiler pragma.

khb: This is, of course, backwards. If you are going to define a basically insanely liberal fp optimization, it ought to be enabled for the smallest bit of code practical (preferably with scoping, so it can't accidentally impact the whole compilation unit).
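
To get a feel for the size of the damage, here is a small sketch that emulates the first "optimized" variant quoted above. It does not run the Visual C++ compiler; it only rounds values through IEEE single precision at the same points the quoted pseudocode does. The helper names and the sample inputs are made up for the illustration.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def length_double(a: float, b: float, c: float) -> float:
    """What the source code asked for: everything in double precision."""
    return math.sqrt(a * a + b * b + c * c)


def length_single_sqrt(a: float, b: float, c: float) -> float:
    """Emulates the quoted variant: round the argument to single,
    take the square root, and round the result to single."""
    argument = to_single(a * a + b * b + c * c)
    return to_single(math.sqrt(argument))


exact = length_double(1.0, 2.0, 3.0)
fast = length_single_sqrt(1.0, 2.0, 3.0)
print(f"double: {exact!r}")
print(f"single: {fast!r}")
print(f"relative difference: {abs(fast - exact) / exact:.2e}")
```

A double carries roughly 16 significant decimal digits and a single roughly 7, so the caller who wrote `double` silently loses more than half of the digits they asked for.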

Monday Oct 04, 2004

Some ruminations about software application licensing

The Problem

Many important software packages (e.g. Oracle) are licensed on a per processor basis. That is, if you purchase a license for a 72 processor machine, the price is on the near order of 72 times more expensive than a single processor license. Is this sensible? What are the unintended consequences for Society at large?


Consider a timesharing service a very previous company of mine employed from time to time, BCS (Boeing Computing Services).

BCS had a large ensemble (more than 100) of CDC mainframes. They were binary compatible, and set up to have automatic failover, so that a job that started on one system could end up on another (or having run on a series of different systems). This was necessary to provide adequate reliability.

Most software was metered, that is, one was billed as the sum of resources consumed, such as

  • Nc Dollars per CPU minute 

  • Nd Dollars per amount of diskspace used

  • Ni Dollars per I/O transaction

etc. A complicating factor, however, was that while all the systems were binary compatible, they ran at different speeds. It was the clear goal of the software providers (mostly the computer vendors themselves, or the timesharing system operator itself) that the price for running a job should either be independent of the speed of the system, or (more frequently) carry a premium based on the faster CPU.

Since jobs frequently wound up running on more than one system, the billing algorithm was adjusted so that one was charged as if one ran on the originally selected system (so if the job “failed over” to a faster system, the rate was adjusted so that the final charge was the same as if it had run on the original system).
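
A minimal sketch of that kind of metered billing. The rate names Nc, Nd and Ni come from the list above; everything else is hypothetical, in particular the idea of expressing machine speed as a single relative factor and converting CPU minutes by the ratio of speeds. How BCS actually computed the adjustment is not something this sketch claims to reproduce.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    cpu_minutes: float     # CPU time actually consumed on this machine
    relative_speed: float  # speed of this machine relative to a reference machine


def cpu_charge(segments, original_speed: float, nc: float) -> float:
    """Charge CPU time as if the whole job had run on the original machine."""
    equivalent_minutes = sum(
        s.cpu_minutes * s.relative_speed / original_speed for s in segments
    )
    return nc * equivalent_minutes


def job_charge(segments, original_speed, nc, disk_units, nd, io_transactions, ni) -> float:
    """Sum of the metered resources: CPU, disk space and I/O transactions."""
    return cpu_charge(segments, original_speed, nc) + nd * disk_units + ni * io_transactions


# A job that ran 4 minutes on the original machine, then failed over to a
# machine twice as fast and needed only 3 more minutes there.
failed_over = [Segment(4, 1.0), Segment(3, 2.0)]
stayed_put = [Segment(10, 1.0)]
print(cpu_charge(failed_over, 1.0, 2.0), cpu_charge(stayed_put, 1.0, 2.0))
```

Both jobs are billed for the same ten original-machine minutes, which is the property the adjusted algorithm was after.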

When the industry moved away from Timesharing, and thus away from charge per unit time and towards a software purchase model, there remained a vague notion on the part of software vendors (now, more frequently someone other than the computer vendor itself) that if the customer has 10 slow machines, or one machine that is 10x faster, the payment due to the software vendor should remain the same.

As different vendors' processors are more (or less) capable, there are software vendors who establish base prices that differ from one hardware vendor to another. This may or may not be acceptable (legally or economically) for some.

Even where it is Legal and Accepted, when a vendor has multiple microarchitectures there may be no single processor whose performance can act as a reliable base.

As a result, many software vendors have simply relied on "processor count", it being a crude metric for the power of a system.

There are, of course, other ways to price software licenses (including per user, per system, per site, per actual user, and per employee arrangements). However, we'll focus on the lamentably common practice of pricing per “CPU”...


Unfortunately, there is no objective definition of a processor. For example, a VLIW machine (like the extinct Multiflow 28) has a very large number of functional units --- more than a quad core SPARC. The result may well be faster performance for the “single processor”, but the licensing fees for the multi-core processor are 4x more expensive!

Current trends in computer design make such issues increasingly problematic.

One could argue, as IBM does, that each identifiable “processor element” (what Sun calls a “core”) is an objectively identifiable “processor”.

But that a “core” exists as an identifiable physical entity is merely a side effect of current design tools and methods (viz. define a single core, step and replicate). With more advanced CAD tools, all of the logical cores could be instantiated and then baked into one huge monolithic mass (which might well have technical benefits beyond that of confusing licensing schemes).

An Aside: Software engineers may recognize this as essentially what a “globally optimizing”compiler does (full interprocedural analysis, etc.) vs. separate compilation.

An Aside: Yes, to all the CAD developers and hardware engineers reading this, I appreciate the manifold reasons why we don't do this (today), and why it's hard (it's possible we might never do it)

Another complication comes about from the software (or firmware) concept of “virtualization”. Schemes such as those touted by IBM and Microsoft (one physical processor can appear to be any number of processors) provide another confusing view of the actual system from the perspective of software licensing (not to mention how exciting it can be to maintain 20x the number of OS configurations on a single box...).

Yet another complication is provided by the concept of hardware threads (such as are found in chips ranging from Intel's Xeon to IBM's Power5 and various Sun chips). These threads are typically exposed to the programmer as “virtual CPUs”; that is, if the program inquires from the system how many processors there are, a single Xeon chip currently reports 2. The performance of most such hardware threading schemes has been poor (that is, the second thread adds 10%-30% performance), and therefore they have been ignored by ISV software licensing. But as hardware threading matures, the performance may well approach N, where N is the number of hardware threads.
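
For the curious, this is roughly what "inquiring from the system" looks like from a program. The sketch uses Python's standard `os.cpu_count()`, which reports logical CPUs, so on a chip with hardware threads each thread is counted; the wrapper name is made up for the example.

```python
import os


def logical_cpu_count() -> int:
    """Number of CPUs the operating system reports to a program.

    Each hardware thread shows up as its own CPU. os.cpu_count() may
    return None when the count cannot be determined, so fall back to 1.
    """
    count = os.cpu_count()
    return count if count is not None else 1


print(f"The OS reports {logical_cpu_count()} processors")
```

Nothing in that number says how many sockets or cores are underneath, which is exactly why it makes a shaky basis for a price list.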

In the event that the problem is not clear, let us consider the case of a chip such as described by Ace's Hardware (this is not to say that I am confirming any or all bits of speculation and assertions made by that author). But to sum up, they claim it can be described as a single chip, with 8 SPARC cores, each of which has 4 hardware threads. Let us assume for this discussion that they are correct, then....

Is this a single processor with 32 threads? (If so, the license fees for Oracle would be approximately $15K, using my understanding of the current rules which ignore threads.) Or is it “8 processors”? (If so, the price might be more like $320,000 because larger processor counts start with a higher base cost.) Or would it be even higher due to the large number of hardware threads?
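
The ambiguity can be put in a few lines. This sketch only counts "processors" under three possible definitions for the chip as described (8 cores, 4 threads each); the definition names are made up here, and no prices are computed because the real price lists are not reproduced in this entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    cores: int
    threads_per_core: int


def licensed_processors(chip: Chip, sockets: int, definition: str) -> int:
    """Count "processors" under one of three hypothetical license definitions."""
    if definition == "socket":
        return sockets
    if definition == "core":
        return sockets * chip.cores
    if definition == "thread":
        return sockets * chip.cores * chip.threads_per_core
    raise ValueError(f"unknown definition: {definition}")


chip = Chip(cores=8, threads_per_core=4)
for definition in ("socket", "core", "thread"):
    print(definition, licensed_processors(chip, sockets=1, definition=definition))
```

Same silicon, three different answers (1, 8 or 32), and a per-processor price list multiplies whichever one the vendor picks.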

Getting very speculative, what if some key hardware resources were not replicated all N times? Indeed, what if there was a key pipeline resource critical to Oracle performance shared amongst all N cores? Does this change the picture? If not, isn't that a truly inequitable licensing algorithm?

It should be clear to the reader that the current situation, where software is licensed by number of “processors”, is hardly architecture neutral and has no objectively measurable basis.

In marketplaces where the software vendor has a near monopolistic position, having no objective basis for pricing across platforms may represent a litigation risk (I Am Not a Lawyer, this is my opinion and not that of any member of a Bar Association).

So what can we do instead?

A Solution (starting from the basics)

Every modern computing device is composed of a collection of many chips each of which has a number of transistors. Each transistor has performance characteristics, such as switching speed.

In an Ideal implementation, all hardware vendors would disclose the number of total transistors in the system, and a breakdown by speeds (most frequently, all the transistors have similar characteristics in a given chip, but in a large system, different chips may have radically different characteristics. Also, in some cases, some parts of a chip may have very different characteristics (e.g. different clock rates)).

In an Ideal implementation a software vendor would compute a billing factor (BF) for a given system model. BF would be defined as the sum of all Ti*Ni for i from 1 to the number of transistor types, where each Ti is a cost per transistor of type i and each Ni is the number of transistors of type i. The total price can then be computed as the product Base_Price*BF. Base_Price will be the same across all platforms (it may be adjusted on a per customer basis based on volume or other applicable discounts). The BF provides for an architecture and CAD tool neutral platform adjustment.

As most large computer systems are composed of replicated elements, it may be the case that the most common sub-block can be used to compute a BF, and the system BF can be reasonably approximated by Nsubblock*BF. This technique will be most useful in “Capacity on Demand” systems where entire sections of the machine are only enabled at some later date. At the time additional subblocks are enabled, the incremental software license charges are trivial to compute.
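
The billing factor is simple enough to sketch directly. The function names are invented for the example, and the costs and counts below are tiny made-up numbers chosen for readability, not real transistor data.

```python
def billing_factor(transistor_types) -> float:
    """BF = sum of Ti*Ni over all transistor types.

    transistor_types is an iterable of (Ti, Ni) pairs: the cost per
    transistor of a type and the number of transistors of that type.
    """
    return sum(t * n for t, n in transistor_types)


def license_price(base_price: float, bf: float) -> float:
    """Total price = Base_Price * BF."""
    return base_price * bf


def system_bf_from_subblocks(subblock_bf: float, enabled_subblocks: int) -> float:
    """Approximate the system BF as Nsubblock * BF of the common sub-block."""
    return enabled_subblocks * subblock_bf


# Hypothetical sub-block with two transistor types.
subblock = [(0.5, 100), (0.25, 200)]
bf = billing_factor(subblock)
print(license_price(3.0, system_bf_from_subblocks(bf, 4)))
```

Enabling more sub-blocks later just means re-evaluating the last line with a larger count, which is the "Capacity on Demand" convenience mentioned above.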

In the event that hardware vendors fail to publish the precise counts and transistor speeds, the number of transistors can be approximated based on the size of the chip and the particular geometry (130nm, 90nm, etc.). An “average value” may be employed where various transistor speeds are unavailable.


The primary advantage of this system is that it is architecture (both micro and macro) independent, and it is independent of the State of the Art in CAD tools. Having an objective system would be a nice thing to have.


Is this a unique optimal solution? Perhaps; but if one is environmentally focused, there's another approach that may have some appeal...


In the Ideal implementation, the end user's computing system would be augmented by a machine readable watt-hour meter, and its operating system would have fine grained accounting facilities.

At the start of the licensed application's execution, the current value of the watt-hour meter would be recorded. At the end of the licensed application's execution, the current value of the watt-hour meter would be obtained, and the initial value subtracted from it. This represents the entire energy consumed by the system during the licensed application's execution.

Since most computing systems execute more than one program at a time, the operating system's accounting facilities will be employed to determine the fraction of the machine's resources that were consumed during the execution of the licensed application. The cost per execution will then be computed as Base_Factor*watt_hours*Usage_factor (where Usage_factor is typically less than 1. It can only be larger than 1 when there is some sort of “Capacity on Demand” functionality deployed).

This would represent a return to “price per usage” as in the days of mainframe computing.
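
The watt-hour scheme as a short sketch. Base_Factor, watt_hours and Usage_factor are the names used above; the function names and the meter readings are hypothetical, and reading a real meter or real OS accounting data is left out entirely.

```python
def watt_hours_used(meter_start_wh: float, meter_end_wh: float) -> float:
    """Energy consumed by the whole system while the application ran."""
    if meter_end_wh < meter_start_wh:
        raise ValueError("meter reading went backwards")
    return meter_end_wh - meter_start_wh


def execution_cost(base_factor: float, meter_start_wh: float,
                   meter_end_wh: float, usage_factor: float) -> float:
    """Cost per execution = Base_Factor * watt_hours * Usage_factor."""
    return base_factor * watt_hours_used(meter_start_wh, meter_end_wh) * usage_factor


# Hypothetical run: the system drew 500 Wh and the licensed application
# accounted for a quarter of the machine's resources during that time.
print(execution_cost(0.5, 1000.0, 1500.0, 0.25))
```

The vendor only has to publish Base_Factor; everything else is measured on the customer's machine.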

<Un>intended Consequences

The current "processor count" metric encourages users to buy machines with the fastest single thread performance (admittedly, this isn't the only encouragement ;>). The most reliable way to deliver that, generation after generation, has been to "chase" very high clock rates. Unfortunately this is exceedingly energy inefficient (as well as driving up Fab costs rapidly). While California's energy problems certainly weren't created by simply having too many fast clocked computers, they provide a graphic lesson in the downside to power hungry approaches (as well as the laws of economics and the consequences of incompetent government).

The transistor count approach would reward designers (and consumers) who got the most performance per transistor; while this is appealing from an engineering/logical sense, it's hard to see how that provides a useful benefit to society per se.

The price per watt approach has the intended consequence of rewarding consumers and designers for providing better performance for lower energy consumption.

[and before anyone asks, yes, I think it would have been more sensible than setting fleet mpg goals to have made the taxes on gasoline vehicle dependent, and had a factor tied to fuel economy. More efficient cars should be charged less, and less efficient ones should be charged more. Show cars and other historical vehicles could remain unmodified ... but would pay ruinous rates if run as daily commuters ;> ]

If you like these ideas, please pass them on. Write about them. Lobby for them. Implement them in your products.

Tuesday Aug 24, 2004

Some random pointers

Well, not entirely random. I found them interesting.

The End is Near!

Earth at Night
HP silences User Group though I can sympathize with the Interex leadership. Really annoying the vendor can cause a trainwreck (as, regrettably, caused the Sun User Group (at least the US body) to die 13+ years back).

Word considered harmful?

Dvorak makes an interesting argument for why Word must die. As the last version I used with any great regularity was Word97, I can't really see things entirely his way; but it's a good read.

Not that it matters, but my first version of Word was 1.05 for the Macintosh. It was greatly inferior to the Xerox word processor I'd grown fond of (but we couldn't afford for our little aerospace consulting practice). It was much slower to use than WordStar on our shared buss 8MHz Z80's (with a RAM disk big enough for my documents, the compilation system and the project du jour); but combined with a laserprinter it was vastly faster at printing equations (we had been using a roff-like package on the Z80, "Fancy Font" with multiple Epson printers, each being driven to early destruction by using them exclusively in graphics mode, and striking each line about seven times on average).

I think my favorite word processor (after the Xerox dedicated hardware) probably remains "FullWrite", a package for the Mac that eventually disappeared into the void. If memory serves, it had pretty good (better than Word2K) page layout facilities built right into the word processor, with word processor ease of use. But as it lost vendor support long ago, I eventually gave it up. An unrelated vendor (also long disappeared, as best I can tell) produced my favorite spreadsheet of all time, "Trapeze".

A combination of Star/Open/NeoOffice tends to do most of my needs reasonably well (with the odd dip into Framemaker for some larger more elaborate documents). But I do a lot less mathematical typing these days, so comparing my needs of today and of 17+ years ago is probably not terribly meaningful ... even to myself.


