Monday Jan 28, 2008
Wednesday Nov 28, 2007
By thorsten on Nov 28, 2007
Finally, I did what I should have done in the first place: I used a different hex editor. And lo and behold, the trusty midnight commander showed all the files with correct 0x0A. Caught in the act and proven guilty: emacs' hexl-mode.
Well, kind of. It turned out to be a rather unlucky combination of things going wrong: emacs with its buffer encoding scheme, which tries to be smart and detect the actual encoding - if there's a 0x0A in the file before a 0x0D, emacs assumes raw-text-unix and leaves line ends alone. If there's a 0x0D in the file before any 0x0A, emancs assumes raw-text-mac, and converts all assumed lineends to 0x0D. And since hexl-mode seems to work on top of an existing buffer (and never loads the file directly from disk), this resulted in the described mess.
So, to save yourself some trouble: don't trust your tools. And make sure to have files you want to work on with hexl-mode match something in file-coding-system-alist's content, that enforces binary reads (elc, png, jpeg, ar, gz etc. for my setup).
Friday Nov 23, 2007
By thorsten on Nov 23, 2007
On related news, the resulting CWS got recently merged to HEAD, so those of the adventurous kind can now build this as an optional feature: just issue "build ENABLE_OPENGL=TRUE" in the slideshow module to be blessed with an ogltrans shared lib. And thanks to Rene, after CWS configure22 is integrated, one can even give "--enable-opengl" on the configure line. After that, register the ogltrans lib into OOo via unopkg, and use this presentation file to see what others are missing.
This could even be shipped as a separate extension after 2.4 is out...
Monday Oct 15, 2007
By thorsten on Oct 15, 2007
After an impromptu dead parrot sketch at the OOoCon ("the beamer works" - "no, it's dead" - "bah, it's not dead, it just needs a reboot"... (you might look forward to the still-missing video)), I had a rather rough ride with the proverbial last 10 percent of the browser functionality - I'm still not 100% happy with the representation in OOo's Calc, and the xcs schema parser leaves something to desire. But surely the thing is now ready for config hackers to play with it - grab your copy here:
(for those of you who don't know what I'm talking about: OoConfig is a Python extension, providing a tool quite similar in spirit to Mozilla's about:config page - a way of tweaking the complete (even hidden) set of configuration items. See also the wiki for further information)
Wednesday Aug 08, 2007
Tuesday Jul 24, 2007
Sunday Nov 12, 2006
By thorsten on Nov 12, 2006
parse_cell_4 parse_cell_5 | | parse_cell_1 parse_cell_2 parse_cell_3 | | | -------------- parse_cell_6 -----------------
The partial ordering of cell evaluation equivalent to this tree is as follows:
parse_cell_4<parse_cell_1 parse_cell_5<parse_cell_3 parse_cell_1<parse_cell_6 parse_cell_2<parse_cell_6 parse_cell_3<parse_cell_6
Wanting to employ what we've talked about earlier, to parallelize this calculation, one would need a static expression (i.e. one fixed at compile time):
as_func( as_func( parse, as_func( parse, 4), 1), as_func( parse, 2), as_func( parse, as_func( parse, 5), 3))
as_func denoting that the argument should be
evaluated lazily, and parse being a Calc cell parsing unary function,
expecting the cell number as its sole parameter).
Now, having the formula dependencies fixed in the code isn't of much value for a spreadsheet, thus we need to handle this a bit more dynamically. Futures and Actions in principle provide the functionality that we need here, only that there's one slight catch: the pool of threads processing the Actions or Futures might actually be too small to have all cells resolved, that in turn are referenced from other ones (with circular cell references being the extreme example. But see below). This is because forcing a Future or Action to evaluate will block the thread requesting that value, which will eventually lead to starvation, unless there's at least one more thread active to process cell values other Actions are waiting for.
Of course, one could remedy this by using
N threads when
N cells. But that would get out of hand
pretty quickly. Alternatively, the cell parsing function can be split
into two parts: the first generates a parse tree, thus extracting the
referenced cells (depending on the heap allocation overhead, one could
also throw away the parser outcome, except for the cell
references. But OTOH, with a decent thread-local allocator, the full
parsing could happen concurrently. YMMV). Given the list of references
for each cell, one again gets the partial ordering over the value
vector< vector< int > > intermediate; parallel_transform( cells.begin(), cell.end(), intermediate.begin(), &parse_step_1 );
For each cell, this step yields a vector of preconditions (other cells, that need to be calculated before). Pushing the actual cell value calculation functions into a job queue, and handing it the dependency graph (represented by the individual cell's references) generated above, we arrive at a fully dynamic version of the concurrent cell parsing example:
int i=0; job_queue queue; vector< double > results(intermediate.size(),0.0); transform( intermediate.begin(), intermediate.end(), results.begin(), bind( &job_queue::add_job, ref(queue), bind( &parse_step_2, ref(results), ref(cells), var(i)++ ), _1 )); queue.run();
This looks weird, but peeking at the prototypes of the involved functions might help clear things up:
/\*\* adds a job functor @param functor Functor to call when job is to be executed @param prerequisites Vector of indices into the job queue, that must be processed strictly before this job. Permitted to use not-yet-existing indices. \*/ template
job_queue::add_job( Func functor, vector const& prerequisites );
/\*\* calculates cell value @param io_results In/out result vector. Receives resulting cell value, and is used to read referenced cell's input values. @param content Cell content @param cell_index Index of cell to process \*/ parse_step_2( vector
& io_results, string const& content, int cell_num );
This job queue can then decide, either globally or via policies, what to do in various situations:
At any rate, fine-tuning this to various hardware, operating systems and deployment settings is much easier than for manual thread creations. Plus, given a few (falsifiable) attributes of the functions called, it's also much safer.
Saturday Nov 04, 2006
By thorsten on Nov 04, 2006
Why not going further? If there's a way to know that a function is pure, or that an object is thread-safe, then it would be nice to have automatic parallelization employed. The underlying issue that needs to be controlled here are races - prohibiting unserialized access to shared state. So, either a function is pure, i.e. does not depend on shared state at all, or the called subsystem takes care of itself against concurrent modification (this is most easily achieved by the environment concept of the UNO threading framework: the UNO runtime implicitely serializes external access to thread-unsafe appartements).
Although C++ provides no portable ways to express those concepts on a language level, for pure functions, there's a way of using a limited subset of lambda expressions, that inhibit access to shared state on the syntax level. And it's perfectly possible to mark objects (or even subsets of a class' methods) to be thread-safe. One straight-forward way to do this are specializations of UNO interface references, i.e. ones that denote thread-safe components.
Given all of this, we can form statements that contain:
So, in fact, a runtime engine could reason about which subexpressions
can be executed concurrently, and which must be serialized. If you
treat method calls as what they are, e.g. implicitely carrying a
this pointer argument, a possible data flow graph might
look like this:
new object1 new object2 | | +->object1::methodA +->object2::methodA | | +------------------------------>+->object2::methodB(object1) | v object1::methodC new object3 | +->object3::methodA
That is, the
this pointer is carried along as a target
for modifications, and as soon as two methods have access to the same
object, they need to be called sequentially. This does not apply for
UNO interface calls or objects that are tagged as thread-safe, of course. To be specific, a forest of data flow trees can be generated, which
defines a partial ordering over the subexpressions. If neither
exp1>exp2 can be deduced
from this ordering, those two subexpressions can be executed in
parallel. Really basic stuff, that compiler optimizers do, as well -
only that plain C/C++ doesn't provide that many clues to safely
parallelize. From the example above, it is obvious that
object3::methodA can be executed concurrently to all other
object1::methodC must be execute strictly
object2::methodA, and that
also be executed concurrently.
Okay, this is largely crack-smoking. But there is something to be made of it. Stay tuned.
Wednesday Oct 25, 2006
By thorsten on Oct 25, 2006
To recap, the free lunch is over for us application software developers, i.e. either OOo gets seriously multi-threaded soon, or we stop adding features and only optimize for speed, or OOo will get significantly slower within the next few years. But as you, the gurus and yours faithfully have noticed by now, becoming concurrent with osl::MutexGuard, osl::Thread & friends is tedious, error-prone, and neither race nor dead-lock free for a project of OOo's size. With the expressive power of a general programming language, it is even theoretically impossible to verify against these properties (for a sufficiently large project. OOo probably satisfies this constraint).
So, Stephan already showed what's possible in other languages. To boot, the trick is to raise the level of abstraction above digging-in-the-dirt, hand-made, explicit, and ugly locking or thread fiddling. Whereever possible, parallelism should be detected either implicitely, or at least denoted declaratively (instead of manually telling the machine what to do). John Ousterhout has a nice writeup about that. It's basically another instance of the lack of transparency and ease of use that is so ingrained with day-to-day tasks in C++. For stuff like object lifetime management, serialization and so on, boost provides transparent solutions. For large-scale concurrency management, the UNO threading framework does. For local, statement-level parallelization, there ain't much - or is there?
In fact, there actually are several attempts to extend C++ in that direction. To name a few:
Basically, the advanced approaches listed above employ declarative semantics, i.e. one no longer needs to create threads by hand, assign a thread function, and manually serialize access to shared state. Instead, one simply states that a loop body shall be executed in parallel:
parallel_transform( src.begin(), src.end(), dest.begin(), bind(_1 \* 42) );
This multiplies each element in
src with 42, and stores
the result in
dest. How many threads are created, and
where they are taken from is completely up to the implementation - a
typical threading runtime on a machine with
N cores will
hold a static pool of
N threads, each of them receiving a
chunk of the
parallel_transform workload from above. In
the old way of doing multi-threading, this behaviour would potentially
be spread over the whole code base (making it a cross-cutting
concern, just like explicit method guarding is today - with thousands
osl::MutexGuard(m_aMutex) invocations all over the
One of the crucial points of doing stuff like the above in a safe way
is to avoid hidden access to shared state. Access to the container can
be serialized by
parallel_transform, but if the called
functor modifies shared objects concurrently, all bets are off. Thus,
the listed frameworks usually have some safe-guards in place, most of
them using a way of explicitely annotating free-standing functions and
classes to be thread-safe (note that this tagging of thread-safeness
can easily be achieved by means of the UNO
threading framework as well - Kay already has different
uno::Reference types on his list, to distinguish between
thread-safe and thread-unsafe components during compile-time). Another
important concept noteworthy here are pure functions,
i.e. functions that, given the same input parameters, always return
the same result, no matter where or when called (related, but not
exactly the same are referentially
transparent functions, which could also be exploited by an
optimizing parallel execution runtime). It's always safe to call pure
functions on disjunct parameters concurrently with no extra
synchronization effort. A safe (because syntactically limited) way of
defining pure functions is via boost's bind or
lambda library. There, expressions that modify shareable state could
simply be banned from the available syntax (they aren't - but it's
possible to do that).
Need an example? Good, take the standard one, that so naturally asks
for parallelization, that it's usually been sped up via SIMD in the past (you
might have noticed that those
algorithms principally apply the same instructions (the functor to
run) in parallel to all container elements - that's SIMD), namely operating on image
// color-convert an image from RGB to CMYK color space parallel_transform(src_img.begin(), src_img.end(), dst_img.begin(), bind(&rgb2cmyk,_1,cref(conversion_parms));
This is the archetype of inherent parallelism, having a huge amount of data (image pixel), that all need to have the same operation applied to. Disregarding memory bandwidth for the moment, this snippet scales linearly with the number of available cores.
conversion_parms parameter to the bound
function, which can pose problems if
const_casts it to something mutable. This is the dark
corner of C/C++'s non-enforcable constness, and therefore, a minimal
amount of developer discipline is necessary: if objects or functions
employ magic behind their public interfaces (on-demand creation of
data, buffering, caching, copy-on-write), this still needs to be made
thread-safe explicitely. Fortunately, for each of the aforementioned
items, there's a thread-safe helper available in OOo (use rtl::Static
for late-initialization of static data, boost::shared_ptr
for lazy initialization of class members, caching and buffering, and
To be continued.
Friday Sep 01, 2006
By thorsten on Sep 01, 2006
(if you inspect this carefully, you'll see the difference between the preview image on the left side, which is still rendered the old way, and Draw's edit pane content to the right, which now gets displayed via cairocanvas).
Sunday Jul 16, 2006
By thorsten on Jul 16, 2006
- points, vectors, boxes, ranges, bezier curves
- polygons and collections of polygons ("poly-polygons")
- 1D, 2D, 3D
- integer and double-precision floating point
In addition, especially the polygon ADTs are shipped with a set of associated algorithms (provided as free functions, instead of member functions):
- a poly-polygon clipper (comparable to what gpc is doing)
- a poly-polygon triangulator (convert a polygon to triangles, which can be fed to a GPU)
- a function to generate stroked paths from a given polygon
- plus a lot of basic, but very useful stuff (like svg im/export, scan conversion, arc and circle generation, etc.)
What's missing, though, is everything related to bitmaps. That's where basebmp comes into play, started a few weeks ago and set to provide for pixel processing, what basegfx provides for the symbolic graphics primitives.
I've initially started this work by using AGG, but quickly found out that adding pixel formats other than integer true color ones is pretty hard (Because the code is largely undocumented. Because the assumption that pixel values can be modified by shift left and right operations, and are PODs (i.e. non-class types) is all over the place. And because basically, a way to somehow get a hold to the famous extra level of indirection is missing. Which, as usual, would have solved this software engineering problem). Instead of starting from scratch, we've based this on vigra, a framework for generic image processing, which provides an elegant way to add 'special processing' before an actual pixel gets assigned a value.
In order to really get to the point here, I first need to sketch the way generic bitmap processing works: it's the idea of separating data access and the actual bitmap manipulation algorithm, using C++ templates. The concept is borrowed from the STL - bitmap pixel can be accessed by (two-dimensional) iterators, which in a very basic sense behave like a pointer:
while( i!=end ) \*i++ = rand();
i is a bitmap iterator, the pixel will be set to
random values with this algorithm. The type of i can be chosen freely,
and so can the actual pixel format of the bitmap. If one wants to
e.g. add palette lookup to this algorithm (or scale the random values
into the appropriate color range) without changing the algorithm's code, either extremely ugly hacks
involving proxy objects are necessary (because operator\*() is expected
to return an lvalue, which gets assigned the new value directly), or
one needs a direct way to 'see' both the iterator and the new value
prior to assignment:
while( i!=end ) acc.set(i++, rand());
acc variable is a pixel accessor in vigra lingo,
which takes the iterator and the new value, and accesses the
corresponding pixel. Now, adding palette lookup, scaling, blending,
mask operations and much more is as easy as providing a different
accessor to the algorithms is. Plus, it's completely orthogonal to
both the iterator implementation, and the algorithm used. That means,
you define a set of iterators (for packed-pixel images, for r8g8b8
images, for r5g6r5 images), a set of accessors (for true color pixel,
for palette lookup, for blitting through a binary mask), and a set of
algorithms. And you combine all those freely, i.e. you have the
cartesian product of variations to choose from.
Sunday Apr 23, 2006
By thorsten on Apr 23, 2006
- DEPS, a tool to extract dependencies (e.g. includes) from source code
- cinclude2dot, a tool plotting a project's include dependencies as a GraphViz dot graph
Tuesday Feb 28, 2006
By thorsten on Feb 28, 2006
The downside: you need a fairly recent X server and cairo 1.0.x on your system.
Tuesday Feb 14, 2006
By thorsten on Feb 14, 2006
- pygccxml, a gcc plugin to generate XMLized output from the AST (plus a python framework to actually work with this representation).
- gccxml, quite similar to pygccxml, focused on extracting declarations only.
- synopsis, a standalone parser plus analysis framework - a bit lacking for c++, which is inherently hard to parse.
- opencxx, same as above (actually a predecessor/inspiration for synopsis)
- rtlcheck, a gcc plugin that extracts code on the RTL (register-transfer level). Currently, seems to be used only for c, extending that to c++ should be both easy & rewarding (as rtlcheck appears to be the most elaborate project of all listed).
- Also quite interesting, though with different scope (being a runtime-analysis tool) is the Design Recovery Tool, roughly characterizable as a means to get from UI actions to the code involved processing them.
Friday Feb 03, 2006
By thorsten on Feb 03, 2006
Thus, this needs some larger fixups, work for that is going to happen in CWS cowfixes01.
- Slight changes...
- Don't trust your tools, if they are software...
- GSoC 2007 miscellanea
- A Dead Parrot Learns to Fly
- A hacker's way of app-bundling
- Germany leads in installed photovoltaic power generation
- Intel opensourced their Thread Building Blocks
- OOo my Threading (3)
- OOo my Threading (2)
- OOo my Threading...