August 19, 2008

Glibc bug in RHEL4.6, OEL4.6, RHEL5.1, OEL5.1

There is a bug in glibc with calloc() not returning zeroed memory when the process address space is locked, for instance by a call to mlockall(). Versions of glibc affected are:

* RHEL-4.6 and OEL-4.6: glibc-2.3.4-2.39, all architectures.
* RHEL-5.1 and OEL-5.1: glibc-2.5-18, all architectures.
* SLES10 SP2: glibc-2.4-31.54, all architectures.

This produces various symptoms, including segmentation faults. The Oracle CRS cssd process uses the mlockall() call as of 10.2.0.4 and can therefore fail. CRS versions prior to 10.2.0.4 do not use this call and aren't affected.

Details on the problem are posted here:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473812
https://bugzilla.redhat.com/show_bug.cgi?id=405781

This bug has been fixed in RHEL5.2 glibc and OEL5.2 (glibc-2.7-54). It is also fixed in RHEL4.7 and OEL4.7 (glibc-2.3.4-2.41). We are currently testing a fix from Novell for SLES10. The bug was not present in earlier versions of glibc and it is not present in RHEL3.
Please see Metalink Note 732847.1
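
As a rough illustration of the condition described above, here is a small C sketch that locks the address space with mlockall() and then checks that calloc() returns zeroed memory. It is not a reproducer for the specific glibc bug, whose trigger conditions are described in the bug reports linked above; the function names and the dirty-then-free step are assumptions made for this example only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Count the non-zero bytes in a freshly calloc()ed buffer of `size` bytes.
   Returns -1 if the allocation itself fails. (Hypothetical helper.) */
long count_nonzero_after_calloc(size_t size)
{
    unsigned char *buf = calloc(1, size);
    long nonzero = 0;

    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < size; i++)
        if (buf[i] != 0)
            nonzero++;
    free(buf);
    return nonzero;
}

/* Lock the address space, dirty and free some heap memory, then check
   that calloc() still hands back zeroed memory. (Hypothetical helper.)
   mlockall() can fail without privileges or with a low RLIMIT_MEMLOCK;
   in that case the check simply runs with the address space unlocked. */
long check_calloc_under_mlockall(size_t size)
{
    unsigned char *dirty;
    long result;

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall (continuing unlocked)");

    dirty = malloc(size);
    if (dirty == NULL) {
        munlockall();
        return -1;
    }
    memset(dirty, 0xAA, size);   /* leave a recognizable pattern behind */
    free(dirty);

    result = count_nonzero_after_calloc(size);
    munlockall();
    return result;
}
```

A result of 0 from check_calloc_under_mlockall() means every byte was zero, a positive result is the number of non-zero bytes found, and -1 means the allocation failed.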

August 8, 2008

RHEL4.7 Kernel bug

The kernel erratum that Red Hat made available on August 6, 2008 fixes the RHEL4.7 kernel problem mentioned in my last entry.
We have issued a corresponding OEL4.7 kernel erratum, even though we already had the fix in the stock OEL4.7 kernel. You do not need to install this latest erratum to fix the problem. You can get the details here:
http://oss.oracle.com/pipermail/el-errata/2008-August/000703.html

The corresponding Red Hat bugzillas are:
RH bug 455525
RH bug 455074
RH bug 453507

Explanation of the changes (quoting from the Red Hat errata text):

"A set of patches detailed as "sys_times: Fix system unresponsiveness
during many concurrent invocation of sys_times()" and "Minor code cleanup
to sys_times() call" introduced regression which caused a kernel panic
under high load. These patches were reverted in the current release. "

August 4, 2008

OEL 4.7 is out

OEL 4.7 was released on ULN on August 1 2008.
There are 6 new channels: el4_u7_i386_base, el4_u7_i386_patch, el4_u7_x86_64_base, el4_u7_x86_64_patch, el4_u7_ia64_base, el4_u7_ia64_patch.
ISOs will be available as usual on eDelivery and OSS will have the SRPMs.
Updated documentation:
ULN Whitepaper
Unbreakable Linux FAQs
Unbreakable Linux Data Sheet

Please see Metalink Note 729543.1 for more information about a critical RHEL4.7 kernel bug which affects every Oracle customer running Oracle Enterprise Manager. OEL4.7 is not affected.

Release Notes
Source RPMs


April 21, 2008

Oracle VM Information

One other member of our team, Alison Holloway, has been writing interesting blog entries about Oracle VM. They have useful instructions about how to install OVM, and more to come in the near future. Do take a look at Alison's Blog.
 

UTF-16 and UTF-32 support in GCC added

Our team made its first contribution to GCC last week.  This is
quite exciting news for us, and hopefully it will be the first of many
contributions to come.  Kris Van Hees has implemented UTF-16 and
UTF-32 char data types support for the C and C++ languages in gcc.

His work is based on the ISO/IEC draft technical report for C (ISO/IEC
JTC1 SC22 WG14 N1040) and the proposal for C++ (ISO/IEC JTC1 SC22 WG21 N2249).
Neither proposal defines a specific encoding for UTF-16. This
implementation uses the target endianness to determine whether
UTF-16BE or UTF-16LE will be used.

Support was added for the following wide character datatypes (internal
types for C, fundamental types for C++) with the given underlying data
types:

    * char16_t: short unsigned int
    * char32_t: unsigned int

Support was added to the tokenizer to accept the following new
character and string literal notations:

    * u'c-char-sequence' char16_t character literal (UTF-16)
    * U'c-char-sequence' char32_t character literal (UTF-32)
    * u"s-char-sequence" array of char16_t (UTF-16)
    * U"s-char-sequence" array of char32_t (UTF-32)

Support was also added to the C parser and the C++ parser to handle the
following concatenations of string literals:

    * "a" u"b" -> u"ab"
    * u"a" "b" -> u"ab"
    * u"a" u"b" -> u"ab"
    * "a" U"b" -> U"ab"
    * U"a" "b" -> U"ab"
    * U"a" U"b" -> U"ab"

This behaviour is only available in the gnu99, c++0x, and gnu++0x
compiler modes.
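
As a minimal sketch of the literal notations and concatenation rules listed above, the following C fragment uses the same types as they were later standardized. It assumes a recent GCC in its default mode and a C library that ships the C11 `<uchar.h>` header, neither of which existed when the patch was written. The array and function names are made up for this example.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>   /* char16_t and char32_t typedefs in C11 */

/* Adjacent string literals are concatenated. A plain literal next to a
   prefixed one takes on the prefix, so both arrays hold 'a', 'b', 0. */
const char16_t s16[] = "a" u"b";   /* same as u"ab" */
const char32_t s32[] = U"a" "b";   /* same as U"ab" */

/* Number of code units in each array, including the terminator. */
size_t s16_units(void) { return sizeof s16 / sizeof s16[0]; }
size_t s32_units(void) { return sizeof s32 / sizeof s32[0]; }

/* The in-memory byte order of a char16_t follows the target endianness.
   Returns 1 if the low byte of u'a' (0x61) is stored first, which
   corresponds to a UTF-16LE layout, and 0 otherwise. */
int utf16_low_byte_first(void)
{
    char16_t c = u'a';
    unsigned char bytes[sizeof c];

    memcpy(bytes, &c, sizeof c);
    return bytes[0] == 0x61;
}
```

On a little-endian target such as x86 the last function is expected to return 1, matching the statement above that the target endianness selects between UTF-16LE and UTF-16BE.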

See the initial patch submitted upstream:
http://gcc.gnu.org/ml/gcc/2007-11/msg00021.html
And the revisions:
    * http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00827.html
    * http://gcc.gnu.org/ml/gcc-patches/2008-03/msg01474.html
    * http://gcc.gnu.org/ml/gcc-patches/2008-03/msg02025.html
    * http://gcc.gnu.org/ml/gcc-patches/2008-04/msg01235.html

The patch was committed to the GCC trunk on Apr 18th, 2008.
Thanks to Jason Merrill, Joseph Myers, Andrew Pinski and Tom Tromey for the reviews.

April 1, 2008

Tracing Workshop Wiki is Public

The Montreal Tracing Workshop wiki is now publicly accessible.
This is the link:
https://ltt.polymtl.ca/tracingwiki/index.php/TracingSummit2008



January 31, 2008

Tracing Workshop in Montreal

January 29 2008

I am currently at a two-day tracing workshop in Montreal, Canada. It is
mainly an industry workshop hosted by Ericsson. There are a few Open
Source folks, like Mathieu, Andrew, Nathan and me. Too bad Frank
couldn't make it.  I decided to drive from Boston. It seemed like a
good idea at the time, but now I am not so sure anymore: it was a
fairly long drive. Fortunately the rental car has a 6-CD changer.
There was an opening reception Monday evening, with nice French and
Canadian food. I tried the caribou steak, not bad at all.

The presentations so far have been of good quality, even though some
were a bit too detail oriented. There are representatives from IBM,
Red Hat, University of Montreal and Ecole Polytechnique, Canadian
DoD, Freescale, Oracle, MontaVista, WindRiver and others.  There is
a wiki for the workshop which includes the slides. The wiki will be
publicly accessible in a week or so. I'll post the link then.

The first day talks provided an overview of the current tracing
platforms (Systemtap, Frysk, Eclipse/TPTP, QNX), including the
challenges that they face. There was a presentation about old IBM projects,
and the experience gathered about tracing as part of them.  I gave a
short presentation about the use of Systemtap within Oracle, and the
problems that deploying the tool in a commercial/enterprise
environment entails. I don't think these problems are specific to
Oracle; on the contrary, they are a good representative set of what
Systemtap needs to address as a project in order to step up to the
next level, and move beyond the hacker community. Let's hope we can
figure out what to do about them, as they are general challenges that
would impact gdb and frysk too. I also talked a bit about some other
usage of systemtap that we tried in my team, specifically to generate
kernel modules to force the kernel into a resource starvation
state. These scripts were based on some previous work that Bryce did,
and the source code for those fragments is available here.  Here are
my slides.  Mathieu presented about LTTng and tracing at
Google. Interesting talk, especially now that the markers
infrastructure has been released in the 2.6.24 kernel. Speaking of
which, there is still a lot to do there, because now there will need
to be negotiations about where to insert the actual markers in the
various kernel subsystems. Once those are in, we need to write the
systemtap scripts that use them.

The evening had a couple of BOFs, one about Eclipse and modeling, and
the other about tracing events. I attended the latter. Discussion was
a bit all over the place, but there was some interesting brainstorming
about tracing and time counters in virtualized environments. The notes
should be up in the wiki soon.

In the evening I met again with a bunch of folks from the workshop,
and a colleague, Martin, a kernel person from Oracle who lives in Montreal.


January 30 2008

On the second day, somebody from Enea presented a tracing framework
that is based on eventpoints and actions, which is very similar in
spirit to Systemtap.

The next talk was from ZealCore. They talked about replay debugging,
and tracing. They do flight recorder type tracing, with very small
overhead.

Next was Eclipse PTP. Nice CDT extensions for parallel programming,
supporting all sorts of interactive static analysis for OpenMP and MPI.
An effort called "Scalable Tools Communication Infrastructure" was
also mentioned, at http://www.scalabletools.org. Interesting, worth
keeping an eye on.  The PTP project uses SDM, the Scalable Debug
Manager, which is based on gdb. There are multiple instances of gdb, one
for each process, and a manager program that controls the various gdbs.
Two other projects were mentioned as well: TAU (from the University of
Oregon) and PAPI.

WindRiver presented OS-agnostic visualization tools. They used
LTTng, and plugged their graphical interface and visualizer on top of
it.

Unfortunately at this point I had to leave, and I skipped the last
presentation, also from WindRiver, where a tool was presented that had
similar functionality to Systemtap.

Oracle Open World 2007 and Systemtap talk

Let's talk a bit about Oracle Open World. This took place November 11 through November 15 2007, in San Francisco. It is a giant expo, taking place in all the buildings of the Moscone Center with overspill in adjacent areas and hotels. I was staffing the Unbreakable Linux and the Oracle VM booths. It was incredibly busy, every day. The announcements about OVM and the demos of OVM with live migration given in the various keynotes contributed noticeably to increasing booth traffic as well. You can see the daily news from the conference here.

I also gave an introductory talk about Systemtap within the developer track. There was very good turnout, and lots of questions. I have posted the slides in PDF format here.

August 20, 2007

OLS 2007 Conference Impressions

This is a bit late, as OLS took place at the end of June, but I think it may be useful for people interested in knowing what's happening around debugging and low level OS kernel interaction. 

The reprints can be found here

Many meetings, many presentations.

Kris and I met with the systemtap people: IBM, Hitachi, Intel and Red Hat.  We had an all-hands meeting to discuss roadmaps and the like, and also a group dinner, as is tradition.

For Systemtap we also had a meeting with IBM to discuss testing, and how to add the IBM test results for Systemtap for s390 and ppc to the Labrat system.  The main outcome is that for now Kris will receive tarballs of the current test runs, and will parse them into the Labrat webpage. Eventually we hope to solve the possible crash-the-system problems by using Xen for Labrat. At that point we can run all the tests (make install-check) instead of just the "smoke" tests (make check). Will Cohen of Red Hat is also working on more testing, specifically adding test coverage information for the tapsets themselves (as opposed to coverage for the translator/systemtap itself).

Systemtap BOF

Vara Prasad of IBM presented the general status of Systemtap, followed by discussion with the audience. The main problems remain around the debugging information.  They are twofold and almost orthogonal:
  • Size of the debuginfo. Files are big, and we really only read part of the info anyway. Is there a way to shrink the size, or a way to split it into smaller chunks that contain all that a tapset needs?  (I personally would not split things up too much, because then one needs to carry around the whole thing anyway for debugger usage.)
  • Availability of the debuginfo. Customers using stap right now are running into this issue. Where is the kernel debuginfo, and where does one get it? It's cumbersome. Should there be a debuginfo server from which to get it? The problem will become much worse with userspace probing: you need debugging info for all sorts of apps.
  • Most apps that big enterprise customers might be using are closed source, and there is no debugging info around (for instance, Oracle is one of these apps). Is there any way to provide debug info?

Utrace/Uprobes talk

Jim Keniston of IBM. Very good overview talk (slides already sent to list).  Nothing new here for me, but it was good to see most of the ptrace related stuff all in one place. There isn't such an overview around.

7 Dwarves talk: Arnaldo Carvalho de Melo

This was a bit misleading. The abstract gave the impression that the talk was going to be about how to reengineer Dwarf2 to make it better and smaller, but that wasn't the topic. Arnaldo wrote a bunch of utilities to gather information about the layout of structures and to rearrange the structures so that they are packed better with less space wasted. He has the utilities available for download on kernel.org. The set of tools is called Pahole.
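
To make the structure layout point concrete, here is a small C sketch of the kind of padding that reordering members can remove. The struct and function names are invented for this example, and the exact sizes depend on the target ABI, so no specific numbers are claimed here.

```c
#include <stddef.h>

/* Members in an order that forces padding on most ABIs: the long must be
   aligned, so bytes are wasted after `flag` and again after `tag`. */
struct record_before {
    char flag;
    long count;
    char tag;
};

/* Same members, largest first: the two chars now share the tail. */
struct record_after {
    long count;
    char flag;
    char tag;
};

/* Bytes of each struct that are padding rather than member data. */
size_t padding_before(void)
{
    return sizeof(struct record_before) - (sizeof(long) + 2 * sizeof(char));
}

size_t padding_after(void)
{
    return sizeof(struct record_after) - (sizeof(long) + 2 * sizeof(char));
}
```

Comparing padding_before() with padding_after(), or looking at offsetof() for each member, shows by hand roughly the kind of layout information that such tools report automatically from the debugging information.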

Kernel development overview: Greg Kroah-Hartman

I got there a bit late. This talk was a summary of who is doing what in the kernel community. Quite interesting: Greg charted all the patches that went into the 2.6.22 kernel into a giant graph, with an arc going from the patch author to each of the persons that signed off on the patch. Pretty big graph. He had that printed out and it was hung from the wall of the conf room, so people could sign their name next to their node.
Anyway, he also ranked the top contributors by number of patches: Oracle is at 20th place with 365 patches for the current release. Top contributors are "unknown developers" (i.e. not clear who they work for), Red Hat #2, Novell #3. Google, excluding Andrew Morton, has only 48 patches.  Red Hat is the top $$ funding company (better said as employer) and the "consultants" group is at about 15th or so. The bottom tier are companies that we all know are modifying/using/selling Linux but are not contributing back.

Linux on Cell: Arnd Bergmann

This was a fun talk. It seems that the cell processor is improving, but it's still a huge beast, with many moving parts. To compile for it with gcc, you have to generate 2 types of .o files for the 2 types of cores and then the linker will put them together, for instance. Applications are limited to supercomputing and of course the gaming area with the PS3. This was interesting to me because of my previous work with gdb in embedded space.

Frysk: Andrew Cagney

This talk concentrated on the testing of Frysk. I.e. how frysk has been able to find kernel bugs (or better how kernel bugs manifest in debugging applications). Of course frysk pushes the boundaries set by gdb a bit harder, and the ptrace/utrace infrastructure runs into its own limitations. The Frysk tests include a set of kernel tests.
The talk was a bit disappointing because it concentrated on the Frysk command line, and the GUI demo consistently didn't work.
Kris's Labrat system got mentioned in the "more info" slide at the end.

Asynch System Calls: Zach Brown

This was good and jaw dropping. In the sense that the new Syslets stuff is going to impact utrace/ptrace quite a lot. We need to synch up with Roland and talk. Read the paper.

DJProbes: Masami Hiramatsu and Satoshi Oshima of Hitachi

Description of DJprobes. DJprobes are now used to improve kprobes performance by 50% in certain cases on x86. The limitations of the mechanism in various cases were described in detail.  It seems to me that the limitations and the workarounds are getting a bit Byzantine, and that with all the workarounds implemented, the worst case scenarios cannot possibly be any faster than just using kprobes. One limitation is due to the instructions replaced by the jump being the target of other jumps, and how the boundaries of the instructions can be misaligned. Another limitation is due to concurrent execution of the same area by multiple threads. Some more problems occurred with self-modifying instructions. I kind of lost track of things after a while, because the talk was a bit too detailed.

LSB BOF: Mats Wichmann

Just attended this one to catch up on what LSB was doing, having worked with them at a previous employer. Not much big news: LSB 3.2 will contain some bugfixes and more tests.
  • Missing interfaces will be added (about 45 of them)
  • freetype, pft, xrender
  • application testkit manager
  • printing support
LSB 4.0 is still way out in the future, and will include some but not necessarily all of the following items:
  • runtime languages
  • LSB for mobile devices (smaller footprint LSB version)
  • library uplift (new versions of glibc, opengl, etc)
  • cairo, dbus
  • audio
  • much increased test coverage (contracted out, so this will actually make it)
They have revamped the website and it now includes a database of all the interfaces that is easily queryable. LSB is now an ISO standard, freely available for download.

Perfmon2 BOF: Stephane Eranian of HP

Ah this was interesting. It ended up lasting 2 hours, and we got kicked out of the room because they wanted to shut the place down for the night. Stephane is writing a unified framework for accessing performance monitors in the kernel for a variety of processors.  HP, Intel, IBM, AMD are contributing their architecture dependent parts. This Perfmon2 effort started in 2005 from an impromptu OLS BOF, where it was decided that less fragmentation was needed, because there were 2 such mechanisms already (perfmon and perfcounter) and neither of them covered all the needs. So 2 years later what has happened?  The interface looks nice, and it's getting polished. Stephane hopes to have it ready for inclusion in the 2.6.24 kernel. He also discussed some of the problems he is facing (now I don't remember all of them), one of which is getting access to the real counters from a virtualized environment. So he is facing real challenges. Unfortunately he is the main/only person working on this. He is going to set up a git repo on kernel.org, though, and that hopefully will speed up the testing and encourage more participation (as opposed to publishing a patch once in a while). However there is still no sign of an in-kernel API. There is one for userspace use. The justification is that there are no "customers" ready yet for an in-kernel API. This may be a bit debatable, since Mathieu (LTTng) is crying for one, and so are others. Maybe the first client of such an interface could be the NMI watchdog on i386.

Branch tracing and test framework: Niro Yoshioka

This guy is from Miracle Linux, and the stuff he works on is on SourceForge. Quite interesting. Maybe useful to look at it for our Labrat.

Another tracing BOF: Alan Stern and Jim Keniston

This was supposed to be a talk by Prasanna of IBM, but he wasn't able to come to Ottawa, so Alan Stern talked about the hardware breakpoint support he is working on. Basically another interface will be in place in the kernel to deal with all the architecture dependent details of the hardware assisted breakpoints and watchpoints. The number of registers varies per processor and per architecture. Triggering also varies, and the position of the IP after the trap is taken varies from processor to processor (i.e. sometimes it points after the trap instruction, sometimes it points at the trap instruction). But the same event will be presented to the user (gdb, frysk, etc.) and all the details will be transparent and hidden within ptrace/utrace. Alan is actually working with Roland on this, to integrate it as part of utrace. Jim Keniston did the rest of the BOF, talking some more about utrace and uprobes. Nothing new there.

The Linux Tools and Languages Team at Oracle: An Introduction

After a few months at Oracle, it's time to talk a bit about what our team has been working on.

Kris Van Hees has started an open source (licensed under the GPL) automated and loosely distributed build-and-test reporting tool, called Labrat. It collects and displays test results for various projects. Right now Labrat is used for daily testing of Systemtap and Frysk. Along with running the testsuite for the projects and collecting the results, the build logs and the test logs, it also collects coverage information from each of the test runs. The project will be hosted here: OSS.oracle.com. At the moment it is temporarily fully accessible here: Automated Build-and-Test Tool Page.


Another area we are working on is Scripting Languages, mostly with PHP and Ruby. Christopher Jones and Alison Holloway are in the team, and their work is highlighted in their blogs:
Chris's Blog
Alison's Blog
Chris has put together the PHP RPMs for OEL, they are downloadable from Oracle's PHP project site.

Another project we are contributing to is Systemtap. I used to manage the project from the Red Hat side, and it has been refreshing to get to work again with the old crew, made of folks from IBM, Hitachi, Red Hat, Intel, and now Oracle. Oracle is starting by contributing to various scripts/tapsets and fixing a few bugs. Wenji is the engineer working on this. In addition, Kris has been working with IBM to integrate the IBM test results for Systemtap in the Labrat framework. Various new features for Labrat will fall out from this work.

And last but not least is Frysk. I used to also manage this project when I was at Red Hat. The developers group for Frysk is still formed mostly of Red Hat people. IBM is contributing the PPC porting work, and they have recently incorporated upstream the brand new PPC port of libunwind. I have two engineers working on the project, Kris and Pearly. Kris has been looking at the testing of Frysk, and while working on that task, Labrat (see above) was created. Labrat mails daily test reports to the frysk-testresults mailing list. Beyond that, Kris has so far been looking at Frysk testsuite failures and hangs, and low level implementation cleanups. Pearly has been looking at the Frysk GUI, while getting more familiar with the code base, and has been fixing the memory and the disassembly windows.

I'll post more details on each project later.

About


I am the manager of the Linux Engineering Tools team at Oracle. People in our group do toolchain and language development. We contribute to GCC, binutils, libstdc++, Systemtap. We also work on PHP, OVM and Yast.
