Wednesday May 25, 2011

Build Infrastructure Project

So the Build Infrastructure Project for OpenJDK has finally gotten started. Please go to the project page for details; this post is just some ramblings and babbling that people may find interesting. Hopefully it doesn't contain too many outright lies and mistakes, but if there are any, they belong to me and me alone. Most importantly:

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

So what is this Build Infrastructure project all about? Does "makefiles".equals("BuildInfrastructure")? No, but close, maybe "BuildInfrastructure".startsWith("makefiles")&&"BuildInfrastructure".endsWith("makefiles") :^)

For a long time now, most changes to the JDK makefiles and build process have been evolving slowly (some may say glacially), and the work has certainly been fragmented. :^( For many years I've been involved in trying to simplify and change the original Sun JDK Makefiles, first via the Peabody (JRL) project and then via OpenJDK.

I can't speak to the very original JDK Makefiles (JDK 1.1), but from where I entered the picture (and this is just my opinion), the makefiles for the most part served the Release Engineering (RE) team for building the product. Developers just had to navigate around them and find useful patterns that allowed them to get their jobs done, oftentimes building only small parts of the JDK any way they could (and avoiding complete JDK builds at all costs). The important builds came from RE, and as long as their builds were successful (always from scratch), all was well with the world. But the developers suffered from:

  • No good or reliable incremental builds
  • Slow builds of Java source
  • Incorrect assumptions about *.java timestamps and what needs to be compiled
  • Implicit compilations by javac confusing matters
  • The "one pile" of "classes/" build scheme (related to the implicit issue)
  • Poor automation around source list selections and when and how javah is run
  • Like plastic bags, it was important to avoid sticking your head into the makefiles too completely.
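To make the timestamp problem concrete, here is a minimal sketch (file names are made up, and this is not the actual JDK build logic) of the single-timestamp staleness check that make performs. The trouble starts when javac implicitly compiles classes the makefiles never listed, so this simple check no longer tells the whole story:

```shell
# Sketch: make considers a target stale when a prerequisite is newer.
# Hypothetical file names; not the actual JDK build logic.
workdir=$(mktemp -d)
touch "${workdir}/Foo.class"          # previously built class file
sleep 1
touch "${workdir}/Foo.java"           # source edited after the build

# The same newer-than check make effectively performs:
if [ "${workdir}/Foo.java" -nt "${workdir}/Foo.class" ]; then
  stale=yes
else
  stale=no
fi
echo "stale=${stale}"
rm -rf "${workdir}"
```

When javac also regenerates other class files as a side effect of compiling one source file, the makefiles never see those dependencies, and incremental builds drift out of sync.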

Multiple events happened over time that triggered evolutionary changes to the JDK build processes. I don't have actual dates for these, but you get the general idea:

  • Hotspot, fastdebug builds, plug&play shared libraries (removal of java_g)

    These may seem unrelated, but when the Hotspot team started building "fastdebug" VM libraries (using -g -O and including assertions) that could just be plugged into any JDK image, that was a game changer. It became possible to plug&play native components when building this way, instead of the java_g builds where all the components had to be built the same way: an all-or-nothing run environment that was horribly slow and limiting. So we tried to create a single build flow, with just variations on the builds (I sometimes called them "build flavors": product, debug, and fastdebug). Of course, at the same time, the machines got faster, and perhaps a complete debug build of a jdk now makes more sense. In any case, that fastdebug event influenced matters. We do want different build flavors, but we don't want separate make logic for them.

  • Ant scripts

    Ant scripts seem to create intense discussions. I originally found Ant tolerable, and actually pretty efficient for small vanilla Java projects. The fact that IDEs (like NetBeans) worked so well with them also made them interesting. I jumped on board and let the ants crawl all over me; or rather, my manager had me work on the JavaFX Ant scripts and they crawled all over me. :^( (Trust me, large disjoint sets of Ant scripts are very hard to manage.) Over time, I discovered that the Ant "<parallel>" capabilities are not useful, and that the XML scripting is verbose and difficult to use once you go beyond that simple Java project model. I have probably tracked down more issues involving Ant and Ant scripts, since they were introduced into the JDK build, than any other change to the build process. Ant can be very tricky. It is a Java app, which is kind of cool, but when you are actually building a JDK, it becomes more of an annoyance. I have always resisted any situation where Make runs Ant which runs Make, or where Ant runs Make which runs Ant; I figured my sanity was more important. So the bottom line is, I think we need a better answer, and it probably looks more like Make than Ant, and will likely not even include Ant. Sorry if that offends the Ant lovers. :^(

  • Peabody (JRL) builds and the need to build on a wider variety of platforms, and by a larger population of developers.

    Before Peabody and the JRL sources were exposed, the number of developers was limited, and the fact that the JDK was so difficult to build wasn't as big a factor. A developer would spend a day or so getting his system ready to do a build, and then would never have to look at it again until he had a new system. It was a one-time build setup, per developer. But as the number of people wanting to build the JDK exploded (note I said "build", not "develop"), it was obvious that the complex build setups were more of a problem. In addition, the antiquated Linux/Solaris/Windows systems used previously did not match what this new crop of JDK builders had access to. The various makefile sanity checks that worked so well for RE now needed to become warnings instead of fatal build errors.

  • OpenJDK, and a further explosion of developers and builders of the source.

    These same build issues continued with OpenJDK. We tried to make life easier and provided the README-builds.html document to try to help guide people. But we knew, and we were being told day after day, that it was not as good as it could be.

  • Separate repos for hotspot, langtools, jaxp, jaxws, corba, ...

    The fact that we use nested repositories has been an issue from day one. The isolation they provide for various teams, and the extra dimension they add to distributed development, do have their benefits, but they also come with some issues. Many tools that say they "support Mercurial" don't actually support nested repositories, and without using something like the Mercurial "subrepos" functionality, coordinating changesets between repositories is still an issue. The Forest Extension upon which we relied has not become any kind of official extension and continues to be problematic. For now we are sticking with nested repositories, but expect some changes here in the future.

  • The jaxp and jaxws source drop bundles

    These source drops served their purpose, but we need a better answer here. They add a complexity to the build that needs to be fixed.

  • The complications of building a product from open and closed sources

    This is unique to our JDK builds, which use all or most of the OpenJDK sources but then augment the build with additional sources. We have tried to keep this from impacting the actual overall build process (this is our pain), but it certainly impacts what we do in the OpenJDK from time to time.

  • And in comes GNU make's "$(eval)" function.

    At the time I thought this feature was significant. I did some experiments and discovered that it only worked well with GNU make 3.81, while we seemed to be stuck on GNU make 3.78.1 or 3.79 (and in some cases people were using GNU make 3.80). The GNU make $(eval) function allows a value to be evaluated and parsed again as make syntax. That might not seem like a sane or important feature, but it is actually very special and powerful. So an effort was made to get us up to date and working well on GNU make 3.81. In any case, this turned out to be a critical feature for what people will see moving forward in the Build Infrastructure Project.
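As a rough sketch of why $(eval) matters (the template and module names below are invented for illustration, not the real new build logic): it lets one rule template be instantiated per module, so a single piece of make logic can serve every component without copy-pasted rules:

```make
# Define a template of rules; $(1) is the module name.
define SetupModule
$(1)-compile:
	@echo "Compiling module $(1)"
endef

# Instantiate the template for each module; $(eval) re-parses the
# expanded text as real make syntax, creating the targets.
MODULES := corba jaxp jaxws
$(foreach m,$(MODULES),$(eval $(call SetupModule,$(m))))

all: $(foreach m,$(MODULES),$(m)-compile)
```

This only parses correctly on GNU make 3.81, which is exactly why the version upgrade mattered.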

So this is what we need to keep in mind:

  • Different build flavors, same build flow
  • Ability to use 'make -j N' on large multi-CPU machines is critical, as is being able to get incremental builds done quickly and reliably; this means:
    • target dependencies must be complete and accurate
    • nested makes should be avoided
    • ant scripts should be avoided for multiple reasons (they are a form of nested make), but we need to allow for IDE builds at the same time
    • rules that generate targets will need to avoid timestamp changes when the result has not changed
    • Java package compilations need to be made parallel and we also need to consider some kind of javac server setup (something that had been talked about a long time ago)
  • Continued use of different compilers: gcc/g++ (various versions), Sun Studio (various versions), and Windows Visual Studio (various versions)
  • Allow for clean cross compilation, this means making sure we just build it and not run it as part of the build
  • Nested repositories need to work well, so we need a way to share common make logic between repositories
  • The build dependencies should be managed as part of the makefiles
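The "avoid timestamp changes when the result has not changed" rule above can be sketched like this (a generic pattern with a made-up generated file, not the actual build rule): regenerate into a temp file and replace the target only when the contents differ.

```shell
# Regenerate into a temp file; replace the target only when the
# contents differ, so unchanged results keep their old timestamp.
gen_header() {
  out="$1"
  printf '#define JDK_BUILD 1\n' > "${out}.tmp"
  if cmp -s "${out}.tmp" "${out}" 2>/dev/null; then
    rm -f "${out}.tmp"            # identical: leave timestamp alone
    result=unchanged
  else
    mv "${out}.tmp" "${out}"      # new or different: update target
    result=updated
  fi
}

workdir=$(mktemp -d)
gen_header "${workdir}/version.h"; first=${result}
gen_header "${workdir}/version.h"; second=${result}
echo "${first} ${second}"
rm -rf "${workdir}"
```

Targets that depend on such a generated file then stay up to date across rebuilds, which is what makes incremental 'make -j N' builds trustworthy.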

So as the Build Infrastructure Project progresses, expect some revolutionary changes.

-kto

Monday Nov 29, 2010

JDK7: EA Java vendor properties have changed

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

A few months back I tried to give a heads up on the so-called "rebranding" efforts in JDK7, where the pattern "Sun Microsystems, Inc." was being changed to "Oracle Corporation". Well, with a little sadness in my heart, the bulk of these rebranding changes have happened; not all of them, but enough of the major ones to at least start shaking out potential issues. The OpenJDK7 sources (including the source bundles) and the JDK7 binary bundles should include most of the changes to the values of the various "vendor" java properties and also the Windows COMPANY file properties.

Specifically, in our latest JDK7 EA binaries, these Java system properties now contain:

java.vendor = Oracle Corporation
java.vendor.url = http://java.oracle.com/
java.vm.vendor = Oracle Corporation
java.specification.vendor = Oracle Corporation
java.vm.specification.vendor = Oracle Corporation

NOTE: this only applies to JDK7 and newer releases.

If you experience any problems with these changes or the latest JDK7 EA builds, we really want to hear from you.

-kto

Thursday Sep 23, 2010

JDK7: Java vendor property changes

Very soon now, the OpenJDK7 sources (including the source bundles) and the JDK7 binary bundles will include changes to the values of the various "vendor" java properties and also the Windows COMPANY file properties.

If you have code that depends on any of these settings starting with "Sun", now might be the time to include "Oracle" as part of that comparison so that your use of OpenJDK7 or JDK7 builds will operate properly.
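For example, a check that only accepts the old prefix will break. A sketch of the widened comparison in shell (in real code the value would come from Java's System.getProperty("java.vendor"); here it is just a string, and the helper name is made up):

```shell
# Accept either the old or the new vendor string prefix.
vendor_ok() {
  case "$1" in
    "Sun Microsystems"* | "Oracle"* ) return 0 ;;
    * ) return 1 ;;
  esac
}

old_ok=no;   vendor_ok "Sun Microsystems Inc." && old_ok=yes
new_ok=no;   vendor_ok "Oracle Corporation"    && new_ok=yes
other_ok=no; vendor_ok "SomeOtherVendor"       && other_ok=yes
echo "old=${old_ok} new=${new_ok} other=${other_ok}"
```

The same prefix-pair idea applies wherever your code compares any of the vendor properties listed below.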

Specifically, these Java system properties currently contain:

java.vendor = Sun Microsystems Inc.
java.vendor.url = http://java.sun.com/
java.vm.vendor = Sun Microsystems Inc.
java.specification.vendor = Sun Microsystems Inc.
java.vm.specification.vendor = Sun Microsystems Inc.

And will soon contain:

java.vendor = Oracle Corporation
java.vendor.url = http://java.oracle.com/
java.vm.vendor = Oracle Corporation
java.specification.vendor = Oracle Corporation
java.vm.specification.vendor = Oracle Corporation

In addition, on Windows, the DLL and EXE files will have their COMPANY file property changed to match the java.vendor property value.

There are other vendor properties throughout the JDK7 implementation that will also be changing, but it may take some time to complete all the other changes.

NOTE: this only applies to JDK7 and newer releases.

-kto

Monday Jan 11, 2010

Whack a Mole Testing

Of late I seem to have entered a Twilight Zone game of Whack a Mole with the jdk tests. It appears that the odds that a test will fail on some particular OS or machine, with or without any jdk change, are higher than I ever thought possible. Very frustrating. Why is that? I have a list of possible contributing factors:

  • Some tests are just downright unpredictable and need to be fixed.
  • Minor differences in the environment variable settings can cause failures.
  • Minor differences in the machine or OS configurations can cause failures.
  • Some tests are rarely run on all platform (OS/ARCH) combinations.
  • The ramifications of any jdk change are often poorly understood, and all jdk tests are rarely run on all platforms.
  • The use of jtreg -samevm can influence the stability of a testrun, if any test is not 'samevm' safe.
  • Using the same machine and/or same user to run multiple sets of tests can also influence the stability of a testrun, if any test is using shared resources (like port numbers or shared directories like /tmp).

Maybe I just have too much Fuzz (unpredictable environment) in the test runs? I can certainly nail down many of the above items to increase stability, but until I have some kind of Hudson or continuous build/test system watching every changeset, this will be a difficult task. And to be effective, it would need to build and test all possible platforms, a platform list that seems to be growing lately.

I found this cartoon from the health care industry (that's scary); anyway, I thought it was appropriate for product releases too:

If I knew how to draw cartoons, I'd draw one for the testing matrix.

-kto

Thursday Dec 10, 2009

Mercurial Forest: Pet Shell Trick of the Day

Now be careful with this, but here is a simple bash shell script that will run Mercurial hg commands on a forest pretty quickly. It assumes that the forest is no deeper than 3 levels, e.g. */*/*/.hg. Every hg command is run in a parallel shell process, so very large forests might be dangerous; use at your own risk:

#!/bin/sh

# Shell script for a fast parallel forest command

tmp=/tmp/forest.$$
rm -f -r ${tmp}
mkdir -p ${tmp}

# Remove the tmp area on exit (interrupted or normal termination)
trap 'rm -f -r ${tmp}' EXIT

# Only look in specific locations for possible forests (avoids long searches)
hgdirs=`ls -d ./.hg ./*/.hg ./*/*/.hg ./*/*/*/.hg 2>/dev/null`

# Derive repository names from the .hg directory locations
repos=""
for i in ${hgdirs} ; do
  repos="${repos} `echo ${i} | sed -e 's@/.hg$@@'`"
done

# Echo out what repositories we will process
echo "# Repos: ${repos}"

# Run the supplied command on all repos in parallel, save output until end
n=0
for i in ${repos} ; do
  n=`expr ${n} '+' 1`
  (
    (
      cline="hg --repository ${i} $*"
      echo "# ${cline}"
      eval "${cline}"
      echo "# exit code $?"
    ) > ${tmp}/repo.${n} 2>&1
    cat ${tmp}/repo.${n}
  ) &
done

# Wait for all hg commands to complete
wait

# Cleanup
rm -f -r ${tmp}

# Terminate with exit 0 all the time (hard to know when to say "failed")
exit 0

Save it as an executable script named hgforest and run it like: hgforest status, or hgforest pull -u.

As always should any member of your IMF forest be caught or killed, the secretary will disavow all knowledge of your actions. This tape will self-destruct in five seconds. Good luck Jim.

-kto

Monday Dec 07, 2009

Have I become a Fuzz Tester?

Everything was a little fuzzy in Livermore, California this morning:

A very rare snowfall for this area; we are at 500 feet above sea level, and snow was falling as low as 200 feet early this morning. But that's just my "Fuzz" picture intro...

I read an article on Secure Software Testing and it talked about Fuzz Testing, and now I'm wondering if I've become a Fuzz Tester, and what exactly that means. :( Should I see a doctor? "Fuzz", what a funny word; back in the 70's it was one of the more polite slang words for the police.

Years ago, when I worked on C compilers, one of the first things my fellow Fortran compiler developer did to test my C compiler was to feed it a Fortran source file as input. If my compiler core dumped instead of generating an error message, it would have failed the "garbage input" test. I consequently would cd into his home directory and run "f77 *", but I could never get his damn Fortran compiler to crash; maybe it just liked eating garbage. ;^) Anyway, it appears this kind of "garbage input" testing is a form of "Fuzz Testing": feed your app garbage input and see if you can get it to misbehave.

So back to my primary topic at hand, OpenJDK testing. Lately I've been trying to get to a point where running the jdk tests is predictable. I've done that by weeding out tests that are known failures or unpredictable, and running and re-running the same tests over and over and over, on different systems and with slightly different configurations (-server, -client, -d64).

After reading that Fuzz article, I've come to the conclusion that I've been doing a type of Fuzz Testing, by varying one of the inputs in random ways: the system itself. But that's a silly thought; that would make us all Fuzz Testers, because who runs tests with the exact same system state every time? Unless you saved a private virtual machine image and restarted the system every time, how could you ever be sure the system state was always the same? And even then, depending on how fast the various system services run, even a freshly started virtual machine could have a different state if you started a test 5 seconds later than the last time. And what about the network? There is no way to do that (well, I'm sure someone will post a comment telling me how you can do it). And even if you could, what would be the point? If everything is exactly the same, of course it will do the same thing, right? So there is always Fuzz, and you always want some Fuzz; who really wants to be completely Fuzz-less? My brain is now getting Fuzzy. :^(

When looking at the OpenJDK testing, I just want to be able to run a set of robust tests, tests that are immune to a little "system fuzz". In particular, system fuzz created by the test itself or its related tests; "self induced fuzz failures" seem to be a common problem. Sounds like some kind of disease, H1Fuzz1; keep that hand sanitizer ready. Is it reasonable to expect tests to be free of "self induced fuzz failures"? Or should testing guarantee a fuzz-free environment and run only one test at a time?

I'm determined to avoid tests with H1Fuzz1, or find a cure. So I recently pushed some changes into the jdk7/tl forest jdk repository:

http://hg.openjdk.java.net/jdk7/tl/jdk/rev/af9346401220

More improvements are on the way, and the same basic change is planned for OpenJDK6. This should make it easier to validate your jdk build by running only the jdk regression/unit tests in the repository that are free of H1Fuzz1. They should also run as quickly as possible. You will need a built jdk7 image and also the jtreg tool installed. To download and install the latest jtreg tool, do the following:

wget http://www.java.net/download/openjdk/jtreg/promoted/b03/jtreg-4_0-bin-b03-31_mar_2009.zip
unzip jtreg-4_0-bin-b03-31_mar_2009.zip
export JT_HOME=`pwd`/jtreg

Build the complete jdk forest, or just the jdk repository:

gmake
-OR-
cd jdk/make && gmake ALT_JDK_IMPORT_PATH=previously_built_jdk7_home all images

Then run all the H1Fuzz1-free tests:

cd jdk/test && gmake -k jdk_all [PRODUCT_HOME=jdk7_home]

There are various batches of tests; jdk_all runs all of them, and if your machine can handle it, use gmake -j 4 jdk_all to run up to 4 of the batches in parallel. Some batches are run with jtreg -samevm for faster results. The tests in the ProblemList.txt file are not run, and hopefully efforts are underway to reduce the size of this list by curing H1Fuzz1.

As to the accuracy of the ProblemList.txt file, it could be wrong, and I may have slandered some perfectly good tests by accusing them of having H1Fuzz1. My apologies in advance; let me know if any perfectly passing tests made the list and I will correct it. It is also very possible that I missed some tests, so if you run into tests that you suspect have H1Fuzz1, we can add them to the list. On the other hand, curing H1Fuzz1 is a better answer. ;^)

That's enough Fuzz Buzz on H1Fuzz1.

-kto

About

Various blogs on JDK development procedures, including building, build infrastructure, testing, and source maintenance.
