Sunday Jun 24, 2012

JPRT: A Build & Test System

DRAFT

A while back I did a little blogging about a system called JPRT: the hardware used, and a summary on my java.net weblog. This is an update on the JPRT system.

JPRT ("JDK Putback Reliablity Testing", but ignore what the letters stand for, I change what they mean every day, just to annoy people :\^) is a build and test system for the JDK, or any source base that has been configured for JPRT. As I mentioned in the above blog, JPRT is a major modification to a system called PRT that the HotSpot VM development team has been using for many years, very successfully I might add. Keeping the source base always buildable and reliable is the first step in the 12 steps of dealing with your product quality... or was the 12 steps from Alcoholics Anonymous... oh well, anyway, it's the first of many steps. ;\^)

Internally, when we make changes to any part of the JDK, there are certain procedures we are required to perform prior to any putback or commit of the changes. The procedures often vary from team to team, depending on many factors, such as whether native code is changed, or whether the change could impact other areas of the JDK. But a common requirement is verification that the source base with the changes (merged with the very latest source base) will build on many if not all 8 platforms, using a full 'from scratch' build, not an incremental build, which can hide full build problems. The testing needed varies, depending on what has been changed.

Anyone who has worked on a project where multiple engineers or groups are submitting changes to a shared source base knows how disruptive a 'bad commit' can be for everyone. How many times have you heard:
"So And So made a bunch of changes and now I can't build!"
But multiply that by 8 platforms, make all the platforms old and antiquated OS versions with bizarre system setup requirements, and you have a pretty complicated situation (see http://download.java.net/jdk6/docs/build/README-builds.html).

We don't tolerate bad commits, but our enforcement is somewhat lacking; usually it's an 'after the fact' correction. Luckily the Source Code Management system we use (another antique called TeamWare) allows for a tree of repositories, so 'bad commits' are usually isolated to a small team. Punishment to date has been pretty drastic: the Queen of Hearts in 'Alice in Wonderland' said 'Off With Their Heads', and trust me, you don't want to be the engineer doing a 'bad commit' to the JDK. With JPRT, hopefully this will become a thing of the past. Not that we have had many 'bad commits' to the master source base; in general the teams doing the integrations know how important their jobs are, and they rarely make 'bad commits'. So for these JDK integrators, maybe what JPRT does is keep them from chewing their fingernails at night. ;^)

Over the years each of the teams has accumulated sets of machines they use for building, or they use some of the shared machines available to all of us. But the hunt for build machines is just part of the job, or has been. And although consistency of the build machines hasn't been a horrible problem, often you never know if the Solaris build machine you are using has all the right patches, or if the Linux machine has the right service pack, or if the Windows machine has its latest updates. Hopefully the JPRT system can solve this problem. When we ship the binary JDK bits, it is SO very important that the build machines are correct, and we know how difficult it is to get them set up. Sure, if you need to debug a JDK problem that only shows up on Windows XP or Solaris 9, you'll still need to hunt down a machine, but not as a regular everyday occurrence.

I'm a big fan of a regular nightly build and test system, constantly verifying that a source base builds and tests out. There are many examples of automated build/test systems: some trigger on any change to the source base, some just run every night. Some provide a protection gateway to the 'golden' source base, which only gets changes that the nightly process has verified are good. The JPRT (and PRT) system is meant to guard the source base before anything is sent to it, guarding all source bases from the evil developer. Well, maybe 'evil' isn't the right word; I haven't met many 'evil' developers, more like 'error prone' developers. ;^) Hmm, come to think of it, I may be one from time to time. :^{ But the point is that by spreading the build up over a set of machines, and getting the turnaround down to under an hour, it becomes realistic to completely build on all platforms and test it, on every putback. We have the technology, we can build and rebuild and rebuild, and it will be better than it was before, ha ha... Anybody remember the Six Million Dollar Man? Man, I gotta get out more often... Anyway, now the nightly build and test can become a 'fetch the latest JPRT build bits' and start extensive testing (the testing not done by JPRT, or the platforms not tested by JPRT).

Is it Open Source? No, not yet. Would you like it to be? Let me know. Or is it more important that you have the ability to use such a system for your JDK changes?

So enough blabbering on about this JPRT system, tell me what you think.
And let me know if you want to hear more about it or not.

Stay tuned for the next episode, same Bloody Bat time, same Bloody Bat channel. ;^)

-kto

Wednesday May 25, 2011

Build Infrastructure Project

So the Build Infrastructure Project for OpenJDK has finally gotten started. Please go to the project page for details; this is just some ramblings and babbling that people may find interesting. Hopefully it doesn't contain too many outright lies and mistakes, but if there are any, they belong to me and me alone. Most importantly:

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

So what is this Build Infrastructure project all about? Does "makefiles".equals("BuildInfrastructure")? No, but close, maybe "BuildInfrastructure".startsWith("makefiles")&&"BuildInfrastructure".endsWith("makefiles") :^)

For a long time now, changes to the JDK makefiles and build process have been evolving slowly, some may say glacially, and the work has certainly been fragmented. :^( I've been involved for many years now in trying to do some simplifications and changes to the original Sun JDK makefiles, via the initial Peabody (JRL) project, and then OpenJDK.

I can't speak to the very original JDK makefiles (JDK 1.1), but from where I entered the picture (and this is just my opinion), the makefiles for the most part served the Release Engineering (RE) team for building the product, and developers just had to navigate around and find useful patterns that allowed them to get their jobs done, oftentimes building only small parts of the JDK any way they could (and avoiding complete JDK builds at all costs). The important builds came from RE, and as long as their builds were successful, always from scratch, all was well with the world. But the developers suffered from:

  • No good or reliable incremental builds
  • Slow builds of Java source
  • Incorrect assumptions about *.java timestamps and what needs to be compiled
  • Implicit compilations by javac confusing matters
  • The "one pile" of "classes/" build scheme (related to the implicit issue)
  • Poor automation around source list selections and when and how javah is run
  • Like plastic bags, it was important to avoid sticking your head into the makefiles too completely. (A sketch of the broken timestamp pattern behind the first few items follows this list.)
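
To make the first few items concrete, here is a minimal sketch (hypothetical rules, not the actual JDK makefiles) of the classic broken pattern: assume one .class file per .java file and let timestamps drive the compiles.

# Hypothetical sketch of the broken one-.class-per-.java pattern.
CLASSES = $(patsubst %.java,classes/%.class,$(wildcard *.java))

all: $(CLASSES)

classes/%.class: %.java
	javac -d classes $<

The trouble is that javac implicitly compiles any referenced classes it finds (touching .class files this rule knows nothing about), inner classes produce extra Foo$Bar.class files that no rule matches, and everything lands in the same "one pile" of classes/, so make's timestamp view of the world is wrong almost immediately.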

Multiple events happened over time that triggered evolutionary changes to the JDK build processes. I don't have actual dates for these, but you get the general idea:

  • Hotspot, fastdebug builds, plug&play shared libraries (removal of java_g)

    These may seem unrelated, but when the Hotspot team started building "fastdebug" VM libraries (using -g -O and including assertions) that could just be plugged into any JDK image, that was a game changer. It became possible to plug&play native components when building this way, instead of the java_g builds where all the components had to be built the same way, an all-or-nothing run environment that was horribly slow and limiting. So we tried to create a single build flow, with just variations on the builds (I sometimes called them "build flavors": product, debug, and fastdebug). Of course, at the same time, the machines got faster, and perhaps now using a complete debug build of a jdk makes more sense? In any case, that fastdebug event influenced matters. We do want different build flavors, but we don't want separate make logic for them.
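
    As a sketch of the "same flow, different flavors" idea (with hypothetical variable names, not the actual JDK makefiles), only the flags vary while the rules stay shared:

    # One set of build rules, parameterized by the build flavor.
    FLAVOR ?= product
    ifeq ($(FLAVOR), product)
      CFLAGS += -O
    endif
    ifeq ($(FLAVOR), debug)
      CFLAGS += -g
    endif
    ifeq ($(FLAVOR), fastdebug)
      CFLAGS += -g -O -DASSERT   # optimized, but assertions stay enabled
    endif
    OUTPUTDIR = build/$(FLAVOR)

    Then 'gmake FLAVOR=fastdebug' and 'gmake FLAVOR=product' run through the exact same rules, just with different flags and output directories.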

  • Ant scripts

    Ant scripts seem to create intense discussions. I originally found Ant tolerable, and actually pretty efficient for small vanilla Java projects. The fact that IDEs (like NetBeans) worked so well with them also made them interesting. I jumped on board and let the ants crawl all over me, or rather my manager had me work on the JavaFX Ant scripts and they crawled all over me. :^( (Trust me, large disjoint sets of Ant scripts are very hard to manage.) Over time, I discovered that the Ant "<parallel>" capabilities are not useful, that the XML scripting is verbose and difficult to use when you go beyond that simple Java project model, and that since Ant scripts were introduced into the JDK build I have probably tracked down more issues involving them than with any other change to the build process. Ant can be very tricky. It is a Java app, which is kind of cool, but when you are actually building a JDK, it becomes more of an annoyance. I have always resisted any situation where Make runs Ant which runs Make, or where Ant runs Make which runs Ant; I figured my sanity was more important. So the bottom line is, I think we need a better answer, and it probably looks more like Make than Ant, and will likely not even include Ant. Sorry if that offends the Ant lovers. :^(

  • Peabody (JRL) builds and the need to build on a wider variety of platforms, and by a larger population of developers.

    Before Peabody and the JRL sources were exposed, the number of developers was limited, and the fact that it was so difficult to build wasn't as big a factor. A developer would spend a day or so getting his system ready to do a build, and then would never have to look at it again until he had a new system. It was a one-time build setup, per developer. But as the number of people wanting to build the JDK exploded (note I said "build", not develop), it was obvious that the complex build setups were more of a problem. In addition, the antiquated Linux/Solaris/Windows systems used previously did not match what this new crop of JDK builders had access to. The various makefile sanity checks that worked so well for RE now needed to become warnings instead of fatal build errors.

  • OpenJDK, and a further explosion of developers and builders of the source.

    These same build issues continued with OpenJDK. We tried to make life easier and provided the README-builds.html document to try and help guide people. But we knew, and we were being told day after day, that it was not as good as it could be.

  • Separate repos for hotspot, langtools, jaxp, jaxws, corba, ...

    The fact that we use nested repositories has been an issue from day one. The isolation it provides for various teams, and the extra dimension it adds to distributed development, do have their benefits, but they also come with some issues. Many tools that say they "support Mercurial" don't actually support nested repositories, and without using something like the Mercurial "subrepos" functionality, coordinating changesets between repositories is still an issue. The Forest Extension upon which we relied has not become any kind of official extension and continues to be problematic. For now we are sticking with nested repositories, but expect some changes in the future here.
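
    For what it's worth, working with the forest via the Forest Extension looks something like this (command names from the extension, assuming I remember them right):

    hg fclone http://hg.openjdk.java.net/jdk7/jdk7 jdk7   # clone the top repo plus all nested repos
    cd jdk7
    hg fpull   # pull new changesets across the whole forest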

  • The jaxp and jaxws source drop bundles

    These source drops served their purpose, but we need a better answer here. They create a complexity to building that needs to be fixed.

  • The complications of building a product from open and closed sources

    This is unique to our JDK builds, which use all or most of the OpenJDK sources but then augment the build with additional sources. We have tried to keep this from impacting the overall build process (this is our pain), but it certainly impacts what we do in the OpenJDK from time to time.

  • And in comes GNU make's "$(eval)" function.

    At the time I thought this feature was significant. I did some experiments and discovered that it only worked well with GNU make 3.81, while we seemed to be stuck on GNU make 3.78.1 or 3.79, and in some cases people were using GNU make 3.80. The GNU make $(eval) function allows a value to be evaluated and parsed again as make syntax. That might not seem like a sane or important feature, but it is actually very special and powerful. So an effort was made to get us up-to-date and working well with GNU make 3.81. In any case, this turned out to be a critical feature for what people will see moving forward in the Build Infrastructure Project.
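
    To give a feel for why $(eval) matters, here is a tiny sketch (hypothetical names, not the actual build logic) that stamps out a link rule per native library from a single template:

    # Template: given a library name, produce its link rule.
    define LibraryTemplate
    $(1).so: $(1).o
    	$$(CC) -shared -o $$@ $$<
    endef

    LIBS = libfoo libbar libbaz
    $(foreach lib,$(LIBS),$(eval $(call LibraryTemplate,$(lib))))

    Note the doubled $$ inside the template: the text is expanded once by $(call) and then parsed again as makefile syntax by $(eval), which is exactly the "evaluated and parsed again" behavior described above.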

So this is what we need to keep in mind:

  • Different build flavors, same build flow
  • Ability to use 'make -j N' on large multi-CPU machines is critical, as is being able to quickly and reliably get incremental builds done; this means:
    • target dependencies must be complete and accurate
    • nested makes should be avoided
    • ant scripts should be avoided for multiple reasons (they are a form of nested make), but we need to allow for IDE builds at the same time
    • rules that generate targets will need to avoid timestamp changes when the result has not changed (see the sketch after this list)
    • Java package compilations need to be made parallel and we also need to consider some kind of javac server setup (something that had been talked about a long time ago)
  • Continued use of different compilers: gcc/g++ (various versions), Sun Studio (various versions), and Windows Visual Studio (various versions)
  • Allow for clean cross compilation; this means making sure we just build it and do not run it as part of the build
  • Nested repositories need to work well, so we need a way to share common make logic between repositories
  • The build dependencies should be managed as part of the makefiles
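
The classic trick for that "avoid timestamp changes" rule is to have a rule generate into a temporary file and only replace the real target when the content actually changed. A minimal sketch (the generator script is hypothetical):

# Avoid touching the target when the generated content is unchanged,
# so targets that depend on generated.h are not rebuilt needlessly.
generated.h: generator.sh
	./generator.sh > generated.h.tmp
	if cmp -s generated.h.tmp generated.h ; then \
	  rm generated.h.tmp ; \
	else \
	  mv generated.h.tmp generated.h ; \
	fi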

So as the Build Infrastructure Project progresses, expect some revolutionary changes.

-kto

Thursday Sep 23, 2010

JDK7: Java vendor property changes

Very soon now, the OpenJDK7 sources (including the source bundles) and the JDK7 binary bundles will include changes to the values of the various "vendor" java properties and also the Windows COMPANY file properties.

If you have code that depends on any of these settings starting with "Sun", now might be the time to include "Oracle" as part of that comparison so that your use of OpenJDK7 or JDK7 builds will operate properly.

Specifically, these Java system properties currently contain:

java.vendor = Sun Microsystems Inc.
java.vendor.url = http://java.sun.com/
java.vm.vendor = Sun Microsystems Inc.
java.specification.vendor = Sun Microsystems Inc.
java.vm.specification.vendor = Sun Microsystems Inc.

And they will soon contain:

java.vendor = Oracle Corporation
java.vendor.url = http://java.oracle.com/
java.vm.vendor = Oracle Corporation
java.specification.vendor = Oracle Corporation
java.vm.specification.vendor = Oracle Corporation
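
If your code compares against the old values, a tolerant check like this (just a sketch) keeps working across the transition:

String vendor = System.getProperty("java.vendor");
if (vendor.startsWith("Sun") || vendor.startsWith("Oracle")) {
    // Running on a Sun/Oracle JDK, old or new.
}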

In addition, on Windows, the DLL and EXE files will have their COMPANY file property changed to match the java.vendor property value.

There are other vendor properties throughout the JDK7 implementation that will also be changing, but it may take some time to complete all the other changes.

NOTE: this only applies to JDK7 and newer releases.

-kto

Monday Dec 07, 2009

Have I become a Fuzz Tester?

Everything was a little fuzzy in Livermore, California this morning:

A very rare snowfall for this area: we are at 500 feet above sea level, and snow was falling as low as 200 feet early this morning. But that's just my "Fuzz" picture intro...

I read an article on Secure Software Testing and it talked about Fuzz Testing, and now I'm wondering if I've become a Fuzz Tester, and what exactly does that mean? :( Should I see a doctor? "Fuzz", what a funny word, back in the 70's that was one of the more polite slang words for the Police.

Years ago, when I worked on C compilers, one of the first things my fellow Fortran compiler developer did to test my C compiler was to feed it a Fortran source file as input. If my compiler core dumped instead of generating an error message, it would have failed the "garbage input" test. I consequently would cd into his home directory and run "f77 *", but I could never get his damn Fortran compiler to crash; maybe it just liked eating garbage. ;^) Anyway, it appears this kind of "garbage input" testing is a form of "Fuzz Testing": feed your app garbage input and see if you can get it to misbehave.

So back to my primary topic at hand, OpenJDK testing. Lately I've been trying to get to a point where running the jdk tests is predictable. I've done that by weeding out tests that are known to fail or are unpredictable, and by running and re-running the same tests over and over and over, on different systems and with slightly different configurations (-server, -client, -d64).

After reading that Fuzz article, I've come to the conclusion that I've been doing a type of Fuzz Testing, by varying one of the inputs in random ways: the system itself. But that's a silly thought; that would make us all Fuzz Testers, because who runs tests with the exact same system state every time? Unless you saved a private virtual machine image and restarted the system every time, how could you ever be sure the system state was always the same? And even then, depending on how fast the various system services run, even a freshly started virtual machine could have a different state if you started a test 5 seconds later than the last time. And what about the network? There is no way to do that... well, I'm sure someone will post a comment telling me how you can do that. And even if you could, what would be the point? If everything is exactly the same, of course it will do the same thing, right? So there is always Fuzz, and you always want some Fuzz; who really wants to be completely Fuzz-less? My brain is now getting Fuzzy. :^(

When looking at the OpenJDK testing, I just want to be able to run a set of robust tests, tests that are immune to a little "system fuzz", in particular system fuzz created by the test itself or its related tests; "self induced fuzz failures" seem to be a common problem. Sounds like some kind of disease, H1Fuzz1; keep that hand sanitizer ready. Is it reasonable to expect tests to be free of "self induced fuzz failures"? Or should testing guarantee a fuzz-free environment and only run one test at a time?

I'm determined to avoid tests with H1Fuzz1, or find a cure. So I recently pushed some changes into the jdk7/tl forest jdk repository:

http://hg.openjdk.java.net/jdk7/tl/jdk/rev/af9346401220

More improvements are on the way, and the same basic change is planned for OpenJDK6. This should make it easier to validate your jdk build by running only the jdk regression/unit tests in the repository that are free of H1Fuzz1. They should also run as quickly as possible. You will need a built jdk7 image and also the jtreg tool installed. To download and install the latest jtreg tool, do the following:

wget http://www.java.net/download/openjdk/jtreg/promoted/b03/jtreg-4_0-bin-b03-31_mar_2009.zip
unzip jtreg-4_0-bin-b03-31_mar_2009.zip
export JT_HOME=`pwd`/jtreg

Build the complete jdk forest, or just the jdk repository:

gmake
-OR-
cd jdk/make && gmake ALT_JDK_IMPORT_PATH=previously_built_jdk7_home all images

Then run all the H1Fuzz1-free tests:

cd jdk/test && gmake -k jdk_all [PRODUCT_HOME=jdk7_home]

There are various batches of tests; jdk_all runs all of them, and if your machine can handle it, use 'gmake -j 4 jdk_all' to run up to 4 of the batches in parallel. Some batches are run with 'jtreg -samevm' for faster results. The tests in the ProblemList.txt file are not run, and hopefully efforts are underway to reduce the size of this list by curing H1Fuzz1.

As to the accuracy of the ProblemList.txt file, it could be wrong, and I may have slandered some perfectly good tests by accusing them of having H1Fuzz1. My apologies in advance; let me know if any perfectly passing tests made the list and I will correct it. It is also very possible that I missed some tests, so if you run into tests that you suspect might have H1Fuzz1, we can add them to the list. On the other hand, curing H1Fuzz1 is a better answer. ;^)

That's enough Fuzz Buzz on H1Fuzz1.

-kto

Wednesday Nov 25, 2009

Faster OpenJDK Build Tricks

Here are a few tips and tricks to get faster OpenJDK builds.

  • RAM

    RAM is cheap; if you don't have at least 2GB of RAM, go buy yourself some RAM for Xmas. ;^)

  • LOCAL DISK

    Use local disk if at all possible; the difference in build time is significant. This mostly applies to the repository or forest you are building (and where the build output is also landing), and, to a lesser degree, to frequently accessed items like the boot jdk (ALT_BOOTDIR). Local disk is your best choice, and if possible, /tmp on some systems is even better.

  • PARALLEL_COMPILE_JOBS=N

    This make variable (or environment variable) is used by the jdk repository and should be set to the number of native compilations that will be run in parallel when building native libraries. It does not apply to Windows. This is a very limited use of the GNU make -j N option in the jdk repository, only addressing the native library building. A recommended setting would be the number of cpus you have or a little higher, but no more than 2 times the number of cpus. Setting this too high can tend to swamp a machine. If all the machines you use have at least 2 cpus, using a standard value of 4 is a reasonable setting. The default is 1.

  • HOTSPOT_BUILD_JOBS=N

    Similar to PARALLEL_COMPILE_JOBS, this one applies to the hotspot repository; however, hotspot uses the GNU make -j N option at a higher level in the Makefile structure. Since more makefile rules are impacted by this setting, there may be a higher chance of a build failure using HOTSPOT_BUILD_JOBS, although reports of problems have not been seen for a while. It also does not apply to Windows. A recommended setting would be the same as PARALLEL_COMPILE_JOBS. The default is 1.

  • NO_DOCS=true

    Skip the javadoc runs unless you really need them. (A combined example of these settings follows this list.)
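
Putting these together, a typical faster build on a 4 cpu machine might look like this (the boot jdk path is just an example):

export ALT_BOOTDIR=/tmp/jdk1.6.0   # boot jdk copied to local disk
gmake PARALLEL_COMPILE_JOBS=4 HOTSPOT_BUILD_JOBS=4 NO_DOCS=true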

Hope someone finds this helpful.

-kto
