replacing ant with rdf

Tim Boudreau recently asked "What if we built Java code with...Java?". Why not replace Ant or Maven XML build documents with Java (or Groovy/JRuby/Jython/...) scripts? That would be a lot easier for Java programmers to write, and much easier for them to understand too. Why go through XML when things could be done more simply in a universal language like Java? Good question. But I think it depends on what type of problem one wants to solve. Moving to Java makes the procedural aspect of a build easier to program for a certain category of people. But is that a big enough advantage to warrant a change? Probably not. If we are looking for an improvement, why not explore something really new, something that might resolve some as yet completely unresolved problems at a much higher level? Why not explore what a hyperdata build system could bring us? Let me start to sketch out some ideas here, very quickly, because I am late on a few other projects I am meant to be working on.

The answer to software becoming more complicated has been to create clear interfaces between the various pieces, and to have people specialise in building components to those interfaces. It's the "small is beautiful" philosophy of Unix. As a result, though, as software complexity builds up, every piece of software requires more and more pieces of other software, leading us from a system of independent software pieces to networked software. Let me be clear: the software industry has been speaking a lot about software containing networked components and being deployed on the network. That is not what I am pointing to here. No, I want to emphasise that the software itself is built out of components found on the network. I.e. we increasingly need a networked build system. This should be a big clue as to why hyperdata can bring something to the table that other systems cannot. Because RDF is a language whose pointer system is built on the Uniform Resource Identifier (URI), it eats networked components for breakfast, lunch and dinner. (See my Jazoon presentation.)

Currently my Subversion repository contains a lot of lib subdirectories full of jar files taken from other projects. Would it not be better if I referred to these libraries by URL instead? The URL from which they can be fetched over HTTP, of course. Here are a few advantages:

  • it would use up less space in my Subversion repository: a pointer just takes up less space than an executable in most cases.
  • it would use up less space on the hard drive of people downloading my code. Why? Because I am referring to the jar via a universal name, a clever IDE will be able to reuse a locally cached copy already downloaded for another tool.
  • it would make setting up IDEs a lot easier. Again, because each component now has a universal name, it will be possible to link jars to their source code once and for all.
  • the build process, describing as it does how the compiled code relates to the source, can be used by IDEs to jump to the source (also identified via URLs) when debugging a library on the network. (See some work I started on a bug ontology called Baetle.)
  • DOAP files can then be used to tie all these pieces together, allowing people to simply drag and drop projects from a web site onto their IDE, as I demonstrated with NetBeans.
  • as IDEs gain knowledge, from such DOAP files, of which components are successors to which other components, it is easy to imagine them developing RSS-like functionality: scanning the web for updates to your software components and alerting you to those updates, which you can then quickly test out yourself.
  • the system can be completely decentralised, making it a Web 3.0 system rather than a Web 2.0 system. It should be as easy as placing your components and your RDF files on a web server, served up with the correct mime types.
  • it will be easy to link up jars or source code (referred to, as usual, by URLs) to bugs (described via something like Baetle), making it easy to describe how bugs in one project depend on bugs in other projects. (A small sketch of this and the DOAP idea follows after this list.)

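As a rough illustration of the DOAP and Baetle points above, here is a minimal N3 sketch. The doap: terms come from the published DOAP vocabulary; the baetle: namespace and its property names are placeholders for whatever the Baetle ontology ends up defining:

@prefix doap:   <http://usefulinc.com/ns/doap#> .
@prefix baetle: <http://example.org/baetle#> .    # placeholder namespace standing in for Baetle

# a project description that an IDE could be handed by drag and drop
<http://myproject.org/doap#project> a doap:Project;
        doap:name "MyProject";
        doap:download-page <http://myproject.org/dist/>;
        doap:repository [ a doap:SVNRepository;
                doap:location <http://myproject.org/svn/trunk/> ] .

# a bug in this project that depends on a bug in a library it uses (property names assumed)
<http://myproject.org/bugs/42> a baetle:Issue;
        baetle:dependsOn <http://otherlib.org/bugs/1337> .
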
So those are just a few of the advantages that a hyperdata-based build system could bring. They seem important enough, in my opinion, to justify exploring this in more detail. OK, let me try something here. When compiling files one needs the following: a classpath and a number of source files.

@prefix java: <http://rdf.sun.com/java/> .
@prefix : <#> .    # default prefix for the couple of local relations used below

# the classpath is an ordered list of components, each named by the URL it can be fetched from
_:cp a java:ClassPath;
       java:contains ( <http://apache.multidist.com/cocoon/2.1.11> <http://openrdf.org/sesame/2.0/> ) .

# the jar to be produced: built from the (relative) source directory against that classpath
_:outputJar a java:Jar;
       java:buildFrom <src>;
       java:classpath _:cp .

# where the output should end up, and where a full list of past outputs can be found
_:outputJar
        :pathtemplate "dist/${date}/myprog.jar";
        :fullList <outputjars.rdf> .
If the publication mechanism is done correctly, the relative URLs should work on the file system just as well as they do on the HTTP view of the repository. Making a jar would then be a matter of some program following the URLs, downloading all the pieces (if needed), putting them in place and using them to build the code. Clearly this is just a sketch. Perhaps someone else has already had thoughts on this?

Comments:

Hello, Henry and readers.

This issue is of central importance to software development. I've seen companies, here in Brazil, with a team dedicated only to library maintenance, and with big problems in this work.

There are some initiatives in other languages, like Python eggs (http://peak.telecommunity.com/DevCenter/PythonEggs) or Ruby gems (http://www.rubygems.org/), that try to address lib dependency, download, etc.

One problem I've seen with these tools is when you try to use different versions of libraries. I think the RDF model should focus on this issue too.

Posted by mario h.c.t. on February 06, 2008 at 05:27 AM CET #

"Would it not be better if I referred to these libraries by URL instead?"

For a solution right now, maybe svn:externals does partially what you want?
(http://svnbook.red-bean.com/en/1.4/svn.advanced.externals.html)

Posted by Ewout ter Haar on February 06, 2008 at 10:18 AM CET #

Ewout, thanks for pointing out that very nice Subversion functionality. I'll see if I can make use of it in my projects...

Does svn:externals require everyone you link to to be using Subversion, or to publish using WebDAV? That would make it a bit less likely to provide a global solution to the problem, even if it is a very neat one. URLs and HTTP are the backbone of the web, so RDF is very easy to deploy. WebDAV is a lot more complex.

In some ways the two solutions are very close: RDF is often thought of as a metadata format (it is really a hyperdata format), and as such it can do what WebDAV does, namely specify properties of resources. This is what Subversion makes use of: it finds a piece of metadata on a resource (a directory) which the client (svn) then uses to direct its subsequent behaviour. I think I am correct in saying that the property values in WebDAV metadata always have as their subject the URI being queried (PROPFINDed). So we can think of WebDAV as giving us RDF back (in a particular format), but limited to what the subject introspectively knows about itself. And we all know that what we introspectively know about ourselves is only a small portion of what there is to be known about us; witness the size of the population of people making their living off psychotherapy (I have been reading way too much on this subject recently).

So RDF allows resources not just to talk about themselves but also about the way they see other things being related. Here I am thinking a build system may be defined in a very abstract way as requiring certain resources which, combined in a certain way, can create new resources, each of these identified by URLs and described in terms of declarative relations to each other.

Another way of looking at this is that the svn:externals property is a way of allowing software to shoehorn the web into a tree. This is OK for software that is built for a local system, where the assumption is indeed correct. But it seems like it could end up being restrictive.
In what way? Well, here is a simple example: it can make reasoning about components more difficult, in part because it ends up giving new names to components instead of using their real names. So instead of using source code in <http://yourcode.com/src/> and knowing that this code has a specific bug database, owner, developers, translators, and depends on this other project <http://yourfriend.org/project>, ... the tools you are working with will think they are working with <http://yourcompany.com/src/myfriendsmapped/>. So there is a lot of information they could be losing here. For one, your IDE may already have built that component previously, and so have no need to build it again.

Also svn:externals is just one piece of metadata. It says, in effect:

<> svn:continues <http://other.com/overthere> .

What the build system I am proposing should do is a lot more. It should be able to specify relationships between software components:

- this component was used to produce this component
- this component was compiled from this source
- this component was transformed in this way

and the nature of the components:

- this component is a JAR
- this component is source code
...

And all of that should link up to project information, which links up to mailing lists, people, RSS feeds, companies, and so on.
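
To make that a bit more concrete, here is a minimal N3 sketch of the kind of statements I have in mind. Apart from the doap: terms, every class and property name below (and the java: namespace itself, reused from the post) is made up for illustration:

@prefix java: <http://rdf.sun.com/java/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .

# what the jar is, what it was compiled from, and what was used to produce it
<http://myproject.org/dist/myprog.jar> a java:Jar;
        java:compiledFrom <http://myproject.org/src/>;
        java:compiledAgainst <http://apache.multidist.com/cocoon/2.1.11>;
        java:partOf <http://myproject.org/doap#project> .

<http://myproject.org/src/> a java:SourceCode .

# the project then links on to mailing lists, people, feeds, companies, ...
<http://myproject.org/doap#project> a doap:Project;
        doap:mailing-list <http://myproject.org/lists/dev> .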

Posted by Henry Story on February 06, 2008 at 11:11 AM CET #

This is something I might explore if I were to hit a brick wall with Ant, or with whatever build tool I am already using.

Let us know if your proposal yields something that solves your challenges with builds.

Posted by enrique on February 09, 2008 at 06:26 AM CET #

one possible way to achieve this: 'semanticize' Maven's POM (project object model). I.e., write a new Maven front-end that interprets (local or remote) RDF project metadata instead of the XML that it currently uses.

then add extra Maven plugins that take advantage of the RDF config to infer possible "actions" in the presence of the activated plugins: compile, package, install, deploy, etc...

you can look at the POM's interface in Maven's source code - it is IMO a reasonable and extensible model for any kind of software development project.

i'm looking forward to a convergence of Maven, Spring, and OSGi -- all controlled by a unified development environment that makes any dev process "simple" (describing everything in the simplest words possible).

Posted by Reri Uwerj on February 09, 2008 at 01:05 PM CET #

Continuing on the idea of semanticising Maven POMs, those who have time to look at this should check out GRDDL [1]. GRDDL allows you to think of existing XML files as encodings of RDF.

As far as Maven is concerned, I think what I am suggesting goes beyond it in that there would be no need for a central repository of libraries (though of course, just as on the web we have archive.org, someone could set up an archive service with a SPARQL endpoint to help people find older versions of libraries).
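
To give a rough feel for what such a 'semanticised' POM might look like, here is a fragment in N3. The pom: namespace and its term names are pure invention, not anything Maven defines, and the dependency is simply named by the URL it can be fetched from:

@prefix pom: <http://example.org/maven/pom#> .    # invented namespace, for illustration only

<http://myproject.org/pom> a pom:Project;
        pom:groupId "org.myproject";
        pom:artifactId "myprog";
        pom:version "1.0";
        pom:dependsOn <http://openrdf.org/sesame/2.0/> .    # no central repository needed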

For those who do feel like exploring an RDF build system, here is how I would go about it:
1. Start with a very simple build example.
1.1 Write out the declarative relations in N3 that a build system would need to build it (something like what I did above).
1.2 Read such a file with Sesame, map it to Java objects with so(m)mer [2], and use that information to build your project locally.
1.3 Make sure you set up your build file with relative URLs in such a way that it works when you build your tool from the locally checked-out directory, but also when built by downloading the same file from the web over HTTP. I.e. the links should work both locally and remotely. An enhanced Java compiler should be able to download all the .java files from the web and compile them just as if they had been accessed locally.
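
As a sketch of what 1.3 means in practice (reusing the made-up java: vocabulary from the post), the idea is that the same description resolves correctly whether it is read from a local checkout or over HTTP, because only the base URL changes:

@prefix java: <http://rdf.sun.com/java/> .

# <src/> resolves relative to wherever this file was read from, e.g.
# file:///home/me/myproject/build.n3 or http://myproject.org/build.n3
<#out> a java:Jar;
        java:buildFrom <src/>;
        java:classpath [ a java:ClassPath;
                java:contains ( <http://openrdf.org/sesame/2.0/> ) ] .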

2. Having gotten the above to work, increase the complexity of the example in 1 and start again. At this point it would probably be worth looking at reusing Ant modules or Maven POMs.
2.1 Start publishing an ontology of the relations you are finding useful at the location of their URLs, so that people who browse the RDF build files you are creating can understand the meaning of the elements of the file simply by clicking on the relations or classes themselves, just as, if you want to know the meaning of foaf:knows, you can click on http://xmlns.com/foaf/0.1/knows. (Use #urls; they are easier to set up. A tiny sketch of such an ontology file follows after 2.2.)
2.2 Start an open source project where other people can participate and give feedback. (Don't expect huge numbers of people to come and help out; even a little feedback can be helpful. I certainly will try to.)
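
Here is a minimal sketch of the kind of ontology file 2.1 has in mind, published at the namespace URL with # fragments; all the class and property names are assumed:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix : <#> .    # if this file is published at http://example.org/build, then :Jar is http://example.org/build#Jar

:Jar a owl:Class;
        rdfs:comment "A Java archive produced by a build." .

:ClassPath a owl:Class .

:classpath a owl:ObjectProperty;
        rdfs:domain :Jar;
        rdfs:range :ClassPath;
        rdfs:comment "Relates a build product to the classpath it was compiled against." .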

3. See if one can get this to work where one RDF build file points to another build file. Where does this make sense? Where not? Can all of the above then be linked to from a DOAP file?
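
A hedged sketch of what step 3 might look like; both the java:imports relation and the property linking a DOAP project to its build file are invented for illustration:

@prefix java: <http://rdf.sun.com/java/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .

# this project's build description pulls in that of a project it depends on (relation name assumed)
<http://myproject.org/build.n3> java:imports <http://yourfriend.org/project/build.n3> .

# and the project's DOAP description points at its build file (property name assumed)
<http://myproject.org/doap#project> a doap:Project;
        java:buildFile <http://myproject.org/build.n3> .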

4. If the above seems to be working out and proving to be useful, it may then be worth creating XSLT GRDDL transforms that could turn normal Ant or Maven files into such RDF build files. It should then be obvious that those are just variant notations for this new hyperdata build system being put together. What this should show, I suppose, is that Ant and Maven were moves in the right direction down the declarative route, but that they did not take advantage of the web of data, and so did not take advantage of Metcalfe's law.

[1] http://www.w3.org/2001/sw/grddl-wg/
[2] http://openrdf.org/ (Sesame), https://sommer.dev.java.net (so(m)mer)

Posted by guest on February 11, 2008 at 03:14 AM CET #
