Webifying Integrated Development Environments

IDEs should be browsers of code on a Read Write Web. A whole revolution in how to build code editors is, I believe, hidden in those words. So let's imagine it. Fiction anticipates reality.

Imagine your favorite IDE, a future version of NetBeans perhaps, or IntelliJ, that makes downloading a new project as easy as dragging and dropping a project URL onto your IDE. The project home page would point to a description of the location of the code and of the project's dependencies on other projects, themselves described via URL references and set up in the same manner. Let's imagine further: instead of downloading all the code from CVS, think of every source code document as having a URL on the web. (Subversion is in fact designed like this, so this is not so far-fetched at all.) And let's imagine that NetBeans thinks about each software component primarily via this URL.
Since every piece of code and every library has a URL, the IDE would be able to use the RESTful architectural principles of the web. A few key advantages of this are:

  • Caching: a core feature of web architecture is the ability to cache information on the network or locally without ambiguity. This is how your web browser works (though it could work better). To illustrate: once a day Google changes its banner image. Your browser, and every browser on earth, fetches that picture only once a day, even if you do 100 searches. Does Google serve one image to each browser? No: numerous caches (company, country, or other) hold that picture and send it to the browser without passing the request all the way to the search engine, reducing the load on Google's servers very significantly. (A sketch of the mechanism follows this list.)
  • Universal names: since every resource has a URL, any resource can relate in one way or another to any other resource, wherever it is located. This is what enables hypertext, and it is what is now enabling hyperdata.
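
To make the caching point concrete, here is a minimal sketch, in Java, of the conditional GET mechanism that web caches are built on. The jar URL is made up, and I am assuming the server sends an ETag header; this is a sketch of the principle, not of any existing IDE feature.

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ConditionalGet {
        public static void main(String[] args) throws IOException {
            // a hypothetical jar published plainly on the web
            URL url = new URL("http://project.eg/svn/lib/junit-4.0.jar");

            // first fetch: the server returns the full representation plus an ETag
            HttpURLConnection first = (HttpURLConnection) url.openConnection();
            String etag = first.getHeaderField("ETag");
            first.getInputStream().close(); // the body would be written to the local cache here

            // later fetches: present the ETag; a 304 means the cached copy is still good
            HttpURLConnection second = (HttpURLConnection) url.openConnection();
            second.setRequestProperty("If-None-Match", etag);
            if (second.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                System.out.println("cache hit: no bytes transferred");
            } else {
                System.out.println("resource changed: refresh the cached copy");
            }
        }
    }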

Back to the IDE. Now that all code and all libraries can be served up RESTfully in a Resource Oriented Architecture, what does this mean for the IDE? Well, a lot. Each advantage may seem small, but together they pack a huge punch:
  • No need to download libraries twice: if you work on open source projects at all frequently, you must have noticed how often the same libraries turn up in each project you download. Apache's logging library is a good example.
  • No need to download source code: it's on the web! You therefore don't need a local cache of code you have never looked at. Download what you need when you need it (and then cache it!): the Just in Time principle.
  • Describe things globally: since you have universal identifiers, you can describe how source code relates to documentation, to the people working on the code, or to anything else, in a global way that is valid for everyone. Just describe the resources. There is a framework designed for exactly that, RDF, and it is very easy to use with the right introduction; a small example follows this list.
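
Here is a small sketch of what such a global description might look like, using the Jena RDF framework (more on Jena in the comments below). The code: vocabulary and the project.eg URLs are made up, exactly as in the Turtle snippet further down; foaf:maker is real.

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Property;
    import com.hp.hpl.jena.rdf.model.Resource;

    public class DescribeGlobally {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // a made-up code ontology, and the real foaf namespace
            Property documentedBy = model.createProperty("http://todo.eg/#", "documentedBy");
            Property maker = model.createProperty("http://xmlns.com/foaf/0.1/", "maker");

            // every resource is named by its URL on the web, never by a local path
            Resource source = model.createResource("http://project.eg/svn/src/Main.java");
            source.addProperty(documentedBy, model.createResource("http://project.eg/doc/Main.html"));
            source.addProperty(maker, model.createResource("http://project.eg/people/henry#me"));

            // statements in a global namespace: valid for everyone, publishable anywhere
            model.write(System.out, "TURTLE");
        }
    }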

The above advantages may seem rather insignificant. After all, real developers are tough. They use vi. (And I do.) So why should they change? Well, notice that they also use Adobe AIR or Microsoft Silverlight. So productivity considerations do in fact play a very important role in the software ecosystem.
Don't normal developers just work on a few pieces of code? Speaking for myself, I have 62 different projects in my /Users/hjs/Programming directory, and in each of these I often have a handful of project branches. As more and more code becomes open source, owned and tested by different organizations, the number of projects available on the web will continue to explode, and thanks to the laziness principle the number of projects using code from other projects will grow further still. Already whole operating systems consisting of many tens of thousands of modules can be downloaded and compiled. The ones I have downloaded are just the ones I have had the patience to get, because getting them usually means jumping through a lot of hoops:

  1. I have to find the web site of the code, and I may only have a jar name to go by. Google helps, but that is a whole procedure in itself that should be unnecessary. If you have an image in your browser, you can find out where it is located by right-clicking over it and selecting the URL. Why not so with code?
  2. Then I have to browse a web page, which may not be written in my language, to find the repository of the source code.
  3. Then I have to find the command line to download the source code, or the corresponding command in the IDE, and also somehow guess which version produced the jar I am using.
  4. Once the code is downloaded, which can take some time, I may have to work out the build procedure. There are a few build systems out there; luckily Ant and Maven are catching on, but some of these build files can be very complicated to understand.
  5. Then I have to link the source code on my local file system to the jar on my local file system that my project is using. In NetBeans this is exceedingly tedious; sometimes I have found it to be close to impossible. IntelliJ has a few little tricks to automate some of this, but it can be pretty nasty too, requiring jumping around different forms, especially if a project has created a large number of little jar files.
  6. And then all that work is only valid for me. Because all the references are to files on my local file system, they cannot be published. NetBeans is a huge pain here in that it often creates absolute file URLs in its properties files. By replacing them with relative URLs one can publish some of the results, but at the cost of copying every dependency into the local repository, and working out what is local and what is remote can take up a lot of time. The result works on my system, but not on anyone else's.
  7. Once that project is downloaded, one may discover that it depends on yet another project, and so it is back to step 1.

Doing all of the above currently causes me huge headaches, even for very simple projects. As a result I do it far less often than I could, missing valuable opportunities. Each time I download a project in order to access the sources while stepping through my code to find a bug, or to test out a new component, I have to go through the whole download rigmarole described above. If you have a deadline, this can be a killer.

So why do we have to tie together all the components on our local file system? Because our IDEs do not refer to resources with global identifiers. The owner of the junit project should be able to say somewhere, in his doap file perhaps, something like:

 
   @prefix java: <http://java.net/ont/java#> . #made this up
   @prefix code: <http://todo.eg/#> .

   <http://project.eg/svn/lib/junit-4.0.jar> a java:Jar;
         code:builtFrom <http://junit.sourceforge.net/> .

   #what would be needed here needs to be worked out more carefully. The point is that we don't
   #at any point refer to any local file.

This future IDE we are imagining together will know that it has stored a local copy of the jar somewhere on the local file system, and it will know where it placed the local copy of the source code, so it will also know how the cached jar relates to the cached source code. Just as you do not have to do any maintenance, when you click on a link in your web browser, to find out where the images and HTML files are cached on your hard drive and how one resource (your local copy of an image) relates to the web page, so we should not have to do any of this type of work in our Development Environment either.
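
Here is a minimal sketch of that bookkeeping: the URL remains the primary name of every resource, and the local file is merely a cache entry derived from it. The cache directory and the file-naming scheme are made up, and a real implementation would also honour the HTTP caching headers discussed above.

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import java.net.URLEncoder;

    public class ResourceCache {
        private final File cacheDir = new File(System.getProperty("user.home"), ".ide-cache");

        // map a web URL to a deterministic local path, fetching it on first use
        public File fetch(URL url) throws IOException {
            cacheDir.mkdirs();
            File local = new File(cacheDir, URLEncoder.encode(url.toString(), "UTF-8"));
            if (!local.exists()) {
                InputStream in = url.openStream();
                OutputStream out = new FileOutputStream(local);
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    out.write(buf, 0, n);
                }
                out.close();
                in.close();
            }
            return local; // the IDE never needs to show us this path
        }
    }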

From here many other things follow. A couple of years ago I showed how this could be used to link source code to bugs, to create a distributed bug database. Recently I showed how one could use it to improve build scripts. Why even download a whole project if you are just stepping through code? Why not simply fetch the code you need, when you need it, from the web, one HTTP GET at a time? The list of functional improvements is endless. I welcome you to list some that you come up with in the comments section below.

If you want to make a big impact in the IDE space, that will be the way to go.

Comments:

Just wondering what you thought of Maven and whether or not it might be headed in the direction you're describing?

Posted by Adam Jordens on June 24, 2008 at 07:11 AM CEST #

Hi Adam,

You are quite right to point to Maven. I have not used Maven enough myself, so do feel free to correct me. I think it clearly constitutes a step in the right direction. Furthermore it should integrate well into the vision I am drawing up here.

For one, Maven XML files could be GRDDLable [1] into RDF, so there is no problem there. Maven also uses the network intelligently, and as such has become very popular. What I am proposing is much more of the same: much more network, much more distributed.
For one thing, one should not need a central repository at all - though I think Maven tends to work that way. At least, architecturally nothing should push in that direction. I can of course see that a very successful service could emerge that came to be used by most people (as Google has for the web) because it offered such a valuable service.

For example, the jar files could easily be distributed around the web, on different servers, just as web pages are. You can embed a YouTube video in your blog, and there never was a technical problem doing that (yet YouTube hosts a very large percentage of online videos).

An IDE would have to support Maven, and would be foolish not to. But it should still deal with the source code downloaded from a Maven repository the way a web browser deals with pages, images, code in all other languages, and other content: namely, by reference, using URLs. The downloaded representation on the local hard drive should be thought of as a cache of the remote version. This would then allow the IDE to describe the items it is working with in a global namespace, allowing people to describe the relationship of their project to other resources such as people (using foaf), bugs, source code, documentation, specifications, blog posts, images on the web, real world objects, etc., etc....

Using RDF frameworks such as Sesame or Jena, developers now have the tools to build such a layer cleanly.
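
As a small sketch of what that layer could do, here is Jena dereferencing a project description published on the web and following its links to find where each jar's source lives. The project URL and the code:builtFrom property are made up, as in the blog post above.

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Property;
    import com.hp.hpl.jena.rdf.model.RDFNode;
    import com.hp.hpl.jena.rdf.model.Statement;
    import com.hp.hpl.jena.rdf.model.StmtIterator;

    public class ProjectBrowser {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            // dereference a (made-up) project description published on the web
            model.read("http://project.eg/project.rdf");

            // follow the builtFrom links to find the source for each jar
            Property builtFrom = model.createProperty("http://todo.eg/#", "builtFrom");
            StmtIterator it = model.listStatements(null, builtFrom, (RDFNode) null);
            while (it.hasNext()) {
                Statement s = it.nextStatement();
                System.out.println(s.getSubject() + " was built from " + s.getObject());
            }
        }
    }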

In short, unless I am debugging my IDE, or unless I need to work disconnected, I would like never to have to know where on the hard drive my code is located, or even whether it is there at all! (Of course it should be very easy to find the code locally, so that one could use command line tools on it and do all the other things we developers like doing. The problem is that we developers tend to be misled by this desire to work with code locally into forgetting the global ecosystem it fits into, and so lose sight of the power of the network.)

[1] http://www.w3.org/2001/sw/grddl-wg/

Posted by Henry Story on June 24, 2008 at 08:07 AM CEST #

This is what web2py does (almost). We are working on integration with svn, hg, bzr.

Posted by Massimo on June 24, 2008 at 11:58 AM CEST #

Massimo, can you detail a little how web2py does "almost" what this article is describing? It is really not obvious from your web site how it is related to what we are speaking about here. Do you perhaps have a more detailed section of the documentation that makes this clear? We are speaking about IDEs such as NetBeans acting as browsers here... I'll leave it up to you to tell us how this is similar. :-)

Posted by Henry Story on June 24, 2008 at 12:11 PM CEST #

Take a look at Seaside, the Smalltalk web development framework. When an error is thrown in debug mode, Seaside will put up the Smalltalk code inspector; you can browse code, make changes, and restart the methods, debugging and fixing the code on the fly in your browser. You really have great access to the Smalltalk development environment in the browser.

Posted by Tony Giaccone on June 24, 2008 at 12:26 PM CEST #

Hi Tony,

NetBeans and IntelliJ will also show stack traces in the debug window, containing references to source code. When a stack trace refers to code that you have downloaded and set up properly, as I explained in outline in steps 1-7 above, the reference will be a hyperlink to code **on your file system**. Clicking that hyperlink will open the source code in your IDE, at the correct line. I think every half decent IDE does that (and yes, they all took it either from Smalltalk or from emacs :-)

But that is not what I am looking for. If Seaside did the following I would be amazed, so please do point me to the docs if this feature exists:

What I would like it to do is to know where **on the web** the source code that was compiled into the jars (or dlls) is located, when it is not on the local file system. So that if I find a stack trace that refers to an Apache component (which does not happen often, btw), I could click on a line in that stack trace and the IDE would then do an HTTP GET to the URL at which the source code that compiled that class is located, and present it to me in my browser like any other source code.

Now that would be cool. Because then I would not have to download any source code other than what I work on a lot, yet it would seem to me as if all the source code in the world were at my fingertips, just as any web page in the world seems to be at my fingertips.
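
To make that wish concrete, here is a sketch of all the IDE would have to do once it knew the base URL of a component's source tree. The mapping from component to base URL is the missing piece, and it is exactly what descriptions like the Turtle snippet in the post above would supply; the Apache URL in the comment is only illustrative.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class WebSourceResolver {
        // e.g. base = "http://svn.apache.org/repos/asf/ant/core/trunk/src/main/"
        public static String sourceFor(StackTraceElement frame, String base) throws IOException {
            String path = frame.getClassName().replace('.', '/');
            int inner = path.indexOf('$');
            if (inner != -1) path = path.substring(0, inner); // inner classes share a source file
            URL url = new URL(base + path + ".java");

            // one HTTP GET for just the file the stack trace points at
            StringBuilder text = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            for (String line; (line = in.readLine()) != null; ) {
                text.append(line).append('\n');
            }
            in.close();
            return text.toString(); // the IDE would open this at frame.getLineNumber()
        }
    }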

Posted by Henry Story on June 24, 2008 at 12:48 PM CEST #

Hi Henry, by "almost" I actually meant "opposite". web2py provides an app that works as a web based IDE. I will do even more so in the future when we finish integration of the AMY js editor and versioning. You can access your project on your production or development server anywhere and develop remotely. I realize this is not quite what you are discussing here but it seemed relevant.

Posted by Massimo on June 24, 2008 at 01:02 PM CEST #

Ah, ok Massimo, so web2py is a web based IDE. Neat! I think I have heard of one other project like this.

Now I suppose that for such an application it would also be very interesting if you could automate the process of finding and downloading projects and their sources, and of linking them to binaries, documentation, people, and bugs in the way described in this blog post: in a global way. Right? Any of the above would probably make it a lot easier to work with your application.

Given that you are working with a web interface, though, I imagine it is much more natural for you to think in terms of URLs.

So will the code I can write with your service only be able to work with other code available on that service, or will it be able to download and work with code anywhere in the world?

In any case I think some of what I am talking about here should be very useful to a service like yours.

Posted by Henry Story on June 24, 2008 at 01:29 PM CEST #

Yes this is very interesting. web2py is a framework for web application development and so the web based IDE it comes with only works with the underlying Python language but you are giving us food for thought. In fact the future of web2py is in interoperable plugins that can be referenced via URLs. We are looking into writing specifications on how to do that but we are not quite there yet.

Posted by Massimo on June 24, 2008 at 01:42 PM CEST #

Massimo wrote:

>In fact the future of web2py is in interoperable plugins that can be referenced via URLs. We are looking into writing specifications on how to do that but we are not quite there yet.

Well it would be very useful if we could work out how to do this in a language independent manner (as far as possible). There is some work going on here:

http://baetle.googlecode.com/

It would be good if you added your use case to the stack there, by posting some of this to the mailing list. There are links there to other projects that are developing ontologies for source code.

I suppose the best known for the moment is the doap ontology

http://usefulinc.com/doap

And the doapspace project

http://doapspace.org/

It is all still early days, but the community is growing and interest is developing. :-)

Posted by Henry Story on June 24, 2008 at 01:53 PM CEST #

ok, thanks. I will look into it.

Posted by Massimo on June 24, 2008 at 02:10 PM CEST #

>But that is not what I am looking for. If Seaside did the following I would be amazed, so please do point me to the docs if this feature exists.

Here you are:
http://www.swa.hpi.uni-potsdam.de/seaside/tutorial?chapter=2#part6

I really recommend this tutorial in lulu.com book form!

Posted by Redmar Kerkhoff on June 24, 2008 at 02:58 PM CEST #

Redmar Kerkhoff wrote:
> Here you are:
http://www.swa.hpi.uni-potsdam.de/seaside/tutorial?chapter=2#part6

Redmar, I looked at that and I see a URL that points to a repository of packages. That is neat.
NetBeans has something similar at
http://plugins.netbeans.org/PluginPortal/faces/MainPage.jsp

I don't think that is so unusual, though it seems the Seaside packages contain source and binaries (though perhaps Smalltalk does not make use of binaries, which would of course help solve one problem, though at the cost of others appearing in different places).

NetBeans also, I think, allows one to have packages that link source code to binaries, as I can download the NetBeans source code package from the IDE.

What I am looking for is something even more distributed than that. The project I am working on myself at https://sommer.dev.java.net/AddressBook.html is built on quite a few other projects that themselves have dependencies on yet further projects. What I would like is for people to be able to drag that project onto their IDE, and for the IDE to know where to get all the components **wherever they are on the web**. So there should be no need for them to be in a single repository: they should be available in many different repositories, and similarly the source code should be able to be distributed around the web. The IDE should know where on the web the source of each file in each component is, so that it could find previous versions of each file automatically, so that it could find the authors wherever they place their foaf file, so that it could find their friends, so that it could find bugs that were posted against that file, and find the dependencies of that bug on other projects, wherever those are.

So on the plus side, Seaside uses a URL to identify a repository. But does it use URLs to identify each piece of code, where that URL is not something like http://localhost:8080/... but the location on the web of that source file? Would it allow some source files to be on http://openrdf.org/, others on http://apache.org, and yet others on http://dev.java.net/ ?

I am not saying it does not. It may. I am looking for expert guidance here.

Henry

Posted by Henry Story on June 24, 2008 at 04:30 PM CEST #

everything you've mentioned is covered by maven.

you can open almost any maven pom.xml and intellij/netbeans/eclipse will download the source for the project, download dependencies, and possibly provide you with deployment tasks. for dependencies, the first thing that gets downloaded and put into your repository (local cache) is the pom, so you can iterate through the cycle to your heart's content.

you can commit back to the repository as well. this source r/w works for a wide variety of SCM software.

you can tell maven to download source-jars, which will then automatically be used, and you can use maven to build source-jars. it makes indexing the source in debugging automatic. it's not "on the web" as you seem to want, it's in the source-jar, which maven can fetch automatically.

in lieu of http addresses, maven uses artifact ids that should mirror a java project's package names. these should be global (eg, org.apache.Foo).

downloaded copies are thought of as a cache of the global unique identifier.

maven does not have to be centralized. you can hit many servers. your dependencies can be anywhere. you can include centralized repositories and explicitly tell maven not to use them for certain packages. you can include a long list of repositories, one for each package. whatever.

maven won't let you get projects and their source **wherever they are on the web**, but i cannot fathom what on god's green earth would mandate that kind of a requirement. the centralized repositories are very up to the minute, and if that's not good enough, maven will let you check out the source for any project directly from the project's scm and do whatever you want with that checkout, including building it and its source jar, and installing them both into the maven repository.

there are also good artifact repositories, if you want more tailored configurations. at work we all point to a single local repository that caches (on demand) a number of external repositories and collects internal releases. our artifact repository makes it easy to maintain a convenient and customized collection of dependencies.

you never have to know where your dependencies are in maven. (and the answer is always ~/.m2/repository/).

PLEASE stop calling things web-based IDEs unless they are IDEs implemented in the browser.

Posted by rektide on June 24, 2008 at 07:53 PM CEST #

Hi rektide,

thanks for your lengthy response. Clearly I need to master the architecture of Maven a lot better, so I will look into this. A few of the projects I am working on do use Maven poms; I'll try using them and see how much that helps.

> in lieu of http addresses, maven uses artifact ids that should mirror a java projects package names. These should be global (eg, org.apache.Foo).

Ok, so we have URNs here rather than URLs. This probably explains why there are a limited number of repositories, as they are probably playing the role of a lookup service tying names to locations.

> downloaded copies are thought of as a cache of the global unique identifier.

... So Maven could be thought of as the caching layer...

I suppose I was thinking some of this should be closer to the IDE itself. (That would not exclude it being in Maven too.) My feeling is that current IDEs don't know where to get resources that are not available and defined on the file system, in projects that have been imported and fully specified. So currently, when a stack trace digs into a jar whose code I have not yet downloaded, I don't know of any IDE that would fetch the source code off the web.

But that may be because I have not been using IDEs in Maven mode...?

> maven won't let you get projects and their source **wherever they are on the web**, but i cannot fathom what on god's green earth would mandate that kind of a requirement.

Perhaps the "wherever they are on the web" was misleading on my part. I am just suggesting one use http URLs to locate resources. After all that is how we can get most other resources. So in some sense I am wondering why one would need a central repository at all. After all, publishing a jar behind an Apache web server is not very difficult. Same thing for source code. Subversion even builds that into their core.

Perhaps these are some thoughts for Maven 3 :-)

> PLEASE stop calling things web-based IDEs unless they are IDEs implemented in the browser.

I did a quick search on this page, and only found us using the words "web based ide" when talking about web2py, which is indeed implemented in the browser on the web.

Posted by Henry Story on June 25, 2008 at 02:32 AM CEST #

Oh well, if I am going to look at Maven I will have to look at Apache's Ivy project too:

- http://ant.apache.org/ivy/
- http://ant.apache.org/ivy/m2comparison.html

Posted by Henry Story on June 25, 2008 at 04:21 AM CEST #

Talking about IDEs and the web, I had this idea of a Jini-based IDE back in 2001, see http://blogs.codehaus.org/people/vmassol/archives/000530_jini_ide_is_it_the_future.html

Maybe it's relevant to this discussion (which to be honest I haven't fully read...) or maybe not...

Posted by Vincent Massol on June 25, 2008 at 04:50 AM CEST #

Erling Wegger Linde on the baetle mailing list pointed out the following maven-to-doap plugin:

http://maven.apache.org/plugins/maven-doap-plugin/

Henry

Posted by Henry Story on June 29, 2008 at 04:55 AM CEST #

"No need to download source code: it's on the web! You don't therefore need a local cache of it. Use what you need when you need it: the Just in Time principle."

Hmm, maybe I don't get it, but if you do that you create a single point of failure. HTTP caching should be on, and the application must download *only if files have changed*. With popular web resources you would otherwise create a huge traffic bottleneck, and in the end something similar to a DOS attack.

See http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic

But maybe I didn't get what you were saying.

(ps: your comment form doesn't work if cookies are not on.)

Posted by Karl Dubost, w3c on July 15, 2008 at 07:24 PM CEST #

Thanks Karl for the link to the W3C excessive DTD traffic post; that is a very interesting story.

Of course I had no intention of suggesting that one should not cache information one has downloaded, nor that one should not work carefully with the HTTP headers. That is where libraries such as Restlet [1] are really important in simplifying the work of developers, btw, as they make the semantics of the responses very easy to access.

My point is rather that there is no need to download all the source code for a project ahead of time, regardless of whether it is needed. Currently, if I want to view a particular file in the Apache Ant project, I have to download ALL the source code for Ant. This is not a problem for the core code I am working on most often, but as often happens when looking for a bug, one ends up needing to step through some non-core library. One may only do this once in the time it takes for that library to be upgraded. So having to go through the lengthy process of downloading all the source code of a repository just to see a couple of files clearly makes good use neither of the experience of the W3C TAG nor of developer time.

But I can understand how what I said could be misleading. I'll fix it to say:

"No need to download source code: it's on the web! You don't therefore need a local cache of code you don't need. Download what you need when you need it (and then cache it): the Just in Time principle."

PS. Thanks for noticing the comment and cookie problem. I'll pass that on to the Roller developers.

[1] http://www.restlet.org/

Posted by Henry Story on July 16, 2008 at 01:38 AM CEST #

I just started using Maven, and found out that it can download the javadocs with one command:

$ mvn dependency:resolve -Dclassifier=javadoc

I still feel that Maven could be made a lot more flexible if it had taken a Linked Data approach, but I have to say that this is extremely helpful.

Posted by Henry Story on January 26, 2009 at 04:15 AM CET #

Hello,
I just started a PHP web based IDE.

So I thought it would be appropriate to post it here, considering the topic. :)

Anyway it is located @ www.phpanywhere.net

Try it out and let me know what you think.

Posted by Ivan on June 28, 2009 at 05:59 AM CEST #
