Thursday Sep 25, 2008

Want to Feel Like Royalty? Join an Open Source Project.

Every open source project talks about how much they want your contributions. But do they really mean it? If you submit a patch, will they puke all over your work because they would have written it differently? Or because you indented your code with three spaces instead of four? Or just because you don't work for the right company?

Maybe. But not in most projects. I can guarantee that it won't happen in the area where I work (Project SocialSite). And I honestly think the same is true for most of Sun's other open source projects.

Why? Because these things shouldn't be Sun's open source projects. They should be open source projects in which Sun happens to be a very active participant. I think that most people at Sun understand and agree with that sentiment. So we'll bend over backwards to support outside contributions. Again, using SocialSite as an example, we would love to see any of the following coming from people who don't work for Sun:

  • Bug and RFE Reports
    • You just need an ID to submit a bug or RFE in our Issue Tracker
  • Code Submissions
  • Wiki Updates
    • Anyone with an ID can create and edit content on the SocialSite Wiki
  • Outreach
    • Mention us in a blog or a discussion forum--anywhere our project might be of interest

And when I say we'll bend over backwards to support you, I mean it. If your contribution could benefit from some changes, we'll work with you to make them. If you need more information before you can contribute, just ask and we'll provide it. Or if your goal is to become a committer, we'll help you through the process.

One thing we can't do is suspend the rules. But the rules are simple and they serve a purpose. To become a committer, you first need to sign a Sun Contributor Agreement (SCA) and then submit a patch or two. That's pretty standard stuff in the world of open source. The SCA ensures that Sun has the legal rights to protect the project and its source code in court if necessary. And the patches don't have to be huge. They just need to be a positive change and demonstrate that you have a basic understanding of the project's code.

So please, put me to the test. Find something in SocialSite that you think could be better, and submit a patch. Or edit the Wiki. Or open a bug. And if we don't give you the support you need, let me know. It'll be my personal mission to find out why we failed and make sure it never happens again.

Monday May 12, 2008

Video: Project SocialSite In One Minute

As Dave noted a few days ago, we have a video that demonstrates Project SocialSite (by turning your MediaWiki instance into a social networking system in 1 minute, 8 seconds). I'm sure you'll all want to give the video a top rating (which helps us in a contest for the Enterprise 2.0 Conference). So I wanted to remind everyone that today is the deadline to submit your ratings. :)

Thursday May 24, 2007

Slynkr on JavaDB

The first question my boss asked after I got the Slynkr code released as a project: where can I file an RFE for it to install with one click using a JavaDB back-end? Then, as if on cue, the first outside message on the project's dev aliases followed suit: "What are your thoughts about other support for other databases (ie, MySQL and/or Postgres)?"

I may be slow, but I think I see the message. People might just want to use a database other than Oracle (which is what we used for our initial development of Slynkr). Well, guess what? You can do it.

It's not yet a one-click install process (sorry, Eduardo). And I personally haven't yet tried things out with MySQL or PostgreSQL (sorry, Nick). But I do now have instructions for running Slynkr using a JavaDB back-end (aka Apache Derby). If you're wanting to get your own instance of Slynkr up and running, this is currently your best bet. After all, Oracle is a nice database but it's a lot of overhead for just trying something out.

So please, give it a shot. If you run into problems, let me know. Or if you see ways to make things better, update the instructions (that's why they're on a Wiki, after all).

Oh, and by the way... I think that at least MySQL support should be pretty easy also. Here at Sun, we actually have an internal Slynkr instance which is using MySQL. I just don't have instructions for it (yet).

Monday May 21, 2007

Slynkr: Open Source System for Social News, Bookmarking, and Tagging

Hello everybody out there reading I'm doing a (free) social news and bookmarking system (just a hobby, won't be big and professional like Digg and It's been brewing for quite a while now, but I just recently got Sun's okay to release it as an open source project.

I hope it isn't too presumptuous of me to base that intro on another project's announcement, but I think it illustrates what Slynkr is all about: sharing and building on each others' work. When you submit something to a social news or bookmarking system, you almost always start with a URL for content which someone other than you wrote. So you build on their work by adding some kind of summary or commentary and a few keywords (tags). Then others come along and build on your work by adding their own tags and voting on whether they think the underlying item is any good. The more people contribute, the better the system becomes.

Of course, that's very similar to how open source software works. The more people contribute, the better the code becomes. That's what we're hoping to gain by making Slynkr open source. It's a usable system today (as you can see at or, but it certainly has room for improvement. We'd love to have your help and ideas to make those improvements happen.

I certainly haven't created Slynkr by myself. Many people have chipped in with ideas and support. These people have been especially involved:

  • Lou Ordorica: original idea, look & feel, and other follow-up ideas and support
  • Jeff Shoup: co-developed the Slynkr code
  • Erik Larson: conceptual input
  • Todd Wichers: hosting support for our instance

I'd also like to thank the people behind sites like flickr,, and Digg. Their development and popularization of practices such as tagging and user-controlled voting were obviously major influences for us in the development of Slynkr.

Thursday Apr 26, 2007

Building "SDN Share"

If you read my entry from earlier today, you know that we've launched a new program called SDN Share. It's a place where developers can share technical information with other developers and, in the process, earn some nice rewards (Amazon gift certificates).

Well, we've now received our first bit of outside feedback. It came from Alan McClellan, who posted this comment on our SDN Share blog:

This is a cool site. How did you build it? Is it home grown or did you use purchased or open source discussion/forum software?

Thanks, Alan. I'd been looking for an excuse to talk about this. :)

We implemented SDN Share by skinning and slightly modifying an existing piece of software called Slynkr. It's a Java implementation of a web-based service which allows anyone to submit items and then lets anyone else tag, vote, and comment on them (collectively forming what's often called a Social News or Social Bookmarking service).

Going back to the original question of whether this is home-grown or open source software... It's both. Okay, that isn't quite true--but I think it's safe to say that it will be soon. We're well into the process of getting Slynkr released as open source software. So you might want to keep an eye out wherever great open source Java software is created.

Sharing "SDN Share"

Developers are changing. Every day, it seems that interacting with the outside world becomes a larger and larger part of the job. We interact via open source projects, mailing lists, forums, blogs, journals, and more.

As developers change, it's only natural that tech companies' developer programs also change. In our case, that means the Sun Developer Network (SDN). It's always been a great program, providing a ton of information to developers. But one place where it could improve is in getting more information from developers.

That's why we're starting a new branch of this program, called SDN Share. In short, it's a place where developers can share technical content with other developers. Have a snippet of code that solves a common problem? Share it. A script which lets you avoid mundane tasks? Share it. An article which takes the mystery out of some new technology? That's right: Share it.

Once a submission has been accepted, anyone can add to it with some sharing of their own. They can comment on it, tag it, and vote on whether they think it's great or needs some work. This makes the best stuff float to the top, the Java stuff clump together with other Java stuff, and the occasional error get pointed out and corrected. You know--standard Web 2.0 and Participation Age stuff. Kudos to the Diggs, del.icio.uses, and Wikipedias of the world (and others) who came before us in popularizing and evolving these ideas.

I almost forgot the best part--the rewards. You receive points when your submission is accepted and when people vote for it. These rewards can be exchanged for cash which is deposited into a communal account shared by all SDN Share members. Then someday when enough cash has accrued, we will fulfill the longstanding dream of buying the world a Coke.

JUST KIDDING. This touchy-feely sharing stuff has to end somewhere, right? The rewards are Amazon gift certificates, and they're yours to hoard or spend in any way you like. The "How it Works" page provides details on how you get rewards points, and the "Redeeming Points" page explains how you can use them.

So what are you waiting for? I know you have little tidbits lying around that make you a better developer. You might as well share them and earn some free stuff, right? Also, keep an eye on the SDN Share Blog for more information on the program.

Friday Mar 16, 2007

Product Quality Heatmaps

Read/WriteWeb has an interesting look at heatmap visualizations. In particular, they focus on Summize, a site specializing in product reviews.

Summize allows users to vote on the quality of products. What's new and interesting is how they present those voting results--with heatmaps. Here is an example:

The colored stripe is a heatmap showing what percentage of users think the iPod Nano is great (the green: 44%), what percentage think it's wretched (the red: 12%), and those in between (the orange, yellow, and yellowish green). One nice thing about this visualization is that it works well even when the heatmap image is small. So, for example, they use a scaled-down version of the stripe next to each item in search results.

The end result is a nice way to see and understand a lot of information packed into a small space--the very definition of a good visualization.

Tuesday Mar 06, 2007

Is NoFollow Misnamed or Not?

Conventional wisdom is that the rel="nofollow" mechanism is misnamed. As the current version of the NoFollow Wikipedia article says:

rel="nofollow" actually tells a search engine "Don't score this link" rather than "Don't follow this link." This differs from the meaning of nofollow as used within a robots meta tag, which does tell a search engine: "Do not follow any of the hyperlinks in the body of this document."

But... Recently Matt Cutts (a Google specialist in SEO issues) has contradicted that. Specifically, a forum participant asked:

...does nofollow really prevent Google from crawling a page?

And Matt responded:

...if a page would have been found anyway via other links, it doesn't prevent crawling of that page. But I believe that if the only link to a page is a nofollow link, Google won't follow that link to the destination page.

So he's saying that rel="nofollow" really does mean "don't follow" (at least to Google), and that the conventional wisdom (and Wikipedia article) are wrong?

Is that right? It'd be nice to have a definitive answer, given the "I believe" opening in Matt's statement.

Friday Mar 02, 2007

NoFollow Considered Harmful?

I've noticed a fair number of people recently calling the rel="nofollow" mechanism a failure and calling for its end. Loren Baker is one such voice, with a post called "13 Reasons Why NoFollow Tags Suck". Andy Beal is another, with a post entitled "Google’s Lasnik Wishes 'NoFollow Didn’t Exist'".

I'm on the opposite side of this argument. As I mentioned a while back, I think that web pages need even more control over the "voting intent" of hyperlinks. So instead of sending NoFollow to its grave, I'd like to see it extended (though probably with a new name and format, such as the Vote Links microformat).

I don't want to re-hash that discussion today. Instead, I want to examine the most prominent argument from the anti-NoFollow crowd: that it just doesn't work. Comment spam has increased in blogs since the time when NoFollow was introduced. Because of that, these people argue that NoFollow is an outright failure and isn't needed in the first place because any good blogger is vigilant in moderating comments.

Again, I disagree. Of course comment spam has increased. Blogging and spamming both have little barrier to entry and high growth. It was inevitable that comment spam would increase, even if the benefit to the spammer for each instance was reduced (which NoFollow ensures, by eliminating any PageRank bonus). But that growth alone doesn't mean that NoFollow is a failure. If a disease grows, do we assume that all related medical treatments and research are failures and should be stopped?

Comment spam would be even worse if the NoFollow mechanism didn't exist. Its practitioners would be multiplied because every shady marketing guide around would be touting "amazing benefits" of using blog comments to increase one's standing in Google.

Even if I'm wrong and NoFollow has done nothing to reduce comment spam, at least it has protected the quality of search results. Google isn't the only one with a vested interest in maintaining quality search results. We would all suffer if we had to go back to the "bad old days" of low-quality web search.

What about the idea that any good blog will have vigilantly moderated comments and make NoFollow irrelevant? Good moderation of blog comments is very important. But the argument that it can displace NoFollow assumes that blatant spam is the only threat. As I mentioned in my "Hyperlinks as Votes" entry, a PageRank-style system in part depends upon us each voting in our own "name" (URL). Without NoFollow, that system breaks down with hyperlinks coming from your URL which aren't spam but also aren't something you would intend to positively endorse.

Suppose I post a comment on your blog with a link back to an entry of my own which is completely relevant but disagrees with you at every turn. It isn't spam. And unless you're particularly thin-skinned, you probably shouldn't exercise your moderation power to delete it. But should search engines interpret that link to be your positive vote for the quality or importance of my page? And even if you think it should, would you want that vote to be of the same strength as one given to something which you directly referenced in the body of your post?

It isn't time for NoFollow to go away. It's time for it to grow up into something more powerful and expressive.

Thursday Feb 15, 2007

Picturing the Summit of the Blogosphere

I love visualizations which turn complex information into a simple picture. This one, from Ben Fry, shows how the fifty most popular blogs in the world exchanged hyperlinks over a ninety-day period. See his description for full details, but in short:

The first [image], used for an article titled Linkology, shows the connections between the top 50 blogs, based on data provided by Technorati. The colors depict the categorization: orange for technology, blue for politics, pink for gossip, and green for "other".

The intensity of the line is based on the direction of the link, so the lines are brightest at the link destinations. Because lower-ranked blogs are more likely to link to a higher ranked blog than vice-versa, the lefthand side of the image (the top ranked blogs) is brightest.

In other words, we can see that the most popular blogs (on the left of the picture) get the most incoming links because their ends of the lines are brighter. And the large number of lines overall shows us how frequently top blogs reference one another.

The guys at Table of Malcontents think that the picture demonstrates how "professional bloggers are, at best, symbiotic parasites" (because they use information from other blogs to fuel their own). I'm not sure that the image supports such a conclusion. To me, the picture isn't noteworthy for unearthing some surprising trend. The web is made up of hyperlinks, and the most popular sites receive more inbound links than do less popular sites. No shock there.

It's noteworthy just because it's a great picture and makes a trend very easy to see and understand.

Why is the Digg Community So Sensitive to Competition?

The Digg community is once again lashing out at a "shameless rip-off" site. This time their target is Yahoo, which has (in their own words) added "Digg-style voting" to their suggestion boards. There was a similar reaction months ago when Netscape re-launched itself as a voter-driven news portal.

Why do so many Digg users have a hair-trigger response against anyone who builds on Digg's ideas? Imitation is the sincerest form of flattery. Imitation with due credit (as Yahoo provided in their blog post) is pretty well beyond reproach.

Yes, Yahoo and Netscape took ideas from Digg. That's the way the world works. We build off of each others' ideas. If you can't accept that, you need to strip naked and move to some deserted cave. Every technology and idea we use today is a derivative of something which came before it.

If you think the "good guys" of the tech world sprang up from great new ideas, you're wrong (at least partially). The ideas may have been great, but they were never entirely new. So before you launch another campaign against a "shameless rip-off" of Digg, consider going after:

  • Linux, which shamelessly stole the design of UNIX.
  • Apache, which shamelessly stole the idea of serving web pages from Tim Berners-Lee.
  • Firefox, which shamelessly stole the idea of a graphical web browser from NCSA Mosaic.
  • Digg itself, which shamelessly stole the concept of voting from the ancient Greeks.
  • The paranoid members of the Digg community, who shamelessly stole the ideas of intolerance and isolationism from countless ancient tribes.

Yes, they're ridiculous examples. It's a ridiculous discussion. Building on the ideas of others is a fact of life. It's also a fact of Web 2.0, and it's time for lagging members of the Digg community to accept it.

Friday Jan 26, 2007

A quick word about "A quick word about Googlebombs"

Google has just announced that they have tweaked their search algorithm in a way which "has begun minimizing the impact of many Googlebombs." I'm not sure whether I think that's a good thing or not. On one hand, susceptibility to any artificial manipulation of search results is probably bad. On the other hand, a little light-heartedness is one way that Google has always stood out as a company.

I have no such mixed feelings in looking at how Google announced this change, however. I think it's pathetic. Their blog entry essentially just says that the change is algorithmic and "very limited in scope and impact." Good intro, but how about some details?

Google Bombs worked in the first place because Google's search algorithm assumes that what people say when they link to a page can be used to better understand that page. That idea is an important piece in the search puzzle, and I'd like to understand how their new algorithm changes impact it. Presumably, being "very limited in scope and impact" means that they somehow detect and ignore "bad" context in some links (which match some Google Bomb profile) while still paying attention to "good" context in other links? Again, that sounds good (if my presumption is correct), but why not be more forthcoming with exactly what's being done? We all deserve to know if and how wording around hyperlinks impacts the target URL's status in search results.

I realize that Google is in a very competitive space. Keeping a lead over the likes of Microsoft and Yahoo (if you believe they're leading) requires that Google keep some technical secrets to itself. But the key word is some. There is value in allowing everyone to understand the basics of how a key service such as Google search works. Their core PageRank technology fundamentally depends on us all "voting" with our hyperlinks. And as I've mentioned before, I think that Google has an obligation to allow its "electorate" to learn how to best use those votes. That can certainly be accomplished without giving away every detail of their technology. But I think it requires more detail than just telling us that something is algorithmic and low-impact.

Wednesday Jan 24, 2007

Antisocial URLs

Muhammad Saleem is talking about how bad URL structures can clash with social bookmarking services. Specifically, he notes that providing redundant URLs can lead to duplicate postings at sites such as Digg.

To address the situation, he advises that webmasters provide just one URL per page. That's nice in theory, but can be difficult in practice. Special needs often arise (in areas such as metrics tracking and personalization) which can best be met with varied URLs. Yes, there are a whole slew of ways to deal with such things without touching the URL. But there are also a whole slew of complicating factors (such as trying to monitor traffic originating outside the browser in RSS readers or emails). Sometimes one URL just isn't enough.

Fortunately, exposing multiple URLs doesn't have to mean sacrificing the idea that one of them is "primary." Just pick the primary URL and use a <link rel="bookmark" href="..." /> element to identify it (as described in the Wikipedia permalink page). Nice solution, isn't it? You get the best of both worlds--purity and pragmatism.
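To make that concrete, here is a minimal Python sketch of how a bookmarking tool might honor the element. It only assumes what the paragraph above describes: a page may carry a <link rel="bookmark"> whose href names its primary URL. The class name, the function name, and the example.com URLs are made up for illustration, and no real bookmarking service is known to work this way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class BookmarkLinkFinder(HTMLParser):
    """Record the href of the first <link rel="bookmark"> element seen."""

    def __init__(self):
        super().__init__()
        self.bookmark_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.bookmark_href is not None:
            return
        attributes = dict(attrs)
        # rel holds space-separated tokens; compare them case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "bookmark" in rel_tokens and attributes.get("href"):
            self.bookmark_href = attributes["href"]


def primary_url(page_url, html):
    """Return the page's declared primary URL, or page_url if none is declared."""
    finder = BookmarkLinkFinder()
    finder.feed(html)
    if finder.bookmark_href is None:
        return page_url
    # Resolve relative hrefs against the URL the page was fetched from.
    return urljoin(page_url, finder.bookmark_href)


# Hypothetical page reached through a tracking URL from an RSS reader.
sample_page = '<html><head><link rel="bookmark" href="/articles/42" /></head></html>'
submitted = primary_url("http://example.com/articles/42?src=rss", sample_page)
```

A service that normalized every submission this way would file the RSS variant and the plain variant under the same URL, which is exactly the duplicate-posting problem described above.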

Unfortunately, most web sites don't include this element, and most tools don't understand it anyway. Why hasn't it gained more traction?

Monday Jan 22, 2007

Wikipedia Decides Its Outgoing Links Can't Be Trusted?

I find this sad. By adding the rel="nofollow" attribute to the outgoing links in all articles, the Wikipedia seems to be wavering in its trust of volunteers. Yes, link spam is a problem. And with its combination of high visibility and open authoring, the Wikipedia is a prime target. But why not deal with this problem the same way it deals with other inaccurate and abusive content? Count on the volunteer base to detect and correct issues quickly (and give the administrators tools to lock certain articles which are repeated targets).

Until yesterday, that's exactly how the English-language Wikipedia dealt with link spam. But now the project has thrown up a white flag and said that its volunteers and tools aren't adequate to police the situation. Instead, the equivalent of martial law has been declared and everyone suffers.

The Wikipedia is the closest thing we have to a collective and collaborative voice in describing our world. When an external URL is referenced in a Wikipedia article, it must pass the editorial "litmus test" of all Wikipedians watching that article (who will presumably have high interest and expertise in the subject). With the blanket inclusion of the nofollow attribute on these links, search engines such as Google will no longer use these links as part of their determination of which URLs are most important. So we end up with slightly poorer search results and one less way to register our "votes" for improving them. Sad.

On the bright side, the original announcement does note that "better heuristic and manual flagging tools for URLs would of course be super." Presumably, this means that when such tools are made available, the blanket application of nofollow will be removed. Let's hope that happens. Soon.

Thursday Jan 18, 2007

Hyperlinks as Votes: Time for a PageRank Tune-up?

Treat the hyperlinks in web pages as "votes" for other web pages. Then use a feedback loop so that pages which receive more votes from others have their own votes become more powerful. That's how the PageRank algorithm pushes the best pages to the top of Google search results. Nine years after Larry Page and Sergey Brin published the initial description of PageRank, Google says it still serves as the core of its technology.
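The feedback loop is easier to see in code. What follows is a textbook-style sketch of the idea in Python, not Google's actual implementation: the damping factor of 0.85, the fixed iteration count, and the tiny four-page web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score pages from a {page: [pages it links to]} mapping."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of votes.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub", which links to "a".
example_links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["a"],
}
scores = pagerank(example_links)
```

In this toy web, "hub" ends up with the highest score because it collects three votes. Page "a" outranks "b" and "c" even though each of them receives fewer or no links of note, because its single vote comes from the strongest page. That is the feedback loop at work.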

So if hyperlinks are votes, how do we make sure the electorate uses their power wisely?

For one, we need to ensure that people only vote in their own name. Not so long ago, that ideal was effectively violated by blog spam. Automated programs would comb the web looking for any blog where they could post hyperlinks to the likes of Viagra sales. Successfully adding such a hyperlink on a well-known blog would result in a strong PageRank "vote" for the spammer's page. So in effect, the spammer was voting in the blog owner's name (and hijacking his PageRank strength).

This issue was largely fixed in 2005, when Google announced that it would start interpreting a rel="nofollow" hyperlink attribute as a request for exclusion from PageRank calculations. Blog spam can still be a problem, but since most blogging software now adds the rel="nofollow" attribute to hyperlinks in comments, it won't benefit spammers' PageRank standings.
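As a rough illustration of what "exclusion from PageRank calculations" could look like on the consuming side, here is a small Python sketch that separates a page's links into those that count as votes and those marked nofollow. How any real search engine does this internally isn't public; the class name and the sample HTML are invented for the example.

```python
from html.parser import HTMLParser


class VotingLinkCollector(HTMLParser):
    """Collect hrefs, separating links that count as votes from nofollow links."""

    def __init__(self):
        super().__init__()
        self.votes = []
        self.excluded = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, compared case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.excluded.append(href)
        else:
            self.votes.append(href)


# Hypothetical blog page: one link in the post body, one in a comment.
sample_html = """
<p>Post body with an <a href="http://example.com/endorsed">endorsed link</a>.</p>
<div class="comments">
  <a rel="nofollow" href="http://example.com/from-a-comment">a commenter's link</a>
</div>
"""

collector = VotingLinkCollector()
collector.feed(sample_html)
```

Only the link in the post body lands in collector.votes. The comment link is still visible to readers, but it carries no vote.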

But is just being able to mark a hyperlink as a "non vote" enough? Wouldn't it be nice to have even more control, such as specifying which hyperlinks are positive votes for the referenced page and which are negative votes? That's what some of the Technorati folks are aiming to allow with the Vote Links microformat. It proposes rev="vote-for", rev="vote-abstain", and rev="vote-against" attributes to allow page authors to express their voting intents for each hyperlink.
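Here is a hedged Python sketch of how a consumer might read those three rev values. The attribute values come from the proposal as described above. Mapping them to +1, 0, and -1 and summing per URL is purely my own assumption for illustration, as are the class name and the sample links.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed scoring for illustration; the microformat itself only names the values.
VOTE_VALUES = {"vote-for": 1, "vote-abstain": 0, "vote-against": -1}


class VoteLinkTally(HTMLParser):
    """Sum the declared voting intent for each linked URL."""

    def __init__(self):
        super().__init__()
        self.tally = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rev may hold several space-separated tokens; ignore unknown ones.
        for token in (attributes.get("rev") or "").lower().split():
            if token in VOTE_VALUES:
                self.tally[href] += VOTE_VALUES[token]


# Hypothetical page expressing all three intents.
sample_html = """
<a rev="vote-for" href="http://example.com/great">worth reading</a>
<a rev="vote-against" href="http://example.com/awful">avoid this</a>
<a rev="vote-abstain" href="http://example.com/meh">for reference only</a>
"""

counter = VoteLinkTally()
counter.feed(sample_html)
```

Links with no rev attribute are simply left out of the tally, which leaves a consumer free to treat them however it treats ordinary hyperlinks today.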

Still, is even that enough? I wonder why there is no effort to allow authors to control the relative strength of their votes. The Vote Links FAQ has an entry covering this, saying:

Q: Why only for and against? How about something more nuanced?

A: The point of this is to provide a strong yes/no response. Finer-grained measures of agreement don't make much sense on an individual basis; aggregating many votes is more interesting. For example, consider how eBay's user rating system has been reduced to a like/dislike switch by users. The 'Ayes, Noes, abstentions' model has served well in politics and committees, when a division is called for.

I'm not satisfied with this answer. The "interesting" aggregation of simple votes which they mention will sometimes be housed within a single page. For example, thousands of people may give a particular URL a positive response at Digg, but it still just shows up as one hyperlink. The same could be said for other sites with significant user input (such as YouTube, Slashdot, or their own example: eBay).

Obviously, no page should be able to artificially inflate the importance of its own hyperlink votes (e.g. rel="I_represent_1_million_votes--honest"). But why not allow pages to determine the portion of their fixed PageRank contribution which is passed along to each of their hyperlinks? So a Digg page, for example, might choose to give 10% of its PageRank voting value to an item getting 2000 Diggs and only 2% to another item which got just 200 Diggs. Search engines could then benefit from the internal ranking systems of sites (such as Digg) without having to understand their internal details. And we could all benefit from a more finely-tuned hyperlink democracy.
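A quick Python sketch of the rule I have in mind: the page picks the fractions, and the search engine only checks that they never add up to more than the page's fixed contribution. Nothing like this exists in any search engine that I know of. The function name, the score of 1.0, and the item URLs are hypothetical.

```python
def split_vote(page_score, fractions):
    """Divide a page's fixed vote among its links by author-chosen fractions.

    fractions maps each link URL to the share of the page's vote it should get.
    Shares may add up to less than 1.0, but never more, so a page cannot
    inflate its own influence.
    """
    if any(fraction < 0 for fraction in fractions.values()):
        raise ValueError("fractions must not be negative")
    # Small tolerance so floating-point rounding does not reject a valid split.
    if sum(fractions.values()) > 1.0 + 1e-9:
        raise ValueError("fractions must not add up to more than 1.0")
    return {url: page_score * fraction for url, fraction in fractions.items()}


# The Digg example from the text, with hypothetical item URLs.
digg_votes = split_vote(
    page_score=1.0,
    fractions={
        "http://example.com/item-with-2000-diggs": 0.10,
        "http://example.com/item-with-200-diggs": 0.02,
    },
)
```

The cap is the whole point. A page can shift weight between its own links, but the total it hands out stays bounded by whatever score it has earned from everyone else.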



