Tuesday Mar 06, 2007

Is NoFollow Misnamed or Not?

Conventional wisdom is that the rel="nofollow" mechanism is misnamed. As the current version of the NoFollow Wikipedia article says:

rel="nofollow" actually tells a search engine "Don't score this link" rather than "Don't follow this link." This differs from the meaning of nofollow as used within a robots meta tag, which does tell a search engine: "Do not follow any of the hyperlinks in the body of this document."

But... Recently Matt Cutts (a Google specialist in SEO issues) has contradicted that. Specifically, a forum participant asked:

...does nofollow really prevent Google from crawling a page?

And Matt responded:

...if a page would have been found anyway via other links, it doesn't prevent crawling of that page. But I believe that if the only link to a page is a nofollow link, Google won't follow that link to the destination page.

So he's saying that rel="nofollow" really does mean "don't follow" (at least to Google), and that the conventional wisdom (and Wikipedia article) are wrong?
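To make the two readings concrete, here is a minimal sketch of page discovery under each one. The link graph, the page names, and the `discover` function are all hypothetical, and nothing here claims to describe how Google's crawler actually works. Under the "don't score" reading a crawler still traverses nofollow links; under the "don't follow" reading it skips them, so a page whose only inbound link is nofollow is never found.

```python
from collections import deque

# Hypothetical link graph: page -> list of (target, is_nofollow).
LINKS = {
    "home": [("about", False), ("orphan", True), ("shared", True)],
    "about": [("shared", False)],
    "orphan": [],
    "shared": [],
}


def discover(start, links, skip_nofollow):
    """Breadth-first discovery; optionally refuse to traverse nofollow links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target, is_nofollow in links.get(page, []):
            if skip_nofollow and is_nofollow:
                continue  # "don't follow" reading: ignore this link entirely
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# "Don't score" reading: every link is still crawled.
dont_score = discover("home", LINKS, skip_nofollow=False)
# "Don't follow" reading: nofollow links are never traversed.
dont_follow = discover("home", LINKS, skip_nofollow=True)
```

In this toy graph, "shared" is found either way because "about" links to it normally, while "orphan" is only found under the "don't score" reading. That matches the distinction Matt draws in his answer.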

Is that right? It'd be nice to have a definitive answer, given the "I believe" opening in Matt's statement.

Friday Mar 02, 2007

NoFollow Considered Harmful?

I've noticed a fair number of people recently calling the rel="nofollow" mechanism a failure and calling for its end. Loren Baker is one such voice, with a post called "13 Reasons Why NoFollow Tags Suck". Andy Beal is another, with a post entitled "Google’s Lasnik Wishes 'NoFollow Didn’t Exist'".

I'm on the opposite side of this argument. As I mentioned a while back, I think that web pages need even more control over the "voting intent" of hyperlinks. So instead of sending NoFollow to its grave, I'd like to see it extended (though probably with a new name and format, such as the Vote Links microformat).

I don't want to re-hash that discussion today. Instead, I want to examine the most prominent argument from the anti-NoFollow crowd: that it just doesn't work. Comment spam has increased in blogs since the time when NoFollow was introduced. Because of that, these people argue that NoFollow is an outright failure and isn't needed in the first place because any good blogger is vigilant in moderating comments.

Again, I disagree. Of course comment spam has increased. Blogging and spamming both have little barrier to entry and high growth. It was inevitable that comment spam would increase, even if the benefit to the spammer for each instance was reduced (which NoFollow ensures, by eliminating any PageRank bonus). But that growth alone doesn't mean that NoFollow is a failure. If a disease grows, do we assume that all related medical treatments and research are failures and should be stopped?

Comment spam would be even worse if the NoFollow mechanism didn't exist. Its practitioners would be multiplied because every shady marketing guide around would be touting "amazing benefits" of using blog comments to increase one's standing in Google.

Even if I'm wrong and NoFollow has done nothing to reduce comment spam, at least it has protected the quality of search results. Google isn't the only one with a vested interest in maintaining quality search results. We would all suffer if we had to go back to the "bad old days" of low-quality web search.

What about the idea that any good blog will have vigilantly moderated comments, making NoFollow irrelevant? Good moderation of blog comments is very important. But the argument that it can displace NoFollow assumes that blatant spam is the only threat. As I mentioned in my "Hyperlinks as Votes" entry, a PageRank-style system in part depends upon us each voting in our own "name" (URL). Without NoFollow, that system breaks down whenever hyperlinks come from your URL which aren't spam but also aren't something you intend to positively endorse.

Suppose I post a comment on your blog with a link back to an entry of my own which is completely relevant but disagrees with you at every turn. It isn't spam. And unless you're particularly thin-skinned, you probably shouldn't exercise your moderation power to delete it. But should search engines interpret that link to be your positive vote for the quality or importance of my page? And even if you think they should, would you want that vote to be of the same strength as one given to something which you directly referenced in the body of your post?

It isn't time for NoFollow to go away. It's time for it to grow up into something more powerful and expressive.

Monday Jan 22, 2007

Wikipedia Decides Its Outgoing Links Can't Be Trusted?

I find this sad. By adding the rel="nofollow" attribute to the outgoing links in all articles, the Wikipedia seems to be wavering in its trust of volunteers. Yes, link spam is a problem. And with its combination of high visibility and open authoring, the Wikipedia is a prime target. But why not deal with this problem the same way it deals with other inaccurate and abusive content? Count on the volunteer base to detect and correct issues quickly (and give the administrators tools to lock certain articles which are repeated targets).

Until yesterday, that's exactly how the English-language Wikipedia dealt with link spam. But now the project has raised a white flag and said that its volunteers and tools aren't adequate to police the situation. Instead, the equivalent of martial law has been declared and everyone suffers.

The Wikipedia is the closest thing we have to a collective and collaborative voice in describing our world. When an external URL is referenced in a Wikipedia article, it must pass the editorial "litmus test" of all Wikipedians watching that article (who will presumably have high interest and expertise in the subject). With the blanket inclusion of the nofollow attribute on these links, search engines such as Google will no longer use these links as part of their determination of which URLs are most important. So we end up with slightly poorer search results and one less way to register our "votes" for improving them. Sad.

On the bright side, the original announcement does note that "better heuristic and manual flagging tools for URLs would of course be super." Presumably, this means that when such tools are made available, the blanket application of nofollow will be removed. Let's hope that happens. Soon.

Thursday Jan 18, 2007

Hyperlinks as Votes: Time for a PageRank Tune-up?

Treat the hyperlinks in web pages as "votes" for other web pages. Then use a feedback loop so that pages which receive more votes from others have their own votes become more powerful. That's how the PageRank algorithm pushes the best pages to the top of Google search results. Nine years after Larry Page and Sergey Brin published the initial description of PageRank, Google says it still serves as the core of its technology.
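The feedback loop can be sketched in a few lines. This is a simplified textbook-style iteration, not Google's production system. The damping factor of 0.85, the even redistribution of rank from pages with no outgoing links, and the toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated vote passing (illustrative only)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current strength evenly among its votes.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: "d" and "f" each receive exactly one vote,
# but "d" is voted for by "c", which itself has two votes.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"], "e": ["f"], "d": [], "f": []}
ranks = pagerank(LINKS)
```

Here "d" and "f" each get a single incoming link, yet "d" ends up ranked higher because its one vote comes from a page that others have voted for. That is the "votes become more powerful" effect in miniature.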

So if hyperlinks are votes, how do we make sure the electorate uses their power wisely?

For one, we need to ensure that people only vote in their own name. Not so long ago, that ideal was effectively violated by blog spam. Automated programs would comb the web looking for any blog where they could post hyperlinks to the likes of Viagra sales. Successfully adding such a hyperlink on a well-known blog would result in a strong PageRank "vote" for the spammer's page. So in effect, the spammer was voting in the blog owner's name (and hijacking his PageRank strength).

This issue was largely fixed in 2005, when Google announced that it would start interpreting a rel="nofollow" hyperlink attribute as a request for exclusion from PageRank calculations. Blog spam can still be a problem, but since most blogging software now adds the rel="nofollow" attribute to hyperlinks in comments, it won't benefit spammers' PageRank standings.
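As a rough illustration of what "exclusion from PageRank calculations" could look like on the consuming side, the sketch below separates a page's links into those that count as votes and those marked rel="nofollow". It uses Python's standard `html.parser`. The class name, the sample markup, and the example.com URLs are hypothetical, and a real search engine's handling is certainly more involved.

```python
from html.parser import HTMLParser


class VotingLinkParser(HTMLParser):
    """Split a page's links into voting and non-voting (nofollow) lists."""

    def __init__(self):
        super().__init__()
        self.voting = []
        self.non_voting = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel holds space-separated tokens; nofollow may be one of several.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.non_voting.append(href)
        else:
            self.voting.append(href)


# Hypothetical page: one link in the post body, one in a comment.
HTML = """
<p>Post body citing <a href="http://example.com/source">a source</a>.</p>
<div class="comment">
  Buy now at <a rel="nofollow" href="http://example.com/spam">cheap pills</a>
</div>
"""

parser = VotingLinkParser()
parser.feed(HTML)
```

Only the links in `parser.voting` would then be handed to whatever ranking calculation follows, so the comment link earns the spammer nothing.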

But is just being able to mark a hyperlink as a "non vote" enough? Wouldn't it be nice to have even more control, such as specifying which hyperlinks are positive votes for the referenced page and which are negative votes? That's what some of the Technorati folks are aiming to allow with the Vote Links microformat. It proposes rev="vote-for", rev="vote-abstain", and rev="vote-against" attributes to allow page authors to express their voting intents for each hyperlink.
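A small sketch of how a consumer might read those three values follows. The mapping to +1, 0, and -1 is an assumption made here for illustration, as is the choice to leave unmarked links out of the tally. The class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Assumed numeric reading of the three proposed values.
VOTE_VALUES = {"vote-for": 1, "vote-abstain": 0, "vote-against": -1}


class VoteLinkParser(HTMLParser):
    """Collect the explicit vote, if any, attached to each hyperlink."""

    def __init__(self):
        super().__init__()
        self.votes = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        for token in (attr.get("rev") or "").lower().split():
            if token in VOTE_VALUES:
                self.votes[href] = VOTE_VALUES[token]


# Hypothetical markup with one link of each kind, plus an unmarked link.
HTML = """
<a rev="vote-for" href="http://example.com/good">worth reading</a>
<a rev="vote-abstain" href="http://example.com/meh">for reference</a>
<a rev="vote-against" href="http://example.com/bad">what not to do</a>
<a href="http://example.com/plain">unmarked</a>
"""

parser = VoteLinkParser()
parser.feed(HTML)
```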

Still, is even that enough? I wonder why there is no effort to allow authors to control the relative strength of their votes. The Vote Links FAQ has an entry covering this, saying:

Q: Why only for and against? How about something more nuanced?

A: The point of this is to provide a strong yes/no response. Finer-grained measures of agreement don't make much sense on an individual basis; aggregating many votes is more interesting. For example, consider how eBay's user rating system has been reduced to a like/dislike switch by users. The 'Ayes, Noes, abstentions' model has served well in politics and committees, when a division is called for.

I'm not satisfied with this answer. The "interesting" aggregation of simple votes which they mention will sometimes be housed within a single page. For example, thousands of people may give a particular URL a positive response at Digg, but it still just shows up as one hyperlink. The same could be said for other sites with significant user input (such as YouTube, Slashdot, or their own example: eBay).

Obviously, no page should be able to artificially inflate the importance of its own hyperlink votes (e.g. rel="I_represent_1_million_votes--honest"). But why not allow pages to determine the portion of their fixed PageRank contribution which is passed along to each of their hyperlinks? So a Digg page, for example, might choose to give 10% of its PageRank voting value to an item getting 2000 Diggs and only 2% to another item which got just 200 Diggs. Search engines could then benefit from the internal ranking systems of sites (such as Digg) without having to understand their internal details. And we could all benefit from a more finely-tuned hyperlink democracy.
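One way this could work is sketched below, purely as an illustration of the idea; no search engine is known to support it. The function name, the fraction-per-link input format, and the rule that oversized claims are scaled back to 100% are all assumptions. The scaling rule is what keeps a page from inflating its own votes: it can divide its fixed contribution however it likes, but never hand out more than it has.

```python
def distribute_vote(page_rank, fractions):
    """Split a page's fixed voting value among links by author-chosen fractions.

    fractions maps link URL -> requested share of the page's voting value.
    If the requests add up to more than 100%, they are scaled down so the
    page can never pass along more than its own page_rank.
    """
    if any(f < 0 for f in fractions.values()):
        raise ValueError("fractions must be non-negative")
    total = sum(fractions.values())
    scale = 1.0 / total if total > 1.0 else 1.0
    return {url: page_rank * f * scale for url, f in fractions.items()}


# The Digg example from above: 10% to the popular item, 2% to the other.
shares = distribute_vote(1.0, {"item-2000-diggs": 0.10, "item-200-diggs": 0.02})

# A page trying to claim 1000% of its weight is capped at its real total.
inflated = distribute_vote(1.0, {"x": 5.0, "y": 5.0})
```

Whatever share is left unassigned could go to the page's remaining links or simply go unused; this sketch leaves that choice open.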



