Saturday Aug 29, 2009

Are blogs losing their influence?

Richard Morgan sent me this article, "Are Blogs Losing Their Authority To The Statusphere?", dated March 10th 2009, which argues that while blog authority rankings according to Technorati remain fairly static, the scores of the various blogs are declining. Technorati uses an in-list scoring algorithm, which may be part of the problem, but it would seem to me that micro-blogging is weakening the voice of blogs as a communications tool, which is what the article argued. In some ways it is not just micro-blogging, but the various places where people can and do record what they do and think. When I started this blog five years ago, I chose to restrain what I put here, but other media have grown in popularity, and so people's ability to express themselves has grown. There is a diversification of publication sites which makes following people harder. Although Technorati only set out to capture blogs, not people, blogs seem no longer to be at the centre of how the internet records what people think. I know that I have been writing less frequently.

Internet messaging is built on a growing distributed architecture, consisting of

  • publication,

  • distribution,

  • aggregation and

  • consumption.

Different sites and technologies seek to perform and excel in different parts of the chain. The aggregation stage permits people to view people (if they permit it) or subject matter and, most importantly, to control their own entry points to the mess that is today's content, by which I mean choose to follow people or look for specific expertise. I think that authors should seek to co-operate with this consumer control of the reading process. It should be noted that the behaviour of individuals and corporations will differ. In particular, most media companies want to capture the reader/viewer, but individuals have no need to copy this behaviour. I try to post content and let people find it; I hope I have developed a reputation for expertise in some subjects over my career.

By keeping the architecture in mind, one can try to avoid annoying one's readers, who, if one has any, are likely to be one's friends. Bad habits I see include people who syndicate their tweets into Facebook, so I, and others, get to know about their breakfast twice, and I am not a fan of syndicating one's feed into blogs using the APIs. This latter habit annoys me because I don't see the blog as an aggregation tool but as a publishing tool, and so I expect original work, of some description, in people's blogs. This can be even worse when people then publicise the blog, containing bookmarks, using a micro-blog. That's three clicks to read something written by someone other than the person whose views you've subscribed to, and if using a wireless device that's a real pain. NB this is also true if you subscribe to Digg feeds: you get to 'interesting' content via the Digg page, so three clicks, three tabs or windows to read content you want. Another offence which I wish I could deal with more easily is the microblogging incontinent. The only way I have discovered to deal with those is to unsubscribe.

One can, and I do, aggregate my feeds into one place. I originally created a personal planet, which aggregates some of the feeds I create. I have tried to create an everything feed, which also has a nice key of the feeds I contribute to. This means that my readers can construct a feed that interests them. I know that some friends are interested in the technology, but not the politics. I commit the offence of subscribing my FriendFeed to Facebook, but I consider Facebook to be mainly a consumer. I need to think about this. It's not great, but I don't syndicate my tweets directly to my Facebook statuses (sic), nor do I copy them back into FriendFeed. Managing my Facebook feed is not easy and is compounded by Facebook's desire to perform all roles in the architecture while being 'open'. It's this openness which has enabled site specialisation around, for instance, travel, books, restaurants and even, at Living Social, iPhone apps.
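
A personal planet of this kind is, at heart, a merge-and-sort across several feeds. Here is a minimal sketch of the idea; the feed names, entry titles and dates are all invented for illustration, and a real planet would of course parse the RSS/Atom XML of each source:

```python
from datetime import datetime

# Hypothetical entries from three feeds I publish; in practice these
# would be parsed from the RSS/Atom XML of each source.
feeds = {
    "blog":      [("A post on search", "2009-08-29")],
    "bookmarks": [("An interesting link", "2009-08-27"),
                  ("Another link", "2009-08-28")],
    "photos":    [("Holiday pictures", "2009-08-25")],
}

def planet(feeds):
    """Merge every feed into one list, newest first, keeping a key
    that records which feed each entry came from."""
    merged = [(datetime.strptime(date, "%Y-%m-%d"), source, title)
              for source, entries in feeds.items()
              for title, date in entries]
    return sorted(merged, reverse=True)

for when, source, title in planet(feeds):
    print(when.date(), f"[{source}]", title)
```

Keeping the source name in each entry is what allows a reader to filter out, say, the politics while keeping the technology.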

I suppose I am appealing for people to consider what tools they use to perform a specific role in the personal content architecture. Don't over-aggregate; if people are interested in your thoughts they'll find them. Don't shove it down their throats.

When I first considered writing this little essay, it seemed interesting to consider, "Is the statusphere replacing blogs?"; others, including Tim Bray, have written about this since and argue not. I hope that the evolution of easy micro-blogging will free blogs to become deeper and more interesting. I know that I have produced blog articles less frequently since I took up with Twitter, but I also considered my feed to be a microblog of sorts. Another key development is that the use of such sites has turned in-list search ranking from a vote of web authors, where you needed the technology skills and resources to have a web page in order to influence the sort order, into a vote of web readers and authors. The ease of micro-blog adoption means that an even larger crowd should now be participating in the construction of in-list search orders. I am unsure how URL shorteners impact the search engines' in-list calculations. They make it harder, I'm sure, as does the fact there's more than one. Many argue that Twitter's best value is as a search engine, and that it and other micro-blogging systems won't replace blogs because there are too many subjects that can't be accurately discussed in 140 characters. Techcrunch published further thoughts on Twitter and its chances of supplanting the blogs; however, it takes less time to tweet, rubbish gets lost more easily, and Twitter in particular is designed to be used on handheld devices. (I don't think I'd like to have written this on my new Nokia 5800, or even my iPod Touch.) It should be noted that, while it is very easy to create a 140-character message, it should be as easy to create a podcast or even a video, but it's not. They are both difficult to create, especially if you don't just record a chat amongst friends but try to 'deliver/perform' a report. This is a skills issue. They wouldn't pay Stephen Fry all that money to make audio books if it was easy.

One final thought is that communitarian aggregation is not well done at the moment. One of the strengths of Peter Reiser's approach to knowledge management systems, 'Community Equity', is that at its heart is a personal rating engine. They don't yet have this as truly n-dimensional, which I think is needed, so you can rate your own content, rate other authors' expertise, and rate & describe their interest to you. I may play with a Google App or Zembly to experiment with how to make some of this work. A very rich inter-personal network with sophisticated popular and machine-calculated relevance scoring is something that can add value. Content could flow through your colleagues' votes, moving closer to and further away from your viewing window, and your friends' and colleagues' advice and hints would influence or determine what you find. Google Reader's share facility is quite close, but there's only one dimension: you can have friends, and they can recommend stuff for you to read. (I think more can be done.)
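
To make the n-dimensional idea concrete, here is a tiny sketch of a rating store in which the dimension is part of the key, so the same engine can hold content quality, author expertise and interest-to-me side by side. The schema and all names here are my own invention for illustration, not Community Equity's actual model:

```python
from collections import defaultdict

# Each rating is keyed by (subject, dimension): the same person can
# rate a piece of content, an author's expertise, or how interesting
# a topic is to them -- separate dimensions, not one number.
ratings = defaultdict(list)

def rate(rater, subject, dimension, score):
    ratings[(subject, dimension)].append((rater, score))

def average(subject, dimension):
    scores = [s for _, s in ratings[(subject, dimension)]]
    return sum(scores) / len(scores) if scores else None

rate("dave", "article-42", "quality", 4)
rate("bob",  "article-42", "quality", 5)
rate("dave", "peter", "expertise:search", 5)

print(average("article-42", "quality"))
print(average("peter", "expertise:search"))
```

Adding a dimension is then just a new key, which is what makes the scheme open-ended rather than a single popularity score.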

I hinted that one of Technorati's problems is its reliance on in-list. "Searching the Workplace Web", written by Fagin, Kumar and McCurley, which I commented on in this blog in an article called "The Shape of the Internet", describes a number of relevance and ordering tests that could be used, and specifically argues that within the corporate intranet other sorts and relevance tests may be more appropriate to help solve a number of questions such as authority. They also argue that intranet URL naming is less search friendly, and it is clear that the dissipation of people's voice and advice over multiple sites with different naming conventions, often using surrogates or numbers and URL shorteners, means that the internet is catching up on the early intranet in the chaos of its name space. It may be time to review in-list and begin to weight votes for relevance and sort order.

Are blogs losing their relevance? Maybe, maybe not. Well-written opinions by disinterested experts will always be valued. As the dross moves to the microblogs, this may liberate the blogs to re-establish themselves as clear voices of expertise. Some of what was observed in the post Richard sent me may be failures in Technorati; its initial insights are aged and it is being out-innovated.


Sunday Feb 22, 2009

Searching Europa, is there a limit to Google?

Just sometimes I come across a piece of research which my search engines find hard to help me with. Since Google, they all seem to use in-list based sorting algorithms. Some resources, such as the EU's web complex, don't seem to have enough sites pointing at them for this to be a wisdom-of-crowds solution, and their own search engine doesn't seem to help me either. You'd think that the various news organisation feeds that specialise might issue permalink-based pointers, but querying the EU site remains hard.

A while ago, I reviewed the research white paper, Searching the Workplace Web, in my blog article The Shape of the Internet..., which argued that in-list based ranking is not necessarily the best sort order for an intranet query. Certainly the Europa site seems to have many of the properties of an intranet identified by the IBM research team. Is this true of all government sites? Do they have to be their own in-list?

Are there any search engines that might do better?


Monday Mar 31, 2008

The socio/economic impact

The rest of the morning was taken over with a panel presentation, which focused on the socio/economic impact of the changing internet. The first speaker was Andy Wyckoff from the OECD, who spoke of a number of economic issues reinforcing the link between creativity and wealth creation. In fact the OECD are running a ministerial conference, which has had massive and unexpected support from the OECD's members and candidate members. He also emphasised the need for openness & interoperability. He also argued that smarter interfaces will be needed to truly create an internet of people, and that this is required before further evolutions will occur.

Led by Geert Lovink of the Institute of Network Cultures, the panel explored the question of paying for creativity given that the marginal cost of copying is zero. Will it be possible to implement a form of micro-payments?

Another issue raised was the duopoly of the search engines. It was argued that it is necessary to have a diversity of search engines, and that, fortunately, the smaller players are staying in the market and continuing to innovate. Search will remain the "killer app" of the internet, but where is the "only people are experts" dimension? Will the next evolution be people finders? They may become more important than resource finders, and this is a dimension of the NESSI problem. How will you find services in a world of billions, with hundreds of thousands joining each day? (Obviously that's the vision, not today's reality.)

Dag Johansen asked if we can build a 'push' search engine, and said that it is very important to protect one's privacy. He (and others) argued that many internet users are prepared to trade some of their privacy for free services and resources. In terms of his privacy, he deliberately uses multiple search engines to hide from those that want to know about everything he does; he also stated that he doesn't think Google is good enough to justify exclusive use. I am moving towards this behaviour and often use Exalead, which tries to use semantic technology to improve the search quality. Another thought this raised in my mind is that {English} schools are once again pretty poor; they're teaching how to use apps, not the internet, and so while today's children are being taught in class how to use Word to write a letter, they are missing how to protect their privacy and use firewalls and spam filters. Actually it would seem they are teaching how to circumvent poorly configured content filters. (Don't ban Google Images for the UK & USA if you leave Ireland, India and Australia available.)

Diogo Vasconcelos from Cisco came up with the following insight, "People like politics, with politicians it depends"; he also raised the issue of sustainability. Some of his visions had a real 'Minority Report' touch. A question was raised suggesting that sometimes selling you stuff you thought you didn't want is good. But how much more than Amazon recommendations do we need? This did remind me of the Minority Report scene where the shop recognises Anderton (Tom Cruise) via an eyeball scan. Diogo repeated the idea that the EU is the most connected place in the world. I wonder if it's true. I find connecting in the States when travelling easier; the network and wi-fi seem much more pervasive, although I often have to pay. You can see elsewhere in this blog for my views on Italy and Brussels. My recent travels have confused me and I can't make up my mind whether to buy a wi-fi or 3G connected handheld appliance. I hope that I will be allowed to trial a new Vodafone commercial solution, or maybe I'll check out BT Fon, which reminds me, I really need to sort out my household content subscriptions. It just never stops.

The morning was finished up with a presentation on internet governance, and the need to address bureaucratic degeneracy and market failure. See also, which is a United Nations body.

tags: ""

Wednesday Jan 24, 2007

The shape of the internet, inside and outside the corporate firewall

I have been discussing the efficacy of our internal search tools and how hard it is to find stuff, and to be honest, assumed that it was the crapness that most users accuse their IT colleagues of. However a colleague, Bernard Horan recommended that I read "Searching the Workplace Web", which suggests a different answer.

Searching the Workplace Web argues that intranets are different from the internet and that more flexible, and different, search algorithms are required to search an intranet; the most successful internet search algorithms are not necessarily going to work well on an intranet.

The authors made four observations.

The first assumption is that content on the intranet is often created for the purposes of dissemination of an authorised opinion or fact, or a statement of policy. There is no design intent to attract readership. One observation on which this is based is the fact that content on the intranet is often very light on additional hyperlinks to suggest further reading or to quote sources. Why suggest further reading when you are authoring policy? There is no further reading to be done. Why quote a source when the answer is "because my boss said so!"? For example, check out your company's expenses or travel policy. The authors argue that a corollary of this is that "in-list" based algorithms such as PageRank may be less effective on intranet searches. Interestingly, I ran this past Chris Gerhard, who said that he'd been looking for the original text of the Road Traffic Act, but that Google (an in-list based search engine) had difficulty finding it, as it preferred commentaries on the law, because they were more referenced by web page authors.
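
Chris's Road Traffic Act example can be illustrated with a toy link graph. A crude raw in-link count stands in here for PageRank proper, and all the page names are invented:

```python
# A toy link graph: each page lists the pages it links to.  The
# commentary pages are widely linked; the original Act barely at all.
links = {
    "blog-a":           ["commentary-1"],
    "blog-b":           ["commentary-1", "commentary-2"],
    "forum-post":       ["commentary-2"],
    "commentary-1":     ["road-traffic-act"],
    "commentary-2":     [],
    "road-traffic-act": [],
}

def inlink_counts(links):
    """Rank pages by how many other pages link to them -- the
    crudest member of the 'in-list' family of algorithms."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

print(inlink_counts(links))
# The commentaries outrank the Act itself, even though the Act
# is the authoritative document.
```

The same effect inside a firewall is worse: policy pages collect almost no links at all, so the ranking signal largely disappears.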

These two examples take us to assumption two. In the intranet, we're often not looking for the "Wisdom of Crowds"; there are often very small result sets for a given query, often the correct result set is only one entry. This will occur when you are looking for a policy, or an officer's represented opinion. Will expenses pay this journey cost? Is this a supported configuration? It occurs when the researcher is looking for authority not opinion.

Observation three is that there is (likely) to be less spam inside the firewall. (I wonder if this is an aging observation; with the growth of blogs and the opening of mail archives to search, it may be that this is weakening in strength, but it is unlikely (though not unheard of) that large porn collections will be found by accident in an intranet search.) The corollary to this observation is that some ranking algorithms that are unsafe on the internet become useful inside the firewall.

Observation four is that intranets are less friendly to search. The authors observe that much content is held inside databases, document servers, portals, directories and other specialised interfaces.

While reading Benkler's "Wealth of Networks", I first came across the concept of a shape of the internet. Obviously we all know that some sites are very influential and highly read, but the internet's hyperlinks have a topography that can be described and measured using graph theory. This was, as far as I can tell, first explored by Broder, Kumar, Maghoul and others in their paper "Graph Structure in the Web". These topographies were discovered during the ascent of the dynamic search engine, which won out to the detriment of directory-based references. These two papers are contemporaries, and it would be interesting to see if these topographies remain useful as insight today.

IBM discovered their intranet topology was different to the internet's, with a smaller "core" and a larger periphery. The core is a bunch of sites that meet the formal graph theory definition of strongly connected. (See Graph Structure in the Web.) The size of the OUT segment, pages that can be reached from the core but do not link back, is larger than on the internet, and is much exacerbated by Domino document repositories. The periphery is also much larger than on the internet; its pages can be found from the crawl seed pages (which must be in the IN segment) but not from the core. They measured the frequency distribution of the probability that an in-list based sort algorithm would place on a page's relevance, and discovered a difference in shape between the intranet and internet results, with a lower proportion of high-scoring pages in the intranet.
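
These bow-tie segments can be computed directly from a link graph: the core is the largest set of mutually reachable pages, OUT is what the core reaches but which does not link back, and IN is what reaches the core. A small sketch on an invented five-page graph:

```python
def reachable(graph, start):
    """All pages reachable from start by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen

# An invented miniature web: A<->B form the strongly connected core,
# S links in, C and D can be reached but never link back.
graph = {"S": ["A"], "A": ["B"], "B": ["A", "C"], "C": ["D"], "D": []}

reach = {n: reachable(graph, n) for n in graph}
# Core: the largest set of mutually reachable pages.
core = max(({v for v in graph if n in reach[v] and v in reach[n]}
            for n in graph), key=len)
out = set().union(*(reach[n] for n in core)) - core
in_ = {n for n in graph if reach[n] & core} - core - out
print(sorted(core), sorted(out), sorted(in_))
```

On an intranet, the paper's finding amounts to the core set being small and the OUT and periphery sets being large, which this kind of decomposition makes measurable.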

Another interesting innovation was that the research team created three indices (most solutions used only one) for determining relevance: content, title and anchor text. (Anchor text is the text between the anchor tags, and thus chosen by the author to represent the link in the original document.) They then built a flexible ranking engine that had a number of input parameters. (I might write about this another day, but if you want more now, go to the original document.)
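
The three-index idea can be sketched as three per-field scores combined with tunable weights. The scoring function, the weights and the documents below are my own invention for illustration, not the paper's actual parameters:

```python
def field_score(text, query):
    """Crude per-field relevance: fraction of query terms present."""
    terms = query.lower().split()
    text = text.lower()
    return sum(term in text for term in terms) / len(terms)

def rank(docs, query, weights=None):
    """Combine the three indices with tunable weights; anchor text --
    the words other authors chose for their links -- gets the most say."""
    weights = weights or {"content": 1.0, "title": 2.0, "anchor": 3.0}
    scored = [(sum(w * field_score(doc[field], query)
                   for field, w in weights.items()), doc["title"])
              for doc in docs]
    return sorted(scored, reverse=True)

docs = [
    {"title": "Expenses policy", "content": "travel claims process",
     "anchor": "expenses claims"},
    {"title": "Travel news", "content": "expenses mentioned in passing",
     "anchor": "newsletter"},
]
print(rank(docs, "expenses claims"))
```

The point of making the weights parameters is that an intranet operator can tune them against their own content, rather than inheriting whatever mix works for the public web.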

It's three years later, and it is almost certain that, with the changes in user content authoring tools and the fact that there is more spam and more opinion, the topology will have changed. The improved content creation tools represented by blogs and wikis also weaken the assumption that intranet content has low link counts. Sun is very permissive about blogs, as, as far as I can tell, is IBM, but the introduction of blog and wiki technology has both strengthened and weakened the firewall, and hence the boundary of the intranet. Company staff are better informed and make better judgements about whether to publish their material internally or publicly, and can do so more easily both politically and technically, but the fact that sometimes/often the authoritative statement by a colleague is on a public blog means that intranet search needs to pass through the firewall and "join" intranet and internet resources.

One very obvious example illustrating the difference in intranet content can be discovered by examining tag clouds. If one were to compare my tag cloud with my internal delirious tag cloud, there are huge differences: I have no picture gallery inside Sun; none of my food, gardening and culture bookmarks are stored; internally I have a bunch of "how to" and "do not do" repository links and application home pages. (Let's face it, a lot of the technical documentation is on the internet now, and the secret R&D stuff I don't get to see anyway!) Also the clouds have very different shapes, partly because I have over 1200 bookmarks on my public site and considerably fewer internally. Tag clouds may also be another way of overcoming some of the four observations and corollaries.

In order to "see" my true tag cloud, I need to "add" my private and public bookmark lists, which I organise using delirious & It needs a form of federation.

It would be hoped that tags might be part of the answer, but the different shape of the intranet may make the development of discriminating tags very hard. My experience at the moment is that this is true. I had given up, but I have been inspired to have another go.

Sites like Digg and others, where user-generated content creates huge numbers of hyperlinks because of the number of users, will also distort the shape of the internet; as they become part of the core, the size of the "OUT" segment will become larger. Hyperlinks become the votes of web readers, not authors. Although it is possible that, since these sites are designed to be read by people, there will be more limited references to them on other sites and they will remain part of the "IN" segment. The XML feed services will, however, be referenced by many sites, and the linkroll gadgets mean that they are referenced.

So intranet search queries require a different approach from internet search, but are the two getting closer or travelling in different directions?


Sunday Nov 05, 2006

Finding stuff I said last year!

Since the introduction of tags, I have slightly re-organised this blog site.

It now has two new features, Yesterday's Words and About Me. These features are available through the small-font menu bar above the category list. Yesterday's Words is an archive feature, allowing you to look for things in this blog by category, tags, publication month and keyword search, and to review the titles of the last six months' articles. I have done this because we now have tags, and people ought to be able to see the tag cloud, and I have come to the conclusion that the front page sidebar was beginning to be hard to use. In order to improve the ease of use, I expect to move my feed to another back page, together with some of the bookmarks I have stored. At the moment it remains on the front page and on Yesterday's Words.

Both these new pages have tuned and smaller sidebars. I have done this by utilising the Roller #includePage() macro and holding the banner (with the duck & licence) and the sidebars as separate files. The content for Yesterday's Words is also held in an external file. This should all make updating the site a lot easier, and allow me to move from HTML tables to CSS some time soon.

I hope to introduce a reading list page about the books I am reading, am planning to read, or have just read.

The navigation bars at the top (and side) offer a page called Site Search. This is a safe, checkpointed version of Yesterday's Words; I created it to permit a "roll forward from" point in case I made any silly and drastic mistakes. I will delete it some time soon, so I recommend that you don't bookmark it.


Monday Aug 28, 2006

Semantic footprints through time

Another thought (about tags & prompting, see also here... or here...) is that prompting (& categories) inhibits evolution. Either the ideas being explored change, or the common use of words changes. An example on this blog is that I should probably create a travel category rather than treat travel as a qualifier of Culture, but because I am prompted, I have never bothered.

On the other hand, I also want you to be able to find all the articles in a natural series, so I value the stability, but it means that you have to learn about my use of the categories. The search engine and article list on the sidebar are also category context sensitive, which is good.

(I really need to see the coming Roller tags implementation, to see if tags and categories really are interchangeable.)


Friday Jun 02, 2006

Searching for........

We then popped in to see two search projects. I saw the display of the Search Inside the Music project, which is currently focusing on two areas: using acoustic similarity to help people find music that 'sounds similar' to music that they already like, and using social data to recommend and organize music based upon the listening habits of people with similar musical tastes. Some deeply interesting science (how do you define music as sounding alike?), plus leveraging a "wisdom of crowds" (or networks). The 3D screen display is pretty cute too. (I need to have another look!)

I actually spent my time here talking to Steve Green, whose blog I occasionally read, but always with interest. He was demonstrating some technology from the Advanced Search Technologies project. The web page states that the mission of the Advanced Search Technologies project is to improve the ability of people to find and organize information in an enterprise setting, and that the group is responsible for the Sun Labs Search Engine, which ships as part of the Sun Java System Portal Server and Web Server. The demo showed a tool called the blurbalyzer, which recommends (or sorts and groups) books based upon similarities in the books' published 'blurbs'. It's amazing the complexity of problem hidden in a single word, in this case "similarity".


Tuesday May 30, 2006

Visualising tag clouds

Three points come to mind on reflecting on my conversation with Elias Torres at WWW2006 (see Tags and Spontaneity below...). First, maybe on tag clouds we should use colour for highly used & less frequently used tags, with red being highly used and blue (or indigo) less frequently used. This should mean that the less frequently used tags, which are the most discriminatory (i.e. meaningful), are not visually eclipsed by the most heavily used.
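
Such a colour scale can be sketched by interpolating hue from blue (rare) to red (common) around the standard colour wheel; the tags, counts and the linear mapping are invented for illustration:

```python
import colorsys

def tag_colour(count, max_count):
    """Map a tag's frequency to a hue: blue (240 degrees) for the
    rarest, red (0 degrees) for the most used, so the rare but
    discriminating tags stay visually distinct."""
    hue = (1 - count / max_count) * (240 / 360)   # 0 = red, 2/3 = blue
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))

tags = {"solaris": 50, "search": 10, "opera": 1}
top = max(tags.values())
for tag, count in tags.items():
    print(tag, count, tag_colour(count, top))
```

A log scale on the counts might work better in practice, since tag frequencies tend to follow a long-tailed distribution.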

Secondly, if we were to look at the structure "UNIX > Solaris > AIX", the significance would only be true if all taggers tagged the articles as UNIX & Solaris or UNIX & AIX, and only then could we be clear on the meaningfulness of a tag. This illustrates that a stranger may need to be familiar with the crowd's use of language.

Thirdly, I'm also not sure how we might 'refine' a query if we start with very meaningful tags; we would have to re-query, although the interface offers you a list of associated tags even for the smallest of queries.




