Friday, June 12, 2009

Topsy - a tweet-based real-time search engine

Today I'd *like* to talk about a new web 2.0 website. It just feels like the right time to do it (plus this is an assignment for my KM course due Monday).

go2web20.net is the place 2go2 for all the recent cool web 2.0 sites. This is where I found Topsy.
Topsy is a search engine that's based entirely on citations of hyperlinks embedded within Twitter messages, which it collects using the Twitter search API. Big words; let's try to break it down a bit.

Google's search results are ranked mainly according to the links pointing to a page, meaning how many other pages link to it. The most-linked pages that correspond to a search term are the top results. Google sees the web as a network of documents. Its web crawlers index the WWW, which is expensive. That's why it might take Google a while to show recent content in its search results (this is also why it took about a week after creating this blog for it to be found on Google when searching for the term "simplenotsimplistic". Still not bitter.)
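To make the "most linked pages win" idea concrete, here is a tiny Python sketch. It is only a toy: the function name and the input format are my own assumptions, and Google's real ranking is far more involved (PageRank, for one, also weighs how important the linking pages are).

```python
from collections import Counter


def rank_by_inbound_links(links, candidates):
    """Order candidate pages by how many distinct other pages link to them.

    links: iterable of (source_page, target_page) pairs (hypothetical format).
    candidates: the pages that matched the search term.
    """
    inbound = Counter()
    seen = set()
    for source, target in links:
        # Ignore self-links and repeated links from the same page.
        if source == target or (source, target) in seen:
            continue
        seen.add((source, target))
        inbound[target] += 1
    # Most-linked first; ties are broken alphabetically.
    return sorted(candidates, key=lambda page: (-inbound[page], page))
```

Building that `links` list is the expensive part: a crawler has to visit every page to discover it, which is why fresh pages can take a while to show up.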

So it can sometimes be hard to find recent blog posts on Google (although the more popular blogs are indexed quite frequently). Recently, with the explosion of Twitter, the amount of real-time information being published has increased significantly, and it seems this is too much for Google, Yahoo! Search, Bing and the other big link-based search engines to handle. This is where Topsy comes in.

Topsy sees the web as a stream of conversations. As Michael Arrington describes it, Topsy makes use of the 30+ million Twitter users as "an army of little content-finding machines". When you search for a term in Topsy, the results are sorted based on the links that were most discussed on Twitter, on who posted the tweets (some authors are more influential than others) and on the other content in the tweet (other than the hyperlink).
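Here is how I imagine those three signals could fit together, as a minimal Python sketch. This is not Topsy's actual algorithm (which isn't public as far as I know); the tweet format, the `influence` table and the default weight of 1.0 are all assumptions I made up for illustration.

```python
from collections import defaultdict


def rank_links(tweets, influence, query):
    """Rank links by the summed influence of the distinct authors citing them.

    tweets: list of dicts with "author", "text" and "links" keys (hypothetical).
    influence: dict mapping author -> weight; unknown authors count as 1.0.
    query: search terms, all of which must appear in the tweet text.
    """
    terms = query.lower().split()
    scores = defaultdict(float)
    counted = set()
    for tweet in tweets:
        text = tweet["text"].lower()
        # Signal 3: the rest of the tweet has to match the search term.
        if not all(term in text for term in terms):
            continue
        for link in tweet["links"]:
            key = (tweet["author"], link)
            if key in counted:
                continue  # one vote per author per link
            counted.add(key)
            # Signals 1 and 2: each citation counts, weighted by its author.
            scores[link] += influence.get(tweet["author"], 1.0)
    return sorted(scores, key=lambda link: (-scores[link], link))
```

With a setup like this, one tweet from a very influential author can outrank several tweets from ordinary users, which matches the "some authors are more influential than others" idea.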

Search results can be filtered by their age (similar to YouTube search) by using the bar on the left, which refines the search to results from "all time" (September 2008 being the oldest results), month, week, day and even hour.
The more influential Twitter authors for the search terms are listed on the bar at the right.
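The age filter on the left bar boils down to a cutoff date. A rough Python sketch, assuming each result carries a `posted_at` timestamp (my own naming) and treating a "month" as 30 days:

```python
from datetime import datetime, timedelta

# Window names follow the options on the left bar; None means no cutoff.
WINDOWS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),  # assumption: a month is 30 days
    "all time": None,
}


def filter_by_window(results, window, now):
    """Keep only the results posted within the chosen time window."""
    span = WINDOWS[window]
    if span is None:
        return list(results)
    cutoff = now - span
    return [r for r in results if r["posted_at"] >= cutoff]
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the filter easy to test.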

As an example, I searched for the term "Sacha Baron Cohen" on Google and on Topsy. The results from Google (see image) are clearly not real-time. The first result is the Wikipedia page, followed by Cohen's IMDb page and a Rolling Stone article from 2006.
The Topsy results are different (see image above): the first two results are the trailer of Cohen's new movie (Bruno), followed by Bruno's Facebook page and a news article from June 3rd, 2009.

Topsy is a great source of up-to-date information and knowledge. Another Michael Arrington quote sums it up: "For me, someone who’s obsessed with news and stuff that’s happening right now, Twitter search is about 25% of my total Internet searches. The ratio keeps going up over time."
Topsy has a clear advantage over the "traditional (?)" search engines when you need to know what's going on NOW.

Seeya,
Ben
