Written by: Zheng
Date: 2007-6-15
Keywords: Meme, Hotspot, Tipping Point, Techmeme
In September 2005, Memeorandum, the predecessor of Techmeme, launched and quickly caused a stir in North America. We ranked it alongside famous sites such as Slashdot and Digg, and proposed a "Memeorandum effect" analogous to the Slashdot effect.
Techmeme, the hot-topic engine led by Gabe Rivera, monitors in real time a list of blogs that he defines himself. By mining the URL links between blogs and news media, it uncovers the conversations among bloggers and presents them as conversations on its front page. It has become a very effective content filter that tells us what's hot and what's not.
This link-mining approach to finding hot topics does not work in China, for a simple reason:
Chinese bloggers rarely embed URL links in their posts.
For the same reason, Google's PageRank algorithm is of little value on these blogs.
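To make the link-mining idea concrete, here is a minimal sketch that ranks URLs by how many distinct blogs link to them. It is only an illustration of the general approach, not Techmeme's actual algorithm, which is not described here. The function name `rank_hot_links` and the sample blog names and URLs are hypothetical.

```python
from collections import defaultdict


def rank_hot_links(posts, top_n=3):
    """Rank target URLs by the number of distinct blogs linking to them.

    `posts` is a list of (blog_name, [outgoing_urls]) pairs. This input
    shape is an assumption made for the sketch.
    """
    linkers = defaultdict(set)
    for blog, links in posts:
        for url in links:
            linkers[url].add(blog)
    # Most-linked URLs first; ties are broken alphabetically by URL.
    ranked = sorted(linkers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(url, sorted(blogs)) for url, blogs in ranked[:top_n]]


# Hypothetical sample data.
posts = [
    ("blog-a.example", ["news.example/story-1", "news.example/story-2"]),
    ("blog-b.example", ["news.example/story-1"]),
    ("blog-c.example", ["news.example/story-1", "news.example/story-2"]),
    ("blog-d.example", []),  # a post with no links contributes nothing
]
hot = rank_hot_links(posts)
```

When posts carry no outgoing links at all, as with `blog-d.example`, the ranking has nothing to work with, which is exactly the situation described above for Chinese blogs.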
In fact, since the second half of 2006 we have been quietly developing a content engine, whose automatic hotspot discovery (hot point) feature covers the same ground as Techmeme.
Before we knew it, it was 2007, and the media have recently been mentioning Techmeme.
For example, on May 25 Sina published a translation of the Read/WriteWeb article "Famous Tech Blogs: The Big Battle Between Google News and Techmeme."
The June 2007 issue of the Economist's Business Review, published by Economic Watch, featured online communities and said: "Slashdot's influence began to weaken after the first dotcom bust. In recent years, the emerging Techmeme has begun to take over its former position."
Alex Barnett, who has been named one of Microsoft's top ten hottest bloggers, also published the article "How I find stuff I like" on May 23, saying that Techmeme was one of his three content filters: "The three main methods I use to find content I'll be interested in are: [...] 2. Techmeme - two or three times daily. Tells me what's hot and what's not."
Intro
In January 2006, I wrote and published the Memeengine discussion series, parts one, two, and three (the full PDF document is available for download). I also noted at the time that several people had announced in the media that they wanted to replicate Techmeme, but nothing further came of it. Perhaps this is because Techmeme's link-analysis algorithm simply cannot be transplanted to China.
Always on the road
In March 2006, I started looking for a meme engine with Chinese characteristics and soon found that only text mining algorithms could do the job.
Text mining of blog content is still a big unsolved problem in China. Blogs are much more complicated than news:
- Text style: blog writing styles vary widely. Bloggers often do not play by the rules and write without restraint, which makes their posts far harder to analyze than the standardized writing of news.
- Scope: blogs talk about anything, from affairs of state down to personal feelings and even mundane daily logs.
- Scattered sources: there are hundreds of BSPs in China, large and small, and millions of blogs publishing articles, so it is difficult to collect posts as soon as they appear and to scale up computation quickly.
In September 2006, I teamed up with Dr. Zhang Junlinghan of the Institute of Software, CAS, to create a network of games, aiming at the broad future direction of information filters and human filters.
In October 2006, Dr. Zhang introduced the "hotspot auto-discovery" algorithm. At that time, however, the algorithm was not very mature: it performed poorly in categories that are not event-driven or news-driven, such as the Internet and gender, while performing well in news-driven categories such as celebrities and society. For this reason it was not opened to the public.
After developing the "topic cluster aggregation" and "topic time context" algorithms for the content engine, we went back and re-optimized the hotspot auto-discovery algorithm. This time the accuracy reached a new level, and it can truly do the following:
From crawling to the output of hotspots in each category, the entire process requires no manual intervention, and the results can be published directly to ordinary users without editorial review.
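The hotspot auto-discovery algorithm itself is not described in this post. As a rough illustration of what text-based hotspot detection can look like without any link data, the sketch below uses a generic term-burst baseline: it scores each term by how much more often it appears in recent posts than in an older baseline set. The function names, the smoothing, and the whitespace tokenization are all assumptions; real Chinese text would first need word segmentation.

```python
from collections import Counter


def document_frequency(posts):
    """Count, for each term, the number of posts that contain it."""
    df = Counter()
    for text in posts:
        # Assumption: whitespace tokenization. Chinese text would need
        # a word segmenter instead.
        df.update(set(text.lower().split()))
    return df


def bursty_terms(recent_posts, baseline_posts, top_n=3, min_recent=2):
    """Return terms whose post frequency rose most against the baseline."""
    recent = document_frequency(recent_posts)
    baseline = document_frequency(baseline_posts)
    scores = {}
    for term, count in recent.items():
        if count < min_recent:
            continue  # ignore terms seen in too few recent posts
        recent_rate = count / len(recent_posts)
        # Add-one smoothing so unseen baseline terms do not divide by zero.
        baseline_rate = (baseline[term] + 1) / (len(baseline_posts) + 1)
        scores[term] = recent_rate / baseline_rate
    # Highest burst score first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical sample data.
baseline = [
    "the weather is nice today",
    "my cat sleeps all day",
    "the market is quiet",
]
recent = [
    "earthquake hits the city",
    "the earthquake damaged roads",
    "my cat ignored the earthquake",
]
hot_terms = bursty_terms(recent, baseline)
```

In this sample, "earthquake" outranks the common word "the" because it is absent from the baseline posts. A production system would also need stop-word handling, clustering of related terms into topics, and per-category baselines, none of which this sketch attempts.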