Real-time web data analysis [reprinted]

Source: Internet
Author: User

The real-time web mostly refers to the rapid indexing of massive volumes of micro-blog data, led by Twitter, and the real-time scrolling display of search results. Fast indexing is a competition over how often the index is updated: every 5 minutes, every minute, or even every 10 seconds. For any query keyword, you can see micro-blog messages published within the last minute (micro-blog messages are short enough that they can be indexed quickly).
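To make the idea of fast indexing concrete, here is a minimal sketch of an in-memory inverted index in which a message becomes searchable as soon as it is added. The class name LiveIndex and its methods are hypothetical names chosen for this illustration; real services use distributed systems that are far more involved.

```python
from collections import defaultdict

class LiveIndex:
    """Hypothetical toy index: each message is searchable right after add()."""

    def __init__(self):
        self._postings = defaultdict(list)  # term -> message ids, oldest first
        self._messages = []                 # message id -> original text

    def add(self, text):
        msg_id = len(self._messages)
        self._messages.append(text)
        # Index each distinct lowercase word once per message.
        for term in set(text.lower().split()):
            self._postings[term].append(msg_id)
        return msg_id

    def search(self, term, limit=10):
        # Return the newest matching messages first.
        ids = self._postings.get(term.lower(), [])
        return [self._messages[i] for i in reversed(ids[-limit:])]

index = LiveIndex()
index.add("Chile earthquake update")
index.add("More on chile")
print(index.search("chile"))
```

Because short messages contain few terms, each add() touches only a handful of posting lists, which is one reason micro-blog content can be indexed so quickly.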

However, real-time analysis is not that easy. There are several types of real-time analysis:

  • Trend analysis: day-by-day changes in how often the query keyword is mentioned on different websites. Ubervu does this well; see, for example, its results for "Chile".
  • Popular link analysis: popular and important links are picked out of the associated results and displayed separately, sorted either by time or by importance. Oneriot is better at this; again, try searching for "Chile". Popular links are generally selected by the number of micro-blog messages (for example, on Twitter) that mention them.
  • Semantic analysis:
    • Sentiment trend analysis: sentiment analysis, also called opinion mining. Big event does this well, for example the Wang Xing pie chart on the left side of the Meituan web page, and the Liu Qian and Han pie charts in "Han PK Liu Qian". Ubervu runs sentiment analysis on every conversation and draws a curve of how sentiment changes over time.
    • Association analysis:
      • Other search suggestions associated with the query keyword
      • Related or similar topics or entities: Daylife and Evri do a good job here.
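As a rough illustration of the trend analysis in the list above, the sketch below counts how often a keyword is mentioned per site per day. The record layout (site, date, text) and the sample data are assumptions made up for this example, not the data format of any of the services mentioned.

```python
from collections import Counter
from datetime import date

def daily_mentions(records, keyword):
    """Count messages mentioning `keyword`, keyed by (site, day)."""
    counts = Counter()
    needle = keyword.lower()
    for site, day, text in records:
        if needle in text.lower():
            counts[(site, day)] += 1
    return counts

# Hypothetical sample data: (site, publication date, message text).
records = [
    ("twitter", date(2010, 3, 1), "Earthquake news from Chile"),
    ("twitter", date(2010, 3, 1), "Thinking of everyone in chile"),
    ("blogs", date(2010, 3, 1), "Unrelated post"),
    ("blogs", date(2010, 3, 2), "Chile relief efforts"),
]

trend = daily_mentions(records, "Chile")
for (site, day), n in sorted(trend.items()):
    print(site, day.isoformat(), n)
```

Plotting these counts over time, one line per site, would give the kind of trend curve described above.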

In addition to Oneriot, the following two real-time search engines also offer real-time analysis:

I. Inspiration from Ubervu

Ubervu describes itself as a real-time social media analytics service. A few of its features stand out.

1.

An interesting idea is to organize the data under a keyword into a set of conversations. In practice, it finds the popular links in the associated information, then gathers the entries that recommend or comment on each core link under a heading of the form "N people discussed the story". The whole story is called a conversation. In effect, this is hot link selection.

This removes noise from the associated information and puts the more important information at the top, which is the same approach Oneriot takes.
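The grouping described above can be sketched as follows. This is only an illustration under assumed inputs: messages are (author, text) pairs, links are found with a simple regular expression, and "N people" is taken to mean the number of distinct authors. None of this is confirmed to be how Ubervu or Oneriot actually work.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")

def build_conversations(messages):
    """Group messages by the links they contain; rank links by distinct authors."""
    by_link = defaultdict(list)
    for author, text in messages:
        for url in set(URL_RE.findall(text)):
            by_link[url].append((author, text))
    conversations = []
    for url, entries in by_link.items():
        people = {author for author, _ in entries}
        conversations.append({"link": url, "people": len(people), "entries": entries})
    # Most-discussed links first; ties broken by URL for a stable order.
    conversations.sort(key=lambda c: (-c["people"], c["link"]))
    return conversations

# Hypothetical sample messages.
messages = [
    ("alice", "Great summary http://example.com/a"),
    ("bob", "RT http://example.com/a worth reading"),
    ("alice", "Again: http://example.com/a"),
    ("carol", "Another take http://example.com/b"),
    ("dave", "No link here"),
]

for conv in build_conversations(messages):
    print(f'{conv["people"]} people discussed {conv["link"]}')
```

Messages without a link never enter a conversation, which is how this approach filters noise out of the associated information.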

2.

It can also analyze the sentiment trend within each story, but the result is visible only to paying members. Presumably it computes a single overall trend. I personally think this statistical method is very unreliable. I usually stress computing sentiment trends with respect to "anchors" (specific targets); otherwise the results are untargeted and easily distorted.
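A small sketch may help show why an overall sentiment figure can mislead. It assumes that an "anchor" means a specific target named in the message, and it uses a tiny made-up word list as the sentiment lexicon. Both are assumptions for illustration only, not the author's or Ubervu's method.

```python
# Hypothetical, deliberately tiny sentiment lexicon.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}

def message_score(text):
    """Positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def overall_sentiment(messages):
    """One number for the whole stream, ignoring what each message is about."""
    return sum(message_score(m) for m in messages)

def anchored_sentiment(messages, anchors):
    """Sum scores separately for each anchor that a message mentions."""
    totals = {a: 0 for a in anchors}
    for m in messages:
        lowered = m.lower()
        for a in anchors:
            if a.lower() in lowered:
                totals[a] += message_score(m)
    return totals

messages = [
    "I love the new phone",
    "the new phone is great",
    "I hate the new tablet",
    "the tablet is awful",
]

print(overall_sentiment(messages))
print(anchored_sentiment(messages, ["phone", "tablet"]))
```

In this sample the overall score is neutral, while the per-anchor scores show clearly positive opinion about one target and clearly negative opinion about the other.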

3.

It can show the share of each social site in the associated information, and it draws a separate mention-frequency trend curve for each site.

This is not very significant, but it is one more dimension of observation.

 

II. Inspiration from Ellerdale

Ellerdale Trends processes massive volumes of data from Twitter, Wikipedia, and many other types of data sources. Ellerdale claims to be good at fast indexing of high-volume data feeds, one of which is Twitter's firehose. To handle this, they developed a fault-tolerant distributed database.

It is not just a search engine. Its semantic engine builds a database of the topics it can detect, which even includes topics such as United States Senate and United States Presidential Election 2008. Topics are classified into categories such as people, sports, films, and politics.

Like Ubervu, it provides up-to-the-second analysis capabilities (real-time analysis based on search data) with almost no lag.

You will find that it shares a goal with Ubervu: to better understand the information and opinions that people are sharing.

Topics

For a machine-detected topic such as Jessica Alba, the topic's categories are displayed, and its page lists the following in order:

  • Wiki entry explanation: a short explanation of the entry, along with several wiki links such as Freebase and Wikipedia. The kinds of links vary with the person's field. For a Hollywood star, the page lists links to that person's entries on several well-known movie websites. For the political figure Sarah Palin, it lists her Twitter account. For John McCain, it even gives a New York Times profile link and two other official website links.
  • Other associated topics: for example Halle Berry; generally people and organizations in the same field.
  • Message stream: mainly a Twitter data stream. Its weakness is that duplicate messages are not merged, not even duplicates sent by the same ID, so a single ID can flood the stream.
  • Message history: a trend curve.
  • Top articles: associated articles, each shown with a rank and a number of mentions, which probably refers to how many times the link was mentioned on Twitter. Rank and mentions appear to be positively correlated.
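The duplicate-message problem noted for the message stream could be handled with a merge step like the one sketched below. The normalization rule (lowercasing and collapsing whitespace) and the (user_id, text) input format are assumptions for this example; nothing here reflects Ellerdale's actual pipeline.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())

def merge_duplicates(stream):
    """Merge messages with the same normalized text, keeping the first seen."""
    merged = {}  # normalized text -> record; dicts preserve insertion order
    for user_id, text in stream:
        key = normalize(text)
        if key not in merged:
            merged[key] = {"text": text, "users": [user_id], "count": 1}
        else:
            record = merged[key]
            record["count"] += 1
            if user_id not in record["users"]:
                record["users"].append(user_id)
    return list(merged.values())

# Hypothetical stream in which one ID repeats itself and another echoes it.
stream = [
    ("u1", "Jessica Alba new film"),
    ("u1", "Jessica  Alba new film"),
    ("u2", "jessica alba new film"),
    ("u3", "Something else"),
]

for record in merge_duplicates(stream):
    print(record["count"], record["users"], record["text"])
```

Showing each merged message once, with its repeat count, keeps a single ID from flooding the stream while still preserving how often the message was sent.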

Category channel list view

Live trending lists the top ten for a given category (films, for example), and each ranking shows three topics.

A number is shown to the right of each topic, such as 3,647 mph. It appears to be the number of associated articles or messages per hour.
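If "mph" does mean messages per hour, as guessed above, such a figure could be computed over a sliding time window roughly as follows. The function name, the window handling, and the sample timestamps are all assumptions for illustration.

```python
from datetime import datetime, timedelta

def mentions_per_hour(timestamps, now, window=timedelta(hours=1)):
    """Count mentions in (now - window, now], scaled to an hourly rate."""
    start = now - window
    n = sum(1 for t in timestamps if start < t <= now)
    # Dividing one timedelta by another yields a plain float.
    return n * (timedelta(hours=1) / window)

# Hypothetical mention times for one topic.
now = datetime(2010, 3, 1, 12, 0)
timestamps = [
    datetime(2010, 3, 1, 10, 30),
    datetime(2010, 3, 1, 11, 10),
    datetime(2010, 3, 1, 11, 45),
    datetime(2010, 3, 1, 11, 59),
]

print(mentions_per_hour(timestamps, now))
print(mentions_per_hour(timestamps, now, window=timedelta(minutes=15)))
```

A shorter window reacts faster to bursts but is noisier, which matters when the rate is used to rank live trending topics.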

Many semantic applications aggregate, sort, and analyze topics, such as:

Daylife's connection engine, for example for Woods or Johnny Depp;

Evri, for example for Steve Jobs or iPad.

Zheng @ playpaper RT Beijing Report

References:

RWW's "Beyond Twitter Search: Semantic Analysis of the Real-Time Web";

Kosmix's "Web 3.0 and Semantic Search";

Zheng's "[Semantic] Sentiment Analysis Direction Situation · 0908".
