In recent years, Weibo (microblogging) has become the most fashionable Internet application. It is both a new product of Internet development and a richer, more refined branch of the social platform. Since Sina launched the first domestic microblogging platform in 2009, microblogging has sprung up all over the country.
2010 ushered in the springtime of microblogging in China, with Sina, Tencent and other portals all launching microblogging services. Since last year, the number of microblog users has grown rapidly. Sina's first-quarter report, published in May this year, shows that the number of Sina Weibo users has risen to 324 million. The other microblogging giant, Tencent Weibo, has also developed rapidly, with its registered users exceeding the 300 million mark.
There are deep reasons for Weibo's rapid development. On the one hand, microblog content consists of short, simple text, so it demands less of users' technical and writing skills, and the bar for organizing one's language is not as high as it is for blogs. On the other hand, as Weibo has spread, the operators' open APIs have enabled users to update and follow microblog content in real time from various terminals and system platforms such as mobile phones, tablets and PCs.
In addition, the most important reason is China's large population base: China has a large number of netizens and therefore a large number of microblog users, whose status updates are frequent and whose information spreads quickly. According to the China Internet Network Information Center, as of the end of December 2011 the number of Chinese Internet users had passed 500 million, reaching 513 million. This huge base of netizens not only helps expand the microblog user base, but also makes it easier for businesses and operators to find business opportunities in Weibo.
Weibo big data: big business opportunities, big trouble
Intuitive, convenient and efficient ways of communicating and reposting have shown microblog operators the huge potential of the business opportunities involved. Each registered Weibo user is both a user and a consumer. CNNMoney, the US financial website, has written that every Facebook user contributes $1.21 per quarter. In this era of microblogging, whoever seizes the initiative in Weibo will stand out in the fierce competition.
As the number of users increases, microblogs will gradually be commercialized. On the one hand, the core is to provide users with value-added services and to use advertising pages to attract fans to interact with brands and promote products, helping microblog operators become profitable. On the other hand, many professional data mining and analysis firms at home and abroad use microblogging platforms to collect large amounts of data and analyze microblog users' comments and interests, mining business value from Weibo "big data".
However, as the number of Weibo users and the volume and complexity of microblog content keep growing, it is very challenging for any data mining company or microblog operator to mine valuable information from massive microblog content quickly and efficiently, and to extract commercially valuable decision-analysis data from it.
Yang Weihua, chief architect of Sina's microblogging platform, said that on the one hand, microblog operators need to provide an efficient, reliable and stable platform to support the growing number of users and the growing volume of content, especially the demands of high-volume unstructured data such as audio and video. On the other hand, they need a data mining platform that is open, easy to use, customizable and easy to expand, and that makes full use of the existing hardware platform to support efficient, flexible data mining and sharing applications.
Building a data mining platform with a fine "core"
The many challenges facing Weibo also reflect the common dilemma of big data applications. Microblog operators need to build a platform that supports the growing demand for user access and provide an open, customizable API, laying the foundation for operators and third parties to mine the value of microblog data.
Yang Weihua, the chief architect of Sina's microblogging platform, has said that traffic spikes triggered by unexpected events pose serious challenges for microblog operators. "[In addition] we have to focus on how to build a high-performance architecture," Yang Weihua said. The essence of these problems is that the architecture must handle high traffic, easy scalability, low latency, high availability and distributed deployment. Sina Weibo serves billions of external web page and API requests every day. High-performance systems are characterized by low latency and strong real-time behavior. The core value of Weibo is being highly real-time, and the core of real-time is keeping data as close to the CPU as possible to avoid disk I/O.
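The idea of keeping hot data close to the CPU can be illustrated with a small in-memory cache placed in front of a slower store. The following is only a minimal sketch under stated assumptions: the class `TimelineCache`, the function `slow_store` and the least-recently-used eviction policy are hypothetical names and choices made for illustration, not a description of Sina Weibo's actual architecture.

```python
from collections import OrderedDict


class TimelineCache:
    """Tiny LRU cache: hot entries stay in memory, misses fall back to a slow store."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # hypothetical slow lookup (disk or database)
        self.entries = OrderedDict()          # insertion order doubles as recency order
        self.disk_reads = 0

    def get(self, post_id):
        if post_id in self.entries:
            self.entries.move_to_end(post_id)  # mark as most recently used
            return self.entries[post_id]
        # Cache miss: pay for one slow read, then keep the result in memory.
        self.disk_reads += 1
        value = self.load_from_disk(post_id)
        self.entries[post_id] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value


def slow_store(post_id):
    # Stand-in for a disk or database read (assumption for this sketch).
    return f"post body {post_id}"


cache = TimelineCache(capacity=2, load_from_disk=slow_store)
for post_id in [1, 2, 1, 1, 3, 1]:
    cache.get(post_id)
print(cache.disk_reads)  # six requests, but only three reach the slow store
```

In this toy run the frequently requested post stays in memory, so repeated reads never touch the slow store. A production system would more likely use a dedicated caching tier than a per-process dictionary.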
Dongjian, senior director of Sina's research and development platform, also told reporters that at the evening peak, Sina Weibo's server cluster now has to handle more than 1 million requests per second, which puts it under enormous pressure. Sina is constantly looking for more powerful servers to meet its needs. To this end, Sina Weibo has maintained a broad partnership with Intel since its launch. Thanks to its unique advantages, the Intel Xeon platform delivers significant performance benefits, responding instantly to millions of access requests and handling microblog message queue processing. On this basis, on the one hand, the x86 architecture offers a more cost-effective solution, which suited Sina Weibo's early period after launch when it was not yet profitable, and which supports the continued growth of the service and its business expansion. On the other hand, the open architecture helps Sina Weibo promote and open its API, letting more third parties build microblog data mining applications on the platform. Its openness is also reflected in better compatibility and support for optimizing microblog program code, to meet higher resource integration and performance requirements.
The Sina Weibo platform's "Wind and Cloud" rankings, micro-data and micro-reports, as well as third-party microblog data mining, are typical applications of mining massive microblog content and extracting value from it. Building on the Intel architecture, Sina also attaches particular importance to software-level big data solutions.
According to Yang Weihua, Sina Weibo mainly uses two approaches to handle massive data: traditional relational databases and NoSQL. In the relational databases, data is spread across multiple servers through sharding, and hot microblog content or keywords from different periods are sharded by time slice. For example, hot microblog words or the influence of microblog accounts can be ranked according to certain rules, which also lets microblog users gauge their own influence and understand current hot topics. The NoSQL side is a non-relational database, the HBase module in the Hadoop framework, on which a solution for massive microblog data can be built. Unstructured data such as audio, video and voting rankings can be mined, analyzed and processed by industry category, and the results compiled into micro-reports that guide operations. Sina Weibo's open API also provides an external interface for third parties to develop richer microblog data mining applications. In the near future, Sina Weibo will upgrade its system to a complete big data solution that directly uses Intel's Hadoop distribution, which can give the best possible support to the existing architecture and maximize performance.
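Sharding by time slice and ranking hot words within each slice can be sketched in a few lines. This is an illustrative sketch only: the hourly slice, the modulo rule for choosing a server, and the names `shard_key` and `hot_words` are assumptions made for the example. The article does not describe Sina Weibo's actual sharding rules or ranking criteria.

```python
from collections import Counter, defaultdict
from datetime import datetime


def shard_key(posted_at, n_servers=4):
    """Map a post to (hourly time slice, server index). Both rules are illustrative."""
    time_slice = posted_at.strftime("%Y%m%d%H")   # e.g. "2012060120"
    return time_slice, int(time_slice) % n_servers


def hot_words(posts, top_n=2):
    """Count keywords per time slice and return the top_n words for each slice."""
    counts = defaultdict(Counter)
    for posted_at, keywords in posts:
        time_slice, _server = shard_key(posted_at)
        counts[time_slice].update(keywords)
    return {
        time_slice: [word for word, _ in counter.most_common(top_n)]
        for time_slice, counter in counts.items()
    }


# Made-up sample posts: (timestamp, extracted keywords).
posts = [
    (datetime(2012, 6, 1, 20, 5), ["olympics", "football"]),
    (datetime(2012, 6, 1, 20, 40), ["olympics"]),
    (datetime(2012, 6, 1, 21, 10), ["weather"]),
]
ranking = hot_words(posts, top_n=1)
print(ranking)
```

Because each slice is counted independently, a ranking for one period never needs data from another period's shard, which is what makes time slicing convenient for hot-topic lists.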
Intel's Hadoop distribution, with a series of optimizations tailored to the Intel architecture platform, can deliver several times the performance of non-Intel Hadoop distributions, achieving real-time or near-real-time results while ensuring better stability. Intel Hadoop Manager 2.0 can help administrators simplify the deployment and management of Hadoop and improve efficiency. All of this gives hope to Sina Weibo, which has already deployed the Intel hardware platform: an integrated software and hardware data mining platform built on a fine "core" can better support the open API in offering third parties more microblog data mining.
Summary:
Big data is both an opportunity and a challenge. As the largest microblogging platform in the country, Sina Weibo must respond to the challenges posed by its growing user base and data volume while also making the most of Weibo's enormous commercial value. Infrastructure based on the Intel platform, together with the distributed processing system of Intel's Hadoop distribution, can help provide a reliable, efficient and easily scalable microblogging platform. Through microblog data mining, Sina Weibo can both give microblog users a personalized application experience and meet third parties' need to mine the value of microblog data as a decision-making reference for enterprises.
(editor: Heritage)