Tianya, the best-known Chinese-language online community in the world, has long reflected the full color of Chinese society in forum form. In an era when information spreads without borders, Tianya is undoubtedly a stronghold of public opinion and a brand in its own right. Major social events, outpourings of public goodwill, and heated public debates have played out there too many times to count, and together they give a clear sense of Tianya's position and influence in China's internet world.
With that status and influence come huge traffic and years of accumulated data, which pose an obvious challenge for Tianya's IT systems and overall architecture. Big data is currently one of the most eye-catching topics in IT; is Tianya using it to cope with the access load it carries? To find out, I recently interviewed Wang Qingpo, director of Tianya's cloud computing and big data department, and discussed these topics in depth.
Big Data at Tianya
Founded in 1999, Tianya was among China's first internet companies. After 13 years of development it has 72 million registered users, more than 10 million average daily unique visitors (UV), and about 100 million average daily page views (PV). According to Wang Qingpo, the company is now stepping up development in social networking, the travel industry, the mobile internet, and other areas.
These new directions require new application development and technology deployment, which means more demands and challenges for Tianya's IT. Wang Qingpo said, "Tianya's overall IT needs are similar to those of many fast-growing internet companies: a wide range of applications and frequent online updates." Because the company needs to try business and product innovations quickly, it wants IT that is faster and more flexible, with underlying technology that better serves its products and services. Making good use of the big data it has been accumulating is one of the keys to getting there.
As a community platform that started out as a typical forum, Tianya has long held mostly unstructured data. Its users are very active and produce a very large volume of data, so the company has always been under pressure to process and mine it. Wang Qingpo said, "We really started on this in 2009. To provide better, more intelligent information navigation and user data analysis, the company gradually put a lot of effort into data mining work."
In terms of data volume, Tianya, as an established internet company, reaches more than 100 million users and holds forum posts and replies numbering close to the tens of billions. "The more than 100 million user access actions on Tianya every day are an important part of Tianya's big data, and also a basic data source for our big data work."
Tianya's big data consists mainly of three parts: first, the registered-user database; second, the data users generate every day, such as posts, replies, and uploaded pictures; and third, user behavior data, that is, forum log data.
Wang Qingpo stressed, "Without data, talk of big data is fairly empty. It has no real foothold, and none of the work can be carried out without data at scale. Tianya has a huge amount of data. For us, this mass of data is a treasure trove and a resource that has not yet been fully exploited, so we will do a great deal of data analysis and mining."
From the perspective of Tianya's cloud computing and big data work, Wang Qingpo believes the company is keeping pace with new technologies and ideas. Tianya currently has close to 2,000 units of IT equipment (including all servers, storage, network, and other hardware), and the number is expected to grow to 5,000 units by 2015. This equipment will form the future IT infrastructure, on which a cloud computing architecture will be built, with substantial resources invested in big data. "Obviously we have a strong demand for cloud computing and big data," Wang Qingpo said. "Because we already have a lot of data, we need to use this data well and serve our users better."
What additional value will Tianya gain from big data analysis? Wang Qingpo explained, "By analyzing this data, first, we can better understand Tianya's user base; within Chinese society, they belong to the grassroots elite. Second, based on this analysis, we will make major adjustments to how the forum's content is prepared and organized, and launch content targeted at the characteristics of our user groups. In addition, for each user, we will use their behavioral trail to understand what kind of content they like, so that we can make accurate recommendations and filter content."
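To make the idea of learning each user's preferences from a behavioral trail more concrete, here is a minimal Python sketch. It is an illustration only, not Tianya's actual method: the event format, the board names, the `tag_users` function, and the `min_share` threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical behavior trail: (user_id, board) pairs, one per post view or reply.
EVENTS = [
    ("u1", "stocks"), ("u1", "stocks"), ("u1", "fashion"),
    ("u2", "fashion"), ("u2", "fashion"), ("u2", "fashion"), ("u2", "finance"),
]

def tag_users(events, min_share=0.5):
    """Label each user with the boards that make up at least
    `min_share` of that user's recorded activity."""
    per_user = defaultdict(Counter)
    for user_id, board in events:
        per_user[user_id][board] += 1

    tags = {}
    for user_id, counts in per_user.items():
        total = sum(counts.values())
        # Keep only boards that dominate this user's activity.
        tags[user_id] = sorted(
            board for board, n in counts.items() if n / total >= min_share
        )
    return tags

if __name__ == "__main__":
    print(tag_users(EVENTS))
```

A real system would weight different actions differently and draw on far more signals; the point here is only that the labels are computed per user rather than for the user base as a whole.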
However, analyzing and using data is not a new idea. The term "massive data" was in use long ago, which is why some in today's IT circles argue that "big data is a gimmick." In Wang Qingpo's view, big data still brings something new.
"I think that, in essence, it is an evolution driven by a change in quantity. Previously we analyzed data on the order of tens of millions of records, and a database could handle that well, especially for structured data," Wang Qingpo said. "But with the big data we see now, the first difference is sheer size. Tianya's forum data, for example, runs to tens of billions of records, a scale two orders of magnitude larger than before. That leads to differences in technology and in the results of the analysis. The second difference is that the depth of our analysis is very different from before; we need much more in-depth results."
Wang Qingpo elaborated on what he means by depth: "Today we want to determine very precisely what an individual's interests are. In the past we may only have analyzed the interests of a certain type of person, but now we can be precise down to the individual. The data analysis we used to do was more like statistics, for example that the average age of Tianya users is 28. Now we have to be more precise: this person likes stocks, finance, and fashion, and follows a particular type of fashion clothing. We give him a number of labels, marking him as a fashion leader or a follower. All of this requires more accurate data analysis and behavioral analysis, and such precise analysis requires very different techniques from before, with many different ways of processing the data."
Delivering a Better Experience with Hadoop
Tianya began paying attention to big data applications in 2009, as the industry's big data technologies came into view, and started the corresponding work in 2010. At the time it tried a variety of technical paths, including enterprise-class solutions, and finally chose Hadoop. "We tried enterprise-class data warehouses and similar data processing technologies, and in the end found that none of them could satisfy our data-scale requirements," Wang Qingpo said. "Later we started to try Hadoop, and in 2010 we began using it." As for the reasons, Wang Qingpo noted first that Hadoop is an open platform with successful deployments abroad, which increased Tianya's confidence, and second that trials, testing, and analysis showed it really could meet most of the Tianya platform's needs.
When it was set up in early 2010, Tianya's Hadoop cluster ran the official Apache release on fewer than 10 servers. After more than two years of development, the cluster has grown to dozens of servers. The type of server node is determined by Tianya's business requirements, and there is a clear standard for server selection. Wang Qingpo said, "We place the most emphasis on computing power, and after that on data I/O capability."
The Hadoop cluster deployed in 2010 was based on the Intel processor platform (4-core models). Each node had 8-16 GB of memory, 2U chassis were chosen to leave more room for storage expansion, and each DataNode had 4-8 TB of local storage. Computing power has remained the selection criterion in later procurement, and more Xeon E5 series platforms will be added in future. Excellent computing performance is the fundamental reason Tianya chose Intel-based servers to build its Hadoop cluster.
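To illustrate the kind of job a Hadoop cluster like this is typically given, the following Python sketch counts user actions in the map, sort, and reduce style that Hadoop is built around. It runs locally and does not use Hadoop itself. The tab-separated log format (timestamp, user ID, action), the sample records, and the function names are all assumptions made for this example, not Tianya's actual log schema or code.

```python
from itertools import groupby

def mapper(lines):
    """Emit ((user_id, action), 1) for each well-formed log line."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed records
        _timestamp, user_id, action = parts
        yield (user_id, action), 1

def reducer(sorted_pairs):
    """Sum the counts for each key; input must be sorted by key."""
    for key, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _key, count in group)

def run_job(lines):
    # sorted() stands in for the shuffle-and-sort step between map and reduce.
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))

# Hypothetical sample of forum log data.
SAMPLE_LOG = [
    "2012-10-01T08:00:00\tu1\tview",
    "2012-10-01T08:00:05\tu1\tview",
    "2012-10-01T08:01:00\tu1\treply",
    "2012-10-01T08:02:00\tu2\tview",
    "malformed line",
]

if __name__ == "__main__":
    for key, count in sorted(run_job(SAMPLE_LOG).items()):
        print(key, count)
```

On a real cluster the map and reduce steps would run in parallel across many nodes, each working on a slice of the logs, which is what lets this simple pattern scale to very large data volumes.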
At present, the main use of Tianya's Hadoop cluster is user behavior analysis and data analysis. Combined with the cloud platform discussed earlier, it provides a solid foundation for Tianya's business innovation.
"In fact, cloud computing and big data are the technical areas we focus on, and we are putting them into practice. We feel that these two technologies will provide strong backing for Tianya's future development, especially technology-driven development, and will be a great help to our future innovative businesses. Our future IT systems, in a cloud computing and big data environment, will be very flexible, reliable, and efficient, making our product development cycles shorter and our business innovation faster," Wang Qingpo said. With the help of the Hadoop cluster, Tianya can analyze user behavior quickly and efficiently and so offer users better-suited, even instantly customized, services, which will undoubtedly greatly improve the Tianya user experience.
On the future combination of cloud and big data, Wang Qingpo addressed the current trend toward virtualized Hadoop clusters. "If cloud computing takes hold, virtualization will be widely adopted, and if you want to provide this kind of analysis and computing service, it must be based on a virtualized environment," he said. "So as cloud computing and big data develop, combining the two is an inevitable path. To do data analysis you need computing resources, and those are usually delivered through virtualization, so once demand is strong enough the two will certainly merge." This, however, places higher demands on the virtualization capabilities of the system platform.
At the end of the interview, Wang Qingpo shared his experience and advice for those preparing to use Hadoop. "First, be business-oriented. Do not adopt a Hadoop data processing environment just because you want to pursue new technology. Evaluate your business volume and processing requirements first, then decide whether to use Hadoop, and how, based on your business.
"Second, if you have chosen Hadoop but are not yet very familiar with it, I recommend starting with some basic features and applying them to part of your core business, rather than getting entangled in Hadoop's entire ecosystem. The full Hadoop tool chain is still fairly long, but the most basic functions will solve 80% of business needs. Once you have capacity to spare, explore the more advanced functions."