From Hadoop Practice to Business-Based Data Analysis

Source: Internet
Author: User

Hadoop, an open-source implementation of the ideas in three Google papers, lets developers process massive amounts of data on commodity servers. Although it is not suited to real-time scenarios, it lets users do offline data analysis at a lower cost, which is why Hadoop is still widely used and discussed even as many other data-processing frameworks have lined up beside it. This issue starts with Hadoop and opens the door to business-based data analysis.

The first speaker was Zhou, the architect in charge of Hadoop in Nine City's technical department. He is also the moderator of the Hadoop board in CSDN's high-performance computing section, where he goes by the online name "the big wet", so readers who log in to the forum often will probably know of him. Zhou's topic was "Hadoop Big Data Analysis". He mainly described Nine City's Hadoop-based big data platform and shared some specific cases.

Hadoop is a software framework for distributed processing of large amounts of data. It runs on low-end servers, so its cost is low and anyone can use it, and it plays a key role in enterprise-wide data applications. Its reliability, efficiency, and scalability were further reasons for the team's choice. With Hadoop as the engine, handling big data easily is no longer a distant dream.

Zhou described a simple experiment by a Nine City team that compared running a computation on a minicomputer with running it on Hadoop. The minicomputer might cost more than a million, while a cluster of cheap machines costing only three or four hundred thousand achieved the same computational result and also made storage more reliable. This is the divide-and-conquer idea behind HDFS and the other models described later in the talk. Zhou then covered the architecture of Hadoop and its analysis tools. Any discussion of Hadoop and big data eventually comes back to analysis tools. Here he talked about Mahout, an important analysis and prediction tool in the Hadoop ecosystem, and shared cases of how these tools are used for data statistics in Nine City's games.
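The divide-and-conquer idea can be sketched in plain Python. This is not Hadoop code and not the experiment from the talk; the function names and the sample login records are hypothetical, chosen only to show how a job is split into chunks, processed independently, and merged.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Divide the input into fixed-size chunks, as a cluster would split a file into blocks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: count logins per game within a single chunk."""
    return Counter(game for game, _user in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_logins(records, chunk_size=2):
    """Run the map step on every chunk, then fold the partial results together."""
    partials = [map_chunk(chunk) for chunk in split_into_chunks(records, chunk_size)]
    return dict(reduce(reduce_counts, partials, Counter()))


# Hypothetical (game, user) login records.
sample_log = [
    ("game_a", "u1"),
    ("game_b", "u2"),
    ("game_a", "u3"),
    ("game_a", "u1"),
    ("game_b", "u4"),
]

if __name__ == "__main__":
    print(count_logins(sample_log))
```

Because each chunk is counted independently, the map step could run on separate cheap machines, and the result does not depend on the chunk size.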

Next, Zhou presented Nine City's targeted product recommendation feature. What is targeted recommendation? It uses a user's behavioral habits and characteristics, together with the user's individual or group attributes, to recommend the goods that the system predicts the user may need. This differs from the recommendations of a general e-commerce site. Zhou then showed how Nine City implements it. The first step is to build a data matrix: run ETL on the real data to divide it into dimensions, then apply the cosine similarity formula to obtain simple results. This part is straightforward and, given a data model, can be computed with Hive. Harder cases require implementing two algorithms on Hadoop: UserCF and ItemCF. These are two of the oldest algorithms in collaborative filtering and are widely used for top-N recommendation. They matter because they rest on two different basic assumptions about recommendation. UserCF assumes that a person will like what people with the same interests like, while ItemCF assumes that a person will like things similar to what they liked before. The two algorithms have similar precision, so they complement each other well.
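A minimal single-machine sketch of cosine similarity, UserCF, and ItemCF follows. It is not Nine City's implementation, which the talk says runs on Hive and Hadoop; the rating matrix, user names, and item names are hypothetical, and the scoring rule (similarity times rating, summed) is one common simple choice rather than the one used in the talk.

```python
import math
from collections import defaultdict

# Hypothetical user -> {item: rating} matrix.
ratings = {
    "alice": {"sword": 5, "shield": 3, "potion": 4},
    "bob": {"sword": 4, "shield": 3, "mount": 5},
    "carol": {"potion": 5, "mount": 2},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def transpose(matrix):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, user_ratings in matrix.items():
        for item, rating in user_ratings.items():
            items[item][user] = rating
    return items


def user_cf(matrix, user, top_n=2):
    """UserCF: score unseen items by the ratings of similar users."""
    scores = defaultdict(float)
    for other, other_ratings in matrix.items():
        if other == user:
            continue
        similarity = cosine(matrix[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in matrix[user]:
                scores[item] += similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def item_cf(matrix, user, top_n=2):
    """ItemCF: score unseen items by their similarity to items the user rated."""
    item_vectors = transpose(matrix)
    owned = matrix[user]
    scores = defaultdict(float)
    for candidate, vector in item_vectors.items():
        if candidate in owned:
            continue
        for item, rating in owned.items():
            scores[candidate] += cosine(vector, item_vectors[item]) * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(user_cf(ratings, "alice"))
    print(item_cf(ratings, "alice"))
```

The two functions differ only in which axis of the matrix is compared: UserCF compares rows (users), ItemCF compares columns (items).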

Finally, Zhou introduced Nine City's Skynet. On the technical side, he said, its first role is as a business framework and its second is coprocessor-based distributed computing, which can produce some real-time reports when the amount of data is small. On the business side, Skynet provides real-time data queries, real-time report analysis, online data monitoring, and online data mining. Zhou shared more material than is covered here; for details, refer to his PPT.

The next speaker was Zhou Haiyan, who is responsible for the capacity platform at the website operations center. Her topic was "Ctrip Web Capacity Analysis Method". The talk introduced Ctrip's capacity planning work, which uses current performance as baseline data to evaluate and predict what resources the system needs and when more will be needed. It covered business volume forecasting based on a periodic (seasonal) exponential forecasting method and web capacity prediction based on regression analysis.
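The talk summary does not give the exact forecasting formula. As an illustration only, the sketch below uses additive Holt-Winters (triple exponential smoothing), one common seasonal exponential method; the function name, the smoothing parameters, and the sample series are all assumptions.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` future points with additive Holt-Winters smoothing.

    alpha, beta and gamma smooth the level, trend and seasonal terms.
    """
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")

    first = series[:season_len]
    second = series[season_len:2 * season_len]

    # Simple initial estimates taken from the first two seasons.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonal = [value - level for value in first]

    for t in range(season_len, len(series)):
        value = series[t]
        season = seasonal[t % season_len]
        previous_level = level
        level = alpha * (value - season) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (value - level) + (1 - gamma) * season

    n = len(series)
    return [
        level + h * trend + seasonal[(n - 1 + h) % season_len]
        for h in range(1, horizon + 1)
    ]


# Hypothetical business volume with a repeating four-period cycle.
volume = [10, 20, 30, 40] * 3

if __name__ == "__main__":
    print(holt_winters_additive(volume, 4, alpha=0.5, beta=0.3, gamma=0.2, horizon=4))
```

On a perfectly periodic series with no trend, the forecast simply repeats the cycle; on real business data the three smoothing parameters would need to be tuned against history.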

Zhou Haiyan began with the difference between capacity and performance. Performance optimization, she said, has no fixed schedule and no fixed plan: nobody can promise that within one week the application on a given cluster will be optimized, with the performance figure brought down from 70% to 50%. Capacity is different. It is a cyclical, planned undertaking for the enterprise, including quarterly capacity purchases. Every company wants its business to grow, and business growth means that all of the site's resources must grow too, with continuous expansion and continuous procurement. Such companies draw up a procurement plan every quarter or half year, and that plan comes from capacity analysis.

Zhou Haiyan summarized the process as follows. The first step in capacity expansion analysis is to understand how the current infrastructure is working. The second step is to define capacity health standards. The third step is to continuously monitor and collect capacity-related indicators and identify their data sources. The fourth step is to build your own capacity model. The fifth step is to establish a trend forecasting model. The last step is to repeat, iterate on, and calibrate the capacity plan, so that when you actually need to make your plan you have the most accurate data. Some people ask what capacity analysis and capacity planning are for. The most direct answer is that the site does not go down! Zhou Haiyan also shared capacity analysis indicators and capacity calculation formulas, Ctrip's own expansion cases, and Ctrip's automated capacity analysis platform. For more details, refer to her PPT.
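The capacity model and the health standard from the steps above can be illustrated with a small regression sketch. Ctrip's actual indicators and formulas are not given in this summary, so everything here is an assumption: the measurements, the 65% CPU health threshold, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_load_at_threshold(slope, intercept, cpu_threshold):
    """Load one server can take before CPU crosses the health threshold."""
    return (cpu_threshold - intercept) / slope


def servers_needed(forecast_load, per_server_max):
    """Number of servers required to carry the forecast load."""
    return math.ceil(forecast_load / per_server_max)


# Hypothetical monitoring data: requests per second vs. CPU utilization (%).
requests_per_second = [100, 200, 300, 400]
cpu_percent = [15, 25, 35, 45]

if __name__ == "__main__":
    slope, intercept = fit_line(requests_per_second, cpu_percent)
    per_server = max_load_at_threshold(slope, intercept, cpu_threshold=65)
    print(per_server, servers_needed(2500, per_server))
```

Feeding a forecast business volume into `servers_needed` connects the trend forecast to a procurement figure, and refitting the line as new monitoring data arrives corresponds to the final calibration step.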
