On November 22 and 23, 2013, the 2013 Hadoop China Technology Summit (Chinese Hadoop Summit 2013), the only large-scale industry event dedicated to sharing Hadoop technology and applications, was held at the Four Points by Sheraton hotel in Beijing. Nearly a thousand CIOs, CTOs, architects, IT managers, consultants, engineers, and Hadoop enthusiasts took part, along with IT vendors and technologists from a range of industries at home and abroad who work on Hadoop research and promotion.
Wang, director of the Mobile Internet Product Development Division of the China Unicom Research Institute, introduced typical applications of Hadoop and big data in the industry.
(Photo: Wang, director of the Mobile Internet Product Development Division, China Unicom Research Institute)
Director Wang covered big-data applications from four angles: first, where the data comes from; second, what big data telecom operators hold; third, the big-data business system China Unicom has built; and fourth, a few simple examples of the outlook for big-data applications.
We are entering the era of the mobile Internet. Almost everyone has a mobile phone, and phones now do more and more of the work once done on a personal computer. Apart from basic voice and text messaging, the vast majority of what people do on a phone consumes data traffic. Mobile communications has crossed from the voice era into the data era. This gives operators great opportunities, but operators are also running into many disputes over traffic consumption.
Disputes over traffic consumption have now become the leading category of user complaints about communication services. The first problem is that data consumption is far less transparent than voice consumption. When making a call, the user knows who is on the other end and how long the call lasts. Operators can also provide an itemized list of calls, and users who send text messages generally know how many they have sent.
Traffic, by contrast, is billed in units of KB, and its consumption carries a degree of uncertainty. A user who has just browsed Weibo for a bit or used WeChat for a while does not know exactly how much traffic was used or how it will be charged. As a result, many users subjectively believe they used no traffic, or very little, and cannot see why they sometimes face relatively high traffic charges. Users want the operator to tell them where the traffic went: on which websites and in which applications how much traffic was generated, rather than simply saying that 1 GB or 700 MB was used this month. The traditional approach no longer meets users' needs.
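The kind of breakdown users are asking for amounts to grouping traffic records by user and by site or application and summing the KB. The sketch below assumes a hypothetical record with `phone`, `site`, and `kb` fields and made-up sample values; the article does not describe an actual record schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    # Hypothetical schema for illustration only.
    phone: str
    site: str  # website or application that generated the traffic
    kb: int    # traffic in KB, the billing unit mentioned above


def itemize(records, phone):
    """Sum one user's traffic per site or app, largest first."""
    totals = defaultdict(int)
    for r in records:
        if r.phone == phone:
            totals[r.site] += r.kb
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sample records.
records = [
    AccessRecord("13000000001", "weibo.com", 420),
    AccessRecord("13000000001", "WeChat", 310),
    AccessRecord("13000000001", "weibo.com", 180),
    AccessRecord("13000000002", "taobao.com", 900),
]
print(itemize(records, "13000000001"))  # [('weibo.com', 600), ('WeChat', 310)]
```

This only works if each record carries the site or application, which is exactly the information the operator's existing metering lacked.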
Data traffic disputes now account for 10% of 3G business complaints, and the proportion is steadily rising; in some provinces it has reached 20%. Traffic complaints to China Unicom's 10010 customer service line currently number nearly a million. At the same time, many users have taken legal action because the operator could not provide a detailed list of their online records. For example, one user on an iPhone contract plan incurred a huge amount of traffic while asleep in the early morning, up to four o'clock. The phone may have been running voice applications, and many applications update automatically, generating traffic even when they are not in use; in such cases the user finds the charges hard to understand. Lawsuits were filed because the operator's metering equipment could not provide an itemized list. The operator's metering equipment is like a household water meter, which cannot tell how much water went to cooking, flushing the toilet, or washing. To give users an itemized list, the metering equipment must be able to distinguish traffic accurately.
Originally, operators produced itemized records mainly on the gateway equipment, the GGSN. The GGSN closed out a traffic record when accumulated traffic reached a certain limit, when a certain length of time had elapsed, or when the network connection was shut down. These records serve mainly to let operators bill, not to explain usage to users. They may contain the mobile phone number, uplink traffic, downlink traffic, and session duration, but no information about the websites visited or other access records.
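The three closing conditions described above (volume limit, time limit, connection closed) can be sketched as a small accumulator. The thresholds `VOLUME_LIMIT_KB` and `TIME_LIMIT_S` and all names here are hypothetical; real GGSN limits are operator-configured and are not given in the article. Note that the emitted record has no site field, mirroring the limitation just described.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
VOLUME_LIMIT_KB = 1024
TIME_LIMIT_S = 3600


@dataclass
class Session:
    phone: str
    start_s: int
    uplink_kb: int = 0
    downlink_kb: int = 0


@dataclass
class TrafficRecord:
    # Billing-oriented record: no website or application information.
    phone: str
    uplink_kb: int
    downlink_kb: int
    duration_s: int


def close_record(s, now_s):
    """Emit a record for the accumulated traffic and reset the counters."""
    rec = TrafficRecord(s.phone, s.uplink_kb, s.downlink_kb, now_s - s.start_s)
    s.start_s, s.uplink_kb, s.downlink_kb = now_s, 0, 0
    return rec


def on_traffic(s, now_s, up_kb, down_kb):
    """Accumulate traffic; emit a record if the volume or time limit is reached."""
    s.uplink_kb += up_kb
    s.downlink_kb += down_kb
    if (s.uplink_kb + s.downlink_kb >= VOLUME_LIMIT_KB
            or now_s - s.start_s >= TIME_LIMIT_S):
        return close_record(s, now_s)
    return None


def on_disconnect(s, now_s):
    """Connection shut down: flush whatever has accumulated."""
    return close_record(s, now_s)


s = Session("13000000001", start_s=0)
r1 = on_traffic(s, 10, 100, 500)  # below both limits, so no record yet
r2 = on_traffic(s, 20, 100, 400)  # 1100 KB >= 1024 KB, record closed
r3 = on_disconnect(s, 50)         # connection closed, final record
```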
According to statistics from the customer service department of China Unicom's mobile business, complaint compensation and refunds caused by the inability to provide detailed online records amount to 60 yuan per million yuan of revenue. The GGSN equipment used by China Unicom comes from Ericsson, Huawei, ZTE, and Nokia, and this mature equipment has a very small probability of metering error. In the vast majority of compensated cases, the operator simply could not explain the charges clearly; users complained, and to keep the disputes from escalating, operators settled them by paying compensation.
This shows that providing users with itemized online records is a key factor in making the Internet a transparent and healthy environment, and it is what operators want to do.
Internet records are typical big data
For example, a user may have hundreds or thousands of call records per month, but Internet access records are of a different order of magnitude: tens of thousands, and for heavy users possibly hundreds of thousands. Visiting the Sina homepage on a mobile phone generates more than 20 records, covering the phone's initial request, the DNS query, and the download of each element of the page; every separate network request produces a record. On an iPad, the Sina homepage produces 40 records, and reading news on the iPad produces 180 records.
Accessing Taobao's touch-screen site likewise produces 6 records. On top of this come large numbers of background push messages: Apple phones, for instance, run many notification services, such as WeChat's, and many services push notifications quietly in the background.
According to statistics, China Unicom users generate more than 2 trillion online records per month, and the number is still growing. This is more than 30 times the volume of all the itemized billing records the country's operators currently keep, including voice records, SMS records, other itemized lists, and the traffic records operators previously produced.
The mobile Internet is in a period of rapid development, with traffic doubling roughly every 8 months. 4G licenses will be issued at the end of this year, and in the LTE era users will consume more and more traffic. The volume is now 2 trillion records a month; by this time next year it may be 5 trillion, and after that perhaps 8 trillion. The data volume is huge.
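As a rough check on these figures, a simple exponential model with the stated 8-month doubling time can project the monthly record count forward. This is only an illustration of the arithmetic; the article's own estimates may rest on other assumptions.

```python
def project(current, months, doubling_months=8.0):
    """Exponential growth: the volume doubles every `doubling_months` months."""
    return current * 2 ** (months / doubling_months)


now_trillion = 2.0  # records per month today, from the text
print(round(project(now_trillion, 12), 2))  # 5.66
```

Twelve months at this rate is 1.5 doublings, giving about 5.7 trillion records a month, in line with the "5 trillion" estimate for next year.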
Internet data is typical big data
How to store and retrieve this data is a big problem. Previously, carriers built both their billing systems and their accounting systems on the IOE architecture: IBM minicomputers, a commercial relational database, and high-reliability EMC storage. Many systems built this way are very expensive, yet they do not solve our problems. Storing this much data exceeds what can be managed online, and relational database performance declines severely in large-scale query operations.
When a table reaches 500 GB, a query may take 3,000 seconds. With 2 trillion records, even if the data is split across tables of at most 500 GB each, a user's query request would take about an hour to answer, and even with query optimization it would still take more than half an hour. An audit company ran experiments in which a single query for a user's itemized list often took several hours.
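The 500 GB and 3,000-second figures imply the effective scan throughput the relational setup was achieving. The sketch assumes decimal units (1 GB = 1,000 MB) and that the query reads the whole table once.

```python
def scan_rate_mb_per_s(table_gb, seconds):
    """Effective throughput (MB/s) of reading table_gb GB in `seconds`, decimal units."""
    return table_gb * 1000 / seconds


rate = scan_rate_mb_per_s(500, 3000)
print(f"{rate:.1f} MB/s")         # 166.7 MB/s
print(f"{3000 / 60:.0f} minutes")  # 50 minutes
```

At roughly 167 MB/s, one full pass over a 500 GB table takes 50 minutes, which is why a single user's query ends up costing on the order of an hour.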
The problems we faced at the time were these. First, fast data writes: there are 2 trillion records a month, more than 70 billion a day, generated in a steady stream, so we must write them fast enough to keep up. Second, fast retrieval: the data must be searchable quickly so that users can see when, on which website, and how much traffic they used. Third, analysis: Internet access records are high-value data, perhaps the most basic and most raw data on user behavior on the mobile Internet, so the question is how to analyze and mine them efficiently. Fourth, low-cost storage for such a large amount of data.
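The write requirement can be made concrete by converting the daily volume into an average ingest rate. This is a back-of-the-envelope sketch; it computes averages only, and peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(records_per_day):
    """Average records per second needed to keep up with a daily volume."""
    return records_per_day / SECONDS_PER_DAY


per_month = 2e12
print(round(per_month / 30 / 1e9, 1))  # 66.7 (billion records per day, 30-day month)
print(round(required_rate(70e9)))      # 810185
```

Sustaining roughly 810,000 record writes per second, around the clock, is the baseline any storage design here has to meet.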
Hadoop can help us solve these problems
Hadoop is open source and runs on ordinary PC servers. It does away with high-end storage while still guaranteeing high reliability, it is suited to fast data writes, and it offers fast retrieval, so it can meet business requirements at the scale of billions of records. Hadoop helped us solve these problems, and this was our first close contact with Hadoop outside laboratory concepts, in a real business system.
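One common way to get fast per-user retrieval at this scale is to keep records sorted by a composite key, so one user's records for a time range are contiguous and can be read with a range scan instead of a full-table scan. Stores in the Hadoop ecosystem such as HBase keep rows sorted by row key; the article does not say which design China Unicom used. The in-memory class below, the `phone#timestamp` key format, and all names are hypothetical stand-ins, not a Hadoop or HBase API. The key format assumes fixed-length phone numbers and non-negative timestamps below 10^10.

```python
import bisect


def make_key(phone, ts):
    # Zero-padded timestamp so lexicographic order matches numeric order.
    return f"{phone}#{ts:010d}"


class SortedRecordStore:
    """Toy sorted key-value store illustrating range scans by row key."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}

    def put(self, phone, ts, value):
        key = make_key(phone, ts)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, phone, start_ts, end_ts):
        """Return one user's values with start_ts <= ts < end_ts."""
        lo = bisect.bisect_left(self._keys, make_key(phone, start_ts))
        hi = bisect.bisect_left(self._keys, make_key(phone, end_ts))
        return [self._values[k] for k in self._keys[lo:hi]]


store = SortedRecordStore()
store.put("13000000001", 100, "weibo.com 420KB")
store.put("13000000002", 150, "taobao.com 900KB")
store.put("13000000001", 200, "WeChat 310KB")
store.put("13000000001", 900, "weibo.com 180KB")
print(store.scan("13000000001", 0, 500))  # ['weibo.com 420KB', 'WeChat 310KB']
```

The lookup cost grows with the number of records returned for that user rather than with the size of the whole table, which is the contrast with the relational full-table queries described earlier.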