In October, IDC China released its report "China Big Data Technology and Services Market 2012–2016 Forecast and Analysis," which projects that China's big data technology and services market will grow at a compound annual growth rate of 51.4% over the next five years. The report notes that Internet giants such as Taobao, Tencent, and Baidu were among the first to adopt big data technology, while the telecom and banking sectors are also beginning to show strong interest in big data technology and services.
Gartner likewise predicts that by 2015 big data will create 4.4 million IT jobs globally, 960,000 of them in the Asia-Pacific region. Each big data-related position is expected to generate three non-IT jobs, bringing the total number of new jobs in the Asia-Pacific region to roughly 4 million. All of this suggests that big data will not only affect people's daily lives but also present a great opportunity for the entire IT ecosystem.
Big Data in life
Ordinary users may not care what the concept of big data actually means, but some of its applications have already entered daily life. Take the recent November 11 shopping festival, when Tmall racked up a legendary 19.1 billion yuan in sales in a single day: merchants have learned to use customers' purchase histories to recommend targeted goods. Wal-Mart's "diapers and beer" marketing saga is still talked about today. Diapers and beer look like products for two completely different customer bases, yet together they produced surprisingly good results in Wal-Mart's sales records. The story goes that when a wife asked her husband to go out to buy diapers, he would often pick up a couple of beers for himself at the same time, so the two items were very frequently bought together. The pattern was uncovered by Wal-Mart's intelligent information analysis system, which helped make Wal-Mart the retailer that "knows its customers best," and the story has become a classic case of big data affecting people's daily lives.
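The "diapers and beer" discovery is a textbook instance of market-basket analysis. As a toy illustration (the transaction data and item names below are invented, not Wal-Mart's), the two key measures are support (how often two items appear together) and confidence (how often buyers of one item also buy the other):

```python
# Toy transaction log; items and purchases are illustrative, not real data.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
    {"milk", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent): how often buyers of the antecedent
    also buy the consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"diapers", "beer"}, transactions))       # 3 of 6 baskets -> 0.5
print(confidence({"diapers"}, {"beer"}, transactions))  # 3 of 4 diaper buyers -> 0.75
```

A rule like {diapers} → {beer} with high support and confidence is exactly the kind of signal a retail analysis system surfaces; production systems compute this over millions of baskets with algorithms such as Apriori rather than a brute-force scan.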
The Cancer Institute at the University of Turin, a private non-profit organization within the university's Department of Oncology, has long worked to understand the basic mechanisms of cancer and to provide the best possible diagnosis and treatment for patients. Its researchers are now trying to extract more meaning from their large volumes of research data. They recently adopted GenomeCruzer, a product from the company Kairos3D. Driven by the Gilgamesh big data 3D visualization engine, GenomeCruzer provides an environment that consolidates data patterns and relationships so that an entire dataset can be seen and explored. Dr. Enzo Medico of the Cancer Institute said: "This remarkable tool has improved our use of large datasets and given us unprecedented data analysis speed. This latest interactive 3D data visualization can advance cancer research and shorten the path to cancer treatment."
The two examples above are enough to show the benefits of big data, and not only in retail and healthcare. "In many vertical industries, data volumes are growing rapidly," said Derek Dicker, vice president of marketing for PMC's enterprise storage division, citing oil and gas, climate modeling and forecasting, and the life sciences. "In the climate field, for example, government agencies need to create and analyze large datasets in order to predict more accurately the intensity, direction, and duration of climate change, which affects the daily lives of millions of people." Scenarios like these require a big data analysis platform, and emerging big data technologies are beginning to appear.
Aliyun's Big Data experience
As the big data concept heated up, Hadoop, with its elephant logo, quickly caught the IT community's attention. This distributed computing model is almost universally acknowledged as the model for big data processing, and IT giants have joined the Hadoop camp one after another. Although open source is a major trend, ESG China general manager Wang Cong points out that users who want to deploy their own projects on Hadoop first need to hire a lot of Hadoop talent, and that talent is expensive. At the same time, some big data processing platforms with independent intellectual property rights have emerged in China, and Aliyun is among the best of them.
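The MapReduce model that Hadoop popularized can be sketched in a few lines. The single-process word-count below is only a conceptual illustration, not a Hadoop job: a real cluster distributes the map and reduce tasks across many machines and handles the shuffle over the network.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big platform", "data platform"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'platform': 2}
```

The appeal of the model is that map and reduce are both trivially parallel, which is why it became the default way to process data too large for one machine.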
As a member of the Alibaba family, Aliyun has kept a much lower profile than Taobao, Alipay, and its other siblings, but when its home-grown distributed computing system came into public view, it caused huge repercussions in the industry. Aliyun's businesses include elastic computing, mass storage, large-scale data processing, search, maps, and email. Development of the Aliyun platform has hit obstacles and at times been questioned, but it is undeniable that Aliyun now has a huge user base and is moving ever more smoothly along the road toward a "data-centric open cloud computing service platform."
Shi Guirong, a researcher at Alibaba Cloud Computing Co., Ltd., said in an interview that Aliyun now has more than 1.7 million registered users and more than 5,000 platform tenants, and that the end users who directly or indirectly enjoy Aliyun's services number in the hundreds of millions. What Aliyun values most is "data." Riding the current big data boom, Aliyun's goal is to provide powerful computing capability to help users process the oil of the new era: data.
Shi Guirong explained that after three years of development, Aliyun has built a completely independently developed large-scale distributed computing system, Feitian (Apsara). Beyond MapReduce, the system supports a broad range of programming models, and Aliyun's engineers have implemented data storage, elastic computing, search, and other services on the same platform. Apart from Aliyun, he claimed, only Google in the world has achieved this. Aliyun's strategy can thus be read as "Amazon plus Google, and beyond": using Google-style technology to run an Amazon-style business.
Given the current Hadoop craze, readers may wonder why Aliyun spent so long developing its own big data platform instead of simply using Hadoop. Aliyun has its own view on this. Aliyun president Wang Jian once said: "Hadoop is valuable for offline data processing, but it cannot solve the problems of our company's public cloud computing services, because we already have online cloud services that go far beyond Hadoop's capabilities; this is tied to the company's positioning. Today Feitian supports the Aliyun business well, including big data processing, and in this respect Feitian has actually surpassed Hadoop."
Shi Guirong added that without a dedicated professional team to maintain a Hadoop deployment, its capabilities can only improve so far. In any case, after weathering doubts and obstacles, Feitian has succeeded. Going forward, the Aliyun technical team will continue to maintain Feitian, expand its computing capacity, and provide a wider range of big data processing services.
Turning to the current problems of big data, Shi Guirong summarized four lessons to share with readers.
The first is building the cloud computing platform. For big data, back-end processing capability is the foundation, and this is what Aliyun has spent the last three years on. Take search: indexing the world's trillions of web pages is completely impossible on a single machine. For a big data processing platform, the most important thing is therefore how to turn thousands or tens of thousands of machines into one cluster. This large-scale distributed system is the core of Feitian.
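The cluster idea above, splitting one huge indexing job across many machines, is often done with consistent, deterministic partitioning. The sketch below is a toy illustration of hash partitioning under invented names (`NUM_WORKERS`, `worker_for`), not Feitian's actual scheduler:

```python
import hashlib

NUM_WORKERS = 4  # stand-in for the thousands of machines in a real cluster

def worker_for(url: str) -> int:
    """Deterministically assign a page URL to one worker by hashing it,
    so the indexing load spreads roughly evenly and every node can
    compute the assignment without a central lookup table."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_WORKERS

# Shard a batch of pages across the workers.
pages = [f"http://example.com/page{i}" for i in range(10)]
shards = {w: [] for w in range(NUM_WORKERS)}
for url in pages:
    shards[worker_for(url)].append(url)

for worker, urls in sorted(shards.items()):
    print(worker, len(urls))
```

Each worker then indexes only its own shard, which is what makes "thousands of machines as one cluster" tractable; production systems add replication and rebalancing on top of this basic idea.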
Shi Guirong noted that Feitian was written in-house from the very first line of code, and that the team is now actively working with ISVs so that this distributed computing model can offer users in other industries an independently controlled big data processing tool.
Second, Shi Guirong argued that big data platforms require intelligent technology. A platform should not merely look up answers to fixed questions; it should intelligently surface valuable information for its users. IBM's Watson, for example, could answer open-ended questions, which required a powerful analysis system on the back end. In artificial intelligence, research on deep learning, self-learning, and lifelong learning has made breakthrough progress and is worth trying here.
Third is the cost problem. Shi Guirong said that big data cannot be a money-burning project; cost matters greatly to users, which is why Aliyun's clusters are built entirely from inexpensive commodity PC servers. This is where big data and the cloud echo each other: cloud computing provides a flexible, low-cost platform for big data processing, and big data in turn drives the development of cloud computing.
All in all, a powerful big data analysis platform must offer intelligence, flexibility, and the ability to scale out its clusters. Most fundamentally, the underlying IT infrastructure must be strong enough to support all the applications running on top of it. Accordingly, equipment providers and chip manufacturers are also beginning to push into the big data field.