Talking about Alibaba Big Data: Data + platform

Source: Internet
Author: User
Tags: big data, data processing, data development, data platform, data plus

VI) Data + platform

On January 20, 2016, at the Shanghai Summit of the 2016 Yunqi Conference, Alibaba Cloud announced that it was opening up the big data capabilities Alibaba had built over ten years, and released the world's first one-stop big data platform, "Data plus".

This platform carries Alibaba Cloud's ideal of "inclusive big data" (Pu Hui), which is to let any company or individual in the world use big data. The first batch released on the Data plus platform comprises 20 products, covering the whole data production chain: data production, computing engines, data processing, data analysis, machine learning, and data applications.

"This is an era when everyone is talking about big data, but only a very small number of people are using big data." Xu Changliang, senior director of the Alibaba Cloud Data Division, stressed that "these technologies are at least three years ahead of the industry." While exporting Alibaba's own data capabilities, "Data plus" is also open to teams with data development capabilities. These teams can set up shop on "Data plus" and provide data services to various industries with the help of the Data plus tools. "It's like opening a store on Taobao, but what they sell is professional skills."


What is Data plus? We analyze it from the following aspects:


1) Data + 

I think Alibaba takes data very seriously and is very willing to invest in it.

Well before Alibaba Cloud was established, around 2006 or 2007, Qigong (七公) formed a data platform department, which is the predecessor of the Data Division (CDO). "In the Cloud", Data Cube, Taobao Time Machine, Taobao Index, TCIF, Alimama DMP, Panorama Insight, and others all came from this team. The team specialized in solving Taobao's early problems in data warehousing, data marts, and data analysis.

In 2009, Wang Jian came to Alibaba and described his vision of the future of cloud computing and big data. Few people could understand it at the time. Still, I think Ma showed real foresight: he believed in it. Alibaba then proposed its cloud computing and big data strategy, and Alibaba Cloud was established at that time.

The Data Platform Business Unit did not start with the MaxCompute (formerly ODPS) that is used today, but with Hadoop. The original Hadoop cluster was named Cloud Ladder 1. At that time, Alibaba was also developing its own computing platform, the original ODPS, which was named Cloud Ladder 2.

In its early days, Cloud Ladder 2 was not very easy to use, but it won a big internal customer: Ant Financial's micro-loan service. It is fair to say that without Ant's micro-loan business, today's MaxCompute would hardly exist.

Cloud Ladder 1 and Cloud Ladder 2 were the subject of a long internal dispute. Later, for various reasons, the company decided to launch the Moon Landing project and migrate from Cloud Ladder 1 to Cloud Ladder 2.

Whether Cloud Ladder 1 or Cloud Ladder 2, each is in fact only a small piece of the entire big data technology ecosystem, namely the computing engine. As I said above, one belongs to the Hadoop ecosystem, and the other belongs to the Data plus ecosystem built by Alibaba Cloud.

Cloud Ladder 1 was also widely used internally. Basically all internal data processing and data applications were built on it. Moreover, through the 5K project, Cloud Ladder 1 successfully expanded a single cluster to 5,000 machines. The ecosystem of Cloud Ladder 2 was built up slowly, including the underlying computing platform, development tools and components, computing engines and services based on various algorithms, and the top-level data applications and products. In the process of migrating from Cloud Ladder 1 to Cloud Ladder 2, these tools, engines, and applications were gradually improved and unified.

Of course, the entire Alibaba Cloud system, including the internal systems, runs on the Data plus technology. This is also Alibaba's habit: what it offers to the public has first been used and validated internally. The advantage over big data companies that simply build products is that Alibaba has real scenarios, real needs, and proven maturity.

 

2) The composition of the data plus platform ecosystem

 

2-1) Data plus platform ecosystem

Personally, I feel the entire Data plus ecosystem can be described in the following layers:

The underlying technology platform of Data plus mainly includes:

MaxCompute (formerly known as ODPS) is the underlying computing engine of "Data plus". Two figures illustrate the performance of this engine:

  1. It processes 100 PB of data in 6 hours, equivalent to 100 million HD movies.

  2. A single cluster exceeds 10,000 machines, and multi-cluster joint computing is supported.
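
To put the first figure in perspective, the arithmetic below converts "100 PB in 6 hours" into an aggregate throughput and checks the "100 million HD movies" comparison. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes), which the article does not specify, and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic for the "100 PB in 6 hours" figure.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PB = 10**15
GB = 10**9

data_bytes = 100 * PB      # total data processed
seconds = 6 * 3600         # 6 hours expressed in seconds

# Aggregate throughput across the whole cluster, in GB per second.
throughput_gb_per_s = data_bytes / seconds / GB

# Size of one "HD movie" implied by "100 PB = 100 million movies".
movie_size_gb = data_bytes / 100_000_000 / GB

print(f"aggregate throughput: {throughput_gb_per_s:,.0f} GB/s")
print(f"implied movie size: {movie_size_gb:.0f} GB")
```

Under these assumptions the comparison is self-consistent: each "movie" works out to about 1 GB, and the cluster as a whole would be moving several terabytes per second.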

Analytic DB is a real-time multidimensional analysis engine that can complete multidimensional queries at the ten-billion-record scale in just 100 milliseconds. Many of the online big data queries that Alibaba's products serve to large numbers of Internet users depend heavily on Analytic DB.

StreamCompute is characterized by low latency and high performance. Its query rate can reach the ten-million-per-second level, and on an average day it processes trillions of messages and petabytes of data.

On top of the computing engines, "Data plus" provides the most complete cloud data development kit, so that developers can complete data processing in one stop. These products include: data integration, data development, scheduling systems, data management, operations and maintenance screens, data quality, and task monitoring.

Overall, the advantages of the big data development kit include: support for collaborative design, development, and operations by more than 100 people; good scalability; an Open API for each product function module, allowing secondary development; a data authorization mechanism between multiple data instances that ensures data can be used but not seen; and GUI-based operations and maintenance capabilities, together with field-level data quality monitoring, machine alerts, resource usage monitoring, and other functions, so that users can better control their own data and data tasks.

The computing engines and the big data development kit depend on each other and together form the underlying technology platform of Data plus, which corresponds to the Hadoop technology platform I mentioned above.

Alibaba Cloud's main goal should be to build this technology platform well and to open up the platform's capabilities faster and better. This layer is the core competitiveness of Alibaba Cloud's big data business.

 

2-2) Data plus application platform ecosystem

On top of the above technology platform, Alibaba has also added data engines, services, and products such as the rules engine, the recommendation engine, text recognition, intelligent voice interaction, and DataV visualization. Many of these products were distilled from Alibaba's own business; they can be supplied directly to enterprises and combined into various solutions. For example:

The machine learning product released with "Data plus" can predict user behavior, industry trends, weather, traffic, and more based on massive data. Graphical programming lets users develop without writing code, simply by dragging standardized components with the mouse. The product also integrates Alibaba's core algorithm library, including feature engineering, large-scale machine learning, and deep learning.

The rules engine is an online service for handling frequent changes in business rules. Business rules can be written, and business decisions made, simply by combining predefined condition factors. For example, a bank can set a rule that it will phone a user for confirmation if that user trades in two provinces within 10 minutes.
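
The bank rule described above can be sketched in a few lines of Python. This is only an illustration of the idea of combining condition factors (same user, different province, time window); it is not the API of the Data plus rules engine, and the names `Trade` and `users_to_confirm` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Trade:
    """One transaction record (hypothetical schema for illustration)."""
    user_id: str
    province: str
    at: datetime


def users_to_confirm(trades, window=timedelta(minutes=10)):
    """Return the users who traded in two different provinces within `window`."""
    # Group trades per user in chronological order.
    by_user = {}
    for trade in sorted(trades, key=lambda t: t.at):
        by_user.setdefault(trade.user_id, []).append(trade)

    flagged = set()
    for user_id, user_trades in by_user.items():
        for i, first in enumerate(user_trades):
            for later in user_trades[i + 1:]:
                if later.at - first.at > window:
                    break  # trades are time-sorted, so no later trade can match
                if later.province != first.province:
                    flagged.add(user_id)
    return flagged


t0 = datetime(2016, 1, 20, 9, 0)
trades = [
    Trade("alice", "Zhejiang", t0),
    Trade("alice", "Guangdong", t0 + timedelta(minutes=4)),
    Trade("bob", "Zhejiang", t0),
    Trade("bob", "Guangdong", t0 + timedelta(minutes=30)),
]
print(users_to_confirm(trades))  # {'alice'}
```

In a real rules engine the window and the condition factors would be configured rather than hard-coded, which is what allows business rules to change frequently without redeploying code.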

The recommendation engine is a data tool for predicting user preferences for items in real time. It helps customers discover what users are most interested in.

Text recognition detects and recognizes text in pictures taken in natural scenes, and also detects and recognizes common document types.

Intelligent voice interaction is an online service based on voice and natural language technology. It provides an intelligent human-computer interaction experience for smartphones, smart TVs, and the Internet of Things.

The ultimate goal of Data plus is not for Alibaba Cloud to develop all of these data services by itself. The key point is that the "Data plus" big data platform is also open to teams with data development capabilities. These teams can set up shop on "Data plus" and use its tools to provide data services to all walks of life. Alibaba Cloud plans to attract 1,000 partners within 3 years and share a big data market worth 1 trillion with them.

On top of the underlying technology platform, the upper layer can form a rich ecosystem. An open platform gathers the strength of the industry and provides big data services to more enterprises and individuals: this is the era of inclusive big data. At the large scale, industry data analysis can predict where an industry is heading; at the small scale, each of us can enjoy big data services that make personal life easier.

 

2-3) Data plus trading ecosystem

Building on the technology platform and the application platform, I personally feel that Data plus can in the future build a big data trading market, which could include:

Application trading: Earlier I focused on the data ecosystem and the algorithm economy. Algorithms are another important element of the big data era, and in the future they will be tradable too. Algorithm-based engines, services, and applications can be developed on Data plus, not only for one's own use but also as public services or products for sale.

Data trading: Data is one of the basic elements of the big data era. It is the basic means of production of this era and its lifeblood. As such an important means of production, it must circulate in order to maximize the value of big data. Through designs such as multi-tenancy, data that is usable but not visible, and secured transactions, Data plus can in the future solve various problems in data trading.

Of course, to make big data trading a reality, issues such as data privacy, security, laws and regulations, and supervision must be solved first. There is still a long way to go before these problems are resolved.

 

3) Why choose data plus

Small businesses not only lack data of their own, they also cannot afford to build big data platforms. Self-built platforms tend to have long build cycles and high costs. Many of them suffer from various problems because they have not been tested in real-world use and have no matching development tools or components.

However, the emergence of Data plus is expected to improve this situation.

According to measurements disclosed by Alibaba Cloud, a self-built Hadoop cluster costs more than three times as much as Data plus, and EMR from a foreign cloud computing vendor costs five times as much as Data plus.

In terms of computing efficiency, on October 28 last year Sort Benchmark announced the final results of the 2015 sorting competition on its official website. Alibaba Cloud sorted 100 TB of data in 377 seconds, breaking the 23.4-minute record set by Apache Spark.
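
The two sorting times quoted above can be compared directly. The short calculation below is a sketch that assumes both records refer to the same 100 TB workload and uses decimal units (1 TB = 10^12 bytes); the variable names are illustrative.

```python
# Compare the two 100 TB sort times quoted in the text.
# Assumption: both runs sorted 100 TB, with 1 TB = 10**12 bytes.
TB = 10**12
GB = 10**9

alibaba_seconds = 377
previous_seconds = 23.4 * 60   # the earlier 23.4-minute record, in seconds

# How many times faster the new result is than the earlier record.
speedup = previous_seconds / alibaba_seconds

# Aggregate sort rate of the 377-second run, in GB per second.
sort_rate_gb_per_s = 100 * TB / alibaba_seconds / GB

print(f"speedup: {speedup:.2f}x")
print(f"sort rate: {sort_rate_gb_per_s:.0f} GB/s")
```

Under these assumptions the 377-second run is roughly 3.7 times faster than the earlier record, sorting on the order of 265 GB per second across the cluster.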

In the two most highly regarded categories, GraySort and MinuteSort, Alibaba Cloud set four world records across the general-purpose and special-purpose sorting classes.

Data plus carries Alibaba's exabyte-scale data processing workloads and has been tested in practice by tens of thousands of engineers.

With big data technology, Alibaba has achieved great commercial success. Analysis of customer behavior on the e-commerce platform gave birth to Ant Micro Loan, Huabei, and Jiebei; Cainiao Network has provided the technical means to upgrade the express delivery industry through data products such as electronic waybills, Logistics Cloud, and Cainiao Tiandi.

It can be seen that through Data plus, companies can obtain various development tools more conveniently and cheaply. But more important than the development tools is the future big data ecosystem: through Data plus, companies can easily obtain all kinds of data and services they want.

The release of "Data plus" has clearly lowered the threshold for applying big data. Through "Data plus", any enterprise or individual can develop and apply big data very conveniently. At the very least, there is a great improvement in speed, cost, and development efficiency.

 

4) Problems that Data + needs to face

 

4-1) Security issues of data plus based on the public cloud

Some people worry that Alibaba will peek at or use their data; in essence, they do not trust Alibaba Cloud. Of course, Alibaba Cloud's official answer is categorical: No!

Xu Changliang, senior director of the Alibaba Cloud Data Division, stressed that data is a valuable asset belonging to customers, and no cloud computing platform may divert it to other uses. Alibaba Cloud will strictly abide by the Data Protection Proposal launched last July, and hopes that the whole industry will exercise self-discipline and jointly welcome the boom of the big data industry.

4-2) Scale issues of data plus based on the private cloud

If a company is really worried about data security and wants to build its own private cloud, the current Data plus based solution is too complex: without a large budget, there is basically no way to implement it. For SMEs, adopting a private cloud solution based on Data plus is therefore unrealistic.

In my opinion, the future trend must be public cloud solutions, because:

  1. Data needs to flow, and it has the greatest value when it is interconnected, so data must be exchanged and traded. This depends on the public cloud.

  2. Data processing tools, algorithms, products, and so on also form a shared ecosystem; one cannot expect to develop everything oneself. Like industrial society, the coming big data era is an era of global division of labor. We cannot expect our own factory to solve every problem.
