A pioneer in Internet finance, Ali Small Loan was founded at the end of 2009 to provide microfinance to sellers on Taobao and Tmall, in pursuit of the goal of "making it easy to do business anywhere." As its product range broadened and its customer base expanded, the small loan business grew quickly, at an average of four to five times per year. After nearly five years of development, more than 360,000 borrowers have taken loans from Ali Small Loan, with the smallest loan amounting to just 1 yuan.
Unlike traditional banks, Ali Small Loan has built its own credit evaluation system and its own concept of credit. Its microfinance model requires no collateral; the loans are purely credit-based. Where does the credit come from? From the borrower's online business credit data, financial data, and so on: Ali Small Loan analyzes the massive data a borrower has accumulated on the Internet and grants credit accordingly. Data is both the foundation and the core of Ali Small Loan's business model. At its root, this is a data business.
From its first day, this data business has been rooted in Alibaba Cloud's Feitian ("Flying", also known as Apsara) cloud computing platform.
Data-driven growth: mining Alibaba's data for gold
"A 3-minute application, lending in 1 second, and zero human intervention": this is Ali Small Loan's distinctive, highly efficient way of working. That efficiency rests on its unusual business model, which hands the steps traditional banks value most, such as lending, loan quotas, and risk assessment, over to a big data processing platform. The Ali Small Loan data warehouse holds more than 10 PB of data in total. Each day it processes data on the petabyte scale, covering tens of billions of information items such as shop levels, favorites, and reviews, runs hundreds of data models, and even has to estimate how far a user may be concealing information or lying. Whether a user can obtain a loan, and how much, depends entirely on the credit value computed by the big data platform. This places very high demands on the platform's reliability, security, and computational accuracy, since a single error can cause a relatively large financial loss.
To keep its calculations accurate, Ali Small Loan mines information about loan applicants, whether sellers or consumers, at every level. The process draws on data from almost the entire Alibaba Group, including Taobao, Tmall, B2B, 1688, and many other business units, as well as social data from outside the group. This has also brought a number of key technical staff together at Ali Small Loan. Among them are architect Chen Pengyu, who is responsible for the foundation models and foundation platform of the Ali Small Loan data warehouse, and Senior Manager Zhang, who was responsible for early operations and maintenance; both are important technical leads on the project. They recently shared an in-depth account of how Ali Small Loan's technology developed.
According to them, the small loan business now synchronizes data on the order of hundreds of terabytes from external sources every day. Once synchronized, the data passes through three layers of processing in the small loan data warehouse, and the results are delivered as indicators (offline variables consumed by the output models) to AGDS, Alibaba Group's general decision system. AGDS can support tens of thousands of offline variables supplied by the data warehouse, including large volumes of data on buyers, sellers, and shop transactions. It also obtains real-time information from other systems, such as a seller's current transactions and any penalties in force, and combines this real-time data with the offline variables to compute the seller's risk level and credit model dynamically. The general decision system is therefore the real business engine of the small loan operation. It holds on the order of thousands of rules, which can be thought of as a formula with thousands of variables that yields the seller's final credit model. When the loan system receives an application, the engine determines whether a loan can be granted, how much can be lent, and so on. Figure 1 shows this whole process of making business decisions from data calculations.
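The way a decision engine of this kind combines offline variables with real-time signals can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the variable names, the two rules, and the limit formula are hypothetical and are not taken from AGDS, whose actual rules and interfaces the article does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Variables = Dict[str, float]


@dataclass
class Rule:
    name: str
    passes: Callable[[Variables], bool]  # returns True when the rule is satisfied


@dataclass
class Decision:
    approved: bool
    credit_limit: float
    failed_rules: List[str]


def decide(offline: Variables, realtime: Variables,
           rules: List[Rule],
           limit_fn: Callable[[Variables], float]) -> Decision:
    """Merge offline and real-time variables, then apply every rule."""
    # Real-time values override offline ones if the same key appears in both.
    merged = {**offline, **realtime}
    failed = [rule.name for rule in rules if not rule.passes(merged)]
    if failed:
        return Decision(False, 0.0, failed)
    return Decision(True, limit_fn(merged), [])


# Hypothetical rules; a real engine would hold thousands of them.
RULES = [
    Rule("no_active_penalty", lambda v: v["active_penalties"] == 0),
    Rule("minimum_shop_rating", lambda v: v["shop_rating"] >= 4.5),
]


def example_limit(v: Variables) -> float:
    # Hypothetical formula: a quarter of average monthly sales, capped.
    return min(50_000.0, 0.25 * v["avg_monthly_sales"])


offline_vars = {"shop_rating": 4.8, "avg_monthly_sales": 20_000.0}
good = decide(offline_vars, {"active_penalties": 0}, RULES, example_limit)
bad = decide(offline_vars, {"active_penalties": 2}, RULES, example_limit)
print(good)
print(bad)
```

In this sketch the offline variables stand in for the warehouse indicators and the real-time dictionary stands in for signals such as penalties in force; keeping the two inputs separate mirrors the split described above.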
The nature of Ali Small Loan's business also places high demands on security and timeliness. Ali Small Loan uses many kinds of data, both internal group data and data supplied by partners. This data is highly sensitive, and as its user Ali Small Loan is obliged to keep it secure, so the platform's security requirements are high. After a loan is issued, post-loan monitoring must also be done well. A small loan's credit risk is usually evaluated from the borrower's online behavior data, and the more promptly that evaluation is made, the more likely an anomaly can be flagged early and losses recovered. Accuracy, security, and timeliness, the core characteristics of Ali Small Loan's business, have all been strongly supported by the Feitian platform.
Rooted in the Feitian platform: rapid business growth
Ali Small Loan's business is inseparable from data, and as data sources multiplied, its data stores grew ever larger. No traditional data platform could support this business model, and the industry at the time had no mature solution for the stringent accuracy and security requirements described above. Because a cloud platform could offer strong scalability in both storage and computing, Ali Small Loan became one of the first users of the Feitian platform and witnessed every step of its growth (as shown in Figure 2). In turn, it was a cloud platform as powerful as Feitian that sustained the almost explosive growth of four to five times a year in the small loan business.
On April 10, 2010, the first Feitian-based loan product, Taobao order loans, was released, serving only Hangzhou. At that time a Feitian cluster had just 30 servers, the small loan business used two clusters, and the data processing engine was SQL Engine 0.2. The two clusters ran the same workload simultaneously, and once results were computed they had to be backed up manually to the other cluster. In September of that year, Ali Small Loan decided to open the business to the whole country, but with everything else ready, it found that the cluster size could not support that volume of computation. The Feitian team then did something that looked remarkable at the time: it scaled the cluster from 30 servers to 100 and upgraded SQL Engine from version 0.2 to 0.8, which allowed Ali Small Loan to expand nationwide smoothly in November 2010.
By the end of 2010, Ali Small Loan's loan models had grown from the original two (credit granting and post-loan early warning) to ten. It gradually became clear that so many models could no longer be supported in a siloed, "chimney-style" fashion, because data had to pass through a long processing chain before usable indicators emerged. A common data layer was needed at the bottom, so the team began building data marts. As with the nationwide expansion, the cluster's processing capacity was challenged once more. The Feitian platform was upgraded again, moving the data processing engine from SQL Engine 0.8 to Data Engine 0.2. Data Engine 0.2 brought two major breakthroughs: (1) storage compression, with compression ratios of 2x to 5x; and (2) a dramatic improvement in computing performance. The cluster-size bottleneck was finally broken, and Feitian gave Ali Small Loan ample storage and computing capacity for its performance upgrade.
As the business developed, data mining projects needed more and more of Alibaba's historical data. Some projects even mined every seller's data going back to Taobao's founding in 2003, which placed high demands on both the time span and the quality of the data. Ali Small Loan therefore began building a large data warehouse, which undoubtedly posed a greater challenge to Feitian, especially in guaranteeing stability. ODPS development began with the merging of Data Engine and Moye; Moye, which ran on Yunti 1 ("Ladder 1") and supported 1,500 servers, became the ODPS data processing engine. Over the same period, Feitian scaled its clusters to the thousand-server level. From Ali Small Loan's point of view, ODPS now wrapped the Feitian platform underneath it, and support for the small loan business came mainly from ODPS.
At the beginning of 2012, the group began migrating Data Cube, Taobao Index, TCIF, and other important services from the Hadoop cluster to the ODPS cluster, which went some way toward solving the data-sharing problem that had long plagued Ali Small Loan. Previously, most of the group data Ali Small Loan shared had to be fetched from the Hadoop cluster. Data synchronization, upstream data replication, and downstream scheduling could all go wrong, making it difficult to guarantee the quality and stability of data output.
After the migration was completed, the group's data center, the ODS layer, was built on the ODPS cluster. It consolidates data from Alibaba's business units and provides a basic, unified data service on top of it. Overall, the advantages of ODPS over Hadoop are clear:
1. ODPS has its own data isolation and authorization model and handles logical data isolation well. It also offers several permission management strategies (ACL, Policy, and Label) and, most powerfully, supports hybrid computing under strict data isolation (protected mode).
2. ODPS is more capable in data computation and data management, and is comparable to a traditional RDBMS in many respects, such as queries with complex window functions.
3. ODPS is easy to use.
4. Since 5K, an ODPS cluster can in theory be scaled without limit.
5. ODPS supports a multi-tenant model, allowing flexible control over data, resource allocation, isolation, and so on.
6. In performance and stability the two differ little and each has strengths in different scenarios, but ODPS has clearly gone further.
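To make the idea of layered permission strategies concrete, here is a toy sketch in which access requires both an explicit grant (ACL-style) and a sufficient clearance level (label-style). This is a generic model written for illustration only; the class names, the numeric levels, and the table names are hypothetical, and the sketch does not reproduce ODPS's actual authorization syntax or semantics.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Table:
    name: str
    label: int  # sensitivity level; higher means more sensitive (assumed scale)
    acl: Dict[str, Set[str]] = field(default_factory=dict)  # user -> granted actions


@dataclass
class User:
    name: str
    clearance: int  # highest sensitivity level this user may read


def can_access(user: User, table: Table, action: str) -> bool:
    """Allow access only if the ACL grants the action AND the label check passes."""
    granted = action in table.acl.get(user.name, set())
    cleared = user.clearance >= table.label
    return granted and cleared


analyst = User("analyst", clearance=2)
orders = Table("seller_orders", label=2, acl={"analyst": {"select"}})
id_cards = Table("seller_id_cards", label=4, acl={"analyst": {"select"}})

print(can_access(analyst, orders, "select"))    # grant and clearance both satisfied
print(can_access(analyst, id_cards, "select"))  # granted, but label is too sensitive
print(can_access(analyst, orders, "drop"))      # cleared, but action was never granted
```

The point of combining the two checks is that a grant alone is not enough for highly sensitive data, which matches the article's emphasis on strict isolation alongside flexible permission management.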
5K arrives: platform and technology move ahead of the business
The group's ODPS data center and Ali Small Loan's business system sat in the same cluster, which inevitably leads to contention for resources as the business grows. But storing the data separately in two clusters would be no different from the old approach of copying data from the Hadoop cluster to the ODPS cluster before computing on it. Expanding the scale of Feitian clusters therefore became the inevitable choice. By this time Feitian had also entered the 5K era, with platform and technology running ahead of business needs.
Feitian 5K means more than a single cluster growing from just over 1,000 servers to 5,000. Equally important, 5,000 is not a ceiling: ODPS clusters have strong cross-cluster replication capabilities that keep clusters well connected and make data easier to share. The ODPS data center has since moved to another 5K cluster, but thanks to 5K's cross-cluster replication, the ODS layer can serve data to multiple parties without explicit copying, and users notice neither the change of cluster nor any data transfer latency.
On July 8, 2014, ODPS was officially opened to external users. Until then it had been Alibaba's internal secret weapon, and its first user was Ali Small Loan. Traditional data warehouses are generally built on Oracle and other relational databases. Because Ali Small Loan built its warehouse on the Feitian platform, ODPS inevitably had to provide many relational database features, such as stored-procedure-style programming, along with many window functions and data analysis functions. This is technically challenging, because these problems are much harder to solve in a distributed framework such as ODPS than in a relational database. ODPS models the physical table and view concepts of a traditional RDBMS quite faithfully, including Oracle-like storage partitions, and supports many traditional data functions.
Even so, a few points need attention when building a large data warehouse on ODPS. The most critical is the granularity of data management, that is, how ODPS projects are divided. A project can be understood as a basic unit of data management in ODPS, and there are several common ways to set the granularity: one project per warehouse layer, one project for the entire data warehouse, or one project per subject area. This choice has a significant impact on later management costs. Different teams at different stages will divide things differently, and there is no single correct scheme. It is generally worth considering several dimensions: the range of users the warehouse serves and the audience the data business faces, since not only developers but also end users may work with data directly in the warehouse; how the team makes decisions and divides work; and the data security strategy, for example whether the bottom layers of the warehouse are closed to outside access and how data is exported to other systems, since different export strategies also affect the granularity. Be sure to settle the granularity of data management at the very start of warehouse construction, because it is almost impossible to change later without a major refactoring.
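The three granularity schemes can be compared with a short sketch that maps each table to the project it would live in. The warehouse name, layer names (`ods`, `dw`, `dm`), and subject names below are hypothetical placeholders chosen for illustration; the article does not give the actual project layout.

```python
from enum import Enum
from typing import List, Tuple


class Granularity(Enum):
    PER_LAYER = "one project per warehouse layer"
    SINGLE = "one project for the whole warehouse"
    PER_SUBJECT = "one project per subject area"


def project_for(layer: str, subject: str, scheme: Granularity,
                warehouse: str = "loan_dw") -> str:
    """Return the (hypothetical) project name a table belongs to under a scheme."""
    if scheme is Granularity.PER_LAYER:
        return f"{warehouse}_{layer}"
    if scheme is Granularity.SINGLE:
        return warehouse
    if scheme is Granularity.PER_SUBJECT:
        return f"{warehouse}_{subject}"
    raise ValueError(f"unknown scheme: {scheme}")


def projects_needed(tables: List[Tuple[str, str]], scheme: Granularity) -> List[str]:
    """List the distinct projects required to hold the given (layer, subject) tables."""
    return sorted({project_for(layer, subject, scheme) for layer, subject in tables})


# Each table is described by its (layer, subject) pair.
tables = [("ods", "trade"), ("ods", "member"), ("dw", "trade"),
          ("dm", "credit"), ("dm", "risk")]

for scheme in Granularity:
    print(scheme.name, projects_needed(tables, scheme))
```

Because every table's project name is fixed by the chosen scheme, switching schemes later means renaming or moving every table, which is one way to see why the article advises settling the granularity before construction begins.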
Leveraging the cloud: small businesses start on an equal footing with large ones
The development of Ali Small Loan's data business shows that the small loan business and Aliyun have both adapted to and reinforced each other, driving rapid growth on both sides. For the small loan business, cloud computing means more than platform support and lower costs; in a sense, cloud computing is the natural platform on which the core of new Internet finance businesses can rely.
As the underlying technology continues to improve and mature, more small and medium-sized enterprises will be able to obtain data storage and data processing services on cloud computing platforms and build rich data applications on top of them. Cloud computing gives every small business the confidence to start from the same line as a large enterprise and supports the emergence of more innovative services. This is the significance that cloud computing carries.