In 2008, Taobao's system was separate from Taobao Mall's, including products, transactions, marketing, and stores. At that time we were maintaining a huge system, and the code and its various problems seriously restricted the development of our technology as a whole. Against this background, we carried out a distributed transformation.
This has helped the technology a great deal. For example, changes to our database, from small adjustments to choosing an entirely different database, can be made much more transparently. If we had not pulled out the shared service layer, changes like this would be unimaginable. We now do a lot of innovative work on the underlying technology, and the business systems above it do not need to modify any code at all. So there are two gains: one is technical innovation, and the other is a very positive change in business efficiency.
The entire architecture has changed. It used to be a single application; after the transformation it looked like this (see figure). This is only version 0.1, and it is used for teaching. The current state can no longer be drawn this way: there are now more than a few thousand systems, and nobody has drawn that picture.
So what technologies are needed to build a larger distributed architecture? First there is Alibaba's own layer, including APS, along with the service framework, data governance, our application software, and our development framework. At the bottom is storage, which is very important, and a lot of data is put into the cache. For example, when you visit Taobao today, the member data and the data of frequently visited products are all cached. There is also the file system, where these images are stored. Its scale is very large; it is one of the very large file systems in the country. There is also Alibaba's relational database layer: we use TDDL to split data across databases and tables, and we use message middleware to handle asynchronous processes and to solve data consistency problems. Together these seven middleware products form the distributed architecture of the entire Alibaba technology system. They let our engineers build our systems as easily as they would write a stand-alone program.
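To make "splitting data across databases and tables" concrete, here is a minimal Python sketch of the routing idea. It is not TDDL's API. The function name `route`, the `trade_db_N` and `orders_NN` naming scheme, and the shard counts are all assumptions made up for this example.

```python
from typing import Tuple


def route(user_id: int, db_count: int = 4, tables_per_db: int = 8) -> Tuple[str, str]:
    """Map a user ID to a (database, table) pair.

    Hypothetical scheme: the ID is reduced to one of
    db_count * tables_per_db slots, and the slot number is then split
    into a database index and a table index inside that database.
    """
    slot = user_id % (db_count * tables_per_db)
    db_index, table_index = divmod(slot, tables_per_db)
    return f"trade_db_{db_index}", f"orders_{table_index:02d}"


if __name__ == "__main__":
    # The same user always lands on the same shard, so the application
    # can keep issuing queries as if there were a single database.
    print(route(10))
    print(route(123456789))
```

The point of putting this logic in middleware is that application code only supplies the key; which physical database and table are used can change underneath it.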
What I have just described is what the distributed technology looks like. The direct effect of adopting it is that our overall scalability has improved greatly. At the same time, we moved away from the original commercial systems, and after going distributed the cost reduction has also been very obvious.
Which technologies were born out of Double Eleven (Singles' Day)? For our engineers, Double Eleven is like facing the college entrance examination. Those of you in the room may have taken the college entrance examination once; our engineers take it every year. Like everyone else, they take it very seriously.
We have now been through seven years of Double Eleven without any big problems. Over the past two years, the technical system as a whole has accumulated a good deal of experience. In 2009 and 2010 we were still feeling our way: there were basically no problems, and the engineers did not notice anything special.
However, there were some minor problems. As I recall, the biggest problem at that time was images. Because the CDN had reached its capacity, we made one change to how the last pictures of some products on Taobao were displayed. The second problem came from the transformation in mid-to-late 2008: once the whole system was distributed, its scale grew and the number of systems increased, so the logic of the whole system, or rather its dependency relationships, could no longer be laid out and sorted through by engineers by hand. We therefore built a tool to govern the dependencies, the traffic flows, and the strong and weak relationships between systems.
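The kind of question such a dependency tool has to answer can be sketched in a few lines of Python. This is only an illustration under assumptions: the talk does not describe the tool's design, and the record format `(caller, callee, is_strong)`, the system names, and the function `strong_dependents` are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple

# Hypothetical call record: (caller, callee, is_strong).
# A strong dependency means the caller cannot work without the callee;
# a weak one means the caller can degrade gracefully.
Call = Tuple[str, str, bool]


def strong_dependents(calls: Iterable[Call], failed: str) -> Set[str]:
    """Return every system that breaks, directly or indirectly, if `failed` goes down."""
    callers_of = defaultdict(set)
    for caller, callee, is_strong in calls:
        if is_strong:
            callers_of[callee].add(caller)

    affected: Set[str] = set()
    queue = deque([failed])
    while queue:  # breadth-first walk up the strong-dependency edges
        current = queue.popleft()
        for caller in callers_of[current]:
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


if __name__ == "__main__":
    calls = [
        ("cart", "inventory", True),
        ("order", "cart", True),
        ("order", "coupon", True),
        ("detail", "recommend", False),
    ]
    print(strong_dependents(calls, "inventory"))
```

With a few thousand systems, a query like this is what turns "the dependencies can no longer be sorted out by hand" into something a tool can answer automatically.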
In 2011 and 2012, turnover was 19.2 billion and 37 billion. At this point the challenge to the entire trading system was very great, so the main work was expanding overall capacity, and capacity was expanded very quickly. As we said earlier, Double Eleven poses two big challenges. The first is how to assess capacity, that is, how much load the system can support; capacity evaluation began at that time. The second is governance and capacity planning. In practice we did not do these things well: in 2011 and 2012 there were a lot of problems in the first 30 minutes. Products in shopping carts were suddenly deleted, red envelopes could not be used when placing an order, and so on. After these problems, the next generation of technology was introduced; for example, we built an automated data correction system.
In 2013 and 2014, we had a more advanced technology. As mentioned in the video just now, it is something we do before Double Eleven, and it took capacity planning to a new height. Previously, we evaluated the capacity of each system separately by diverting traffic: we directed live user traffic onto one server and watched how the application behaved as traffic rose, looking at CPU, memory, and so on. When an inflection point appeared, we recorded those values, and that told us the capacity of the system. This method is very accurate, but there is a problem: the number of systems is very large. Even though I take part in the whole Double Eleven effort, I cannot say exactly how many there are; several hundred systems participate directly in Double Eleven transactions. They are like the parts of a space shuttle: if one part has a problem, the whole is destroyed.
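The "record the values at the inflection point" step can be sketched as follows. This is a minimal illustration, not the method actually used: it assumes the measurements are `(requests per second, average response time in ms)` pairs with distinct traffic levels, and it treats the inflection point as the first place where response time grows much faster than it did at low load. The function name `find_capacity` and the `slope_factor` threshold are hypothetical.

```python
from typing import List, Optional, Tuple

# (requests per second, average response time in milliseconds)
Sample = Tuple[float, float]


def find_capacity(samples: List[Sample], slope_factor: float = 3.0) -> Optional[float]:
    """Return the traffic level just before response time starts rising sharply.

    The slope between the two lowest-traffic samples is taken as the
    baseline. The first later segment whose slope exceeds
    slope_factor * baseline marks the inflection point. Returns None if
    there are too few samples or no inflection point is found.
    Assumes all traffic levels are distinct.
    """
    if len(samples) < 3:
        return None
    ordered = sorted(samples)
    (q0, r0), (q1, r1) = ordered[0], ordered[1]
    baseline = max((r1 - r0) / (q1 - q0), 1e-9)  # avoid a zero baseline
    for (qa, ra), (qb, rb) in zip(ordered[1:], ordered[2:]):
        slope = (rb - ra) / (qb - qa)
        if slope > slope_factor * baseline:
            return qa
    return None


if __name__ == "__main__":
    measured = [(100, 10), (200, 11), (300, 12), (400, 13), (500, 30), (600, 80)]
    print(find_capacity(measured))
```

The same idea applies to CPU or memory instead of response time; only the second element of each sample changes.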
Before Double Eleven we did the following: working from a completely real scenario and using the servers currently in service, we simulated the user behavior that might occur on the day of Double Eleven, and ran the stress test against the real systems, real servers, and real logic. Compared with the original stress-testing method, this is better in several ways. The original method did use real traffic, but the traffic model during the Double Eleven promotion is different from the usual one. The usual path is to search or page through results. Double Eleven is definitely not like that: everything has already been added to the shopping cart, so users rarely search and instead go straight to ordering, and some even use browser tools to grab items faster. The model is not the same, and we need to simulate the scenario as it will be at that time. To sum up, we need a technology that does two things. The first is to simulate based on business scenarios. The second is to generate huge traffic, in the hundreds of thousands and more, enough to fill up our entire cluster. Then we add new servers where they are needed and run the test again, evening out the traffic across all servers.
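The difference between the two traffic models can be shown with a small Python sketch. It only generates a sequence of simulated user actions; it sends no real requests. The action names and all of the weights are invented for illustration and are not taken from any real measurement.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical action mixes; every weight here is made up for the example.
NORMAL_DAY = {"search": 0.50, "browse": 0.30, "add_to_cart": 0.15, "order": 0.05}
PROMOTION_DAY = {"search": 0.05, "browse": 0.05, "add_to_cart": 0.30, "order": 0.60}


def simulate(mix: Dict[str, float], requests: int, seed: int = 0) -> List[str]:
    """Draw `requests` user actions according to the weights in `mix`."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    actions = list(mix)
    weights = [mix[action] for action in actions]
    return rng.choices(actions, weights=weights, k=requests)


if __name__ == "__main__":
    for name, mix in (("normal", NORMAL_DAY), ("promotion", PROMOTION_DAY)):
        print(name, Counter(simulate(mix, 10_000)).most_common())
```

Replaying ordinary traffic would mostly exercise search, while a promotion-day mix concentrates load on the cart and order systems, which is why the scenario has to be modeled rather than copied from a normal day.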
Another thing we built is BCP, an anti-loss reconciliation product. With it, if there is a problem, it is found within seconds. When we first started on BCP, I did not think it mattered much: since all the code is written by engineers, I assumed that as long as the system was thoroughly tested, the logic would not be a problem. I did not see much point in it, but I did not refuse to do it either. Later I found that it is very meaningful. Why? An abnormal process in our system can cause our data to become abnormal. For example, the Double Eleven case I mentioned, where a coupon suddenly could not be used, is caught at this point. So in 2013 and 2014, when the total transaction amount reached 57 billion, the overall performance was very good. Compared with other companies in our industry, both the data and the experience were smoother than our peers'.
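The idea behind a reconciliation check can be sketched in Python as follows. The talk does not describe how BCP is implemented, so this is only an illustration: the `Order` fields, the rule "paid amount must equal item total minus coupon", and the function `reconcile` are all assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class Order:
    """Hypothetical order record used only for this example."""
    order_id: str
    items_total: Decimal
    coupon: Decimal
    paid: Decimal


def reconcile(orders: List[Order]) -> List[str]:
    """Return the IDs of orders whose paid amount disagrees with items_total - coupon."""
    return [o.order_id for o in orders if o.items_total - o.coupon != o.paid]


if __name__ == "__main__":
    orders = [
        Order("A1", Decimal("100.00"), Decimal("10.00"), Decimal("90.00")),
        # The coupon was not applied, so the buyer was charged the full price.
        Order("A2", Decimal("100.00"), Decimal("10.00"), Decimal("100.00")),
    ]
    print(reconcile(orders))
```

`Decimal` is used instead of `float` so that amounts compare exactly. A check like this, run continuously against fresh records, is one way an unused coupon would surface within seconds rather than in a later audit.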
In 2015, we reached a turnover of 91.2 billion, and the whole experience was very good, but we encountered a new challenge. To support the massive computation behind that 91.2 billion, the machines are extremely busy on the day itself. After that day they become very idle, and the wasted cost is very large. But this is not something to worry about, because Alibaba Cloud can be used to solve the elasticity problem.
I will introduce Alibaba's culture as a whole later. We have the most open source projects: for example, 65 open source projects in 2015, 18 of them high-volume projects, as well as some open source projects that are well known in China, all from Alibaba engineers. Some grew out of experience accumulated at work, and others are spare-time contributions. More than 300 people at Alibaba are directly involved in open source work. This is our open source culture.