First, three scenes are introduced. The first is the story of beer and diaper sales: Walmart put beer and diapers on sale together and found that beer sales rose along with diaper sales, which is one of the earliest examples of big data analysis. The second scene is the match between AlphaGo and a Korean professional Go player; in the end AlphaGo defeated him and became the first artificial intelligence program to beat a world-champion Go player. The third scene is today: when we enter the subway or the train station, we can not only swipe a ticket but also pass through by face recognition. These three scenes trace the history of artificial intelligence. The first represents the beginnings of big data, when we collected data and analyzed it; the second represents a peak of machine learning and artificial intelligence; and the third shows that machine learning and artificial intelligence technology has been applied to our daily lives.
At the same time, market research shows that revenue in the AI category grew by 9% from 2015 to 2016, while the market for BI software grew by only 4.4%. Among the more than 2,000 companies surveyed, 11% have already deployed AI solutions and 53% plan to deploy them within the next five years. AI will become more and more widespread in the future.
One of the most important scenarios for machine learning is recommendation in e-commerce, where it is a common technique. In addition, deep learning is used in many other areas. In the medical field, analyzing symptoms from imaging data is very common. However, some diseases cannot be diagnosed from imaging data alone, such as Alzheimer's disease: by the time brain atrophy is visible in the imaging data, the person already has the disease, so we need to use data to predict how likely a person is to develop it. Another area is weather: the Central Meteorological Observatory forecasts the weather based on various kinds of data, and besides the Central Meteorological Observatory we now have many weather apps. These apps are not built on big data alone; they are based on machine learning and artificial intelligence techniques.
To do a good job, one must first sharpen one's tools.
Computing and storage are two prerequisites for doing artificial intelligence, and they are the most expensive parts of the entire implementation process. Machine learning also requires a large number of skilled people, who need knowledge of information theory, calculus, matrix theory, programming, and probability theory. At the same time, machine learning is an engineering effort: the whole process requires preprocessing the data, extracting features, and then using algorithms to train a model.
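As a rough illustration of this preprocess, feature-extraction, and training flow, here is a minimal sketch using scikit-learn purely as a generic stand-in (it is not the PAI toolchain, and the dataset and model choices are only placeholders):

```python
# Minimal sketch of the preprocess -> feature extraction -> model training flow.
# scikit-learn is used only as a stand-in; it is not the PAI API.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)                 # raw data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("preprocess", StandardScaler()),                       # data preprocessing
    ("features", PCA(n_components=10)),                     # feature extraction
    ("model", LogisticRegression(max_iter=1000)),           # model training
])
pipeline.fit(X_train, y_train)
print("test accuracy:", pipeline.score(X_test, y_test))
```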
As shown in the figure above, the Alibaba Cloud AI platform builds on Ali's existing high-performance cloud computing to bring down the cost of storage and computing power, and packages Ali's existing optimized algorithms and frameworks into the product. This lowers the threshold for users to adopt machine learning.
In the machine learning process, data processing takes 20% of the effort, sample generation 15%, model evaluation 5%, feature extraction 15%, model training 40%, and model application 5%. Connecting this entire process together is what the functions of Machine Learning PAI are for. The first aspect is the overall architecture: at the bottom, Alibaba Cloud's infrastructure layer provides CPU/GPU computing power; above it, PAI is abstracted as the framework layer, encapsulating classification algorithms, regression algorithms, sequence algorithms, and so on; the top layer consists of the applications users build themselves, such as weather, traffic, banking, and more. The platform provides a large number of algorithms, including data processing, feature engineering, statistical analysis, commonly used machine learning algorithms, and several deep learning frameworks. It also provides a visual experiment environment: because an experiment is a multi-step flow, the visual environment lays out the whole process. We only need to set a few parameters on the data, algorithm, evaluation, and prediction components, and the whole flow can run.
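To make the parameterized component flow concrete, here is a hypothetical sketch in Python; the Component class and the component names are invented for illustration and are not the actual PAI SDK:

```python
# Hypothetical sketch of wiring experiment components into a flow.
# Class and component names are illustrative, not the real PAI SDK.
class Component:
    def __init__(self, name, **params):
        self.name = name
        self.params = params          # parameters the user sets on the component
        self.inputs = []              # upstream components

    def connect(self, upstream):
        self.inputs.append(upstream)
        return self

source   = Component("data_source", table="user_behavior")
split    = Component("split", train_ratio=0.8).connect(source)
train    = Component("logistic_regression", regularization=1.0).connect(split)
evaluate = Component("evaluation", metric="auc").connect(train)
predict  = Component("prediction").connect(train)

# A runner would topologically sort the components and execute each stage;
# the user only sets the parameters above and the whole flow runs end to end.
```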
We are different
Unlike open-source deep learning frameworks, Machine Learning PAI runs on Alibaba Cloud and users cannot build the platform themselves; it is therefore not the same as open source, although it keeps a degree of compatibility with it. The differences show up in recommendation on Ali's e-commerce platform. The volume of Ali's merchandise and user data is very large, and ordinary algorithms can hardly meet this demand, so we encountered many challenges in building e-commerce recommendation and optimized the open-source algorithms to solve them. Efficient distributed communication that reduces network consumption is one of the optimization points; the distributed communication must not disrupt the job. Another is fault tolerance: when a node fails, we need to notify another machine to take over, download the data from the failed node, and run that part of the job again, so that the whole long-running job keeps going.
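The failover idea can be sketched as follows; load_checkpoint, train_step, and save_checkpoint are placeholder callbacks for illustration, not PAI internals:

```python
# Simplified sketch of checkpoint-based failover for one shard of a long job.
# If the node running the shard fails, a healthy machine reloads the last
# checkpoint and continues the work. Function names are placeholders.
import time

def run_shard(shard_id, load_checkpoint, train_step, save_checkpoint,
              max_retries=3):
    for attempt in range(max_retries):
        state = load_checkpoint(shard_id)        # resume from the last good state
        try:
            while not state["done"]:
                state = train_step(state)
                save_checkpoint(shard_id, state) # persist progress regularly
            return state
        except ConnectionError:
            # Node failure: back off, then let another machine pick up the
            # shard, pull the checkpointed data, and rerun from where it stopped.
            time.sleep(2 ** attempt)
    raise RuntimeError(f"shard {shard_id} failed after {max_retries} retries")
```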
First of all, PAI is a product incubated inside Ali. It must support multi-tenancy; it cannot serve only a single user, and it needs to provide services to all business groups and the whole Internet. Many users submit jobs on the platform at the same time, so these jobs need to be safely isolated while they run, and the most important part of that isolation is isolation on the network.
Communication is optimized with Ring AllReduce, a very efficient solution from HPC; Baidu's Silicon Valley lab ported the Ring AllReduce scheme to GPU communication. The principle of Ring AllReduce is very simple: in each round of communication, every node sends a chunk of data to the next node on the ring, the data is passed along in an orderly manner, and the traffic per node does not increase as the number of nodes grows. There are many ways to implement Ring AllReduce; Ali's is based on the Rendezvous interface, which sits on top of gRPC, RDMA, and NCCL. After Ring AllReduce has been implemented on top of this underlying layer, we only need to make a declaration in the code to use it.
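The principle can be shown with a toy simulation in plain NumPy (this only models the data movement; the real implementation runs over gRPC, RDMA, or NCCL):

```python
import numpy as np

def ring_allreduce(tensors):
    """Toy simulation of ring all-reduce (sum) over one array per worker.

    Each worker splits its tensor into n chunks. In every step, each worker
    sends exactly one chunk to its right-hand neighbour on the ring, so the
    per-node traffic stays constant as the number of nodes grows.
    """
    n = len(tensors)
    chunks = [np.array_split(t.astype(float), n) for t in tensors]

    # Phase 1: reduce-scatter. After n-1 steps, worker r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            chunks[(r + 1) % n][c] = chunks[(r + 1) % n][c] + chunks[r][c]

    # Phase 2: all-gather. The reduced chunks circulate once more around the
    # ring so every worker ends up with the complete result.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            chunks[(r + 1) % n][c] = chunks[r][c].copy()

    return [np.concatenate(c) for c in chunks]

workers = [np.arange(8) * (i + 1) for i in range(4)]   # 4 workers, 8 values each
result = ring_allreduce(workers)
assert all(np.allclose(r, np.sum(workers, axis=0)) for r in result)
print(result[0])   # every worker now holds the element-wise sum
```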
Communication performance was also optimized by processing the underlying gRPC messages with multiple threads in parallel. In the figure above, blue is multi-threaded communication and orange is the original open-source communication. The data show that when going from 64 cards to 128 cards, the orange numbers decrease while the multi-threaded blue numbers keep increasing: with multi-threaded parallel processing, performance roughly doubles at 64 cards and improves by more than 3 times at 128 cards. In other words, the more cards there are, the larger the gain.
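As a rough analogy in the Python gRPC binding (not the C++ runtime the platform actually uses), the size of the handler thread pool is the knob that controls this kind of parallel message processing:

```python
# Illustrative only: a gRPC server whose incoming requests are handled by a
# thread pool, so messages are processed in parallel instead of serially.
from concurrent import futures
import grpc

def serve(servicer, add_servicer_to_server, port=50051, workers=16):
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=workers))
    add_servicer_to_server(servicer, server)   # generated registration function
    server.add_insecure_port(f"[::]:{port}")
    server.start()
    server.wait_for_termination()
```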
Internal case
One case is the review of Taobao images: if a seller posts pornographic images, they are automatically recognized in the background. At first two cards were used, and training the whole model took 288 hours. Later, using Ali's framework in a distributed environment with 16 machines and 32 cards, the training took only 20 hours, about 14 times faster than on a single machine.