Paddle Mobile is a deep learning framework from the PaddlePaddle organization dedicated to embedded platforms. It integrates Baidu's practical experience with on-device inference, provides multi-platform support, supports compression of the underlying algorithm models, accelerates through CPU, Mali GPU, and other hardware, and is used in mobile scenarios such as DuerOS, the Baidu app, and the Baidu Netdisk app. As of PaddlePaddle 0.14, Paddle Mobile supports platforms such as CPU and Mali GPU, and models for tasks such as image classification, face detection, and OCR. Notably, the library is very small: at minimum only about 300 KB. In this article, Sheper, a senior R&D engineer on Baidu Search, introduces Paddle Mobile's technical implementation and its business applications.
The following is a transcript of Sheper's talk.
Paddle Mobile Framework Overview
Figure 1: The Paddle Mobile framework
The Paddle Mobile framework focuses on combining with AI hardware to improve on-device performance for users.
- Paddle Mobile is divided into a training phase and an inference phase; the training phase is compatible with PaddlePaddle's unified training models and provides quantization-based deep compression on top of training.
- Object detection covers single-object detection, multi-object detection, and gesture recognition.
- Real-time OCR guarantees full recognition of each image under high-performance constraints.
- Voice wakeup, with continuous sampled output.
- Multi-hardware platform support: ARM CPU, Mali GPU, Qualcomm DSP, and FPGA (arm-linux).
Architecture Design of Paddle Mobile
The Paddle Mobile architecture is mainly divided into the Loader module, Program module, Executor module, Op module, Kernel module, and the Scope/Variable/Tensor modules. A model consists of two parts: the model structure, stored as a Protobuf file (the red box in the figure), and the parameter files, which hold the weights.
Figure 2: Model structure
The role of the Loader module is to load the model structure information into memory (for example, the Protobuf file in the red box) and to optimize the model structure, for instance by fusing several fine-grained ops into a coarse-grained op: conv, add, batchnorm, and relu are fused into conv_add_batchnorm_relu, which facilitates algorithmic optimization. Since a convolution can be converted into a multiplication of two large matrices, it can be further divided into many small row-by-column matrix multiplications whose results are combined into the final output.
Figure 3: Op fusion
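The fusion pass described above can be sketched as a pattern match over a linear op list. This is a minimal illustration: the string op names and the flat-list graph representation are assumptions for clarity, not Paddle Mobile's actual C++ data structures.

```python
def fuse_ops(ops):
    """Fuse the fine-grained sequence conv, add, batchnorm, relu
    into one coarse-grained op, as the Loader's optimizer does."""
    pattern = ["conv", "add", "batchnorm", "relu"]
    fused, i = [], 0
    while i < len(ops):
        if ops[i:i + len(pattern)] == pattern:
            fused.append("conv_add_batchnorm_relu")  # one coarse-grained op
            i += len(pattern)
        else:
            fused.append(ops[i])  # op not part of the pattern, keep as-is
            i += 1
    return fused

print(fuse_ops(["feed", "conv", "add", "batchnorm", "relu", "pool", "fetch"]))
# ['feed', 'conv_add_batchnorm_relu', 'pool', 'fetch']
```

A real pass works on a graph with edges and checks that the intermediate outputs have no other consumers before fusing; the sketch only shows the pattern-replacement idea.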
Program is the output of the Loader module; it contains both the original model-structure object and the optimized model-structure object. This module corresponds to the PaddlePaddle model structure; for the concept definitions of the Paddle model and the detailed design, refer to program.md.
Kernel implements an Op's underlying computation. It has two functions, Init and Compute, responsible respectively for initialization/preprocessing and for the computation itself. It is worth noting that Kernel uses generics to target different platforms:
Figure 4: Platform-specific kernels
Kernels for different platforms are different generic specializations of the same kernel class. There are currently three platforms: ARM, Mali, and FPGA. The central-arm-func\ directory in the diagram holds the ARM implementations of the op kernels and carries the underlying kernel implementations under arm\. Since the ARM processor serves as the central processing unit, central-arm-func\ can also serve as the fallback implementation for other coprocessors. For example, if an op kernel has no FPGA coprocessor implementation, the ARM implementation here can be called directly.
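The fallback behavior can be illustrated with a small registry sketch. The names here are hypothetical: Paddle Mobile implements this dispatch with C++ templates at compile time, not a runtime dictionary.

```python
class KernelRegistry:
    """Per-platform kernel lookup with fallback to the central ARM
    implementation, mirroring how an op without an FPGA coprocessor
    kernel falls back to central-arm-func."""

    def __init__(self):
        self._kernels = {}  # (op_name, platform) -> callable

    def register(self, op, platform, fn):
        self._kernels[(op, platform)] = fn

    def get(self, op, platform):
        # Prefer the platform-specific kernel; otherwise fall back
        # to the central ARM implementation.
        if (op, platform) in self._kernels:
            return self._kernels[(op, platform)]
        return self._kernels[(op, "arm")]

registry = KernelRegistry()
registry.register("conv", "arm", lambda x: f"conv on arm: {x}")
registry.register("conv", "fpga", lambda x: f"conv on fpga: {x}")
registry.register("softmax", "arm", lambda x: f"softmax on arm: {x}")

print(registry.get("conv", "fpga")("t"))     # conv on fpga: t
print(registry.get("softmax", "fpga")("t"))  # softmax on arm: t (fallback)
```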
Optimization of Paddle Mobile
Paddle Mobile's optimization is embodied in two aspects: binary-size reduction and efficiency improvement. Size reduction is carried out in three ways: first, a model quantization and compression scheme; second, deep streamlining of the code; third, model packaging. Efficiency is improved mainly through operator fusion and platform-specific implementations.
Scope is used to store and manage the variables needed during computation; it can hold objects of different types, mainly Tensors (matrices). In other words, Scope manages all the parameter, input, and output matrices used in op computation. Scope can be understood as a map, with a scope layer wrapped around the map to facilitate memory management. Paddle Mobile mainly uses it to store Tensors; through generics it can store matrices of different element types, but note that the type used to store a value must match the type used to retrieve it; if the types are inconsistent, the type check fails.
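The type-checked map idea can be sketched as follows. The names are hypothetical, and the real implementation is C++ templates, where the mismatch is caught statically rather than at runtime as here.

```python
class Scope:
    """Map from variable name to (type, value). Retrieving a variable
    with a mismatched type raises, mirroring the generic type check."""

    def __init__(self):
        self._vars = {}

    def set(self, name, value):
        self._vars[name] = (type(value), value)

    def get(self, name, expected_type):
        stored_type, value = self._vars[name]
        if stored_type is not expected_type:
            raise TypeError(f"{name} stored as {stored_type.__name__}, "
                            f"requested as {expected_type.__name__}")
        return value

scope = Scope()
scope.set("conv_weight", [[1.0, 2.0], [3.0, 4.0]])  # a "tensor" as nested lists
print(scope.get("conv_weight", list))  # types match, retrieval succeeds
```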
Question Session
Question: A lot of model compression is done on the mobile side. Is the compression manual or automatic?
Sheper: We provide a script: much as the Paddle Mobile project reads a model, it maps values into the 0 to 255 range for computation. For concrete practice, see the quantization script in the Paddle Mobile project; it is a simple piece of code.
Question: Does compression read the matrix bit by bit? Which part is read?
Sheper: My understanding of this algorithm is not particularly deep; if you would like to learn more, you can search the PaddlePaddle documentation.
Question: Can you provide a simple algorithm and distribute it, so we can do a simple computation ourselves?
Sheper: Not at present; we will consider supporting that later.
Question: How big would a trained model be on the phone?
Sheper: At present, after compression, roughly 5 to 6 MB, which is very small. For example, an AR team built an egg-detection model of only 50-odd KB without quantization, about 10 KB after quantization, so quantization was not even necessary there. Model size is related to complexity: with the same training data, more parameters make the model large, while a simple model design with few parameters makes it small. A model file is essentially just an arrangement of raw numbers; its size is simply how many numbers you have. So when designing a model, the question to consider is how to make it space-saving, efficient, and accurate at the same time.
Question: Which model do you use?
Sheper: Because the models are commercial, we open-source only a part of them; you can query the official website and watch the demo. We also provide an external interface: compile the project, put the ARM model on the phone, feed in preprocessed input data, and you get all the output results. It is very simple to use.
End of transcript
Sheper is a senior R&D engineer at Baidu Search with many years of Android development experience, a member of the multi-modal search innovation team and of the Paddle Mobile team. Features he has shipped include real-time translation, a lite version of voice search, and floating-ball search. He currently works on Paddle Mobile from research through to launch, facilitating the implementation of the technology and its business applications.
Simple Search: Paddle Mobile technology implementation and business applications