An Introduction to the TensorRT Deep Learning Inference Framework

Source: Internet
Author: User

I. Background

The rapid development of deep learning has produced a number of deep learning frameworks such as Caffe, TensorFlow, and PyTorch. For CNNs with huge numbers of parameters, inference efficiency has always been a concern, and anyone who has worked on network compression knows its two key ideas: pruning and quantization.

TensorRT takes the quantization route: FP32 values are optimized down to FP16 or INT8, while inference accuracy does not drop noticeably.
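As a rough illustration of how reduced precision is requested, the sketch below assumes the TensorRT 3-era IBuilder API (platformHasFastFp16 / setHalf2Mode); later releases expose different flags, and INT8 additionally requires a calibrator, which is not shown.

// minimal sketch, assuming the TensorRT 3-era IBuilder API:
// request an FP16 engine when the GPU has fast FP16 support
IBuilder* builder = createInferBuilder(gLogger);
if (builder->platformHasFastFp16())
    builder->setHalf2Mode(true);   // build the engine with FP16 ("half2") kernels
// ... create the network, parse the model, and call buildCudaEngine() as usual ...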

The first things to know about TensorRT are the following:

1. TensorRT is a deep learning inference tool developed by NVIDIA; it only supports inference, not training.

At present, TensorRT 3 already supports the mainstream deep learning libraries Caffe, Caffe2, TensorFlow, MXNet, and PyTorch.

2. Under the hood, TensorRT optimizes for NVIDIA GPUs in many ways beyond quantization; it can also be used in combination with the CUDA codec SDK, which together form another development kit, DeepStream.

3. TensorRT is independent of the deep learning frameworks: it works by parsing the frameworks' model files, so no additional DL library needs to be installed.

Reference schematic: (figure omitted)


II. Using TensorRT

That concludes the introduction to TensorRT; the official documentation is more authoritative: https://developer.nvidia.com/tensorrt

The following example describes how to use TensorRT with a Caffe model:

1. caffeToGIEModel: convert a Caffe model to the TensorRT (GIE) format

void caffeToGIEModel(const std::string& deployFile,           // name of the Caffe prototxt
                     const std::string& modelFile,            // name of the caffemodel weights file
                     const std::vector<std::string>& outputs, // network outputs
                     unsigned int maxBatchSize,               // batch size - must be at least as large as the batch we want to run with
                     IHostMemory*& gieModelStream)            // output buffer for the GIE model
{
    // 1. create the builder
    IBuilder* builder = createInferBuilder(gLogger);

    // 2. parse the Caffe model and populate the network
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobNameToTensor = parser->parse(locateFile(deployFile, directories).c_str(),
                                                              locateFile(modelFile, directories).c_str(),
                                                              *network, DataType::kFLOAT);

    // 3. specify which tensors are outputs
    for (auto& s : outputs)
        network->markOutput(*blobNameToTensor->find(s.c_str()));

    // 4. build the engine
    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(1 << 20);   // workspace size (the shift amount was lost in the original text; 1 MiB here)

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    assert(engine);

    // 5. the network and parser are no longer needed
    network->destroy();
    parser->destroy();

    // 6. serialize the engine to the GIE stream, then clean up
    gieModelStream = engine->serialize();
    engine->destroy();
    builder->destroy();
}

2. Main execution flow

// 1. create a GIE model from the Caffe model and serialize it to a stream
IHostMemory* gieModelStream{nullptr};
caffeToGIEModel("mnist.prototxt", "mnist.caffemodel", std::vector<std::string>{OUTPUT_BLOB_NAME}, 1, gieModelStream);

// ... read the input data (omitted)
// ... parse the mean file (omitted)

// 2. deserialize the stream to obtain a runtime engine
IRuntime* runtime = createInferRuntime(gLogger);
ICudaEngine* engine = runtime->deserializeCudaEngine(gieModelStream->data(), gieModelStream->size(), nullptr);
if (gieModelStream) gieModelStream->destroy();

// 3. create an execution context
IExecutionContext* context = engine->createExecutionContext();

// 4. run inference
float prob[OUTPUT_SIZE];
doInference(*context, data, prob, 1);

// 5. destroy the context, engine and runtime
context->destroy();
engine->destroy();
runtime->destroy();
3. The inference routine doInference
void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
    const ICudaEngine& engine = context.getEngine();
    // the engine requires exactly IEngine::getNbBindings() buffer pointers;
    // here we know there is exactly one input and one output
    assert(engine.getNbBindings() == 2);
    void* buffers[2];

    // 1. to bind the buffers we need to know the names of the input and output tensors
    int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME),
        outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);

    // 2. create the GPU buffers and a stream
    CHECK(cudaMalloc(&buffers[inputIndex], batchSize * INPUT_H * INPUT_W * sizeof(float)));
    CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float)));
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // 3. DMA the input to the GPU, execute the batch asynchronously, and DMA the result back
    CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);

    // 4. release the stream and the buffers
    cudaStreamDestroy(stream);
    CHECK(cudaFree(buffers[inputIndex]));
    CHECK(cudaFree(buffers[outputIndex]));
}

III. Model conversion

Although TensorRT 3.0 claims to support Caffe, Caffe2, TensorFlow, PyTorch and other network models, the samples actually provide direct support only for Caffe and TensorFlow.

Support for Caffe is the simpler case: the deploy file and caffemodel are loaded directly. A TensorFlow model is loaded by first converting it to the UFF format; see the sample programs.

Converting and deploying a network model can be divided into three steps:

(1) train the model and save it as a .pb file;

(2) convert the .pb file to the .uff format;

(3) load and run the .uff model with TensorRT (see the sketch after this list).
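As a rough illustration of step (3), the sketch below loads a .uff model on the C++ side, assuming the TensorRT 3-era nvuffparser API; the file name, tensor names ("Input_0", "MarkOutput_0") and dimensions are placeholders that must match the exported graph, and the registerInput signature differs in later TensorRT versions.

// minimal sketch, assuming the TensorRT 3 UFF parser API (nvuffparser);
// tensor names, shape and file name are placeholders for illustration
IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetwork();
nvuffparser::IUffParser* parser = nvuffparser::createUffParser();
parser->registerInput("Input_0", DimsCHW(1, 28, 28));     // name and CHW shape of the graph input (placeholder)
parser->registerOutput("MarkOutput_0");                   // name of the graph output (placeholder)
parser->parse("model.uff", *network, DataType::kFLOAT);   // populate the network from the .uff file
// ... then set the max batch size / workspace and call buildCudaEngine() as in the Caffe example above ...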

For Caffe2 I have not yet found a good conversion tool; readers with experience are welcome to explore.

IV. Toward a unified model format

In the current deep learning landscape, TensorFlow, PyTorch, Caffe, Caffe2, MXNet, CNTK and others each have their own user base, and the incompatible formats between frameworks make exchanging and sharing models cumbersome: every pair of frameworks needs its own conversion code. Where there is competition there are also alliances, so unification is the inevitable outcome.

For example, consider the Caffe → Caffe2 model format conversion process.

You can use the conversion script caffe_translator.py provided by Caffe2:

python -m caffe2.python.caffe_translator deploy.prototxt pretrained.caffemodel

Format conversion between different frameworks is thus handled through one-off scripts.

NNVM & ONNX

NNVM comes from Tianqi Chen's team; for a more intuitive description, we can refer to the following:

TVM, together with the previously released modular deep learning system NNVM, forms "a complete optimization tool chain from deep learning down to various kinds of hardware."

By analogy with LLVM, you can think of it as a compiler that compiles, optimizes, and links different languages (here, the DL frameworks). The goal is to bring together users of multiple DL frameworks (and to compete with the corresponding low-level modules of TensorFlow).

Alternatively, on top of NNVM you can implement your own deep learning framework (for example TinyFlow) in only about 2,000 lines of code, and it can be quickly compiled and deployed to a variety of hardware, which sounds very practical.

ONNX is an open standard jointly launched by Facebook and Microsoft to achieve interoperability between different frameworks.

It does not sound as grand as NNVM; what relationship the two will end up having remains to be seen.

For the moment, ONNX focuses first on converting trained models, i.e. the inference part.

(Standards and entry points have always been fiercely contested ground; at the current stage there is no need to invest too much energy here, just wait and see who wins, the situation will become clear soon.)
