Simulating the Key Technologies of Caffe (Part I)


Origins

From the popular belief that "a programming master must read the Linux source code" to the bookstore shelves full of "XXX Source Code Analysis" titles, we have seen far too many bad source-code walkthroughs.

The most painful part of reading such analyses is that a huge block of code suddenly appears: the data structures are unrecognizable, you don't know where a variable comes from, and the functions make no sense.

Those poor writers seem to delight in doing exactly what readers dread: the less you want to see a wall of code, the more eagerly they paste one in up front.

They make no effort to order the material, cannot tell what is important from what is not, and do not even give an overview.

That is not source-code analysis; it is little more than reciting the code like scripture.

Back in high school I read Hou Jie's Dissecting MFC; the most interesting part was its second chapter, "Simulating the Six Key Technologies of MFC."

His approach differs greatly from that of most source-analysis authors: there are no giant code dumps, and no vague generalities either.

Its "deep" essence is very simple, is to dismantle the source code, the complex, rigorous source of the implementation, combined with their own understanding, a concise, vivid way to simulate.

Code like that is easy to read, because source code that was originally very complex becomes far more approachable after the author's careful arrangement and re-implementation.

Of course, we do not all have to write books, but we should at least learn to dismantle source code: to simulate it, simplify it, and rewrite it ourselves.

When reading complex source code, keep the points above in mind. The biggest problem I ran into while re-implementing Caffe was that, even for code I had read many times, my understanding was different each time, sometimes wrong, sometimes incomplete; only after compiling and debugging it by hand did I discover that I had been reading it "wrong" all along.

"Do not build high-rise in the floating platform," This is the "in-depth MFC" The essence of this book, in fact, can also be extended to most of the source code analysis issues.

Key Technologies of Caffe

① An open and forward-looking design philosophy

1.1 Referenced libraries: why configuring Caffe takes you a whole day

Caffe is written in C++; not C, but strict C++, strict almost to the point of the modern C++ style that C++ Primer advocates.

Unlike Java, the C++ standard library is tightly controlled by the ISO C++ committee, so the language's built-in library resources are quite limited.

That does not stop C++ from having powerful third-party libraries, but configuring those libraries can be a real hassle.

Broadly speaking, the libraries referenced by Caffe are divided into the following categories:

1. Parallel math computation: CBLAS (CPU), cuBLAS (GPU)

2. GPU scheduling: CUDA

3. Local database: LMDB / LevelDB

4. C++ standard-library extensions: the Boost library (maintained by the open-source community; the largest and most dazzling collection of C++ libraries in history)

5. Log management and debugging: Google glog

6. Command-line argument parsing: Google gflags

7. Image processing and transformation: OpenCV

8. Data-structure definition and high-speed serialization/deserialization: Google Protocol Buffers

9. Parameter expansion and portability: HDF5

You may never have used these libraries, but for their respective tasks they are among the fastest available.

Of course, there is one indisputable fact about these libraries: almost all of them have some relationship with Google (and Google, of course, is blocked by the Great Firewall). This has a lot to do with the working environment of Caffe's author, Yangqing Jia.

Caffe was written at the end of 2013; its author interned at Google Mountain View in 2012 and took part in the Google Brain project.

The Google Brain project's biggest contribution was the first-generation large-scale machine learning system, DistBelief (its second generation, TensorFlow, is open source).

So Caffe is special: it was written by someone from academia who is nevertheless well versed in industrial engineering practice.

I would call Caffe a half-industrial, half-academic framework.

In the Caffe code you can see typical industrial-grade coding ideas and practices, such as strict conventions for passing function parameters:

To keep its internal code readable, Google gives its programmers rules like the following:

void Fun(std::string &str);
With a signature like this it is hard to tell whether str is an input or an output parameter.

So the recommendation is to write:
1. Input parameter:
void Fun(const std::string &str);
2. Output parameter:
void Fun(std::string *str);


This style is strictly enforced throughout Caffe; a tiny illustration follows.
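As a minimal, self-contained sketch of the convention (the function and variable names here are made up, not taken from Caffe), an input goes in by const reference and an output comes back through a pointer, so the call site shows which is which:

#include <iostream>
#include <string>

// Input: const reference. Output: pointer.
void MakeGreeting(const std::string& name, std::string* greeting) {
  *greeting = "Hello, " + name;
}

int main() {
  std::string name = "Caffe";
  std::string greeting;
  MakeGreeting(name, &greeting);  // the & at the call site flags the output
  std::cout << greeting << std::endl;
  return 0;
}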

C++ is a rigorous language, and good C++ code demands the same rigor from you.

1.2 CPU/GPU hybrid programming

Google Brain's DistBelief system has a flaw: its computation is based on distributed CPU nodes.

After moving on to Baidu, the project's head, Andrew Ng, came to regret his original design decisions. He said:

A few years ago, while I was still at Google, I started and led a project called Google Brain, which used Google's computing infrastructure to build neural networks.

They were roughly 100 times larger than previous neural networks, and our approach was to use about 1,000 computers. That did push deep learning forward considerably, but it took a very large number of machines. Before long I realized that using 1,000 computers was a very expensive technology. So my friends and I realized that a different technology, using just three computers instead of 1,000, could do the same thing, and the secret was to use GPUs.

So in its original design, Caffe is a framework that treats the GPU as the computational core, with the CPU handling auxiliary control and I/O.

The preprocessor macros provided by C/C++ let Caffe mix CPU and GPU code flexibly and produce builds for different platform requirements simply by toggling a single macro.
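A minimal sketch of this macro switching, in the spirit of Caffe's CPU_ONLY build flag (the function here is a made-up illustration, not Caffe's actual API):

#include <cstdio>
#include <vector>

// Scale a vector in place: y[i] *= alpha.
void scal(float alpha, std::vector<float>* y) {
#ifdef CPU_ONLY
  // CPU-only build: a plain loop (a real framework would call cblas_sscal).
  for (float& v : *y) v *= alpha;
#else
  // GPU build: this is where a CUDA kernel or cublasSscal call would go;
  // the same loop stands in here so the sketch compiles without CUDA.
  for (float& v : *y) v *= alpha;
#endif
}

int main() {
  std::vector<float> y = {1.f, 2.f, 3.f};
  scal(2.f, &y);  // output parameter passed as a pointer, as in section 1.1
  std::printf("%.1f %.1f %.1f\n", y[0], y[1], y[2]);  // prints 2.0 4.0 6.0
  return 0;
}

Compiling with -DCPU_ONLY selects the CPU path; leaving the macro out selects the other path, and the rest of the code never has to change.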

The latest version of Caffe balances CPU and GPU very well: the CPU handles multithreaded control and I/O, the GPU handles multithreaded computation, and the two work in harmony.

Theano is the opposite: it rarely unleashes the full computing power of the machine, no matter which wrapper library (Keras, Lasagne) you layer on top of it.

At best it is an upgraded toy; Theano is fine for getting started, but don't rely on it too heavily, because it is simply too slow.

1.3 Data structures

Object-oriented design principles can sometimes be a hassle.

"Class variables should be private, only and should be provided only by public methods for external control", Caffe still rigorously enforces object-oriented design principles.

That is not entirely a good thing, because the source ends up covered in get and set methods, which hurts readability.

So Google provides a way to manage data structures by defining them quickly in a script, with the tedious get and set methods generated automatically by the machine.

This is the first reason to use Google Protocol Buffers (see the sketch below).
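A minimal sketch of what that looks like in practice; the message and file names are hypothetical, and the example assumes the generated header has been produced by protoc:

// Assumes a hypothetical layer.proto compiled with protoc --cpp_out:
//   message LayerConfig {
//     optional string name = 1;
//     optional int32 num_output = 2;
//   }
// protoc generates layer.pb.h/.cc with every getter and setter already written.
#include <iostream>
#include "layer.pb.h"  // hypothetical generated header

int main() {
  LayerConfig conf;
  conf.set_name("conv1");    // generated setter
  conf.set_num_output(96);   // generated setter
  std::cout << conf.name() << " has "
            << conf.num_output() << " outputs" << std::endl;  // generated getters
  return 0;
}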

In classic system-level application design, data management requires persistence: data must be exchanged frequently between memory and disk.

In application design this is called serialization and deserialization.

Traditional serialization requires the programmer to keep track of the order of the serialized contents by hand, and deserialization must then restore everything in exactly that order, which is cumbersome.

More importantly, the deserialization offered by general-purpose application frameworks (e.g. Qt, MFC) is not efficient.

In Protocol Buffers, Google uses an efficient encoding that makes deserialization fast and significantly boosts I/O capability.

This is the second, and most important, reason to use Google Protocol Buffers; a serialization sketch follows.
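A minimal sketch of that round trip, using the real protobuf calls SerializeToString and ParseFromString on the hypothetical LayerConfig message from the sketch in 1.3:

#include <cassert>
#include <string>
#include "layer.pb.h"  // hypothetical generated header, as in the 1.3 sketch

int main() {
  LayerConfig conf;
  conf.set_name("conv1");
  conf.set_num_output(96);

  // Encode the whole message into one compact binary string; this string is
  // exactly the kind of value that later gets stored in LevelDB/LMDB.
  std::string buffer;
  conf.SerializeToString(&buffer);

  // Decode it back; the programmer never tracks field order by hand.
  LayerConfig restored;
  restored.ParseFromString(buffer);
  assert(restored.name() == "conv1" && restored.num_output() == 96);
  return 0;
}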

1.4 Database

Protocol Buffers lets arbitrarily complex data structures be encoded quickly into a single string, which builds a bridge toward a "key-value" database.

For a long time, application-level development has usually used SQLite as the local database (Android development being the most typical example).

SQL is a convenient way to store data, but it is slow and can hardly provide the I/O buffering needed for high-performance, data-heavy computation.

For this, Google developed a matching fast "key-value" local database, LevelDB, which skips SQL entirely and stores data directly on a tree-structured implementation underneath.

LevelDB's I/O speed is dozens of times that of SQLite, so in early Caffe the data packaging was handled by LevelDB (a usage sketch follows).
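A minimal LevelDB usage sketch using its public C++ API; the database path and record contents are made up, and in Caffe the value would be a serialized protobuf record:

#include <cassert>
#include <iostream>
#include <string>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db = nullptr;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status = leveldb::DB::Open(options, "/tmp/demo_db", &db);
  assert(status.ok());

  // Write a key-value pair, then read it back by key; no SQL anywhere.
  status = db->Put(leveldb::WriteOptions(), "sample_00000001", "serialized-bytes");
  assert(status.ok());
  std::string value;
  status = db->Get(leveldb::ReadOptions(), "sample_00000001", &value);
  assert(status.ok());
  std::cout << value << std::endl;

  delete db;
  return 0;
}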

In the latest versions of Caffe, LMDB has largely replaced it. LMDB sacrifices a bit more memory for caching, but in return delivers better I/O bandwidth than LevelDB. In an era when memory is as cheap as cabbage, LMDB is understandably popular; after all, I/O bandwidth matters enormously for parallel programming.

1.5 Logging and debugging

Keeping detailed logs of a program's execution is basic practice in industrial-grade code.

Once the program crashes, a maintainer can not only quickly locate where and why it failed, but also catch whoever should really take the blame.

Caffe has logging code almost everywhere; glog is simple and easy to use, and this work has to be done.

glog divides log records into four severities: INFO, WARNING, ERROR, and FATAL.

You can control where the logs of each severity are written via google::SetLogDestination, and the four severities themselves are worth a closer look.

The INFO level typically records information about the execution flow; the WARNING level marks things you should be aware of, though it rarely has much practical consequence.

The ERROR level indicates a serious error that does not terminate the program, after which the author can no longer guarantee what will happen.

The FATAL level indicates a fatal error: the program must terminate, and that is not up for debate.

FATAL-level logging is essentially a wrapper around the assert-style functionality that C/C++ provides.

Assertions are among the most common tools when writing code. glog builds on them and defines a set of condition-check macros such as CHECK, CHECK_EQ, and CHECK_LT, which simplify the program's logic and make debugging easier.

After all, a rigorous program must rule out any unexpected situation, and any step of execution that could go wrong should be marked with a CHECK-style assertion to keep the program rigorous; a short glog sketch follows.
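A minimal glog sketch using the real LOG and CHECK_* macros; the computation being checked is made up for illustration:

#include <vector>
#include <glog/logging.h>

float mean(const std::vector<float>& v) {
  CHECK(!v.empty()) << "mean() called on an empty vector";  // FATAL if it fails
  float sum = 0.f;
  for (float x : v) sum += x;
  return sum / v.size();
}

int main(int argc, char** argv) {
  google::InitGoogleLogging(argv[0]);
  FLAGS_logtostderr = 1;  // send logs to stderr for this demo

  std::vector<float> v = {1.f, 2.f, 3.f};
  LOG(INFO) << "Computing mean of " << v.size() << " values";
  float m = mean(v);
  CHECK_EQ(v.size(), 3u);  // aborts with a detailed message if the sizes differ
  CHECK_LT(m, 10.f);       // aborts if the mean is not less than 10
  LOG(INFO) << "mean = " << m;
  return 0;
}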

1.6 C++ Core: 1001 Nights

The Boost library is one of the most exciting libraries in C++, and not just because it occupies 3 GB of space and contains over 10,000 header files.

More importantly, it provides some very handy facilities; the ones most widely used in Caffe are four smart pointers:

shared_ptr (globally auto-released), scoped_ptr (auto-released at the end of a scope)

weak_ptr (safe access under multithreading), thread_specific_ptr (per-thread pointer copies)

These pointers make Caffe's code elegant, safe, and flexible; a small sketch of their use follows.
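A minimal sketch of three of these pointers using their real Boost APIs (the Blob struct is a made-up stand-in; thread_specific_ptr lives in Boost.Thread and is omitted here):

#include <iostream>
#include <boost/shared_ptr.hpp>
#include <boost/scoped_ptr.hpp>
#include <boost/weak_ptr.hpp>

struct Blob { int count; };

int main() {
  // shared_ptr: reference-counted, released when the last owner goes away.
  boost::shared_ptr<Blob> data(new Blob{10});
  boost::shared_ptr<Blob> alias = data;  // reference count is now 2

  // weak_ptr: observes the shared object without extending its lifetime.
  boost::weak_ptr<Blob> watcher(data);
  if (boost::shared_ptr<Blob> locked = watcher.lock())
    std::cout << "blob count = " << locked->count << std::endl;

  // scoped_ptr: non-copyable, released automatically at the end of its scope.
  {
    boost::scoped_ptr<Blob> temp(new Blob{5});
    std::cout << "temp count = " << temp->count << std::endl;
  }  // temp is freed here
  return 0;
}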

Secondly, multithreaded programming has always been contested territory among operating systems: modern operating systems generally encapsulate threading functionality inside the kernel.

Some developers code directly against the pthreads provided by Linux, as Tomas Mikolov's word2vec does.

That makes cross-platform compilation troublesome (many people have installed Linux, or hunted for a virtual machine, just to compile word2vec).

Caffe uses Boost's threading facilities by default. Like Qt's design, the Boost library wraps the threading primitives of the Windows and Linux kernels behind a unified interface, which in practice makes porting Caffe to Windows easier.

In addition, Boost's threading support is powerful enough to build large, safe multithreaded applications; a minimal sketch follows.
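A minimal Boost.Thread sketch using the real boost::thread and boost::mutex APIs; the "prefetch" task is a made-up stand-in for the kind of background I/O work a data layer does:

#include <iostream>
#include <boost/thread.hpp>

boost::mutex print_mutex;

void prefetch(int batch_id) {
  boost::mutex::scoped_lock lock(print_mutex);  // serialize console output
  std::cout << "prefetching batch " << batch_id << std::endl;
}

int main() {
  // The same code compiles on Windows and Linux; Boost picks the underlying
  // kernel threading API.
  boost::thread worker1(prefetch, 1);
  boost::thread worker2(prefetch, 2);
  worker1.join();
  worker2.join();
  return 0;
}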

1.7 Mathematical calculations

The function interfaces of CBLAS and cuBLAS are almost identical.

In fact, NVIDIA deliberately imitated the BLAS interface when designing cuBLAS, which eases the code clutter of CPU/GPU hybrid programming.

At its lowest level, Caffe wraps a number of these math functions and also implements some higher-level mathematical routines of its own.

Don't try to memorize them all, and don't swallow them whole by implementing everything in one go.

They run through all of Caffe's computational code, so in the early stages of rewriting Caffe you only need to pick a subset of these functions to implement; a wrapper sketch follows.
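A minimal sketch of that wrapping idea for one such function, single-precision GEMM. cblas_sgemm is the real CBLAS call; the wrapper name my_cpu_gemm and the cblas.h header path (as shipped by OpenBLAS/ATLAS) are assumptions for illustration, not Caffe's actual wrapper:

#include <cstdio>
#include <cblas.h>

// C = alpha * A * B + beta * C for row-major matrices:
// A is MxK, B is KxN, C is MxN.
void my_cpu_gemm(int M, int N, int K, float alpha,
                 const float* A, const float* B, float beta, float* C) {
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              M, N, K, alpha, A, K, B, N, beta, C, N);
}

int main() {
  const float A[2 * 3] = {1, 2, 3, 4, 5, 6};  // 2x3
  const float B[3 * 2] = {1, 0, 0, 1, 1, 1};  // 3x2
  float C[2 * 2] = {0, 0, 0, 0};              // 2x2 result
  my_cpu_gemm(2, 2, 3, 1.f, A, B, 0.f, C);
  std::printf("%.0f %.0f\n%.0f %.0f\n", C[0], C[1], C[2], C[3]);  // 4 5 / 10 11
  return 0;
}

The cuBLAS counterpart takes essentially the same arguments (plus a handle and device pointers), which is exactly why a thin wrapper like this can hide the CPU/GPU difference behind one interface.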

