Original: http://www.cnblogs.com/zhiranok/archive/2012/05/13/cpp_multi_thread.html
In C++ development, programs generally face demanding requirements on throughput, concurrency, and latency. When designing a C++ program, efficiency can be improved along the following lines:
- Concurrency
- Asynchrony
- Caching
Below I describe a few problems I have run into in my day-to-day work; the design ideas behind the solutions come down to nothing more than these three points.
1. Task Queue

1.1 Designing a task queue with the producer-consumer model
The producer-consumer model is a very familiar one. For example, in a server program, when a logic module modifies user data, it produces a task to update the database and posts it to the IO module's task queue; the IO module pulls the task from the queue and executes the SQL operation (consume).
To design a generic task queue, sample code is shown below; a detailed implementation can be found at:
http://ffown.googlecode.com/svn/trunk/fflib/include/detail/task_queue_impl.h
```cpp
void task_queue_t::produce(const task_t& task_) {
    lock_guard_t lock(m_mutex);
    if (m_tasklist->empty()) {
        //! the condition is about to be satisfied; wake a waiting thread
        m_cond.signal();
    }
    m_tasklist->push_back(task_);
}

int task_queue_t::consume(task_t& task_) {
    lock_guard_t lock(m_mutex);
    while (m_tasklist->empty()) {
        //! no pending tasks: wait until woken with the condition satisfied
        if (false == m_flag) {
            return -1;
        }
        m_cond.wait();
    }
    task_ = m_tasklist->front();
    m_tasklist->pop_front();
    return 0;
}
```
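The same pattern can be sketched without boost, using only C++11 standard primitives. This is a simplified stand-in for the article's `task_queue_t` (the class name and `stop()` shutdown method are my own additions for illustration); it pushes first and then notifies, which is equivalent in effect to the signal-when-empty variant above.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Minimal blocking task queue sketch (C++11), analogous to task_queue_t above.
class blocking_queue {
public:
    typedef std::function<void()> task_t;

    void produce(const task_t& task) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks.push_back(task);
        m_cond.notify_one();            // wake one waiting consumer
    }

    // Blocks until a task is available; returns -1 once stopped and drained.
    int consume(task_t& task) {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (m_tasks.empty()) {
            if (!m_running) return -1;  // queue has been shut down
            m_cond.wait(lock);
        }
        task = m_tasks.front();
        m_tasks.pop_front();
        return 0;
    }

    void stop() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_running = false;
        m_cond.notify_all();            // release every waiting consumer
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::deque<task_t> m_tasks;
    bool m_running = true;
};
```

A consumer thread would simply loop on `consume` until it returns -1.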
1.2 Task queue usage tips

1.2.1 Separating IO from logic
For example, in a network game server, the network module posts each received message packet to the logic layer and immediately returns to accept the next packet. The logic thread then operates in an environment free of IO operations, which guarantees its responsiveness. Example:
```cpp
void handle_xx_msg(long uid, const xx_msg_t& msg) {
    logic_task_queue->post(boost::bind(&service_t::process, uid, msg));
}
```
Note that this mode is a single task queue, with the task queue being single-threaded.
1.2.2 Parallel pipelining
The above only parallelizes IO and CPU work; the logic operations on the CPU remain serial. In some scenarios the CPU logic itself can also be parallelized. For instance, in a game, user A's farming action and user B's farming action can run fully in parallel, because the two operations share no data. The simplest approach is to assign A-related and B-related operations to different task queues. Example:
```cpp
void handle_xx_msg(long uid, const xx_msg_t& msg) {
    //! note: divide by the element count of the array, not sizeof in bytes
    logic_task_queue_array[uid % (sizeof(logic_task_queue_array) /
                                  sizeof(logic_task_queue_array[0]))]->post(
        boost::bind(&service_t::process, uid, msg));
}
```
Note that this mode is multiple task queues, with each task queue single-threaded.
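The partitioning rule can be sketched in isolation. This is a minimal illustration of the dispatch key above (the `QUEUE_COUNT` constant and `pick_queue` helper are hypothetical names, not from the original): the same uid always maps to the same queue, which preserves per-user ordering while letting different users run in parallel.

```cpp
#include <cstddef>

// Number of logic task queues; must be the element count, not a byte size.
const std::size_t QUEUE_COUNT = 4;

// Stable mapping from user id to queue index: tasks for one user are
// always serialized on one queue, while different users spread out.
std::size_t pick_queue(long uid) {
    return static_cast<std::size_t>(uid) % QUEUE_COUNT;
}
```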
1.2.3 Connection pooling and asynchronous callbacks
For example, the logic module needs the database module to load user data asynchronously and then run follow-up processing on it. The database module owns a connection pool with a fixed number of connections; when a SQL task arrives, it picks an idle connection, executes the SQL, and passes the result back to the logic layer through a callback function. The steps are as follows:
- Pre-allocate a thread pool, each thread creating its own connection to the database
- Create a task queue for the database module; all the pooled threads are consumers of this queue
- The logic layer posts SQL tasks to the database module, passing along a callback function to receive the SQL execution result
Example:

```cpp
void db_t::load(long uid_, boost::function<void (user_data_t&)> func_) {
    //! execute the SQL and construct the user_data_t
    user_data_t user;
    func_(user);
}

void process_user_data_loaded(user_data_t&) {
    //! todo something
}

db_task_queue->post(boost::bind(&db_t::load, uid, func));
```
Note that this mode is a single task queue, with the task queue being multi-threaded.
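The callback hand-off above can be sketched in standard C++11. The types `db_t` and `user_data_t` here are illustrative stand-ins for the article's, and the load runs synchronously for simplicity; in the real design it would be posted to the database task queue and executed on one of the pooled connections.

```cpp
#include <functional>
#include <string>

// Illustrative stand-in for the loaded record.
struct user_data_t {
    long uid;
    std::string name;
};

struct db_t {
    // Sketch of the async-load-with-callback pattern: execute the query,
    // then hand the result back to the logic layer via the callback.
    void load(long uid, std::function<void(user_data_t&)> on_loaded) {
        user_data_t user;     // pretend this came from a SQL query
        user.uid = uid;
        user.name = "user";
        on_loaded(user);      // invoke the logic layer's continuation
    }
};
```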
2. Log
This article is mainly about C++ multithreaded programming. A log system does not improve a program's efficiency, but when debugging, or when diagnosing errors at run time, logs are an irreplaceable tool; I believe everyone developing backend programs uses them. There are two common styles of log interface:
- Stream style, e.g.: LogStream << "start service time[" << time(0) << "] app name[" << app_string.c_str() << "]" << endl;
- printf style, e.g.: logtrace(LOG_MODULE, "start service time[%d] app name[%s]", time(0), app_string.c_str());
Each has pros and cons. The stream style is type-safe; the printf style with a format string is more direct, but its drawback is that it is not type-safe: if app_string.c_str() is replaced with app_string (a std::string), it still compiles, but it crashes at run time (with good luck it crashes every time; with bad luck only occasionally). I personally prefer the printf style, and the following improvements can be made to it:
- Add type safety. Using the C++ template traits mechanism, type safety can be enforced at compile time. Example:
```cpp
template <typename ARG1>
void logtrace(const char* module, const char* fmt, ARG1 arg1) {
    boost::format f(fmt);
    f % arg1;
}
```
This way, passing any type other than the standard types plus std::string will fail to compile. The example above takes one argument; it can be overloaded to support more, 9 parameters or even more if you wish.
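The same idea can be sketched without boost::format, by normalizing each argument through an overload set before it reaches the formatter. The helper name `log_arg` and the buffer-based `logtrace` signature below are my own illustration, not the article's API: a std::string is converted to its C string, so it can no longer crash a `%s` conversion, and unsupported types simply fail to compile.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Normalize each log argument: only these overloads exist, so any other
// type is rejected at compile time, and std::string becomes a safe char*.
inline const char* log_arg(const std::string& s) { return s.c_str(); }
inline const char* log_arg(const char* s)        { return s; }
inline long        log_arg(long v)               { return v; }
inline int         log_arg(int v)                { return v; }

// Two-argument variant; more arities can be added by overloading.
template <typename A1, typename A2>
int logtrace(char* buf, std::size_t n, const char* fmt, A1 a1, A2 a2) {
    return snprintf(buf, n, fmt, log_arg(a1), log_arg(a2));
}
```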
- Add color to the log. Adding control characters to printf displays color on a screen terminal. Linux example: printf("\033[32;49;1m [DONE] \033[39;49;0m")
For more color schemes see:
http://hi.baidu.com/jiemnij/blog/item/d95df8c28ac2815cb219a80e.html
- When each thread starts, it should log which function the thread is responsible for. That way, when the program is running, `top -H -p pid` can tell you roughly how much CPU each function uses. In fact, every line of my logs prints the thread id, which is not the pthread id but the thread's system-assigned process id number (on Linux, the value returned by gettid).
3. Performance monitoring
Although many tools can profile a C++ program, most of them run only during the debug phase. We need a way to monitor a program in both debug and release builds: on one hand to locate the program's bottlenecks, on the other to detect early which components behave abnormally at run time.
It is common to use gettimeofday to compute a function's cost, accurate to the microsecond. Taking advantage of C++'s deterministic destruction (RAII), it is easy to implement a small tool that measures a function's cost, such as the following:
```cpp
struct profiler {
    profiler(const char* func_name) {
        gettimeofday(&tv, NULL);
        m_func_name = func_name;
    }
    ~profiler() {
        struct timeval tv2;
        gettimeofday(&tv2, NULL);
        long cost = (tv2.tv_sec - tv.tv_sec) * 1000000
                  + (tv2.tv_usec - tv.tv_usec);
        //! post cost to some manager
    }
    struct timeval tv;
    const char* m_func_name;
};

//! two-level concatenation so __LINE__ expands before token pasting
#define PROFILER_CAT2(a, b) a##b
#define PROFILER_CAT(a, b)  PROFILER_CAT2(a, b)
#define PROFILER() profiler PROFILER_CAT(____profiler_instance, __LINE__)(__FUNCTION__)
```
The cost should be posted to a performance-statistics manager, which periodically writes the statistics out to a file.
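For compilers with C++11 support, the same RAII profiler can be written portably with std::chrono instead of gettimeofday. This is a sketch; where the cost goes (here a global variable, for illustration only) stands in for the statistics manager the article describes.

```cpp
#include <chrono>

// Destination for the measured cost; a stand-in for the stats manager.
long long g_last_cost_us = 0;

// RAII profiler: measures the wall-clock cost of a scope in microseconds.
struct profiler {
    explicit profiler(const char* func_name)
        : m_func_name(func_name),
          m_start(std::chrono::steady_clock::now()) {}
    ~profiler() {
        using namespace std::chrono;
        g_last_cost_us = duration_cast<microseconds>(
            steady_clock::now() - m_start).count();
    }
    const char* m_func_name;
    std::chrono::steady_clock::time_point m_start;
};

// Two-level concat so __LINE__ expands before pasting, giving each
// instance in a function a unique name.
#define PROFILER_CAT2(a, b) a##b
#define PROFILER_CAT(a, b)  PROFILER_CAT2(a, b)
#define PROFILER() profiler PROFILER_CAT(prof_inst_, __LINE__)(__func__)
```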
4. Lambda programming: using foreach instead of iterators
Many programming languages have a built-in foreach, but C++ does not yet. It is recommended that you write a foreach function for any container you need to traverse. People accustomed to functional programming should be quite fond of foreach. Some of the benefits of using foreach are discussed here:
http://www.cnblogs.com/chsword/archive/2007/09/28/910011.html
but they lie mainly in the realm of programming philosophy.
Example:
```cpp
void user_mgr_t::foreach(boost::function<void (user_t&)> func_) {
    for (iterator it = m_users.begin(); it != m_users.end(); ++it) {
        func_(it->second);
    }
}
```
For example, to implement a dump interface, you no longer need to rewrite the iterator code:
```cpp
void user_mgr_t::dump() {
    struct lambda {
        static void print(user_t& user) {
            //! print(to_string(user));
        }
    };
    this->foreach(lambda::print);
}
```
The code above is in fact a workaround for generating an anonymous function. With a compiler supporting the C++11 standard, it can be written much more concisely:
this->foreach([](user_t& user) {});
But most of the programs I write must run on CentOS, and, you know, its GCC is version 4.1.2, so most of the time I use the workaround above as a flexible substitute for lambda functions.
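For completeness, here is the whole foreach pattern in self-contained C++11, with a lambda at the call site. The types `user_t` and `user_mgr_t` are illustrative stand-ins for the article's, using std::function instead of boost::function.

```cpp
#include <functional>
#include <map>
#include <string>

struct user_t {
    long uid;
    std::string name;
};

struct user_mgr_t {
    std::map<long, user_t> m_users;

    // The container traversal lives in one place; callers pass behavior.
    void foreach(std::function<void(user_t&)> func) {
        for (std::map<long, user_t>::iterator it = m_users.begin();
             it != m_users.end(); ++it) {
            func(it->second);
        }
    }
};
```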
Using lambda functions with task queues for asynchrony
The common way to implement async with a task queue is as follows:
```cpp
void service_t::async_update_user(long uid) {
    task_queue->post(boost::bind(&service_t::sync_update_user_impl, this, uid));
}

void service_t::sync_update_user_impl(long uid) {
    user_t& user = get_user(uid);
    user.update();
}
```
The drawback is that every interface requires writing a second function, and if one function's parameters change, the other's must change too. Nor is the code pretty. Using a lambda makes the async code read more intuitively, as if everything were done immediately inside the interface function. Example code:
```cpp
void service_t::async_update_user(long uid) {
    struct lambda {
        static void update_user_impl(service_t* service, long uid) {
            user_t& user = service->get_user(uid);
            user.update();
        }
    };
    task_queue->post(boost::bind(&lambda::update_user_impl, this, uid));
}
```
This way, when you want to change the interface, you modify the code directly inside it, which is very straightforward.
5. Trick: implementing map/reduce with shared_ptr
The semantics of map/reduce are: split a job into multiple tasks, hand them to many workers to execute concurrently, and have reduce aggregate the intermediate results into the final result. And what are the semantics of shared_ptr? When the last shared_ptr is destroyed, the destructor of the managed object is called. The two processes are very similar; we only need to split our own request into multiple tasks. The example procedure is as follows:
- Define the request's managed object. Suppose we need to count the occurrences of the string "oh nice" in 10 files; define the managed structure as follows:
```cpp
struct reducer {
    void set_result(int index, long result) {
        m_result[index] = result;
    }
    ~reducer() {
        long total = 0;
        //! iterate over element count, not sizeof in bytes
        for (size_t i = 0; i < sizeof(m_result) / sizeof(m_result[0]); ++i) {
            total += m_result[i];
        }
        //! post total to somewhere
    }
    long m_result[10];
};
```
- Define the worker that performs the task:
```cpp
void worker_t::exe(int index_, shared_ptr<reducer> ret) {
    ret->set_result(index_, 100);
}
```
- Post the tasks to different workers:
```cpp
shared_ptr<reducer> ret(new reducer());
for (int i = 0; i < 10; ++i) {
    task_queue[i]->post(boost::bind(&worker_t::exe, i, ret));
}
```
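The whole trick can be demonstrated end to end in self-contained C++11, with std::thread standing in for the task queues and std::shared_ptr from the standard library instead of boost. The `run_map_reduce` wrapper and the output pointer are my own scaffolding for illustration: each worker writes its own disjoint slot (so no locking is needed), and when the last shared_ptr copy dies, ~reducer aggregates the results.

```cpp
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

struct reducer {
    reducer(std::size_t n, long* out) : m_result(n, 0), m_out(out) {}
    // Each worker owns one slot, so concurrent set_result calls never race.
    void set_result(std::size_t index, long result) {
        m_result[index] = result;
    }
    // Runs exactly once, when the last shared_ptr copy is released.
    ~reducer() {
        long total = 0;
        for (std::size_t i = 0; i < m_result.size(); ++i)
            total += m_result[i];
        *m_out = total;  // "post total to somewhere"
    }
    std::vector<long> m_result;
    long* m_out;
};

long run_map_reduce(std::size_t workers) {
    long total = 0;
    {
        std::shared_ptr<reducer> ret(new reducer(workers, &total));
        std::vector<std::thread> pool;
        for (std::size_t i = 0; i < workers; ++i)
            pool.push_back(std::thread([ret, i] { ret->set_result(i, 100); }));
        for (std::size_t i = 0; i < pool.size(); ++i)
            pool[i].join();
    }  // pool's lambda copies and the local ret are released here -> ~reducer
    return total;
}
```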
Reproduced: a summary of C++ multithreaded programming.