Kernel module programming (12): concurrency and race conditions

Source: Internet
Author: User

This article is one of my reading notes on Chapter 5, Concurrency and Race Conditions, of Linux Device Drivers, 3rd edition (LDD3), although it is not limited to that chapter. I have recently been busy with business requirements for mobile phones, with many documents to write and little time to study. I would still like to recommend a good book: Mr. Qian Mu's work on the political gains and losses of China's successive dynasties. The book is thin but rich in content, and I recommend it highly.

  

Bugs caused by concurrency are a major problem in operating system code and are hard to find. The same is true of applications. I once worked on a Java SIP AS (application server) project. Under heavy-traffic testing it only reached six nines (99.9999x%): whether I raised the call load or lowered it, the same error exception kept turning up. The exception could be recovered through exception handling without affecting the next session, and six nines is quite satisfactory for telecom equipment, but I could not understand why a computer that only deals in 0s and 1s should fail at all. In the test environment every session followed the normal flow, which made it very strange.

This went on through more than a year of performance work (tracking down this one problem took nearly a month, at a time when I was relatively free). In the end the cause turned out to be concurrency, not only across threads but also across processes, in a code base of I do not know how many lines. The faulty part was very simple: an index is incremented by 1 each time, and when it reaches a threshold it is reset to 0 and starts over. Under multi-threaded processing there was a very small probability that the index exceeded the threshold. Swapping the order of two statements solved it. Later I pushed hundreds of millions of sessions through the system (thanks to the Abacus instruments, ^_^) without a single error.
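The article does not show the original code, so the following is only a hypothetical sketch, written in C rather than Java, of the kind of wrap-around index described above. The names `SLOT_COUNT`, `next_slot_racy` and `next_slot_safe` are invented for illustration. The racy version stores the incremented value before wrapping it, so another thread can briefly observe an out-of-range index. The safe version computes the wrapped value first and publishes it with a C11 compare-and-swap, which is a substitute technique and not necessarily the two-line reordering the author used.

```c
#include <stdatomic.h>

#define SLOT_COUNT 4u            /* hypothetical threshold */

static unsigned int racy_index;  /* shared, unprotected */

/*
 * Racy version: the increment is stored before the range check.
 * Between the two statements another thread can read SLOT_COUNT,
 * and two threads can both increment before either one resets.
 * (In C this unsynchronized access is also a data race.)
 */
unsigned int next_slot_racy(void)
{
    racy_index++;
    if (racy_index >= SLOT_COUNT)
        racy_index = 0;
    return racy_index;
}

static atomic_uint safe_index;   /* zero-initialized */

/*
 * Safer version: compute the wrapped value locally, then publish it
 * in one atomic step. If another thread changed the index in the
 * meantime, the compare-and-swap fails, reloads `old`, and we retry.
 */
unsigned int next_slot_safe(void)
{
    unsigned int old = atomic_load(&safe_index);
    unsigned int next;

    do {
        next = (old + 1u) % SLOT_COUNT;
    } while (!atomic_compare_exchange_weak(&safe_index, &old, next));

    return next;
}
```

With this shape, no thread can ever read a value of `safe_index` outside the range 0 to `SLOT_COUNT - 1`, because the out-of-range intermediate value only exists in a local variable.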

  

The case above only affected the quality of the program, but when concurrency is not controlled and a race occurs, the consequences can be very serious. For example, two execution paths may malloc memory for the same pointer at the same time, yet only one of the two blocks can be freed at the end, which leaks memory. Any program that leaks memory is unstable and cannot run for long. In a worse case, freeing the wrong address can crash the system. Such errors can occur at any moment while the program runs and are not easy to locate. Races arise for the following reasons: (1) multiple user-space processes may call into your kernel code in any combination; (2) multiple CPUs execute concurrently; (3) kernel code can lose the CPU at any point, so a new trigger may arrive at any time while a function appears to be running sequentially; (4) with hot-pluggable devices, the device may disappear while the driver is in the middle of working with it. Anything can happen at any time, so we should follow a few principles to avoid races:

[Programming philosophy: principles for avoiding races]

1. Avoid sharing resources (data and hardware) in programs as much as possible.

Races are caused by shared resources. Where no resource is shared, there is no risk of a race. In our programs, minimize the use of global data. This rule should be followed in all program development.

Recently I was tracking a bug in Java. An object was passed into the constructor of a network connection class, and some state was recorded through an attribute of that object. When the network became unstable the connections misbehaved, even though I had designed the connection class to recover automatically. At first there was only one TCP connection between the client and the server, and this approach caused no problem. Later the number of connections between the client and the server was increased to 10, and the bug appeared. Of course, this happened because the framework design was changed later and the original code had not been written with that change in mind. However, it would not have happened at all if we had followed this design principle from the beginning. As you can imagine, if the early and the later work had been done by two different developers, the debugging effort would have been huge.
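The first principle can be illustrated with a small sketch. It is written in C rather than Java, it is not the author's code, and every name in it (`struct connection`, `struct conn_state`, `connection_init` and so on) is hypothetical. The point is only that each connection embeds its own state instead of pointing at one object shared by all connections, so raising the number of connections from 1 to 10 cannot make them overwrite each other's status.

```c
#include <stdbool.h>

/* Per-connection status; hypothetical fields for illustration. */
struct conn_state {
    bool stable;            /* is the link currently usable? */
    unsigned int retries;   /* how many times it has dropped */
};

/*
 * Each connection owns a private copy of its state. Nothing here is
 * global and nothing is shared between connections, so no locking
 * is needed as long as one thread drives each connection.
 */
struct connection {
    int id;
    struct conn_state state;
};

void connection_init(struct connection *c, int id)
{
    c->id = id;
    c->state.stable = true;
    c->state.retries = 0;
}

/* Record that this connection, and only this one, dropped. */
void connection_mark_unstable(struct connection *c)
{
    c->state.stable = false;
    c->state.retries++;
}

/* Record that this connection recovered. */
void connection_mark_recovered(struct connection *c)
{
    c->state.stable = true;
}
```

Marking one connection unstable leaves every other connection's state untouched, which is exactly what a single shared state object cannot guarantee.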

2. When a shared resource (data or hardware) can be reached from multiple threads or processes, and one thread's use of it may be interrupted partway through, access to the resource must be managed precisely. This is usually done with locking or mutual exclusion, which ensures that only one thread operates on the shared resource at a time. In other words: when a resource must be shared, ensure that only one thread at a time is inside any continuous sequence of operations on it (such as a function, or one or more code segments).
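As a minimal sketch of this principle, the user-space C example below protects a lazily allocated shared buffer with a POSIX mutex. The names `shared_buf`, `buf_lock`, `BUF_SIZE`, `get_shared_buf` and `release_shared_buf` are hypothetical. Without the lock, two threads could both see a NULL pointer and both allocate, leaking one block, which is the double-malloc scenario mentioned earlier. Kernel code would use the kernel's own primitives instead of pthreads; this is only an illustration of the idea.

```c
#include <pthread.h>
#include <stdlib.h>

#define BUF_SIZE 4096   /* hypothetical buffer size */

static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static char *shared_buf;    /* shared resource, guarded by buf_lock */

/*
 * Return the shared buffer, allocating it on first use.
 * The NULL check and the allocation form one critical section,
 * so only one thread can ever perform the malloc.
 * Returns NULL if the allocation fails.
 */
char *get_shared_buf(void)
{
    char *buf;

    pthread_mutex_lock(&buf_lock);
    if (shared_buf == NULL)
        shared_buf = malloc(BUF_SIZE);
    buf = shared_buf;
    pthread_mutex_unlock(&buf_lock);

    return buf;
}

/* Free the buffer and clear the pointer under the same lock. */
void release_shared_buf(void)
{
    pthread_mutex_lock(&buf_lock);
    free(shared_buf);
    shared_buf = NULL;
    pthread_mutex_unlock(&buf_lock);
}
```

The lock only serializes allocation and release. Callers that keep using the returned pointer after another thread has released the buffer still have a lifetime problem, which is the subject of the third principle.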

3. An object must continue to exist for as long as external references to it remain. That is to say, before we release an object we must be sure that no external reference is left. In Java, automatic garbage collection is based on this mechanism. In C/C++, we must plan for it carefully ourselves.
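One common way to plan for this in C is reference counting. The sketch below uses C11 atomics in user space; `struct session`, `session_create`, `session_get` and `session_put` are hypothetical names, and this is only one possible approach. The kernel provides its own reference-counting helpers for the same purpose.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* A shared object that carries its own reference count. */
struct session {
    atomic_int refs;
    int id;
};

/* Create a session holding one reference for the caller. */
struct session *session_create(int id)
{
    struct session *s = malloc(sizeof(*s));

    if (s == NULL)
        return NULL;
    atomic_init(&s->refs, 1);
    s->id = id;
    return s;
}

/* Take an additional reference before handing the object to another user. */
void session_get(struct session *s)
{
    atomic_fetch_add(&s->refs, 1);
}

/*
 * Drop one reference. The object is freed only when the last
 * reference goes away. Returns 1 if the object was freed, else 0.
 */
int session_put(struct session *s)
{
    if (atomic_fetch_sub(&s->refs, 1) == 1) {
        free(s);
        return 1;
    }
    return 0;
}
```

Every holder calls `session_put` when it is done, and whichever call happens to be last performs the free, so no holder needs to know whether others still exist.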

 

Related links:
Kernel module programming (13): semaphores, mutexes, reader/writer semaphores, and completions

My articles related to the kernel module

My articles related to programming ideas
