Crashes and Hangs in Linux and Their Countermeasures

The stability of Linux has become a strong argument for critics of crash-prone Windows systems. However, although crashes are rare on Linux, an unexpected one can leave people in real trouble. Learning some common techniques for preventing these problems is important, and it can help Linux system administrators avoid such predicaments.
  
In an interview with this site, Mark Wilding and Dan Behman described simple, clear methods for preventing and repairing Linux system crashes. They are the co-authors of a new book, Self-Service Linux: Mastering the Art of Problem Determination.
  
It is generally believed that Linux servers do not crash, yet crashes and hangs do occur. What is the difference between crashes or hangs at the application level and those at the kernel level?
  
Mark Wilding: An application-level crash or hang is limited to a specific thread or process. It does not cause other threads or processes running on the same system to crash or hang. If the problem occurs at the kernel level, however, it affects every process running on the system.
  
   What is the difference between a system crash and a hang?
  
Dan Behman: The characteristics of crashes and hangs are basically the same at every level. In a hang, a process or thread has to wait for a lock or for a busy hardware resource. Waiting for locks or resources happens all the time; the system is truly stuck only when the lock or resource never becomes available.
  
It is also important to note that a hang can be diagnosed prematurely. For example, a resource may be very busy at a particular time, so a process or thread has to wait a long time for it to become free. The user usually cannot see that the resource is busy and only sees the process waiting, so he concludes that the system is hung. In fact, the system is still doing its work at that moment, only slowly.
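
One way to tell a slow process from a stuck one is to check whether it is still consuming CPU time. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names `cpu_ticks` and `is_making_progress` are hypothetical. A process that is legitimately blocked on slow I/O can also show no CPU use, so the result is a hint rather than a verdict.

```python
import time


def cpu_ticks(stat_text: str) -> int:
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces, so split on the last ')' before splitting on whitespace.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return utime + stime


def is_making_progress(pid: int, interval: float = 1.0) -> bool:
    """Sample the CPU time of pid twice and report whether it increased."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return after > before
```

Calling `is_making_progress(pid)` a few times over a minute gives a rough picture of whether the process is merely slow.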
  
A system crash is different from the hang described above. Crashes are mainly caused by unexpected hardware or software errors. When such an error occurs, a special error handler is usually invoked to produce diagnostic information and reports that help track down the cause.
  
A crash can be seen as a fatal problem that can only be analyzed after the fact. A hang can be seen as a live problem that can be analyzed and resolved in real time.
  
I know that the biggest advantage of Linux is its open source code. Beyond that, is there another reason why problems are easier to solve on Linux than on other operating systems?
  
Behman: Because the source code is open, a considerable amount of reference material is available for every level of a Linux system. And since the source code is open, the development community is open as well. You can turn to the Linux kernel developers for help, including the original developers and even Linus Torvalds, and all it takes is sending an email. As far as I know, closed-source operating systems offer nothing comparable.
  
   What are the difficulties and challenges in dealing with hangs?
  
Wilding: An application hang can have many causes, including problems in kernel space. This means the cause is sometimes outside the application developer's control. This is where Linux has an advantage: all the source code is open, so if a process is blocked in the kernel, you can consult the kernel source to see what the process is doing there. In most cases, however, such in-depth research is unnecessary. To find out why a process is hung, application developers need to study the state and evidence at the application level carefully (such as the stack trace).
  
Users and support staff generally do not know the internal workings of the application, and they cannot investigate at the source code level. For them, a hang has to be handled more pragmatically. For example, process A may be waiting for a resource that is released only when process B finishes, while process B is waiting for a resource held by process A. This is the so-called "deadlock". It is a common problem in complex applications and a recognizable diagnosis for a hang.
  
If you do not know why process A and process B are waiting, you may not even know that the problem is a deadlock, and you have no choice but to kill both processes and restart them. Other situations are similar. It is therefore very important for an application to keep track of all its resources and locks, which helps solve this kind of difficult problem.
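
If an application records which task is waiting on which, the situation where A waits for B while B waits for A can be recognized mechanically. The sketch below assumes a hypothetical `waits_for` mapping, in which each waiting task maps to the single task holding the resource it wants, and looks for a cycle in it. The function name `find_deadlock` is invented for illustration.

```python
from typing import Dict, List, Optional


def find_deadlock(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle of tasks waiting on each other, or None if there is none.

    waits_for maps a task name to the task that holds the resource it is
    waiting for. Tasks that are not waiting do not appear as keys.
    """
    for start in waits_for:
        path: List[str] = []
        seen = set()
        task = start
        # Follow the chain of "is waiting for" links from this task.
        while task in waits_for and task not in seen:
            seen.add(task)
            path.append(task)
            task = waits_for[task]
        if task in seen:
            # The chain came back to a task already on the path: a cycle.
            return path[path.index(task):]
    return None


print(find_deadlock({"A": "B", "B": "A"}))  # ['A', 'B']
print(find_deadlock({"A": "B", "B": "C"}))  # None
```

With this kind of bookkeeping, an operator can see that A and B are deadlocked rather than merely slow before deciding to restart them.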
  
Behman: Another challenge with hangs is that the hung process or thread often does not know that it is hung, or when it became hung. This is different from a crash. When a crash occurs, the process can catch most of the signals involved, and signal handlers can be added to the program to deal with these special situations, for example by dumping memory and printing a stack trace. For a hang, such special handling is not impossible, but it tends to be ad hoc rather than built in.
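
As one illustration of such handlers, Python's standard `faulthandler` module can print stack traces both on fatal signals and on demand. This is only a sketch of the idea in Python, not the mechanism the interviewees describe for native applications. The choice of SIGUSR1 and the function name `install_diagnostics` are assumptions, and `faulthandler.register` is not available on Windows.

```python
import faulthandler
import signal
import sys


def install_diagnostics(log_file=sys.stderr):
    """Arrange for stack traces to be written to log_file.

    log_file must be a real file object with a file descriptor and must
    stay open for as long as the handlers are installed.
    """
    # Crash case: dump all thread stacks on SIGSEGV, SIGFPE, SIGABRT,
    # SIGBUS and SIGILL before the process dies.
    faulthandler.enable(file=log_file, all_threads=True)
    # Hang case: nothing fires automatically, so let an operator request
    # a dump of every thread's stack with `kill -USR1 <pid>`.
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
```

The first call covers the automatic handling available for crashes; the second is the kind of extra hook that has to be added deliberately for hangs.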
  
When a hang occurs, the system or application is often simply restarted. One thing to remember is that at the moment of the hang, much of the information and evidence needed to diagnose the problem exists only in the running kernel and application. If you restart immediately without collecting this information, you will never be able to diagnose the problem, and so you cannot prevent it from happening again.
  
In critical environments, the stability and reliability of the system depend heavily on how quickly problems are diagnosed and solved. We therefore need to stick to a sensible rule: "collect the error information first, then restart".
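
The "collect first, then restart" rule can be sketched as a small helper that saves a few per-process files from /proc before anything is killed. This assumes a Linux system; the function name `collect_evidence`, the directory naming, and the particular list of files are illustrative choices, not something prescribed by the interviewees.

```python
import time
from pathlib import Path

# Small per-process files under /proc that describe what a process is doing.
PROC_FILES = ("status", "stat", "wchan", "cmdline")


def collect_evidence(pid: int, dest_root: str, proc_root: str = "/proc") -> Path:
    """Copy diagnostic files for pid into a timestamped directory and return it."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"hang-{pid}-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)
    for name in PROC_FILES:
        src = Path(proc_root) / str(pid) / name
        try:
            # Read the whole file and write it out again; files under /proc
            # report a size of zero, so size-based copying is unreliable.
            (dest / name).write_bytes(src.read_bytes())
        except OSError:
            # Some files need extra privileges or vanish with the process;
            # keep whatever can be collected.
            continue
    return dest
```

Running `collect_evidence(pid, "/var/tmp/diagnostics")` before the restart leaves a record that can be examined later; the destination path here is only an example.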
  
   Compared with a crash, what is the first thing to do when a hang occurs?
  
Behman: Handling a hang at the kernel level is very different from handling one at the application level.
  
Assuming you are asking about the application level: when a crash occurs, a special function called a signal handler is invoked and can output a variety of information, such as a memory dump and a stack trace. In general, the main task after a crash is to collect, organize, and analyze that data.
  
In the case of a hang, such data is not collected automatically; collecting it is usually a manual process. The two key pieces of data for a hang are trace output (for example, from a tracing tool such as strace) and stack traces. Trace output shows what the process is doing, because the tracer monitors the process continuously, so it reveals, for example, whether the process is still working. The stack trace shows where in the source code the process currently is. This is very important for developers, who can use it to work out why the process is stuck.
  
   What are the main causes of crashes and hangs?
  
Wilding: For crashes, we can divide the main causes into two types: preventive and error-handling. A preventive crash occurs when the kernel or application encounters a severe condition. The software recognizes the problem and deliberately terminates itself ("suicide") to stop the error from spreading and causing more serious problems. An error-handling crash means that the program made an illegal memory access, which is almost always a programming error. In this case, the hardware detects the illegal access, and a signal is sent that stops the process.
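
The preventive kind of crash can be demonstrated in a few lines. The sketch below assumes a POSIX system: a child process detects an inconsistent state and deliberately aborts, and the parent sees that the child was killed by SIGABRT. The negative-balance invariant and the helper names `run_child` and `killed_by` are invented for illustration.

```python
import signal
import subprocess
import sys

# Child program: notices an impossible state and aborts on purpose
# instead of continuing with corrupt data.
CHILD = """
import os
balance = -5  # hypothetical invariant: a balance must never be negative
if balance < 0:
    os.abort()  # raises SIGABRT in this process
"""


def run_child() -> int:
    """Run the child program and return its exit status as seen by the parent."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD], stderr=subprocess.DEVNULL
    )
    return result.returncode


def killed_by(returncode: int):
    """Return the signal that ended the child, or None for a normal exit."""
    # On POSIX, subprocess reports death by signal N as return code -N.
    return signal.Signals(-returncode) if returncode < 0 else None
```

An illegal memory access would show up the same way from the parent's point of view, but with SIGSEGV or SIGBUS instead of SIGABRT.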
  
A hang has two common causes. The first is a process or thread waiting for a resource that never becomes available. Another process or thread holds the resource (for example, through a lock) while it is itself waiting for something else, so every other process or thread that needs the resource can only wait. One example is a process that locks an important resource and then waits indefinitely for data from the network. The second common cause is a circular dependency: two or more processes each wait for a resource held by the other, which results in a "deadlock". The resource in question may be, for example, a lock or a region of shared memory.
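
A common way to rule out the circular wait is to make every thread acquire locks in the same order. The following is a minimal sketch using Python threads; the `Account` class and `transfer` function are hypothetical, and the code assumes the two accounts are different objects with unique names.

```python
import threading


class Account:
    def __init__(self, name: str, balance: int):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move amount from src to dst while holding both account locks."""
    # Always lock in name order, regardless of transfer direction, so two
    # opposite transfers can never each hold one lock and wait for the other.
    first, second = sorted((src, dst), key=lambda account: account.name)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
```

If each thread instead locked its own source account first, a transfer from a to b running alongside a transfer from b to a could leave each thread holding one lock and waiting forever for the other.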
  
   In these crash and hang situations, what basic investigative rules can administrators apply?
  
Wilding: One of the best basic principles is to stay organized. It is very important to store the collected data in a specific, consistently structured place so that it can easily be found later. This is especially useful when you are dealing with several problems at the same time.
  
Behman: Another basic rule is to collect data quantitatively rather than qualitatively. For example, "system memory was low last night" is a qualitative observation, and it does not help much in solving the problem. The quantitative version of this example would be to collect and save the full output of the relevant commands and of other related diagnostic commands. The goal is to collect enough data the first time that you do not need the problem to happen again. This is the "get it right the first time" approach: you obtain complete data in one pass, instead of waiting for the problem to recur and collecting data several times.
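
As a small illustration of a quantitative record, the sketch below turns the contents of /proc/meminfo into a timestamped log line with exact figures. It assumes a Linux system; the function names and the choice of the `MemTotal` and `MemFree` fields are illustrative.

```python
import time
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into a {field: value} mapping.

    Most values are in kB; a few fields (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            values[name.strip()] = int(rest.split()[0])
    return values


def memory_record(meminfo_path: str = "/proc/meminfo") -> str:
    """Return one timestamped log line with exact memory figures."""
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} MemTotal={info['MemTotal']}kB MemFree={info['MemFree']}kB"
```

Appending the result of `memory_record()` to a log at regular intervals replaces "memory was low last night" with numbers that can be compared later.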