Independent blog
Proactive debug logging (part 3)
In the previous two sections, we discussed where debugging logs are written and how to classify and classify debugging logs. So far, we have not touched on the core issue: Where should I write debugging logs? How should I write debugging logs? This is the topic of this part.
How to Write debugging logs
The core issue of writing debugging logs is where to write content. When we consider writing logs, our most basic consideration is still the core standard mentioned in the first part: How to facilitate addition, analysis and processing, and then easy to locate errors.
We are now considering what content should be included in the debugging log, so that we can locate the error from various aspects. First of all, there is no doubt that the necessary meta information should be printed, such as the time when the debug log is printed. According to the suggestions in the second part, the level and category of the debug log are also necessary. To facilitate multi-thread trackingProgramProcess ID and thread ID are well-handled. If it is easy for developers to locate the log inCodeIt is also necessary to include the code file, code line, and function name of the statement where the log is printed. Fortunately, under a good logging module, most of these problems can be solved through helper functions or built-in macros (such as _ file __,__ line __, __function _) is automatically obtained and added, so developers do not need to worry too much. Therefore, when developers need to add logs, they only need to worry about the Log Level and the log message string.
Tip 10: include necessary meta information in log entries, such as time, PID/tid, log level and classification, code files, code lines, and functions.
Here we will discuss the time output separately. There are multiple ways to print the time. Generally, we prefer to use standard formats to print time, such as the standard Internet time format specified in rfc822 or the asctime output format, and include necessary time zone information. Using the standard format to print the time can make the log easy to be processed by the script program, because the mainstream script programs all have the corresponding library for time parsing; the necessary time zone information is added to obtain more information about the customer's environment, such as failover and non-standard hour time zones.
Tip 11: The time when the debug log contains the necessary time zone information.
The message string of the log is the most flexible part of the log. Because of the high flexibility, developers may also miss some information, resulting in the inability to effectively analyze the debugging logs collected by users. Based on some of my experiences, I will discuss how to include valid information here.
The first question to consider is the log relevance. For example, a server program using an asynchronous Io model serves a large number of sessions through a limited number of threads, because each thread may serve multiple sessions, each session may be served by different threads. Therefore, you need to add a key value in the log that can associate these logs, log Analysis developers can aggregate operations on the same object for analysis. This key value should be able to uniquely identify the operated object (such as the session object), such as the address of the object, the socket value in the session, business-Related Object ID, such as the Message ID of the email.
Tip 12: The debugging information contains the key values that can distinguish different operation objects, so as to ensure the relevance of logs for operations on the same object.
Another problem is that the execution path of the program can be tracked based on the obtained logs. A simple method is to add appropriate logs to each condition Branch to ensure that each branch is overwritten by logs. Generally, the condition is divided into two categories: one is the branch of error processing, and the other is the branch related to business logic. For the former, it is more common to check the return value after the function call ends. For example:
File * pF = fopen (filename, "R ");
If (null = PF ){
Debuglog (log_error, "Open File % s to read failed. CRT errno = % d", filename, errno );
// Other error handling code
}
Debuglog (log_info, "Open File % s to read succeeded", filename );
In this example, debug logs are printed no matter which branch you enter. In addition, in normal operation, the customer can adjust the log level to an error or higher, unless an error occurs, you do not need to worry about the disk space occupied by a large amount of log output and the efficiency loss. By adding error-level logs to the error processing branch and adding information or debugging-level logs to the normal process, you can effectively track the program execution path before an error occurs.
For Conditional branches caused by business logic, you can add logs at the entrance of each branch in a similar way. Ideally, any condition branch statement can be overwritten by the debug log.
Tip 13: add an error-level log to the error handling branch in every error handling place, and add information or debug-level logs to the normal process.
Tip 14: For Conditional branches caused by business logic, add logs at the entrance of each branch to ensure that any branch path is covered by the debugging log.
The above example also shows several other issues that need attention when printing debugging logs. To make the log easy to understand, it is best to let the log be organized according to common syntaxes. For example, in the example, the log is organized according to the syntax "what is done, what is the result, why. On the contrary, if an error occurs when a file is opened, only "cannot open file" is printed. In fact, troubleshooting is useless: The developer who obtains this log cannot know which file has an error, I do not know why the error occurs.
Some common syntaxes are listed here:
- [Operation] [an object] [success/failure], (if it fails, the cause of failure is [a reason])
Example: Open File [D:/test.txt] For read failed. Win32 last error = 2. The system cannot file the file specified.
- Now the [condition] is true and requires [some operation]
Example: Command [test] is not recoginized, will ignore it.
- The system is in a [certain State] (if it is not normal, why)
Example: virus scanning is disabled, invalid activation code is provided.
Example 2: SMTP server status is smtp_ehlo_issued.
- [An object] performs [some operation]
Example: Server thread #1 started.
- [An object] is about to perform [some operation]
Example: will send response [220 ready] To socket 0x1234.
Tip 15: try to ensure the integrity of each log and organize each debugging message according to the common syntax mode.
In the preceding syntaxes, we often need to output the cause of an error. We can use several methods for output: print the error code or convert it into a string that is easy to understand for printing. The former is easier to add to the process, but the latter is easier to view. However, we recommend that you print out the original error code even if you use the latter method, because the error code is more deterministic for developers, it is easier to locate the cause of the error from the code perspective. At the same time, we recommend that you print out "What type of error code is this" when printing error code, because different libraries and codes use different error code systems, in a slightly complex windows program, there are at least CRT errno, Win32 last error, com hresult, ntstatus, Winsock last error, and may also include custom error codes of the application. If "What type of error codes" is not printed, it is easy to confuse these error codes, making it difficult to trace. For example, in the example, "CRT errno = % d" is output. In this way, when analyzing logs, developers can find the corresponding cause in the values of the CRT errno, if "error no = % d" is output, it may cause confusion. For example, 13 indicates access denied in CRT errno and invalid data in Win32 error code.
Tip 16: when the error code is output in the log, the type of the error code is also output.
In multi-threaded programs, different threads may print different log types. For example, the log types of the server thread of the agent server software are obviously different from those of the client thread. When automatic analysis is required using tools such as scripts, the types of threads must be distinguished. At the same time, because the thread life cycle may be shorter than the whole process, the necessary information is output at the beginning and end of the thread entry function, in addition, the log information that can distinguish different threads is provided at the beginning, such as "server listen thread #1 started" or "client listen thread #1 exit with code 0x00.
Tip 17: Add logs at the beginning and end of the thread function, and provide a string that can distinguish thread types.
In some cases, the thread may need to wait for a synchronization object, because the waiting results are diverse and may be caused by other threads calling terminatethread or similar functions, therefore, we recommend that you print a log before waiting for the operation, and then print another log based on the waiting result. The advantage is that, the developer can find out from the log that the thread is terminated abnormally or the waiting result is abnormal. The same recommendation applies to some time-consuming operations, such as when I/O related functions of socket are called. This log can also perform simple performance analysis without a performance profiler.
Tip 18: Add logs for tracking before and after operations that may take a long time to detect exceptions or perform simple performance analysis.
Similar to thread lifecycle tracking, it is best to perform proper tracking for Object Construction and Analysis. We recommend that you add this log information to constructor, copy constructor, destructor, and assignment operator. This tracing information can be used to check memory leakage and analyze it with tools.
Tip 19: Add debugging logs to the object structure, analysis, and copy, which contains the object address or other key values that can mark the object, used to track the object's lifecycle.
For some objects with complex states, you may need to know the object state after each operation during tracking, especially when the object involves complex thread synchronization. Therefore, you may need to provide dump-like operations for some objects and write the dump results into Debugging logs as needed. For protocol-related problems, you can dump data and write logs after receiving the data or before sending the data for more detailed analysis. This type of log may consume a lot of execution time because it may be frequently printed and the data volume may be very large each time. Therefore, we recommend that you use an unusually low level for such logs, at least it should not be higher than the debug or tracing level. If necessary, it can even be a single level.
Tip 20: if necessary, dump of object or protocol data can be provided at a lower debugging level for analysis.
Sometimes the configuration of software products may be very complex, and different configurations often mean different behaviors and different dependencies on the environment. Therefore, to troubleshoot, You can output information related to the user environment in the log, such as the current operating system version and the values of necessary configuration items. If you use this method, do not disclose user privacy, such as password hashing or other private data.
Tip 21: if necessary, output the environment information of the software running in the log, such as the software configuration and operating system information.
Next, we will discuss how to use exceptions. In general, although exceptions are very effective for writing simple programs, they are often difficult to track when errors occur. Exceptions are often passed up along the call stack, but are caught at the outermost layer. In this way, even in the catch on the outermost layer, it is difficult to give an answer to the cause of the error. The C ++ Standard Library provides a small number of predefined exceptions, so try-catch is often the enemy in analyzing debugging logs. To minimize the difficulty of using exceptions to analyze debugging logs, we recommend that you store as detailed information as possible in the throw exception object, which information can be stored based on a complete error log, such as the code line and code file where an exception is thrown, the cause of the exception, and necessary information, for example, the file name cannot be opened. Wherever this exception is caught, the information should be dumped and printed as a debug log.
Obviously, this will greatly increase the complexity of the exception object, but we believe that this complexity is necessary to make debugging logs easier to understand and track, in addition, similar to debugging log printing, we can use some macros or auxiliary functions to simplify programming.
Tip 22: if an exception is used, try to include the complete metadata of an error log in the exception and print it to the debug log at the catch exception.
all the suggestions for writing logs can be easily understood and analyzed by automated programs. Now we have a well-organized log file and how to analyze it. This is the question to be discussed in the next section.