Analysis of kernel log and PRINTK structure

Source: Internet
Author: User
Tags syslog

 
Tekkamanninja acknowledgment: Thanks to the Chinaunix technology community Tekkamanninja provide manuscripts, if necessary, please indicate the source.

This period of time to review the kernel debugging system, pay attention to the implementation of PRINTK and the kernel log related knowledge, here to do a summary.
I. Overview of PRINTKFor the Linux kernel development people, PRINTK is really familiar. Most of the information displayed at the kernel startup is done by her, and most of the time when it comes to kernel-driven debugging, it's enough. One reason why she is so widely used is that she is easy to use, and one of the important reasons is her robustness. It is widely used and can be called almost anywhere within the kernel. You can either call her in the interrupt context, the process up or down, or call her at any time you hold the lock, and you can call her in the SMP system, and the chain doesn't have to be used when calling. This good adaptability comes from her design, a simple "ring buffer" controlled by three pointers.
Note the above: "Almost" can be used anywhere in the kernel. Where is the use of the "problem"? That is, in the early stages of the system boot process, some places before the terminal initialization are available, but all information is cached in the PRINTK's simple ring buffer (ring buffers) until the terminal and console are initialized, until the terminal and console are initialized. All cached information is output.
If you are debugging the starting part of the startup process (such as Setup_arch ()), you can rely on a hardware device (such as a serial port) that can work at this time to communicate with the outside world, using the variant EARLY_PRINTK () function of PRINTK (). She has the ability to print on the terminal at the beginning of the boot process, similar in function to Prink (), except that:

(1) Function name (2) can work earlier (output information) (3) She has her own small cache (typically 512B) (4) One-time output to the hardware device, no longer in the form of ring buffer to retain information.
However, this function is not possible on some architectures, so this approach lacks portability. (most architectures are available, including x86 and arm).
Therefore, we think that PRINTK () can work under any circumstances unless it is to be output at the terminal at the beginning of the startup. This can be seen from the kernel's boot code that the kernel version information is printed through PRINTK shortly after it has entered Start_kernel: PRINTK (kern_notice "%s", Linux_banner);。
second, the use of PRINTK
The main difference in use of printf () in PRINTK () and C libraries is that PRINTK () specifies the log level.
1. Log Level
The kernel determines whether a message is printed on the terminal (console) based on the log level: The kernel displays all messages with a lower level than a specific value on the terminal (console). But all the information is recorded in the "Ring buffer" of PRINTK.
The PRINTK has 8 loglevel, defined in:
#define Kern_emerg "<0>"/* System not available */
#define Kern_alert "<1>"/* need to take immediate action */
#define KERN_CRIT "<2>"/* Critical situation */
#define KERN_ERR "<3>"/* ERROR condition */
#define Kern_warning "<4>"/* Warning SITUATION */
#define KERN_NOTICE "<5>"/* normal, but worth noting * *
#define KERN_INFO "<6>"/* Informational message */
#define KERN_DEBUG "<7>"/* Debug Level information */

/* Use the default kernel log level */
#define Kern_default ""
/*
* Labeled as a "continuous" Log print output line (only for one
* After a closed line is not used). Can only be used to start the initial Core/arch code
* (otherwise, non-SMP security).
*/
#define Kern_cont ""

If you do not specify a log level when you use it, the kernel chooses Default_message_loglevel, which is defined in KERNEL/PRINTK.C:
/* printk ' s without a loglevel use this: */
#define Default_message_loglevel Config_default_message_loglevel can be seen, this level can be specified in the kernel configuration,    This mechanism is starting from 2.6.39, if you do not go special configuration, then the default is <4>, that is, kern_warning.    The kernel Kern_emerg the most important record level to "", and the insignificant debug record level "Kern_debug" is set to "<7>". The kernel uses these macros to specify the log level and the current terminal's record level console_loglevel to determine whether to print to the terminal, using the following example:
PRINTK (Kern_emerg "log_level:%s\n", Kern_emerg);
When the compilation preprocessing is complete, the code in the preceding section is actually compiled into the following format:
PRINTK ("<0>" "log_level:%s\n", Kern_emerg);
to a PRINTK () What log rank is entirely up to you. Messages that are formal and need to be maintained should be given a corresponding log level based on the nature of the information. But the only way to fix a problem is to get it everywhere. Debugging information can be done in two ways:
    • One option is to keep the default record level of the terminal unchanged, giving all debug information Kern Crit or lower to ensure that information is bound to be output.
    • On the other hand, the debug level is Kern for all debugging information, and the default record level of the adjustment terminal is 7, and all debug information can be output.

Both approaches have pros and cons.
Here's the default logging level for tuning the kernel, as described in my 3.5 study note, "Linux Device Driver Learning (2)-Debugging technology," which can be changed by/PROC/SYS/KERNEL/PRINTK, or a C program invoking a syslog system call. But now the GLIBC function interface changed, because the use of the word syslog is too extensive, the name of the function is modified to KLOGCTL, so the original blog code can not be used, the following is the new version of the Console_loglevel code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#define __LIBRARY__/* _syscall3 and Friends is only available through this */
//#include <linux/unistd.h>
/ * Define the system call, to override the library function */
//_syscall3 (int, syslog, int, type, char *, BUFP, int, Len);

int main (int argc, char **argv)
{
int level;
if (argc = = 2) {
Level = Atoi (argv[1]);/* The Chosen console */
} else {
fprintf (stderr, "%s:need a single arg\n", Argv[0]);
exit (1);
      }
if (Klogctl (8, NULL, level) < 0) {
fprintf (stderr, "%s:syslog (setlevel):%s\n",
argv[0], strerror (errno));
Exit (1);
      }
exit (0);
}
2. Related Auxiliary macros
If you determine the level of log required for PRINTK, it seems a bit cumbersome to add the above macro to it each time. So kernel hackers have defined macros to make it easy for PRINTK to use, and these macros are ubiquitous in kernel code:
#ifndef PR_FMT
#define PR_FMT (FMT) FMT
#endif

#define Pr_emerg (FMT, ...) \
PRINTK (Kern_emerg pr_fmt (FMT), # #__VA_ARGS__)
#define Pr_alert (FMT, ...) \
PRINTK (Kern_alert pr_fmt (FMT), # #__VA_ARGS__)
#define PR_CRIT (FMT, ...) \
PRINTK (Kern_crit pr_fmt (FMT), # #__VA_ARGS__)
#define PR_ERR (FMT, ...) \
PRINTK (Kern_err pr_fmt (FMT), # #__VA_ARGS__)
#define Pr_warning (FMT, ...) \
PRINTK (kern_warning pr_fmt (FMT), # #__VA_ARGS__)
#define Pr_warn pr_warning
#define Pr_notice (FMT, ...) \
PRINTK (Kern_notice pr_fmt (FMT), # #__VA_ARGS__)
#define PR_INFO (FMT, ...) \
PRINTK (Kern_info pr_fmt (FMT), # #__VA_ARGS__)
#define Pr_cont (FMT, ...) \
PRINTK (Kern_cont fmt, # #__VA_ARGS__)

/* Pr_devel () does not produce any code unless debug is defined */
#ifdef DEBUG
#define PR_DEVEL (FMT, ...) \
PRINTK (Kern_debug pr_fmt (FMT), # #__VA_ARGS__)
#else
#define PR_DEVEL (FMT, ...) \
NO_PRINTK (Kern_debug pr_fmt (FMT), # #__VA_ARGS__)
#endif

/ * If you are writing a driver, please use dev_dbg */
#if defined (DEBUG)
#define PR_DEBUG (FMT, ...) \
PRINTK (Kern_debug pr_fmt (FMT), # #__VA_ARGS__)
#elif defined (config_dynamic_debug)
/ * DYNAMIC_PR_DEBUG () uses PR_FMT () internally so we don t need it's here */
#define PR_DEBUG (FMT, ...) \
Dynamic_pr_debug (FMT, # #__VA_ARGS__)
#else
#define PR_DEBUG (FMT, ...) \
NO_PRINTK (Kern_debug pr_fmt (FMT), # #__VA_ARGS__)
#endif
From the code above you should be able to know the use of these macros. It is important to note that Pr_devel and Pr_debug macros only produce actual PRINTK code after the debug is defined, which facilitates kernel development: Using these macros in code, when debugging is complete, simply #undef Debug can eliminate the code used by these debugs without really removing the debug output code.

3. Output Rate Control
At the time of debugging, sometimes some parts may printk produce a large amount of output, causing the system to not function properly and may overflow the system log ring buffer (old information is quickly overwritten). In particular, excessive output can slow down the system when using a slow console device, such as a serial port. This makes it difficult to find out where the system has gone wrong.   So you should be very careful: you should not print anything during normal operation, the printed output should be an indication of the exceptions that need attention, and be careful not to overdo it. In some cases, it is best to set a flag variable representation: You have already printed this, and you will no longer print any of this information. The control core for the print rate already provides a ready-made macro:
#define PRINTK_RATELIMIT () __printk_ratelimit (__func__)
This function should be called before you think of printing a message that could be a large number of duplicates, and if the function returns a value other than 0, continue printing your message, or skip it. The typical invocation is as follows:
if (Printk_ratelimit ())
PRINTK (Kern_notice "The printer is still on fire\n");
Printk_ratelimit works by tracking the number and time of messages sent to the console. When the output exceeds a limit, Printk_ratelimit begins to return 0 so that the message is discarded. We can modify it by:
/proc/sys/kern/printk_ratelimit (can be regarded as a monitoring cycle, in this cycle can only emit the following control amount of information)/proc/sys/kernel/printk_ratelimit_burst ( Maximum number of messages in the above period, if more than Printk_ratelimit () returns 0)
To control the output of the message. Other macros, such as printk_ratelimited (FMT, ...) are also defined in. And so on, interested friends will go to the document to see the knowledge, very well understood.
4. Finally, special reminders:
1, although PRINTK is very strong, but read the source code you know, the efficiency of this function is very low: Copy a character copy only one byte at a time, and to call the console output may also produce an interrupt. So if your driver does the performance test or release after the function debugging is done, remember to minimize the PRINTK output, so that only a small amount of information is output when the error occurs. Otherwise, the output of useless information to the console affects performance. I just started to learn to drive the time I made such an idiot mistake, in testing can drive performance when incredibly printk out information to check, the results of direct downtime.
2, PRINTK temporary cache printk_buf only 1 K, all once PRINTK function can only record <1k information to the log buffer, and PRINTK use of "ring buffer"
Third, the kernel implementation of PRINTK
The PRINTK kernel log for Linux is often referred to as kernel ring buffer, because the PRINTK cache implementation is using a simple ring buffer (ring buffers). However, it is important to note that the kernel trace system ring buffer is not confused, although they are to track debugging, but the trace system is more sophisticated ring buffer implementation, and the PRINTK ring buffer is very simple, In fact, a character array is defined:
static Char __log_buf[__log_buf_len];
and use a set of pointers to manage:
/*
* There are no restrictions on log_buf_len when pointing to log_buf-so they
* Must be treated with a mask before use as subscript (remove config_log_buf_shift above) */
static unsigned log_start; /* Index in LOG_BUF: Point to the next character read by Syslog () */
static unsigned con_start; /* Index in LOG_BUF: Point to the next character sent to the console */
static unsigned log_end; /* Index in LOG_BUF: The most recently written character address + 1 */

Specific implementation of the CU has already had a Bo friend wrote, I here is no longer wordy, I reproduced back up a bit: "PRINTK realization Analysis"
Four, user space access to the kernel log
There are two interfaces for user space access and control of the kernel log:
(1) Call the kernel's syslog system call via the Klogctl function interface of GLIBC (2) via the Procfs interface exported in the FS/PROC/KMSG.C kernel module:/proc/kmsg file.    In fact, they finally called the Do_syslog function in/KERNEL/PRINTK.C to implement the access to __LOG_BUF and the modification of related variables. But it's worth noting that: when data is fetched from/PROC/KMSG, the data that is read in __LOG_BUF is no longer retained (that is, the Log_start pointer is modified), but the syslog system calls back the log data and preserves the data to other processes. Reading the/proc file is the default practice for KLOGD. The DMESG command can be used to view the contents of the cache and keep it, in fact, it returns everything in __log_buf to stdout, regardless of whether it has been read or not.
Here I still recommend everyone rtfsc–read the fucking source code, look at these code more than anything strong, I just lead a way here.
There are programs dedicated to logging system logs in user space, collectively referred to as the "syslog daemon". Most of the early and present embedded systems use the KLOGD+SYSLOGD combination, and most distributions now use RSYSLOGD or Syslogd-ng. These user-space programs I do not analyze here, I am not good at, operations may be more clear. All I know is that they roughly call relationships.
Here I use a graph to summarize the overall structure of the kernel PRINTK and log system:

Analysis of kernel log and PRINTK structure

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.