Windows CE System-Level Log Design

In my experience, reliable products are not designed into existence; they are achieved gradually through continuous bug fixing. That is not to say design is unimportant. My point is that a product needs built-in facilities that help you diagnose problems, and the most common such facility is a system log.

An embedded system consists of a customized hardware platform, customized drivers, customized system components, and customized applications. Integrating so many custom components in a short time is not easy, and any of them may have problems, so an application-level logging module is clearly not enough; we need a system-level solution. On the Windows CE platform, the most useful log is undoubtedly the debug output that CE sends to the serial port. Its biggest advantage is that essentially every module on the platform, from the boot loader to the OAL, from drivers to applications, can easily send debug messages to the serial port. If you ask me to diagnose a Windows CE problem, the first thing I will ask is whether you have the serial debug output; if you do, the problem is already half solved. In practice, however, it is often very hard to capture the serial output at the moment a problem occurs, because: (1) some devices do not expose a serial port at all; (2) few people use a device with a serial cable attached to a PC running HyperTerminal; (3) some errors are hard to reproduce, so even if your device has a serial port, you have a cable, and a PC is at hand, you still may not be able to reproduce the error.
In this article, I want to talk about how to create a system-level log module that captures all the debug output that would go to the serial port and writes it to a log file. It does not sound difficult: basically, you rewrite OEMWriteDebugString() in the BSP and redirect the string that would have gone to the serial port into a file. Yes, that is exactly it. But doing so raises many technical issues that must be taken into account. There are two main difficulties:

1. Storing the log file involves the file system, which raises three problems. First, the file system can be accessed only after the kernel has started and the file-system-related drivers are loaded; during the boot loader phase and kernel initialization the file system is unavailable, so how do we store the log? Second, file writes are generally slow, which has a serious impact on the system. Third, file access depends on the file system's driver stack (file system drivers, partition management drivers, and block device drivers) and involves context switching between processes and threads; compared with serial output the complexity is more than an order of magnitude higher, so reliability is greatly reduced.

2. Any module in the system may output debug information in any state. What does that mean? First, you must be very careful that one module's output does not block another's; the output path needs a synchronization lock, and the code that runs while holding that lock must be extremely reliable and fast. Second, some output happens in very special contexts, such as debug messages printed from an interrupt service routine. At that point all hardware interrupts and the thread scheduler are disabled, which means no API that might block can be used. For example, you might want OEMWriteDebugString() to cache the string in memory and then call SetEvent to wake a thread that writes it to the log file; here SetEvent would hang the system, because the thread scheduler cannot run at that moment.

The following describes the design points:
1. The reliability and performance of the interface that the log module provides to other modules (that is, the implementation of OEMWriteDebugString) are critical, because every module in the system depends on it. In a system with real-time requirements, this interface must also be real-time, meaning the time from call to return must have a theoretical upper bound. A file system makes all three of these requirements hard to meet, so logs must be cached. The basic idea is that OEMWriteDebugString performs no file-related operations at all; it only caches the log in memory, and the actual writing of the log file is done by a separate thread. This mirrors CE's interrupt handling model, which is split into ISR and IST: OEMWriteDebugString plays the role of the ISR, and the log-file-writing thread plays the role of the IST. It also solves the problem of storing the log output produced before the file system is available. A sketch of one possible cache layout follows.
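
This layout is only illustrative; the names (LOG_SLOT, LOG_BUFFER) and the exact field set are assumptions, not necessarily the layout of the module described here:

```c
/* Hypothetical layout of the in-memory log cache (all names illustrative).
 * The buffer lives in a reserved RAM region so that the boot loader, the
 * kernel-mode logging ISR, and the IST can all reach it without going
 * through the kernel's memory manager.                                     */

#define LOG_SLOT_TEXT_LEN   244     /* message payload; one slot = 256 bytes */
#define LOG_NUM_SLOTS       1000    /* see the sizing discussion in point 7  */

typedef struct _LOG_SLOT {
    DWORD dwTick;                   /* tick count at log time (point 3)      */
    DWORD dwThreadId;               /* calling thread, 0 in boot loader (point 8) */
    DWORD dwTeb;                    /* TEB pointer, for crash correlation    */
    CHAR  szText[LOG_SLOT_TEXT_LEN];/* truncated, NUL-terminated message     */
} LOG_SLOT;                         /* 12 + 244 = 256 bytes per slot         */

typedef struct _LOG_BUFFER {
    DWORD         dwSignature;      /* lets the kernel detect a buffer that  */
                                    /* the boot loader has already filled    */
    volatile LONG lHead;            /* index of the next slot to allocate    */
    volatile LONG lTail;            /* index of the next slot to drain       */
    LOG_SLOT      slots[LOG_NUM_SLOTS]; /* FIFO ring of fixed-size slots     */
} LOG_BUFFER;
```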

2. How should the cache be allocated and accessed? A simple approach is to reserve a fixed region of RAM and access it directly in the boot loader phase. In the kernel phase this memory is kept out of the kernel's memory subsystem; the logging ISR (that is, OEMWriteDebugString) can access it directly. In the IST, the SetKMode API can be used in CE 5 to switch to kernel mode and reach it; in CE 6 the IST already runs in kernel mode and can access it directly. If you use other approaches, such as dynamic allocation, you must consider how the kernel phase will read the logs produced during the boot loader phase and how the IST will access the cache, and be especially careful about address aliasing. One concrete arrangement based on a reserved region is sketched below.
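
The sketch assumes that a physical range has been excluded from kernel-managed memory (for example through a RESERVED entry in config.bib) and that the OAL address-translation helper OALPAtoVA is available; the address and signature values are made up:

```c
#include <windows.h>
/* Relies on the LOG_BUFFER type from the earlier sketch and on the OAL
 * headers (oal.h) for OALPAtoVA.                                          */

#define LOG_BUFFER_PA   0x80F00000          /* hypothetical physical address */
#define LOG_SIGNATURE   0x474F4C43          /* "CLOG": marks a valid buffer  */

static LOG_BUFFER *g_pLogBuf;

void LogBufferInit(void)
{
    /* In the OAL, OALPAtoVA() (CE 5/6) converts a physical address into a
     * statically mapped virtual address; FALSE requests the uncached view. */
    g_pLogBuf = (LOG_BUFFER *)OALPAtoVA(LOG_BUFFER_PA, FALSE);

    if (g_pLogBuf->dwSignature != LOG_SIGNATURE) {
        /* First boot, or the boot loader did not set the buffer up.        */
        memset(g_pLogBuf, 0, sizeof(*g_pLogBuf));
        g_pLogBuf->dwSignature = LOG_SIGNATURE;
    }
    /* Otherwise leave head/tail untouched so the boot loader's entries
     * survive into the kernel phase.                                        */
}
```
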
3. How to obtain a timestamp. Timestamps are essential for a log system. An application-level log module can usually obtain them with Windows APIs such as GetSystemTime, but that is clearly unavailable here: in the boot loader phase the API does not exist at all, and in the kernel phase calling it from the ISR may hang the system. Calling it in the IST is fine, but by then it is too late. My approach is to record the time with GetTickCount, because the tick counter is generally implemented by the CPU and is simple, reliable, and fast. In the IST, call GetSystemTime and GetTickCount together, and derive the real timestamp by comparing the current tick count with the tick count recorded when the log entry was produced. A sketch of this conversion follows.
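
The sketch runs in the IST, where the Win32 time APIs are safe to call; the field names follow the hypothetical LOG_SLOT above:

```c
/* Sketch: turning a slot's recorded tick count into an absolute timestamp.
 * The current system time and current tick count are sampled together;
 * the tick difference is then subtracted as 100-ns FILETIME units.        */
static void LogSlotToSystemTime(const LOG_SLOT *pSlot, SYSTEMTIME *pstOut)
{
    SYSTEMTIME     stNow;
    FILETIME       ftNow;
    ULARGE_INTEGER uli;
    DWORD dwAgeMs = GetTickCount() - pSlot->dwTick;   /* age of the entry   */

    GetSystemTime(&stNow);
    SystemTimeToFileTime(&stNow, &ftNow);

    uli.LowPart   = ftNow.dwLowDateTime;
    uli.HighPart  = ftNow.dwHighDateTime;
    uli.QuadPart -= (ULONGLONG)dwAgeMs * 10000;       /* ms -> 100-ns units */

    ftNow.dwLowDateTime  = uli.LowPart;
    ftNow.dwHighDateTime = uli.HighPart;
    FileTimeToSystemTime(&ftNow, pstOut);
}
```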

4. How does the logging ISR notify the IST that there is log information to process? The simple idea is for the IST to wait on an event that the ISR triggers with SetEvent. The problem, as described above, is that calling SetEvent from an interrupt service routine hangs the system. My solution is to assign a logical interrupt number to the log module, use NKSetInterruptEvent in OEMWriteDebugString to trigger that interrupt, and handle it in the IST (hook the interrupt, wait for it to fire, and so on) exactly the way a standard CE IST is implemented. A sketch of this notification path follows.
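
The sketch assumes the BSP reserves a logical interrupt number for the logger (SYSINTR_FIRMWARE + n is the usual convention); the number and the helper function names are made up:

```c
#define SYSINTR_LOGGER  (SYSINTR_FIRMWARE + 10)   /* hypothetical SYSINTR   */

/* OAL side: called at the end of OEMWriteDebugString once a slot is filled.
 * NKSetInterruptEvent only marks the logical interrupt as pending, so it
 * does not block even when interrupts and the scheduler are disabled.      */
void LogNotifyIst(void)
{
    NKSetInterruptEvent(SYSINTR_LOGGER);
}

/* IST side: the standard CE interrupt-thread pattern, identical in shape
 * to any driver IST.                                                        */
DWORD WINAPI LogIstThread(LPVOID pv)
{
    HANDLE hEvt = CreateEvent(NULL, FALSE, FALSE, NULL);

    if (hEvt == NULL || !InterruptInitialize(SYSINTR_LOGGER, hEvt, NULL, 0))
        return 1;

    for (;;) {
        WaitForSingleObject(hEvt, INFINITE);
        LogDrainCacheToFile();       /* hypothetical: copy slots to the log file */
        InterruptDone(SYSINTR_LOGGER);
    }
}
```
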
5. The implementation of OEMWriteDebugString. This is the most critical part of the whole log module. Its implementation must be highly reliable (it must never hang the calling thread or interrupt handler under any circumstances), fast (the time overhead must be as small as possible), real-time (with a clear upper bound on execution time), and fully reentrant (multiple modules may call it simultaneously).

6. How to synchronize access to the cache. In the boot loader phase no synchronization is needed, unless you enable hardware interrupts and output log information from interrupt handlers, because the boot loader itself is a single-task environment and there is no concurrent access to the cache. The kernel phase is different: the Windows CE kernel is multi-tasking, any module may call OEMWriteDebugString at any time (write access), and the logging IST may read from the cache at any time (read access). The log module is therefore a multi-writer, single-reader system, and access to the cache must be synchronized. Which methods are available?

Method A. As everyone knows, the typical practice in application-level development is to synchronize writers and readers with kernel objects such as mutexes and events. The problem is that our writer (OEMWriteDebugString) is special: its callers include not only ordinary threads but also interrupt service routines (ISRs) and exception handlers (the data/prefetch abort handlers), and while those routines run, the CPU is not allowed to switch to another context. Moreover, even in a normal thread context these kernel objects can deadlock. Suppose you have an application that depends on a DLL, and that DLL outputs some debug information from DllMain while it is being loaded (via RETAILMSG or OutputDebugString) to announce that it has loaded. This is a common pattern, but you may not realize it can deadlock. DllMain executes under the protection of a process-wide loader lock; in other words, while DllMain is being called, all other threads in the process are blocked. If one of those threads had just output a line of debug information and acquired the writer lock of OEMWriteDebugString before being suspended, a deadlock results: the loader lock held by DllMain blocks the other threads of the process, while the writer lock that DllMain needs for its own debug output is held by one of those blocked threads, which will never be woken up. This is a classic lock-order inversion. For the DLL loader lock, see Microsoft's white paper "Best Practices for Creating DLLs." In short, the conventional kernel-object approach does not work.

Method B. The key problem is that the implementation of OEMWriteDebugString needs synchronization but cannot introduce blocking operations, so you could consider running it under the protection of a non-preemptible system call. The cost is that while OEMWriteDebugString executes, interrupts cannot be serviced and the thread scheduler cannot run, which hurts the interrupt response and thread scheduling performance of the entire system and therefore its real-time behavior.

Method C. You could also partition the cache by writer, for example dividing it into 1000 blocks with each writer owning one block; that is, the number of threads running in the system plus the ISRs and the data/prefetch abort handlers must not exceed 1000. Writers then never interfere with each other during write access. The drawback is that a writer cannot output its next message until its previous message has been consumed by the reader. Worse, this design indirectly makes the output interface depend on the file system, which seriously reduces system reliability, so it is not feasible.

Method D. This is the approach I actually used. Although ordinary kernel-object locks cannot be used, we are lucky to have a simple kind of lock that never blocks: the Interlocked APIs (InterlockedExchange, InterlockedIncrement, and so on). The Interlocked APIs only guarantee the atomicity of operations on simple 32-bit integer variables, not the atomicity of a code section, but they have more important advantages: they are simple, reliable, and efficient, and they never block (how the Interlocked APIs achieve atomicity without blocking other threads is worth discussing; I will cover it separately later). How can these APIs implement the multi-writer/single-reader model we need? My idea is that because logs arrive in chronological order, the cache can be organized as a FIFO ring queue. How is space allocated for each log entry? A simple way is to divide the cache into N small blocks in advance, with each message occupying one block. The advantage is that the processing logic is simple and fast: allocating space for a log message reduces to claiming a free block index (using an Interlocked API to make the index claim atomic), and the actual copying of the log into the cache needs no lock at all. Because the blocks are of equal size, the index claim that must be atomic is independent of the length of the log message, so it is O(1). Because logs are stored as a queue, two index values (the index of the first allocated block and the index of the last allocated block) are enough to distinguish allocated blocks from free blocks, so the bookkeeping is very simple. The drawback is that overly long messages have to be truncated. This can be addressed by choosing the block size appropriately and stating a maximum log length in the interface specification; since the main purpose of a log is to let you know what the system was doing at run time, the text can be shortened by rewording the output if necessary. A sketch of the allocation path follows.
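
The sketch uses the hypothetical LOG_BUFFER layout and LogNotifyIst helper from earlier. Only the index claim is atomic (InterlockedIncrement); filling the claimed slot cannot race with other writers because each index is handed out exactly once:

```c
/* Sketch: lock-free slot allocation for the multi-writer / single-reader
 * ring.  No blocking calls, no file system, bounded execution time.       */
void OEMWriteDebugString(LPWSTR pszStr)
{
    LONG      lIdx;
    LOG_SLOT *pSlot;
    int       i;

    /* Drop-newest overflow policy (point 9): if the reader has fallen a
     * full ring behind, discard this message rather than the old ones.    */
    if (g_pLogBuf->lHead - g_pLogBuf->lTail >= LOG_NUM_SLOTS)
        return;

    /* Claim the next slot index atomically; O(1) regardless of length.    */
    lIdx  = InterlockedIncrement((LPLONG)&g_pLogBuf->lHead) - 1;
    pSlot = &g_pLogBuf->slots[lIdx % LOG_NUM_SLOTS];

    pSlot->dwTick = OEMGetTickCount();  /* OAL-side counterpart of GetTickCount */
    /* dwThreadId / dwTeb could be filled in from kernel state here (point 8). */

    /* Narrow WCHAR to CHAR and truncate; nothing here can block.           */
    for (i = 0; i < LOG_SLOT_TEXT_LEN - 1 && pszStr[i]; i++)
        pSlot->szText[i] = (CHAR)pszStr[i];
    pSlot->szText[i] = '\0';

    LogNotifyIst();                     /* kernel phase only; see the SYSINTR sketch */
}
```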

7. Choosing the cache and block sizes. This requires observing your system to see how much log output it produces under normal circumstances, then adding some headroom. How much space does each block need? A normal log line contains a few dozen characters and rarely exceeds 100, so 256 bytes should be sufficient; if system memory is tight, 128 bytes is enough. With room for 1000 log entries at a time, the total cache is about 256 KB.

8. Auxiliary log information. When a log entry is generated, recording its generation time, the thread ID, and the TEB pointer also helps with diagnosis. For example, as mentioned in "How to Diagnose a Windows CE Application Crash," when an application crashes the CE kernel outputs the relevant crash information (data abort/prefetch abort), which includes the TEB value of the crashed thread. If that thread produced some log output earlier, you can easily match the crashed thread to the source code.
9. What if logs are generated faster than the reader can process them and the cache overflows? Obviously some logs must be dropped, but should they be the newest or the oldest? I prefer to drop the newest, because as long as the cache is large enough this rarely happens; when it does, it is usually because a thread has entered an endless loop and is continuously printing from inside it. In that situation it is the logs produced when the endless loop began that matter, not the junk generated inside the loop.

10. What is the performance of OEMWriteDebugString? First estimate the cost of outputting to the serial port. Assume the port is configured for 115200 baud, 8 data bits, 1 stop bit, and no parity; with the start bit each byte takes 10 bit times, so the transfer rate is 115200 / 10 = 11520 bytes/s, about 11.5 bytes/ms, and transmitting one byte takes roughly 10^6 / 11520 ≈ 87 µs. That is the theoretical maximum; in practice there is also the call overhead of OEMWriteDebugString itself. On my system (520 MHz CPU clock, 208 MHz system bus, 104 MHz external SDRAM bus) the measured value is 93 µs per byte. With serial output disabled inside OEMWriteDebugString, the call overhead alone can be measured at about 5 µs; in other words, outputting a line of log with RETAILMSG takes about 5-6 microseconds, which makes this a very efficient interface. If your CPU provides a microsecond-resolution timer, you can even use the log to diagnose high-speed modules, such as data-transfer problems on a network interface chip. A sketch of such a measurement follows.
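
The sketch measures the cost from application code under the assumption that QueryPerformanceCounter is backed by a high-resolution timer in the OAL; it simply averages the cost of RETAILMSG over many calls:

```c
#include <windows.h>

#define N_CALLS 1000

/* Sketch: rough measurement of the per-line logging cost.  With serial
 * output disabled in OEMWriteDebugString this approximates the pure call
 * overhead of the cached logging path.                                    */
void MeasureLogOverhead(void)
{
    LARGE_INTEGER liFreq, liStart, liEnd;
    int i;

    QueryPerformanceFrequency(&liFreq);
    QueryPerformanceCounter(&liStart);

    for (i = 0; i < N_CALLS; i++)
        RETAILMSG(1, (TEXT("perf probe %d\r\n"), i));

    QueryPerformanceCounter(&liEnd);

    RETAILMSG(1, (TEXT("average cost: %d us per line\r\n"),
        (int)(((liEnd.QuadPart - liStart.QuadPart) * 1000000) /
              (liFreq.QuadPart * N_CALLS))));
}
```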
