An attempt to use LTT to tune the Bluetooth module
1. Goal
The basic goal is to use LTT (the Linux Trace Toolkit) to observe the running state of the system while the Bluetooth module transfers large amounts of data, to identify factors that hurt system performance, and to optimize them so as to improve the performance of the BT module and reduce its CPU usage.
2. Data collection plan
The basic framework is:
- Enable the kernel LTT trace option, and compile the main LTT trace component as a loadable module.
- Use the Bluetooth obex_ftp file transfer and a2dp playback as the workloads for performance data collection, keeping the default parameters after the related modules are started.
- Start LTT's tracedaemon, directing the trace output file to the /tmp directory in memory, to speed up writing of the output file and minimize the impact and interference of tracing on system performance.
- Stop tracedaemon after the typical operations (file transfer / stereo-music playback) are finished.
3. Initial data collection and analysis
Use obex_ftp to connect to the PC and transfer a file of about KB. The collected data are shown below:
3.1 Data analysis
Careful analysis of the data shows that the Bluetooth main process issued a total of 133,997 system calls during the 22.8 seconds of data collection, of which 129,275 were gettimeofday calls, taking 1,657 milliseconds in total. This system-call frequency is surprisingly high. As the trace shows, the system is idle in the first half, with bursts of activity spaced about 100 milliseconds apart. Zooming into one of these bursts shows that it consists almost entirely of gettimeofday system calls: about 70 gettimeofday calls per cycle, taking about 1.80 milliseconds in total. Further analysis shows that each gettimeofday call takes on average about 0.008 milliseconds from entry to exit, and the interval between two consecutive calls is about 0.024 milliseconds.
3.2 Code analysis
Searching the code shows that the gettimeofday calls are issued from the sched main scheduling function of the Bluetooth module. They are used to fetch the current system time and decide whether any task in the task list has reached its scheduled time and needs to run. There are about 70 tasks in this list, and the current time was refreshed with a gettimeofday call before each task was examined in the polling loop. Reading the kernel's gettimeofday implementation shows that on our system it is based on the jiffies value, so its resolution is about 10 ms: calling it 70 times within 1.80 milliseconds returns the same result every time (unless a tick boundary happens to be crossed). The upper-layer scheduling in the Bluetooth module does not need time precision better than 10 ms, so in theory a polling pass that completes within 1.80 ms needs only one gettimeofday call. Removing the redundant calls should shrink the polling pass by 0.008/0.024, i.e. about 33%, to roughly 1.2 milliseconds. In practice, while the BT module is transferring data, many tasks do have scheduled work; polling is interrupted by these tasks and may also be preempted by other processes, so the duration of a polling pass is not fixed. The fix, therefore, is to fetch the time once per pass and call gettimeofday again to refresh the current time only after each scheduled task completes.
4. Data collection after the modification
4.1 Data analysis
Based on the above analysis, we modified the code and re-collected data to obtain the system-call pattern of each polling pass while the system is idle. As shown in the figure, the number of system calls in one polling pass is greatly reduced: only about 5-7 gettimeofday calls remain. Over a comparable operation and collection period, a total of 13,641 gettimeofday system calls occurred, taking 278 milliseconds, a reduction of about 90%. One polling pass now takes about 0.8 milliseconds on average, which is less than our 1.2 ms estimate. This is because of the trace itself: with more system calls the trace module has to record more data and consume more CPU time, so the extra 0.4 milliseconds should be attributed to time saved inside the trace module. Hence, without trace interference, a polling pass is estimated at 1.1-1.2 milliseconds before the change and 0.7-0.8 milliseconds after it. The improvement ratio is consistent with our estimate: about 35% of the polling time is saved when the system is idle. In the idle case, one polling pass every 100 milliseconds saves 0.5-0.6 milliseconds, equivalent to about 0.5% of CPU usage. During data transfer, when the system is busy, the polling interval drops to 5 to 30 milliseconds, so the actual CPU saving may be between 2% and 10%.
5. a2dp performance comparison before and after the modification
5.1 Data collection plan
Loading the LTT trace module always affects actual performance, so when comparing a2dp performance before and after the code change, LTT kernel support is not enabled. In the actual tests, a2dp plays stereo music sampled at 44.1 kHz and 48 kHz. The music is stored on the T-Flash card as WAV files to reduce the CPU time consumed by reading the files. Each piece is played twice or more, to compare performance when the music data is not yet in the memory cache versus when it is already cached. top is used to observe the system's CPU usage and record its range.
5.2 Data analysis
a2dp with the original code:
  44.1 kHz, no cache:  31-33%
  44.1 kHz, cached:    28-33%
  48 kHz,   no cache:  33-36%
  48 kHz,   cached:    29-31%
a2dp with the modified code:
  44.1 kHz, no cache:  28-30%
  44.1 kHz, cached:    25-28%
  48 kHz,   no cache:  28-32%
  48 kHz,   cached:    25-27%
Conclusion: absolute CPU usage drops by 3-5 percentage points; in relative terms, CPU usage is reduced by 10-15%. This matches the earlier prediction.