Summary of bug solving

Source: Internet
Author: User
1. Bugs caused by interaction with other applications
Background:
A mobile music player stores its media on the phone's external storage card (SD card), so media can be played only while the SD card is mounted. The player therefore monitors the SD card status. When the card is unmounted or ejected from the phone, the player saves its playback state and stops playing. When the card is mounted again, the player restores the saved state and resumes playing. In other words, the player stops when it receives the "SD card eject" message and resumes when it receives the "SD card mount" message. The player can also play media in the background.
A file manager is another application, which can edit (for example, delete) files on the SD card. After it changes a file, this third-party application sends a message such as "SD card mount" to tell other applications and the system that files have changed.
Problem:
While the player is playing media in the background, start the file manager and delete a file (any file, or the media file being played). The player stops playing. When the player is brought back to the foreground, its interface shows that it is playing, but there is no sound.
Analysis: The player does not stop playing without a reason. It stops in only three situations: the user asks it to stop, the SD card is ejected, or the SD card is mounted. Debugging and log tracing showed that none of these had actually happened: the user had not requested a stop, and the card had been neither ejected nor mounted. What stopped playback was the restore operation that runs when an "SD card mount" message arrives. So why did the restore operation run if the card was never ejected or mounted? Further debugging and log tracing showed that when the file manager deletes a file, it sends the "SD card mount" message to tell other applications that files have changed.
Solution:
Once the cause is understood, the fix is simple. Before restoring its state, the player only needs to check whether it previously received an "SD card eject" message and is now receiving the matching "SD card mount" message. If so, it restores the state; otherwise it ignores the message.
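The check described above can be sketched in plain Java. This is only an illustration under assumptions: the class name `SdCardAwarePlayer`, its fields, and its methods are hypothetical and do not come from the original player, and the real application would receive these events through the platform's messaging mechanism rather than direct method calls.

```java
// Hypothetical sketch: restore playback on "mount" only if an "eject" came first.
final class SdCardAwarePlayer {
    private boolean playing;
    private boolean ejectedSinceLastMount; // true between an eject and the next mount
    private boolean resumeOnMount;         // playback state saved at eject time

    void play() {
        playing = true;
    }

    // Called when the "SD card eject" message arrives.
    void onSdCardEject() {
        ejectedSinceLastMount = true;
        resumeOnMount = playing; // save the state
        playing = false;         // stop playing
    }

    // Called when the "SD card mount" message arrives.
    void onSdCardMount() {
        if (!ejectedSinceLastMount) {
            // No eject preceded this message, so another application
            // (for example a file manager) sent it. Ignore it.
            return;
        }
        ejectedSinceLastMount = false;
        if (resumeOnMount) {
            playing = true; // restore the saved state
        }
        resumeOnMount = false;
    }

    boolean isPlaying() {
        return playing;
    }
}
```

With this guard, a stray "mount" message sent while the card was never ejected leaves playback untouched, while a real eject followed by a mount still resumes it.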
Lessons learned:
This bug is a software defect: the music player should have performed this condition check from the start. The player's own strategy is sound, and as long as no other application sends such a message it works correctly. This is a typical bug that appears only when interacting with other applications or modules.
The most important clue for such bugs is that the problem appears when another application is involved. This bug is relatively simple because it is clearly related to the file manager. In systems with complex relationships, however, it is hard to tell whether the fault lies in the module that shows the symptom or in another module. Narrowing this down requires debugging and log tracing: test and exclude modules one by one, then investigate the remaining candidates further.
2. Bugs whose cause is far from the symptom
Background:
A module processes the multimedia files on the SD card: it parses them, extracts their metadata, and stores the metadata in the media database. It is implemented in Java and C++, combined through JNI. The Java layer controls the overall flow and writes data to the database, and the C++ layer parses the files to obtain their metadata. The two layers communicate through JNI.
Problem:
When processing certain files, the Java virtual machine (JVM) exits unexpectedly (JVM abort) and prints an error message: "JNI warning: Illegal start byte 0xb1".
Analysis:
This is a very serious error because it crashes the JVM and the process is killed by the kernel. The only clue is the message "JNI warning: Illegal start byte 0xb1" printed at the time of the crash. Searching the source code for this message (fortunately, part of the source was available) shows that it comes from the CheckJni.c file inside the JVM. This important file checks the validity of all JNI parameters, strings in particular. Because Java uses the modified UTF-8 encoding, CheckJni.c checks the encoding of every string passed in. If a string is not valid modified UTF-8, it issues a warning and aborts the JVM.
So the crash location is known: the JVM aborts because it detects an invalid string. But where does the invalid string come from, and which module produces it? JNI is used in countless places in the system, so searching blindly for JNI usage is not practical. Because the crash occurs while the media scanner is scanning a set of media files, the main suspect is the JNI part of the media scanner. The C++ part of the media scanner parses multimedia files and returns the results (usually strings) to the Java layer through JNI. The problem is very likely here: metadata in multimedia files comes in many encodings, so a metadata string may not be valid modified UTF-8. Debugging and tracing confirmed that this was indeed the problem.
Solution:
There is no fully satisfactory solution to this problem. One approach is to convert the string to the correct encoding, but that requires knowing the string's original encoding. A simpler approach is to check the encoding before passing the string to JNI and filter out invalid strings. In the end, the second approach was used.
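The validity check mentioned above might look roughly like the following. It is a minimal sketch, not the code used in the original fix: the class name `ModifiedUtf8` and the method `isValid` are hypothetical, it is written in Java for illustration even though the scanner described here would perform the equivalent check on the native side before the string crosses JNI, and it checks only the byte structure (start bytes and continuation bytes), not overlong encodings.

```java
// Hypothetical structural check for modified UTF-8 byte sequences.
final class ModifiedUtf8 {
    private ModifiedUtf8() {}

    static boolean isValid(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            int b = bytes[i] & 0xFF;
            int continuationCount;
            if (b == 0x00) {
                return false; // modified UTF-8 encodes U+0000 as 0xC0 0x80
            } else if (b < 0x80) {
                continuationCount = 0; // single-byte character
            } else if (b < 0xC0) {
                return false; // continuation byte used as a start byte, e.g. 0xB1
            } else if (b < 0xE0) {
                continuationCount = 1; // two-byte sequence
            } else if (b < 0xF0) {
                continuationCount = 2; // three-byte sequence
            } else {
                return false; // four-byte forms are not used in modified UTF-8
            }
            i++;
            // Every continuation byte must have the bit pattern 10xxxxxx.
            for (int k = 0; k < continuationCount; k++, i++) {
                if (i >= bytes.length || (bytes[i] & 0xC0) != 0x80) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A string that fails this check would be dropped or replaced with a placeholder instead of being handed to JNI. The byte 0xB1 from the error message fails at the second branch because it can only appear as a continuation byte.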
Lessons learned:
First, the information printed when the system crashes or an error occurs is the most important initial clue. It may not reveal the real cause of the problem, but it gives a place to start tracing. Second, an error message is printed by code, so when you do not know which module failed, you can search the source code for the message text to locate the module and the code that printed it. Finally, if the problem results from a chain of causes, you can fix it at the source or at any stage along the chain, as long as the program no longer crashes.
3. Null pointer exceptions (NullPointerException)
Background:
In C, C++, and Java, null pointer errors are a common cause of program crashes. If a pointer or object is used without being properly initialized, a null pointer error (a NullPointerException in Java) can easily occur.
Problem:
When a NullPointerException occurs, the program usually crashes with the exception, but it normally prints the stack trace first.
Analysis:
From the stack trace you can easily see the location of the failing code and so find the direct cause. This is only a small part of the problem, however. Finding out why the object is null is much harder.
Solution:
The obvious fix is to add a null check: if the pointer or object is null, do nothing with it. But this is often not feasible and is not the correct solution, because the real question is what should happen when the pointer is null. If every other place that uses the class already checks for null and only this place does not, you can add the same check here. If that is not the case, investigate why the pointer is null instead of merely guarding against it. This is usually difficult, because it is hard to trace where the object comes from, where it is modified and referenced, and where it is initialized. The problem is completely solved only when you find the real reason the object was left null. In a complicated system with many references and multiple threads, the tracing becomes even harder.
Lessons learned:
For null pointers, do not simply add a condition unless you have a good reason to. Investigate what causes the pointer to be null.
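One way to make the origin of a null easier to trace is to reject it where it enters an object, rather than guarding every later use. The sketch below illustrates that idea; the `Track` class and its field are hypothetical and serve only as an example, while `Objects.requireNonNull` is a standard method in `java.util`.

```java
import java.util.Objects;

// Hypothetical example: fail at the point where the null enters,
// so the stack trace points at the caller that supplied it.
final class Track {
    private final String title;

    Track(String title) {
        // Throws NullPointerException immediately, with a descriptive message,
        // instead of letting the null surface later in unrelated code.
        this.title = Objects.requireNonNull(title, "title must not be null");
    }

    int titleLength() {
        return title.length(); // safe: title was checked in the constructor
    }
}
```

The exception still occurs, but its stack trace now names the code that passed the null, which is usually the information a silent null check would have hidden.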
4. Unsolved Problems
Case 1:
Background:
Some problems are extremely strange. They occur very rarely, but they do occur, and no suitable solution can be found.
Problem:
In a GUI system, a NullPointerException is reported in a base class that is used by many GUI-related applications.
Analysis:
The stack trace printed when the program exits identifies the line of code where the exception occurs. Surprisingly, that line cannot involve a null pointer, because it uses only primitive data types. Nothing within dozens of lines of it can throw a NullPointerException.
Solution:
No solution was found for this problem.
Case 2:
Background:
A large-scale random stress test is run on a system. The system's Message class is a final class that overrides the toString() method, which prints information in the following format: "{ what=xxx when=xxx xxxxxx }"
Problem:
During one test run, the system restarted automatically because its core process exited abnormally.
Analysis:
The cause is a RuntimeException thrown in the core process, with the message "[c0x44bc: this message is already in use.". The stack trace leads to the code at the exit point, where the program detects an invalid operation and throws a RuntimeException. The code is as follows:
//...
if (msg.when != 0) {
    // A non-zero "when" means the message has already been queued.
    throw new RuntimeException(msg + "this message is already in use.");
}
//...
Here msg is a Message object.
In general, the + operator calls the object's toString() method, and the output of toString() here has a specific format. This is what makes the failure so confusing: the message that was actually printed looks nothing like the output of Message.toString(). Judging from the log, the object at that point was a char array rather than a Message object, yet no such char array can be found anywhere in the relevant code.
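The reasoning about string concatenation can be checked with a small experiment. The sketch below is an assumption-laden stand-in, not the system's code: this `Message` class is a simplified hypothetical version with only two fields, and `ConcatDemo.describe` is a hypothetical helper that mimics the concatenation in the exception message.

```java
// Hypothetical, simplified stand-in for the Message class described above.
final class Message {
    final int what;
    final long when;

    Message(int what, long when) {
        this.what = what;
        this.when = when;
    }

    @Override
    public String toString() {
        return "{ what=" + what + " when=" + when + " }";
    }
}

final class ConcatDemo {
    private ConcatDemo() {}

    // String concatenation converts the operand with toString().
    static String describe(Object o) {
        return o + " this message is already in use.";
    }
}
```

Calling `ConcatDemo.describe(new Message(1, 5))` yields text that starts with the `{ what=1 when=5 }` format, while `ConcatDemo.describe(new char[] {'a'})` yields text that starts with `[C@` followed by a hash code, because arrays inherit `Object.toString()`. That difference is why the logged text points to a char array rather than a Message.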
Solution:
The suspicion is that the object's memory was corrupted, so that the data in it no longer represented the original object. This problem also remains unsolved.
Lessons learned:
Problems like these are the truly hard ones. Tackling them requires deep knowledge of both the language and the system.
5. Problems that are difficult to reproduce
Background:
A gallery application displays many images in a grid. When there are many images, a scroll slider appears on the left. Normally, when the application is opened, the slider should be at the top of the page.
Problem:
A tester reported that when the application was first opened, the slider was not at the top but in the middle or somewhere else, and claimed that this happened every time.
Analysis:
The problem could not be reproduced during debugging, however. It is not a serious problem, but the behavior is very strange. During debugging, we noticed that the images in the tester's application appeared in a different order from those in mine: the order was reversed. The order is controlled by a configurable option that can be set to ascending or descending. The tester was using descending order, whereas most people use ascending order (probably because it is the default and nobody changes it). This looked like the cause, and indeed, when the order is set to descending, the slider is not placed at the top.
Solution:
Once the way to reproduce the problem was found, it was easy to solve. When the order is set to descending, the code intends to put the slider at the end, but the calculation is wrong: it mistakes the height of one screen for the length of the entire document. As a result, whenever the document is longer than one screen, the slider position is incorrect. Because placing the slider at the top is the more common convention, the fix was to place the slider at the top regardless of the sort order.
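The kind of miscalculation described above can be illustrated as follows. The original code is not shown in this article, so everything here is a hypothetical reconstruction: the class `ScrollMath`, both method names, and the exact form of the buggy calculation are assumptions made only to show how confusing the screen height with the document length misplaces the slider.

```java
// Hypothetical reconstruction of the scroll-offset calculation.
final class ScrollMath {
    private ScrollMath() {}

    // Correct offset for showing the last screenful of the document.
    static int bottomOffset(int contentHeight, int viewportHeight) {
        return Math.max(0, contentHeight - viewportHeight);
    }

    // Assumed form of the bug: the height of one screen is used
    // as if it were the length of the whole document.
    static int buggyBottomOffset(int contentHeight, int viewportHeight) {
        int maxOffset = Math.max(0, contentHeight - viewportHeight);
        int assumedDocumentLength = viewportHeight; // the mistake
        return Math.min(assumedDocumentLength, maxOffset);
    }
}
```

For a document of 3000 pixels on an 800-pixel screen, the correct offset is 2200, but the buggy version yields 800, which leaves the slider somewhere in the middle. When the document fits on one screen both versions return 0, which is consistent with the bug appearing only for documents longer than one screen.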
Lessons learned:
In fact, very few problems are truly low-probability events. Truly rare problems occur only when multiple threads are involved, because the order of thread execution cannot be determined. For other problems, we simply have not yet found the rule for reproducing them. Once you accept this, keep experimenting and forming hypotheses until the problem is reproduced. Pay attention to details during the experiments, because many details turn out to be important clues. Most important of all is to believe that the problem exists and that it can be reproduced and solved.
Some methods and techniques help reproduce problems that seem hard to reproduce:
1. Review the operations and logs from the time the problem occurred, and check whether a necessary precondition or step is missing from your attempts.
2. If interaction with other applications or modules is involved, learn the behavior of each application and module, form reasonable hypotheses, and then test them.
3. Guess the cause of the problem and create those conditions to see whether the problem reproduces. For example, if you suspect that it depends on elapsed time or file size, run for longer and test with large files. If you suspect a null pointer problem, deliberately create a null pointer.
4. Believe that the problem exists, that it can be reproduced, and that it can be fixed.
5. Be patient and keep thinking: form a hypothesis, verify it, change one condition at a time, analyze the result, and verify again until the problem is reproduced and fixed.
6. Truly hard-to-reproduce problems: threading problems
Background: A mobile music player supports multiple playlists. If a song from a playlist is playing when that playlist is deleted, playback should stop once the playlist has been deleted. During the deletion, whenever the song being played is deleted, the next song in the playlist is opened and played, until all songs in the playlist have been deleted. Songs are deleted on a different thread from the one that plays them.
Problem: Sometimes, after the playlist containing the song being played is deleted, playback jumps to a song in another playlist and continues.
Analysis: Deleting involves several steps: stop, find the next song, play, and delete. If these steps always ran as one uninterrupted sequence, this problem could not occur. After a lot of debugging and tracing, we finally found that the sequence was being interrupted in the middle: the stop and the play were not completed as one unit. Because deletion runs on another thread, the scheduler can switch threads while the stop is in progress and let the playback thread run, which moves playback to another playlist.
Solution: Once the cause is found, the fix is easy. Add a synchronization lock so that each delete, stop, and play sequence cannot be interrupted.
Lessons learned: This is a truly random problem because it is caused by threads. The first problem multithreading brings is nondeterminism. The second is synchronization and sharing of data. If synchronization and sharing are not handled well, multithreading creates more problems than it solves.
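The locking idea can be sketched like this. It is a simplified illustration under assumptions, not the player's real code: the class `PlaylistPlayer`, its fields, and its methods are hypothetical, songs are represented as plain strings, and a single lock object guards both the deletion path and the playback path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one lock makes "stop, find next, play, delete" atomic.
final class PlaylistPlayer {
    private final Object lock = new Object();
    private final List<String> playlist = new ArrayList<>();
    private String current; // song being played, or null when stopped

    PlaylistPlayer(List<String> songs) {
        playlist.addAll(songs);
        current = playlist.isEmpty() ? null : playlist.get(0);
    }

    // Called from the deletion thread.
    void deleteSong(String song) {
        synchronized (lock) {
            int index = playlist.indexOf(song);
            if (index < 0) {
                return;
            }
            boolean wasPlaying = song.equals(current);
            if (wasPlaying) {
                current = null; // stop
            }
            playlist.remove(index); // delete
            if (wasPlaying && !playlist.isEmpty()) {
                // find the next song in the same playlist and play it
                current = playlist.get(index % playlist.size());
            }
        }
    }

    // Called from the playback thread when a song finishes.
    void playNext() {
        synchronized (lock) {
            if (current == null || playlist.isEmpty()) {
                return; // stopped, or nothing left to play
            }
            int next = (playlist.indexOf(current) + 1) % playlist.size();
            current = playlist.get(next);
        }
    }

    String currentSong() {
        synchronized (lock) {
            return current;
        }
    }
}
```

Because `deleteSong` and `playNext` take the same lock, the playback thread can never observe the intermediate state in which the current song has been stopped but the next one has not yet been chosen. Once the last song is deleted, `currentSong()` returns null and playback stays stopped.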
