Method 1: If you are a C++ programmer who has written a very complex program and keep running into inexplicable crashes, you may well be dealing with a wild (dangling) pointer. If you watch the debugger's output window carefully, you may notice a message like this:
Heap: Free heap block XXXXXXXX modified at XXXXXXXX after it was freed
Many people ask about this online, but few answers exist. After several days of searching, I finally found a solution, which I describe next.
GFlags is a tool in the Debugging Tools for Windows package; it is also included in the Windows development kits. It sets debugging options at three levels: system (registry), kernel, and image file.
Add the Debugging Tools for Windows installation directory to the PATH environment variable.
Enter the following command:
gflags /p /enable test.exe /full /unaligned
or, with the older PageHeap tool:
pageheap /enable app.exe /full
These commands write debugging settings for the executable into the registry so that the heap adds protection around the memory the program allocates. As a result, as soon as a write goes out of bounds or a wild pointer is used, the program breaks into the debugger, which tells you exactly where the problem is.
Method 2:
Multithreading and memory (the heap): pay close attention to memory in multithreaded code, or the program will keep hitting inexplicable errors. If you assume that some situation "should" never happen, or that certain steps will always run in the expected order, you are badly mistaken, for a simple reason: you are ignoring what actually happens and arguing with the CPU!
The xx SDK's error-reporting and exit problems had not been properly solved since last year. Recently they were exposed more and more often, so it was time to face them head on. I went to xx N times, and even now I am not sure the problem is fully solved.
SDK symptom:
Our own test demo showed no problem, but the platform did.
Later tests showed that the demo was not problem-free either: the original demo had only been run against a small number of devices, in an unrealistic environment that could not exercise the real workload, so the problem never appeared.
Running the program directly produced a debug error. In the platform's VC debug build, the error showed up as a triggered user breakpoint; in the SDK's VC debug build, the real nature of the error was exposed:
Heap[domainpingtai.exe]: Heap: Free heap block 2be3000 modified at 2be3200 after it was freed
I had never encountered this kind of error before. Investigation showed that it is reported when the heap has been corrupted: the contents of a block were modified after the block was deleted. The following program simulates it:
#include <cstring>

void fun1()
{
    char* p = new char[128];
    delete[] p;
    std::strcpy(p, "abcdefghijklmnopqrstuvwxyz");  // BUG: writes into memory that was already freed
}
In my simulation, I observed the following:
1. If the string copied by strcpy() is shorter than 13 bytes, no error is reported (this is not absolute).
2. No error occurs when fun1() returns; the error is reported when the program exits.
3. If any other function is called after fun1() returns, the error is reported.
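For contrast, here is a minimal corrected version of the simulation, assuming the intent was simply to fill the buffer and then release it. `fun1_fixed`, `fun1_raii`, and the `SafeDeleteArray` helper are hypothetical names, not from the original program: the first frees only after the last use and nulls the pointer, the second lets `std::unique_ptr<char[]>` own the buffer so no raw pointer can outlive it.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical helper in the spirit of a SAFE_DELETE macro: free the array and
// null the pointer, so a later use dereferences nullptr (which typically
// crashes at once) instead of silently writing into a freed heap block.
template <typename T>
void SafeDeleteArray(T*& p) {
    delete[] p;
    p = nullptr;
}

// Corrected fun1(): copy first, free last. Returns the copied length.
std::size_t fun1_fixed() {
    char* p = new char[128];
    std::strcpy(p, "abcdefghijklmnopqrstuvwxyz");
    std::size_t len = std::strlen(p);
    SafeDeleteArray(p);  // p is nullptr from here on
    return len;
}

// Same idea with RAII: the buffer is released when p goes out of scope,
// which is necessarily after its last use.
std::size_t fun1_raii() {
    auto p = std::make_unique<char[]>(128);
    std::strcpy(p.get(), "abcdefghijklmnopqrstuvwxyz");
    return std::strlen(p.get());
}
```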
To simplify the following description, the error is written in general form as:
Heap[<EXE>]: Heap: Free heap block <addr1> modified at <addr2> after it was freed
Analyzing the error reported by the simulation program leads to the following conclusions: the memory block at <addr1> was deleted, and then the memory at <addr2> was modified. <addr2> lies inside the block at <addr1>, laid out like this:
|--------------------------------|
| <addr1> ...  <addr2> ...       |
|--------------------------------|
In the simulation program, <addr1> = p, and p <= <addr2> < p + 128.
In real memory, however, <addr1> is not equal to p but smaller than p. It does not correspond to any memory in the program's code; it is the bookkeeping area the heap creates for the block it hands out as p, and it is maintained by Windows. (In my opinion, the address p can be computed from it.) The actual layout is:
|--------------------------------|
| <addr1> | p ...                |
|--------------------------------|
<addr1> is memory used by Windows, while p is memory used by the program.
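The layout can be modelled with a toy allocator, shown as an illustration only; the real Windows heap header has a different, version-dependent format. `BlockHeader`, `ToyAlloc`, `HeaderOf`, `ToyFree`, and `InPayload` are hypothetical names. The header plays the role of <addr1>, the returned pointer is p, and `InPayload` checks the p <= <addr2> < p + size relation from the simulation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Toy model, NOT the real Windows heap layout: the allocator keeps a header in
// front of every block, and the program only sees the pointer just past it.
struct BlockHeader {
    std::size_t size;  // payload size the program asked for
};

void* ToyAlloc(std::size_t size) {
    void* raw = std::malloc(sizeof(BlockHeader) + size);
    if (raw == nullptr) throw std::bad_alloc();
    BlockHeader* hdr = new (raw) BlockHeader{size};  // header lives at "<addr1>"
    return hdr + 1;                                  // payload "p" follows it
}

// Recover the header ("<addr1>") from the pointer the program holds ("p").
BlockHeader* HeaderOf(void* p) {
    return static_cast<BlockHeader*>(p) - 1;
}

void ToyFree(void* p) {
    std::free(HeaderOf(p));
}

// True if addr (a reported "<addr2>") lies in the payload: p <= addr < p + size.
bool InPayload(const BlockHeader* hdr, const void* addr) {
    const auto begin = reinterpret_cast<std::uintptr_t>(hdr + 1);
    const auto a = reinterpret_cast<std::uintptr_t>(addr);
    return a >= begin && a < begin + hdr->size;
}
```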
With the memory operation understood and the cause of the error known, the next step was to use <addr2> to find which part of the program has the problem: first find which deleted block <addr2> belongs to and where it is written again (or where something inside it is written), then check whether the deletion and the write really cannot conflict.
The program's sequencing should be fine, since nothing is written after deletion and there is no other way the memory could be used. Starting with the memory being allocated (mainly the buffers pointed to by char* variables), I printed the address and length of every new char[] and searched tens of thousands of lines of output one by one, but <addr2> was not there.
This was strange. Was there a problem with the program's execution order? Impossible.
Next I printed the address and memory range of every newly allocated class object and searched them one by one...
Sure enough, it was the CAccept class! Was it still being used after it had been deleted? According to the memory analysis above, it had to be; nothing else could explain it. Found it!
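Searching tens of thousands of printed lines by eye is slow. A hypothetical alternative, sketched below under the assumption that every relevant `new` can be hooked to call `Record`, keeps the log in a sorted map and answers "which block contains <addr2>?" with one lookup. `AllocLog`, `Record`, and `FindOwner` are illustrative names; entries are deliberately kept after deletion, because the question is which freed block was written.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical allocation log keyed by start address.
class AllocLog {
public:
    // Record a block; a reused address simply overwrites the older entry.
    void Record(std::uintptr_t start, std::size_t size, std::string tag) {
        std::lock_guard<std::mutex> lock(mu_);
        blocks_[start] = Entry{size, std::move(tag)};
    }

    // Tag of the block with start <= addr < start + size, if any.
    std::optional<std::string> FindOwner(std::uintptr_t addr) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = blocks_.upper_bound(addr);  // first block starting after addr
        if (it == blocks_.begin()) return std::nullopt;
        --it;                                 // last block starting at or before addr
        if (addr < it->first + it->second.size) return it->second.tag;
        return std::nullopt;
    }

private:
    struct Entry {
        std::size_t size;
        std::string tag;
    };
    mutable std::mutex mu_;
    std::map<std::uintptr_t, Entry> blocks_;
};
```

In use, each allocation site would call something like `log.Record(reinterpret_cast<std::uintptr_t>(ptr), size, "CAccept")`, and the address from the heap message is passed to `FindOwner`.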
Because the CAccept class is used in only a few places, it was easy to track down. It is used mainly in two places: the completion port's endless loop and the task check's endless loop. The conflict had to be between these two.
How the completion port loop works:
while (true)
{
    if (!GetCompleteIO())
        break;
    if (IsAccept())
    {
        CAccept* pAcc = GetAccept();   // obtain from the list
        // use pAcc (read and assign its members)
        ......
        // delete pAcc
        SAFE_DELETE(pAcc);
    }
    else
    {
        // send/receive data and handle other messages on the socket
    }
}
How the task check loop works:
while (true)
{
    Wait(5000);   // wait 5 seconds, then run the task check
    Lock();
    while (not_end_accept_list)
    {
        CAccept* pAcc = NextAccept();
        if (pAcc->tooLongTime)
            SAFE_DELETE(pAcc);   // delete pAcc
    }
    Unlock();
    // connection check
    ...
}
The accept helper functions (such as CreateAccept(), GetAccept(), and DeleteAccept()) all run inside the critical section.
A careful check showed that the completion port's use of CAccept was not inside the critical section. GetAccept() does its own work inside the critical section, and the accept list is also manipulated inside the critical section, but the object is then used after GetAccept() returns, outside any lock. The problem had to be here. To verify it, I examined the deletions made by the task check and confirmed that while the completion port was still using a CAccept, the task check loop had already deleted it!
Under normal conditions this is hard to trigger, but it becomes easy when operations in the critical section are delayed, that is, when each operation has to wait a long time to enter (where previously it did not wait at all), say 100 or 50 milliseconds, and many operations pile up waiting. Once the number of connected devices passes a certain threshold, the more operations are pending, the easier the problem is to reproduce.
Once the problem was located, it could be solved. The fix should be fairly easy (though some parts are tricky, such as the socket message handling in the else branch of the completion port loop): you only need to move the use of CAccept into the corresponding critical section. After the change and testing, the CAccept heap problem was gone, and this part of the problem was solved completely. The relief needs no description.
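A minimal sketch of that fix, assuming `std::mutex` in place of the Win32 critical section and a `std::list` of `std::unique_ptr` in place of the original accept list. `CAccept` here is a hypothetical stand-in with made-up fields, and `AcceptList`, `HandleAccept`, and `RemoveStale` are illustrative names. The point is that lookup, use, and deletion on the completion-port side happen under the same lock the task check takes, so the checker can never free an object between `GetAccept()` and its last use.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the article's CAccept.
struct CAccept {
    int id = 0;
    int idleSeconds = 0;  // time since last activity, checked by the task loop
};

// One lock guards both the list and every use of the objects in it.
class AcceptList {
public:
    void Add(int id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto acc = std::make_unique<CAccept>();
        acc->id = id;
        list_.push_back(std::move(acc));
    }

    void SetIdle(int id, int seconds) {
        std::lock_guard<std::mutex> lock(mu_);
        for (auto& acc : list_)
            if (acc->id == id) acc->idleSeconds = seconds;
    }

    // Completion-port side: look up, use, and delete without releasing the lock.
    bool HandleAccept(int id) {
        std::lock_guard<std::mutex> lock(mu_);
        for (auto it = list_.begin(); it != list_.end(); ++it) {
            if ((*it)->id == id) {
                (*it)->idleSeconds = 0;  // "use pAcc" happens under the lock
                list_.erase(it);         // unique_ptr deletes it, like SAFE_DELETE
                return true;
            }
        }
        return false;  // already removed by the task check: nothing left to touch
    }

    // Task-check side: delete objects idle for too long, under the same lock.
    std::size_t RemoveStale(int maxIdleSeconds) {
        std::lock_guard<std::mutex> lock(mu_);
        std::size_t removed = 0;
        for (auto it = list_.begin(); it != list_.end();) {
            if ((*it)->idleSeconds > maxIdleSeconds) {
                it = list_.erase(it);
                ++removed;
            } else {
                ++it;
            }
        }
        return removed;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return list_.size();
    }

private:
    mutable std::mutex mu_;
    std::list<std::unique_ptr<CAccept>> list_;
};
```

Holding one lock across the whole use keeps the reasoning simple at the cost of serializing the two loops; the tricky case the text mentions (the else branch) is where that lock would have to cover longer socket work.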
......
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>
Two more problems remained: one in how the upper layer calls the SDK, and one inside the SDK itself.
Both are essentially the same as the problem above: an object is still used after it has been deleted!
1. Upper-layer calls into the SDK
A device behind the SDK disconnects, but after receiving the disconnect message the upper layer still calls the corresponding API. At first sight this is not a problem. When the upper layer calls the SDK, the SDK looks up the corresponding connection in its list of devices and uses that connection to operate the device. If the device has disconnected, its connection has been removed from the list, so the lookup inside the API fails and the SDK naturally returns failure. Right? You think so. That's it... Stop! Think about it carefully, then read on.
======================================
The SDK does not know whether the upper layer receives messages and calls the SDK on one thread or on two, so it cannot know the order in which the upper layer processes SDK messages and makes API calls.
One case is easy to understand. When the SDK sends many notifications, the upper layer may put them into a message queue and process them one by one, so the message being processed may be 10 seconds old (a bit exaggerated). Suppose that message is a disconnect notification, and 3 seconds before the upper layer processes it, the user, who still sees the device as online, issues a command. The command fails, and the error says the device is offline. In this case the program does not crash, because 7 seconds is a long time and the SDK should long since have removed the connection from the connection list.
Unfortunately, it does not always go this way; a bad case can happen. After the API function has obtained the connection and while it is issuing the command, the connection check in the task check loop deletes the connection from the list (for example, because no data was received for one minute). If the function that sends the command only reads the connection, the result is an invalid memory read exception; otherwise it produces the same heap error as above. This kind of error may be easier to solve, for example by putting each use inside the critical section. That certainly works, but with 100 APIs you need 100 critical-section operations. That in itself is not the main issue; more importantly, because you have added critical-section operations at the outermost layer of the SDK, you must be careful that critical-section operations do not conflict with each other, otherwise the program hangs and has to be ended from Task Manager. If the API writes to the connection, this is the only solution; if it only reads, a simpler approach may work. For example, at the time I only wrapped the outer layer of the API that hit this error in try...catch(...), so that if the instance is deleted during execution, the API returns failure directly. The specific solution depends on the actual situation.
Here, I just want to lay out my own line of thinking.
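A different technique from the one described above, shown as a hedged sketch: shared ownership with `std::shared_ptr`, which the article does not use. It assumes the connection list can hold `shared_ptr`s; `Connection`, `ConnectionTable`, and `SendCommand` are hypothetical names. When the connection check removes an entry, an API call that already looked it up still holds a reference, so the object stays valid until the call returns and the call simply sees that the device went offline. This also avoids relying on try...catch(...), which in standard C++ does not catch an access violation (MSVC does so only under /EHa or with __try/__except), and a write into freed memory often does not fault at all.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical connection shared between API calls and the connection check.
struct Connection {
    int deviceId = 0;
    std::atomic<bool> alive{true};  // cleared when the connection check drops it
};

class ConnectionTable {
public:
    void Add(int deviceId) {
        std::lock_guard<std::mutex> lock(mu_);
        auto conn = std::make_shared<Connection>();
        conn->deviceId = deviceId;
        conns_[deviceId] = conn;
    }

    // Connection check: drop the table's reference. Calls already holding a
    // shared_ptr keep the object alive until they return.
    void Remove(int deviceId) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = conns_.find(deviceId);
        if (it == conns_.end()) return;
        it->second->alive = false;
        conns_.erase(it);
    }

    // Lookup under the lock; the returned copy pins the object.
    std::shared_ptr<Connection> Get(int deviceId) const {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = conns_.find(deviceId);
        if (it == conns_.end()) return nullptr;
        return it->second;
    }

private:
    mutable std::mutex mu_;
    std::map<int, std::shared_ptr<Connection>> conns_;
};

// Hypothetical SDK API: fails cleanly whether the device was removed before the
// lookup or during the call, and never touches freed memory.
bool SendCommand(const ConnectionTable& table, int deviceId) {
    std::shared_ptr<Connection> conn = table.Get(deviceId);
    if (!conn) return false;      // already offline when the call started
    // ... build and send the command through *conn ...
    return conn->alive.load();    // removed mid-call: report failure
}
```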
2. Another internal problem in the SDK
It is exactly the same as the CAccept problem, but much more complex: it involves almost every part of the program and is very hard to solve. Moreover, because the SDK has to communicate with the upper layer, one careless change leads to a critical-section conflict (apart from the upper-layer code, the SDK code runs inside the critical section):
[Receive socket message] --> [Notify upper layer] --> [Upper layer calls SDK API] --> [API calls SDK internal function]
In this flow, after the SDK notifies the upper layer, it waits for the upper layer to return before continuing, and the upper layer has to call the API to produce its result. When the API calls an internal function, that function may need to enter the critical section first, so the critical sections conflict.
Therefore, processing received socket messages while inside the critical section is dangerous, and finding a good solution here is tricky.
For lack of time, I will only mention it here.
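One common way out, sketched under assumptions and not as the author's solution: update internal state and copy what the upper layer needs while holding the lock, then release the lock before notifying. A Win32 critical section can be re-entered by the thread that already owns it, so the hang described above arises when the API call that needs the lock runs on a different thread while the SDK thread holds it and waits; notifying outside the lock removes that wait-while-locked pattern. `SdkCore`, `OnSocketMessage`, and `ReceivedCount` are hypothetical names, and `std::mutex` stands in for the critical section.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical SDK core: the upper-layer handler runs only after the internal
// lock is released, so it may call back into the SDK API (which takes the same
// lock) from this or any other thread without deadlocking.
class SdkCore {
public:
    using Handler = std::function<void(const std::string&)>;

    explicit SdkCore(Handler h) : handler_(std::move(h)) {}

    // Called for each received socket message.
    void OnSocketMessage(const std::string& msg) {
        std::string note;
        {
            std::lock_guard<std::mutex> lock(mu_);
            ++received_;             // update internal state under the lock
            note = "device:" + msg;  // copy what the upper layer needs
        }                            // lock released here
        handler_(note);              // upper layer may re-enter the API safely
    }

    // Public API that needs the internal lock.
    int ReceivedCount() const {
        std::lock_guard<std::mutex> lock(mu_);
        return received_;
    }

private:
    Handler handler_;
    mutable std::mutex mu_;
    int received_ = 0;
};
```

The trade-off is that the upper layer may act on state that has changed again by the time its handler runs, so the API it calls back into must still check validity (as in the connection example) rather than assume it.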