Analyzing a JBoss content mix-up (crossed user accounts)

Source: Internet
Author: User
Tags define session

Symptom:
After an online-service application was released, user accounts suddenly started getting crossed: users were shown data belonging to other accounts. The severity needs no explanation. For a commercial website this is almost fatal, so the release was rolled back as soon as the problem appeared, and the data of the affected users was corrected manually.

Analysis:
This had not happened before (basically never; reportedly it happened once a year ago, and because it was a single occurrence, the data cache at the network access layer was suspected at the time). Now it suddenly appeared in large numbers (for a commercial website, a dozen to several dozen crossed user accounts counts as a large number). So the first step was to check the newly released application.

One clue was that when accounts were crossed, some pages returned the wrong content and the wrong status code. A response that should have been 302 became 200, and the page content was obviously wrong: a page that should output only true or false output an HTML fragment instead.
Although I did not believe that programmers in the company would define session variables outside the scope of request processing, I scanned the code carefully to rule it out.
In our application the session store for the distributed deployment is self-implemented, so request, response, session, and context are all intercepted and wrapped. If some code path failed to commit the response, dirty data could result. But no such problem was found.
One servlet has only one line of code: response.sendRedirect("/path"); its purpose is to redirect several different URLs to the same request. In theory this URL always returns 302, but it returned 200 when accounts were crossed. So I suspected that some filter in the chain, under some condition, returned directly without passing the request on, so that the request never reached the servlet. After analyzing all the filters, this turned out not to be the case.

I was a little helpless at this point: nothing the search engines turned up was of any use.
So suspicion moved from the business layer to the container environment.
At first I did not suspect the container, because the problem had never been seen before and appeared only with the release of one application. But now that the application's own business code was basically ruled out, I wondered whether the container had had the problem all along and it simply had not been triggered because the conditions were not met. After this application was released, a promotional campaign increased the traffic several-fold. Was a latent problem brought out by the increased load?
The container environment of the company's other applications has been mature and stable for many years, so I concentrated on the differences between this application's container environment and theirs.
Most applications run apache2.0.3 + jboss4.0.5 + mod_jk1.2.28, while this application runs apache2.0.3 + jboss4.2.3 + mod_jk1.2.26.
The other applications' JK was itself upgraded from 1.2.26, and the problem never occurred with that version, so I checked jboss4.2.3 first.
Searching the jboss4.2.3 bug list finally turned up a plausible culprit.

jboss4.2.3 uses jbossweb2.0.1.GA, and the 2.0.x line has a bug that can mix up request data.
In the parseParameters method of org.apache.catalina.connector.Request.java, the last few lines are as follows:

byte[] formData = null;
if (len < CACHED_POST_LEN) {
    // Small bodies reuse the cached postData buffer of this Request object
    if (postData == null)
        postData = new byte[CACHED_POST_LEN];
    formData = postData;
} else {
    formData = new byte[len];
}
try {
    if (readPostBody(formData, len) != len) {
        return;
    }
} catch (IOException e) {
    // Client disconnect
    if (context.getLogger().isDebugEnabled()) {
        context.getLogger().debug(
            sm.getString("coyoteRequest.parseParameters"), e);
    }
    // Note: there is no return here, so execution falls through
}
parameters.processParameters(formData, 0, len);

When the servlet engine processes client requests, it does not create a new Request object for every request. Instead, each thread is bound to a Request object (ThreadLocal), which is cleaned up and reused after a request has been processed. However, postData is not cleared during that cleanup: every read simply overwrites the same buffer, and the specified length is then read back out of it.

Suppose the client sends a POST request that travels from Apache to JBoss through JK. JK first tells JBoss that the body length of this transmission is X. When JBoss then calls readPostBody, it may fail to read that many bytes and time out. The exception is caught, but because there is no return, execution continues to parameters.processParameters(formData, 0, len);. The buffer was never filled with len bytes of new data, yet it is processed as if it had been, so processParameters ends up using the POST content of the previous request, or part of it. In production this inevitably leads to mixed-up data.
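
The failure mode can be illustrated with a small stand-alone simulation. This is a minimal sketch and not the JBossWeb source: the class StaleBufferDemo, its parse method, and the timeout and returnOnError flags are hypothetical names introduced here, and the simulated readPostBody only copies bytes or throws to imitate a timeout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class StaleBufferDemo {
    static final int CACHED_POST_LEN = 8192;

    // Reused across requests, like the cached buffer in the reused Request object.
    private byte[] postData;

    // Simulated body read: copies the body into dest, or throws to imitate a timeout.
    private static int readPostBody(byte[] dest, byte[] body, boolean timeout)
            throws IOException {
        if (timeout) {
            throw new IOException("simulated read timeout");
        }
        System.arraycopy(body, 0, dest, 0, body.length);
        return body.length;
    }

    // Returns the bytes that would be handed to parameter parsing, or null if
    // parsing is skipped. returnOnError=false mimics the missing return.
    String parse(byte[] body, boolean timeout, boolean returnOnError) {
        int len = body.length; // the announced body length
        byte[] formData;
        if (len < CACHED_POST_LEN) {
            if (postData == null) {
                postData = new byte[CACHED_POST_LEN];
            }
            formData = postData;
        } else {
            formData = new byte[len];
        }
        try {
            if (readPostBody(formData, body, timeout) != len) {
                return null;
            }
        } catch (IOException e) {
            if (returnOnError) {
                return null; // fixed behaviour: stop here
            }
            // buggy behaviour: fall through with whatever is still in the buffer
        }
        return new String(formData, 0, len, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        StaleBufferDemo demo = new StaleBufferDemo();
        byte[] first = "mark=AAAA".getBytes(StandardCharsets.US_ASCII);
        byte[] second = "mark=BBBB".getBytes(StandardCharsets.US_ASCII);
        System.out.println(demo.parse(first, false, false)); // normal request
        System.out.println(demo.parse(second, true, false)); // timeout, no return
        System.out.println(demo.parse(second, true, true));  // timeout, with return
    }
}
```

In this sketch the second call yields the first request's body because the shared buffer still holds it, while the third call skips parsing entirely.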

To reproduce the problem, I modified the ajp_send_request function in common/jk_ajp_common.c of the mod_jk1.2.26 source code:

postlen = op->post->len;
if (postlen > AJP_HEADER_LEN) {
    .................
}
else if (s->reco_status == RECO_FILLED) {
    .................
}
else {
    if (rand() % 3 == 0) {
        jk_log(l, JK_LOG_ERROR, "It will sleep .......................");
        sleep(120);
        jk_log(l, JK_LOG_ERROR, "end sleep ...........................");
    }
    .........................
}

With this change, JK sleeps for 120 seconds before sending the POST data in roughly one request out of three. The sleep must be longer than connection_pool_timeout in the JK configuration file, not socket_timeout. connection_pool_timeout corresponds to connectionTimeout on the Connector in JBoss, and it is usually long: our application sets it to 600 s. To debug the timeout I changed both to 60 s, so sleep(120) is guaranteed to time out.
I then tested with a client program written with HttpURLConnection that posts a parameter mark=xxxxxxxxxx, where the value is a random string. A JSP reads the mark parameter and prints it, so the value the client reads back should be the same as the one it sent.
Because there is a 1/3 chance of a timeout, I ran the client test in a loop of only 10 iterations. Whenever the timeout occurred, the mark that came back was not the one sent in the current request but the mark of the previous request, which fully confirmed the scenario.
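
A test client along these lines might look like the following. It is a minimal sketch under stated assumptions, not the original test program: the class name MarkEchoClient, the helper names, and the URL http://localhost/echo.jsp are hypothetical, and the JSP is assumed to print the mark parameter somewhere in its response body.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class MarkEchoClient {
    // Hypothetical address of the JSP that prints the "mark" parameter.
    static final String DEFAULT_TARGET = "http://localhost/echo.jsp";

    // A random 32-character hex string, unique per request.
    static String newMark() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    // The form-encoded POST body; the mark is plain hex, so no escaping is needed.
    static byte[] formBody(String mark) {
        return ("mark=" + mark).getBytes(StandardCharsets.US_ASCII);
    }

    // True when the echoed page does not contain the mark that was just sent.
    static boolean isCrossed(String sent, String echoed) {
        return !echoed.contains(sent);
    }

    // Posts the mark and returns the response body as text.
    static String postMark(String target, String mark) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(target).toURL().openConnection();
        try {
            byte[] body = formBody(mark);
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[1024];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return buf.toString("UTF-8").trim();
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : DEFAULT_TARGET;
        for (int i = 0; i < 10; i++) {
            String mark = newMark();
            try {
                String echoed = postMark(target, mark);
                System.out.println(mark + " -> "
                        + (isCrossed(mark, echoed) ? "CROSSED: " + echoed : "ok"));
            } catch (IOException e) {
                System.out.println(mark + " -> request failed: " + e);
            }
        }
    }
}
```

Using a fresh random mark for every iteration is what makes a stale value recognizable: any response that lacks the mark just sent must have been built from other data.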

This explains the conditions under which accounts get crossed:
1. The request is a POST.
2. The servlet actually processes the request, that is, it reads content from the request. For performance reasons, the content sent by the client is buffered at the TCP layer on the receiving side (Apache). If the servlet never calls request.getXXX, JBoss does not ask Apache for this data through JK, Apache simply discards it, and the bug is not triggered. I ran into this while debugging: when I switched the test from a JSP to a servlet, I did not read the parameter, so the bug did not trigger. It took me a long time to figure that out.
3. After the exception, the request uses part or all of the previous request's data, so the parameters themselves are logically wrong. The output is not necessarily wrong, however, because some content is not generated from request parameters.
4. Only when the previous request's data contains another account's information, and the output produced from it contains that account's information and is returned to the current user, does the user see a crossed account. So even if many readPostBody failures occur, they do not necessarily result in visible crossed accounts.

 

Solution:

This is easy. If you consider upgrading or downgrading to another version risky, simply add a return; at the end of the catch block shown above, then recompile and replace the original jar.

 

Note: This bug does not occur in jboss4.0.5. In the tomcat5.5 used by 4.0.5, the call to parameters.processParameters(formData, 0, len); is guarded by an added check that the length read equals the content length (readLen == contentLen). Jbossweb2.10 also fixed this bug.

Another situation that may cause mixed-up data involves the connection pool between JK and JBoss. In the default implementation, the connection is closed when an exception occurs. But if a handler that intercepts processing in the connector swallows the exception, the exception never reaches the AJP connection handler, so the connection is not closed and is put back into the connection pool. With non-blocking IO, the residual data in the SocketChannel is then read by the next request, which is unacceptable. So if you intercept the input and output streams, do not catch their exceptions, or rethrow them immediately after catching and handling them.
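
The "handle, then rethrow" rule for an intercepting stream could be sketched as follows. RethrowingInputStream and its failures counter are hypothetical names used only for illustration; the sketch relies on nothing beyond java.io.FilterInputStream and is not code from JBoss or from the application described here.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class RethrowingInputStream extends FilterInputStream {
    // Local bookkeeping standing in for whatever handling the wrapper needs.
    private int failures;

    RethrowingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        try {
            return super.read();
        } catch (IOException e) {
            failures++;
            throw e; // never swallow: the caller must see the failure
        }
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        try {
            return super.read(b, off, len);
        } catch (IOException e) {
            failures++;
            throw e; // rethrow so the connection can be closed, not pooled
        }
    }

    int failures() {
        return failures;
    }
}
```

Because the exception still propagates, whatever owns the connection keeps the chance to close it instead of returning it to the pool with unread data.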

A major trigger of this problem is the way IE submits a POST in two phases.
