When reprinting, please cite the source: http://blog.csdn.net/xiaojimanman/article/details/44407297
http://www.llwjy.com/blogdetail/9383e88e4bc7378b8318e15b0ac33559.html
My personal blog is now online at www.llwjy.com. Comments and criticism are welcome ~
--------------------------------------------------------------------------------------------------------------------------
The previous post described how to use HttpClient to simulate browser requests, fetch a page's source, and extract the content you need from it. Recently, while testing a project, I ran into the following problems:
Problem description
1. When HttpClient simulates a POST/GET request, Chinese characters in the parameters arrive at the server as "?", while English letters, digits, and so on are parsed correctly.
2. When the simulated URL itself contains Chinese characters, the parsed value also shows "?", for example: http://hostname/test.do?name=你好
Cause of the problem
After going through a good deal of related material, I concluded that the cause is an encoding mismatch. Most browsers today encode Chinese as UTF-8. HttpClient, however, does not default to UTF-8 (Commons HttpClient 3.x, whose API the code in this post uses, falls back to ISO-8859-1 for request content), so Chinese characters get garbled when a browser request is simulated without setting the encoding explicitly.
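A minimal sketch of how such a mismatch produces "?" on the server side. It uses only the JDK, not HttpClient, and the class and method names are hypothetical. ISO-8859-1 is assumed as the client-side charset because it cannot represent Chinese characters and is always available in the JDK; other mismatched charsets garble the text in their own ways.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    // Encode text with the client's charset, then decode the bytes with the
    // server's charset, mimicking two sides that disagree on the encoding.
    static String sendAndReceive(String text, Charset clientCharset, Charset serverCharset) {
        byte[] wire = text.getBytes(clientCharset); // unmappable chars become '?'
        return new String(wire, serverCharset);
    }

    public static void main(String[] args) {
        String chinese = "\u4f60\u597d"; // two Chinese characters (U+4F60, U+597D)

        // Client encodes as ISO-8859-1: the Chinese is already lost on the wire.
        System.out.println(sendAndReceive(chinese,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // prints ??

        // ASCII letters and digits survive, matching the symptom described above.
        System.out.println(sendAndReceive("abc123",
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8)); // prints abc123

        // Both sides on UTF-8: the text round-trips intact.
        System.out.println(chinese.equals(sendAndReceive(chinese,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8))); // prints true
    }
}
```

The point of the sketch is that the damage happens at encoding time on the client, which is why the fix has to be applied to the request rather than on the server.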
Solution
The problem can be solved in two steps:
Step 1: Specify the encoding of the HttpClient request. Any of the following ways can be used:
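    // Option 1: set the content charset on the request method's parameters
    method.getParams().setContentCharset("utf-8");
    // Option 2: the same setting through the generic parameter API
    method.getParams().setParameter(HttpMethodParams.HTTP_CONTENT_CHARSET, "utf-8");
    // Option 3: declare the charset in the Content-Type request header
    method.addRequestHeader("Content-Type", "text/html; charset=utf-8");
    // Option 4: set the content charset on the client itself
    httpClient.getParams().setContentCharset("utf-8");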
After this first step, Chinese characters in POST parameters are handled correctly. Chinese characters inside the URL are still garbled, which is what the second step addresses.
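To illustrate what the charset setting changes, here is a rough JDK-only sketch (HttpClient itself is not used, and the class and method names are hypothetical) that builds one form-encoded parameter the way an HTTP client would put it in a POST body. ISO-8859-1 stands in for a non-UTF-8 default; that choice is an assumption made for the demonstration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormBodyDemo {
    // Build a single "name=value" pair as it would appear in an
    // application/x-www-form-urlencoded request body.
    static String formPair(String name, String value, String charset) {
        try {
            return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String chinese = "\u4f60\u597d"; // two Chinese characters (U+4F60, U+597D)

        // With UTF-8 the characters are sent as their UTF-8 bytes, percent-encoded.
        System.out.println(formPair("name", chinese, "UTF-8"));      // prints name=%E4%BD%A0%E5%A5%BD

        // With ISO-8859-1 each character is replaced by '?' (0x3F) before sending.
        System.out.println(formPair("name", chinese, "ISO-8859-1")); // prints name=%3F%3F
    }
}
```

A server that decodes parameters as UTF-8 recovers the original text from the first body, while the second body contains nothing but question marks.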
Step 2: Percent-encode the Chinese characters in the URL. Preprocessing the URL with the following method mimics what a browser does, so the characters are no longer garbled.
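    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public static String encodeUrlCh(String url) throws UnsupportedEncodingException {
        // Matches a run of Chinese characters (U+4E00 to U+9FA5)
        String chRegex = "([\u4e00-\u9fa5]+)";
        while (true) {
            // getFirstString is a helper not shown in this post; it is expected to
            // return the first match of group 1, or "" when nothing matches
            String s = getFirstString(url, chRegex, 1);
            if ("".equals(s)) {
                return url; // no Chinese left to encode
            }
            // Replace every occurrence of the run with its UTF-8 percent-encoding
            url = url.replaceAll(s, URLEncoder.encode(s, "utf-8"));
        }
    }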
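The method above depends on a getFirstString helper that is not shown in this post. As a self-contained alternative, the same idea can be sketched with java.util.regex directly. The class and method names here are hypothetical, and the character range is the one used in the original regex.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlChineseEncoder {
    // Runs of characters in the same range as the original regex (U+4E00 to U+9FA5)
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");

    // Percent-encode only the Chinese runs in the URL, leaving the rest untouched.
    static String encodeChinese(String url) {
        Matcher m = CHINESE.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String encoded;
            try {
                encoded = URLEncoder.encode(m.group(), "UTF-8");
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is required on every JVM, so this should not happen
                throw new IllegalStateException(e);
            }
            // quoteReplacement keeps the encoded text from being read as a
            // regex replacement pattern
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints http://hostname/test.do?name=%E4%BD%A0%E5%A5%BD
        System.out.println(encodeChinese("http://hostname/test.do?name=\u4f60\u597d"));
    }
}
```

This version makes a single pass over the URL instead of looping until no match remains, and it leaves scheme, host, and separators such as "?" and "=" as they are.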
Together, these two steps resolve the garbled-character problem when simulating browser requests.