Fixing garbled Chinese request parameters when simulating browser requests with HttpClient

Source: Internet
Author: User

When reprinting, please credit the source: http://blog.csdn.net/xiaojimanman/article/details/44407297

http://www.llwjy.com/blogdetail/9383e88e4bc7378b8318e15b0ac33559.html

My personal blog is now online at www.llwjy.com. Comments and criticism are welcome ~

--------------------------------------------------------------------------------------------------------------------------

Earlier posts covered how to use HttpClient to simulate browser requests in order to fetch a page's source and extract content from it. While testing a project recently, I ran into the following problems:


Problem description

1. When HttpClient simulates a POST/GET request, Chinese characters in the request parameters arrive at the server as "?". English letters, digits, and similar characters are parsed correctly.

2. When the simulated URL contains Chinese characters, the server parses them as "?", for example: http://hostname/test.do?name=你好 (the Chinese for "Hello")


Cause of the problem

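After going through a lot of related material, I traced the cause to an encoding mismatch. Most browsers today encode Chinese characters as UTF-8, while HttpClient does not use UTF-8 for request content by default. I took the default to be GBK, although Commons HttpClient 3.x (the API used in this post) documents ISO-8859-1 as its default content charset, and ISO-8859-1 cannot represent Chinese characters at all, which matches the "?" symptom. Either way, the simulated request is encoded differently from a real browser request, and the Chinese characters come out garbled.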
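The "?" symptom is what Java produces when text is encoded with a charset that has no mapping for its characters. The following is a minimal sketch using only the JDK, not HttpClient itself; the class and method names are made up for illustration, and ISO-8859-1 stands in for any charset that cannot represent Chinese.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GarbledDemo {
    // Encode the text to bytes with the given charset, then decode
    // those bytes again with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "\u4f60\u597d"; // the two Chinese characters for "hello"

        // ISO-8859-1 has no mapping for these characters, so each one
        // is replaced by the byte for '?' during encoding.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // prints "??"

        // UTF-8 can represent them, so the round trip is lossless.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints "true"
    }
}
```

Once the characters have been replaced by "?" on the client side, no server-side decoding can recover them, which is why the fix has to be made in the request.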


Solution

The problem can be solved in two steps:

Step one: specify the encoding for the HttpClient request. It can be set as follows:

    // "method" is the GetMethod/PostMethod being sent; "httpClient" is the HttpClient instance.
    method.getParams().setContentCharset("utf-8");
    method.getParams().setParameter(HttpMethodParams.HTTP_CONTENT_CHARSET, "utf-8");
    method.addRequestHeader("Content-Type", "text/html; charset=utf-8");
    httpClient.getParams().setContentCharset("utf-8");
After step one, Chinese characters in POST parameters are handled correctly. Chinese characters in the URL itself are still garbled, which is what step two addresses.
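To see why the content charset matters for form parameters, the sketch below percent-encodes one parameter with two different charsets. It uses java.net.URLEncoder from the JDK as a stand-in for the form encoding an HTTP client performs; it is not HttpClient's own code, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FormEncodingSketch {
    // Build one "name=value" pair of a form-encoded body using the given charset.
    static String formField(String name, String value, String charset) {
        try {
            return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "\u4f60\u597d"; // the two Chinese characters for "hello"

        // ISO-8859-1 cannot represent the value, so "?" (0x3F) is what gets sent.
        System.out.println(formField("name", value, "ISO-8859-1")); // prints "name=%3F%3F"

        // With UTF-8 the real bytes of each character are sent.
        System.out.println(formField("name", value, "UTF-8")); // prints "name=%E4%BD%A0%E5%A5%BD"
    }
}
```

The first output is the kind of body a server would decode to "??", while the second is what a UTF-8 browser would send for the same input.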

Step two: percent-encode the Chinese characters in the URL. Preprocessing the URL with the following method mimics what a browser does, so the characters no longer come out garbled.

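    // Requires: import java.io.UnsupportedEncodingException; import java.net.URLEncoder;
    // getFirstString is a helper that is not shown in this post; as used here, it
    // returns the first match of the given regex group, or "" when nothing matches.
    public static String encodeUrlCh(String url) throws UnsupportedEncodingException {
        String chRegex = "([\u4e00-\u9fa5]+)";
        while (true) {
            // Find the next run of Chinese characters still left in the URL.
            String s = getFirstString(url, chRegex, 1);
            if ("".equals(s)) {
                return url;
            }
            // Replace that run with its UTF-8 percent-encoded form.
            url = url.replaceAll(s, URLEncoder.encode(s, "utf-8"));
        }
    }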
With these two steps in place, the garbled-character problem in simulated browser requests is resolved.
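Since the getFirstString helper is not shown here, the following is a self-contained sketch of the same URL preprocessing idea using java.util.regex directly. It is an alternative written for illustration, not the author's code: the class and method names are hypothetical, and it encodes all matches in a single pass instead of looping with replaceAll.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlChineseEncoder {
    // Same character range as the regex in the original method.
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]+");

    // Percent-encode every run of Chinese characters as UTF-8 and
    // leave the rest of the URL untouched.
    static String encodeChinese(String url) {
        Matcher m = CHINESE.matcher(url);
        StringBuffer out = new StringBuffer();
        try {
            while (m.find()) {
                String encoded = URLEncoder.encode(m.group(), "UTF-8");
                // quoteReplacement keeps the encoded text from being read as a group reference.
                m.appendReplacement(out, Matcher.quoteReplacement(encoded));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://hostname/test.do?name=\u4f60\u597d";
        System.out.println(encodeChinese(url));
        // prints "http://hostname/test.do?name=%E4%BD%A0%E5%A5%BD"
    }
}
```

Only the Chinese runs are encoded, so separators such as "?", "=", and "&" in the URL keep their meaning. A URL without Chinese characters is returned unchanged.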

