Java network stream transmission: the garbled Chinese text problem

Source: Internet
Author: User

Recently I needed to fetch data from a web page, and it took a few twists and turns.

1. First, find the URL of the back-end data service that the page calls. I don't know JS, so I spent a lot of time analyzing the JS in the page source, trying to find the link that fetches the data.

Later I learned that the browser records every request the page sends. In Chrome, "F12 -> Network" lists all of them, and the link that reads the back-end data is among them.

2. Once the URL is found, the next step is to read the data.

I started by reading it with the HttpGet class from Apache HttpClient. The code is as follows:

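    // httpclient: an Apache HttpClient 4.x instance created elsewhere
    HttpGet httpget = new HttpGet(url);
    String body = null;
    try {
        HttpResponse response = httpclient.execute(httpget);
        HttpEntity entity = response.getEntity();
        // Decodes with the charset from Content-Type, else ISO-8859-1
        body = EntityUtils.toString(entity);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return body;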

However, the body was always partly garbled: characters such as "Kevin", which display correctly on the page, came out as garbage in the program. Inspecting the request and response headers in the browser showed that the returned character set was "gb2312", so I changed the code to

    body = EntityUtils.toString(entity, "gb2312");

But the text was still garbled.

So I looked up "Kevin" and the other garbled characters on a character-set lookup site (http://www.qqxiuzi.cn/bianma/zifuji.php) and found that GBK contains them, while GB2312 does not include these relatively rare characters. It appears that the character set declared by the page was not quite accurate.
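The GB2312/GBK difference can be checked locally with the JDK's own charsets. This minimal sketch assumes a standard JDK that ships the "GB2312" and "GBK" charsets, and uses U+570B (國, a traditional character that GB2312 lacks) as a hypothetical stand-in for the rare characters mentioned above; the class and method names are illustrative only.

```java
import java.nio.charset.Charset;

// Illustrative sketch: which characters can GB2312 and GBK represent?
class CharsetCoverage {
    static final Charset GB2312 = Charset.forName("GB2312");
    static final Charset GBK = Charset.forName("GBK");

    // True if every character of text can be encoded in the given charset.
    static boolean covers(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    // Encode text as GBK (what the server sends), then decode with decodeAs.
    static String gbkBytesDecodedAs(Charset decodeAs, String text) {
        byte[] bytes = text.getBytes(GBK);
        return new String(bytes, decodeAs);
    }

    public static void main(String[] args) {
        String common = "\u56FD"; // U+56FD: simplified form, present in both charsets
        String rare = "\u570B";   // U+570B: traditional form, in GBK but not GB2312
        System.out.println(covers(GB2312, common)); // true
        System.out.println(covers(GB2312, rare));   // false
        System.out.println(covers(GBK, rare));      // true
        // Decoding GBK bytes as GB2312 cannot reproduce the rare character.
        System.out.println(gbkBytesDecodedAs(GB2312, rare).equals(rare)); // false
        System.out.println(gbkBytesDecodedAs(GBK, rare).equals(rare));    // true
    }
}
```

Since GBK is a superset of GB2312, decoding with GBK is also safe for text that only uses GB2312 characters.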

So I changed the code to

    body = EntityUtils.toString(entity, "GBK");

The problem remained...

Next I tried adding "CHARSET=GBK" to the HTTP request header. Nothing changed; the server apparently doesn't support it...
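Why switching charsets after the text has already been decoded cannot help can be shown offline. This minimal sketch simulates a server that sends GBK bytes and a client that decodes them with a wrong charset and then tries to repair the string by re-encoding it; the class and method names are hypothetical, and U+570B (國) is an illustrative character that GB2312 lacks. ISO-8859-1, which HttpClient 4.x's EntityUtils falls back to when no charset is declared, maps every byte to exactly one char and back, so the bytes survive; GB2312 substitutes replacement characters for bytes it cannot decode, so the information is lost.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: can a wrongly decoded string be repaired afterwards?
class RedecodeDemo {
    static final Charset GB2312 = Charset.forName("GB2312");
    static final Charset GBK = Charset.forName("GBK");

    // Server sends GBK bytes; the client decodes them with `wrong`, then tries
    // to get the bytes back by re-encoding with `wrong` and decoding as GBK.
    static String decodeWrongThenRepair(String original, Charset wrong) {
        byte[] sent = original.getBytes(GBK);
        String garbled = new String(sent, wrong);   // what the client first sees
        byte[] recovered = garbled.getBytes(wrong); // attempted byte recovery
        return new String(recovered, GBK);
    }

    public static void main(String[] args) {
        String text = "\u570B"; // U+570B: in GBK, not in GB2312
        // ISO-8859-1 is a one-to-one byte/char mapping, so the repair works.
        System.out.println(decodeWrongThenRepair(text, StandardCharsets.ISO_8859_1).equals(text)); // true
        // GB2312 already replaced the undecodable bytes, so the repair fails.
        System.out.println(decodeWrongThenRepair(text, GB2312).equals(text)); // false
    }
}
```

The practical consequence is the one drawn in the next step: keep the raw bytes and decode them only once, with the right charset.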

3. Read the raw byte stream from the page instead

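The problem persisted after the change in the previous step, so I guessed that EntityUtils had already done the conversion internally. (In HttpClient 4.x, the charset argument of EntityUtils.toString is only a default that applies when the response's Content-Type declares no charset, so with a server that declares gb2312, passing "GBK" changes nothing.) I didn't know how to dig any further, so I decided to go back to the source and receive the raw bytes myself.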

The code is as follows:

    URL quest = new URL(url);
    HttpURLConnection connection = (HttpURLConnection) quest.openConnection();
    InputStream is = connection.getInputStream();
    byte[] temp = new byte[102400];
    int llen = -1;
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    // Collect all raw bytes first; nothing is decoded inside the loop
    while ((llen = is.read(temp, 0, temp.length)) != -1) {
        outStream.write(temp, 0, llen);
    }
    is.close();
    // Decode the complete byte array once, with the known charset
    content = new String(outStream.toByteArray(), "GBK");
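For comparison, here is a minimal sketch of the same byte-reading approach using try-with-resources and a Charset object instead of a charset-name string. The class and method names (ByteFetch, readAll, fetch) are hypothetical, and the charset still has to be known in advance. Because readAll only needs an InputStream, it can be tried offline, as main does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;

// Hypothetical helper: read everything as bytes, decode once at the end.
class ByteFetch {
    // Read the stream to the end, then decode the whole buffer in one step.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    // Open the URL with HttpURLConnection and decode the body with `charset`.
    static String fetch(String url, Charset charset) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, charset);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK");
        // Offline check: U+4E2D U+6587 encoded as GBK, as a GBK server would send them.
        byte[] body = "\u4E2D\u6587".getBytes(gbk);
        String text = readAll(new ByteArrayInputStream(body), gbk);
        System.out.println(text.equals("\u4E2D\u6587")); // true
    }
}
```

Against a live page, ByteFetch.fetch(url, Charset.forName("GBK")) would play the role of the original snippet.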

The bytes are accumulated in a ByteArrayOutputStream and decoded only once at the end, so that a multi-byte character split across two reads is not cut in half and mis-decoded when converting to the target character set.
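The truncation risk can be made concrete by decoding a GBK buffer in two pieces instead of one. In this sketch the text is "中文" (U+4E2D U+6587, two bytes each in GBK), and the split points are artificial assumptions chosen to mimic where a read() call might stop; ChunkSplitDemo and decodePerChunk are hypothetical names.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Illustrative sketch: decoding per chunk vs. decoding the whole buffer.
class ChunkSplitDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Decode data[0, splitAt) and data[splitAt, end) separately, then join.
    static String decodePerChunk(byte[] data, int splitAt) {
        String first = new String(Arrays.copyOfRange(data, 0, splitAt), GBK);
        String second = new String(Arrays.copyOfRange(data, splitAt, data.length), GBK);
        return first + second;
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two characters, four GBK bytes
        byte[] data = text.getBytes(GBK);
        System.out.println(new String(data, GBK).equals(text));   // true: one decode
        System.out.println(decodePerChunk(data, 1).equals(text)); // false: split mid-character
        System.out.println(decodePerChunk(data, 2).equals(text)); // true: split on a boundary
    }
}
```

Only splits that fall inside a character cause damage, which is why decoding per read() can appear to work most of the time and still garble text occasionally.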

Reading raw bytes like this only works if you know the character encoding of the data in advance.
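If the encoding is not known in advance, one option is to take it from the response's Content-Type header. The catch in this case was that the header said gb2312 while the text needed GBK. Browsers sidestep this because the WHATWG Encoding Standard treats the "gb2312" label as GBK, which is likely why the page displayed correctly in Chrome. The sketch below applies the same rule; the helper name resolveCharset and the choice of fallback are illustrative assumptions, not part of the original code.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: pick a decoding charset from a Content-Type value.
class CharsetResolver {
    static final Charset GBK = Charset.forName("GBK");

    static Charset resolveCharset(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (!p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                continue;
            }
            String name = p.substring("charset=".length()).replace("\"", "").trim();
            // Treat the gb2312 label as GBK (a superset), as browsers do.
            if (name.equalsIgnoreCase("gb2312")) {
                return GBK;
            }
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name in the header: ignore it, use the fallback.
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("text/html; charset=gb2312", GBK)); // GBK
        System.out.println(resolveCharset("text/html; charset=UTF-8", GBK));  // UTF-8
        System.out.println(resolveCharset("text/html", GBK));                 // GBK
    }
}
```

With HttpURLConnection, the header value would come from connection.getContentType(), and the resulting Charset could be passed to new String(bytes, charset).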

Looking back from the result, the approach is simple, but finding the right direction at each step took a lot of effort, and debugging was tedious work.

In conclusion, the most effective approach seems to be to receive the raw byte stream and decode it with the correct character set.
