HttpClient: setting Accept-Encoding to gzip,deflate makes a website return garbled results

Source: Internet
Author: User

Recently I've been obsessed with using HttpClient to simulate logging in to various websites. I look at the request headers in the browser's developer tools and then copy them into the HttpClient request. Among the request headers is this one:

Accept-Encoding: gzip,deflate

When simulating other sites before, I never paid much attention to this header: whether or not I added it to the HttpClient request made no difference to the data the site returned, and it did not trip the site's security checks. I first ran into the problem when simulating a login to 58.com. As soon as I added the header above, the returned page was garbled, and not ordinary mojibake but large areas of bold blocks. Experience told me this was not a simple character-encoding mistake, because at least the English text should have come through intact. A plain GBK/UTF-8 mismatch would not produce a whole page of bold blocks either.

Then I thought of Accept-Encoding. After searching on Baidu, I learned that this header tells the server which compression formats the client can accept (here gzip and deflate), so the server is free to send the response gzip-compressed. That explains the large areas of bold blocks: the data is compressed, so no character decoding can turn it back into readable text.
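To see why even the English parts turn into blocks, here is a small sketch that uses only the JDK's java.util.zip, not HttpClient. It compresses a made-up HTML string the way a server does for Content-Encoding: gzip, then shows that decoding the compressed bytes as UTF-8 yields garbage while decompressing first restores the text. The class name and sample page are hypothetical, and readAllBytes() assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipGarbleDemo {

    // Compress text the way a server does when it answers with Content-Encoding: gzip
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    // Undo the compression first, and only then decode the bytes as text
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>Hello, login page</body></html>";
        byte[] body = gzip(page);
        // Wrong: decoding compressed bytes directly produces unreadable output,
        // including for the plain ASCII parts of the page
        System.out.println(new String(body, StandardCharsets.UTF_8));
        // Right: decompress, then decode
        System.out.println(gunzip(body));
    }
}
```

No charset choice fixes the first line, because the bytes are no longer text at all; that is what separates this from a GBK/UTF-8 mix-up.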

This blog post explains the specific meaning of gzip and deflate: http://blog.csdn.net/zhangxinrun/article/details/5711307

In case the link goes dead, I'll excerpt a paragraph:

Gzip is a data format that, by default, uses only the DEFLATE algorithm to compress its data section. Deflate is a compression algorithm, an enhancement of Huffman coding. The decompression code for deflate and gzip is almost identical and can be merged into one piece of code. The only difference: deflate initializes with inflateInit(), while gzip initializes with inflateInit2(), which takes one more parameter than inflateInit(): -MAX_WBITS, meaning raw deflate data is being processed. This is because the compressed block inside gzip data has no two-byte zlib header, so when using inflateInit2 the zlib library must be told to ignore the zlib header. The zlib manual requires windowBits to be in the range 8..15, but values outside that range have special meanings, described in the comments in zlib.h; for example, a negative value means raw deflate. Apache's deflate variant may also lack the zlib header, in which case a dummy header has to be added before processing; this is the same as Microsoft's incorrect deflate (raw deflate). The first byte of a zlib header is generally 0x78, and the first two bytes read together as a 16-bit number (first byte * 256 + second byte) must be divisible by 31; see RFC 1950. For example, Firefox's dummy zlib header is 0x7801, and the output of Python's zlib.compress() starts with 0x789c. Deflate is the most basic algorithm; gzip adds a 10-byte header in front of the raw deflate data and an 8-byte trailer at the end, consisting of a CRC32 checksum and the length of the uncompressed data (the zlib format, by contrast, ends with an Adler-32 checksum).
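The format differences in the excerpt can be checked from Java. The sketch below is an assumption-laden illustration, not code from the quoted post: it uses java.util.zip, where the nowrap flag on Deflater and Inflater roughly plays the role of zlib's negative windowBits (raw deflate, no zlib header). It prints the zlib and gzip header bytes, applies the RFC 1950 divisible-by-31 check, and decodes an HTTP "deflate" body by picking zlib or raw mode from that header check. The class and method names are hypothetical, and the header sniffing is a heuristic.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

public class DeflateFormats {

    // nowrap=false: zlib format (2-byte header, deflate data, Adler-32 trailer)
    // nowrap=true:  raw deflate with no header or trailer
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!d.finished()) {
            int n = d.deflate(chunk);
            out.write(chunk, 0, n);
        }
        d.end();
        return out.toByteArray();
    }

    // gzip format: 10-byte header, deflate data, CRC32 + length trailer
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(input);
        }
        return buf.toByteArray();
    }

    // RFC 1950: CM must be 8 (deflate), CINFO <= 7, and CMF*256 + FLG a multiple of 31
    static boolean looksLikeZlibHeader(byte[] b) {
        if (b.length < 2) return false;
        int cmf = b[0] & 0xff;
        int flg = b[1] & 0xff;
        return (cmf & 0x0f) == 8 && (cmf >> 4) <= 7 && (cmf * 256 + flg) % 31 == 0;
    }

    // Inflater(true) expects raw deflate with no zlib header
    static byte[] inflate(byte[] input, boolean nowrap) throws DataFormatException {
        Inflater inf = new Inflater(nowrap);
        // The Inflater javadoc asks for an extra "dummy" input byte when nowrap is used
        inf.setInput(nowrap ? Arrays.copyOf(input, input.length + 1) : input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(chunk);
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(chunk, 0, n);
            }
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }

    // HTTP "deflate" should be zlib-wrapped, but some servers send raw deflate:
    // pick the mode from the header check (a heuristic, not a guarantee)
    static byte[] inflateHttpDeflate(byte[] body) throws DataFormatException {
        return inflate(body, !looksLikeZlibHeader(body));
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] zlib = deflate(data, false);
        byte[] raw = deflate(data, true);
        byte[] gz = gzip(data);
        System.out.printf("zlib starts with %02x %02x%n", zlib[0] & 0xff, zlib[1] & 0xff); // 78 9c
        System.out.printf("gzip starts with %02x %02x%n", gz[0] & 0xff, gz[1] & 0xff);     // 1f 8b
        System.out.println("0x7801 is a valid zlib header: " + looksLikeZlibHeader(new byte[] {0x78, 0x01}));
        System.out.println(new String(inflateHttpDeflate(zlib), StandardCharsets.UTF_8));
        System.out.println(new String(inflateHttpDeflate(raw), StandardCharsets.UTF_8));
    }
}
```

Checking the header and choosing the mode is one way to handle the headerless variant; the excerpt's alternative of prepending a dummy header such as 0x7801 would also work, but it needs a matching Adler-32 trailer to pass zlib's checksum check, which is why raw mode is used here instead.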

The problem now seemed clear, but one doubt remained: on the other sites whose logins I had simulated, I had always set this request header, yet the data they returned was not compressed. That seemed contradictory. After gathering more information, I found the answer. Browsers decompress responses automatically, which is why they send this header, and not every server honors it. In other words, even if the request asks for compression, the server will not necessarily compress the response. That cleared up the question.
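Because the server may or may not compress, code should not assume either case. The Content-Encoding response header is the authoritative signal; as an extra defensive fallback, one can also sniff the gzip magic bytes 0x1f 0x8b (RFC 1952) before decompressing. The sketch below shows that fallback with the JDK only; the class and method names are hypothetical, and sniffing is a heuristic that assumes ordinary text bodies never begin with those two bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSniffer {

    // Every gzip member starts with the magic bytes 0x1f 0x8b
    static boolean hasGzipMagic(byte[] body) {
        return body.length >= 2 && (body[0] & 0xff) == 0x1f && (body[1] & 0xff) == 0x8b;
    }

    // Decompress only if the body really is gzip; otherwise return it unchanged
    static byte[] maybeGunzip(byte[] body) throws IOException {
        if (!hasGzipMagic(body)) {
            return body;
        }
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "<html>not compressed</html>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(plain);
        }
        // Both calls print the same HTML, whether or not the server compressed it
        System.out.println(new String(maybeGunzip(plain), StandardCharsets.UTF_8));
        System.out.println(new String(maybeGunzip(buf.toByteArray()), StandardCharsets.UTF_8));
    }
}
```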

The rest is easy. Now that we know the cause, it is simple to handle on the HttpClient side: if the server supports gzip and our request includes the gzip Accept-Encoding header, the response will carry this header:

Content-Encoding: gzip

When we receive the response, we just check whether its headers contain this value and handle the body accordingly. Sample code:

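```java
// Apache HttpClient 4.x (GzipDecompressingEntity is available since 4.1)
import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.entity.GzipDecompressingEntity;
import org.apache.http.util.EntityUtils;

// client: the HttpClient; post: the HttpPost carrying the copied request headers
HttpResponse rep = client.execute(post);
Header[] headers = rep.getHeaders("Content-Encoding");
boolean isGzip = false;
for (Header h : headers) {
    if ("gzip".equalsIgnoreCase(h.getValue())) {
        // The response headers contain gzip
        isGzip = true;
    }
}
String responseString = null;
if (isGzip) {
    // Gzip decompression is required
    responseString = EntityUtils.toString(new GzipDecompressingEntity(rep.getEntity()));
} else {
    responseString = EntityUtils.toString(rep.getEntity());
}
```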
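For readers not using Apache HttpClient, the same Content-Encoding check can be written against the JDK alone. This is a sketch under stated assumptions: decodeBody and its parameters are hypothetical names, the body bytes and header value are assumed to come from whatever HTTP client you use, "deflate" is assumed to be the zlib-wrapped form, only a single coding is handled, and readAllBytes() needs Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class ContentDecoder {

    // contentEncoding is the raw Content-Encoding header value, or null when the header is absent
    static String decodeBody(byte[] body, String contentEncoding, Charset charset) throws IOException {
        String enc = contentEncoding == null ? "" : contentEncoding.trim().toLowerCase(Locale.ROOT);
        InputStream in = new ByteArrayInputStream(body);
        switch (enc) {
            case "gzip":
            case "x-gzip":
                // Same job as GzipDecompressingEntity in the HttpClient snippet
                in = new GZIPInputStream(in);
                break;
            case "deflate":
                // Assumes the zlib-wrapped form; raw deflate would need new Inflater(true)
                in = new InflaterInputStream(in);
                break;
            case "":
            case "identity":
                // Not compressed: the server ignored Accept-Encoding or chose not to compress
                break;
            default:
                // Stacked codings such as "gzip, br" are not handled in this sketch
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
        try (InputStream decoded = in) {
            return new String(decoded.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body>login ok</body></html>";
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(page.getBytes(StandardCharsets.UTF_8));
        }
        // Both print the original HTML
        System.out.println(decodeBody(buf.toByteArray(), "gzip", StandardCharsets.UTF_8));
        System.out.println(decodeBody(page.getBytes(StandardCharsets.UTF_8), null, StandardCharsets.UTF_8));
    }
}
```

Decompression happens before charset decoding, which is the order the garbled-page problem requires; the charset itself still has to come from the Content-Type header or the page.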

With that, the maddening bold-block garbling problem was solved.

 

 
