When HttpClient requests a website with Accept-Encoding set to gzip,deflate, the response comes back garbled
Recently I have been hooked on using HttpClient to simulate logins to various websites. I looked at the request headers in the browser's developer tools and copied them one by one into my HttpClient requests. Among them was this setting:
Accept-Encoding: gzip,deflate
When simulating other websites I never paid much attention to this header. Whether or not I included it, the site's data came back fine and its security checks were unaffected. I only hit trouble when I simulated a login to 58.com. With the header above added, the returned page was garbled: large areas of bold, box-shaped characters. Experience told me this was not an ordinary charset mistake. With a simple GBK versus UTF-8 mix-up, at least the English text would still be readable, and you would not get whole areas of boxes.
Then I thought of Accept-Encoding. A Baidu search told me that this header tells the server which content encodings the client accepts, that is, whether the server may gzip-compress the data it returns. That explains the large areas of boxes. The body is compressed binary data, so no charset can decode it as normal text.
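To see why compressed data shows up as boxes, here is a small sketch that uses only the JDK's java.util.zip (not HttpClient). It gzips a snippet of HTML and prints the bytes as if they were text. It also checks for the gzip magic bytes 0x1f 0x8b defined in RFC 1952. The class and method names (GzipSniff, looksGzipped, gzip) are hypothetical, made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of HttpClient: sniffs gzip data by its magic bytes.
final class GzipSniff {

    // Every gzip stream begins with the magic bytes 0x1f 0x8b (RFC 1952).
    static boolean looksGzipped(byte[] body) {
        return body != null && body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
    }

    // Compresses text roughly the way a server does before replying with Content-Encoding: gzip.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Login</title></head></html>";
        byte[] body = gzip(page);
        // Decoding compressed bytes as UTF-8 text yields replacement characters and
        // control codes; the original HTML is no longer readable.
        System.out.println(new String(body, StandardCharsets.UTF_8));
        System.out.println("gzip magic present: " + looksGzipped(body));
        System.out.println("plain text flagged: " + looksGzipped(page.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Sniffing magic bytes is only a fallback heuristic. The Content-Encoding response header is the authoritative signal.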
http://blog.csdn.net/zhangxinrun/article/details/5711307 is a blog post explaining what gzip and deflate mean.
In case the link goes dead, here is an excerpt:
Gzip is a data format whose data section is, by default, compressed with the deflate algorithm alone. Deflate is a compression algorithm that combines LZ77 with Huffman coding. The code for decompressing deflate and gzip is almost the same, so one piece of code can handle both. The only difference is initialization: deflate uses inflateInit(), while gzip uses inflateInit2(). inflateInit2() takes one more parameter than inflateInit(), windowBits, and passing -MAX_WBITS means "process raw deflate data". This is needed because the deflate data inside gzip does not carry the two-byte zlib header, so inflateInit2 is used to tell zlib not to expect one. The caller then has to skip the gzip header itself; newer zlib versions can also parse the gzip wrapper directly when windowBits is 16 + MAX_WBITS. The zlib manual requires windowBits to be in 8..15, but values outside that range have special effects; see the comments in zlib.h. A negative value, for example, means raw deflate.

Apache's deflate variant may also lack the zlib header, in which case you need to prepend a fake header before processing it. This is the same problem as Microsoft's non-standard deflate, which sends raw deflate. The first byte of a zlib header is usually 0x78, and the first two bytes read together as a 16-bit number must be divisible by 31; see RFC 1950 for details. For example, the fake zlib header Firefox uses is 0x7801, and the output of Python's zlib.compress() starts with 0x789c.

Deflate is the most basic layer. gzip puts a 10-byte header in front of the raw deflate data (longer if optional fields are present) and appends an 8-byte trailer: a CRC-32 of the uncompressed data followed by its length. The zlib format, by contrast, wraps raw deflate in a 2-byte header and a 4-byte Adler-32 trailer.
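The format differences in the excerpt can be checked with a rough Java sketch. It uses the JDK's java.util.zip classes, which wrap zlib, rather than the C functions named above. In the JDK, Inflater's nowrap=true flag corresponds to a negative windowBits. DeflateFormats and its methods are hypothetical names. The raw-deflate fallback in inflate() is a heuristic for servers that omit the zlib header, not a guaranteed detection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical helpers illustrating zlib, raw deflate and gzip with java.util.zip.
final class DeflateFormats {

    // RFC 1950: CM (low nibble of CMF) is 8 for deflate, and CMF*256 + FLG is a multiple of 31.
    static boolean hasZlibHeader(byte[] data) {
        if (data == null || data.length < 2) return false;
        int cmf = data[0] & 0xff;
        int flg = data[1] & 0xff;
        return (cmf & 0x0f) == 8 && ((cmf << 8) | flg) % 31 == 0;
    }

    // raw=false: zlib wrapper (2-byte header + Adler-32); raw=true: bare deflate data.
    static byte[] deflate(byte[] input, boolean raw) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, raw);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates an HTTP "deflate" body: zlib-wrapped if the header checks out,
    // otherwise treated as headerless raw deflate. Heuristic only.
    static byte[] inflate(byte[] body) {
        boolean raw = !hasZlibHeader(body);
        Inflater inflater = new Inflater(raw); // nowrap=true: raw deflate, like -MAX_WBITS
        try {
            // The Inflater javadoc asks for one extra dummy input byte in nowrap mode.
            inflater.setInput(raw ? Arrays.copyOf(body, body.length + 1) : body);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // gzip = 10-byte header + raw deflate + 8-byte trailer: CRC-32, then ISIZE, both little-endian.
    static boolean gzipTrailerMatches(byte[] gz, byte[] original) {
        if (gz.length < 18) return false;
        CRC32 crc = new CRC32();
        crc.update(original);
        return readLE32(gz, gz.length - 8) == crc.getValue()
                && readLE32(gz, gz.length - 4) == (original.length & 0xffffffffL);
    }

    private static long readLE32(byte[] b, int off) {
        return (b[off] & 0xffL) | (b[off + 1] & 0xffL) << 8
                | (b[off + 2] & 0xffL) << 16 | (b[off + 3] & 0xffL) << 24;
    }
}
```

With the default compression level, deflate(input, false) starts with 0x78 0x9c, the same header the excerpt mentions for Python's zlib.compress().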
That seemed to settle it, but one doubt remained. I had always sent this header when simulating logins, yet the data that came back was not compressed, which looked contradictory. After more reading, it made sense. Browsers decompress responses automatically, which is why they add this header. But the header only tells the server that the client can accept compressed data. The server may not support it, and even when the request advertises compression, the server is free to return the data uncompressed. That cleared up every question.
Now let's fix it. Knowing the cause, we can handle it when receiving the response with HttpClient. If the server supports gzip compression and our request headers advertise it, the response headers include:
Content-Encoding: gzip
So when reading the response, we only need to check whether its headers contain this value and handle the body accordingly. Example code:
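import org.apache.http.Header;
import org.apache.http.HttpResponse;
import org.apache.http.client.entity.GzipDecompressingEntity;
import org.apache.http.util.EntityUtils;

// client is an HttpClient and post an HttpPost built earlier (Apache HttpClient 4.x)
HttpResponse rep = client.execute(post);
Header[] headers = rep.getHeaders("Content-Encoding");
boolean isGzip = false;
for (Header h : headers) {
    if ("gzip".equalsIgnoreCase(h.getValue().trim())) {
        // The response header says the body is gzip-compressed
        isGzip = true;
    }
}
String responseString;
if (isGzip) {
    // Perform gzip decompression before converting the body to a string
    responseString = EntityUtils.toString(new GzipDecompressingEntity(rep.getEntity()));
} else {
    responseString = EntityUtils.toString(rep.getEntity());
}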
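If you would rather not depend on GzipDecompressingEntity, or you already hold the body as raw bytes, the same decision can be sketched with the JDK alone. The helper below is hypothetical; ResponseDecoder, decodeBody and compress are illustrative names. It picks the decoder from the Content-Encoding value. A missing or "identity" value is treated as an uncompressed body, since the server may ignore our Accept-Encoding. The sketch assumes the caller already knows the charset, normally from Content-Type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: decodes a response body according to its Content-Encoding value.
final class ResponseDecoder {

    static String decodeBody(String contentEncoding, byte[] body, Charset charset) {
        String coding = contentEncoding == null ? "" : contentEncoding.trim().toLowerCase(Locale.ROOT);
        try {
            switch (coding) {
                case "":
                case "identity":
                    // The server chose not to compress, even if we asked for gzip.
                    return new String(body, charset);
                case "gzip":
                case "x-gzip":
                    return readAll(new GZIPInputStream(new ByteArrayInputStream(body)), charset);
                case "deflate":
                    // Assumes zlib-wrapped deflate; raw-deflate servers need new Inflater(true).
                    return readAll(new InflaterInputStream(new ByteArrayInputStream(body)), charset);
                default:
                    throw new IllegalArgumentException("unsupported Content-Encoding: " + contentEncoding);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String readAll(InputStream in, Charset charset) throws IOException {
        try (InputStream stream = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = stream.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    // Demo helper: produces a gzip or zlib-wrapped deflate body, as a server might send.
    static byte[] compress(String text, boolean gzip, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = gzip ? new GZIPOutputStream(bytes) : new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

In the HttpClient code above, the body bytes could come from EntityUtils.toByteArray(rep.getEntity()). The header value could come from rep.getFirstHeader("Content-Encoding"), which returns null when the header is absent; otherwise call getValue() on it.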
With that, the maddening bold-box garbled text problem is solved.