When an HTTP request is sent, Chinese characters in the URL, such as those in a parameter value, must be encoded.
1, Approaches to encoding:
Note: [when a URL is opened in a browser, each space is encoded as %20, whereas URLEncoder encodes each space as +]
URLEncoder also encodes characters that are not part of a parameter value and should be left alone (for example & : / and so on).
1. Encode the required parameter values first, then assemble the URL. (For a GET request, URLEncoder can also be applied after the URL has been concatenated, as in approach 2.)
2. Split off the query part of the URL, encode it as a whole, then replace the encoded special characters with the originals (& / : must stay encoded when they are part of a parameter value, and must not be encoded otherwise). The drawback of this method: when & or / appears inside a parameter value and has to stay encoded, it is hard to handle.
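Approach 1 can be sketched as follows. This is a minimal example under stated assumptions: the class QueryBuilder and the helpers encodeComponent and buildUrl are hypothetical names, not from the original demo, and the parameter values are made up for illustration. Each name and value is encoded on its own, so an & or / inside a value stays encoded while the & and = that join the parameters are added afterwards.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryBuilder {
    // Hypothetical helper: encodes one name or value and uses %20 instead of + for spaces.
    // A literal + in the input has already become %2B, so the replace only touches spaces.
    static String encodeComponent(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Hypothetical helper: joins already-encoded pairs with ? & and =.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            sb.append(separator)
              .append(encodeComponent(entry.getKey()))
              .append('=')
              .append(encodeComponent(entry.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("a", "\u5f20\u4e09"); // two Chinese characters, written as Unicode escapes
        params.put("b", "x&y/z");        // & and / inside a value must be encoded
        params.put("c", "a b");          // the space becomes %20
        System.out.println(buildUrl("http://hi.baidu.com/test/", params));
    }
}
```

Because the separators are never passed through URLEncoder, no replace step is needed afterwards to restore them.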
2, URLEncoder output for some common characters (with UTF-8, a Chinese character is typically encoded as a 9-character string starting with %E, that is, three %XX groups)
English ? after encoding: %3F
/ after encoding: %2F
% after encoding: %25
Chinese ？ after encoding: %EF%BC%9F
A single space after encoding: + (a browser encodes each space as %20, whereas URLEncoder encodes each space as +; after encoding you can replace the + with %20)
+ after encoding: %2B
English : after encoding: %3A
Chinese ： after encoding: %EF%BC%9A
& after encoding: %26
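The values in this list can be checked with a few lines of Java. The sketch below assumes UTF-8 and uses a hypothetical class name EncodeTable and helper enc; the full-width ？ and ： are written as Unicode escapes so the file compiles regardless of source encoding. Note that java.net.URLEncoder emits uppercase hex digits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeTable {
    // Hypothetical helper: wraps URLEncoder so callers need no checked exception.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // \uff1f is the full-width question mark, \uff1a the full-width colon.
        String[] samples = {"?", "/", "%", "\uff1f", " ", "+", ":", "\uff1a", "&"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + enc(s));
        }
    }
}
```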
3, Simple concatenation demo (it has shortcomings)
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UrlEncodeDemo {
    public static void main(String[] args) {
        // Transcoded URL
        String result = "";
        // URL to transcode
        String url = "https://www.baidu.com/s?wd= language? &rsv_spt=1"
                + "&rsv_iqid=0xd13fd9040001fb1d&issp=1&f=8 &rsv_bp=0 "
                + " &rsv_idx=2&ie=utf-8&tn=baiduhome_pg&rsv_enter=1&rsv_sug3=2 "
                + " &rsv_sug1 =2&rsv_sug7=101&rsv_sug2=0&inputt=774&rsv_sug4=1367 &aaaa=1 ";
        // Keep everything up to and including the first ? unchanged
        int index = url.indexOf("?");
        result = url.substring(0, index + 1);
        String temp = url.substring(index + 1);
        try {
            // URLEncoder also encodes special characters such as & : / =, but these only
            // need encoding inside a parameter value. The & that joins parameters, for
            // example, must not be encoded, so it is restored below.
            String encode = URLEncoder.encode(temp, "utf-8");
            System.out.println(encode);
            // URLEncoder emits uppercase hex digits, so match %3D and %2F, not %3d and %2f
            encode = encode.replace("%3D", "=");
            encode = encode.replace("%2F", "/");
            encode = encode.replace("+", "%20");
            encode = encode.replace("%26", "&");
            result += encode;
            System.out.println("transcoded URL:" + result);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}
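The shortcoming of this encode-then-replace approach can be shown with a small sketch. The class ReplacePitfall, the helper encodeThenRestore, and the query string are hypothetical and only illustrate the idea: once %26 is turned back into & everywhere, an & that belonged to a parameter value can no longer be told apart from a separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ReplacePitfall {
    // Hypothetical helper: encodes the whole query, then restores = / & and fixes spaces.
    static String encodeThenRestore(String query) {
        try {
            return URLEncoder.encode(query, "UTF-8")
                    .replace("%3D", "=")
                    .replace("%2F", "/")
                    .replace("+", "%20")
                    .replace("%26", "&");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Intended meaning: q has the value "Tom&Jerry", page has the value "1".
        String query = "q=Tom&Jerry&page=1";
        // The & inside the value comes back unencoded, so a server would read
        // three parameters (q, Jerry, page) instead of two.
        System.out.println(encodeThenRestore(query));
    }
}
```

Encoding each value before the URL is assembled avoids this ambiguity, at the cost of needing the parameters as separate strings.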
Address before encoding: http://hi.baidu.com/test/?a=张三&b=_a123&c=+abc
Address after encoding: http://hi.baidu.com/test/?a=%E5%BC%A0%E4%B8%89&b=_a123&c=+abc (only the Chinese characters have been encoded)
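In this pair of addresses every ASCII character, including the +, is left untouched and only the Chinese characters change. One way to get that result is to percent-encode only the characters outside the ASCII range. The following is a sketch of that idea, not necessarily how the address above was produced; the class NonAsciiEncoder and the helper encodeNonAscii are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NonAsciiEncoder {
    // Hypothetical helper: copies ASCII characters as-is and percent-encodes
    // every other character as UTF-8.
    static String encodeNonAscii(String url) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            int codePoint = url.codePointAt(i);
            int length = Character.charCount(codePoint);
            if (codePoint < 128) {
                sb.append((char) codePoint);
            } else {
                try {
                    sb.append(URLEncoder.encode(url.substring(i, i + length), "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e); // UTF-8 is always supported
                }
            }
            i += length;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u5f20\u4e09 are the two Chinese characters of the parameter a.
        String before = "http://hi.baidu.com/test/?a=\u5f20\u4e09&b=_a123&c=+abc";
        System.out.println(encodeNonAscii(before));
    }
}
```

This sketch does not touch ASCII spaces or other reserved ASCII characters, so values that contain them still need the handling described earlier.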
Numeric values of some characters:
/ → 47, a-z → 97~122, A-Z → 65~90, : → 58, % → 37
Chinese characters correspond to values greater than 255
char c = '我';
System.out.println((int) c); // prints the numeric value of the character, here 25105
In terms of character encoding, the ASCII codes 0~127 are reserved for standard symbols, digits, English letters and so on, and the range 128~255 serves as extended ASCII.
When the operating system uses a non-ASCII encoding (such as a Chinese character encoding), it generally builds on the extended range: by convention, two or even four consecutive codes in the 128~255 range together encode one Chinese character. For example, the national standard (GB) encoding uses two consecutive codes in 128~255 for one Chinese character, representing its zone code and position code respectively, while UTF-8 can use 3 consecutive bytes to represent one character. The exact rules depend on the specific encoding and generally differ from one encoding to another.
Therefore, when processing a string byte by byte: if the bytes are signed, a value less than 0 means the byte combines with the bytes that follow it to form a Chinese character, while a value greater than 0 is a standard Western character; if the bytes are unsigned, test for values greater than 127 instead.
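The signed-versus-unsigned check described above can be tried directly in Java, where byte is a signed type. The sketch below assumes UTF-8 and uses a hypothetical class ByteSignCheck with a helper countHighBytes; it counts the bytes that are negative as signed values, which are exactly the bytes in the 128~255 range when read as unsigned.

```java
import java.nio.charset.StandardCharsets;

public class ByteSignCheck {
    // Hypothetical helper: counts UTF-8 bytes that belong to a multi-byte character.
    // A Java byte is signed, so "less than 0" is the same test as "greater than 127" unsigned.
    static int countHighBytes(String s) {
        int count = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (b < 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // One ASCII letter followed by one Chinese character (U+5F20).
        String text = "a\u5f20";
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xFF converts the signed byte to its unsigned value 0~255.
            System.out.println("signed=" + b + " unsigned=" + (b & 0xFF));
        }
        System.out.println("high bytes: " + countHighBytes(text));
    }
}
```

For the sample string, the letter a yields one non-negative byte and the Chinese character yields three negative bytes, matching the 3-byte UTF-8 form mentioned above.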