Do you really understand URL encoding?

Source: Internet
Author: User
Tags: urlencode

I recently had to rewrite our network components for a project. While rewriting and reviewing the project group's old network code, I found that its URL encoding was not rigorous. When I raised this as a problem, several colleagues were surprised and pushed back. I was a little surprised myself, and when I searched online I found very few other ways of writing it. So here I share my own understanding and experience as a primer on URL encoding. I hope it helps, and corrections are welcome. (References: RFC 1738, RFC 3986, RFC 6874, RFC 7320.)

I. Understanding URL encoding: when to encode and how it works

URIs include URLs and URNs, and what is commonly called URL encoding actually follows the URI specifications. The URI was originally designed so that it could be transcribed by hand, for example written on a napkin and handed to someone else, so the characters that make up a URI must be writable ASCII characters. Among these writable characters, some are interpreted differently by different operating systems' encodings; these are classed as unsafe characters (Figure 2) and need extra care. In the end, the safest approach is to build a URI only from correctly used reserved characters and unreserved characters (Figure 1).

To turn an illegal character into a legal part of a URI, percent-encoding is used: each encoded byte becomes three characters, a % followed by two hexadecimal digits. However, there has been no clear guidance on how to generate the percent-encoding, which is why most developers copy old code without knowing why it works.

As its name says, percent-encoding uses % as the marker of an encoded byte, and the essence of URL encoding is to use percent-encoding correctly.

The key questions for getting URL encoding right are: when to encode, which filtering rules to apply, and how to generate the percent-encoding.

In the early days of the WWW, the practice was to convert the character stream into a byte stream in which each ASCII character corresponds to exactly one byte, and to use that character's ASCII value as the two hexadecimal digits after the % to form the percent-encoding. Later, a number of different ways of generating the percent-encoding emerged, which made URIs hard to interpret.

Today the percent-encoding functions, including those on iOS, convert the string to a byte stream using either a specified encoding or the system default of UTF-8, and then percent-encode each byte. For example, the Chinese word "网易" (NetEase) is URL-encoded as %E7%BD%91%E6%98%93, and its UTF-8 byte stream is E7 BD 91 E6 98 93, so you can see the one-to-one correspondence.
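The byte-by-byte correspondence can be checked with a few lines of Python. This is only a sketch: percent_encode_bytes is a hypothetical helper written for illustration (it encodes every byte, including ASCII ones), and bytes.hex with a separator assumes Python 3.8 or later. The standard library's urllib.parse.quote is shown alongside for comparison.

```python
from urllib.parse import quote


def percent_encode_bytes(text: str, encoding: str = "utf-8") -> str:
    """Percent-encode every byte of text (hypothetical helper for illustration)."""
    # Convert the characters to bytes, then write each byte as "%" + two hex digits.
    return "".join(f"%{byte:02X}" for byte in text.encode(encoding))


word = "网易"
print(word.encode("utf-8").hex(" ").upper())  # E7 BD 91 E6 98 93
print(percent_encode_bytes(word))             # %E7%BD%91%E6%98%93
print(quote(word))                            # %E7%BD%91%E6%98%93
```

Each of the six UTF-8 bytes produces exactly one % triplet, which is the correspondence described above.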

Decoding is the reverse: strip the % from each percent-encoded triplet to recover the bytes one by one, then convert the bytes of the illegal characters back into a character stream using the agreed encoding (by convention UTF-8).
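The reverse direction can be sketched the same way. The helper percent_decode_all below is hypothetical and deliberately simplistic: it assumes that every character of its input is part of a % triplet. For mixed input, urllib.parse.unquote from the standard library is the safer choice.

```python
from urllib.parse import unquote


def percent_decode_all(encoded: str, encoding: str = "utf-8") -> str:
    """Decode a string that consists only of %XX triplets (hypothetical helper)."""
    # Step 1: drop each "%" and read the remaining hex digit pairs as bytes.
    raw = bytes.fromhex(encoded.replace("%", ""))
    # Step 2: interpret the bytes with the agreed character encoding.
    return raw.decode(encoding)


encoded = "%E7%BD%91%E6%98%93"
print(percent_decode_all(encoded))           # 网易
print(unquote(encoded, encoding="utf-8"))    # 网易

# Reading the same bytes with a different encoding does not give the
# original text back, which is why both sides must agree on UTF-8.
print(percent_decode_all(encoded, "latin-1") == "网易")  # False
```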

Because different schemes and protocols place different requirements on the URI format, the RFCs lay down no hard rules about which content to encode or which filtering rules to use; that decision is left to the developer. The following principles are usually followed:

1. Do not percent-encode unreserved characters.

2. Percent-encode every character that is neither reserved nor unreserved.

3. When a reserved character is used not as a URI delimiter but as data, for example inside a value in the query component, percent-encode it.

4. When two URIs are otherwise identical and differ only in that one uses certain characters literally while the other percent-encodes them, the two should in most cases be treated as different URIs. Therefore, do not percent-encode reserved characters when they are used in their reserved role.

Figure 1. Reserved and unreserved characters

Figure 2. Unsafe characters
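The principles above can be tried out in a short Python sketch. The character classes are those defined in RFC 3986 (sections 2.2 and 2.3). The constant names and encode_component are hypothetical names chosen for this illustration, and the sketch assumes Python 3.7 or later, where quote treats "~" as always safe.

```python
from urllib.parse import quote

# Character classes from RFC 3986, sections 2.2 and 2.3.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def encode_component(text: str) -> str:
    """Percent-encode everything except unreserved characters (hypothetical helper)."""
    # quote() never encodes letters, digits or "-._~"; safe="" removes
    # its default exemption for "/".
    return quote(text, safe="")


# Principle 1: unreserved characters pass through unchanged.
print(encode_component(UNRESERVED) == UNRESERVED)  # True
# Principle 2: characters in neither class (here a space) are encoded.
print(encode_component("a b"))                     # a%20b
# Principle 3: reserved characters used as data are encoded.
print(encode_component("a&b=c/d"))                 # a%26b%3Dc%2Fd
```

Principle 4 is the reason encode_component is meant to be applied only to individual components such as a query key or value, never to delimiters that are doing their reserved job.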

II. URL encoding in iOS development

There are three functions or system methods for URL encoding: CFURLCreateStringByAddingPercentEscapes (deprecated in iOS 9.0), stringByAddingPercentEscapesUsingEncoding: (deprecated in iOS 9.0), and stringByAddingPercentEncodingWithAllowedCharacters: (the one the system recommends). The recommended method converts the string with UTF-8 by default and then percent-encodes it according to the allowed character set we specify. We usually need URL encoding when assembling the URL of a GET request.

The typical scenario is that a urlString and an NSDictionary of parameters are passed in. The caller is required to make sure that urlString is already correctly encoded; we then iterate over the keys and values of the NSDictionary and encode each key and each value with the allowed character set we have chosen. The output is

urlString[&|?]urlEncode(key1)=urlEncode(value1)&urlEncode(key2)=urlEncode(value2)...

The allowed character set I use is the unreserved characters, so reserved characters and illegal characters such as Chinese are all percent-encoded, as shown in the following example:
    + (NSURL *)createGETURLFromString:(NSString *)urlString params:(NSDictionary *)params {
        NSURL *parsedURL = [NSURL URLWithString:urlString];
        // Use "&" if urlString already has a query component, "?" otherwise.
        NSString *queryPrefix = parsedURL.query ? @"&" : @"?";

        // Allow only the unreserved characters. They are listed explicitly
        // because the letter character sets also contain non-ASCII letters,
        // and the digits have to be included as well.
        NSCharacterSet *allowedCharacterSet = [NSCharacterSet characterSetWithCharactersInString:
            @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"];

        NSMutableArray *pairs = [NSMutableArray array];
        for (NSString *key in [params keyEnumerator]) {
            // Skip values that are not strings.
            if (![[params objectForKey:key] isKindOfClass:[NSString class]]) {
                continue;
            }
            NSString *value = (NSString *)[params objectForKey:key];
            // Encode the key and the value separately.
            NSString *urlEncodeKey = [key stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacterSet];
            NSString *urlEncodeValue = [value stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacterSet];
            [pairs addObject:[NSString stringWithFormat:@"%@=%@", urlEncodeKey, urlEncodeValue]];
        }
        NSString *query = [pairs componentsJoinedByString:@"&"];
        return [NSURL URLWithString:[NSString stringWithFormat:@"%@%@%@", urlString, queryPrefix, query]];
    }
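For readers who want to experiment with the same logic outside an iOS project, here is a rough Python counterpart. It is a sketch, not a port of any library API: create_get_url is a hypothetical name, keys are assumed to be strings, and quote(..., safe="") plays the role of the unreserved-only allowed character set.

```python
from urllib.parse import quote, urlsplit


def create_get_url(url_string: str, params: dict) -> str:
    """Append params to an already-encoded URL, encoding each key and value."""
    # Use "&" when the URL already carries a query, "?" otherwise.
    prefix = "&" if urlsplit(url_string).query else "?"
    pairs = []
    for key, value in params.items():
        if not isinstance(value, str):
            continue  # like the Objective-C version, skip non-string values
        # Encode key and value separately; only unreserved characters survive.
        pairs.append(f"{quote(key, safe='')}={quote(value, safe='')}")
    return url_string + prefix + "&".join(pairs)


print(create_get_url("http://www.baidu.com", {"name": "网易", "q": "a b"}))
# http://www.baidu.com?name=%E7%BD%91%E6%98%93&q=a%20b
print(create_get_url("http://www.baidu.com?x=1", {"k&1": "v=2"}))
# http://www.baidu.com?x=1&k%261=v%3D2
```

The delimiters "?", "&" and "=" are added only by the function itself, so any occurrence of them inside a key or value is guaranteed to arrive percent-encoded.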

III. + and spaces in URL encoding

Readers who have used Base64 probably know that Web-safe Base64 is derived from basic Base64 by changing the encoding alphabet: the basic table contains the + and / characters, which a browser generally interprets as a space and a path separator. So for it to work properly, the last two characters of the index table have to be replaced, typically with - and _.

A space is percent-encoded as %20 (0x20 is the ASCII code of the space), while HTML form encoding writes a space as +; this is probably why browser designers treat + as a space. So with URL-encoding methods that do not let you specify the allowed character set, + and spaces have to be handled explicitly to prevent confusion.

In fact, when we encode content with only the unreserved characters allowed, + and / can never appear in the encoded result of a content item. So with percent-encoding used correctly, there is nothing to worry about when the incoming parameter dictionary contains + or /. When an incoming parameter is a Base64 result, there is no need to convert it from Base64 to Web-safe Base64 first.

IV. Problems that incorrect URL encoding may cause

Some iOS developers, given the URL string and parameter dictionary for a GET request, first concatenate the parameters and then URL-encode the entire string. It then becomes impossible to tell whether a given character acts as a component delimiter or is part of a component's content. This must not be done; the following problems may occur:

1. If the characters are not filtered properly, http://www.baidu.com/index.htm becomes http%3A%2F%2Fwww.baidu.com%2Findex.htm after encoding and can no longer be accessed. In other words, to support encoding the concatenated string as a whole, the encoder must be forbidden from encoding any URI reserved character, such as &, and that causes problems 2 and 3.

2. Suppose the parameters passed in are {"name": "namepart1&namepart2", "id": "KK"}. The concatenated string becomes http://www.baidu.com?name=namepart1&namepart2&id=KK. How can the receiver then parse it to recover the actual value "namepart1&namepart2" of the name field, along with "KK" for the id field?

3. Suppose the parameters passed in are {"name": "Mitty&isLogin=true"}. The concatenated string becomes http://www.baidu.com?name=Mitty&isLogin=true. If isLogin is a meaningful query key, the server receives an extra parameter that was never intended.

Of course there are many other attacks on URLs, such as semantic attacks, which are not discussed here.

In addition, some developers worry about illegal characters and first Base64-encode Chinese text before putting it into GET parameters. As far as illegal characters are concerned, this is unnecessary: after URL encoding, the result is in the ASCII character set just like a Base64 result, and it can be transmitted over the network normally without losing information.

When the server receives the request, decoding should be applied per component rather than to the whole URL; in PHP, for example, URL decoding applies to each $_GET["key"] individually (PHP performs this decoding itself when it populates $_GET).
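The three problems, and the + issue from section III, can be reproduced with the Python standard library. This is a sketch under stated assumptions: naive_url, careful_url and server_side are hypothetical helpers, and parse_qs stands in for whatever query parser the real server uses (like form decoders in general, it reads "+" as a space).

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE = "http://www.baidu.com"


def naive_url(params: dict) -> str:
    """Concatenate first, with no per-component encoding (the flawed approach)."""
    return BASE + "?" + "&".join(f"{k}={v}" for k, v in params.items())


def careful_url(params: dict) -> str:
    """Percent-encode each key and value before concatenating."""
    return BASE + "?" + "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in params.items()
    )


def server_side(url: str) -> dict:
    """Split the query into pairs first, then decode each key and value."""
    return parse_qs(urlsplit(url).query, keep_blank_values=True)


# Problem 1: encoding the whole string also encodes the delimiters.
print(quote(BASE + "/index.htm", safe=""))
# http%3A%2F%2Fwww.baidu.com%2Findex.htm

# Problem 2: an unencoded "&" inside a value splits the value.
print(server_side(naive_url({"name": "namepart1&namepart2", "id": "KK"})))
# {'name': ['namepart1'], 'namepart2': [''], 'id': ['KK']}

# Problem 3: the value smuggles in an extra parameter.
print(server_side(naive_url({"name": "Mitty&isLogin=true"})))
# {'name': ['Mitty'], 'isLogin': ['true']}

# Encoding each component separately keeps the value in one piece.
print(server_side(careful_url({"name": "Mitty&isLogin=true"})))
# {'name': ['Mitty&isLogin=true']}

# A Base64 string containing "+" and "/" also survives without any
# conversion to Web-safe Base64, while the naive URL loses the "+".
print(server_side(careful_url({"token": "+/8="})))  # {'token': ['+/8=']}
print(server_side(naive_url({"token": "+/8="})))    # {'token': [' /8=']}
```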
