URLDecoder and URLEncoder

When a form in a web page is submitted using the POST method, the content type of the data is application/x-www-form-urlencoded, which means:

1. The characters "a"-"z", "A"-"Z", "0"-"9", ".", "-", "*", and "_" are not encoded;

2. Spaces are converted to plus signs (+);

3. Every other character is converted to bytes, and each byte is written as "%XY", where XY is its two-digit hexadecimal value;

4. An & symbol is placed between each name=value pair.
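The four rules above can be sketched in a few lines of Java. This is only an illustration, not the way any particular browser builds a form body: the class name FormBodySketch, the method encodeForm, and the two form fields are made up for this example, and UTF-8 is assumed as the character encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {

    // Encodes each name and each value separately (rules 1-3),
    // then joins the name=value pairs with '&' (rule 4).
    public static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("UTF-8 should be supported by every JVM", ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical form fields, chosen to exercise spaces, '%' and '&'.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Jane Doe");
        fields.put("comment", "50% off & more");
        System.out.println(encodeForm(fields));
        // prints name=Jane+Doe&comment=50%25+off+%26+more
    }
}
```

A LinkedHashMap is used so that the pairs come out in insertion order; a plain HashMap would encode the same pairs but in an unspecified order.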

The URLEncoder class contains static methods that convert a string to the application/x-www-form-urlencoded MIME format.

One of the challenges the designers of the Web faced was dealing with the differences between operating systems. These differences can cause problems with URLs: for example, some operating systems allow spaces in filenames and some do not. Most operating systems attach no special meaning to a "#" in a filename, but in a URL the "#" marks the end of the filename and the start of a fragment identifier. Other special characters, non-alphanumeric characters, and characters that have a special meaning in a URL or on some operating system present similar problems. To solve these problems, the characters used in a URL must come from a fixed subset of ASCII, as follows:

1. Capital letters A-Z

2. Lowercase letters a-z

3. Digits 0-9

4. Punctuation characters - _ . ! ~ * ' ( and )

Characters such as / & ? @ # ; $ + = and % may also be used, but each has a special purpose. If a filename includes any of these characters, they should be encoded, as should every other character outside the set above.

The encoding process is simple: any character that is not an ASCII digit, a letter, or one of the punctuation marks listed earlier is converted to bytes, and each byte is written as a "%" followed by two hexadecimal digits. Spaces are a special case because they are so common: besides being encoded as "%20", a space can also be encoded as a plus sign (+). The plus sign itself is encoded as %2B. When /, #, =, &, and ? are used as part of a name rather than as delimiters between the parts of a URL, they should all be encoded.
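To make the byte-by-byte process concrete, here is a hand-written sketch of it. It is not how java.net.URLEncoder is implemented (that class leaves a smaller set of punctuation unencoded); the class name PercentEncodeSketch and the method percentEncode are hypothetical, and UTF-8 is assumed when converting characters to bytes.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodeSketch {

    // Punctuation that the list above allows to appear unencoded.
    private static final String SAFE_PUNCTUATION = "-_.!~*'()";

    public static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;      // treat the byte as unsigned
            char c = (char) value;
            boolean safe = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || SAFE_PUNCTUATION.indexOf(c) >= 0;
            if (safe) {
                out.append(c);
            } else if (c == ' ') {
                out.append('+');       // the special case for spaces
            } else {
                // A percent sign followed by two hexadecimal digits.
                out.append('%');
                out.append(Character.toUpperCase(Character.forDigit(value >> 4, 16)));
                out.append(Character.toUpperCase(Character.forDigit(value & 0xF, 16)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent; it occupies two bytes in UTF-8.
        System.out.println(percentEncode("a b+c/\u00e9"));
        // prints a+b%2Bc%2F%C3%A9
    }
}
```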

Warning: This strategy does not work well in heterogeneous environments with many character sets. For example, on U.S. Windows systems é is encoded as %E9, while on U.S. Macs it is encoded as %8E. This uncertainty is an obvious shortcoming of existing URIs, and future URI specifications should address it through Internationalized Resource Identifiers (IRIs).
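The dependence on the character set is easy to reproduce. The sketch below encodes the same one-character string under two encodings that every Java platform is required to support; the class name CharsetDifferenceSketch and the helper encodeWith are made up for the illustration. The Mac encoding mentioned above is left out because it is not guaranteed to be available in every Java runtime.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CharsetDifferenceSketch {

    // Wraps the checked exception so callers can pass any charset name.
    public static String encodeWith(String s, String charsetName) {
        try {
            return URLEncoder.encode(s, charsetName);
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, ex);
        }
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the letter e with an acute accent
        System.out.println(encodeWith(eAcute, "ISO-8859-1")); // prints %E9
        System.out.println(encodeWith(eAcute, "UTF-8"));      // prints %C3%A9
    }
}
```

A receiver that assumes a different encoding from the sender will decode these escapes to the wrong character, which is the problem the warning describes.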

The URL class does not encode or decode automatically. You can construct a URL object that contains illegal ASCII characters, non-ASCII characters, and/or %xx escapes. Output methods such as getPath() and toExternalForm() neither encode such characters nor decode the escapes. You are responsible for making sure that the string used to construct a URL object has all its characters properly encoded.
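A short sketch of this behavior, assuming the String constructor of java.net.URL (which recent JDKs mark as deprecated but still provide). The class name UrlNoEncodingSketch, the helper pathOf, and the example.com addresses are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlNoEncodingSketch {

    // Returns the path exactly as the URL object stores it.
    public static String pathOf(String spec) {
        try {
            return new URL(spec).getPath();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Not a URL: " + spec, ex);
        }
    }

    public static void main(String[] args) {
        // The space is not encoded on the way in or on the way out.
        System.out.println(pathOf("http://www.example.com/hello world"));
        // prints /hello world

        // An existing escape is not decoded either.
        System.out.println(pathOf("http://www.example.com/hello%20world"));
        // prints /hello%20world
    }
}
```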

Fortunately, Java provides the URLEncoder class to encode strings in this form. Java 1.2 added the URLDecoder class, which decodes strings in this form. Neither class needs to be instantiated:

public class URLDecoder extends Object

public class URLEncoder extends Object

First, URLEncoder

In Java 1.3 and earlier, the java.net.URLEncoder class contained a single static method, encode(), which encodes a string according to the rules above:

public static String encode(String s)

This method always uses the platform's default encoding, so it produces different results on different systems. As a result, in Java 1.4 this method was deprecated in favor of a method that requires you to specify the encoding yourself:

public static String encode(String s, String encoding) throws UnsupportedEncodingException

Both methods convert every non-alphanumeric character to %xx, except for the space, the underscore (_), the hyphen (-), the period (.), and the asterisk (*). Both also encode all non-ASCII characters. The space is converted to a plus sign. These methods are a little overaggressive: they also convert "~", "'", "(", and ")" to %xx, even though they do not strictly have to. However, the URL specification does not forbid this extra encoding, so web browsers handle these over-encoded URLs without trouble.

Both methods return a new, encoded string. The Java 1.3 encode() method uses the platform's default encoding to compute the %xx escapes. Typical defaults are ISO-8859-1 on U.S. Unix systems, Cp1252 on U.S. Windows systems, MacRoman on U.S. Macs, and other local character sets elsewhere. Because the result of encoding and decoding depends on the local platform, this method is troublesome and not portable across platforms.

That is why the method was deprecated in Java 1.4 in favor of one that requires you to specify the encoding. If you nonetheless pass the platform's default encoding, your program will be just as platform-dependent as a Java 1.3 program. Instead, you should always choose UTF-8 and nothing else. UTF-8 is compatible with new web browsers and with more other software than any other encoding you could choose.

Example 7-8 uses URLEncoder.encode() to print various encoded strings. It must be compiled and run with Java 1.4 or later.

Example 7-8. x-www-form-urlencoded strings

import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;

public class EncoderTest {

  public static void main(String[] args) {
    try {
      System.out.println(URLEncoder.encode("This string has spaces", "UTF-8"));
      System.out.println(URLEncoder.encode("This*string*has*asterisks", "UTF-8"));
      System.out.println(URLEncoder.encode("This%string%has%percent%signs", "UTF-8"));
      System.out.println(URLEncoder.encode("This+string+has+pluses", "UTF-8"));
      System.out.println(URLEncoder.encode("This/string/has/slashes", "UTF-8"));
      System.out.println(URLEncoder.encode("This\"string\"has\"quote\"marks", "UTF-8"));
      System.out.println(URLEncoder.encode("This:string:has:colons", "UTF-8"));
      System.out.println(URLEncoder.encode("This~string~has~tildes", "UTF-8"));
      System.out.println(URLEncoder.encode("This(string)has(parentheses)", "UTF-8"));
      System.out.println(URLEncoder.encode("This.string.has.periods", "UTF-8"));
      System.out.println(URLEncoder.encode("This=string=has=equals=signs", "UTF-8"));
      System.out.println(URLEncoder.encode("This&string&has&ampersands", "UTF-8"));
      System.out.println(URLEncoder.encode("Thiséstringéhasénon-ASCII characters", "UTF-8"));
      System.out.println(URLEncoder.encode("This People's Republic", "UTF-8"));
    } catch (UnsupportedEncodingException ex) {
      // UTF-8 support is mandatory, so this should never happen.
      throw new RuntimeException("Broken VM does not support UTF-8");
    }
  }
}

Below is its output. Note that the source file must be saved in an encoding other than ASCII, and the encoding you choose must be passed to the compiler as an argument so that it interprets the non-ASCII characters in the source code correctly.

% javac -encoding UTF8 EncoderTest.java
% java EncoderTest
This+string+has+spaces
This*string*has*asterisks
This%25string%25has%25percent%25signs
This%2Bstring%2Bhas%2Bpluses
This%2Fstring%2Fhas%2Fslashes
This%22string%22has%22quote%22marks
This%3Astring%3Ahas%3Acolons
This%7Estring%7Ehas%7Etildes
This%28string%29has%28parentheses%29
This.string.has.periods
This%3Dstring%3Dhas%3Dequals%3Dsigns
This%26string%26has%26ampersands
This%C3%A9string%C3%A9has%C3%A9non-ASCII+characters
This+People%27s+Republic

Note in particular that this method encodes the characters "/", "&", "=", and ":". It makes no attempt to determine how these characters are used in a URL. Consequently, you must encode a URL piece by piece rather than passing the whole URL to this method at once. This matters because the most common use of URLEncoder is to build query strings for communicating with server-side programs that use the GET method. For example, suppose you want to encode this query string, which is used to search AltaVista:

pg=q&kl=XX&stype=stext&q=+"Java+I/O"&search.x=38&search.y=3

This code encodes it:

String query = URLEncoder.encode(
    "pg=q&kl=XX&stype=stext&q=+\"Java+I/O\"&search.x=38&search.y=3");
System.out.println(query);

Unfortunately, the resulting output is:

pg%3Dq%26kl%3DXX%26stype%3Dstext%26q%3D%2B%22Java%2BI%2FO%22%26search.x%3D38%26search.y%3D3

The problem is that URLEncoder.encode() encodes blindly. It cannot distinguish between special characters used as part of the URL or query string syntax (such as "=" and "&" in the preceding string) and characters that really need to be encoded. Consequently, the URL must be encoded one piece at a time, like this:

String query = URLEncoder.encode("pg");
query += "=";
query += URLEncoder.encode("q");
query += "&";
query += URLEncoder.encode("kl");
query += "=";
query += URLEncoder.encode("XX");
query += "&";
query += URLEncoder.encode("stype");
query += "=";
query += URLEncoder.encode("stext");
query += "&";
query += URLEncoder.encode("q");
query += "=";
query += URLEncoder.encode("+\"Java I/O\"");
query += "&";
query += URLEncoder.encode("search.x");
query += "=";
query += URLEncoder.encode("38");
query += "&";
query += URLEncoder.encode("search.y");
query += "=";
query += URLEncoder.encode("3");
System.out.println(query);

This is the output you really want:

pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&search.x=38&search.y=3

Example 7-9 is a QueryString class. It uses URLEncoder to encode successive name-value pairs in a Java object, which can then be used to send data to a server-side program.

When you create a QueryString object, you pass the first name-value pair of the query string to the constructor, which forms the initial string. To add further pairs, call the add() method, which also takes two strings as arguments and encodes them. The getQuery() method returns the accumulated string of encoded pairs.

Example 7-9. The QueryString class

package com.macfaq.net;

import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;

public class QueryString {

  private StringBuffer query = new StringBuffer();

  public QueryString(String name, String value) {
    encode(name, value);
  }

  public synchronized void add(String name, String value) {
    query.append('&');
    encode(name, value);
  }

  // Appends one encoded name=value pair to the query.
  private synchronized void encode(String name, String value) {
    try {
      query.append(URLEncoder.encode(name, "UTF-8"));
      query.append('=');
      query.append(URLEncoder.encode(value, "UTF-8"));
    } catch (UnsupportedEncodingException ex) {
      throw new RuntimeException("Broken VM does not support UTF-8");
    }
  }

  public String getQuery() {
    return query.toString();
  }

  public String toString() {
    return getQuery();
  }
}

Using this class, we can now encode the string from the previous example:

QueryString qs = new QueryString("pg", "q");
qs.add("kl", "XX");
qs.add("stype", "stext");
qs.add("q", "+\"Java I/O\"");
qs.add("search.x", "38");
qs.add("search.y", "3");
String url = "http://www.altavista.com/cgi-bin/query?" + qs;
System.out.println(url);

Second, URLDecoder

Corresponding to the URLEncoder class, the URLDecoder class has two static methods. They decode a string encoded in x-www-form-urlencoded form. That is, they convert all plus signs (+) to spaces and all %xx escapes to the corresponding characters:

public static String decode(String s) throws Exception
public static String decode(String s, String encoding) // Java 1.4
    throws UnsupportedEncodingException

The first decode() method is used in Java 1.2 and 1.3. The second is used in Java 1.4 and later. If you are not sure which encoding to use, choose UTF-8. It is more likely than any other encoding to give the right result.
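As a quick check that the two classes are inverses when the same encoding is used on both sides, here is a small round-trip sketch. The class name RoundTripSketch and its two helper methods are invented for the illustration; the sample string is the search term used earlier.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class RoundTripSketch {

    public static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("UTF-8 should be supported by every JVM", ex);
        }
    }

    public static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("UTF-8 should be supported by every JVM", ex);
        }
    }

    public static void main(String[] args) {
        String original = "+\"Java I/O\"";
        String encoded = encodeUtf8(original);
        System.out.println(encoded);             // prints %2B%22Java+I%2FO%22
        System.out.println(decodeUtf8(encoded)); // prints +"Java I/O"
    }
}
```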

If the string contains a "%" that is not immediately followed by two hexadecimal digits, or if it decodes into an illegal sequence, the method may throw an IllegalArgumentException. Then again, it may not. This depends on the implementation, and what happens when an illegal sequence is detected but no IllegalArgumentException is thrown is undefined. In Sun's JDK 1.4, no exception is thrown, and some inexplicable extra bytes are added to the string that could not be decoded. This is a real headache and possibly a security hole.
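Because the behavior on malformed input varies between implementations, defensive code can treat IllegalArgumentException as one possible outcome rather than a guaranteed one. The following is a minimal sketch; the class name SafeDecodeSketch and the method decodeOrNull are hypothetical, and returning null for rejected input is just one possible design.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class SafeDecodeSketch {

    // Returns the decoded text, or null if the decoder rejects the input.
    public static String decodeOrNull(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException ex) {
            // Thrown by current JDKs for escapes such as "%zz";
            // older implementations may behave differently.
            return null;
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("UTF-8 should be supported by every JVM", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeOrNull("a%20b+c")); // prints a b c
        System.out.println(decodeOrNull("%zz"));     // null when the decoder rejects it
    }
}
```

Code that must not accept garbage on any implementation would need to validate the escapes itself before calling decode(), since a missing exception does not prove the input was well formed.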

Because this method leaves unescaped characters untouched, you can pass it an entire URL instead of splitting the URL into pieces first. For example:

String input = "http://www.altavista.com/cgi-bin/"
    + "query?pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&search.x=38&search.y=3";
try {
  String output = URLDecoder.decode(input, "UTF-8");
  System.out.println(output);
} catch (UnsupportedEncodingException ex) {
  throw new RuntimeException("Broken VM does not support UTF-8");
}
