URL Encoding (URLEncoder) and Decoding (URLDecoder) in Java

Source: Internet
Author: User

Before discussing encoding and decoding, let's clarify a few basic concepts.

What is an application/x-www-form-urlencoded string?

A: It is an encoding format. When a URL contains characters that are not Western European characters, those characters are converted into an application/x-www-form-urlencoded string.

The same applies when a form is submitted: if the submitted data contains non-Western European characters, the system converts them into application/x-www-form-urlencoded strings.

However, this encoding is inefficient for sending large amounts of text, text containing non-ASCII characters, or binary data to the server. In those cases another encoding type, "multipart/form-data", is used. For example, when uploading files, the form's enctype attribute is generally set to "multipart/form-data".

When a browser-side <form> has its enctype attribute set to multipart/form-data, the data is sent as a multipart MIME message. Because this kind of transfer carries large amounts of data, a file-upload form must use the POST method, and the type attribute of the <input> element must be file.

Other enctype values for forms are beyond the scope of this article, so let's return to the topic.

We often see strings such as %E6%96%87%E6%A1%A3 in the browser's address bar.

This is an encoded string. Let's look at how Java encodes and decodes URLs.

java.net.URLDecoder.decode(String s, String enc)
Converts an application/x-www-form-urlencoded string into a plain string.

java.net.URLEncoder.encode(String s, String enc)
Converts a plain string into an application/x-www-form-urlencoded string.
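As a quick check of these two methods, the following sketch decodes the address-bar string shown earlier and then encodes the result again. It assumes UTF-8, which is what browsers typically use for such URLs; the class name RoundTrip is only illustrative. The string decodes to the two Chinese characters U+6587 U+6863 ("document").

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class RoundTrip {
    // Decode an application/x-www-form-urlencoded string, assuming UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8", ex);
        }
    }

    // Encode a plain string into application/x-www-form-urlencoded form, assuming UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8", ex);
        }
    }

    public static void main(String[] args) {
        String plain = decode("%E6%96%87%E6%A1%A3");
        // Compare against the expected characters instead of printing them,
        // since console encodings vary.
        System.out.println(plain.equals("\u6587\u6863")); // true
        System.out.println(encode(plain));                 // %E6%96%87%E6%A1%A3
    }
}
```

Encoding and decoding with the same character set gives back the original string; mixing character sets is what produces garbled text.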

The following is an excerpt reproduced from another article:

/*

When a form on a web page is submitted with the POST method, the content type of the data is application/x-www-form-urlencoded, which means:

1. The characters "a"-"z", "A"-"Z", "0"-"9", ".", "-", "*", and "_" are not encoded;
2. Spaces are converted to plus signs (+);
3. All other characters are converted to the form "%xy", where xy is a two-digit hexadecimal value;
4. An & symbol is placed between each name=value pair.
*/
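Taken together, these four rules describe a request body such as name=value&name=value. A minimal sketch of building one with URLEncoder, assuming UTF-8 and a LinkedHashMap to keep the fields in insertion order; the formBody helper and the field names are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    // Hypothetical helper: encodes each name and value separately (rules 1-3)
    // and joins the pairs with '&' (rule 4).
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                body.add(URLEncoder.encode(e.getKey(), "UTF-8") + "="
                        + URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8", ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "John Smith");
        fields.put("city", "S\u00e3o Paulo");
        fields.put("note", "a&b=c");
        System.out.println(formBody(fields));
        // name=John+Smith&city=S%C3%A3o+Paulo&note=a%26b%3Dc
    }
}
```

Because the "&" and "=" inside the note value are encoded, they cannot be confused with the delimiters added between pairs.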

The URLEncoder class contains a static method that converts a string into the application/x-www-form-urlencoded MIME format.

One of the many challenges web designers face is handling differences between operating systems. These differences cause problems with URLs: for example, some operating systems allow file names to contain spaces and some do not. Most operating systems attach no special meaning to a "#" in a file name, but in a URL "#" indicates that the file name has ended and that a fragment identifier follows. Other special characters and non-alphanumeric characters have special meanings in URLs or on other operating systems and cause similar problems. To solve these problems, the characters used in a URL must come from a fixed subset of the ASCII character set, namely:

1. Uppercase letters A-Z
2. Lowercase letters a-z
3. Digits 0-9
4. The punctuation characters - _ . ! ~ * ' ( and )

The characters : / & ? @ # ; $ + = and % may also be used, but each has its own special purpose. If a file name contains any of these characters (: / & ? @ # ; $ + = %), they, along with all other characters, should be encoded.

The encoding process is very simple: any character that is not an ASCII digit, letter, or one of the punctuation marks listed above is converted into bytes, and each byte is written as a "%" followed by two hexadecimal digits. Spaces are a special case because they are so common: besides being encoded as "%20", they can also be encoded as "+". The plus sign itself is encoded as %2B. When / # = & and ? are used as part of a name rather than as delimiters between parts of the URL, they should all be encoded.
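The process just described is simple enough to write by hand. The sketch below only illustrates that process and is not how URLEncoder is implemented: it treats ASCII letters, digits, and the punctuation - _ . ! ~ * ' ( ) as safe, writes a space as "+", and turns every other character into its bytes written as %XX. The use of UTF-8 for the byte conversion and the PercentEncoder name are assumptions.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncoder {
    // Punctuation that the rules above allow to appear unencoded.
    private static final String SAFE_PUNCTUATION = "-_.!~*'()";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical helper implementing the encoding process described above.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned (0-255)
            boolean safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || SAFE_PUNCTUATION.indexOf(c) >= 0;
            if (safe) {
                out.append((char) c);
            } else if (c == ' ') {
                out.append('+'); // the special case for spaces
            } else {
                // "%" followed by the byte's two hexadecimal digits
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("my file#1.txt")); // my+file%231.txt
        System.out.println(encode("a+b"));           // a%2Bb
        System.out.println(encode("caf\u00e9"));     // caf%C3%A9
    }
}
```

Working byte by byte is safe here because every ASCII character is a single byte in UTF-8, and every byte of a multi-byte character is 0x80 or above, so it is always escaped.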

Warning: This scheme works less well in heterogeneous environments where many character sets are in use. For example, on a U.S. Windows system é is encoded as %E9, while on a U.S. Mac it is encoded as %8E. This ambiguity is an obvious shortcoming of current URIs, so future URI specifications should improve on it through Internationalized Resource Identifiers (IRIs).
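The same character really does produce different escapes depending on the character set. In the small sketch below, é is the single byte 0xE9 in ISO-8859-1 (as it is in Cp1252) but two bytes in UTF-8. ISO-8859-1 is used instead of the Mac character set because every Java runtime is required to support it; the class name is illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class CharsetDependence {
    // Encode the same text under a named character set.
    static String encodeAs(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, ex);
        }
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // U+00E9, e with acute accent
        System.out.println(encodeAs(e, "ISO-8859-1")); // %E9
        System.out.println(encodeAs(e, "UTF-8"));      // %C3%A9
    }
}
```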

The URL class does not encode or decode anything automatically. You can create a URL object that contains illegal ASCII characters, non-ASCII characters, and/or %xx escapes. The output methods getPath() and toExternalForm() do not automatically encode or decode these characters either. You are responsible for making sure that every character in the string used to create a URL object is properly encoded.
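To see that URL passes characters through untouched, the following sketch builds a URL containing a space and reads its path back. For contrast it uses java.net.URI's multi-argument constructor, which does quote illegal characters in the path (as %20, not as "+"). The host and file name are made up for illustration; note that the URL(String) constructor is deprecated in recent JDKs (since Java 20) but still works.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlDoesNotEncode {
    // URL stores the characters it is given; nothing is encoded or decoded.
    @SuppressWarnings("deprecation") // URL(String) is deprecated since Java 20
    static String urlPath(String spec) {
        try {
            return new URL(spec).getPath();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    // URI's multi-argument constructor quotes illegal characters in the path.
    static String quotedUri(String host, String path) {
        try {
            return new URI("http", host, path, null).toString();
        } catch (URISyntaxException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlPath("http://www.example.com/my file.html"));
        // /my file.html  (the space is still there)
        System.out.println(quotedUri("www.example.com", "/my file.html"));
        // http://www.example.com/my%20file.html
    }
}
```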


Fortunately, Java provides the URLEncoder class to encode a string into this form, and Java 1.2 added the URLDecoder class to decode strings in this form. Neither class is meant to be instantiated:
public class URLDecoder extends Object
public class URLEncoder extends Object

First, URLEncoder

In Java 1.3 and earlier, java.net.URLEncoder had a single static method, encode(), which encodes a string according to the rules above:
public static String encode(String s)

This method always uses the default encoding of the platform it runs on, so it produces different results on different systems. As a result, in Java 1.4 this method was deprecated and replaced by one that requires you to specify the encoding yourself:

public static String encode(String s, String encoding) throws UnsupportedEncodingException

Both encoding methods convert every non-alphanumeric character to %xx, except the space, underscore (_), hyphen (-), period (.), and asterisk (*). Both also encode all non-ASCII characters, and both convert the space to a plus sign. These methods are a little overzealous: they also convert "~", "'", and "()" to %xx, even though this is not strictly necessary. However, the URL specification does not forbid this conversion, so web browsers handle these over-encoded URLs without trouble.
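The over-encoding is easy to observe by encoding the characters just mentioned. A small sketch, assuming UTF-8; the class name is illustrative:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class OverEncoding {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8", ex);
        }
    }

    public static void main(String[] args) {
        // Left alone: letters, digits, and - _ . *
        System.out.println(encode("aZ09-_.*")); // aZ09-_.*
        // Escaped even though URLs allow them unencoded: ~ ' ( ) !
        System.out.println(encode("~'()!"));    // %7E%27%28%29%21
        // The space becomes a plus sign
        System.out.println(encode("a b"));      // a+b
    }
}
```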

Both methods return a new, encoded string. The Java 1.3 encode() method uses the platform's default encoding to produce the %xx escapes. Typical defaults are ISO-8859-1 on U.S. Unix systems, Cp1252 on U.S. Windows systems, MacRoman on U.S. Macs, and various local character sets elsewhere. Because the encoding and decoding process depends on the local platform, these methods are unreliable and not portable.
This explains why the method was deprecated in Java 1.4 in favor of one that requires you to specify the encoding. However, if you insist on using your platform's default encoding, your program will be just as platform-dependent as a Java 1.3 program. For the encoding argument, you should always use UTF-8 rather than anything else: UTF-8 is compatible with more new web browsers and other software than any other encoding you could choose.
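On Java 10 and later, URLEncoder and URLDecoder also have overloads that take a java.nio.charset.Charset instead of a charset name. Passing StandardCharsets.UTF_8 follows the advice above and removes the UnsupportedEncodingException handling, since UTF-8 is always available. A sketch, assuming Java 10 or newer; the Utf8Codec name is illustrative:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Codec {
    // Always encode with UTF-8, never the platform default.
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Always decode with UTF-8, matching encode().
    static String decode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Java I/O \u00e9";
        String encoded = encode(original);
        System.out.println(encoded);                           // Java+I%2FO+%C3%A9
        System.out.println(decode(encoded).equals(original));  // true
    }
}
```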

Example 7-8 uses URLEncoder.encode() to print a variety of encoded strings. It must be compiled and run with Java 1.4 or later.

Example 7-8. x-www-form-urlencoded strings

import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;

public class EncoderTest {

  public static void main(String[] args) {
    try {
      System.out.println(URLEncoder.encode("This string has spaces", "UTF-8"));
      System.out.println(URLEncoder.encode("This*string*has*asterisks", "UTF-8"));
      System.out.println(URLEncoder.encode("This%string%has%percent%signs", "UTF-8"));
      System.out.println(URLEncoder.encode("This+string+has+pluses", "UTF-8"));
      System.out.println(URLEncoder.encode("This/string/has/slashes", "UTF-8"));
      System.out.println(URLEncoder.encode("This\"string\"has\"quote\"marks", "UTF-8"));
      System.out.println(URLEncoder.encode("This:string:has:colons", "UTF-8"));
      System.out.println(URLEncoder.encode("This~string~has~tildes", "UTF-8"));
      System.out.println(URLEncoder.encode("This(string)has(parentheses)", "UTF-8"));
      System.out.println(URLEncoder.encode("This.string.has.periods", "UTF-8"));
      System.out.println(URLEncoder.encode("This=string=has=equals=signs", "UTF-8"));
      System.out.println(URLEncoder.encode("This&string&has&ampersands", "UTF-8"));
      // The next line contains non-ASCII characters; see the compiler note below.
      System.out.println(URLEncoder.encode("Thiséstringéhasénon-ASCII characters", "UTF-8"));
      System.out.println(URLEncoder.encode("This People's Republic of China", "UTF-8"));
    } catch (UnsupportedEncodingException ex) {
      throw new RuntimeException("Broken VM does not support UTF-8");
    }
  }
}

Here is the output. Note that the source file must be saved in an encoding that can represent the non-ASCII characters (such as UTF-8) rather than as plain ASCII, and the encoding you choose must be passed to the compiler as an argument so that it interprets the non-ASCII characters in the source code correctly.

% javac -encoding UTF8 EncoderTest.java
% java EncoderTest
This+string+has+spaces
This*string*has*asterisks
This%25string%25has%25percent%25signs
This%2Bstring%2Bhas%2Bpluses
This%2Fstring%2Fhas%2Fslashes
This%22string%22has%22quote%22marks
This%3Astring%3Ahas%3Acolons
This%7Estring%7Ehas%7Etildes
This%28string%29has%28parentheses%29
This.string.has.periods
This%3Dstring%3Dhas%3Dequals%3Dsigns
This%26string%26has%26ampersands
This%C3%A9string%C3%A9has%C3%A9non-ASCII+characters
This+People%27s+Republic+of+China

Note in particular that this method encodes the symbols /, &, =, and :. It makes no attempt to determine how these characters are being used in a URL, so you have to encode your URL piece by piece rather than passing the entire URL to this method at once. This matters because the most common use of URLEncoder is preparing query strings for server-side programs that use the GET method. For example, suppose you want to encode this query string, which is used to search the AltaVista website:
pg=q&kl=XX&stype=stext&q=+"Java+I/O"&search.x=38&search.y=3

This code encodes it:
String query = URLEncoder.encode("pg=q&kl=XX&stype=stext&q=+\"Java+I/O\"&search.x=38&search.y=3", "UTF-8");
System.out.println(query);

Unfortunately, the resulting output is:
pg%3Dq%26kl%3DXX%26stype%3Dstext%26q%3D%2B%22Java%2BI%2FO%22%26search.x%3D38%26search.y%3D3

The problem is that URLEncoder.encode() encodes blindly. It cannot distinguish special characters that serve as delimiters in a URL or query string (such as "=" and "&" in the preceding string) from characters that really need to be encoded. So the URL must be encoded one piece at a time, as follows:

String query = URLEncoder.encode("pg", "UTF-8");
query += "=";
query += URLEncoder.encode("q", "UTF-8");
query += "&";
query += URLEncoder.encode("kl", "UTF-8");
query += "=";
query += URLEncoder.encode("XX", "UTF-8");
query += "&";
query += URLEncoder.encode("stype", "UTF-8");
query += "=";
query += URLEncoder.encode("stext", "UTF-8");
query += "&";
query += URLEncoder.encode("q", "UTF-8");
query += "=";
query += URLEncoder.encode("+\"Java I/O\"", "UTF-8");
query += "&";
query += URLEncoder.encode("search.x", "UTF-8");
query += "=";
query += URLEncoder.encode("38", "UTF-8");
query += "&";
query += URLEncoder.encode("search.y", "UTF-8");
query += "=";
query += URLEncoder.encode("3", "UTF-8");
System.out.println(query);

This is the output you really want:
pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&search.x=38&search.y=3

Example 7-9 shows a QueryString class. It uses URLEncoder to encode successive name-value pairs into a Java object, which can then be used to send data to server-side programs.

When you create a QueryString object, you pass the first name-value pair of the query string to the constructor. To add further pairs, call the add() method, which also takes two strings as arguments and encodes them. The getQuery() method returns the entire string of encoded name-value pairs.

Example 7-9. The QueryString class
package com.macfaq.net;

import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;

public class QueryString {

  // Accumulates the encoded name=value pairs, separated by '&'.
  private StringBuffer query = new StringBuffer();

  public QueryString(String name, String value) {
    encode(name, value);
  }

  // Appends a further pair, preceded by the '&' delimiter.
  public synchronized void add(String name, String value) {
    query.append('&');
    encode(name, value);
  }

  // Encodes the name and the value separately so that the '=' between them
  // (and the '&' between pairs) stays unencoded.
  private synchronized void encode(String name, String value) {
    try {
      query.append(URLEncoder.encode(name, "UTF-8"));
      query.append('=');
      query.append(URLEncoder.encode(value, "UTF-8"));
    } catch (UnsupportedEncodingException ex) {
      throw new RuntimeException("Broken VM does not support UTF-8");
    }
  }

  public String getQuery() {
    return query.toString();
  }

  public String toString() {
    return getQuery();
  }
}

With this class, we can now encode the query string from the previous example:
QueryString qs = new QueryString("pg", "q");
qs.add("kl", "XX");
qs.add("stype", "stext");
qs.add("q", "+\"Java I/O\"");
qs.add("search.x", "38");
qs.add("search.y", "3");
String url = "http://www.altavista.com/cgi-bin/query?" + qs;
System.out.println(url);

Second, URLDecoder
The URLDecoder class, the counterpart of URLEncoder, has two static methods that decode strings encoded in x-www-form-urlencoded form. That is, they convert every plus sign (+) to a space and every %xx escape to the corresponding character:
public static String decode(String s) throws Exception
public static String decode(String s, String encoding) throws UnsupportedEncodingException // Java 1.4

The first method is used in Java 1.2 and 1.3; the second is used in Java 1.4 and later. If you are unsure which encoding to use, choose UTF-8: it is more likely than any other encoding to give correct results.

The method should throw an IllegalArgumentException if the string contains a "%" that is not followed by two hexadecimal digits, or if it decodes into an illegal sequence. However, this is not guaranteed: whether an IllegalArgumentException is thrown when an illegal sequence is detected depends on the implementation, and what happens if it is not thrown is undefined. Sun's JDK 1.4 throws no exception in this case; instead it adds unexplained bytes to the string that could not be decoded. This is a real headache and possibly a security hole.
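Because this behavior has varied between JDKs, it is worth checking on the runtime you actually use. The sketch below wraps decode() so that malformed escapes yield null instead of an exception. On recent JDKs (Java 8 and later, as far as I know), a trailing "%" or non-hex digits after "%" throw IllegalArgumentException, while bytes that are not valid UTF-8 are silently replaced with U+FFFD rather than rejected. The tryDecode name and the null-on-error policy are hypothetical choices for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeCheck {
    // Hypothetical helper: returns null instead of letting IllegalArgumentException escape.
    static String tryDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException ex) {
            // Thrown for a trailing "%" or non-hexadecimal digits after "%".
            return null;
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8", ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("100%25")); // 100%
        System.out.println(tryDecode("100%"));   // null (incomplete escape)
        System.out.println(tryDecode("%zz"));    // null (not hexadecimal)
        // A byte that is not valid UTF-8 is not rejected; it becomes U+FFFD.
        System.out.println("\uFFFD".equals(tryDecode("%FF"))); // true
    }
}
```

The last case is the modern version of the problem described above: the input is not valid UTF-8, yet decoding "succeeds", so code that needs strict validation has to check for the replacement character itself.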

Because this method leaves unescaped characters untouched, you can pass an entire URL to it instead of splitting the URL into pieces as before. For example:
String input = "http://www.altavista.com/cgi-bin/"
    + "query?pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&search.x=38&search.y=3";
try {
  String output = URLDecoder.decode(input, "UTF-8");
  System.out.println(output);
} catch (UnsupportedEncodingException ex) {
  throw new RuntimeException("Broken VM does not support UTF-8");
}
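Decoding the whole URL is fine for display, but it loses information if a value itself contained an encoded "&" or "=" (%26 or %3D): after decoding, those characters can no longer be told apart from the delimiters. To recover the individual pairs, split on "&" and "=" first and decode each piece afterwards. A minimal sketch, roughly the inverse of the QueryString class, assuming UTF-8; parseQuery is a hypothetical helper, a pair without "=" is treated as a name with an empty value, and a repeated name keeps only its last value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParser {
    // Split on the delimiters first, then decode each name and value separately.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
                String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
                result.put(URLDecoder.decode(rawName, "UTF-8"),
                           URLDecoder.decode(rawValue, "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new RuntimeException("Broken VM does not support UTF-8", ex);
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "pg=q&kl=XX&stype=stext&q=%2B%22Java+I%2FO%22&note=a%26b";
        Map<String, String> params = parseQuery(query);
        System.out.println(params.get("q"));    // +"Java I/O"
        System.out.println(params.get("note")); // a&b
    }
}
```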

This article is licensed under the Creative Commons Attribution 2.5 China Mainland License. You are welcome to reprint it, adapt it, or use it for commercial purposes, but you must retain the author attribution, Shimo (including the link). If you have any questions or wish to discuss licensing, please leave me a message. If you find the article useful, donations are welcome.
