Solving URL Encoding Problems in Java

Source: Internet
Author: User

First, look at the difference between the encodeURI and encodeURIComponent functions in JavaScript.

encodeURI: ASCII letters and digits are not encoded, and neither are these ASCII punctuation marks: - _ . ! ~ * ' ( ). In addition, encodeURI() does not escape the following ASCII punctuation marks, which have special meaning in a URI: ; / ? : @ & = + $ , #

encodeURIComponent: ASCII letters and digits are not encoded, and neither are these ASCII punctuation marks: - _ . ! ~ * ' ( ). Unlike encodeURI, it does escape the punctuation marks that have special meaning in a URI.

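In Java, the URLEncoder.encode(String content, String enc) method behaves as follows:

ASCII letters and digits are not encoded, and neither are these ASCII punctuation marks: - _ . * A space is converted to a plus sign (+), and every other character is percent-encoded.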
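To make the difference concrete, here is a minimal sketch that runs the punctuation discussed above through URLEncoder.encode and then post-processes the result to approximate JavaScript's encodeURIComponent. The class name EncodeComparison and the helpers javaEncode and encodeURIComponentLike are hypothetical names chosen for this illustration; only URLEncoder.encode(String, String) comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeComparison {

    // Hypothetical helper: plain URLEncoder.encode with UTF-8.
    static String javaEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: approximates encodeURIComponent by undoing the
    // encodings that URLEncoder applies but encodeURIComponent does not.
    static String encodeURIComponentLike(String s) {
        return javaEncode(s)
                .replace("+", "%20")   // URLEncoder turns a space into '+'
                .replace("%21", "!")
                .replace("%27", "'")
                .replace("%28", "(")
                .replace("%29", ")")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        String input = "a b-_.*!~'()";
        System.out.println(javaEncode(input));             // a+b-_.*%21%7E%27%28%29
        System.out.println(encodeURIComponentLike(input)); // a%20b-_.*!~'()
    }
}
```

Replacing "+" with "%20" is safe here because a literal plus sign in the input has already been encoded as %2B by URLEncoder, so every remaining "+" stands for a space.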

The reference code is as follows:

    dontNeedEncoding = new BitSet(256);
    int i;
    for (i = 'a'; i <= 'z'; i++) {
        dontNeedEncoding.set(i);
    }
    for (i = 'A'; i <= 'Z'; i++) {
        dontNeedEncoding.set(i);
    }
    for (i = '0'; i <= '9'; i++) {
        dontNeedEncoding.set(i);
    }
    dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                * in the encode() method */
    dontNeedEncoding.set('-');
    dontNeedEncoding.set('_');
    dontNeedEncoding.set('.');
    dontNeedEncoding.set('*');

If you want to encode a URL in Java without escaping the ASCII punctuation marks that have special meaning in a URI, add those characters to dontNeedEncoding in an encoding class of your own, here called MyURIEncoder:

  

    package com.sitech.solr.util;

    import java.io.CharArrayWriter;
    import java.io.UnsupportedEncodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.IllegalCharsetNameException;
    import java.nio.charset.UnsupportedCharsetException;
    import java.util.BitSet;

    public class MyURIEncoder {
        static BitSet dontNeedEncoding;
        static final int caseDiff = ('a' - 'A');
        static String dfltEncName = null;

        static {
            /* The list of characters that are not encoded has been
             * determined as follows:
             *
             * RFC 2396 states:
             * -----
             * Data characters that are allowed in a URI but do not have a
             * reserved purpose are called unreserved.  These include upper
             * and lower case letters, decimal digits, and a limited set of
             * punctuation marks and symbols.
             *
             * unreserved  = alphanum | mark
             *
             * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
             *
             * Unreserved characters can be escaped without changing the
             * semantics of the URI, but this should not be done unless the
             * URI is being used in a context that does not allow the
             * unescaped character to appear.
             * -----
             *
             * It appears that both Netscape and Internet Explorer escape
             * all special characters from this list with the exception
             * of "-", "_", ".", "*". While it is not clear why they are
             * escaping the other characters, perhaps it is safest to
             * assume that there might be contexts in which the others
             * are unsafe if not escaped. Therefore, we will use the same
             * list. It is also noteworthy that this is consistent with
             * O'Reilly's "HTML: The Definitive Guide" (page 164).
             *
             * As a last note, Internet Explorer does not encode the "@"
             * character which is clearly not unreserved according to the
             * RFC. We are being consistent with the RFC in this matter,
             * as is Netscape.
             */
            dontNeedEncoding = new BitSet(256);
            int i;
            for (i = 'a'; i <= 'z'; i++) {
                dontNeedEncoding.set(i);
            }
            for (i = 'A'; i <= 'Z'; i++) {
                dontNeedEncoding.set(i);
            }
            for (i = '0'; i <= '9'; i++) {
                dontNeedEncoding.set(i);
            }
            dontNeedEncoding.set(' '); /* encoding a space to a + is done
                                        * in the encode() method */
            dontNeedEncoding.set('-');
            dontNeedEncoding.set('_');
            dontNeedEncoding.set('.');
            dontNeedEncoding.set('*');

            // The following ASCII punctuation marks have special meaning
            // in a URI and are not escaped:  ; / ? : @ & = + $ , #
            dontNeedEncoding.set(';');
            dontNeedEncoding.set('/');
            dontNeedEncoding.set('?');
            dontNeedEncoding.set(':');
            dontNeedEncoding.set('@');
            dontNeedEncoding.set('&');
            dontNeedEncoding.set('=');
            dontNeedEncoding.set('+');
            dontNeedEncoding.set('$');
            dontNeedEncoding.set(',');
            dontNeedEncoding.set('#');

            // The JDK original reads this property through AccessController and
            // the internal sun.security.action.GetPropertyAction class;
            // System.getProperty is used here so the class compiles outside the JDK.
            dfltEncName = System.getProperty("file.encoding");
        }

        /**
         * You can't call the constructor.
         */
        private MyURIEncoder() { }

        public static String encode(String s, String enc)
                throws UnsupportedEncodingException {

            boolean needToChange = false;
            StringBuffer out = new StringBuffer(s.length());
            Charset charset;
            CharArrayWriter charArrayWriter = new CharArrayWriter();

            if (enc == null)
                throw new NullPointerException("charsetName");

            try {
                charset = Charset.forName(enc);
            } catch (IllegalCharsetNameException e) {
                throw new UnsupportedEncodingException(enc);
            } catch (UnsupportedCharsetException e) {
                throw new UnsupportedEncodingException(enc);
            }

            for (int i = 0; i < s.length();) {
                int c = (int) s.charAt(i);
                //System.out.println("Examining character: " + c);
                if (dontNeedEncoding.get(c)) {
                    if (c == ' ') {
                        c = '+';
                        needToChange = true;
                    }
                    //System.out.println("Storing: " + c);
                    out.append((char) c);
                    i++;
                } else {
                    // convert to external encoding before hex conversion
                    do {
                        charArrayWriter.write(c);
                        /*
                         * If this character represents the start of a Unicode
                         * surrogate pair, then pass in two characters. It's not
                         * clear what should be done if a byte reserved in the
                         * surrogate pairs range occurs outside of a legal
                         * surrogate pair. For now, just treat it as if it were
                         * any other character.
                         */
                        if (c >= 0xD800 && c <= 0xDBFF) {
                            /* System.out.println(Integer.toHexString(c) + " is high surrogate"); */
                            if ((i + 1) < s.length()) {
                                int d = (int) s.charAt(i + 1);
                                /* System.out.println("\tExamining " + Integer.toHexString(d)); */
                                if (d >= 0xDC00 && d <= 0xDFFF) {
                                    /* System.out.println("\t" + Integer.toHexString(d) + " is low surrogate"); */
                                    charArrayWriter.write(d);
                                    i++;
                                }
                            }
                        }
                        i++;
                    } while (i < s.length() && !dontNeedEncoding.get((c = (int) s.charAt(i))));

                    charArrayWriter.flush();
                    String str = new String(charArrayWriter.toCharArray());
                    byte[] ba = str.getBytes(charset);
                    for (int j = 0; j < ba.length; j++) {
                        out.append('%');
                        char ch = Character.forDigit((ba[j] >> 4) & 0xF, 16);
                        // converting to use uppercase letter as part of
                        // the hex value if ch is a letter.
                        if (Character.isLetter(ch)) {
                            ch -= caseDiff;
                        }
                        out.append(ch);
                        ch = Character.forDigit(ba[j] & 0xF, 16);
                        if (Character.isLetter(ch)) {
                            ch -= caseDiff;
                        }
                        out.append(ch);
                    }
                    charArrayWriter.reset();
                    needToChange = true;
                }
            }

            return (needToChange ? out.toString() : s);
        }
    }
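As a compact illustration of the same idea, the sketch below keeps a BitSet of characters that pass through unchanged and hands every other run of characters to the JDK's URLEncoder. It is not the class from this article: the name ReservedPreservingEncoder, the KEEP table, the fixed UTF-8 charset, and the example URL are all assumptions made for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.BitSet;

public class ReservedPreservingEncoder {

    // Characters copied through unchanged: URLEncoder's own safe set
    // plus the URI punctuation  ; / ? : @ & = + $ , #
    private static final BitSet KEEP = new BitSet(256);

    static {
        for (int c = 'a'; c <= 'z'; c++) KEEP.set(c);
        for (int c = 'A'; c <= 'Z'; c++) KEEP.set(c);
        for (int c = '0'; c <= '9'; c++) KEEP.set(c);
        for (char c : "-_.*;/?:@&=+$,#".toCharArray()) KEEP.set(c);
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (KEEP.get(c)) {
                out.append(c);
                i++;
            } else {
                // Collect the whole run of characters that need encoding so
                // that surrogate pairs reach URLEncoder intact.
                int start = i;
                while (i < s.length() && !KEEP.get(s.charAt(i))) {
                    i++;
                }
                try {
                    out.append(URLEncoder.encode(s.substring(start, i), "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    // UTF-8 is a charset every Java platform must support.
                    throw new IllegalStateException(e);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is the letter e with an acute accent.
        String url = "http://example.com/search?q=caf\u00e9 au lait&page=1#top";
        System.out.println(encode(url));
        // http://example.com/search?q=caf%C3%A9+au+lait&page=1#top
    }
}
```

With this approach a space becomes "+" while a literal "+" is also left as is, so the two cannot be told apart after encoding. If the input may contain real plus signs, it is probably safer to leave "+" out of the pass-through set.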
