URI, URL, and URN



Class URI
public final class URI
extends Object
implements Comparable<URI>, Serializable

Represents a Uniform Resource Identifier (URI) reference.

Aside from some minor deviations noted below, an instance of this class represents a URI reference as defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, amended by RFC 2732: Format for Literal IPv6 Addresses in URLs. The literal IPv6 address format also supports scope_ids; their syntax and usage are described in the Inet6Address documentation. This class provides constructors for creating URI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing URI instances. Instances of this class are immutable.

URI syntax and components

At the highest level, a URI reference (hereinafter simply "URI") in string form has the syntax

    [scheme:]scheme-specific-part[#fragment]

where square brackets [...] delineate optional components and the characters : and # stand for themselves.

An absolute URI specifies a scheme; a URI that is not absolute is said to be relative. URIs are also classified according to whether they are opaque or hierarchical.

An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:

    mailto:java-net@java.sun.com
    news:comp.lang.java
    urn:isbn:096139210x

A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:

    http://java.sun.com/j2se/1.3/
    docs/guide/collections/designfaq.html#28
    ../../../demo/jfc/SwingSet2/src/SwingSet2.java
    file:///~/calendar
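
The classification described above can be checked with isOpaque() and isAbsolute(). The following is a minimal sketch; the class name UriKindDemo and the helper classify are made up for illustration and are not part of the URI class.

```java
import java.net.URI;

final class UriKindDemo {
    /** Returns a short label describing how java.net.URI classifies the string. */
    static String classify(String s) {
        URI u = URI.create(s);
        if (u.isOpaque()) {
            // Absolute, and the scheme-specific part does not begin with '/'.
            return "opaque";
        }
        // Everything else is hierarchical; it is absolute only if it has a scheme.
        return u.isAbsolute() ? "hierarchical absolute" : "hierarchical relative";
    }

    public static void main(String[] args) {
        String[] samples = {
            "mailto:java-net@java.sun.com",
            "urn:isbn:096139210x",
            "http://java.sun.com/j2se/1.3/",
            "docs/guide/collections/designfaq.html#28",
            "file:///~/calendar"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```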

A hierarchical URI is subject to further parsing according to the syntax

    [scheme:][//authority][path][?query][#fragment]

where the characters :, /, ?, and # stand for themselves. The scheme-specific part of a hierarchical URI consists of the characters between the scheme and fragment components.

The authority component of a hierarchical URI is, if specified, either server-based or registry-based. A server-based authority parses according to the familiar syntax

    [user-info@]host[:port]

where the characters @ and : stand for themselves. Nearly all URI schemes currently in use are server-based. An authority component that does not parse in this way is considered to be registry-based.
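
As a rough illustration of the two kinds of authority, the sketch below uses parseServerAuthority() to test whether an authority parses as [user-info@]host[:port]. The class name AuthorityDemo, the helper isServerBased, and the hosts example.com and foo_bar are assumptions chosen for the example; an underscore is not permitted in a host name, so that authority is treated as registry-based.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AuthorityDemo {
    /** True if the URI has an authority that parses as [user-info@]host[:port]. */
    static boolean isServerBased(String s) {
        URI u = URI.create(s);
        try {
            // Throws if an authority is present but cannot be parsed as server-based.
            u.parseServerAuthority();
            return u.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI server = URI.create("http://user@example.com:8080/index.html");
        System.out.println(server.getUserInfo());   // user-info component
        System.out.println(server.getHost());       // host component
        System.out.println(server.getPort());       // port component

        URI registry = URI.create("http://foo_bar/index.html");
        System.out.println(registry.getAuthority()); // authority is still defined
        System.out.println(registry.getHost());      // host is undefined (null)
        System.out.println(isServerBased("http://foo_bar/index.html"));
    }
}
```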

The path component of a hierarchical URI is itself said to be absolute if it begins with a slash character ('/'); otherwise it is relative. The path of a hierarchical URI that is either absolute or specifies an authority is always absolute.

All told, a URI instance has the following nine components:

    Component               Type
    scheme                  String
    scheme-specific-part    String
    authority               String
    user-info               String
    host                    String
    port                    int
    path                    String
    query                   String
    fragment                String

In a given instance, any particular component is either undefined or defined with a distinct value. Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.

Whether a particular component is defined in an instance depends upon the type of the URI being represented. An absolute URI has a scheme component. An opaque URI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical URI always has a path (though it may be empty) and a scheme-specific part (which at least contains the path), and may have any of the other components. If the authority component is present and is server-based, then the host component will be defined and the user-information and port components may be defined.
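
The difference between an undefined component and an empty one can be seen by reading all nine components back through the getters. This is a small sketch only; ComponentsDemo and its components helper are hypothetical names, and example.com is a placeholder host.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class ComponentsDemo {
    /** Collects the nine components; undefined ones appear as null (or -1 for port). */
    static Map<String, Object> components(String s) {
        URI u = URI.create(s);
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("scheme", u.getScheme());
        m.put("scheme-specific-part", u.getSchemeSpecificPart());
        m.put("authority", u.getAuthority());
        m.put("user-info", u.getUserInfo());
        m.put("host", u.getHost());
        m.put("port", u.getPort());
        m.put("path", u.getPath());
        m.put("query", u.getQuery());
        m.put("fragment", u.getFragment());
        return m;
    }

    public static void main(String[] args) {
        // Relative URI: scheme, authority, query and fragment are undefined.
        System.out.println(components("docs/guide"));
        // Empty path, empty query and empty fragment are defined but empty.
        System.out.println(components("http://example.com?#"));
        // Opaque URI: no path or authority at all.
        System.out.println(components("mailto:java-net@java.sun.com"));
    }
}
```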

Operations on URI instances

The key operations supported by this class are normalization, resolution, and relativization.

Normalization is the process of removing unnecessary "." and ".." segments from the path component of a hierarchical URI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment. Normalization has no effect upon opaque URIs.
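
A short sketch of normalization using normalize(). The class name NormalizeDemo, the helper normalized, and the example.com paths are illustrative assumptions.

```java
import java.net.URI;

final class NormalizeDemo {
    /** Parses the string, normalizes its path, and returns the string form. */
    static String normalized(String s) {
        return URI.create(s).normalize().toString();
    }

    public static void main(String[] args) {
        // "." is dropped; "b/.." cancels out.
        System.out.println(normalized("http://example.com/a/./b/../c"));
        // A leading ".." has no preceding segment to cancel, so it stays.
        System.out.println(normalized("../a/./b"));
        // Opaque URIs are returned unchanged.
        System.out.println(normalized("mailto:java-net@java.sun.com"));
    }
}
```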

Resolution is the process of resolving one URI against another, base URI. The resulting URI is constructed from components of both URIs in the manner specified by RFC 2396, taking components from the base URI for those not specified in the original. For hierarchical URIs, the path of the original is resolved against the path of the base and then normalized. The result, for example, of resolving

    docs/guide/collections/designfaq.html#28          (1)

against the base URI http://java.sun.com/j2se/1.3/ is the result URI

    http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28

Resolving the relative URI

    ../../../demo/jfc/SwingSet2/src/SwingSet2.java    (2)

against this result yields, in turn,

    http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java

Resolution of both absolute and relative URIs, and of both absolute and relative paths in the case of hierarchical URIs, is supported. Resolving the URI file:///~calendar against any other URI simply yields the original URI, since it is absolute. Resolving the relative URI (2) above against the relative base URI (1) yields the normalized, but still relative, URI

    demo/jfc/SwingSet2/src/SwingSet2.java
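
The resolution cases described above can be reproduced with resolve(String). This is a sketch only; ResolveDemo and its resolve helper are hypothetical names wrapping the real URI.resolve method.

```java
import java.net.URI;

final class ResolveDemo {
    /** Resolves ref against base and returns the result in string form. */
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String base = "http://java.sun.com/j2se/1.3/";
        String rel1 = "docs/guide/collections/designfaq.html#28";
        String rel2 = "../../../demo/jfc/SwingSet2/src/SwingSet2.java";

        String first = resolve(base, rel1);
        System.out.println(first);                  // relative path appended to base path
        System.out.println(resolve(first, rel2));   // ".." segments climb back up
        System.out.println(resolve(base, "file:///~calendar")); // absolute: unchanged
        System.out.println(resolve(rel1, rel2));    // relative base: result stays relative
    }
}
```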


Relativization, finally, is the inverse of resolution: for any two normalized URIs u and v,

    u.relativize(u.resolve(v)).equals(v)  and
    u.resolve(u.relativize(v)).equals(v)

This operation is often useful when constructing a document containing URIs that must be made relative to the base URI of the document wherever possible. For example, relativizing the URI

    http://java.sun.com/j2se/1.3/docs/guide/index.html

against the base URI

    http://java.sun.com/j2se/1.3

yields the relative URI docs/guide/index.html.
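
A minimal sketch of relativization with relativize(). The class name RelativizeDemo and its helpers are made up for the example. For the round trip back through resolve, the sketch assumes a base URI whose path ends in a slash, so that resolution does not drop the last path segment of the base; example.org is a placeholder host.

```java
import java.net.URI;

final class RelativizeDemo {
    /** Expresses target relative to base, if the two share scheme, authority and path prefix. */
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    /** Resolves ref against base. */
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        String base = "http://java.sun.com/j2se/1.3/";
        String target = "http://java.sun.com/j2se/1.3/docs/guide/index.html";

        String rel = relativize(base, target);
        System.out.println(rel);
        // Round trip: resolving the relative form against the same base gives the target back.
        System.out.println(resolve(base, rel).equals(target));
        // A target on a different authority cannot be relativized and is returned as-is.
        System.out.println(relativize("http://example.org/", target));
    }
}
```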


Character categories

RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:

    alpha       The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
    digit       The US-ASCII decimal digit characters, '0' through '9'
    alphanum    All alpha and digit characters
    unreserved  All alphanum characters together with those in the string "_-!.~'()*"
    punct       The characters in the string ",;:$&+="
    reserved    All punct characters together with those in the string "?/[]@"
    escaped     Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
    other       The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (deviation from RFC 2396, which is limited to US-ASCII)

The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
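
One way to probe these categories is to see which strings the single-argument constructor accepts. The following is a sketch under that assumption; LegalCharsDemo and isLegal are hypothetical names, and example.com is a placeholder host.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LegalCharsDemo {
    /** True if the string parses as a URI reference, false if it contains illegal syntax. */
    static boolean isLegal(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLegal("http://example.com/a%20b"));      // escaped octet
        System.out.println(isLegal("http://example.com/a b"));        // raw space
        System.out.println(isLegal("http://example.com/\u20AC"));     // "other" character
        System.out.println(isLegal("http://example.com/a\u0001b"));   // control character
        System.out.println(isLegal("http://example.com/%zz"));        // malformed escape
    }
}
```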

Escaped octets, quotation, encoding, and decoding

RFC 2396 allows escaped octets to appear in the user-info, path, query, and fragment components. Escaping serves two purposes in URIs:

  • To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.

  • To quote characters that are otherwise illegal in a component. The user-info, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.

These purposes are served in this class by three related operations:

  • A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. The Euro currency symbol ('\u20AC'), for example, is encoded as "%E2%82%AC". (Deviation from RFC 2396, which does not specify any particular character set.)

  • An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". UTF-8 contains US-ASCII, hence for US-ASCII characters this transformation has exactly the effect required by RFC 2396.

  • A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the effect of de-quoting any quoted US-ASCII characters as well as that of decoding any encoded non-US-ASCII characters. If a decoding error occurs when decoding the escaped octets, then the erroneous octets are replaced by '\uFFFD', the Unicode replacement character.
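
Encoding and decoding as described above can be observed through toASCIIString() and getPath(). This is an illustrative sketch; EscapeDemo and its helpers are made-up names, and example.com is a placeholder host.

```java
import java.net.URI;

final class EscapeDemo {
    /** Encodes any "other" (non-US-ASCII) characters as UTF-8 escaped octets. */
    static String encode(String uri) {
        return URI.create(uri).toASCIIString();
    }

    /** Returns the path with all escaped octets decoded as UTF-8. */
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // The Euro sign becomes three escaped octets.
        System.out.println(encode("http://example.com/\u20AC"));
        // Decoding reverses both quotation (%20) and encoding (%E2%82%AC).
        System.out.println(decodedPath("http://example.com/a%20b"));
        System.out.println(decodedPath("http://example.com/%E2%82%AC").equals("/\u20AC"));
        // %FF is not valid UTF-8, so it decodes to the replacement character.
        System.out.println(decodedPath("http://example.com/%FF").contains("\uFFFD"));
    }
}
```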

These operations are exposed in the constructors and methods of this class as follows:

  • The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

  • The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

  • The getRawUserInfo, getRawPath, getRawQuery, getRawFragment, getRawAuthority, and getRawSchemeSpecificPart methods return the values of their corresponding components in raw form, without interpreting any escaped octets. The strings returned by these methods may contain both escaped octets and other characters, and will not contain any illegal characters.

  • The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.

  • The toString method returns a URI string with all necessary quotation but which may contain other characters.

  • The toASCIIString method returns a fully quoted and encoded URI string that does not contain any other characters.
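
The sketch below runs one path through a multi-argument constructor and then reads it back in each of the forms listed above. QuoteDemo and build are hypothetical names; the host example.com and the sample path (which contains a non-ASCII letter, a space, and a percent sign) are assumptions chosen to exercise each rule.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class QuoteDemo {
    /** Builds an http URI from an unquoted path using the four-argument constructor. */
    static URI build(String path) {
        try {
            return new URI("http", "example.com", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Path contains e-acute (an "other" character), a space, and a percent sign.
        URI u = build("/caf\u00E9 menu%");
        System.out.println(u.toString());       // space and '%' quoted, e-acute preserved
        System.out.println(u.getRawPath());     // raw form keeps the escaped octets
        System.out.println(u.getPath());        // decoded form restores the original path
        System.out.println(u.toASCIIString());  // e-acute additionally encoded as UTF-8 octets
    }
}
```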


Identities

For any URI u, it is always the case that

    new URI(u.toString()).equals(u)

For any URI u that does not contain redundant syntax such as two slashes before an empty authority (as in file:///tmp/) or a colon following a host name but no port (as in http://java.sun.com:), and that does not encode characters except those that must be quoted, the following identities also hold:

    new URI(u.getScheme(),
            u.getSchemeSpecificPart(),
            u.getFragment())
    .equals(u)

in all cases,

    new URI(u.getScheme(),
            u.getAuthority(),
            u.getPath(), u.getQuery(),
            u.getFragment())
    .equals(u)

if u is hierarchical, and

    new URI(u.getScheme(),
            u.getUserInfo(), u.getHost(), u.getPort(),
            u.getPath(), u.getQuery(),
            u.getFragment())
    .equals(u)

if u is hierarchical and has either no authority or a server-based authority.
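
The identities can be checked mechanically for a given URI. The sketch below assumes a hierarchical URI with a server-based authority so that all four constructors apply; IdentityDemo and holdsAll are made-up names, and the sample URI is a placeholder.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class IdentityDemo {
    /** Rebuilds u through each constructor form and reports whether every copy equals u. */
    static boolean holdsAll(String s) {
        URI u = URI.create(s);
        try {
            boolean fromString = new URI(u.toString()).equals(u);
            boolean fromSsp = new URI(u.getScheme(),
                                      u.getSchemeSpecificPart(),
                                      u.getFragment()).equals(u);
            boolean fromAuthority = new URI(u.getScheme(),
                                            u.getAuthority(),
                                            u.getPath(), u.getQuery(),
                                            u.getFragment()).equals(u);
            boolean fromServer = new URI(u.getScheme(),
                                         u.getUserInfo(), u.getHost(), u.getPort(),
                                         u.getPath(), u.getQuery(),
                                         u.getFragment()).equals(u);
            return fromString && fromSsp && fromAuthority && fromServer;
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(holdsAll("http://user@example.com:8080/docs/guide?x=1#top"));
    }
}
```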

URIs, URLs, and URNs

A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs), which name resources but do not specify how to locate them. The mailto, news, and isbn URIs shown above are examples of URNs.

The conceptual distinction between URIs and URLs is reflected in the differences between this class and the URL class.

An instance of this class represents a URI reference in the syntactic sense defined by RFC 2396. A URI may be either absolute or relative. A URI string is parsed according to the generic syntax without regard to the scheme, if any, that it specifies. No lookup of the host, if any, is performed, and no scheme-dependent stream handler is constructed. Equality, hashing, and comparison are defined strictly in terms of the character content of the instance. In other words, a URI instance is little more than a structured string that supports the syntactic, scheme-independent operations of comparison, normalization, resolution, and relativization.

An instance of the URL class, by contrast, represents the syntactic components of a URL together with some of the information required to access the resource that it describes. A URL must be absolute, that is, it must always specify a scheme. A URL string is parsed according to its scheme. A stream handler is always established for a URL, and in fact it is impossible to create a URL instance for a scheme for which no handler is available. Equality and hashing depend upon both the scheme and the Internet address of the host, if any; comparison is not defined. In other words, a URL is a structured string that supports the syntactic operation of resolution as well as the network I/O operations of looking up the host and opening a connection to the specified resource.
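
The two restrictions on URL mentioned above (it must be absolute, and its scheme must have a handler) show up when converting with URI.toURL(). This is a sketch only: UriVersusUrlDemo and tryToUrl are hypothetical names, and "nosuchscheme" is an invented scheme assumed to have no registered handler. No network access takes place.

```java
import java.net.MalformedURLException;
import java.net.URI;

final class UriVersusUrlDemo {
    /** Attempts the URI-to-URL conversion and reports why it failed, if it did. */
    static String tryToUrl(String s) {
        URI u = URI.create(s);   // any syntactically valid URI reference parses
        try {
            return "ok: " + u.toURL();
        } catch (IllegalArgumentException e) {
            return "not absolute";   // a URL must always specify a scheme
        } catch (MalformedURLException e) {
            return "no handler";     // no stream handler exists for the scheme
        }
    }

    public static void main(String[] args) {
        System.out.println(tryToUrl("http://example.com/index.html"));
        System.out.println(tryToUrl("docs/guide/index.html"));
        System.out.println(tryToUrl("nosuchscheme://example.com/x"));
    }
}
```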




Class URL
public final class URL
extends Object
implements Serializable


Class URL represents a Uniform Resource Locator, a pointer to a "resource" on the World Wide Web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine.


The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC 2396. It is the responsibility of the caller to encode any fields that need to be escaped prior to calling URL, and also to decode any escaped fields that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognize equivalence between the encoded or decoded form of the same URL. For example, the two URLs

    http://foo.com/hello world/ and http://foo.com/hello%20world

would be considered not equal to each other.

Note that the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().

The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC 2396.
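
As a hedged illustration of the recommendation above, the sketch below lets a multi-argument URI constructor do the quoting and then converts to a URL, and contrasts that with URLEncoder, which produces HTML form encoding. UrlEscapingDemo and its helpers are made-up names; foo.com is the host from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLEncoder;

final class UrlEscapingDemo {
    /** Quotes the path via URI, then converts the result to a URL. */
    static URL toEscapedUrl(String host, String path) {
        try {
            return new URI("http", host, path, null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    /** HTML form encoding: a space becomes '+', unlike the RFC 2396 "%20". */
    static String formEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        URL url = toEscapedUrl("foo.com", "/hello world");
        System.out.println(url);                    // path is escaped in the URL
        System.out.println(url.toURI().getPath());  // URI decodes it again
        System.out.println(formEncode("hello world"));
    }
}
```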



openConnection()

Returns a URLConnection object that represents a connection to the remote object referred to by the URL.

openConnection(Proxy proxy)

Same as openConnection(), except that the connection will be made through the specified proxy. Protocol handlers that do not support proxying will ignore the proxy parameter and make a normal connection.

openStream()

Opens a connection to this URL and returns an InputStream for reading from that connection.




