java.net
Class URI
public final class URI
extends Object
implements Comparable<URI>, Serializable
Represents a Uniform Resource Identifier (URI) reference.
Aside from some minor deviations noted below, an instance of this class represents a URI reference as defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, amended by RFC 2732: Format for Literal IPv6 Addresses in URLs. The literal IPv6 address format also supports scope_ids; their syntax and usage are described in the documentation of the Inet6Address class.
This class provides constructors for creating URI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing URI instances. Instances of this class are immutable.
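A minimal sketch of the two ways of creating an instance and of the component accessors, assuming Java 8 or later. The class name UriConstruction and the path, query (lang=en), and fragment (top) values are hypothetical; the host comes from this document's examples, and nothing is fetched because URI never contacts the host.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriConstruction {

    // Parses a complete URI string. URI.create behaves like the
    // single-argument constructor but throws the unchecked
    // IllegalArgumentException instead of URISyntaxException.
    static URI parsed() {
        return URI.create("http://java.sun.com/j2se/1.3/index.html?lang=en#top");
    }

    // Builds the same URI from its components with the seven-argument
    // constructor (scheme, userInfo, host, port, path, query, fragment).
    static URI fromComponents() {
        try {
            return new URI("http", null, "java.sun.com", -1,
                           "/j2se/1.3/index.html", "lang=en", "top");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI u = parsed();
        System.out.println(u.getScheme());   // http
        System.out.println(u.getHost());     // java.sun.com
        System.out.println(u.getPath());     // /j2se/1.3/index.html
        System.out.println(u.getQuery());    // lang=en
        System.out.println(u.getFragment()); // top
        System.out.println(u.equals(fromComponents())); // true

        // URI is immutable: normalize() returns a new instance and
        // leaves the original untouched.
        URI messy = URI.create("http://java.sun.com/j2se/./1.3/");
        URI clean = messy.normalize();
        System.out.println(messy); // http://java.sun.com/j2se/./1.3/
        System.out.println(clean); // http://java.sun.com/j2se/1.3/
    }
}
```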
URI syntax and components
At the highest level, a URI reference (hereinafter simply "URI") in string form has the syntax

    [scheme:]scheme-specific-part[#fragment]

where square brackets [...] delineate optional components and the characters : and # stand for themselves.
An absolute URI specifies a scheme; a URI that is not absolute is said to be relative. URIs are also classified according to whether they are opaque or hierarchical.
An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:

    mailto:java-net@java.sun.com
    news:comp.lang.java
    urn:isbn:096139210x
A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:

    http://java.sun.com/j2se/1.3/
    docs/guide/collections/designfaq.html#28
    ../../../demo/jfc/SwingSet2/src/SwingSet2.java
    file:///~/calendar
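The classification can be checked with URI.isOpaque() and URI.isAbsolute(). In this sketch the class UriKinds and its label strings are hypothetical; the input strings are example URIs from the lists above.

```java
import java.net.URI;

class UriKinds {

    // Classifies a URI string using the predicates URI provides.
    // An opaque URI is always absolute, so only hierarchical URIs
    // need the absolute/relative split.
    static String kind(String s) {
        URI u = URI.create(s);
        if (u.isOpaque()) {
            return "opaque";
        }
        return u.isAbsolute() ? "hierarchical, absolute" : "hierarchical, relative";
    }

    public static void main(String[] args) {
        String[] examples = {
            "mailto:java-net@java.sun.com",
            "news:comp.lang.java",
            "urn:isbn:096139210x",
            "http://java.sun.com/j2se/1.3/",
            "docs/guide/collections/designfaq.html#28",
            "file:///~/calendar"
        };
        for (String s : examples) {
            System.out.println(s + " -> " + kind(s));
        }
    }
}
```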
A hierarchical URI is subject to further parsing according to the syntax

    [scheme:][//authority][path][?query][#fragment]

where the characters :, /, ?, and # stand for themselves. The scheme-specific part of a hierarchical URI consists of the characters between the scheme and fragment components.
The authority component of a hierarchical URI is, if specified, either server-based or registry-based. A server-based authority parses according to the familiar syntax

    [user-info@]host[:port]

where the characters @ and : stand for themselves. Nearly all URI schemes currently in use are server-based. An authority component that does not parse in this way is considered to be registry-based.
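The sketch below reads the server-based sub-components with getUserInfo(), getHost(), and getPort(), and uses parseServerAuthority() to test whether an authority is server-based. The class name, the user name duke, and the host exa_mple.com are hypothetical; the underscore is there because it is not permitted in a hostname, so that authority can only be treated as registry-based.

```java
import java.net.URI;
import java.net.URISyntaxException;

class Authorities {

    // Reports the authority and its server-based sub-components.
    // String concatenation renders undefined (null) components as "null".
    static String describe(String s) {
        URI u = URI.create(s);
        return "authority=" + u.getAuthority()
             + " userInfo=" + u.getUserInfo()
             + " host=" + u.getHost()
             + " port=" + u.getPort();
    }

    // parseServerAuthority() re-parses the authority and throws
    // URISyntaxException if it cannot be parsed as server-based.
    static boolean isServerBased(String s) {
        try {
            URI.create(s).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Server-based: [user-info@]host[:port]
        System.out.println(describe("http://duke@java.sun.com:8080/j2se/"));
        // authority=duke@java.sun.com:8080 userInfo=duke host=java.sun.com port=8080

        // '_' is not legal in a hostname, so this authority is registry-based:
        // the URI still parses, but host is undefined and port is -1.
        System.out.println(describe("http://exa_mple.com/"));
        // authority=exa_mple.com userInfo=null host=null port=-1

        System.out.println(isServerBased("http://java.sun.com/")); // true
        System.out.println(isServerBased("http://exa_mple.com/")); // false
    }
}
```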
The path component of a hierarchical URI is itself said to be absolute if it begins with a slash character ('/'); otherwise it is relative. The path of a hierarchical URI that is either absolute or specifies an authority is always absolute.
All told, then, a URI instance has the following nine components:

Component            | Type
scheme               | String
scheme-specific-part | String
authority            | String
user-info            | String
host                 | String
port                 | int
path                 | String
query                | String
fragment             | String
In a given instance any particular component is either undefined or defined with a distinct value. Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.
Whether a particular component is or is not defined in an instance depends upon the type of the URI being represented. An absolute URI has a scheme component. An opaque URI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical URI always has a path (though it may be empty) and a scheme-specific part (which at least contains the path), and may have any of the other components. If the authority component is present and is server-based then the host component will be defined and the user-information and port components may be defined.
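To see the difference between undefined and empty components, this sketch prints a few components of three URIs, rendering null as "undefined". DefinedComponents is a hypothetical name; the outputs follow from the parsing rules above (an authority with nothing after it leaves an empty path, and a trailing '?' defines an empty query).

```java
import java.net.URI;

class DefinedComponents {

    // Renders a string component, distinguishing undefined (null)
    // from defined-but-empty ("").
    private static String show(String value) {
        return value == null ? "undefined" : "\"" + value + "\"";
    }

    static String summary(String s) {
        URI u = URI.create(s);
        return "path=" + show(u.getPath())
             + " query=" + show(u.getQuery())
             + " port=" + u.getPort();
    }

    public static void main(String[] args) {
        // Hierarchical with an authority but no path: the path is defined and empty.
        System.out.println(summary("http://java.sun.com"));
        // path="" query=undefined port=-1

        // A trailing '?' defines an empty query.
        System.out.println(summary("http://java.sun.com?"));
        // path="" query="" port=-1

        // Opaque: no path and no query at all.
        System.out.println(summary("mailto:java-net@java.sun.com"));
        // path=undefined query=undefined port=-1
    }
}
```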
Operations on URI instances
The key operations supported by this class are those of normalization, resolution, and relativization.
Normalization is the process of removing unnecessary "." and ".." segments from the path component of a hierarchical URI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment. Normalization has no effect upon opaque URIs.
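Normalization is exposed as URI.normalize(). A short sketch, with hypothetical input paths, showing that "." segments disappear, that a ".." is removed together with the non-".." segment before it, that leading ".." segments survive, and that an opaque URI is returned unchanged:

```java
import java.net.URI;

class Normalization {

    static String normalized(String s) {
        return URI.create(s).normalize().toString();
    }

    public static void main(String[] args) {
        // "." removed; "docs/.." removed as a pair; trailing slash kept.
        System.out.println(normalized("http://java.sun.com/j2se/1.3/./docs/../api/"));
        // http://java.sun.com/j2se/1.3/api/

        // The leading ".." segments have no non-".." predecessor, so they stay.
        System.out.println(normalized("../../a/./b")); // ../../a/b

        // Opaque URIs are not affected.
        System.out.println(normalized("mailto:java-net@java.sun.com"));
        // mailto:java-net@java.sun.com
    }
}
```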
Resolution is the process of resolving one URI against another, base URI. The resulting URI is constructed from components of both URIs in the manner specified by RFC 2396, taking components from the base URI for those not specified in the original. For hierarchical URIs, the path of the original is resolved against the path of the base and then normalized. The result, for example, of resolving

    docs/guide/collections/designfaq.html#28                                  (1)

against the base URI http://java.sun.com/j2se/1.3/ is the result URI

    http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28    (3)

Resolving the relative URI

    ../../../demo/jfc/SwingSet2/src/SwingSet2.java                            (2)

against this result (3) yields, in turn,

    http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java

Resolution of both absolute and relative URIs, and of both absolute and relative paths in the case of hierarchical URIs, is supported. Resolving the URI file:///~calendar against any other URI simply yields the original URI, since it is absolute. Resolving the relative URI (2) above against the relative base URI (1) yields the normalized, but still relative, URI

    demo/jfc/SwingSet2/src/SwingSet2.java
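The resolution examples can be reproduced with URI.resolve(URI). This is a sketch: the class name Resolution is hypothetical, and the relative reference (2) is written as ../../../demo/jfc/SwingSet2/src/SwingSet2.java, which climbs from docs/guide/collections/ back to the 1.3/ directory.

```java
import java.net.URI;

class Resolution {

    static final URI BASE = URI.create("http://java.sun.com/j2se/1.3/");
    // (1) relative reference with a fragment
    static final URI REL1 = URI.create("docs/guide/collections/designfaq.html#28");
    // (2) relative reference that climbs three directories
    static final URI REL2 = URI.create("../../../demo/jfc/SwingSet2/src/SwingSet2.java");

    public static void main(String[] args) {
        URI r3 = BASE.resolve(REL1); // (3)
        System.out.println(r3);
        // http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28

        // Resolving against (3): the last path segment of the base
        // (designfaq.html) is dropped before (2) is appended and the
        // result normalized; the base fragment is not carried over.
        System.out.println(r3.resolve(REL2));
        // http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java

        // An absolute URI resolves to itself, whatever the base.
        URI calendar = URI.create("file:///~calendar");
        System.out.println(BASE.resolve(calendar).equals(calendar)); // true

        // Relative against relative: normalized, but still relative.
        System.out.println(REL1.resolve(REL2));
        // demo/jfc/SwingSet2/src/SwingSet2.java
    }
}
```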
Relativization, finally, is the inverse of resolution: for any two normalized URIs u and v,

    u.relativize(u.resolve(v)).equals(v)   and
    u.resolve(u.relativize(v)).equals(v).
This operation is often useful when constructing a document containing URIs that must be made relative to the base URI of the document wherever possible. For example, relativizing the URI

    http://java.sun.com/j2se/1.3/docs/guide/index.html

against the base URI

    http://java.sun.com/j2se/1.3

yields the relative URI docs/guide/index.html.
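A sketch of URI.relativize(URI) using the example above, followed by a check of the two identities. The class name is hypothetical, and so is the host example.com in the last call, which shows that a target outside the base is returned unchanged. The identity checks use a base path ending in '/', because resolving against http://java.sun.com/j2se/1.3 would replace the final segment 1.3 rather than descend into it.

```java
import java.net.URI;

class Relativization {

    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // The example from the text.
        System.out.println(relativize("http://java.sun.com/j2se/1.3",
                "http://java.sun.com/j2se/1.3/docs/guide/index.html"));
        // docs/guide/index.html

        // The two identities, checked with a base whose path ends in '/'.
        URI u = URI.create("http://java.sun.com/j2se/1.3/");
        URI v = URI.create("docs/guide/index.html");
        URI w = URI.create("http://java.sun.com/j2se/1.3/docs/guide/index.html");
        System.out.println(u.relativize(u.resolve(v)).equals(v)); // true
        System.out.println(u.resolve(u.relativize(w)).equals(w)); // true

        // A target that is not under the base is returned unchanged.
        System.out.println(relativize("http://java.sun.com/j2se/1.3/",
                "http://example.com/other.html"));
        // http://example.com/other.html
    }
}
```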
Character Classification
RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:

Category   | Characters
alpha      | The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
digit      | The US-ASCII decimal digit characters, '0' through '9'
alphanum   | All alpha and digit characters
unreserved | All alphanum characters together with those in the string "_-!.~'()*"
punct      | The characters in the string ",;:$&+="
reserved   | All punct characters together with those in the string "?/[]@"
escaped    | Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
other      | The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (deviation from RFC 2396, which is limited to US-ASCII)
The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
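The categories translate directly into predicates. The following is a hypothetical helper written from the table; it is not part of java.net and not how URI checks characters internally. It works on UTF-16 chars, so a supplementary character counts as two other characters, and '#' or a '%' outside an escaped octet falls in none of the categories.

```java
class UriChars {

    static boolean isAlpha(char c)    { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
    static boolean isDigit(char c)    { return c >= '0' && c <= '9'; }
    static boolean isAlphanum(char c) { return isAlpha(c) || isDigit(c); }
    static boolean isHex(char c)      { return isDigit(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f'); }

    static boolean isUnreserved(char c) { return isAlphanum(c) || "_-!.~'()*".indexOf(c) >= 0; }
    static boolean isPunct(char c)      { return ",;:$&+=".indexOf(c) >= 0; }
    static boolean isReserved(char c)   { return isPunct(c) || "?/[]@".indexOf(c) >= 0; }

    // "other": not US-ASCII, not an ISO control character, not a space character.
    static boolean isOther(char c) {
        return c > 0x7F && !Character.isISOControl(c) && !Character.isSpaceChar(c);
    }

    // True if every character of s is unreserved, reserved, or other, and
    // every '%' begins an escaped octet ('%' followed by two hex digits).
    static boolean isLegal(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                if (i + 2 >= s.length() || !isHex(s.charAt(i + 1)) || !isHex(s.charAt(i + 2))) {
                    return false;
                }
                i += 2; // skip the two hex digits of the escaped octet
            } else if (!(isUnreserved(c) || isReserved(c) || isOther(c))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegal("http://java.sun.com/a%20b")); // true
        System.out.println(isLegal("a b"));                       // false: space
        System.out.println(isLegal("100%"));                      // false: bare '%'
        System.out.println(isLegal("caf\u00E9"));                 // true: other character
        System.out.println(isLegal("a\u00A0b"));                  // false: no-break space
    }
}
```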
Escaped octets, quotation, encoding, and decoding
RFC 2396 allows escaped octets to appear in the user-information, path, query, and fragment components. Escaping serves two purposes in URIs: to encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters, and to quote characters that are otherwise illegal in a component. The user-information, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.
These purposes are served in this class by three related operations:

A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. The Euro currency symbol ('\u20AC'), for example, is encoded as "%E2%82%AC". (Deviation from RFC 2396, which does not specify any particular character set.)

An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". UTF-8 contains US-ASCII, hence for US-ASCII characters this transformation has exactly the effect required by RFC 2396.

A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the effect of de-quoting any quoted US-ASCII characters as well as that of decoding any encoded non-US-ASCII characters. If a decoding error occurs when decoding the escaped octets then the erroneous octets are replaced by '\uFFFD', the Unicode replacement character.
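The three operations can be observed through the public API. This sketch assumes the placeholder host example.com and a hypothetical helper withPath; it uses the four-argument constructor URI(scheme, host, path, fragment), toString(), toASCIIString(), and getPath(). Results are printed as booleans so the output does not depend on the console's character encoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

class Escaping {

    // Builds http://example.com<path> with the four-argument constructor,
    // which quotes illegal characters but keeps "other" characters as-is.
    static URI withPath(String path) {
        try {
            return new URI("http", "example.com", path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI euro = withPath("/price/\u20AC");
        // toString() keeps the Euro sign; toASCIIString() encodes it as UTF-8 octets.
        System.out.println(euro.toString().equals("http://example.com/price/\u20AC")); // true
        System.out.println(euro.toASCIIString()); // http://example.com/price/%E2%82%AC

        // Quoting: the illegal space becomes %20.
        System.out.println(withPath("/hello world")); // http://example.com/hello%20world

        // Decoding: escaped UTF-8 octets turn back into the Euro sign.
        System.out.println(URI.create("http://example.com/%E2%82%AC").getPath().equals("/\u20AC")); // true

        // 0xFF on its own is not valid UTF-8, so it decodes to U+FFFD.
        System.out.println(URI.create("http://example.com/%FF").getPath().equals("/\uFFFD")); // true
    }
}
```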
These operations are exposed in the constructors and methods of this class as follows:

The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any characters in the other category are preserved.

The getRawUserInfo, getRawPath, getRawQuery, getRawFragment, getRawAuthority, and getRawSchemeSpecificPart methods return the values of their corresponding components in raw form, without interpreting any escaped octets. The strings returned by these methods may contain both escaped octets and other characters, and will not contain any illegal characters.

The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.

The toString method returns a URI string with all necessary quotation, but which may contain other characters.

The toASCIIString method returns a fully quoted and encoded URI string that does not contain any other characters.
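A sketch contrasting the constructors and the raw and decoded accessors. The class RawVsDecoded and its paths are hypothetical, example.com is a placeholder, and '\u00E9' (é) stands in for any other character.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RawVsDecoded {

    // Single-argument form: the input must already be quoted; escaped
    // octets and other characters are preserved as given.
    static URI single() {
        return URI.create("http://example.com/a%20b/caf\u00E9");
    }

    // Multi-argument form: '%' is always quoted, so the "%20" in the
    // path argument is taken literally and becomes "%2520".
    static URI multi() {
        try {
            return new URI("http", "example.com", "/a%20b", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The single-argument constructor rejects unquoted illegal characters.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI u = single();
        System.out.println(u.getRawPath().equals("/a%20b/caf\u00E9")); // true: escapes kept
        System.out.println(u.getPath().equals("/a b/caf\u00E9"));      // true: decoded
        System.out.println(u.toASCIIString()); // http://example.com/a%20b/caf%C3%A9

        URI m = multi();
        System.out.println(m.getRawPath()); // /a%2520b
        System.out.println(m.getPath());    // /a%20b

        System.out.println(parses("http://example.com/a b")); // false
    }
}
```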
Identities

For any URI u, it is always the case that

    new URI(u.toString()).equals(u)

For any URI u that does not contain redundant syntax such as two slashes before an empty authority (as in file:///tmp/) or a colon following a host name but no port (as in http://java.sun.com:), and that does not encode characters except those that must be quoted, the following identities also hold:

    new URI(u.getScheme(),
            u.getSchemeSpecificPart(),
            u.getFragment())
    .equals(u)

in all cases,

    new URI(u.getScheme(),
            u.getAuthority(),
            u.getPath(), u.getQuery(),
            u.getFragment())
    .equals(u)

if u is hierarchical, and

    new URI(u.getScheme(),
            u.getUserInfo(), u.getHost(), u.getPort(),
            u.getPath(), u.getQuery(),
            u.getFragment())
    .equals(u)

if u is hierarchical and has either no authority or a server-based authority.
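The identities can be spot-checked on sample URIs. In this sketch the class name, the user name duke, and the path, query, and fragment are hypothetical. It uses the five-argument constructor URI(scheme, authority, path, query, fragment) and the seven-argument constructor, evaluates those two forms only for hierarchical input, and assumes that input has a server-based authority or none.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;

class Identities {

    // Returns the results of the identities that apply to u:
    // two entries for an opaque URI, four for a hierarchical one.
    static boolean[] check(URI u) {
        try {
            boolean viaString = new URI(u.toString()).equals(u);
            boolean viaSsp = new URI(u.getScheme(), u.getSchemeSpecificPart(),
                                     u.getFragment()).equals(u);
            if (u.isOpaque()) {
                return new boolean[] { viaString, viaSsp };
            }
            boolean viaAuthority = new URI(u.getScheme(), u.getAuthority(),
                                           u.getPath(), u.getQuery(),
                                           u.getFragment()).equals(u);
            // Only meaningful with a server-based authority or none at all.
            boolean viaServer = new URI(u.getScheme(), u.getUserInfo(),
                                        u.getHost(), u.getPort(),
                                        u.getPath(), u.getQuery(),
                                        u.getFragment()).equals(u);
            return new boolean[] { viaString, viaSsp, viaAuthority, viaServer };
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI h = URI.create("http://duke@java.sun.com:8080/j2se/1.3/index.html?lang=en#top");
        System.out.println(Arrays.toString(check(h))); // [true, true, true, true]

        URI o = URI.create("mailto:java-net@java.sun.com");
        System.out.println(Arrays.toString(check(o))); // [true, true]
    }
}
```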
URIs, URLs, and URNs

A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs), which name resources but do not specify how to locate them. The mailto, news, and isbn URIs shown above are examples of URNs.
The conceptual distinction between URIs and URLs is reflected in the differences between this class and the URL class.
An instance of this class represents a URI reference in the syntactic sense defined by RFC 2396. A URI may be either absolute or relative. A URI string is parsed according to the generic syntax without regard to the scheme, if any, that it specifies. No lookup of the host, if any, is performed, and no scheme-dependent stream handler is constructed. Equality, hashing, and comparison are defined strictly in terms of the character content of the instance. In other words, a URI instance is little more than a structured string that supports the syntactic, scheme-independent operations of comparison, normalization, resolution, and relativization.
An instance of the URL class, by contrast, represents the syntactic components of a URL together with some of the information required to access the resource that it describes. A URL must be absolute, that is, it must always specify a scheme. A URL string is parsed according to its scheme. A stream handler is always established for a URL, and in fact it is impossible to create a URL instance for a scheme for which no handler is available. Equality and hashing depend upon both the scheme and the Internet address of the host, if any; comparison is not defined. In other words, a URL is a structured string that supports the syntactic operation of resolution as well as the network I/O operations of looking up the host and opening a connection to the specified resource.
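A sketch of the practical differences, assuming no protocol handler is installed for the made-up scheme foo: every string below is a valid URI, but URI.toURL() rejects the relative one with IllegalArgumentException and the unknown scheme with MalformedURLException. The class name is hypothetical, and no network access occurs because neither toURL() nor URL.toString() contacts the host.

```java
import java.net.MalformedURLException;
import java.net.URI;

class UriVsUrl {

    // Tries URI -> URL conversion and reports the reason for any failure.
    static String toUrl(String s) {
        URI u = URI.create(s); // every input here is a syntactically valid URI
        try {
            return u.toURL().toString();
        } catch (IllegalArgumentException e) {
            return "rejected: not absolute";
        } catch (MalformedURLException e) {
            return "rejected: no protocol handler";
        }
    }

    public static void main(String[] args) {
        System.out.println(toUrl("http://java.sun.com/j2se/1.3/")); // http://java.sun.com/j2se/1.3/
        System.out.println(toUrl("docs/guide/index.html"));         // rejected: not absolute
        System.out.println(toUrl("foo://example.com/x"));           // rejected: no protocol handler

        // URI defines an ordering based on its components;
        // URL has no compareTo at all.
        URI a = URI.create("http://java.sun.com/a");
        URI b = URI.create("http://java.sun.com/b");
        System.out.println(a.compareTo(b) < 0); // true
    }
}
```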
java.net
Class URL
public final class URL
extends Object
implements Serializable
Class URL represents a Uniform Resource Locator, a pointer to a "resource" on the Internet. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine.
The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC 2396. It is the responsibility of the caller to encode any fields that need to be escaped prior to calling URL, and also to decode any escaped fields that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognize equivalence between the encoded and decoded forms of the same URL. For example, the two URLs

    http://foo.com/hello world   and   http://foo.com/hello%20world

would be considered not equal to each other.
Note that the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().
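A sketch of the recommended approach: let a URI constructor quote the path, convert with URI.toURL(), and go back through URL.toURI() when decoded components are needed. The class and helper names are hypothetical; foo.com is the host from the example above, and no connection is opened.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlEscaping {

    // Lets URI quote the path, then converts the result to a URL.
    static URL build(String host, String path) {
        try {
            return new URI("http", host, path, null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Converts back to URI to obtain the decoded path.
    static String decodedPath(URL url) {
        try {
            return url.toURI().getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = build("foo.com", "/hello world");
        System.out.println(url);              // http://foo.com/hello%20world
        System.out.println(url.getPath());    // /hello%20world  (URL does not decode)
        System.out.println(decodedPath(url)); // /hello world    (URI decodes)
    }
}
```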
The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC 2396.
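The difference from RFC 2396 quoting shows up with spaces and plus signs. This sketch assumes Java 10 or later for the Charset overloads of URLEncoder.encode and URLDecoder.decode; FormVsUri and pathQuote are hypothetical names, and pathQuote relies on the four-argument URI constructor to quote a relative path.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormVsUri {

    // HTML form encoding (application/x-www-form-urlencoded).
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // RFC 2396-style quoting of a relative path, performed by URI.
    static String pathQuote(String s) {
        try {
            return new URI(null, null, s, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("hello world")); // hello+world
        System.out.println(pathQuote("hello world"));  // hello%20world
        System.out.println(formEncode("a+b"));         // a%2Bb
        System.out.println(pathQuote("a+b"));          // a+b
        // Form decoding turns '+' into a space; URI decoding would not.
        System.out.println(URLDecoder.decode("a+b", StandardCharsets.UTF_8)); // a b
    }
}
```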
Return type   | Method and description
URLConnection | openConnection(): Returns a URLConnection object that represents a connection to the remote object referred to by this URL.
URLConnection | openConnection(Proxy proxy): Same as openConnection(), except that the connection is made through the specified proxy; protocol handlers that do not support proxying ignore the proxy parameter and make a normal connection.
InputStream   | openStream(): Opens a connection to this URL and returns an InputStream for reading from that connection.
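These methods can be exercised without a network by pointing a URL at a local file. This sketch assumes Java 11 or later (Files.writeString) and a writable temporary directory; the class name and file prefix are hypothetical. openConnection(Proxy) is left out because whether a handler honours or ignores the proxy depends on the protocol.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenDemo {

    // Writes text to a temporary file, then reads it back through a file: URL,
    // once with openStream() and once via openConnection().getInputStream().
    static String[] roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("url-demo", ".txt");
            try {
                Files.writeString(tmp, text, StandardCharsets.UTF_8);
                URL url = tmp.toUri().toURL();

                String viaStream;
                try (InputStream in = url.openStream()) {
                    viaStream = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }

                URLConnection conn = url.openConnection();
                String viaConnection;
                try (InputStream in = conn.getInputStream()) {
                    viaConnection = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
                return new String[] { viaStream, viaConnection };
            } finally {
                Files.deleteIfExists(tmp); // streams are closed before deletion
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = roundTrip("hello, URL");
        System.out.println(r[0]); // hello, URL
        System.out.println(r[1]); // hello, URL
    }
}
```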