URIs and URLs in Java Network Programming (I)

URIs, URLs, and URNs are the standard ways to identify, locate, and name resources on the Internet. This article examines URIs, URLs, and URNs, looks at the URI and URL classes (and URL-related classes) of the Java API, and demonstrates how to use these classes in programs.

In 1989, Tim Berners-Lee invented the World Wide Web (WWW). The WWW can be regarded as a global collection of interconnected actual and abstract resources, which provide information entities on demand and are accessed over the Internet. Actual resources range from files to people. Abstract resources include database queries. Because resources are identified in many different ways (people may share the same name, whereas a computer file can be accessed only through a unique combination of path and name), a standard way of identifying WWW resources is needed. To meet this need, Tim Berners-Lee introduced standard ways to identify, locate, and name resources: URIs, URLs, and URNs.

  What are URI, URL, and URN?

URIs, URLs, and URNs are related to one another in a hierarchy. The URI category sits at the top of the hierarchy, and the URL and URN categories sit at the bottom. This arrangement shows that URL and URN are subcategories of URI, as shown in Figure 1:


Figure 1: The hierarchy of URI, URL, and URN. URL and URN are subcategories of URI.

URI stands for Uniform Resource Identifier: a compact string that identifies a resource in a uniform (standardized) way. Typically, this string begins with a scheme, which identifies the URI's namespace (a group of related names). The syntax is as follows:

[scheme:]scheme-specific-part

A URI optionally begins with a scheme and a colon. The scheme starts with an upper- or lower-case letter, optionally followed by more letters, digits, plus signs, minus signs, and periods. The colon separates the scheme from the scheme-specific part, whose syntax and meaning are determined by the URI's namespace. One example is http://www.cnn.com, where http is the scheme, //www.cnn.com is the scheme-specific part, and the two are separated by a colon.
URIs can be classified as absolute or relative. An absolute URI is a URI that begins with a scheme (followed by a colon). The http://www.cnn.com URI mentioned above is an example of an absolute URI; other examples include mailto:jeff@javajeff.com, news:comp.lang.java.help, and xyz://whatever. You can think of an absolute URI as a reference to a resource that does not depend on the context in which the identifier appears. In file-system terms, an absolute URI is like a file path that starts at the root directory. In contrast, a relative URI is a URI that does not begin with a scheme (followed by a colon). One example is articles/articles.html. You can think of a relative URI as a reference to a resource that depends on the context in which the identifier appears. In file-system terms, a relative URI is like a file path that starts at the current directory.

URIs can be further classified as opaque or hierarchical. An opaque URI is an absolute URI whose scheme-specific part does not begin with a forward slash. Examples include news:comp.lang.java and the earlier mailto:jeff@javajeff.com. Opaque URIs are not parsed any further (beyond identifying the scheme), because the scheme-specific part is not required to follow a common structure. In contrast, a hierarchical URI is either an absolute URI whose scheme-specific part begins with a forward slash, or a relative URI.
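The two classifications can be checked with java.net.URI, the class this article covers later. The following is only a minimal sketch: the class name UriKinds and the helper classify are hypothetical names chosen for illustration, and the sample strings are the example URIs from the text.

```java
import java.net.URI;

class UriKinds
{
    // Describes a URI string using the two classifications from the text:
    // absolute vs. relative, and opaque vs. hierarchical.
    static String classify(String text)
    {
        URI uri = URI.create(text);
        String first = uri.isAbsolute() ? "absolute" : "relative";
        String second = uri.isOpaque() ? "opaque" : "hierarchical";
        return first + ", " + second;
    }

    public static void main(String[] args)
    {
        String[] samples = {
            "http://www.cnn.com",
            "mailto:jeff@javajeff.com",
            "news:comp.lang.java.help",
            "articles/articles.html"
        };
        for (String sample : samples)
            System.out.println(sample + " -> " + classify(sample));
    }
}
```

Note that a relative URI is always reported as hierarchical, because only absolute URIs can be opaque.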

Unlike an opaque URI, a hierarchical URI has a scheme-specific part that can be parsed into several components. What are these components? The scheme-specific part of a hierarchical URI conforms to the following syntax:

[//authority][path][?query][#fragment]

The optional authority component identifies the naming authority of the URI's namespace. If present, this component begins with two forward slashes, is either server-based or registry-based, and ends at the next forward slash, at a question mark, or at the end of the URI. Registry-based authority components have scheme-specific syntax (not discussed in this article because they are rarely used). Server-based authority components have the following syntax:

[userinfo@]host[:port]

According to this syntax, a server-based authority component optionally begins with user information (such as a user name) followed by an @ character, continues with the host name, and optionally ends with a colon and a port number. For example, jeff@x.com:90 is a server-based authority component, where jeff is the user information, x.com is the host, and 90 is the port.

The optional path component identifies the location of the resource relative to the authority component (if present) or the scheme (if there is no authority component). A path is divided into a series of path segments, each separated from the others by a forward slash (/). If the first path segment is preceded by a forward slash, the path is absolute; otherwise, it is relative. For example, /a/b/c consists of the three path segments a, b, and c. This path is absolute because its first path segment (a) is prefixed with a forward slash.

The optional query component identifies data to be passed to a resource. The resource uses this data to obtain or generate other data that is passed back to the caller. For example, in http://www.somesite.net/?x=y, x=y is the query. In this query, x=y is the data passed to the resource: x is the name of some entity, and y is that entity's value.

The last component is the fragment. Although this component appears as part of the URI, strictly speaking it is not. When a URI is used in a retrieval operation, the software performing the operation uses the fragment to focus on the part of the resource that is of interest (after the software has successfully retrieved the resource's data).

To see these components in practice, consider the following URI:

ftp://george@x.com:90/public/notes?text=shakespeare#hamlet

The URI above identifies ftp as the scheme, george@x.com:90 as a server-based authority (where george is the user information, x.com is the host, and 90 is the port), /public/notes as the path, text=shakespeare as the query, and hamlet as the fragment. In essence, a user named george wants to retrieve the hamlet portion of the shakespeare text through the /public/notes path on port 90 of the server x.com. After the shakespeare text has been successfully returned to the program, the program locates the hamlet fragment and presents it to the user.
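The same URI can also be assembled from its components, which makes the role of each part explicit. The sketch below assumes the seven-argument java.net.URI constructor (scheme, user information, host, port, path, query, fragment); the class name UriAssemble and the helper assemble are hypothetical names used only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriAssemble
{
    // Builds the example URI from its individual components.
    static URI assemble()
    {
        try
        {
            return new URI("ftp",               // scheme
                           "george",            // user information
                           "x.com",             // host
                           90,                  // port
                           "/public/notes",     // path
                           "text=shakespeare",  // query
                           "hamlet");           // fragment
        }
        catch (URISyntaxException e)
        {
            // Not expected for these fixed, well-formed components.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args)
    {
        URI uri = assemble();
        System.out.println(uri);
        System.out.println("authority = " + uri.getAuthority());
    }
}
```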

Hierarchical URIs can also be normalized. Normalization is easiest to understand in directory terms. Suppose directory x sits directly beneath the root directory, x contains subdirectories a and b, b contains the file memo.txt, and a is the current directory. To display the contents of memo.txt (under Microsoft Windows), you might enter type /x/./b/memo.txt. You might also enter type /x/a/../b/memo.txt. In both cases, the . and the a/.. are unnecessary, so neither form is the simplest. However, if you enter /x/b/memo.txt, you have specified the simplest path from the root directory to memo.txt. The simplest path, /x/b/memo.txt, is the normalized path.
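The directory analogy can be tried directly with java.net.URI, the class covered later in this article. This is a minimal sketch that treats each path as a path-only URI; the class name NormalizeAnalogy and the helper simplest are hypothetical.

```java
import java.net.URI;

class NormalizeAnalogy
{
    // Returns the normalized (simplest) form of a path-only URI.
    static String simplest(String path)
    {
        return URI.create(path).normalize().toString();
    }

    public static void main(String[] args)
    {
        // Both longer forms reduce to the same normalized path.
        System.out.println(simplest("/x/./b/memo.txt"));
        System.out.println(simplest("/x/a/../b/memo.txt"));
    }
}
```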

Resources are commonly accessed through base and relative URIs. A base URI is an absolute URI that uniquely identifies a namespace of resources, and a relative URI identifies a resource relative to the base URI. (Unlike the base URI, the relative URI need not change during the lifetime of a resource.) Because neither the base URI nor the relative URI completely identifies a resource on its own, the two must be merged through a process called resolution. Conversely, the relative URI can be extracted from the merged URI through relativization.

  Note:

Unlike other URIs, opaque URIs are not subject to normalization, resolution, or relativization.

Assume that x://a/ is the base URI and b/c is the relative URI. Resolving the relative URI against the base URI yields x://a/b/c. Relativizing x://a/b/c against x://a/ yields b/c.
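The example above, together with the note about opaque URIs, can be sketched with java.net.URI (covered later in this article). The class name ResolveRelativize and its two helpers are hypothetical; the opaque-base behavior shown relies on the documented rule that resolve() and relativize() return the given URI unchanged when the base is opaque.

```java
import java.net.URI;

class ResolveRelativize
{
    // Merges a relative URI with a base URI.
    static String resolve(String base, String relative)
    {
        return URI.create(base).resolve(URI.create(relative)).toString();
    }

    // Extracts the relative URI from a full URI, given the base URI.
    static String relativize(String base, String full)
    {
        return URI.create(base).relativize(URI.create(full)).toString();
    }

    public static void main(String[] args)
    {
        System.out.println(resolve("x://a/", "b/c"));
        System.out.println(relativize("x://a/", "x://a/b/c"));

        // With an opaque base, the argument comes back unchanged.
        System.out.println(resolve("mailto:jeff@javajeff.com", "b/c"));
        System.out.println(relativize("mailto:jeff@javajeff.com", "x://a/b/c"));
    }
}
```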

A URI cannot locate a resource or read from or write to it. That is the task of a Uniform Resource Locator (URL). A URL is a URI whose scheme component is a known network protocol and which associates the URI's components with a protocol handler (a resource locator and read/write mechanism that communicates with the resource according to the rules established by the protocol).

In general, a URI cannot provide a persistent name for a resource. That is the task of a Uniform Resource Name (URN). A URN is also a URI, but it is globally unique and persistent, even when the resource ceases to exist or is no longer available.

Using URI

Through the URI class (in the java.net package), the network API makes it possible to work with URIs at the source-code level. URI's constructors create URI objects that encapsulate URIs. URI's methods let you create a URI object; parse the authority component if it is server-based; extract the URI's components; determine whether the URI is absolute or relative; determine whether the URI is opaque or hierarchical; compare two URIs; normalize a URI; resolve a relative URI against the URI object's base URI to obtain the resolved URI; relativize a resolved URI against the URI object's base URI to obtain the relative URI; and convert a URI object to a URL object.
Let's take a closer look at the URI class, which has five constructors. The simplest is URI(String uri). This constructor takes the URI as a String argument, parses the URI into components, and stores these components in a new URI object. If the string (referenced by uri) violates RFC 2396 syntax rules, URI(String uri), like the other four constructors, throws a java.net.URISyntaxException.

The following code snippet demonstrates how to use URI(String uri) to create a URI object that encapsulates a simple URI:

URI uri = new URI ("http://www.cnn.com");

Typically, a URI constructor is used to create a URI object that encapsulates a user-specified URI. The constructors throw the checked URISyntaxException because the user might enter a malformed URI. This means your code must either call the constructor inside a try block and catch the exception, or "pass the buck" by listing URISyntaxException in the method's throws clause.
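The try/catch option might look like the following sketch. The class name UriChecked and the helper tryParse are hypothetical, and the second sample string is simply an assumed example of malformed input (a space is not allowed in a URI path).

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriChecked
{
    // Parses user-supplied text and reports the outcome instead of propagating
    // the checked exception to the caller.
    static String tryParse(String text)
    {
        try
        {
            URI uri = new URI(text);
            return "ok: " + uri;
        }
        catch (URISyntaxException e)
        {
            // getReason() and getIndex() carry the diagnostic details.
            return "rejected: " + e.getReason() + " (index " + e.getIndex() + ")";
        }
    }

    public static void main(String[] args)
    {
        System.out.println(tryParse("http://www.cnn.com"));
        System.out.println(tryParse("http://www.cnn.com/a b"));
    }
}
```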

If you know that the URI is valid (for example, a URI literal in your source code), URISyntaxException will never be thrown, and handling the constructor's checked exception is only a nuisance. For this situation, URI provides the static create(String uri) method. This method parses the URI contained in the String referenced by uri. If the URI violates no syntax rules, a URI object is created (and a reference to it is returned from the method). Otherwise, the internal URISyntaxException is caught, wrapped in an unchecked IllegalArgumentException, and that exception is thrown. Because IllegalArgumentException is unchecked, you need neither a try/catch block nor an entry in the throws clause.

The following code snippet demonstrates create(String uri):

URI uri = URI.create ("http://www.cnn.com");
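To see the wrapping described above, the following sketch passes a deliberately malformed string to create() and inspects the cause of the resulting unchecked exception. The class name CreateDemo and the helper wrapsSyntaxError are hypothetical names for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class CreateDemo
{
    // Returns true if create() rejected the text with an IllegalArgumentException
    // whose cause is the underlying URISyntaxException.
    static boolean wrapsSyntaxError(String text)
    {
        try
        {
            URI.create(text);
            return false;   // the text was a valid URI
        }
        catch (IllegalArgumentException e)
        {
            return e.getCause() instanceof URISyntaxException;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(wrapsSyntaxError("http://www.cnn.com"));
        System.out.println(wrapsSyntaxError("http://www.cnn.com/a b"));
    }
}
```

Because the exception is unchecked, create() is best reserved for URIs that are known to be valid, such as literals in source code.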

The URI constructors and the create(String uri) method try to parse the user information, host, and port out of a URI's authority component. For a well-formed server-based authority component, this succeeds. For a malformed server-based authority component, it fails, and the authority component is treated as registry-based. Sometimes you know that a URI's authority component must be server-based. In that case, you want either the user information, host, and port to be parsed out of the authority component, or an exception (with diagnostic information) to be thrown. Call URI's parseServerAuthority() method to accomplish this. If parsing succeeds, the method returns a reference to a new URI object containing the extracted user information, host, and port (or, if the authority component was already parsed, a reference to the URI object on which parseServerAuthority() was called). Otherwise, the method throws a URISyntaxException.

The following code snippet demonstrates parseServerAuthority():

// What happens when the following parseServerAuthority() call occurs?
URI uri = new URI ("//foo:bar").parseServerAuthority ();
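One way to explore the question posed in the snippet is the sketch below. It assumes that "bar" cannot be parsed as a numeric port, so the authority is kept as registry-based at construction time and parseServerAuthority() throws. The class name ServerAuthorityDemo and the helper check are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ServerAuthorityDemo
{
    // Reports whether the authority of the given URI is a well-formed
    // server-based authority.
    static String check(String text)
    {
        // create() succeeds even if the authority is only valid as registry-based.
        URI uri = URI.create(text);
        try
        {
            URI parsed = uri.parseServerAuthority();
            return "server-based: host=" + parsed.getHost() + ", port=" + parsed.getPort();
        }
        catch (URISyntaxException e)
        {
            return "not server-based: authority=" + uri.getAuthority();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(check("//foo:bar"));
        System.out.println(check("//jeff@x.com:90"));
    }
}
```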

Once you have a URI object, you can call its getAuthority(), getFragment(), getHost(), getPath(), getPort(), getQuery(), getScheme(), getSchemeSpecificPart(), and getUserInfo() methods to extract the various components. You can also call isAbsolute() to determine whether the URI is absolute or relative, and isOpaque() to determine whether the URI is opaque or hierarchical. A return value of true means that the URI is absolute (for isAbsolute()) or opaque (for isOpaque()); a return value of false means that it is relative or hierarchical, respectively.

The program in Listing 1 creates a URI object from its command-line argument, calls URI's component extraction methods to retrieve the URI's components, and calls URI's isAbsolute() and isOpaque() methods to classify the URI as absolute or relative and as opaque or hierarchical.

Listing 1: URIDemo1.java

// URIDemo1.java

import java.net.*;

class URIDemo1
{
    public static void main (String [] args) throws Exception
    {
        // Exactly one command-line argument (the URI) is expected.
        if (args.length != 1)
        {
            System.err.println ("usage: java URIDemo1 uri");
            return;
        }

        // Parse the argument; a malformed URI causes URISyntaxException.
        URI uri = new URI (args [0]);

        // Print each component, followed by the two classifications.
        System.out.println ("Authority = " + uri.getAuthority ());
        System.out.println ("Fragment = " + uri.getFragment ());
        System.out.println ("Host = " + uri.getHost ());
        System.out.println ("Path = " + uri.getPath ());
        System.out.println ("Port = " + uri.getPort ());
        System.out.println ("Query = " + uri.getQuery ());
        System.out.println ("Scheme = " + uri.getScheme ());
        System.out.println ("Scheme-specific part = " +
                            uri.getSchemeSpecificPart ());
        System.out.println ("User Info = " + uri.getUserInfo ());
        System.out.println ("URI is absolute: " + uri.isAbsolute ());
        System.out.println ("URI is opaque: " + uri.isOpaque ());
    }
}

After URIDemo1 is compiled, enter java URIDemo1 query://jeff@books.com:9000/public/manuals/appliances?stove#ge at the command line. The output is as follows:

Authority = jeff@books.com:9000
Fragment = ge
Host = books.com
Path = /public/manuals/appliances
Port = 9000
Query = stove
Scheme = query
Scheme-specific part = //jeff@books.com:9000/public/manuals/appliances?stove
User Info = jeff
URI is absolute: true
URI is opaque: false

The output above shows that the URI is absolute because it specifies a scheme (query), and that it is hierarchical because the scheme (and its colon) is followed by a / character.

  Tips

Call URI's compareTo() and equals(Object o) methods to determine the ordering of URIs (for sorting purposes) and their equality. Refer to the SDK documentation for more information about these methods.
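A short sketch of both methods follows. It relies on the documented behavior that schemes and hosts are compared without regard to case while paths are case-sensitive, and that URIs with different schemes are ordered by scheme. The class name CompareDemo and its helpers are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompareDemo
{
    // Tests two URI strings for equality using URI.equals().
    static boolean same(String a, String b)
    {
        return URI.create(a).equals(URI.create(b));
    }

    // Sorts URI strings using URI's natural ordering (compareTo()).
    static List<URI> sorted(String... texts)
    {
        List<URI> uris = new ArrayList<>();
        for (String text : texts)
            uris.add(URI.create(text));
        Collections.sort(uris);
        return uris;
    }

    public static void main(String[] args)
    {
        System.out.println(same("HTTP://WWW.CNN.COM/a", "http://www.cnn.com/a"));
        System.out.println(same("http://www.cnn.com/a", "http://www.cnn.com/A"));
        System.out.println(sorted("mailto:jeff@javajeff.com",
                                  "http://www.cnn.com",
                                  "ftp://x.com/"));
    }
}
```

Note that equals() does not normalize its operands, so two URIs that differ only by a redundant ./ or ../ segment compare as unequal until normalize() has been applied to both.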
The URI class supports the basic URI operations of normalization, resolution, and relativization. Normalization is supported by URI's normalize() method. When called, normalize() returns a reference to a new URI object that contains the normalized form of the URI held by the URI object on which it was called.

Listing 2 demonstrates the normalize() method. URIDemo2 takes a URI as its only command-line argument and prints the normalized equivalent.

Listing 2: URIDemo2.java

// URIDemo2.java

import java.net.*;

class URIDemo2
{
    public static void main (String [] args) throws Exception
    {
        // Exactly one command-line argument (the URI) is expected.
        if (args.length != 1)
        {
            System.err.println ("usage: java URIDemo2 uri");
            return;
        }

        URI uri = new URI (args [0]);

        // normalize() removes redundant . and .. path segments.
        System.out.println ("Normalized URI = " +
                            uri.normalize ().toString ());
    }
}

After URIDemo2 is compiled, enter java URIDemo2 x/y/../z/./q at the command line. The following output is displayed:

Normalized URI = x/z/q

The output above shows that y, .., and . are gone. The y/.. pair disappears because .. means that you want to access the z part of the namespace directly under x. The . disappears because it means that you want to access q relative to z.

URI supports resolution and relativization through its resolve(String uri), resolve(URI uri), and relativize(URI uri) methods. Each method throws a NullPointerException if its uri argument is null. In addition, if the specified string violates RFC 2396 syntax rules, resolve(String uri) indirectly throws an IllegalArgumentException through an internal call to create(String uri).
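The two failure modes of resolve(String uri) can be observed with a sketch like the following. The class name ResolveStringDemo and the helper resolveOrExplain are hypothetical, and catching NullPointerException is done here purely for demonstration; production code would normally check for null instead.

```java
import java.net.URI;

class ResolveStringDemo
{
    // Resolves a relative URI string against a base URI, reporting the two
    // exceptions described in the text rather than propagating them.
    static String resolveOrExplain(URI base, String relative)
    {
        try
        {
            return base.resolve(relative).toString();
        }
        catch (IllegalArgumentException e)
        {
            return "invalid relative URI";    // syntax violation
        }
        catch (NullPointerException e)
        {
            return "no relative URI given";   // null argument (demonstration only)
        }
    }

    public static void main(String[] args)
    {
        URI base = URI.create("http://www.somedomain.com/x/");
        System.out.println(resolveOrExplain(base, "docs/index.html"));
        System.out.println(resolveOrExplain(base, "bad uri"));
        System.out.println(resolveOrExplain(base, null));
    }
}
```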

The code in Listing 3 demonstrates resolve(URI uri) and relativize(URI uri).

Listing 3: URIDemo3.java

// URIDemo3.java

import java.net.*;

class URIDemo3
{
    public static void main (String [] args) throws Exception
    {
        // Two command-line arguments are expected: base URI and relative URI.
        if (args.length != 2)
        {
            System.err.println ("usage: " +
                                "java URIDemo3 uriBase uriRelative");
            return;
        }

        URI uriBase = new URI (args [0]);
        System.out.println ("Base URI = " + uriBase.toString ());

        URI uriRelative = new URI (args [1]);
        System.out.println ("Relative URI = " + uriRelative.toString ());

        // Merge the relative URI with the base URI.
        URI uriResolved = uriBase.resolve (uriRelative);
        System.out.println ("Resolved URI = " + uriResolved.toString ());

        // Recover the relative URI from the resolved URI.
        URI uriRelativized = uriBase.relativize (uriResolved);
        System.out.println ("Relativized URI = " + uriRelativized.toString ());
    }
}

After URIDemo3 is compiled, enter java URIDemo3 http://www.somedomain.com/ x/../y at the command line. The output is as follows:

Base URI = http://www.somedomain.com/
Relative URI = x/../y
Resolved URI = http://www.somedomain.com/y
Relativized URI = y

The output above shows that the relative URI x/../y was resolved against the base URI http://www.somedomain.com/ and (internally) normalized, yielding the resolved URI http://www.somedomain.com/y. Given that URI and the base URI, relativization against the base URI yields y, which is the normalized equivalent of the original relative URI.

  Tips

Call URI's toURL() method to convert a URI to a URL.
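The conversion can fail in two ways, which the following sketch distinguishes: toURL() throws IllegalArgumentException when the URI is not absolute, and MalformedURLException when no protocol handler is found for the scheme. The class name ToUrlDemo and the helper convert are hypothetical, and xyz is assumed to be a scheme with no installed handler.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class ToUrlDemo
{
    // Attempts to convert a URI string to a URL and reports the outcome.
    static String convert(String text)
    {
        URI uri = URI.create(text);
        try
        {
            URL url = uri.toURL();
            return "URL with protocol " + url.getProtocol();
        }
        catch (IllegalArgumentException e)
        {
            return "not absolute";          // relative URIs cannot become URLs
        }
        catch (MalformedURLException e)
        {
            return "no protocol handler";   // scheme is not a known protocol
        }
    }

    public static void main(String[] args)
    {
        System.out.println(convert("http://www.cnn.com"));
        System.out.println(convert("articles/articles.html"));
        System.out.println(convert("xyz://whatever"));
    }
}
```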

The next part of this series shows how to use URLs, and introduces MIME (Multipurpose Internet Mail Extensions) and how it relates to URLs.
