HTTP protocol Advanced (ii) URLs and resources

Source: Internet
Author: User

Weather like today's is really best suited to putting on a slow, lyrical ballad and spending the day with a cup of espresso and a book ...

It is raining hard in Shanghai today ...

First, the syntax of the URL

A URL is a standardized name for an Internet resource.

URLs provide a means of locating any resource on the Internet, but these resources are accessed through different schemes (protocols such as HTTP, FTP, and SMTP), so URL syntax varies slightly from scheme to scheme.

Most URLs nevertheless follow a common syntax, and the styles and syntax of the different URL schemes overlap considerably.

Most URL schemes base their syntax on the following 9-part general format:

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

The 3 most important parts are the scheme, the host, and the path.
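As a rough illustration of the general format, Python's standard urllib.parse module can split a URL into these nine components. The URL below, including the user name joe and the password secret, is made up for this example and does not come from the original text.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises all nine components of the general format.
url = ("http://joe:secret@www.joes-hardware.com:8080"
       "/tools/index.html;type=d?item=12731#drills")

parts = urlparse(url)

# Map the general-format names onto what urlparse() reports.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "frag": parts.fragment,
}

for name, value in components.items():
    print(f"{name:8} = {value}")
```

Note that urlparse() only separates the params of the last path segment, which is enough for this sketch.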

General URL components:

1. Scheme: which protocol to use

The scheme is the main identifier of how to access the specified resource; it tells the application that parses the URL which protocol to use.

The scheme component must start with an alphabetic character, and it is separated from the rest of the URL by the first ":" character. (Scheme names are not case sensitive.)
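The scheme rules can be sketched in a few lines of Python. The helper name split_scheme is hypothetical, and the check is deliberately minimal: it only verifies that a scheme is present and starts with a letter.

```python
import string


def split_scheme(url):
    """Split a URL at its first ':' and return (lowercased scheme, rest)."""
    scheme, sep, rest = url.partition(":")
    # The scheme must be present and must start with a letter.
    if not sep or not scheme or scheme[0] not in string.ascii_letters:
        raise ValueError(f"no valid scheme in {url!r}")
    # Scheme names are not case sensitive, so normalize to lowercase.
    return scheme.lower(), rest


print(split_scheme("HTTP://www.joes-hardware.com/index.html"))
```

Lowercasing the scheme before comparing it means that HTTP://... and http://... select the same protocol.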

2. Host and Port

To find a resource on the Internet, an application needs to know which machine hosts the resource and where on that machine it can find the server that provides access to it. The host and port components of the URL provide these 2 pieces of information.

The host component identifies the host machine from which the resource can be accessed. The host can be given either as a hostname or as an IP address. For example, the following 2 URLs refer to the same resource:

By hostname: http://joes-hardware.com:80/index.html

By IP address: http://161.58.228.45:80/index.html

The port component identifies the network port on which the server is listening. For HTTP, which runs over TCP, the default port is 80.
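A small sketch of how a client might pick the port when the URL omits it. The function name effective_port and the DEFAULT_PORTS table are assumptions for illustration; the text above only states the HTTP default, and the HTTPS and FTP entries are added here as well-known values.

```python
from urllib.parse import urlsplit

# Well-known default ports; only the HTTP entry comes from the text above.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url):
    """Return the explicit port of a URL, or the scheme default if omitted."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://joes-hardware.com/index.html"))
print(effective_port("http://161.58.228.45:80/index.html"))
```

Both calls give the same port, which is why the explicit :80 in an HTTP URL is optional.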

3. User name and password

Some servers require a user name and password before they allow users to access data; FTP (File Transfer Protocol) servers are a common example. Consider the following examples:

ftp://ftp.prep.ai.mit.edu/pub/gnu

This URL has no user name or password, only the standard scheme, host, and path. If an application uses a URL scheme that requires a user name and password, such as FTP, but the user does not supply them, it usually inserts a default user name and password.

ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu

Here the user name anonymous is specified; combined with the host, it looks like an email address. The "@" character separates the user name and password components from the rest of the URL.

ftp://anonymous:<password>@ftp.prep.ai.mit.edu/pub/gnu

A user name and password are specified, separated by the character ":"
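The following sketch shows one way a client could extract the user name and password and fall back to defaults when they are missing. The function name credentials and the empty default password are assumptions; real FTP clients choose their own defaults. The host ftp.example.com and the joe:secret pair are also made up.

```python
from urllib.parse import urlsplit


def credentials(url, default_user="anonymous", default_password=""):
    """Return (user, password) from a URL, filling in defaults if absent."""
    parts = urlsplit(url)
    user = parts.username if parts.username is not None else default_user
    password = parts.password if parts.password is not None else default_password
    return user, password


print(credentials("ftp://ftp.prep.ai.mit.edu/pub/gnu"))
print(credentials("ftp://joe:secret@ftp.example.com/pub"))
```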

4. Path

The path indicates where the requested resource is located on the server, usually a hierarchical file system path, for example:

http://joes-hardware.com:80/seasonal/index-fall.html

The resource path in this URL is /seasonal/index-fall.html, which looks much like a path in a Unix file system.

The path is the information the server needs to locate the resource. The path component of an HTTP URL can be divided into path segments separated by "/" characters, and each path segment can have its own params component.
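To make the idea of path segments concrete, here is a minimal sketch that splits a path on "/" and separates each segment from its own params. The helper name path_segments and the example URL with sale=false and graphics=true are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return a list of (segment, params) tuples for the path of a URL."""
    # urlsplit() leaves ';params' inside the path, so they can be split per segment.
    path = urlsplit(url).path
    segments = []
    for raw in path.split("/")[1:]:   # skip the empty piece before the leading '/'
        name, _, params = raw.partition(";")
        segments.append((name, params))
    return segments


print(path_segments(
    "http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true"
))
```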

5. Parameters

For many schemes, a simple hostname and a path to the object are not enough. Beyond the port and the user name and password, some schemes need additional information to work.

Applications that interpret URLs need these protocol parameters to access the resource; without them, the server on the other side might not service the request or, worse, might service it wrongly. For example:

FTP has two transfer modes, binary and text. If a binary image is transferred in text mode, the result is unpredictable and is likely to be corrupted.

The params component is a list of name/value pairs in the URL, separated from the rest of the URL (and from each other) by ";" characters, for example:

ftp://ftp.prep.ai.mit.edu/pub/gnu;type=d

The parameter here is type=d, where the parameter name is type and the value is d.
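As a small sketch, Python's urlparse() exposes the params of the last path segment directly, and they can be turned into a dictionary of name/value pairs. The variable names are arbitrary.

```python
from urllib.parse import urlparse

parts = urlparse("ftp://ftp.prep.ai.mit.edu/pub/gnu;type=d")

# Split "name=value;name=value" into a dictionary, ignoring empty pieces.
params = dict(p.partition("=")[::2] for p in parts.params.split(";") if p)

print(parts.path)
print(params)
```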

6. Query string

Some resources, such as database services, can be asked questions, or queries, to narrow down the resource being requested, for example:

http://www.joes-hardware.com/inventory-check.cgi?item=12731

Everything to the right of the question mark (?) is the query component of this URL. The query component is passed to the gateway resource, and the path component of the URL identifies that gateway resource. Gateways can be thought of as access points to other applications.

For example, the purpose of a query might be to check the inventory for entries with size large and color blue.

Gateways generally expect the query string to be formatted as a series of name/value pairs separated from each other by "&" characters:

http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue

In the above example, the query component has 2 name/value pairs: item=12731 and color=blue.
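A gateway could decode such a query string roughly as follows. This is a sketch using Python's standard parse_qsl(); the URL combines the item and color pairs mentioned in the text.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue"

# Everything after '?' (and before any '#') is the query component.
query = urlsplit(url).query

# parse_qsl() splits on '&' and '=' and returns a list of (name, value) pairs.
pairs = parse_qsl(query)

for name, value in pairs:
    print(f"{name} = {value}")
```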

7. Fragments

Some resource types, such as HTML, can be divided further than the resource level. For example, for a large text document with chapters, the URL of the resource points to the entire document, but ideally it could point to a chapter within the resource.

To make such references possible, URLs support a fragment (frag) component that identifies a piece within a resource. The fragment hangs off the right-hand side of the URL, preceded by a "#" character, for example:

http://www.joes-hardware.com/tools.html#drills

In this example, the fragment drills refers to a portion of the /tools.html page on the Joe's Hardware web server.

HTTP servers generally deal only with entire objects, so the client does not pass the fragment to the server. After the browser receives the entire resource from the server, it uses the fragment to display the part of the resource the user is interested in.
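The client-side handling of fragments can be sketched with urldefrag() from Python's standard library, which separates the part that is requested from the server from the part that stays in the client. The variable names are arbitrary.

```python
from urllib.parse import urldefrag

url = "http://www.joes-hardware.com/tools.html#drills"

# The client requests the URL without the fragment and keeps the fragment
# for itself, to locate the right section once the document arrives.
request_url, fragment = urldefrag(url)

print("request:", request_url)
print("scroll to:", fragment)
```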

Second, the URL shortcut

Web clients understand and use several URL shortcuts, such as relative URLs (a shorthand notation) and automatic expansion (the user types a key part of the URL and the browser fills in the rest).

1. Relative URL

URLs come in 2 forms: absolute and relative. So far, the URLs we have used are all absolute; an absolute URL contains all the information needed to access a resource.

Relative URLs are incomplete. To get all the information needed to access a resource from a relative URL, you must interpret it relative to another URL, called its base.

A relative URL is a convenient shorthand for a URL. Here is an example of an HTML document with a relative URL embedded in it:

<body>
<p>Joe's Hardware Online has the largest selection of
<a href="./hammers.html">hammers</a>
on Earth.</p>
</body>

The example above is the HTML document for the resource http://www.joes-hardware.com/tools.html, and it contains a hyperlink to the URL ./hammers.html.

Although this URL does not look complete, it is a valid relative URL that can be interpreted relative to the URL of the document in which it is found.

This abbreviated relative URL syntax lets HTML authors omit the scheme, host, and other components from a URL. The missing components can be inferred from the base URL of the resource that contains them, and the URLs of other resources can be written in the same shorthand.

The missing component information is derived from the base URL. In this example, ./hammers.html inherits its scheme and host from the base URL, so it resolves to http://www.joes-hardware.com/hammers.html.

A relative URL is only a fragment, or piece, of a URL, so applications that process URLs need to be able to convert between relative and absolute URLs.

PS: Relative URLs also provide a convenient way to keep a set of resources (such as HTML pages) portable. If you use relative URLs, the links stay valid when you move the set of documents, because they are interpreted relative to the new base. This makes it possible, for example, to provide mirrored content on other servers.

1.1 Base URL

The base URL serves as the point of reference for relative URLs. It can come from the following places:

Explicitly provided in the resource: some resources explicitly specify the base URL.

For example, an HTML document may include a <base> HTML tag that defines the base URL against which all relative URLs in that HTML document are converted.

Base URL of the encapsulating resource: if a relative URL is found in a resource that does not explicitly specify a base URL, as in the preceding HTML document, the URL of the resource it belongs to can be used as the base.

No base URL: in some cases there is no base URL. This usually means that you have an absolute URL, but sometimes you may just have an incomplete or broken URL.
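The first two sources of a base URL can be sketched as follows: look for an explicit <base> tag, and otherwise fall back to the URL of the document itself. The names BaseFinder and choose_base, and the /catalog/ URL in the sample markup, are assumptions made for this illustration.

```python
from html.parser import HTMLParser


class BaseFinder(HTMLParser):
    """Remember the href of the first <base> tag seen in a document."""

    def __init__(self):
        super().__init__()
        self.base_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "base" and self.base_href is None:
            self.base_href = dict(attrs).get("href")


def choose_base(document_url, html):
    """Prefer an explicit <base href>, else use the document's own URL."""
    finder = BaseFinder()
    finder.feed(html)
    return finder.base_href or document_url


page = '<html><head><base href="http://www.joes-hardware.com/catalog/"></head></html>'
print(choose_base("http://www.joes-hardware.com/tools.html", page))
print(choose_base("http://www.joes-hardware.com/tools.html", "<html></html>"))
```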

1.2 Resolving relative references

Parsing: to convert a relative URL into an absolute one, the first step is to break both the relative URL and the base URL into their component pieces. This is effectively just parsing the URLs, but it is often called decomposing the URL because it splits the URL into its components.

Once the base and relative URLs have been broken into components, the conversion can be completed by applying the resolution algorithm.

The algorithm converts a relative URL into its absolute form, which can then be used to reference the resource.
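Rather than reimplementing the resolution algorithm, the sketch below uses urljoin() from Python's standard library, which implements a standard relative-reference resolution. It is shown as one possible implementation, not necessarily step for step the algorithm these notes refer to. The base URL and ./hammers.html come from the example above; the other relative references are added for illustration.

```python
from urllib.parse import urljoin

base = "http://www.joes-hardware.com/tools.html"

relative_urls = [
    "./hammers.html",                     # same directory as the base
    "/seasonal/index-fall.html",          # absolute path, inherits scheme and host
    "#drills",                            # fragment only, same document
    "ftp://ftp.prep.ai.mit.edu/pub/gnu",  # already absolute, base is ignored
]

resolved = {rel: urljoin(base, rel) for rel in relative_urls}

for rel, absolute in resolved.items():
    print(f"{rel:36} -> {absolute}")
```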

2. Automatic URL expansion

Many browsers try to expand URLs automatically, either when the user submits the URL or while the user is typing it. This gives users a shortcut: they do not have to type the complete URL, because the browser expands it for them.

There are 2 kinds of automatic expansion:

2.1 Hostname expansion

Given only a small hint, the browser can often expand what you typed into a full hostname. For example, if you enter baidu, the browser builds www.baidu.com. The drawback is that this can sometimes cause problems for other HTTP applications, such as proxies, as explained in detail later.
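Hostname expansion is a browser heuristic, not a standard, so the following is only a toy sketch under stated assumptions: the function name expand_hostname is hypothetical, and the rule of wrapping a bare word in "www." and ".com" is one simple guess at what a browser might do.

```python
def expand_hostname(typed):
    """Guess a full URL from a partial hostname typed by the user."""
    if "://" in typed:
        return typed                    # already looks like a full URL
    host = typed.lower()
    if "." not in host:
        host = f"www.{host}.com"        # assumed heuristic for a bare word
    return f"http://{host}/"


print(expand_hostname("baidu"))
print(expand_hostname("www.joes-hardware.com"))
```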

2.2 History expansion

The browser stores the URLs the user has visited before. As the user types a URL, it matches what has been typed against the prefixes of the URLs in the history and offers complete URLs for the user to select.

PS: URL auto-expansion may behave differently when used with a proxy, as explained in detail later.

Third, URL character set

URLs are designed to be portable: they name all the resources on the Internet, and those resources are transmitted through a variety of protocols that use different transport mechanisms, so it is important that URLs can be transmitted safely through any of them.

Safe transmission means that a URL can be transmitted without losing information. Some protocols, such as SMTP (Simple Mail Transfer Protocol), use transmission methods that can strip off certain characters.

URLs are designed to be readable: invisible, nonprinting characters are therefore prohibited in URLs, even though such characters may pass through mailers and otherwise be portable.

URLs are designed to be complete: people may want URLs to contain binary data or characters outside the universally safe alphabet, so an escape mechanism is needed to encode unsafe characters into safe characters for transport.

1. URL Character Set

1.1 Many computer applications use the ASCII character set. ASCII uses 7 bits to represent most keys on a keyboard plus a handful of nonprinting control characters. It is highly portable, but it does not cover the characters needed by users around the world, and sometimes a URL needs to contain arbitrary binary data.

Escape sequences are therefore built into URLs. They allow arbitrary character values or data to be encoded using a restricted subset of the ASCII character set, which provides both portability and completeness.

1.2 Encoding mechanism

To get around the restrictions of the safe character set, an encoding scheme was devised to represent unsafe characters in a URL. The encoding represents an unsafe character with an "escape" notation consisting of a percent sign (%) followed by 2 hexadecimal digits that give the ASCII code of the character. For example, a space (ASCII code 32, 0x20) is written as %20 and a percent sign (ASCII code 37, 0x25) is written as %25.
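The escape notation can be tried out with quote() and unquote() from Python's standard library. This is a sketch; the file name below is made up for the example.

```python
from urllib.parse import quote, unquote

# Build one escape by hand: '%' followed by the 2-digit hex ASCII code.
space_escape = "%{:02X}".format(ord(" "))
print(space_escape)

# Hypothetical file name containing unsafe characters.
name = "more tools & 100% satisfaction.html"

encoded = quote(name)        # escape every unsafe character
decoded = unquote(encoded)   # reverse the escaping

print(encoded)
print(decoded)
```

Decoding the encoded form gives back the original string, which is the completeness property described above.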

1.3 Character limit

Several characters are reserved in URLs because they have special meanings. Others are not in the defined US-ASCII printable set, and still others are known to confuse some Internet gateways and protocols, so their use is discouraged.
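Because characters such as "/", "?", and "&" are reserved, data that contains them has to be escaped before it is placed inside a URL component. The sketch below uses quote() with an empty safe set and urlencode() from Python's standard library; the value and the field name note are made up.

```python
from urllib.parse import quote, urlencode

# Hypothetical value containing reserved characters.
value = "hammers & nails/screws?"

# safe="" tells quote() to escape '/' as well, which it leaves alone by default.
escaped = quote(value, safe="")
print(escaped)

# urlencode() escapes each value and joins the pairs with '&';
# it writes spaces as '+', the convention used for form data.
query = urlencode({"item": "12731", "note": value})
print(query)
```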

Fourth, common protocols

The following appendix is a list of commonly used protocols

Fifth, the future of URLs

URLs are a powerful tool: they can name all existing objects, and they provide a uniform naming mechanism that can be shared among many protocols. They are not perfect, however, because a URL is really an address, not a true name. A URL tells you where a resource is located at the moment, so if the resource moves, the URL no longer finds it.

Persistent uniform resource locators (PURLs) address this by introducing a level of indirection into resource lookup: an intermediary resource locator server registers and tracks the actual URL of the resource.

A client can request a persistent URL from the locator, and the locator responds by redirecting the client to the current, actual URL of the resource.
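The indirection can be pictured with a toy lookup table. Everything here is hypothetical: the table contents, the /purl/joes-tools name, the function resolve_purl, and the choice of status codes are assumptions for illustration, not the behavior of any particular PURL server.

```python
# Hypothetical registry mapping persistent names to current locations.
PURL_TABLE = {
    "/purl/joes-tools": "http://www.joes-hardware.com/tools.html",
}


def resolve_purl(path):
    """Return an (HTTP status, headers) pair for a persistent URL lookup."""
    target = PURL_TABLE.get(path)
    if target is None:
        return 404, {}
    # Redirect the client to wherever the resource currently lives.
    return 302, {"Location": target}


print(resolve_purl("/purl/joes-tools"))

# If the resource moves, only the registry entry changes; the PURL stays the same.
PURL_TABLE["/purl/joes-tools"] = "http://www.joes-hardware.com/catalog/tools.html"
print(resolve_purl("/purl/joes-tools"))
```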

That basically covers URLs and resources. Of course, some related topics need to be looked up in more specialized material ...

I had this almost ready on Tuesday, but trivial work matters and health reasons kept me from publishing it, so please ignore the rambling at the top of this post ...
