1. What is a URL
URL stands for Uniform Resource Locator. A URL is the resource location that a browser needs in order to find information. When a person points a browser at a URL, the browser sends the appropriate protocol messages behind the scenes to fetch the resource the person expects. Whenever URLs come up, two related concepts come with them: URI and URN.
URI stands for Uniform Resource Identifier.
URN stands for Uniform Resource Name.
A URI is the more abstract concept: it defines a unique identifier for a resource, and URLs and URNs are two forms of it. The former identifies a resource by its location, the latter by its name. HTTP deals almost exclusively with URLs.
The advent of URLs unified the way Internet resources are accessed, making it clear how a resource is to be obtained.
2. URL syntax
URL syntax can be summed up in the following common format:
<scheme>://<user>:<password>@
- Scheme: the scheme name, usually a protocol such as http, https, ftp, or mailto. Scheme names are case-insensitive, i.e. http://www.baidu.com and HTTP://www.baidu.com are equivalent.
- User: the user name. It is rare in HTTP; the default value is the anonymous user "anonymous".
- Password: the password, used together with the user above. It is more common in FTP, for example ftp://user:[email protected]/download. If no password is specified, different browser implementations send different default passwords.
- Host: the host, the specific machine that holds the resource. It is usually given as a domain name or an IP address. An IP address points directly at a specific machine, while a domain name must first be resolved through DNS to obtain the IP address.
- Port: the port, which identifies a specific application on the machine. On one machine a port corresponds to one application, so host plus port locates the specific application that serves the resource. The default port is 80 for HTTP and 443 for HTTPS.
- Path: the path, a hierarchical directory leading to the resource. Like a file system path, it can be split into several levels with "/", and each level can be followed by parameters.
- Params: the path parameters, rarely used but valid. For example: http://www.baidu.com/china;type=a/beijing;degree=b
- Query: the query string, the key to interacting with back-end programs. It begins with "?". For example: http://www.baidu.com?item=a&color=b
- Frag: the fragment, also called an anchor. The preceding components locate a specific resource file, and the fragment identifies a specific part within it. The fragment is not sent to the server: the server returns the whole object, and the browser presents it differently depending on the fragment.
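As a rough illustration of the components listed above, the following sketch splits a URL with Python's standard `urllib.parse.urlparse`. The host `www.example.com`, the credentials, the helper name `split_url`, and the `DEFAULT_PORTS` table are made up for the example. Note that `urlparse` only recognizes parameters on the last path segment, so the example puts `;degree=b` there.

```python
from urllib.parse import urlparse

# Default ports mentioned in the text; this table is an illustrative assumption.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_url(url):
    """Split a URL into the components described in the list above."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,      # lower-cased by urlparse
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        # Fall back to the scheme's default port when none is given.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "params": parts.params,      # parameters of the last path segment only
        "query": parts.query,
        "frag": parts.fragment,      # kept by the client, never sent to the server
    }


example = split_url(
    "HTTP://user:secret@www.example.com/china/beijing;degree=b?item=a&color=b#top"
)
for name, value in example.items():
    print(f"{name:8} {value}")
```

The upper-case `HTTP` in the input comes back as `http`, which matches the rule that scheme names are case-insensitive.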
3. URL Character Set
- URLs must be portable, so to be transmitted safely they use a small, universal character set.
- URLs must be readable, so invisible, non-printable characters cannot be used in them.
- URLs must be complete and able to carry all kinds of complex meaning, so unsafe characters are converted into safe ones through an escape mechanism.
For these reasons, the designers of URLs adopted the US-ASCII character set and introduced the concept of escape sequences. An unsafe character is written as a percent sign (%) followed by two hexadecimal digits that represent the character's ASCII code. For example, the ASCII code of a space is 32 (hexadecimal 20), so a space is escaped as %20.
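The escape rule for a single ASCII character can be sketched in a few lines of Python. The helper name `escape_char` is hypothetical; in practice the standard-library function `urllib.parse.quote` does the same job for whole strings and is shown alongside for comparison.

```python
from urllib.parse import quote


def escape_char(ch):
    """Escape one US-ASCII character as '%' plus two hexadecimal digits."""
    code = ord(ch)
    if code > 127:
        raise ValueError("this sketch handles US-ASCII characters only")
    return f"%{code:02X}"


space = escape_char(" ")   # ASCII 32 is hexadecimal 20
quoted = quote("a b")      # standard-library equivalent for a whole string
print(space, quoted)
```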
Chinese characters are handled differently. Two hexadecimal digits make up exactly one byte, so the binary encoding of the unsafe character is split into bytes and each byte is prefixed with a %. For example, the UTF-8 encoding of "你好" ("Hello") is E4BDA0E5A5BD in hexadecimal, which becomes %E4%BD%A0%E5%A5%BD after URL encoding.
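The byte-by-byte rule can be sketched as follows, assuming the text is encoded as UTF-8, which is what the bytes E4 BD A0 E5 A5 BD in the example correspond to for the Chinese greeting 你好. The helper name `escape_bytes` is hypothetical; `urllib.parse.quote` and `unquote` are the standard-library counterparts.

```python
from urllib.parse import quote, unquote


def escape_bytes(text, encoding="utf-8"):
    """Encode text to bytes, then prefix each byte's two hex digits with '%'."""
    return "".join(f"%{byte:02X}" for byte in text.encode(encoding))


encoded = escape_bytes("你好")
print(encoded)

# The standard library produces the same escape sequence and can reverse it.
same_as_quote = encoded == quote("你好")
decoded = unquote(encoded)
print(same_as_quote, decoded)
```

A different encoding (for example GBK) would yield different bytes and therefore a different escape sequence, so both ends need to agree on the encoding.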
URL of the HTTP protocol