How long should a URL be? Why raise this question at all? Many guidelines for optimizing page requests and loading recommend minimizing cookies, shortening URLs, and using GET requests "as much as possible". But "as much as possible" is only a qualitative description. In quantitative terms, how many bytes is short enough?
During a revision of our home page, I used an HTTP analyzer and noticed several interesting URLs for .js files, listed below:
https://static.alipay.net/build/js/app/tracker.js?v=083
https://static.alipay.net/build/js/home/home.js?t=20101012
https://static.alipay.net/build/js/pa/alieditcontrol-update.js?t=20101012
https://static.alipay.net/javascript/arale_v1.0.js
https://static.alipay.net/min?b=javascript&f=arale/lang/aspect.js,arale/lang/md5.js,arale/lang/uri.js,arale/lang/tmpl.js,arale/lang/date.js,arale/lang/number.js,arale/http/jsonp.js,arale/http/ajax.js,arale/http/core.js,arale/event/event-chain.js,arale/class/declare.js,arale/fx/animator.js,aralex/widget.js,aralex/tplwidget.js,aralex/view.js,aralex/tab/tab.js,aralex/dropdown.js,aralex/slider.js,aralex/slider/switchslider.js
Look at the last one. Don't be surprised: it is exactly 443 bytes long. Is that long or not?
Take IE as an example: it accepts URLs of roughly 2048 bytes, so it can handle this one either way. Other mainstream browsers are generally fine too, so "correctness" is not the issue. What remains is efficiency, and that is what the rest of this article is about.
I. Packet headers in the TCP/IP protocols
In a TCP/IP network, the lower-layer protocols and the application-layer protocol are separate concerns. HTTP is an application-layer protocol, so how much content it can carry and how it carries it are conventions of the application layer. One example is splitting an application-layer message that exceeds some size (such as 48 KB) into several parts, the so-called multipart. The lower layers have their own conventions about transfer size, at the link layer and the transport layer. In short, the transport layer defines the MSS (maximum segment size, the largest TCP payload in one segment), and the link layer defines the MTU (maximum transmission unit). If an IP packet is larger than the MTU (MSS + TCP header + IP header > MTU), it is split into multiple packets for transmission at the link layer.
MSS depends on the transmission environment and has two commonly recommended values:
- When the destination address is not local (it is in a different network segment from the source), the default MSS is usually 536;
- Otherwise, the default MSS is usually 1460.
MTU depends on the network medium and also has two recommended values:
- Serial links: 576 bytes;
- Ethernet: 1500 bytes.
Each MTU value and its matching MSS value differ by 40 bytes, the usual size of the TCP header plus the IP header (20 + 20 bytes). With options, these headers can reach 120 bytes: 20 + 20 bytes of basic IP/TCP headers plus up to 40 + 40 bytes of IP/TCP options. So in a complex network environment, a single packet carrying application-layer data should stay under 536 - 80 = 456 bytes. On a local network the ceiling is 1460 - 80 = 1380 bytes. These limits follow from considering the transport-layer and link-layer protocols together. Some common recommendations simply use 536/1460 as the limits, which makes no essential difference to this discussion. My point is only this: if we want a "sufficiently optimized request", what exactly is the limit?
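The arithmetic above can be reproduced with a few lines of Python. This is only an illustration. It assumes the header sizes from the text: 20-byte basic IP and TCP headers, and up to 40 + 40 bytes of options counted against the MSS.

```python
# Header sizes as given in the text (assumed, not measured).
IP_HEADER = 20
TCP_HEADER = 20
MAX_OPTIONS = 40 + 40  # IP options + TCP options


def mss_for_mtu(mtu: int) -> int:
    """MSS = MTU minus the basic IP and TCP headers."""
    return mtu - IP_HEADER - TCP_HEADER


def app_budget(mss: int, options: int = MAX_OPTIONS) -> int:
    """Bytes left for application data in one segment if all option space is used."""
    return mss - options


for name, mtu in [("serial link", 576), ("Ethernet", 1500)]:
    mss = mss_for_mtu(mtu)
    print(f"{name}: MTU={mtu} MSS={mss} app budget={app_budget(mss)}")
# serial link: MTU=576 MSS=536 app budget=456
# Ethernet: MTU=1500 MSS=1460 app budget=1380
```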
II. Packet headers in HTTP
Now we come to HTTP, the application-layer protocol. An HTTP request consists of a header and a data section (body). An HTTP GET request may consist of a header alone, with no data section. A typical GET header looks like this (the header must end with two consecutive CRLF line breaks):
---------
GET (...) HTTP/1.1
Accept: */*
Referer: http://www.alipay.net/
Accept-Language: zh-CN
User-Agent: (...)
Accept-Encoding: gzip, deflate
Host: static.alipay.net
Connection: keep-alive
Cookie: (...)
---------
Here, GET (...) is followed by the complete request URL, and the GET parameters are carried in that URL, so no separate data section is needed. A given client may send a few more or fewer header fields than this example, but those fields are usually short. The example only serves to estimate how many bytes the "default (incomplete)" HTTP header takes.
The answer is 184 bytes. Note, however, that Referer depends directly on the URL currently being browsed. If the current page's URL is 500 bytes long, clicking a hyperlink on that page puts the whole 500-byte URL into the Referer field. An overly long page URL therefore also adds bytes to every link clicked from that page, which is one more cost of long URLs.
Setting the Referer field aside and using the example above, the best budget available is 456 - 184 = 272 bytes. Those 272 bytes are shared by three fields: GET, User-Agent and Cookie. User-Agent depends on the browser, and it also varies with the operating system the browser runs on. JS code and server-side statistics software often use it to detect the browser environment, such as OS and version. Its value can be long. On my current machine, for example, it is:
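The Referer effect can be sketched in Python. `header_size` is a hypothetical helper that counts the bytes of a GET header: the request line, each field, and the closing blank line, all with CRLF endings. The path `/a.js` and the padded 500-byte page URL are made up for illustration. The helper is not meant to reproduce the author's 184-byte figure, only to show how Referer scales with the page URL.

```python
from typing import Dict


def header_size(path: str, fields: Dict[str, str]) -> int:
    """Byte length of a GET request header: request line, fields, blank line (CRLF endings)."""
    lines = [f"GET {path} HTTP/1.1"] + [f"{k}: {v}" for k, v in fields.items()]
    return len(("\r\n".join(lines) + "\r\n\r\n").encode("ascii"))


BASE = {"Accept": "*/*", "Host": "static.alipay.net", "Connection": "keep-alive"}
HOME = "http://www.alipay.net/"
LONG_PAGE = HOME + "x" * (500 - len(HOME))  # hypothetical 500-byte page URL

short = header_size("/a.js", {**BASE, "Referer": HOME})
long_ = header_size("/a.js", {**BASE, "Referer": LONG_PAGE})
print(long_ - short)  # 478: extra Referer bytes on every request made from that page
```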
---------
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; QQWubi 108; EmbeddedWB 14.52 from: http://www.bsalsa.com/ EmbeddedWB 14.52; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E)
---------
That is 274 bytes. In other words, even in this ideal case, 456 bytes is not enough. Based on the earlier discussion, we relax the limit by one step:
- Use the 536-byte boundary, that is, do not count the 80 bytes of optional TCP/IP headers.
We should also stress how much the User-Agent length varies. For example, the 64-byte "EmbeddedWB ..." part above comes from a third-party component and is absent on most computers. Other browser environments, such as Maxthon (Aoyou), can make this field even longer. Even so, I keep this particular case as the example here.
With the 536-byte limit, we actually have 536 - 184 - 274 = 78 bytes available, so we set the level-1 optimization target at 70 bytes. It is recommended that the company derive a balanced value from data collected on its servers.
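The budget can be checked by measuring the User-Agent string directly. This sketch assumes the author's 184-byte "default header" figure as a constant rather than deriving it.

```python
# The User-Agent string quoted above.
UA = (
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; QQWubi 108; "
    "EmbeddedWB 14.52 from: http://www.bsalsa.com/ EmbeddedWB 14.52; "
    ".NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; "
    ".NET CLR 1.1.4322; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E)"
)

MSS = 536             # relaxed boundary: TCP/IP option bytes not counted
DEFAULT_HEADER = 184  # the author's figure for the default header fields (assumed)

ua_bytes = len(UA.encode("ascii"))
remaining = MSS - DEFAULT_HEADER - ua_bytes
print(ua_bytes, remaining)  # 274 78
```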
III. Cookie consumption can be reduced to 0
At present, cookies are the largest consumer. On my current machine the value varies by protocol and domain:
(1) For the home page http://www.alipay.net/, the value is 49 bytes:
ali_apache_id=12.1.11.70.1275978936200.5; lastpg=
(2) For http://*.alipay.net/, the value is 171 bytes:
ali_apache_id=12.1.11.70.1275978936200.5; ali_apache_sid=12.1.46.46.128998714836.4|1289988948; alipayjsessionid=nobody; ali_apache_tracktmp=uid=
(3) For https://static.alipay.net/, the value is 307 bytes:
cna=Alibaba; ali_apache_id=Alibaba; paymethod=directpay; _tb_order=38016166656317; defaultbank=ICBC; __utma=Alibaba; __utmz=Alibaba=life.alipay.net|utmccn=(referral)|utmcmd=referral|utmcct=/index.php
(4) For http(s)://img.alipay.net/, the value is:
apay_id=signature; cna=akaaahybhu0baemdahlnhncd; ali_apache_id=signature; paymethod=directpay; _tb_order=38016166656317; defaultbank=ICBC; __utma=signature; __utmz=22931947.1282287558.2.2.utmcsr=life.alipay.net|utmccn=(referral)|utmcmd=referral|utmcct=/index.php
(5) Other cases.
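The byte counts above are simple to reproduce. This sketch measures a Cookie header value as name=value pairs joined by "; ". It assumes the author counted only the value, without the "Cookie: " prefix and the trailing CRLF. Under that assumption it gives the 49 bytes of case (1).

```python
from typing import List, Tuple


def cookie_bytes(pairs: List[Tuple[str, str]]) -> int:
    """Length of a Cookie header value: name=value pairs joined by '; '."""
    return len("; ".join(f"{name}={value}" for name, value in pairs).encode("ascii"))


CASE_1 = [("ali_apache_id", "12.1.11.70.1275978936200.5"), ("lastpg", "")]
print(cookie_bytes(CASE_1))  # 49
```

The "Cookie: " prefix and line break add another 10 bytes to every request that carries cookies.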
Why does cookie usage surge in cases 2, 3 and 4? Cases 3 and 4 differ slightly, but their root cause is the same as case 2's, so this article uses case 2 as the example. Tracing its HTTP requests:
- When the home page is requested, the server returns four Set-Cookie headers. The four response headers are as follows:
--------
Set-Cookie: ali_apache_sid=10.2.46.46.128998714836.4|1289988948; Path=/; domain=.alipay.net
Set-Cookie: JSESSIONID=a8ce523aea03e2c990d6796d6baec81e; Path=/
Set-Cookie: alipayjsessionid=bywcn4wq0z5fbcohzfpn2f1xxdambepay; domain=.alipay.net; Path=/
Set-Cookie: ali_apache_tracktmp=uid=; domain=.alipay.net; Path=/
--------
From then on, every HTTP request to *.alipay.net carries these 171 bytes of cookies, including the requests in case (3). Clearly, though, the cookies serve no purpose in at least the following situations:
- Requests to a redirect page, whether a 302 status redirect or an HTML meta-refresh redirect;
- Requests for a cached resource, for example when "304 Not Modified" is returned;
- Requests for static content that does not need cookies to identify the user, such as the image, .js and .css files on static.alipay.net.
Images and other static resources on img and static can obviously be cached, and the cookie values are useless whether the resource is cached or fetched for the first time. For static pages (.html), the cookies are not needed either, unless we want the HTTP server to analyze access to those pages. Even then, we may only need the session ID that tracks the user's access chain.
The fix is simple: deploy these static resources on servers that do not use .alipay.net as their domain, that is, on a separate domain name. For these resources, which make up the largest share of requests, cookie consumption then drops to 0.
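To see why a separate domain helps, here is a simplified model of how a browser decides which cookies to send. It follows RFC 6265 domain matching, in which a leading dot in Domain is ignored. A cookie with no Domain attribute is host-only: it goes back only to the host that set it. Path matching is left out because every cookie here uses Path=/. The host `static.example-cdn.com` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cookie:
    name: str
    value: str
    domain: Optional[str]  # Domain attribute; None means a host-only cookie
    set_by: str            # host that sent the Set-Cookie header


def domain_match(host: str, domain: str) -> bool:
    """Simplified RFC 6265 domain matching (a leading dot is ignored)."""
    host, domain = host.lower(), domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def cookie_header(host: str, jar: List[Cookie]) -> str:
    """Cookie header value a browser would send to `host` (all paths are '/')."""
    sent = []
    for c in jar:
        matches = (host.lower() == c.set_by.lower()) if c.domain is None else domain_match(host, c.domain)
        if matches:
            sent.append(f"{c.name}={c.value}")
    return "; ".join(sent)


# The four cookies from the home-page response above, set by www.alipay.net.
JAR = [
    Cookie("ali_apache_sid", "10.2.46.46.128998714836.4|1289988948", ".alipay.net", "www.alipay.net"),
    Cookie("JSESSIONID", "a8ce523aea03e2c990d6796d6baec81e", None, "www.alipay.net"),
    Cookie("alipayjsessionid", "bywcn4wq0z5fbcohzfpn2f1xxdambepay", ".alipay.net", "www.alipay.net"),
    Cookie("ali_apache_tracktmp", "uid=", ".alipay.net", "www.alipay.net"),
]

print(cookie_header("static.alipay.net", JAR))            # three domain cookies are sent
print(repr(cookie_header("static.example-cdn.com", JAR)))  # '' on a separate domain
```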
IV. Shorten the URL
So how long can a URL be? From the analysis above, we still have 70 bytes available. Even if some page accesses must carry tracking data (such as a session ID), 40 to 50 bytes remain. That is still far from the 443 bytes mentioned at the start of this article.
But do we really need such a long URL?
The answer is no: we can shorten the URL. In the example above, the GET part of the original URL is:
---------
/min?b=javascript&f=arale/lang/aspect.js,arale/lang/md5.js,arale/lang/uri.js,arale/lang/tmpl.js,arale/lang/date.js,arale/lang/number.js,arale/http/jsonp.js,arale/http/ajax.js,arale/http/core.js,arale/event/event-chain.js,arale/class/declare.js,arale/fx/animator.js,aralex/widget.js,aralex/tplwidget.js,aralex/view.js,aralex/tab/tab.js,aralex/dropdown.js,aralex/slider.js,aralex/slider/switchslider.js
---------
Looked at closely, it really amounts to:
---------
/min?b=javascript&f=...
---------
The f field is followed by a list of static resources from the arale script project. On the server, the min program concatenates the script fragments named by "b=javascript&f=..." into a single .js file and returns it to the browser. If nothing has changed, it returns status code 304 directly.
In fact, the parameter block after "f=" is exactly the same in every request. Even if the list of files to concatenate differs between situations, there is only a limited number of combinations. This naturally suggests a checksum. We compute a key (such as a hash, MD5 or CRC) over that string, then use the unique key to look up the concatenated .js content. This also means the min program no longer needs to concatenate text on every request. The URL above then becomes (using the CRC32 of the 396-byte string after f= as the example):
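A minimal sketch of what such a min handler might do. The text does not say how the min program detects "no change". This sketch assumes an ETag (an MD5 of the combined body) checked against the browser's If-None-Match header. `SOURCES` is a hypothetical in-memory stand-in for the script files on disk.

```python
import hashlib
from typing import Optional, Tuple

# Hypothetical stand-ins for script files; a real min program would read them from disk.
SOURCES = {
    "arale/lang/aspect.js": "/* aspect */\n",
    "arale/lang/md5.js": "/* md5 */\n",
}


def min_handler(f_param: str, if_none_match: Optional[str] = None) -> Tuple[int, str, str]:
    """Concatenate the files listed in f= and honour If-None-Match.

    Returns (status, body, etag).
    """
    body = "".join(SOURCES[name] for name in f_param.split(","))
    etag = '"' + hashlib.md5(body.encode("utf-8")).hexdigest() + '"'
    if if_none_match == etag:
        return 304, "", etag  # unchanged: empty body, browser reuses its cached copy
    return 200, body, etag


status, body, etag = min_handler("arale/lang/aspect.js,arale/lang/md5.js")
print(status)  # 200
print(min_handler("arale/lang/aspect.js,arale/lang/md5.js", etag)[0])  # 304
```

Using Last-Modified / If-Modified-Since instead of an ETag would work equally well. The ETag variant is chosen here only because it needs no file timestamps.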
---------
/min?b=javascript&f=313466db
---------
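The checksum step can be sketched with Python's `zlib.crc32`. The file list is the one from the example. Whether its CRC32 equals 313466db depends on the exact string the author hashed, so no particular value is claimed here. CRC32 is only 32 bits, so a real deployment should check for collisions when registering bundles, or switch to a longer hash such as MD5.

```python
import zlib

FILES = [
    "arale/lang/aspect.js", "arale/lang/md5.js", "arale/lang/uri.js", "arale/lang/tmpl.js",
    "arale/lang/date.js", "arale/lang/number.js", "arale/http/jsonp.js", "arale/http/ajax.js",
    "arale/http/core.js", "arale/event/event-chain.js", "arale/class/declare.js",
    "arale/fx/animator.js", "aralex/widget.js", "aralex/tplwidget.js", "aralex/view.js",
    "aralex/tab/tab.js", "aralex/dropdown.js", "aralex/slider.js", "aralex/slider/switchslider.js",
]


def bundle_key(f_value: str) -> str:
    """8-hex-digit CRC32 of the f= parameter string."""
    return format(zlib.crc32(f_value.encode("ascii")) & 0xFFFFFFFF, "08x")


F_VALUE = ",".join(FILES)
key = bundle_key(F_VALUE)
url = f"/min?b=javascript&f={key}"
print(url, len(url))  # the key varies with the input; the URL is always 28 bytes
```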
To support versioning:
---------
/min?b=javascript&v=0.9b&f=313466db
---------
We have now brought the URL down to a small size while adding version management and content validation. If needed, the server-side min program can also generate the combined files dynamically and cache them. None of these changes conflicts with our original requirements. Most importantly, the GET request is now only 35 bytes, and the remaining space comfortably covers our needs.
Overall optimization target: level 1, 70 bytes!
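On the server side, the versioned short URL needs a lookup from (v, f) back to the file list. The sketch below assumes a hypothetical `REGISTRY` that is filled at deploy time. Each bundle is registered under its version and CRC32 key, and a collision is rejected. Caching of the concatenated content (for example with `functools.lru_cache`) could be layered on top and is omitted here.

```python
import zlib
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit

# Hypothetical deploy-time registry: (version, key) -> list of files in the bundle.
REGISTRY: Dict[Tuple[str, str], List[str]] = {}


def register(version: str, files: List[str]) -> str:
    """Register a bundle and return its short, versioned URL."""
    key = format(zlib.crc32(",".join(files).encode("ascii")) & 0xFFFFFFFF, "08x")
    existing = REGISTRY.get((version, key))
    if existing is not None and existing != files:
        raise ValueError("CRC32 collision: use a longer hash for this bundle")
    REGISTRY[(version, key)] = list(files)
    return f"/min?b=javascript&v={version}&f={key}"


def resolve(url: str) -> List[str]:
    """Map a short URL back to the file list the min program should concatenate."""
    query = parse_qs(urlsplit(url).query)
    return REGISTRY[(query["v"][0], query["f"][0])]


short_url = register("0.9b", ["arale/lang/aspect.js", "arale/lang/md5.js"])
print(short_url, len(short_url))  # always 35 bytes for a "0.9b" version string
print(resolve(short_url))
```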
V. Technical maturity and value
1. Twitter has long used this technique.
2. Like the arale project, YQL (Yahoo! Query Language) has a similar need. It uses the technique above to turn "a SQL-like statement passed in a URL" into a short name, for example:
http://y.ahoo.it/iHQ8c0sv
is equivalent to
http://developer.yahoo.com/yql/console?q=select%20woeid%20from%20geo.places%20where%20text%3d%22san%20francisco%2C%20ca%22
3. Microsoft still does not seem to have caught on, which is why their official website feels so slow. ^_^
4. Whenever we can get the HTTP header under 456 bytes, we should do everything we can to do so. For example, because TradeManager has its own independent client, it can customize its HTTP request headers and trim the User-Agent field.
5. If the browser always sends minimal HTTP requests, the network can deliver each request to the server as quickly as possible, without waiting for multiple packets to be reassembled. This is especially effective on slow networks and on networks with heavy packet loss. Put simply, if someone on your LAN is running Thunder or BitTorrent, minimizing HTTP requests will noticeably improve the web browsing experience.
6. Static resources such as scripts should be under version management.