Web application optimization tips from a Flickr developer


By Cal Henderson
The next generation of web applications makes heavy use of JavaScript and CSS. We will show you how to make those applications fast and responsive.
Now that we have built "Web 2.0" applications full of rich content and interaction, we expect CSS and JavaScript to play ever larger roles. To keep applications snappy, we need to optimize the size and delivery of the files that render each page, to ensure the best possible user experience. In practice this means a combination of things: make the content as small as possible, download it as fast as possible, and avoid re-fetching resources that haven't changed.
The situation is complicated a little by CSS and JS files. Unlike images, their source code is likely to change frequently, and after each change clients need to download them again, invalidating the locally cached version (and copies held in any intermediate caches). In this article we focus on making the user experience as fast as possible, covering the initial page download, subsequent page downloads, and resource downloads as the application evolves and the content changes.
I have always believed that developers should make things as simple as possible, so we favor approaches that let the system handle optimization automatically. With only a little work we can build an environment that gives us the best of both worlds: simple development, excellent end-user performance, and no change to our existing way of working.
One big file

The old-school thinking is that, to optimize performance, you merge multiple CSS and JS files into a very small number of large files. One 50 KB file beats ten 5 KB JS files: although the total number of bytes of code stays the same, the overhead of multiple HTTP requests is avoided. Each request has setup and teardown costs on both the client and the server, incurs request and response header overhead, and consumes additional process and thread resources on the server side (plus, possibly, CPU time spent compressing the content).
Concurrency matters too. By default, when using persistent connections, IE and Firefox download only two resources at a time from the same domain (as recommended in section 8.1.4 of the HTTP/1.1 specification; the default can be changed by editing the registry or similar settings). That means that while we're waiting for two JS files to download, no image resources are fetched, so the user sees no images on the page during that time.
Merging files addresses both problems, but the approach has two disadvantages. First, packaging all resources together forces the user to download everything at once. If, instead, much of the content is split across multiple files, the download cost is spread across multiple pages, easing the speed pressure within the session (or avoiding some of the cost entirely, depending on the paths users take). If we make the initial page slow to download in order to make subsequent pages faster, we'll find that more users never stick around to open a second page.
Second, in an environment where files change frequently, a single-file setup forces the client to re-download all of the CSS and JS whenever anything changes. If our application has one large merged JS file (say, 100 KB), every minor change forces the client to swallow the whole 100 KB again.
Splitting things up

Merging everything into one huge file is clearly not the answer. The alternative is a compromise: split the CSS and JS resources into a handful of sub-files, divided along functional lines, while keeping the number of files as small as practical. There is a price: although splitting code into logical chunks improves efficiency during development, the files must still be merged for download performance. However, adding a step to the build system (the toolchain that turns development code into deployable production code) takes care of that.
For applications with separate development and production environments, some simple techniques keep the code manageable. In the development environment, the code can stay split into multiple logical components for clarity. We can create a simple function in Smarty (a PHP templating language) to manage the JavaScript downloads:
Smarty:
{insert_js files="foo.js,bar.js,baz.js"}

PHP:
function smarty_insert_js($args){
    foreach (explode(',', $args['files']) as $file){
        echo "<script type=\"text/javascript\" src=\"/javascript/$file\"></script>\n";
    }
}

Output:
<script type="text/javascript" src="/javascript/foo.js"></script>
<script type="text/javascript" src="/javascript/bar.js"></script>
<script type="text/javascript" src="/javascript/baz.js"></script>

That's simple enough. Then, in the build process, we merge the designated files. In this example, foo.js and bar.js are almost always requested together, so they get merged. We can have the application configuration record that fact and modify the template function to use it:
Smarty:
{insert_js files="foo.js,bar.js,baz.js"}

PHP:

# Source file map. After the build process has merged the files,
# this map is used to look up where each JS source file ended up.
$GLOBALS['config']['js_source_map'] = array(
    'foo.js' => 'foobar.js',
    'bar.js' => 'foobar.js',
    'baz.js' => 'baz.js',
);

function smarty_insert_js($args){
    if ($GLOBALS['config']['is_dev_site']){
        $files = explode(',', $args['files']);
    }else{
        $files = array();
        foreach (explode(',', $args['files']) as $file){
            $files[$GLOBALS['config']['js_source_map'][$file]]++;
        }
        $files = array_keys($files);
    }
    foreach ($files as $file){
        echo "<script type=\"text/javascript\" src=\"/javascript/$file\"></script>\n";
    }
}

Output:
<script type="text/javascript" src="/javascript/foobar.js"></script>
<script type="text/javascript" src="/javascript/baz.js"></script>

The template source doesn't need to change between development and production: it keeps the files separate during development and merges them when we release to production. For extra credit, we can write the merge step itself in PHP and drive it from the same configuration, so there is a single configuration file and no synchronization problems. To do even better, we can analyze how often CSS and JS files appear together on pages to work out which merges make the most sense (files that almost always appear together are prime candidates for merging).
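As an illustration, a merge step driven by the same configuration might look like this. This is a minimal sketch: the config include and directory path are assumptions, not part of the original setup.

<?php
# Sketch of a build step that performs the merge, driven by the same
# js_source_map configuration (paths here are assumptions).
require 'config.php';  # assumed to define $GLOBALS['config']['js_source_map']
$source_dir = '/var/www/javascript';

# Invert the map: merged file => list of source files
$merged = array();
foreach ($GLOBALS['config']['js_source_map'] as $source => $target){
    $merged[$target][] = $source;
}

foreach ($merged as $target => $sources){
    # Files that map to themselves need no merging
    if (count($sources) == 1 && $sources[0] == $target){ continue; }
    $out = '';
    foreach ($sources as $source){
        $out .= file_get_contents("$source_dir/$source")."\n";
    }
    file_put_contents("$source_dir/$target", $out);
}
?>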
For CSS, it helps to first establish a master/child model: one master stylesheet that applies across the whole application, and multiple child stylesheets for the different application areas. That way, most pages download only two CSS files, and one of them (the master) is cached after the very first page request.
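A page in, say, a photos area then boils down to just two stylesheet links (the file names here are made up for illustration):

<link rel="stylesheet" type="text/css" href="/css/main.css" />
<link rel="stylesheet" type="text/css" href="/css/photos.css" />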
For applications without much CSS and JS, this approach may be slower than a single large file on the first request. By keeping the number of files low, though, it works out faster overall, because the payload of each page is smaller. The download cost is spread across the different application areas, so the number of concurrent downloads stays at a minimum and the average amount of data downloaded per page stays small.
Compression

When resource compression comes up, most people immediately think of mod_gzip (but beware: mod_gzip is actually a devil, or at least a nightmare). The principle is simple: when a browser requests a resource, it sends a header announcing which content encodings it can accept, like this:

Accept-Encoding: gzip, deflate

When the server sees such a header, it compresses the content with gzip or deflate and the client decompresses it. This trades CPU time on both client and server for a smaller transfer, which is reasonable enough. The problem is how mod_gzip works: it creates a temporary file on disk, sends that to the client, then deletes the file. On a high-volume system, disk I/O soon becomes the limit. To avoid this you can use mod_deflate (Apache 2 only), which takes the saner approach of compressing in memory. Apache 1 users can create a RAM disk and point mod_gzip's temporary files at it; that isn't pure in-memory compression, but it's no slower than writing to disk.
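For comparison, a minimal mod_deflate setup on Apache 2 needs only a directive along these lines (a sketch that compresses just text content types):

AddOutputFilterByType DEFLATE text/html text/css text/javascript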
Even so, the compression overhead can be avoided entirely by pre-compressing the relevant static resources and having mod_gzip serve the appropriate compressed version at download time. Added to the build process, this is transparent. The set of files to pre-compress is usually small: just the CSS and JS files (and any other uncompressed static content); images don't benefit, since they're already compressed and won't get much smaller.
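A build step along these lines could produce the .gz versions. This is a sketch; the file list and paths are assumptions.

<?php
# Pre-compress static CSS/JS at build time so the .gz versions can be
# served directly (the file list and paths are assumptions).
$files = array('/var/www/css/main.css', '/var/www/javascript/foobar.js');

foreach ($files as $file){
    # Level 9 gives the smallest output; we pay the CPU cost once, at build time
    $gz = gzencode(file_get_contents($file), 9);
    file_put_contents("$file.gz", $gz);
}
?>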
These configuration options tell mod_gzip where to find the pre-compressed files:

mod_gzip_can_negotiate Yes
mod_gzip_static_suffix .gz
AddEncoding gzip .gz

Newer versions of mod_gzip (from 1.3.26.1a onward) add a configuration option that pre-compresses the files automatically, provided Apache has the permissions to create and overwrite the compressed files:

mod_gzip_update_static Yes

Unfortunately, it's not that easy. Some versions of Netscape 4 (specifically 4.06 through 4.08) announce that they can handle compressed content (they send the header) but cannot actually decompress it, and most other Netscape 4 versions have assorted problems downloading compressed content. So we need to detect the user agent on the server side and send those browsers uncompressed versions. That's the simple case. IE versions 4 through 6 have some more interesting problems: when downloading compressed JavaScript, IE sometimes decompresses the file incorrectly, or stops decompressing halfway and shows half a file to the client. If your application depends heavily on JavaScript (an Ajax application, for example), you must avoid sending compressed files to IE. And in the cases where some older IE 5.x versions do receive gzipped JavaScript correctly, they ignore the file's ETag header and don't cache it.
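A server-side check along these lines could decide when to fall back to uncompressed content. This is a sketch: the user-agent patterns are illustrative assumptions and would need real-world testing.

<?php
# Rough server-side user-agent test: serve uncompressed content to
# browsers with known gzip bugs (the patterns are illustrative assumptions).
function can_send_gzip(){
    $ua     = isset($_SERVER['HTTP_USER_AGENT'])      ? $_SERVER['HTTP_USER_AGENT']      : '';
    $accept = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';

    # Netscape 4 identifies itself as Mozilla/4 without "compatible"
    if (preg_match('!^Mozilla/4!', $ua) && !stristr($ua, 'compatible')){
        return false;
    }
    # IE 4-6 can decompress JavaScript incorrectly, so play it safe
    if (preg_match('!MSIE [4-6]!', $ua)){
        return false;
    }
    # Otherwise, honor the Accept-Encoding header
    return stristr($accept, 'gzip') !== false;
}
?>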
Given all these problems with gzip compression, it's worth looking at the other side: compression that doesn't change the file format. Many JavaScript minifying scripts are available; most use a set of regular-expression-driven rules to shrink the source code. They do a few things: remove comments, collapse whitespace, shorten private variable names, and strip optional syntax.
Unfortunately, most of these scripts don't work very well: they achieve a relatively low compression rate, mangle the code under some circumstances, or both. Without a full parse tree, a compressor has trouble telling a comment from a quoted string that merely looks like a comment. And because closures are mixed in so freely, regular expressions can't easily tell which variables are private, so some name-shortening techniques break code that uses closures.
Fortunately, there is one compressor that avoids these problems: the Dojo compressor (a ready-built version is available). It uses Rhino (Mozilla's JavaScript engine, implemented in Java) to build a parse tree and works from that. It shrinks the code well, with only a small cost: compression happens just once, at build time. Since compression is part of the build process, it's transparent; we can put as many spaces and comments as we like in the source code without worrying about the production code.
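The compressor ships as a Java jar; invoked from the build process, the command looks something like this (the jar name follows the Dojo distribution of the time, so treat it as an assumption):

java -jar custom_rhino.jar -c infile.js > outfile.js 2>&1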
Compared with JavaScript, compressing CSS is much simpler. Since CSS syntax rarely contains quoted strings (usually URL paths and font names), we can slash and burn the whitespace with regular expressions. And when there are quoted strings, we can always collapse a run of whitespace into a single space (since runs of multiple spaces and tabs don't appear in URL paths or font names). A simple Perl script is all it takes:

#!/usr/bin/perl

my $data = '';
open F, $ARGV[0] or die "Can't open source file: $!";
$data .= $_ while <F>;
close F;

$data =~ s!\/\*(.*?)\*\/!!g;  # remove comments
$data =~ s!\s+! !g;           # collapse whitespace
$data =~ s!\} !}\n!g;         # add a line break after each closing brace
$data =~ s!\n$!!;             # remove the final line break
$data =~ s! \{ ! {!g;         # trim space inside opening braces
$data =~ s!; \}!}!g;          # trim the final semicolon before a closing brace

print $data;

Then we can run individual CSS files through the script to compress them:

perl compress.pl site.source.css > site.compress.css
With these simple text optimizations we can reduce the amount of data transferred by as much as 50 percent (the figure depends on your code style; it may be more), which gives a faster user experience. But what we'd really like is to avoid the user requesting files at all unless it's necessary, and this is where HTTP caching knowledge comes in handy.
Caching is a good thing

When a user agent (such as a browser) requests a resource from a server, it caches the server's response after the first request to avoid repeating the same request. How long it keeps the cached copy depends on two factors: the agent's configuration and the server's cache-control headers. Configuration options and behaviors differ between browsers, but most will cache a resource at least until the end of the session unless explicitly told not to.
To keep the browser from caching pages that change frequently, you probably already send headers that prevent the caching of dynamic content. In PHP, these two calls do it:

<?php
header("Cache-Control: private");
header("Cache-Control: no-cache", false);
?>

Sounds too simple? It is: some agents ignore these headers in certain circumstances. To make really sure a document isn't cached by the browser, be a little more forceful:

<?php
# 'Expires' in the past
header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
# Always modified
header("Last-Modified: ".gmdate("D, d M Y H:i:s")." GMT");
# HTTP/1.1
header("Cache-Control: no-store, no-cache, must-revalidate");
header("Cache-Control: post-check=0, pre-check=0", false);
# HTTP/1.0
header("Pragma: no-cache");
?>

That takes care of the content we don't want cached. For content that doesn't change between requests, though, we should encourage the browser to cache it, and the If-Modified-Since request header helps here. If a client sends an If-Modified-Since header with its request, Apache (or any other server) can respond with status code 304 (Not Modified), telling the browser its cached copy is already up to date. This mechanism avoids re-sending the file, but it still costs an HTTP request. Hmm, more thought needed.
Entity tags (ETags) work much like the If-Modified-Since mechanism. With Apache, every response for a static file includes an ETag header containing a checksum derived from the file's modification time, size, and inode number. The browser can then make a conditional request, sending the ETag of its cached copy, to check whether that copy is still valid. ETags share the problem of If-Modified-Since: the client still spends an HTTP request verifying that its local cache is valid.
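To make that round trip concrete, here is a minimal sketch of answering those conditional requests by hand from PHP; the file path is hypothetical, and real code would derive it from the request.

<?php
# Minimal sketch: answer conditional requests for one file by hand
# (the path is hypothetical).
$path  = '/var/www/css/main.css';
$mtime = filemtime($path);
$etag  = md5($mtime.'-'.filesize($path));

header("Last-Modified: ".gmdate("D, d M Y H:i:s", $mtime)." GMT");
header("ETag: \"$etag\"");

$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : '';
$match = isset($_SERVER['HTTP_IF_NONE_MATCH'])     ? $_SERVER['HTTP_IF_NONE_MATCH']     : '';

# If either validator matches, the cached copy is still good: send a 304
if (($since && strtotime($since) >= $mtime) || trim($match, '" ') == $etag){
    header("HTTP/1.0 304 Not Modified");
    exit;
}
readfile($path);
?>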
One extra caution about If-Modified-Since and ETags if you serve content from multiple servers. In a two-server load-balanced setup, an agent may fetch a resource from server A one time and from server B the next (an LVS cluster is a typical example). That's fine, and it's the whole point of load balancing. But if the two servers generate different ETags or modification dates for the same file, the browser can't make sense of them and re-downloads the file every time. By default, the ETag includes the file's inode number, and inode numbers differ from server to server. You can tell Apache to leave it out:

FileETag MTime Size

With this option, Apache derives the ETag from only the modification time and file size. Unfortunately this leads to another problem, one that affects If-Modified-Since too: since the ETag depends on the modification time, the servers' file times have to match. When you upload files to multiple servers, the uploads usually land a second or two apart, so the two servers still generate different ETags. We could go further and base the ETag on the size alone, but then the ETag wouldn't change when a file's contents change without changing its size, and that's unacceptable.
Caching is really a good thing

It seems we've been attacking the problem from the wrong end. All of these possible caching strategies lead to the same thing happening over and over: the client asks the server whether its local cache is up to date. If the server notified the client whenever a file changed, the client would know its cache was valid until the next notification, wouldn't it? Sadly, the world works the other way around: it's the client that sends requests to the server.
Actually, that's not quite true. Before fetching any JS or CSS files, the client requests the page that loads them via <script> or <link> tags, and we can use the server's response to that page to tell the client about changes to those files. That sounds a little cryptic, so here it is plainly: if we change the names of the CSS and JS files whenever their contents change, we can tell the client to cache every URL permanently, because the content behind any given URL never changes.
If we're certain a resource will never change, we can issue some downright aggressive cache headers. In PHP, it takes just two lines:

<?php
header("Expires: ".gmdate("D, d M Y H:i:s", time() + 315360000)." GMT");
header("Cache-Control: max-age=315360000");
?>

We tell the browser that the content expires ten years from now (ten years is roughly 315,360,000 seconds, a little more counting leap days) and that it may keep it for ten years. Of course, it may well be that PHP isn't serving our CSS and JS files at all, in which case it can't send these headers; that situation is covered later.
Manual labor has its limits

Changing the file name by hand every time the contents change is dangerous. What if you rename the file but a template still points at the old name? What if you change some templates but miss others? What worse, what if you change the file's contents but forget to rename it, or rename it but forget to update its references? At best the user sees stale content; at worst the site breaks because a file can't be found. Changing the URL on every content change sounds like a bad idea.
Fortunately, computers are very good at exactly this kind of work: something that must be done precisely, and repeated every single time a change happens, however boring it is.
The process turns out to be nearly painless, because we don't actually need to rename any files at all. A resource's URL and its location on disk don't have to match. Using Apache's mod_rewrite module, we can set up a simple rule that maps specified URLs onto specified files:

RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L]

This rule matches any URL with one of the listed extensions that also contains a "version nugget", and maps it to a path with the version stripped out. For example:

URL path                  File path
/images/foo.v2.gif     -> /images/foo.gif
/css/main.v1.27.css    -> /css/main.css
/javascript/md5.v6.js  -> /javascript/md5.js

With this rule in place, we can change the URL (by changing the version number) without moving the file. Because the URL changes, the browser treats it as a different resource and downloads it fresh. For extra points, you can combine this with the script-grouping function from earlier to generate the versioned <script> tag list on demand.
At this point you might ask: why not just append a query string to the URL, as in /css/main.css?v=4? According to the HTTP caching specification, user agents should never cache URLs that contain query strings. IE and Firefox ignore that rule, but Opera and Safari don't, so keep query strings out of resource URLs to make sure every browser caches them.
Now that we can change the URL without moving the file, it would be nice to have the URLs update automatically. In a small production environment (or a development environment, if you have a large production one), a template function makes this trivial. Smarty is used here again, but the technique works with any template engine:

Smarty:
<link href="{version src='/css/group.css'}" rel="stylesheet" type="text/css" />

PHP:
function smarty_version($args){
    $stat = stat($GLOBALS['config']['site_root'].$args['src']);
    $version = $stat['mtime'];
    echo preg_replace('!\.([a-z]+?)$!', ".v$version.\$1", $args['src']);
}

Output:
<link href="/css/group.v1234567890.css" rel="stylesheet" type="text/css" />

For each linked resource, we find the file on disk, check its mtime (the time it was last modified), and insert that into the URL as the version number. This works well for low-traffic sites (where the stat overhead is negligible) and for development environments, but it doesn't suit high-volume setups, since every stat call can mean a disk read and extra server load.
The solution is quite simple. In a large system, every resource already has a version number: its revision number in version control (you are using version control, right?). At site build time we can easily look up each file's revision number and write it into a static configuration file:

<?php
$GLOBALS['config']['resource_versions'] = array(
    '/images/foo.gif'    => '2.1',
    '/css/main.css'      => '1.27',
    '/javascript/md5.js' => '6.1.4',
);
?>

When releasing to production, the template function switches to using these version numbers:

<?php
function smarty_version($args){
    if ($GLOBALS['config']['is_dev_site']){
        $stat = stat($GLOBALS['config']['site_root'].$args['src']);
        $version = $stat['mtime'];
    }else{
        $version = $GLOBALS['config']['resource_versions'][$args['src']];
    }
    echo preg_replace('!\.([a-z]+?)$!', ".v$version.\$1", $args['src']);
}
?>

This way, nobody has to rename files or remember which files changed: when a new version of a file is released, its URL updates automatically. Neat? We're nearly done.
The last missing piece

I mentioned earlier that if our static files aren't served through PHP, we can't easily send the very-long-expiry cache headers with them. Clearly there are two choices: serve them through PHP anyway, or have Apache send the headers.
The PHP route is straightforward. All we need to do is change the rewrite rule to point versioned static files at a PHP script, and have PHP send the headers before echoing the file's contents:

Apache:
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /redir.php?path=$1$2 [L]

PHP:
Header ("expires:". gmdate ("D, D m y h: I: s", time () + 315360000). "GMT ");
Header ("cache-control: Max-age = 315360000"); # ignore the path ".."
If (preg_match ('!..! ', $ _ Get [path]) {go_404 () ;}# ensure that the path starts with a specified directory.
If (! Preg_match ('! ^ (JavaScript | CSS | images )! ', $ _ Get [path]) {go_404 () ;}# does the file not exist?
If (! File_exists ($ _ Get [path]) {go_404 () ;}# issue a file type
Header $ ext = array_pop (explode ('.', $ _ Get [path]);
Switch ($ ext ){
Case 'css ':
Header ("Content-Type: text/CSS ");
Break;
Case 'js ':
Header ("Content-Type: text/JavaScript ");
Break;
Case 'gif ':
Header ("Content-Type: image/GIF ");
Break;
Case 'jpg ':
Header ("Content-Type: image/JPEG ");
Break;
Case 'png ':
Header ("Content-Type: image/PNG ");
Break;
Default: Header ("Content-Type: text/plain ");
}
# Output file content
Echo implode ('', file ($ _ Get [path]);
Function go_404 (){
Header ("HTTP/1.0 404 file not found ");
Exit;
}

This solution works, but it's not great: PHP needs more memory and execution time than Apache serving the file directly, and we have to be careful to prevent exploits from forged values in the path parameter. To avoid all that, have Apache send the headers itself. A rewrite rule can set an environment variable when it matches, and a Header directive can add headers whenever a given environment variable is set. Combining the two, we tie the rewrite rule to the header settings:

RewriteEngine on
RewriteRule ^/(.*\.)v[0-9.]+\.(css|js|gif|png|jpg)$ /$1$2 [L,E=VERSIONED_FILE:1]

Header add "Expires" "Mon, 28 Jul 2014 23:30:00 GMT" env=VERSIONED_FILE
Header add "Cache-Control" "max-age=315360000" env=VERSIONED_FILE
Because of Apache's execution order, the RewriteRule lines need to go in the main configuration file (httpd.conf) rather than a per-directory configuration file (.htaccess); otherwise the Header lines run first, before the environment variable has been set. The Header lines themselves can go in either file, it makes no difference.
More than one way to skin a rabbit

By combining the techniques above, we can build a flexible development environment and a fast, high-performance production environment. There's still some distance to the ultimate goal of "speed", of course. There are plenty of deeper techniques worth your attention (such as serving static content from separate servers and spreading resources across multiple domain names to increase concurrency), as well as variations on the approaches covered here (such as building Apache filters that rewrite resource URLs to add version information). Leave a comment and tell us which techniques and methods are working for you.
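As a taste of the multiple-domain-names idea, a template helper along these lines could spread resources across several asset hostnames so the browser opens more parallel connections. This is a sketch: the hostnames are hypothetical, and each path must always map to the same host so caching keeps working.

<?php
# Sketch: spread resources across several asset hostnames so browsers
# open more parallel connections (the hostnames are hypothetical).
function asset_url($path){
    $hosts = array('static1.example.com', 'static2.example.com');
    # Hashing keeps the choice stable, so a given path always maps to
    # the same host and stays cacheable
    $host = $hosts[abs(crc32($path)) % count($hosts)];
    return "http://$host$path";
}

# Usage in a template function: echo asset_url('/javascript/md5.v6.js');
?>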
(End)
