Build a front-end performance monitoring system in seven days

Last Update: 2014-08-01 Source: Internet

Author: User

Tags: chrome developer webpagetest


Introduction

A few days ago, at w3ctech's "Visiting Famous Companies: Baidu FEX" session, I claimed that after hearing the talk, you could build your own front-end performance monitoring system in seven days. Having said it, I can't back down now. After the previous article, "The Beauty of Front-end Data," I believe everyone has a basic understanding of front-end data. This article describes performance data and how to monitor it in detail.

Let's get started

In this article, performance mainly means web page loading performance. Not familiar with performance yet? Don't worry: over the next seven "days," follow me into the world of front-end performance.

Day 1: Why monitor performance?

"If you cannot measure it, you cannot improve it" ---- William Thomson

This is the most basic question: why should we pay attention to and monitor front-end performance? For a company, performance is, to a certain extent, directly tied to revenue. There is a large body of research on this outside China:

Performance change	Business impact
Google: 400 ms added latency	0.59% fewer searches
Bing: 2 s added latency	4.3% revenue decline
Yahoo: 400 ms added latency	5-9% traffic drop
Mozilla: pages loading 2.2 s faster	15.4% more downloads
Netflix: Gzip enabled	13-25% performance improvement, 50% bandwidth reduction

Data source: http://www.slideshare.net/bitcurrent/impact-of-web-latency-on-conversion-rates http://stevesouders.com/docs/jsdayit-20110511.pptx

Why does performance affect a company's earnings? The root cause is that performance affects user experience. Loading delays and sluggish interactions both hurt the user experience. On mobile especially, users have little tolerance for slow page responses and dropped connections. Imagine opening a web page on your phone to look up a piece of information, only to have it load seemingly forever: you may simply leave. Google also uses page loading speed as a factor in search ranking (SEO). There are many studies on the impact of page loading speed on user experience and SEO.

Although performance is very important, it inevitably gets neglected during development iterations, and performance degrades as the product iterates. On mobile especially, the network has always been a major bottleneck, while pages keep getting bigger and more complex. There is no simple golden rule for optimizing performance. We need a performance monitoring system that continuously monitors, evaluates, and alerts on page performance, identifies bottlenecks, and guides optimization.

Day 2: What tools are available?

To do a good job, you must first sharpen your tools.

There are many mature, excellent tools for evaluating and monitoring page performance, and using existing tools well gets you twice the result with half the effort. Several common tools are described below:

PageSpeed

PageSpeed is a tool developed by Google for analyzing and optimizing web pages, and it can be used as a browser plug-in. It checks a website against a set of optimization rules and gives detailed suggestions for every rule that fails. Similar tools include YSlow. We recommend the GTmetrix website, which shows the results of several analysis tools at once, as shown in the figure.

WebPageTest

WebPageTest is an excellent open-source tool for testing web page performance. You can use the online version or deploy your own instance. Performance testing platforms built on WebPageTest are also available in China; Alibaba Testing is recommended (the examples below were run on Alibaba Testing).

With WebPageTest, you can examine in detail the waterfall chart, performance grades, content breakdown, visual analysis, and other data from a page load. The visual analysis view lets you directly see screenshots of each stage of page loading:


It clearly shows two important moments when visiting a website: the white screen time and the first screen time, that is, how long it takes before the user sees any content on the page, and how long it takes for the first screen to finish rendering (including images and other elements). These two moments directly determine how long users wait before seeing the information they want. Google's optimization recommendations likewise suggest reducing the CSS and JS not needed by the first screen, so that the first screen can be presented as early as possible.

PhantomJS

PhantomJS makes it easy to bring monitoring into automation. PhantomJS is a headless WebKit with a JavaScript API that runs on the server, which makes automated web testing straightforward. It requires some programming, but it is also more flexible. The official documentation includes a complete example that generates a HAR file from a page load; see that document for details. There are also many introductions to this tool in Chinese. In addition, BerserkJS, a similar tool developed by Sina @Taobao, is quite good and thoughtfully adds first screen statistics; for more information, see here.

Day 3: Start monitoring real users' online performance

Play to each method's strengths and avoid its weaknesses.

At this point, some people may ask, since there are so many excellent tools, why should we monitor the real access performance of online users?

We found that results from simulated tool testing deviate from reality and sometimes fail to reflect performance fluctuations. Besides basic metrics such as white screen and first screen time, product lines care just as much about product-related metrics, such as ad visibility, search availability, and sign-in availability. These features depend directly on the page's JS loading and are hard to simulate with tools.

To continuously monitor how real users access pages, and whether page features are available, across different network environments, we chose to insert JS into our pages to measure real online user performance. Existing analysis tools serve as a supplement, so together they form a complete, diversified data monitoring system that provides reliable data for product line evaluation and optimization.

For a simple comparison of different monitoring methods, see the following table:

Type	Advantages	Disadvantages	Examples
Non-intrusive	Complete metrics; active monitoring from clients; can monitor competitors' products	Cannot tell how many users are affected; small, possibly distorted samples; cannot monitor complex applications or fine-grained features	PageSpeed, PhantomJS, UAQ
Intrusive	Massive real-user data; can monitor complex applications and business features, user clicks, and regional rendering	Requires inserting a statistics script; incomplete network metrics; cannot monitor competitors' products	DP, Google Analytics

Day 4: How to collect performance data?

Monitoring users' pain points

Which metrics should online monitoring track, and how can they best reflect what users perceive?

Users ask: why won't the page open, why can't I click the button, why do images display so slowly? Engineers may focus on loading-process indicators such as DNS lookup, TCP connection, and server response. Starting from users' pain points, we extract four key metrics from the browser loading process: white screen time, first screen time, time to interactive (when the user can operate the page), and total download time (defined in the previous article). How are these metrics measured?

Determining the measurement start point

Measurement should start when the user enters the URL or clicks a link, because only then does it capture the user's full waiting time. If most of your users have modern browsers, you can use the Navigation Timing API directly to obtain the start point and the duration of each phase of the loading process. Alternatively, you can record a timestamp in a cookie and read it on the next page. Note that the cookie method only works for navigations that come from within your own site.
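As a minimal JavaScript sketch of choosing the start point, the code below prefers Navigation Timing's `navigationStart` and falls back to a cookie timestamp. It assumes a hypothetical cookie named `perf_start` that the previous page of the same site writes when the user leaves; `readStartCookie` and `getStartTime` are illustrative helpers, not a library API.

```javascript
// Hypothetical cookie name; the previous page of the same site must write it.
const START_COOKIE = "perf_start";

// Read a millisecond timestamp from a cookie string such as "a=1; perf_start=123".
function readStartCookie(cookieString, name = START_COOKIE) {
  for (const part of cookieString.split(";")) {
    const [key, value] = part.trim().split("=");
    if (key === name && value) {
      const ts = Number(value);
      return Number.isFinite(ts) ? ts : null;
    }
  }
  return null;
}

// Prefer Navigation Timing's navigationStart; fall back to the cookie timestamp.
function getStartTime(timing, cookieString = "") {
  if (timing && timing.navigationStart > 0) {
    return { start: timing.navigationStart, source: "navigation-timing" };
  }
  const fromCookie = readStartCookie(cookieString);
  if (fromCookie !== null) return { start: fromCookie, source: "cookie" };
  // e.g. an older browser arriving from another site: no usable start point
  return { start: null, source: "none" };
}

// On the previous page (browser only, illustrative):
// window.addEventListener("beforeunload", () => {
//   document.cookie = START_COOKIE + "=" + Date.now() + "; path=/";
// });
```

In a page, this might be called as `getStartTime(window.performance && performance.timing, document.cookie)`.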

Measuring white screen time

White screen time is the moment the user first sees content, also called first render time. Recent versions of Chrome expose a firstPaintTime interface for it, but most browsers do not support this, so other methods are needed. Looking closely at the WebPageTest view, we find that the white screen ends around the time the external resources referenced in the head finish loading, because the browser renders the page only after the head resources have been loaded and parsed. Based on this, we can use the time at which the head resources finish loading to calculate the white screen time. It is not exact, but it accounts for the main factors that affect the white screen: time to first byte and head resource loading time.

How do we measure when the head resources finish loading? We found that inline JS in the head usually does not execute until the preceding JS/CSS has finished loading. So can we add an inline script at the bottom of the head to record when the head resources finish loading? A simple example can test this:

<!DOCTYPE html>

 Testing shows that the head loading time measured this way exactly matches the download time of the head resources. Moreover, if a JS file takes a long time to execute, the measurement is not taken until that execution finishes. The method is therefore feasible (for details, see introductions to browser rendering principles and JavaScript's single-threaded execution).
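The head-timing idea can be sketched as follows. It assumes hypothetical inline scripts that store `Date.now()` as `headStart` at the top of `<head>` and `headEnd` at the bottom, and a `start` value from the measurement start point; `firstPaint` stands for a first-paint timestamp in milliseconds, used only if the browser provides one. All names are illustrative.

```javascript
// Head resource loading time: the bottom-of-head script runs only after the
// preceding head JS/CSS has loaded (and any slow head JS has finished executing).
function headLoadTime(headStart, headEnd) {
  return headEnd - headStart;
}

// White screen estimate, measured from the navigation start point.
// Prefers a real first-paint timestamp; otherwise uses the end of head loading,
// which covers the two main factors: time to first byte and head loading.
function estimateWhiteScreen({ start, headEnd, firstPaint }) {
  if (typeof firstPaint === "number" && firstPaint > 0) {
    return firstPaint - start;
  }
  return headEnd - start;
}

// In the page (illustrative):
// <head>
//   <script>var headStart = Date.now();</script>
//   ... external CSS / JS ...
//   <script>var headEnd = Date.now();</script>
// </head>
```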
Measuring first screen time
 Measuring first screen time is more complex because it involves several elements, such as images, and asynchronous rendering. Looking at the loading view, we can see that image loading is the main factor affecting the first screen, so by measuring when the first-screen images finish loading we can obtain the first screen rendering time. The process is as follows:
 Call the API at the first-screen position to start measuring -> bind the load event of every image in the first screen -> after the page loads, check which images are within the first screen and find the one that loaded last -> its load time is the first screen time
 This is the basic logic for synchronously loaded pages. Note the following points: 
  
  If the page contains iframes, their loading time must also be taken into account. 
  GIF images may fire the load event repeatedly in IE, so they should be excluded. 
  With asynchronous rendering, the first screen time should be calculated after the asynchronously fetched data has been inserted into the page. 
  Important CSS background images can be measured by requesting the image URL from JS (the browser will not download it again). 
  If the first screen contains no images, the time when the JS executes, that is, when the text appears, is used as the first screen time. 
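The first-screen steps and the GIF and no-image rules above can be sketched as a pure function. It assumes a hypothetical record per `<img>` of the form `{ top, loadedAt, isGif }`, collected in the browser by binding each image's `load` event; iframes and asynchronous rendering are not handled in this sketch.

```javascript
// Each record describes one <img> (hypothetical shape):
//   { top: px from page top, loadedAt: ms timestamp of its load event, isGif: boolean }
// In a browser, `top` could come from el.getBoundingClientRect().top + window.pageYOffset.
function firstScreenTime({ start, viewportHeight, images, textShownAt }) {
  let slowest = null;
  for (const img of images) {
    if (img.isGif) continue; // IE may fire load repeatedly for GIFs
    const inFirstScreen = img.top >= 0 && img.top < viewportHeight;
    if (!inFirstScreen) continue;
    if (slowest === null || img.loadedAt > slowest) slowest = img.loadedAt;
  }
  // No image in the first screen: use the moment the text appeared instead.
  const end = slowest !== null ? slowest : textShownAt;
  return end - start;
}
```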
 
Measuring time to interactive and total download time
 Time to interactive is measured at domready by default, because event handlers are usually bound at that point. For modular, asynchronously loaded JS, you can explicitly mark in code when important JS finishes loading; this is also how product metrics are measured.
 Total download time is measured at onload by default. If the page performs a lot of asynchronous rendering, you can use the time when all asynchronous rendering has completed as the total download time.
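One way to record these marks is a tiny recorder like the sketch below. `createPerfRecorder` and its methods are hypothetical names; the clock is injectable so the logic can run outside a browser. It assumes every asynchronous render calls `asyncStart()` before `onload` fires.

```javascript
// Minimal timing recorder (hypothetical helper, not a library API).
// `start` is the measurement start point; `now` is injectable for testing.
function createPerfRecorder(start, now = () => Date.now()) {
  const marks = {};
  let pendingAsync = 0; // async renders still in flight
  let loaded = false;   // whether onload has fired

  function mark(name) {
    // Keep the first value recorded for each name.
    if (!(name in marks)) marks[name] = now() - start;
  }

  function maybeFinish() {
    // Total download time: onload, or the end of the last async render if later.
    if (loaded && pendingAsync === 0) mark("total");
  }

  return {
    mark,                                // custom marks, e.g. "adVisible"
    domReady() { mark("interactive"); }, // user can operate the page
    onload() { loaded = true; maybeFinish(); },
    asyncStart() { pendingAsync += 1; }, // call before onload fires
    asyncEnd() { pendingAsync -= 1; maybeFinish(); },
    results() { return Object.assign({}, marks); },
  };
}

// Browser wiring (illustrative):
// const rec = createPerfRecorder(performance.timing.navigationStart);
// document.addEventListener("DOMContentLoaded", () => rec.domReady());
// window.addEventListener("load", () => rec.onload());
```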
Network metrics
 Determining network type
 On mobile, the network is the factor that most affects page loading speed, and optimizations must be tailored to each network type, for example serving a lite version to 2G users. However, the web offers no interface for obtaining the user's network type. Instead, you can measure download speeds to determine which network type corresponds to each IP range; Facebook's speed measurement is a typical example of this method. Analysis shows that users' loading rates fall into clearly distinct ranges, as shown in the figure:
 
 Each range corresponds to a different network type, and when verified against client-side tests, this classification is more than 95% accurate. With an IP database annotated with the corresponding rate data, you can determine a user's network type from their IP address when analyzing user data.
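A rough sketch of building such an IP-to-network-type table, assuming speed samples of the form `{ ip, bytes, ms }` and grouping by IPv4 /24 segment. The rate ranges below are placeholders, not the article's figures; real ones would come from your own rate distribution and client-side verification.

```javascript
// Placeholder rate ranges in KB/s (hypothetical, NOT from the article's data).
const RATE_RANGES = [
  { type: "2G", max: 20 },
  { type: "3G", max: 150 },
  { type: "wifi/4G", max: Infinity },
];

// Map a measured download rate to the first range whose upper bound exceeds it.
function classifyRate(kbPerSec, ranges = RATE_RANGES) {
  const range = ranges.find((r) => kbPerSec < r.max);
  return range ? range.type : "unknown";
}

// Group speed samples by IPv4 /24 segment, then classify each segment by its
// median rate (upper median for an even count).
function buildSegmentTable(samples) {
  const bySegment = new Map();
  for (const { ip, bytes, ms } of samples) {
    if (!(ms > 0)) continue; // skip broken samples
    const segment = ip.split(".").slice(0, 3).join(".");
    const rate = bytes / 1024 / (ms / 1000); // KB/s
    if (!bySegment.has(segment)) bySegment.set(segment, []);
    bySegment.get(segment).push(rate);
  }
  const table = {};
  for (const [segment, rates] of bySegment) {
    rates.sort((a, b) => a - b);
    table[segment] = classifyRate(rates[Math.floor(rates.length / 2)]);
  }
  return table;
}
```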
 Measuring network time
 Network timing data can be obtained through the Navigation Timing API mentioned earlier; likewise, the Resource Timing API provides the loading time of every static resource on the page. These interfaces make it easy to obtain DNS, TCP, time-to-first-byte, and HTML transfer times. The Navigation Timing interface is shown in the figure:
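As a sketch, the phases can be derived from a Navigation Timing object (`window.performance.timing` in supporting browsers) and slow resources picked from Resource Timing entries (for example `performance.getEntriesByType("resource")`). The input field names follow the Navigation Timing and Resource Timing specs; the output keys and helper names are our own.

```javascript
// Derive network phases (ms) from a Navigation Timing object.
function networkPhases(t) {
  return {
    redirect: t.redirectEnd - t.redirectStart,
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    ttfb: t.responseStart - t.requestStart,    // waiting for the first byte
    download: t.responseEnd - t.responseStart, // HTML transfer
    total: t.loadEventEnd - t.navigationStart, // read after the load event has finished
  };
}

// Return the n slowest static resources from Resource Timing entries.
function slowestResources(entries, n = 3) {
  return entries
    .map((e) => ({ name: e.name, duration: e.duration }))
    .sort((a, b) => b.duration - a.duration)
    .slice(0, n);
}
```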
 
 The sections above focus on data collection, the most important part of the system: only when the data truly reflects user perception can we prescribe the right remedy to improve the user experience. Once collected, the data can be uploaded after the page finishes loading, for example:
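Before the example request itself, here is a minimal sketch of how such a request might be assembled and sent. The endpoint `https://example.com/tj.gif` is a placeholder, and `buildBeaconUrl` is a hypothetical helper; `URLSearchParams` and the `Image` trick are standard browser features.

```javascript
// Serialize collected metrics into a GIF beacon URL (hypothetical helper).
function buildBeaconUrl(endpoint, metrics, now = Date.now()) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(metrics)) {
    if (value !== undefined && value !== null) params.append(key, String(value));
  }
  params.append("_t", String(now)); // cache buster so every request reaches the server
  return endpoint + "?" + params.toString();
}

// Browser only (illustrative): send after onload; setTimeout lets loadEventEnd be filled in.
// window.addEventListener("load", () => {
//   setTimeout(() => {
//     new Image().src = buildBeaconUrl("https://example.com/tj.gif", metrics);
//   }, 0);
// });
```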
 http://xxx.baidu.com/tj.gif?dns=100&ct=210&st=300&tt=703&c_dnslookup=0&c_connecting=0&c_waiting=296&c_receiving=403&c_fetch_dns=0&c_nav_dns=75&c_nav_fetch=75&drt=1423&drt_end=1436&lt=3410&c_nfpt=619&nav_type=0&redirect_count=0&_screen=1366*768|1366*728&product_id=10&page_id=200&_t=1399822334414
Day 5: How to analyze performance data? 
  
  Make data speak 
 
 As described in the previous article, data can be analyzed along multiple dimensions. Big data processing calls for Hadoop, Hive, and similar tools; for an ordinary website, any backend language will do.
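For an ordinary website, the first processing step might look like the sketch below: turning logged beacon URLs back into records with numeric fields. It assumes the request URLs have already been extracted from the access log (log-format parsing is out of scope), and the `product_id` filter in `parseLog` is a hypothetical example.

```javascript
// Parse one logged beacon URL into a record; numeric values become numbers.
function parseBeacon(url) {
  const q = url.indexOf("?");
  if (q === -1) return {};
  const record = {};
  for (const [key, value] of new URLSearchParams(url.slice(q + 1))) {
    const num = Number(value);
    record[key] = value !== "" && Number.isFinite(num) ? num : value;
  }
  return record;
}

// Parse a batch of logged URLs, keeping only one product's records (hypothetical filter).
function parseLog(urls, productId) {
  return urls.map(parseBeacon).filter((r) => r.product_id === productId);
}
```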
 Mean and distribution
 Mean and distribution are the two most common ways to process the data. They show the trend and distribution of each metric at a glance, which helps with evaluation, bottleneck detection, and alerting. Outliers, such as dirty data clearly beyond a threshold, should be removed during processing.
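A minimal sketch of that processing: drop samples outside a plausible range, then report the mean and a few nearest-rank percentiles as a compact view of the distribution. The 0 to 60,000 ms bounds are placeholder thresholds, not values from the article.

```javascript
// Remove outliers, then summarize with the mean and nearest-rank percentiles.
// The default bounds are placeholder thresholds (hypothetical).
function summarize(samples, { min = 0, max = 60000 } = {}) {
  const clean = samples.filter((v) => Number.isFinite(v) && v >= min && v <= max);
  if (clean.length === 0) return null;
  clean.sort((a, b) => a - b);
  const mean = clean.reduce((sum, v) => sum + v, 0) / clean.length;
  // Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
  const pct = (p) => clean[Math.max(0, Math.ceil((p / 100) * clean.length) - 1)];
  return { count: clean.length, mean, p50: pct(50), p90: pct(90), p99: pct(99) };
}
```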
 There is a great deal of research on how to evaluate response times. For example, three basic time limits have been proposed: 
  
  0.1 seconds: roughly the smallest delay users can perceive; operations completed within this time feel instantaneous and smooth. 
  1.0 seconds: a response within 1.0 seconds does not interrupt the user's flow of thought. The user notices the delay, but operations completed within 0.1 to 1.0 seconds need no loading indicator. 
  10 seconds: beyond 10 seconds, users can no longer stay focused on the task and may leave to do something else. 
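To see how a metric's samples spread across these three limits, a sketch like the following buckets cleaned durations (numbers in milliseconds) and reports each bucket's share. The band labels are our own naming, and the code does not reproduce the article's evaluation ranges.

```javascript
// The three limits (0.1 s, 1 s, 10 s) plus an open-ended last band.
// Band labels are our own naming.
const BANDS = [
  { label: "instant", maxMs: 100 },
  { label: "flow-kept", maxMs: 1000 },
  { label: "focus-kept", maxMs: 10000 },
  { label: "may-leave", maxMs: Infinity },
];

// Share of (already cleaned, numeric) durations falling into each band.
function bandShares(durationsMs) {
  const counts = {};
  for (const b of BANDS) counts[b.label] = 0;
  for (const d of durationsMs) {
    counts[BANDS.find((b) => d <= b.maxMs).label] += 1;
  }
  const total = durationsMs.length || 1; // avoid dividing by zero
  const shares = {};
  for (const b of BANDS) shares[b.label] = counts[b.label] / total;
  return shares;
}
```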
 
 Based on industry research and the characteristics of each metric, we defined evaluation ranges for the distribution of each metric, as shown in the figure:
 
 Establishing these evaluation ranges helps us understand current performance and respond to fluctuations in performance trends.
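As one possible way to respond to trend fluctuations, the sketch below compares the current period's median with a baseline period's median and raises an alert when the relative slowdown exceeds a tolerance. The 10% tolerance and the helper names are hypothetical; the article does not specify an alerting rule.

```javascript
// Median of a numeric array (average of the two middle values for even counts).
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Flag a regression when the current median is worse than the baseline median
// by more than `tolerance` (a placeholder 10% by default).
function checkRegression(baselineSamples, currentSamples, tolerance = 0.1) {
  const base = median(baselineSamples);
  const cur = median(currentSamples);
  const change = (cur - base) / base;
  return { base, cur, change, alert: change > tolerance };
}
```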
 Multidimensional Analysis
 To help uncover potential performance bottlenecks, data should be analyzed along multiple dimensions. On mobile, for example, the most important dimension is the network: besides the overall data, the data should also be broken down by network type. Common dimensions include operating system, browser, region, and carrier. You can also define dimensions based on your product's characteristics, such as page length distribution or the lite version.
 Note that more dimensions are not necessarily better; choose them according to your product features and target devices. Dimensions exist to make performance bottlenecks easier to find.
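A minimal sketch of breaking one metric down by one dimension, for example first screen time by network type. The record field names (`net`, `fs`) in the usage are hypothetical; records missing the dimension are grouped under `"unknown"`.

```javascript
// Mean of `metric` per value of `dimension` (field names are caller-supplied).
function groupMean(records, dimension, metric) {
  const groups = new Map();
  for (const r of records) {
    const key = r[dimension] != null ? r[dimension] : "unknown";
    const value = r[metric];
    if (!Number.isFinite(value)) continue; // skip missing or invalid values
    const g = groups.get(key) || { count: 0, sum: 0 };
    g.count += 1;
    g.sum += value;
    groups.set(key, g);
  }
  const result = {};
  for (const [key, { count, sum }] of groups) result[key] = { count, mean: sum / count };
  return result;
}
```

For example, `groupMean(records, "net", "fs")` would give the mean first screen time per network type.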
 Side note: someone commented on Weibo that they wanted to monitor performance but their company had no log server. You don't need a dedicated log server; all you need is to keep the access logs of the statistics requests. If your website has no server of its own, you can create an application in the Baidu Developer Center, write a simple web service that parses the received statistics into Baidu Cloud's free database, and then process each day's data with MySQL. That should be sufficient for sampled performance data from an ordinary site. Please call me Lei Feng.
Day 6: How to use monitoring data to solve problems? 
  
  Discover bottlenecks and remedy the problem 
 
 For charting, there is the well-known Highcharts, and Baidu's ECharts is also very good. Whatever tool you use, the most important thing is that the report highlights the key points and is intuitive and clear.
 Before building a report, ask yourself a few questions: does it show the current situation and potential problems at a glance? What should be strengthened, what can be removed, and will the report actually be used?
 
 With real data that reflects user perception, broken down by business feature, plus detailed network and other auxiliary data, we can tackle front-end performance with much more confidence. The monitoring system continuously reports on online access; we choose optimization approaches based on the current evaluation and bottlenecks, then adjust according to the feedback. With that loop in place, performance optimization should no longer be a problem.
 How do you choose an optimization approach? That is another big topic. Fortunately, there is already plenty of experience to learn from; the appendix below collects performance learning materials that you can read as needed.
 
Day 7: Summary
 Through the efforts of the "days" above, we can build a small but elegant front-end performance monitoring system. However, this is only the beginning: front-end data holds much more value to mine, and performance optimization is a discipline that deserves careful study. To create a smooth user experience and keep users satisfied, set up your own front-end data platform!
 This article was written after w3ctech's "Visiting Famous Companies: Baidu FEX" session. The slides from the talk are here, and the video is here.
 Gorgeous non-split line~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Bonus: front-end performance learning materials
Performance guidelines ★★★★★ 
  
  Yahoo's performance rules (Chinese translation available) 
  Google's recommended performance optimization articles 
 
Analysis tools ★★★
 Getting started 
  
  PageSpeed: based on Google's performance guidelines; runs as a browser plug-in 
  YSlow: a testing tool based on Yahoo's performance rules; runs as a browser plug-in 
  PageCheck: developed internally at Baidu; complete metrics; supports automated runs 
 
 Advanced 
  
  WebPageTest: an essential advanced tool for viewing page-load waterfalls and other data 
  Chrome DevTools: powerful and well worth learning 
  PhantomJS: a powerful analysis tool, a true Swiss Army knife 
  jsPerf: a website for analyzing JS execution performance; try it and you'll see 
 
Browser and HTML standards ★★
 Getting started 
  
  Browser cache mechanism 
  Articles on Navigation Timing and Resource Timing (Google's are recommended) 
  How DNS resolution works 
  High Performance Browser Networking (translated series) 
 
 Advanced 
  
  SPDY protocol: the basis for the upcoming HTTP/2.0 protocol 
  How browsers render: hard to digest, but a classic 
  How Chrome works: a summary 
 
Development practices ★★★★
 General 
  
  High Performance JavaScript 
  Writing-fast-memory-efficient-Javascript 
  Understanding and solving Internet Explorer leak patterns 
  Modular loading with FCM and seajs; it offers a complete solution for static resource management and optimization and is recommended 
  Best practices for front-end performance optimization 
 
 Animation and rendering 
  
  requestAnimationFrame 
  16 ms optimization 
  CSS and JS techniques that avoid repaint and reflow 
 
 Mobile development 
  
  Improving the performance of your HTML5 app 
  Steve Souders 
  Creating High-Performance mobile websites 
  HTML5 techniques for optimizing mobile Performance 
  Mobile web site optimization Guide 
 
Performance monitoring ★★★★ 
  
  Metric Selection 
  Complete Web Monitoring-web performance at emetrics 
  Building a front-end performance monitoring platform with BerserkJS 
  NY web perf Meetup: peeling the Web performance onion 
 
Related conferences ★★★ 
  
  Velocity: one of the best-known international conferences in the industry 
  Google I/O 
  QCon 
 
Recommended blogs ★★★ 
  
  Web performance today 
  Perfplanet 
  Stevesouders.com 
  Site-performance-and-Optimization 
 
Zhangtao (http://weibo.com/u/1733215473)-endless Learning

This article is an English version of an article originally in Chinese on aliyun.com and is provided for information purposes only.
