Applying Big Data User Behavior Prediction to Front-End Performance Optimization

Tags: browser, cache, Piwik

First of all, I have to admit the title is a bit of clickbait; the content is not as grand as the headline suggests. Secondly, this article only discusses the feasibility of a technical approach. It does not offer a polished solution, at most a demo for reference.

Objective

If you want to reprint this article, please credit the original source: http://www.cnblogs.com/silenttiger/p/4929841.html

Front-end performance optimization, in my view, has two main goals: 1. improve page loading speed; 2. save server resources.

Let me say a bit more about saving server resources. Many people who do front-end performance optimization only think about front-end performance and completely ignore its impact on server-side performance. In fact, for a site with fairly heavy traffic, saving server resources means saving money. For example, the smaller the JS and image files, the less disk I/O and network I/O load the server has to carry, which naturally trims some expenses.

The existing approach

For front-end performance optimization, the current mainstream techniques are mainly three: 1. merging; 2. compression; 3. caching.

For example, suppose a website has four pages A, B, C, and D, which reference the JS files a/b, a/b/c, a/b/c/d, and a/d respectively. Since a is referenced by all four pages, it does not take part in merging. We then merge the two files b and c into x, and the three files b, c, and d into y. The four pages A, B, C, and D now reference a/b, a/x, a/y, and a/d respectively. Finally, we minify and obfuscate the five JS files a, b, x, y, and d.

After this processing, when users visit the site through a browser, each of the four pages A, B, C, and D issues only two JS file requests, and all four pages share the browser cache for the file a.
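
To keep the example straight, the reference mapping before and after merging looks roughly like this (a small illustrative sketch; the file names are placeholders):

    // Which JS files each page references before and after merging b+c into x and b+c+d into y.
    var before = {
        pageA: ['a.js', 'b.js'],
        pageB: ['a.js', 'b.js', 'c.js'],
        pageC: ['a.js', 'b.js', 'c.js', 'd.js'],
        pageD: ['a.js', 'd.js']
    };
    var after = {
        pageA: ['a.js', 'b.js'],
        pageB: ['a.js', 'x.js'],  // x = b + c
        pageC: ['a.js', 'y.js'],  // y = b + c + d
        pageD: ['a.js', 'd.js']
    };
    // Every page now needs at most two JS requests, and a.js is shared by all four pages.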

Existing problems

As many people can probably see at a glance, the optimization scheme described above has quite a few problems.

1. When a page is loaded for the first time, the cache policy does not take effect, so the first load is slow.

We have configured caching so that a user who has visited page B before can load its dependencies, the JS files a and x, straight from the browser cache on later visits. But if a user has only visited page A and then opens page B for the first time, only the cached a takes effect; x has not been cached yet.

2. The contents of the four JS files b, x, y, and d are redundant, which wastes server resources.

The file x contains the contents of both b and c, yet if the browser has already downloaded x and the user then opens a page that references b, the entire b file must still be downloaded; the cached x cannot be used in place of b.

Judging from these two problems, it seems that we still have room for improvement!

A new approach

Let's tackle these problems one at a time.

The first problem is that the cache does not help when a page is loaded for the first time. This is actually the core of this article, and my solution is preloading: before the user ever visits page B, we have the user's browser load x, the JS file that page B depends on, in advance.

I designed a front-end resource preloading system that consists of front-end JS code, back-end preload policy logic, and a database used to compute the preload policy. Sticking with the example above, suppose page A is the site's homepage. When a user visits page A:

1. The front-end JS sends the user's SessionID and the current page URL to the backend, which logs this visit in the Visit_sequence table.

2. The front-end JS collects the page's front-end resources (the link, script, and similar references on the page) into a resource list, computes the MD5 of that list, and sends it to the backend.

3. The backend looks up the MD5 recorded for this page URL in the Page_resource_signature table. If no record is found, the front-end JS is asked to send the full resource list together with the page URL and the list's MD5, and the backend stores them in the Page_resource_signature and Page_resource tables. If a record is found but the MD5 does not match, the front end resends the full resource list and the backend updates both tables. If a record is found and the MD5 matches, the backend uses the Visit_sequence and Page_resource tables to calculate how likely the user on the current page is to need the resources of other pages in the system, and returns a list of resources to preload.

4. The front-end code preloads the resources in the list returned by the backend.
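
To make the signature-comparison step more concrete, here is a rough sketch of the backend decision in Node/Express style (purely illustrative: the demo's actual backend is PHP, and the db helper, column names, and predictNextResources function are assumptions of mine):

    // Illustrative sketch only: an Express-style handler for the CompareSignature request.
    const express = require('express');
    const db = require('./db'); // assumed helper exposing query(sql, params) -> Promise of rows
    const { predictNextResources } = require('./predict'); // assumed helper, see the probability sketch later

    const app = express();
    app.use(express.urlencoded({ extended: true }));

    app.post('/VisitSequence/CompareSignature', async (req, res) => {
        const { PageUrl, signature } = req.body;
        const rows = await db.query(
            'SELECT signature FROM Page_resource_signature WHERE page_url = ?', [PageUrl]);

        if (rows.length === 0 || rows[0].signature !== signature) {
            // No record, or the page's resources have changed:
            // ask the front end to upload the full resource list (it will call UpdateResource).
            res.json({ find: 0 });
        } else {
            // Known, up-to-date page: return the resources the user is likely to need next.
            res.json({ find: 1, resources: await predictNextResources(PageUrl) });
        }
    });

    app.listen(3000);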

Front-end code:

    /* desc: PerformanceCollector relies on jQuery and md5.js to collect the paths users take
       between pages of the system and the list of static resources referenced by each page */
    if (typeof performance !== 'undefined' && typeof performance.timing !== 'undefined') {
        $(document).ready(function () {
            // The current page URL with any #fragment stripped off
            var pageUrl = window.location.href.indexOf('#') < 0
                ? window.location.href
                : window.location.href.substring(0, window.location.href.indexOf('#'));

            // Report how long the page took to become ready, together with the user's
            // SessionID and the current page URL, to the backend
            $.post('http://127.0.0.2/index.php/Home/VisitSequence/Insert/', {
                SessionID: document.cookie.substr(document.cookie.indexOf('PHPSESSID=') + 10),
                PageUrl: pageUrl,
                cost: performance.timing.domContentLoadedEventStart - performance.timing.responseStart
            });

            // Collect the page's resource information
            var resources = [];
            $('link').each(function () {
                resources.push($(this).attr('href'));
            });
            $('script[src]').each(function () {
                resources.push($(this).attr('src'));
            });

            // Compute the MD5 of the resource list and send it to the server for comparison
            setTimeout(function () {
                var resourceSignature = md5(JSON.stringify(resources));
                $.post('http://127.0.0.2/index.php/Home/VisitSequence/CompareSignature/', {
                    signature: resourceSignature,
                    PageUrl: pageUrl
                }, function (data) {
                    if (data.find == 0) {
                        // The backend has no up-to-date signature for this page:
                        // upload the page's resource signature and resource list
                        $.post('http://127.0.0.2/index.php/Home/VisitSequence/UpdateResource/', {
                            signature: resourceSignature,
                            resources: resources,
                            PageUrl: pageUrl
                        });
                    } else {
                        // Otherwise preload the resources suggested by the backend
                        loadResource(data.resources);
                    }
                });
            }, Math.random() * 1000 + 2000);

            // Preload resources: load every resource in the returned list (stub implementation)
            function loadResource(resources) {
                console.log('loadResource', resources);
            }
        });
    }
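
The loadResource function above is only a stub that logs to the console. One possible way to fill it in (my own sketch, not part of the original demo) is to request each resource once so the browser can cache it:

    // Sketch: warm the browser cache for each resource in the preload list.
    function loadResource(resources) {
        $.each(resources, function (index, url) {
            if (/\.js$/.test(url)) {
                // Fetch scripts as plain text so they are cached but not executed on this page.
                $.ajax({ url: url, dataType: 'text', cache: true });
            } else {
                // Stylesheets and other assets can be hinted with a prefetch link.
                $('<link/>', { rel: 'prefetch', href: url }).appendTo('head');
            }
        });
    }

Whether the preloaded copy is actually reused later depends on the cache headers the server sends for these files.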

Database table structure:

Sequence diagram:

The most critical step in this process is to "calculate the likelihood of each page resource being accessed", the part marked in red in the sequence diagram. This can be done by analyzing users' past browsing behavior with a program. For example, the widely used Piwik analytics system directly provides the previous/next page relationships for every page:

For example, we can see clearly in the Piwik interface that after visiting index.php, 37% of users go on to visit xxxx/xx=attendance&menuid=19, and 20% go on to visit xxxx/xx=ast&a=index&menuid=30. Assuming both of those pages reference sharelib.js, then when a user visits index.php the probability that they will soon need sharelib.js is as high as 57%. So why not have the front-end code on index.php preload sharelib.js? When the user really does move on to another page, there is a good chance its front-end resources have already been preloaded, the browser can read them straight from the cache, and the next page loads faster!
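
As a rough sketch of that calculation (my own illustration; the data shapes and the 0.3 threshold are assumptions), the backend could combine the page-to-page transition counts from Visit_sequence with each page's resource list from Page_resource. Such a function could sit behind the predictNextResources helper assumed in the earlier sketch:

    // transitions: { nextPageUrl: visitCount } derived from Visit_sequence for the current page.
    // pageResources: { pageUrl: [resourceUrl, ...] } derived from Page_resource.
    function scoreResourceProbabilities(transitions, pageResources, threshold) {
        var total = 0;
        for (var url in transitions) {
            total += transitions[url];
        }
        if (total === 0) {
            return [];
        }

        // A resource shared by several likely next pages accumulates their probabilities,
        // just as sharelib.js accumulates 37% + 20% = 57% in the example above.
        var score = {};
        for (var url in transitions) {
            var p = transitions[url] / total;
            (pageResources[url] || []).forEach(function (res) {
                score[res] = (score[res] || 0) + p;
            });
        }

        // Keep only resources likely enough to be worth preloading, most likely first.
        return Object.keys(score)
            .filter(function (res) { return score[res] >= (threshold || 0.3); })
            .sort(function (a, b) { return score[b] - score[a]; });
    }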

In this way, by recording in detail how users browse the site and analyzing each page's resource references, we can judge which resources are worth preloading before the user actually reaches a page, and so achieve the effect of resource preloading.

Now let's look at the second problem: the content redundancy caused by resource merging.

In fact, once the first problem is solved, the second one disappears. Because we can preload a page's resources before the user enters it, there is no longer any need to merge front-end resources, and therefore no content redundancy caused by merging.

New issues

This new scheme solves some of our problems, but it is not perfect.

1. More complex teamwork

With the previous front-end performance optimization schemes, usually only the front-end developers need to be involved. With this scheme, however, the backend must provide user behavior prediction data, so back-end developers are likely to be involved as well. If user data collection for the site is handled by a dedicated team, that team will probably also take part in designing and implementing the whole scheme. This greatly increases the complexity of team collaboration and raises the bar for project management.

2. No benefit for the site's homepage

An optimization scheme based on user behavior prediction only takes effect after the user has entered the site; if the user never enters the site, there is nothing we can do. So the homepage gains nothing from this scheme, and since the homepage is usually one of the most visited pages, the impact of this limitation is considerable.

3. Trading off data freshness against performance

When the server returns the resources a user might need next, should it compute the probability of each resource being accessed in real time from the database, or precompute the values by some mechanism and simply read them back? Real-time computation is more accurate, but once the historical access data grows large, computing on every request may consume far too many system resources. If we precompute the data instead, then whenever the site's pages are updated but the precomputed probabilities are not, users cannot enjoy the benefit of preloading, and because we have abandoned the traditional optimizations they may even get a worse experience. When and how to refresh these probability figures then becomes a real question.
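
One compromise I can imagine (again only a sketch, reusing the assumed db and predictNextResources helpers from the earlier sketch) is to precompute the predictions on a schedule and drop a page's cached prediction as soon as its resource signature changes:

    // Sketch: serve predictions from a precomputed cache instead of querying per request.
    const predictionCache = {}; // pageUrl -> list of resources worth preloading

    async function refreshAllPredictions() {
        const pages = await db.query('SELECT page_url FROM Page_resource_signature');
        for (const page of pages) {
            predictionCache[page.page_url] = await predictNextResources(page.page_url);
        }
    }
    setInterval(refreshAllPredictions, 60 * 60 * 1000); // recompute hourly, not on every request

    // Call this from the UpdateResource handler whenever a page's resource list changes,
    // so stale predictions are never served after a site update.
    function invalidatePrediction(pageUrl) {
        delete predictionCache[pageUrl];
    }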

This article is entirely the product of a sudden flash of inspiration; the optimization scheme came to me in a moment of epiphany. I tried searching the web for related keywords but did not find much good material, which made me wonder: am I the first? If so, there must be many details and shortcomings I have not considered, and I offer this write-up for your reference. If not, please don't hesitate to share your practical experience!
