The basic concepts of A/B testing were covered above; now let's look at how to implement an A/B test.
Let's take a look at a diagram:
(Note: Thanks to Algo for providing this diagram.)
The diagram above shows how an A/B test is implemented. From left to right, the four thicker vertical bars represent the four key roles in an A/B test: the client, the server, the data tier (data), and the data warehouse (Data-warehouse). From top to bottom, it shows three kinds of access flow: normal access with no A/B test (no-ab-test), a back-end A/B test (back-end AB test), and a front-end A/B test (front-end AB test).
In general, a user issues a request from the client (a browser). The request is sent to the server, whose back-end program works out what content (data) to return to the user and, at the same time, adds a tracking record to the data warehouse describing the visit. This is the horizontal flow in the diagram. Once the data warehouse has collected enough data, analysis (Analytics) can begin, which is the upper-right part of the diagram.
An A/B test has to present multiple versions to different users, so it needs a traffic-splitting step. As the diagram shows, the split can happen on the client or on the server. A traditional A/B test usually splits on the server side; this is the back-end A/B test (back-end AB test). When a user's request reaches the server, the server returns different versions to different users according to some rule, and the data is also recorded on the server side.
A back-end A/B test is slightly simpler to implement technically. Its drawbacks are that it takes engineering resources from the technical department, and that the data collected is usually coarser, PV-level (page view) information. That data supports fairly complex analysis of macro-level behavior, but it often cannot tell you what users actually did on a particular version of the page.
A front-end A/B test solves these problems. It uses JavaScript on the front end to split traffic on the client, and it can also use JavaScript to record users' mouse behavior (and even keyboard behavior, if necessary) and send it directly to a server that logs it. The advantages are that the technical department does not need to be involved (if, like us, your front-end and back-end engineers belong to different departments), and that every user action on the page can be recorded more precisely, including invalid clicks that are hard to capture with the back-end approach.
Next, I'll focus on some of our practices for front-end A/B testing.
I. Traffic splitting
The first problem is how to split the traffic. For most requirements, we want visitors spread evenly across the versions. There are several ways to do this. A simple one is the approach mentioned earlier: divide users by cookie ID, provided every visitor to your site is assigned a unique cookie ID on the first visit, such as "123.180.140.*.1267882109577.3". You can then split visitors by the last digit of the cookie ID ("3" in this example), for instance showing version A for odd digits and version B for even digits.
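The odd/even rule can be sketched in a few lines of JavaScript. This is only a minimal illustration: the function name chooseVersion is hypothetical, it assumes the cookie ID ends in a decimal digit as in the example, and returning null for a missing or unusable ID is our own choice for "leave this visitor out of the test".

```javascript
// Hypothetical helper: pick a version from the last character of the cookie ID.
// Assumes the ID ends in a decimal digit, as in "123.180.140.*.1267882109577.3".
function chooseVersion(cookieId) {
  const lastChar = String(cookieId).slice(-1);
  const digit = Number.parseInt(lastChar, 10);
  if (Number.isNaN(digit)) {
    // No usable ID (for example, cookies are disabled): exclude from the test.
    return null;
  }
  // Odd last digit -> version A, even last digit -> version B.
  return digit % 2 === 1 ? "A" : "B";
}
```

Because the result depends only on the cookie ID, calling it again on a later visit gives the same version for the same visitor.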
Because a cookie ID rarely changes once it is set, splitting on it keeps each visitor's experience consistent: a user who sees version A on the first visit keeps seeing version A, rather than seeing A one moment and B the next. The drawback is that the split will not work properly if the user's browser does not support cookies. Modern browsers support cookies by default, though, so such users are rare special cases with very little effect on the results, and we can generally ignore them safely.
Also note that the page under test needs a reasonably high UV (unique visitors). The split involves some randomness, so if the page's UV is too low, each version gets few visitors and the result can easily be skewed by chance. When the UV is large, the law of large numbers says the measured result will be close to the true value. It is like estimating the average adult height in a region: the larger the sample, the more credible the conclusion.
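To get a rough feel for how much the sample size matters, one can look at the standard error of a measured rate. The sketch below uses the usual binomial formula sqrt(p * (1 - p) / n); it is not something the original setup computes, and the 5% click rate and the visitor counts are made-up numbers for illustration.

```javascript
// Standard error of an observed rate p measured on n visitors: sqrt(p * (1 - p) / n).
function standardError(rate, visitors) {
  return Math.sqrt((rate * (1 - rate)) / visitors);
}

// Hypothetical numbers: the same 5% click rate at two traffic levels.
const smallSample = standardError(0.05, 100);   // roughly 0.022
const largeSample = standardError(0.05, 10000); // roughly 0.0022
```

With 100 times as many visitors, the uncertainty shrinks by a factor of 10, which is why a low-UV page gives noisy results.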
II. Display
Once we have decided which version to show the current visitor, how do we load that version using front-end techniques? It depends on the situation.
Typically, if the two versions differ only in a small area, we can load the HTML for both versions into the page, hide them with CSS at first (or show one version by default), wait for JS to decide which version to display, and then use CSS to show that version.
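A minimal sketch of this show/hide approach might look like the following. The element IDs ab-version-a and ab-version-b and the function name showVersion are hypothetical; the page's document object is passed in as a parameter only so the function is easy to exercise outside a browser.

```javascript
// Hypothetical markup, both blocks hidden by default:
//   <div id="ab-version-a" style="display:none">...</div>
//   <div id="ab-version-b" style="display:none">...</div>
// `doc` is the page's `document` (or a stub with getElementById for testing).
function showVersion(doc, version) {
  const ids = { A: "ab-version-a", B: "ab-version-b" };
  for (const [name, id] of Object.entries(ids)) {
    const el = doc.getElementById(id);
    if (el) {
      // Show the chosen version and keep the other one hidden.
      el.style.display = name === version ? "block" : "none";
    }
  }
}
```

In a page this would be called as showVersion(document, "A") once the version has been decided.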
Sometimes the test area is larger, involves more code, or needs more back-end computation. Loading the HTML for both versions into the page up front would then cost a lot (bandwidth, back-end computation, and so on). In this case, we can leave the test area blank and lazy-load the chosen version via Ajax.
At other times the test area is very large, taking up almost the whole page, or the versions are entirely different pages. Loading via Ajax is not suitable here; instead you can build separate pages for the versions and redirect with JS. This approach is not great, though: a front-end JS redirect takes some time, users are likely to notice it, and it leaves a bad impression. There seems to be no good solution to this, at least none that is satisfactory at the front-end level, so we do not really recommend redirecting this way. If a redirect is truly needed, it is better done in back-end code on the server.
III. Data collection
Once the right version is displayed, we can start collecting the data we need. One optional metric is the PV (page views) of each version. To record it, send a tracking request when the chosen version finishes loading. In many cases, however, the exact per-version PV is not important, and collecting it costs an extra request, so this metric is generally optional.
The required data is the user's clicks inside the test area. When the user left-clicks in the test area (whether on a link, text, an image, or blank space), we need to send a corresponding message to the tracking server. In general, this message should contain at least the following:
Identifier of the current A/B test and version
Position of the click event
Click timestamp (client time)
URL of the clicked element (empty if the click landed outside a hyperlink)
User identifier (e.g. cookie ID)
User's browser information
Reconstructing click positions as accurately as possible places fairly high demands on the front-end page: it must render essentially the same way across browsers. At least in IE6, 7, 8 and Firefox, elements must line up exactly in the horizontal direction; exact vertical agreement is hard to achieve, but it should be as close as possible. Such tests also suit fixed-width pages better than fluid-width ones. To keep different screen resolutions (and thus different left and right margins) from shifting the recorded position, the click position should be measured relative to the top-left corner of the test area. It is also a good idea to record the position of the test area relative to the top-left corner of the page; this data is needed later when reconstructing click maps and drawing heat maps.
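The conversion to test-area coordinates is simple arithmetic, sketched below. The function name toRelativeClick and the shape of the area object are hypothetical. The numbers in the usage line come from the sample request later in this article, on the assumption that clickblockx/clickblocky/clickblockw/clickblockh describe the test area and clickrx/clickry the relative click position.

```javascript
// Convert a click's page coordinates into coordinates relative to the test area.
// `area` holds the test area's top-left corner (x, y) and size (w, h),
// all measured from the top-left corner of the page.
function toRelativeClick(pageX, pageY, area) {
  const rx = pageX - area.x;
  const ry = pageY - area.y;
  // `inside` tells the caller whether the click actually landed in the test area.
  const inside = rx >= 0 && ry >= 0 && rx < area.w && ry < area.h;
  return { rx, ry, inside };
}

// Assumed example: a click at page position (477, 419) in a 392x76 area at (244, 372).
const relative = toRelativeClick(477, 419, { x: 244, y: 372, w: 392, h: 76 });
```

In a browser, the area's page position could be obtained from the element's getBoundingClientRect() plus the current scroll offsets.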
The process of this phase is roughly as follows:
How is the data sent and stored? That depends on how your tracking server stores the information.
IV. Data storage
We use a dedicated server to collect tracking data. To let it handle as many requests as possible at peak density, the server's Apache site directory holds only two static files, abtest.html and abtest.gif, both tiny blank files (a blank page and a blank image). The client requests either file via GET with the relevant parameters attached. For example:
http://abtest.xxx.com/abtest.gif?abid=1-a&clickblockx=244&clickblocky=372&clickblockw=392&clickblockh=76&clicktime=1263264082137&clickrx=233&clickry=47&clickurl=&clickbeaconid=123.180.140.*.1267882109577.3&browsertype=firefox
This request can be sent via Ajax, or by creating a new Image() object in JS on the page.
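The Image() approach can be sketched roughly as follows. The function names buildBeaconUrl and sendClick are hypothetical, the parameter names are taken from the sample request above, and the image constructor is injectable only so the code can be tried outside a browser. It relies on the standard URLSearchParams API, which the browsers of that era did not have, so treat it as a modern illustration rather than our original code.

```javascript
// Serialize a click record into a GET URL for the blank tracking image.
function buildBeaconUrl(endpoint, record) {
  // URLSearchParams percent-encodes values where needed and joins them with "&".
  const params = new URLSearchParams(record);
  return endpoint + "?" + params.toString();
}

// Fire-and-forget send: assigning src makes the browser request the image,
// and the (blank) response is simply ignored.
// `ImageCtor` defaults to the browser's Image; pass a stub when testing.
function sendClick(endpoint, record, ImageCtor) {
  const img = new (ImageCtor || Image)();
  img.src = buildBeaconUrl(endpoint, record);
  return img;
}
```

A call might look like sendClick("http://abtest.xxx.com/abtest.gif", { abid: "1-a", clickrx: 233, clickry: 47, clickurl: "" }).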
To the tracking server this is just an ordinary HTTP request, which leaves an ordinary entry in the access log, in a form like:
123.180.140.* - - [13/Jan/2010:15:21:15 +0800] "GET /abtest.gif?a=123&b=456&c=789 HTTP/1.1" 304 - "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.6 (KHTML, like Gecko) Chrome/4.0.266.0 Safari/532.6"
As you can see, besides the information our JS sends, Apache records some extra information for us, such as the visitor's IP, the server time, and the user's browser information.
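To show what can be recovered from such an entry, here is a rough sketch that pulls the fields out of one log line. It assumes the log uses Apache's standard combined format; the function name parseLogLine is hypothetical, and this is not the analysis pipeline we actually use.

```javascript
// Extract the client IP, server time, query parameters and user agent
// from one access-log line in Apache's combined log format.
function parseLogLine(line) {
  const pattern = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"$/;
  const m = pattern.exec(line);
  if (!m) return null; // not a combined-format line
  const [, ip, time, method, target, status, userAgent] = m;
  // Everything after "?" in the request target is what the page's JS sent.
  const queryStart = target.indexOf("?");
  const query = queryStart === -1 ? "" : target.slice(queryStart + 1);
  return {
    ip,
    time,
    method,
    status: Number(status),
    params: Object.fromEntries(new URLSearchParams(query)),
    userAgent,
  };
}
```

Lines that do not match the expected format yield null, so a caller can skip them.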
With that, data recording and storage are done. The Apache static file plus log approach is efficient enough that performance is basically not a concern. What remains is how to read and analyze the information in the Apache logs. That has nothing to do with the front end and is a more complex topic, which will be covered in later posts.
Source Address: http://www.aliued.cn/?p=2976