Implementation of multi-layer experimental flow segmentation based on hash calculation


 1. Background information

Once a new feature or strategy for a site has been developed, its advantages and disadvantages need to be evaluated before it is rolled out to full traffic. The commonly used evaluation method is the A/B test: two small samples are drawn from the total traffic, one routed to the new-strategy branch and the other to the old-strategy branch. By comparing the metrics of the two samples, we can evaluate the pros and cons of the new strategy and then decide whether to roll it out to full traffic.

Sampling here means dividing online traffic according to some randomized approach. The term "sample" may refer either to the dividing method or to the resulting subset of traffic. A sample is a special kind of small traffic: the division must guarantee uniformity and randomness, and it must be possible to filter out the part that meets specified requirements. The sampling process therefore has two steps, flow segmentation and flow screening. Flow segmentation scatters the full traffic uniformly and extracts a fixed proportion of it. Flow screening is auxiliary to segmentation: it filters out of the segmented traffic the parts that do not meet the requirements. This article is mainly concerned with the implementation of flow segmentation.

 2. Single-layer flow segmentation architecture

The common way to implement flow segmentation is single-layer segmentation. Segmentation must be based on some scattering method, that is, the key by which traffic is scattered. For example, we can scatter by cookie or scatter randomly. Different methods split different objects: if we scatter by cookie, the complete set of split objects is all cookies; if we scatter randomly, the complete set is all of the site's traffic.

  

Fig. 1.1 Schematic diagram of single layer flow splitting architecture

With this idea, how do we implement single-layer flow segmentation? As shown in Fig. 1.1, the input parameters required by the chosen splitting method first go through a hash calculation; the hash algorithm guarantees the uniformity and randomness of the result. The hash alone does not complete the split: the hash result must also be mapped onto the complete set of split objects. To do this, the complete set is treated as an interval, and the hash result is mapped into that interval. The size of the interval determines the smallest granularity of the split. For example, if the minimum granularity is 0.01%, we choose the interval [0, 9999]. Once the interval is defined, we take the hash result modulo a value equal to the interval maximum plus 1 (here 10000). The result of the modulo can only fall within the interval, so all traffic is mapped onto the complete set of split objects.
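The hash-then-modulo step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bucket_for`, the use of MD5 as the hash, and the choice of the first 8 digest bytes are all assumptions made for the example.

```python
import hashlib

# Interval [0, 9999] gives a minimum granularity of 0.01%.
BUCKET_COUNT = 10_000

def bucket_for(key: str, bucket_count: int = BUCKET_COUNT) -> int:
    """Map a split key (for example a cookie value) to a bucket in [0, bucket_count - 1].

    MD5 is an assumed stand-in for "a uniform, random hash"; the article
    does not say which hash function it uses.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then take it
    # modulo (interval maximum + 1) so the result always lands in the interval.
    value = int.from_bytes(digest[:8], "big")
    return value % bucket_count
```

Because the mapping is deterministic, the same cookie always lands in the same bucket, which keeps a visitor in the same experimental branch across requests.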

Finally, we subdivide the interval into several sub-intervals according to experimental needs and use them for experimental comparison. As shown in Fig. 1.2, the entire 100% interval is divided into multiple sub-intervals, each with a unique number, sid, as its identifier. The sub-interval with sid=1 corresponds to 1%, so its range is [0, 99]; similarly, the second 1%, sid=2, corresponds to [100, 199]. In this way the complete 100% interval is divided into a number of sub-intervals, and two sub-intervals of the same size can be used to compare experimental strategies.
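A lookup from bucket to sid might look like the sketch below. Only the two sub-intervals named in the text are listed; the table layout and the function name `sid_for_bucket` are hypothetical.

```python
from typing import List, Optional, Tuple

# (sid, low, high) with inclusive bounds, inside the bucket space [0, 9999].
# Only the two sub-intervals from the text are shown; a real table would
# list every sub-interval that has been allocated.
SUB_RANGES: List[Tuple[int, int, int]] = [
    (1, 0, 99),     # first 1%
    (2, 100, 199),  # second 1%
]

def sid_for_bucket(bucket: int,
                   sub_ranges: List[Tuple[int, int, int]] = SUB_RANGES) -> Optional[int]:
    """Return the sid whose sub-interval contains the bucket, or None if unallocated."""
    for sid, low, high in sub_ranges:
        if low <= bucket <= high:
            return sid
    return None
```

Requests whose bucket falls outside every listed sub-interval get `None`, meaning they take part in no experiment.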

  

Fig. 1.2 Division of Traffic Sub-range

 3. Multi-layer flow segmentation architecture for reusable traffic

Single-layer flow segmentation is an exclusive method: a sub-interval can serve only one experiment, and a request can hit only one experiment. The advantage is that experiments are decoupled and do not affect each other. The disadvantage is that resources are limited: once the traffic has been allocated, subsequent requests for traffic wait for a long time and starve. This exclusive mode clearly cannot keep up as the demand for experiments grows. To solve this problem, we can use multi-layer flow segmentation.

The idea of multi-layer flow segmentation is to extend the single-layer structure into a multi-layer structure, as shown in Fig. 1.3. The layers must be orthogonal to each other: the traffic of any sub-interval in one layer is distributed randomly and evenly over the sub-intervals of the other layers. This spreads the effect of one sub-interval evenly across every other layer.

A multi-layer architecture extends the experimental traffic from 100% to 100% * n. In other words, the traffic of every layer can be used for experimental comparison, a request can hit multiple experiments at the same time, and experimental traffic is reused. Because the layers are orthogonal, the effect that experiments in different layers have on one another is even and predictable. The premise for running multi-layer experiments is that this mutual influence is acceptable. Some experiments cannot share traffic. Take display experiments: if two experiments each specify a display style template, and a request can only be rendered with one template, the two are incompatible. Incompatible experiments like these must be placed in the same layer.
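The sketch below illustrates a request hitting at most one experiment per layer while still hitting experiments in several layers. Everything in it is hypothetical: the layer names, experiment names, and sub-intervals are made up, and the per-layer hash is obtained by prefixing the layer name as a salt, which is a simple stand-in and not the shift-based method this article uses.

```python
import hashlib
from typing import Dict, List, Tuple

BUCKET_COUNT = 10_000

# Hypothetical layout: the two template experiments are incompatible,
# so they sit in the same layer on disjoint sub-intervals.
LAYERS: Dict[str, List[Tuple[str, int, int]]] = {
    "display": [("template_a", 0, 499), ("template_b", 500, 999)],
    "ranking": [("new_ranker", 0, 999)],
}

def layer_bucket(key: str, layer_name: str) -> int:
    """Bucket of the key within one layer (assumption: layer name used as a salt)."""
    digest = hashlib.md5(f"{layer_name}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKET_COUNT

def experiments_for(key: str) -> Dict[str, str]:
    """Return {layer name: experiment name} for every layer in which the key hits an experiment."""
    hits: Dict[str, str] = {}
    for layer_name, ranges in LAYERS.items():
        bucket = layer_bucket(key, layer_name)
        for experiment, low, high in ranges:
            if low <= bucket <= high:
                hits[layer_name] = experiment
                break  # sub-intervals within a layer are disjoint
    return hits
```

A request can come back with both a `display` and a `ranking` entry, but never with both templates, because the two template experiments share one layer.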

  

Fig. 1.3 Schematic diagram of multi-layer flow splitting architecture

Another advantage of multi-layer flow segmentation is that each layer can use a different splitting method, which greatly enriches the variety of segmentation. This raises another problem: each layer can use only one splitting method, so what if several methods are needed within the same layer? The answer is nesting: a layer can contain other layers. In Fig. 1.4, rectangles represent layers and circles represent the experimental traffic that is split off. Layer 1 contains three layers (2, 3, and 4), and layer 4 in turn contains three layers (9, 10, and 11). To use several splitting methods within one layer, we divide the layer into intervals. For example, layer 2 in the figure is divided into two layers, 5 and 6. These two layers must be split by the same method, because together they divide the traffic of their parent layer into two parts, so the way the two parts are generated must be consistent. Finally, inside layers 5 and 6 we can build further layers, and those layers can use different splitting methods.
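One way to model this nesting is sketched below. It is an interpretation of the description, not the authors' data structure: `Layer`, `Partition`, `path_for`, the request field names, and the salted MD5 hash are all assumptions. A layer is split by one key into partitions (like layers 5 and 6 inside layer 2), and each partition may hold further layers that pick their own split key.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List

BUCKET_COUNT = 10_000

def bucket(value: str, salt: str) -> int:
    """Assumed per-layer hash: MD5 of the value, salted with the layer name."""
    digest = hashlib.md5(f"{salt}:{value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKET_COUNT

@dataclass
class Partition:
    name: str
    low: int   # inclusive bounds within the parent layer's bucket space
    high: int
    layers: List["Layer"] = field(default_factory=list)  # layers nested inside

@dataclass
class Layer:
    name: str
    split_by: str  # request field this layer hashes, e.g. "cookie"
    partitions: List[Partition] = field(default_factory=list)

def path_for(request: Dict[str, str], layer: Layer) -> List[str]:
    """List the layers and partitions a request passes through, outermost first."""
    path = [layer.name]
    b = bucket(request[layer.split_by], layer.name)
    for part in layer.partitions:
        if part.low <= b <= part.high:
            path.append(part.name)
            for nested in part.layers:
                path.extend(path_for(request, nested))
            break  # partitions of one layer are disjoint
    return path

# Layer 2 is split by cookie into partitions 5 and 6; the layers nested
# inside each partition are free to split by other fields.
layer2 = Layer("layer2", "cookie", [
    Partition("layer5", 0, 4999, [Layer("layer5_by_cookie", "cookie"),
                                  Layer("layer5_by_query", "query_id")]),
    Partition("layer6", 5000, 9999, [Layer("layer6_by_user", "user_id")]),
])
```

Calling `path_for({"cookie": "c1", "query_id": "q1", "user_id": "u1"}, layer2)` returns the route of that request, which passes through exactly one of the two partitions.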

  

Fig. 1.4 Multi-layer nesting diagram of flow splitting

In theory, the number of layers in a multi-layer architecture is unlimited, and any number of traffic layers can be supported. In practice, unlimited layers are very hard to implement: to guarantee orthogonality between layers, each layer needs its own hash algorithm, and the results of these hash algorithms must be mutually orthogonal. Unlimited layers would require an unlimited number of orthogonal hash algorithms, and as the number of hash algorithms grows, orthogonality degrades. What we can implement is a finite set of orthogonal hash algorithms, as long as their number is enough to meet all experimental needs. Below we introduce one way to implement a multi-layer flow segmentation architecture.

To implement multi-layer flow segmentation, our idea is to first implement a single hash algorithm whose input is a cookie, a random value, or other information, and whose output is a single hash result, and to make sure this result is sufficiently uniform and random. We then transform this hash to derive multiple orthogonal hash algorithms.

To verify the randomness and uniformity of the single hash algorithm, we ran an experiment. In Fig. 1.5, each row represents one complete 100% set, and each column represents a 10% slice of it; the complete test set contains 1,000,000 entries. The test data shows that the split results in each layer are fairly uniform and random.
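A check of this kind could be run along the following lines. The sketch does not reproduce the authors' data: the synthetic keys `cookie-0`, `cookie-1`, ..., the MD5 hash, and the function name `decile_counts` are assumptions, and the default key count is smaller than the 1,000,000 mentioned above to keep the run short.

```python
import hashlib
from typing import List

BUCKET_COUNT = 10_000

def decile_counts(n_keys: int = 100_000) -> List[int]:
    """Hash n_keys synthetic keys and count how many fall in each 10% slice."""
    counts = [0] * 10
    for i in range(n_keys):
        digest = hashlib.md5(f"cookie-{i}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % BUCKET_COUNT
        counts[bucket // 1000] += 1  # slices [0, 999], [1000, 1999], ...
    return counts

if __name__ == "__main__":
    for decile, count in enumerate(decile_counts()):
        print(f"slice {decile}: {count}")
```

If the hash is uniform, each of the ten counts should be close to one tenth of the number of keys.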

  

Fig. 1.5 Experiment data of uniformity and randomness verification of single hash algorithm

Given the single-layer hash algorithm, we need to extend it to multiple layers. There are many ways to do this; this article uses a shift transformation to extend the single-layer hash into a multi-layer hash. Test data for multi-layer experimental flow segmentation in a real environment is shown below.
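The article does not spell out its shift transformation, so the following is only one possible reading, under stated assumptions: compute one 128-bit MD5 value per key, then shift it right by a layer-dependent amount so that each layer reads a different, non-overlapping 32-bit window. The names `layer_bucket` and `cross_layer_counts` are hypothetical. This scheme supports only a finite number of layers (four here), in line with the finite set of orthogonal hashes discussed earlier.

```python
import hashlib
from typing import List

BUCKET_COUNT = 10_000
WINDOW_BITS = 32
MAX_LAYERS = 128 // WINDOW_BITS  # an MD5 digest has 128 bits, so 4 disjoint windows

def base_hash(key: str) -> int:
    """Single underlying hash of the key, as a 128-bit integer (MD5 is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def layer_bucket(key: str, layer: int) -> int:
    """Bucket of the key in the given layer, derived by shifting the base hash."""
    if not 0 <= layer < MAX_LAYERS:
        raise ValueError(f"layer must be in [0, {MAX_LAYERS - 1}]")
    # Shift so that this layer's 32-bit window is at the bottom, then mask it out.
    window = (base_hash(key) >> (WINDOW_BITS * layer)) & ((1 << WINDOW_BITS) - 1)
    return window % BUCKET_COUNT

def cross_layer_counts(n_keys: int = 50_000) -> List[int]:
    """Take the keys in layer 0's first 10% and count them per 10% slice of layer 1."""
    counts = [0] * 10
    for i in range(n_keys):
        key = f"cookie-{i}"
        if layer_bucket(key, 0) < 1000:
            counts[layer_bucket(key, 1) // 1000] += 1
    return counts
```

`cross_layer_counts` is a rough orthogonality check: if the layers are orthogonal, the traffic of one sub-interval in layer 0 should spread roughly evenly over the slices of layer 1.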

 4. Actual data test

To verify the feasibility of the algorithm, we tested it on real data. The results are as follows:

Table 1.2 Homogeneity Verification data

  

  

by Yangfangwei&huangjin&yaoshiyu


