Recently, a few friends mentioned the concept of "phased release" and related issues. I would like to explain several specific release methods (my translations of the terms may not be exact), their advantages and disadvantages, and the difficulties in implementing them.
Fast-moving software or web service companies can use these methods to roll out new code or products gradually and improve them along the way. They help avoid the case where a bug in a product or in code, pushed to the site in a single release, has immediately devastating consequences.
Each method has its own advantages, disadvantages, and difficulties. Depending on the situation, a company may use different methods for different releases.
- Multi-phase code push: a common way for agile development teams to release code. The whole team shares one code repository. At a regular interval (for example, once a day or once a week), the latest code is cut into a new release branch, and that release branch is then pushed gradually to the production machines.
- Features: the "gradual selection" process cannot be controlled by code (if it were, a bug in a new version of that controlling code could bring down the entire release process). The operations team is responsible for the selection: for example, the first machine in each rack, the first rack in each cluster, or one data center among several. The key is to spread the selection evenly across different kinds of machines, so that if the new code has a problem on machines with a particular configuration, the operations team finds out promptly. In addition, the release cycle of a multi-phase code push must be shorter than the iteration cycle of agile development; code usually reaches all machines within a day or a week.
- Monitoring: multi-phase code push generally relies on real-time monitoring. Code logic errors are grouped by code version (such as the SVN revision number) to make sure the new version does not introduce new errors. Hardware metrics (CPU, memory, I/O) are grouped by the selected machine, rack, cluster, and data center to make sure the new version does not consume more resources. Once both are confirmed, the new code can be installed on a larger set of machines.
- Difficulties:
- If the front-end load balancer does not guarantee that a user stays on the same machine, a user may hit a mix of new and old versions during the release (for example, the page is served by the new version but an AJAX request by the old one). Incompatible versions may cause JavaScript errors, broken CSS layout, or even logic errors. Either the JavaScript architecture needs some safety checks and programmers must consider version compatibility during development (generally not easy in fast-paced web development), or you use a front-end load balancer that keeps each user on the same machine;
- During monitoring, hardware resource metrics may fluctuate sharply during the release for reasons unrelated to the code (for example, after a restart the cache has to warm up again, which increases I/O and triggers false alarms). Ruling these out depends on the experience of the release manager (pusher).
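To make the selection and monitoring steps concrete, here is a minimal sketch of what an operations script might do: pick the first machine of every rack as the first phase, then compare error rates grouped by revision before going wider. The host records, field names, the `tolerance` threshold, and both function names are assumptions made up for illustration, not part of any real deployment tool.

```python
from collections import defaultdict


def plan_phases(hosts):
    """Phase 1: the first host (by name) of every rack; phase 2: all others."""
    by_rack = defaultdict(list)
    for h in hosts:
        by_rack[(h["dc"], h["cluster"], h["rack"])].append(h["name"])
    phase1 = sorted(min(names) for names in by_rack.values())
    chosen = set(phase1)
    phase2 = sorted(h["name"] for h in hosts if h["name"] not in chosen)
    return [phase1, phase2]


def safe_to_continue(errors_by_rev, requests_by_rev, old_rev, new_rev, tolerance=1.5):
    """Compare error rates grouped by code revision (hypothetical threshold)."""
    def rate(rev):
        return errors_by_rev.get(rev, 0) / max(requests_by_rev.get(rev, 0), 1)
    return rate(new_rev) <= rate(old_rev) * tolerance


# Hypothetical inventory: two data centers, three racks in total.
hosts = [
    {"name": "a1", "dc": "dc1", "cluster": "c1", "rack": "r1"},
    {"name": "a2", "dc": "dc1", "cluster": "c1", "rack": "r1"},
    {"name": "b1", "dc": "dc1", "cluster": "c1", "rack": "r2"},
    {"name": "c1", "dc": "dc2", "cluster": "c1", "rack": "r1"},
    {"name": "c2", "dc": "dc2", "cluster": "c1", "rack": "r1"},
]
phases = plan_phases(hosts)
```

Note that this plan is computed by a separate script run by the operations team, outside the code being released, which matches the point above that the selection itself should not depend on the new version.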
- AB testing: a common way to release products. Compared with a multi-phase code push, an AB test usually runs much longer (for example, several weeks or even months). The product developer adds one or more configuration switches to the product (generally, each product configuration should have a configuration ID), and adjusting those configurations releases the product to a "gradually selected" group of users.
- Features: "gradual selection" is logic controlled by code. Users are generally selected by user ID, sometimes by IP address or other information.
- Monitoring: AB test data is generally grouped by product configuration ID and by whether the configuration is on or off. Comparing user behavior and hardware resource consumption between the two states lets you predict the product's impact after a 100% release.
- Difficulties:
- How to select: different products call for different selection criteria. User ID is the usual choice. However, if browser caching is very effective, you may need to select by IP address instead (because one browser may be used by several users). For products aimed at users who are not registered yet (such as tests of different registration flows), selection by IP address or by real-time random choice may be required;
- Evaluating product performance: some products depend on network effects. If the sample is chosen at random by user ID, the network effect may be broken and the product will not work during the AB test. For example, suppose the average degree in a social network is 50, that is, each user connects to 50 other users. In an AB test that samples 1% of user IDs at random, the expected degree inside the selected subgroup is about 50 × 1% = 0.5, which is less than 1;
- The "gradual selection" logic is itself code. If that code is written incorrectly, the consequences can be disastrous.
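As an illustration of selection controlled by code, the sketch below hashes a configuration ID together with a selection key (a user ID, or an IP address for unregistered users) into a stable bucket and enables the configuration for a given percentage of buckets. The function names, the bucket count of 10,000, and the use of SHA-256 are assumptions chosen for this example; a real system may select users quite differently.

```python
import hashlib


def bucket(config_id: str, key: str) -> int:
    """Map (config_id, key) to a stable bucket in [0, 10000)."""
    digest = hashlib.sha256(f"{config_id}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def is_enabled(config_id: str, key: str, rollout_percent: float) -> bool:
    """True if this key falls inside the rollout percentage for config_id."""
    return bucket(config_id, key) < rollout_percent * 100


# The same user always gets the same answer for the same configuration,
# and raising the percentage only ever adds users to the enabled group.
user_ids = [f"user{i}" for i in range(10_000)]
enabled_at_10 = [u for u in user_ids if is_enabled("new_profile_page", u, 10)]
```

Including the configuration ID in the hash keeps different tests from always landing on the same users. Because a mistake here affects every request, this small piece of logic deserves particularly careful testing.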
- Dark launch: I think the term "grayscale launch" may come from dark launch. This is another way to release a product, often used for products that have to be released all at once. Some products cannot be rolled out gradually through an AB test, either because of marketing strategy or because of the nature of the product itself (such as Facebook username registration, or perhaps a train ticket sales system). Yet we still need to know what impact the launch will have, and this is where a dark launch helps. The basic idea: when a user visits the site, the code behind the new feature runs, but the output, interaction, and write operations that would be visible to the user are blocked. This product, with user interaction stripped out, is then rolled out gradually in the same way as an AB test.
- Features: the testing of the new product is invisible to the outside world.
- Monitoring: as with AB testing; the main focus is on system load and resource consumption.
- Difficulties:
- How to block user interaction: on the one hand we want a load that is almost the same as the real product's; on the other hand we do not want to clutter the code or have to make large changes for the real release;
- Predicting the load after the real release: a launch can create positive feedback (for example, the product turns out to be very popular and draws in more users to register). A dark launch can only predict the first-order effect; it cannot predict the chain reaction caused by changes in user behavior.
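The dark launch idea can be sketched roughly as follows: for a selected share of users, the new feature's code runs on every request, but its result is thrown away, its writes go to a stand-in that only records them, and any failure is logged rather than shown. Every name here (`handle_request`, `BlockedWrites`, `in_dark_group`) is hypothetical, and the hashing scheme is just one possible way to pick users.

```python
import hashlib
import logging
import time

log = logging.getLogger("dark_launch")


def in_dark_group(user_id: str, percent: int) -> bool:
    """Stable selection of roughly `percent` percent of users."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < percent


class BlockedWrites:
    """Stand-in store: records intended writes instead of performing them."""

    def __init__(self):
        self.attempted = []

    def write(self, key, value):
        self.attempted.append((key, value))


def handle_request(user_id, old_feature, new_feature, percent):
    response = old_feature(user_id)  # the only thing the user ever sees
    if in_dark_group(user_id, percent):
        sink = BlockedWrites()
        start = time.perf_counter()
        try:
            new_feature(user_id, sink)  # result is discarded on purpose
        except Exception:
            log.exception("dark-launched feature failed for %s", user_id)
        finally:
            elapsed = time.perf_counter() - start
            log.info("dark run took %.4fs, %d writes blocked",
                     elapsed, len(sink.attempted))
    return response
```

Passing the write target in as a parameter is one way to address the first difficulty above: for the real release, the stand-in is swapped for the real store and the response is returned to the user, without restructuring the feature code.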
All of these methods seem to go by the name "phased release", yet they are quite different. Each has its own advantages and disadvantages, depending on what the product release needs.