Configuration Management: Continuous Integration


Continuous integration has been talked about for many years, but for the sake of a coherent summary I am writing it up here anyway. Much of the content below is drawn from material on the web.

-

The goal of continuous integration is to allow the product to be quickly iterated while maintaining high quality. Its core measure is that the code must pass through automated testing before it is integrated into the trunk. As long as one test case fails, it cannot be integrated.
"Continuous integration does not eliminate bugs, but makes them very easy to spot and correct," Martin Fowler said. "

Why do continuous integration?

As mentioned in Code Complete (where Steve McConnell uses the term incremental integration), continuous integration has the following advantages:

    • Errors are easy to locate. When a continuous integration run fails, your new or modified code caused the error, so you immediately know roughly where the mistake is and whom to talk to.
    • You get system-level results early in the project. Because the code is integrated from the start, you and your team can see the system running even before the whole system is finished.
    • It improves control of progress. If you integrate every day, you can see every day which features work and which are not yet implemented. Programmers no longer have to guess how far along they are when reporting status, and project managers are no longer stuck with vague claims like "50% of the code is complete."
    • It improves customer relationships, for the same reasons as above.
    • Each unit in the system gets tested more thoroughly. This is one of the great benefits of combining a daily build with a smoke test.
    • The whole system can be built in less time. You will have to judge this one for yourself once you implement it; in our experience, continuous integration did not shorten each project, but it made the schedule more controllable and more predictable than before.

Over time, more benefits of continuous integration have gradually been recognized, such as:

    • It facilitates collecting development data for the project, such as changes in code volume, tests that fail often, and source files that break often.
    • Combined with other tools, it enables continuous code-quality improvement, for example integration with Checkstyle, PMD, FindBugs, or FxCop.
    • Combined with a testing tool or framework, it enables continuous testing, for example integration with XUnit, SilkTest, or LoadRunner.
    • It makes code review easier. In each build we can see what changed since the previous build, and review exactly those changes.
    • It facilitates managing the development process, for example submitting a development build to the test team for testing and, once testing is satisfied, handing it on to the release group for publishing.

Continuous integration practices

Practices are simply a description of how to do it. Martin Fowler's original article lists 10 practices, which we cover here.

    1. Maintain a single code base

Software projects involve many files that must work together to build the final product. Keeping track of all of them is a lot of work, especially in projects with multiple developers. So it is no surprise that software teams have built tools to manage these files: source code management tools, also known as configuration management, version control systems, repositories, and so on. These tools are an integral part of most software projects. Sadly and surprisingly, though, not all projects use them. I do see (albeit rarely) projects that do not, and instead coordinate through a chaotic combination of local and shared disks.

So, as a basic continuous integration practice, make sure you use a decent source code management system. Cost is not an obstacle, since there are many high-quality open source tools. The current choice is Subversion (translator's note: nowadays also the newer Hg and Git). (The older open source tool CVS is still heavily used and less powerful; Subversion is the more modern choice.) Interestingly, when chatting with developers I find that many of them prefer Subversion to most commercial source code management systems. As far as I know, the only one worth paying for is Perforce.

Once you have a source code management system, make sure every developer can easily get at the source. No one should ever have to ask, "Where is the foo-whiffle file?" Everything must be in the repository.

Although many teams use a repository, I often find that they do not put everything in it. People know that files they need must go into the repository, but everything required for the build should be there: test scripts, property files, database schema files, install scripts, and third-party libraries. I know of projects that even check their compilers into the repository (important in the days of fragile early C++ compilers). The basic rule of thumb is that you should be able to check out the project onto a fresh machine and build it successfully. What is assumed to be on the fresh machine should be minimal: usually only large, hard-to-install, stable software such as the operating system, the Java development environment, or the database management system.

You need to put everything required for the build into the source code management system, and also the things people frequently work with. IDE configurations are a good example, because people like to share the same IDE settings.

A major feature of version control systems is that they let you create multiple branches to handle different streams of development. This is useful, but it is often overused and causes developers a lot of trouble. So keep your use of branches to a minimum. In particular, it is recommended to have a mainline: a single development branch in the project, with everyone working off it most of the time.

In short, you should put everything needed for the build into the source code management system, but not the build products themselves. Some people do check build products into source control, but I consider that a bad smell that may indicate deeper problems, usually an inability to reliably rebuild from scratch.

    2. Automate the build

Turning source code into a running system is often a complicated process involving compilation, moving files around, loading database schemas, and so on. But most of these tasks can, and therefore should, be automated. Having people type strange commands or click through dialog boxes is a waste of time and a breeding ground for mistakes.

Automated build environments are a common feature of software systems. Unix has had make for decades; the Java community has Ant; the .NET community had NAnt and now has MSBuild. Whichever tool you use, make sure you can build and launch your system with a single command.

A common mistake is not including everything in the automated build. For example, the database schema files should be taken from the repository and applied automatically during the build. Following the rule of thumb above, anyone should be able to check out the project onto a fresh machine and get a running system with a single command.

Build scripts come in many flavors and are often specific to a platform or community, but they don't have to be. Most of our Java projects use Ant, while some use Ruby (Rake, from the Ruby world, is an excellent build tool). We once got a lot of value from automating the early builds of a Microsoft COM project with Ant.

A large build often takes a long time, and you don't want to run all of the steps if you have only made a small change. So a good build tool analyzes what actually needs to be rebuilt, and makes that analysis part of the build itself. The common way to do this is to check the modification dates of source and object files, and compile only if a source file is newer than its corresponding object file. Dependencies then get subtle: if one object file changes, the files that depend on it may also need to be rebuilt. Some compilers can handle this kind of dependency; some cannot.
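The timestamp-based rebuild check described above can be sketched in a few lines (a simplified illustration of the idea, not any particular build tool's implementation):

```python
import os

def needs_rebuild(source, target):
    """Rebuild the target if it is missing, or if its source file was
    modified more recently than the target. This is the core check that
    make-style tools run for every source/target pair."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)
```

A real tool then propagates this check through the dependency graph, so a rebuilt object file triggers rebuilds of everything that links against it.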

Depending on what you need, you may build different things: the build can include tests or exclude them, or include different suites of tests, and some components can be built on their own. The build script should let you build different targets for different cases.

Most of us use IDEs, and most IDEs have some kind of build management built in. However, such build files are usually IDE-specific and quite fragile, and they only work inside the IDE. It is fine for developers to build this way in their IDE, but for a continuous integration server it is essential to have a master build script that can be called from other scripts. For a Java project, for example, each developer may build in their own IDE, but there should also be an Ant master build that guarantees the build can complete on the integration server.
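The idea of a master build that runs named targets in dependency order, each exactly once, can be sketched like this (a toy model of what tools like Ant or make do; the target names are hypothetical):

```python
def build(target, steps, done=None):
    """Run the named target after its prerequisites, each at most once.
    `steps` maps a target name to (dependencies, action)."""
    done = done if done is not None else set()
    if target in done:
        return done
    deps, action = steps[target]
    for dep in deps:
        build(dep, steps, done)  # recurse into prerequisites first
    action()
    done.add(target)
    return done
```

A single command such as `build("package", steps)` then runs the whole chain, which is exactly the one-command property a CI server needs.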

    3. Make the build self-testing

A traditional build includes compiling, linking, and so on. At that point the program may run, but that does not mean it runs correctly. Although today's statically typed languages can catch many bugs, far more slip through.

A fast and effective way to find bugs is to include automated tests in the build process. Testing is not perfect, of course, but it finds a lot of bugs, enough to be genuinely useful. In particular, the rise of Extreme Programming (XP) and test-driven development (TDD) has made self-testing code popular, and more and more people have come to appreciate the value of this technique.

Readers who follow my writing will know that I am a big fan of both TDD and XP, but I want to stress that neither is strictly required for self-testing code. TDD and XP require the tests to be written before the production code that makes them pass; in that style, tests both find bugs and drive the design. That is all very good, but it is not necessary for continuous integration, where our requirements on self-testing code are less demanding. (That said, TDD is still my preferred way to produce self-testing code.)

For self-testing code you need a suite of automated tests that can probe a large part of the code base for bugs. The tests must be runnable with a simple command and must check their own results: a failed run should tell you which tests failed, and for a self-testing build, a test failure should cause the build to fail.
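A minimal illustration of a self-testing build using Python's standard unittest module: the build passes only if the whole suite does (the `SanityTests` case is a placeholder standing in for a real project's tests):

```python
import io
import unittest

class SanityTests(unittest.TestCase):
    # Placeholder test; a real project would load its whole suite here.
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

def self_testing_build():
    """Run the automated tests as part of the build; any test failure
    must fail the whole build."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SanityTests)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()  # False means the build must fail
```

In a build script, a False return (or a nonzero test-runner exit code) is what turns a red test into a red build.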

Over the past few years TDD has popularized the open source XUnit family, which makes an ideal testing tool. At ThoughtWorks, XUnit has long proven very useful, and I often advise people to use it. Originally developed by Kent Beck, this family of tools makes it easy to set up a fully self-testing environment.

XUnit is unquestionably the right starting point for making your code self-testing. You should also look at the more end-to-end testing tools, including FIT, Selenium, Sahi, Watir, FitNesse, and plenty of others I won't list here.

Of course, don't expect tests to be a silver bullet. As the saying goes, passing tests does not mean there are no bugs.

    4. Everyone commits to the mainline every day

Integration is first of all about communication: it lets the other members of the team see the changes you have made. With this frequent communication, everyone quickly learns what is changing as development proceeds.

Before committing to the mainline, developers must make sure the local build succeeds, which of course includes making all the tests pass. Also, before committing, update your local copy to match the mainline, resolve any conflicts between the mainline code and your local code, and then build locally. Only if that build succeeds should you commit to the mainline.

By committing this frequently, developers discover conflicts between their code and other people's code quickly. The key to fixing problems quickly is finding them quickly. When commits are only a few hours apart, conflicts surface within a few hours, while everyone's changes are still small, so the conflicts are small and easy to resolve. Conflicts that go undetected for weeks are often very hard to sort out.

Building when you update your local copy means you detect not just textual conflicts but compilation conflicts. Since the build is self-testing, you also detect runtime conflicts, which tend to be particularly annoying bugs. With only a few hours between commits, such bugs have few places to hide. Furthermore, since each commit changes little, you can use diff debugging to help track them down.
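Diff debugging relies on the small, frequent commits described above; the binary search at its heart (the same idea `git bisect` automates) can be sketched as:

```python
def first_bad(revisions, is_bad):
    """Binary-search an ordered commit history for the first revision
    where `is_bad` (e.g. a failing test) starts returning True.
    Assumes every revision after the first bad one is also bad."""
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # the first bad revision is at mid or earlier
        else:
            lo = mid + 1      # the first bad revision is after mid
    return revisions[lo]
```

With commits a few hours apart, each step of the search discards half of a short history, so the culprit commit is found in a handful of test runs.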

My basic rule is that every developer should commit to the repository every day. In practice, the more frequently you commit, the fewer places conflicts can hide and the easier they are to find.

Frequent commits encourage developers to break their work into chunks of a few hours each, which also makes progress easier to track. People often believe at first that they can't accomplish anything meaningful in just a few hours, but with a mentor and plenty of practice they learn.

    5. Every commit should build the mainline on an integration machine

With daily commits and daily tested builds, the mainline should stay in a healthy state. In practice, though, things still go wrong. One reason is discipline: someone commits without updating and building locally first. Another is environmental differences between developer machines.

Therefore you should make sure builds happen on an integration machine, and your commit counts as done only when the build on the integration machine succeeds. Since the committer is responsible for their commit, they must watch the mainline build and fix it immediately if it fails. If a commit you made at the end of the day breaks the build, then, sorry, fix it before you go home.

I've seen two ways to ensure the success of mainline building: one is to build manually, and the other is to use a continuous integration server.

A manual build is the simplest, essentially the same as what a developer does locally: first pull the latest mainline code onto the integration machine, then run the build command, keeping an eye on it as it runs. If the build succeeds, your task is done. (See also Jim Shore's description.)

A continuous integration server monitors the repository. When it detects a commit, it automatically checks the code out onto the integration machine, starts the build, and notifies the committer of the result. Only when the committer receives the notification, usually by e-mail, is their task complete.
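One polling cycle of such a CI server can be sketched as follows. The repository, build, and notification hooks are injected as plain functions, since a real server would talk to version control and e-mail; this is only the control flow:

```python
def ci_cycle(get_latest_revision, last_built, build, notify):
    """One polling cycle of a CI server: if a new revision appeared
    since the last build, build it and notify the committer of the
    result. Returns the revision that is now built."""
    rev = get_latest_revision()
    if rev == last_built:
        return last_built          # nothing new; sleep and poll again
    ok = build(rev)                # check out and run the master build
    notify(rev, ok)                # e-mail the committer the result
    return rev
```

A real server wraps this cycle in a loop with a polling interval (or a post-commit hook) and keeps per-project state.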

At ThoughtWorks we are loyal fans of continuous integration servers; we led the original development of CruiseControl and CruiseControl.NET, both widely used CI servers, and have since developed the commercial Cruise. We have used a CI server on almost every project, and the results have been a pleasure.

Not everyone prefers to use a CI server. Jim Shore gives a good argument for why he prefers the manual approach, and I agree with his point that CI is not merely installing some software: all the practices must be in place to do continuous integration effectively. Still, many teams that use CI servers find them a good tool.

Many teams build on a regular schedule, such as a nightly build. That is not the same as continuous building and is not enough for continuous integration. The whole point of continuous integration is to find problems as quickly as possible. Nightly builds mean bugs sit undetected for a whole day, and once found they take longer to track down and understand.

The key to continuous building is that if the mainline build fails, it must be fixed right away. In continuous integration you are always developing on a known stable base. A mainline build failure is not the end of the world, but if it happens often, it suggests people are not updating and building locally before they commit. Once the mainline build breaks, it must be fixed immediately. To avoid breaking the mainline at all, you might also consider a pending-head approach.

    6. Keep the build fast

The whole point of continuous integration is rapid feedback, so a long-running build is extremely bad for CI. Most of my colleagues consider a one-hour build completely unreasonable for CI. I remember teams dreaming of how fast their build could one day be, but sometimes we do have to face situations where a fast build is hard to achieve.

For most projects, the XP guideline of a ten-minute build is perfectly reasonable, and most of our projects achieve it. It is worth the effort, because every minute shaved off the build is a minute saved for every developer on every commit.

If your build is an hour long, it's not that easy to make it faster. For enterprise applications, we find that the bottleneck of build time usually occurs in tests, especially those that require external interaction-such as a database.

Probably the best solution is to introduce a staged build (also called a build pipeline or deployment pipeline), since the build does in fact happen in stages. The build triggered by a code commit is called the commit build, and it should complete quickly; the tricky part is balancing speed against bug detection.

Once the commit build is green, other people can work on the code with confidence. However, you may have further, slower tests still to run, and you can use additional machines to run these time-consuming tests.

A simple example is to split the build into two stages. The first stage does the compilation and runs only the unit tests that need no external interaction, with database access entirely replaced by stubs. These tests run quickly, keeping within the ten-minute guideline. This stage, however, cannot catch bugs that involve real external interaction, especially anything touching a real database. The second-stage tests run against a real database and should also include end-to-end tests; this stage may take several hours.

In this setup, the first stage is usually considered the commit build, and that is the main CI cycle. The second stage runs when it can. A second-stage failure does not demand the same stop-everything response as a first-stage failure, but it should still be fixed as soon as possible; the bugs it finds can be fixed over the next few days. In this example the second stage is all testing, since slow tests are usually what lives there.

If a bug is caught in the second-stage build, that usually means a new test should be added to the first stage to guard against it.

Of course, the two-stage build above is only an example; you can add as many build stages as you like. The builds after the commit build can run in parallel, spread over several machines if those stages take hours. With this parallelization you can move every test outside the commit build into the later stages, including, for example, performance tests.
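A staged build reduced to its essentials: run the stages in order and stop at the first failure, reporting which stage broke (the stage names here are illustrative):

```python
def run_pipeline(stages):
    """Run build stages in order. Each stage is a (name, callable) pair
    returning True on success; the first failure stops the pipeline and
    is reported so the team knows what broke."""
    for name, stage in stages:
        if not stage():
            return (False, name)   # fast feedback: stop at the first red stage
    return (True, None)
```

A real pipeline would run later stages asynchronously or on other machines, but the fail-fast ordering is the same.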

    7. Test in a clone of the production environment

Tests are designed to identify problems that may occur in a production environment, so if your test environment is different from your production environment, it is likely that the test will not uncover bugs in your production environment.

Therefore your test environment should mimic the production environment as closely as possible: the same database, the same operating system, the same versions. Put the libraries that live in the production environment into the test environment too, even if the build doesn't use them. IP addresses and ports should match, and so, ideally, should the hardware.

In practice there are limits. If you develop desktop software, it is hard to predict which third-party libraries your customers will be running, and duplicating a production environment can be very expensive. Even with these limits, you should try to replicate the production environment as far as you can, and understand the risks you accept for every difference between test and production.

If your setup is simple enough and has no awkward external interactions, your commit build can run in a cloned environment. In practice, though, you may need test doubles, for example because parts of the system respond slowly. As a result, it is common to run the commit build in an artificial environment for speed, and to run the other tests in a clone of the production environment.
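A minimal sketch of a test double: the stub answers from canned data instead of a real database, so commit-build tests stay fast (the class and method names here are hypothetical, not from any particular library):

```python
class RealOrderDatabase:
    """In production this would talk to a real database over the network."""
    def price_of(self, item):
        raise NotImplementedError("requires a live database connection")

class StubOrderDatabase:
    """A test double: canned answers, no external interaction, so the
    commit build's tests run in milliseconds."""
    def __init__(self, prices):
        self._prices = prices
    def price_of(self, item):
        return self._prices[item]

def order_total(db, items):
    """Business logic under test; it works with either implementation."""
    return sum(db.price_of(item) for item in items)
```

The second-stage build then runs the same logic against `RealOrderDatabase` (backed by a real database) to catch the integration bugs the stub cannot.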

I've noticed that virtualization technology is attracting more and more people's interest. Because virtual machines can save everything that is needed to build, it is relatively easy to run builds and tests in a virtual machine. In addition, virtual machine technology allows you to run multiple tests on a single machine, or you can simulate multiple machines accessing the network at the same time. As the performance of the virtual machine increases, it will attract more attention.

    8. Make it easy for anyone to get the latest executable

One of the most difficult things about software development is that you can't guarantee that the right software is being developed. We find that it is often difficult to predict what you want, and on the contrary, it is much easier to judge and modify what is already there. The agile development process is precisely in line with human behavior habits.

To support this, everyone on the project should be able to get the latest executable and run it successfully: for demonstrations, for exploratory testing, or just to see what changed this week.

This is easy to achieve: make sure there is a well-known place where the latest executable is stored. It can be useful to keep several recent executables there as well. For the very latest executable, make sure you pick one that has passed the commit tests.
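Publishing each build to a well-known shared location, keeping the labelled history alongside a "latest" copy, can be sketched like this (the paths and labels are illustrative):

```python
import os
import shutil

def publish(shared_dir, artifact, label):
    """Copy a freshly built executable into a well-known shared store,
    keeping every labelled build and refreshing a 'latest' copy."""
    store = os.path.join(shared_dir, "artifacts")
    os.makedirs(store, exist_ok=True)
    shutil.copy(artifact, os.path.join(store, label))    # keep the history
    shutil.copy(artifact, os.path.join(store, "latest")) # easy default pick-up
    return store
```

The CI server would call this only after the commit tests pass, so "latest" is always a build anyone can safely demo.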

If your development process has a good iteration plan, it is also wise to store the resulting executable file for each iteration of the last build.

    9. Everyone can see what's happening

Continuous integration is primarily about communication, so it is important to ensure that everyone can easily see the status of the current system and the changes that have been made.

The state of the mainline build matters most. The Cruise server includes a web site where you can see the current build status and the result of the last mainline build. Many teams like to make the state even more conspicuous by hooking up a light: green means the build succeeded, red means it failed. Lava lamps are especially popular, because they show not just the state but how long it has persisted: bubbles in the red lamp mean the build has been broken for a while. Each team makes its own choice; pick whatever suits yours.

Such visibility matters for manual continuous integration too: the monitor of the build machine should show the state of the mainline build. Often whoever is currently integrating puts a token on their desk to show it. People also like to play simple sounds, such as a bell, when a build succeeds.

A CI server's web site can show more than that. Cruise shows not only who is building but also what the latest commits changed, and it lets you browse the commit history, so team members have a clear sense of the project's progress. I know of team leads who use exactly this to understand what members are working on and how the overall system is changing.

Another advantage of using CI sites is that even people who don't work together can see the status of the current project. In addition, you can put together information about the different projects you build.

CI web sites are not the only way to show build information. Because build instability never quite goes away, one approach is to draw a full-year calendar on a wall, one square per day: if the day's build succeeded, QA puts a green sticker on the square, otherwise a red one. Over time the calendar reveals how stable the project is.

    10. Automate deployment

Continuous integration requires a variety of environments, and different build phases require different environments. Every day, the project's executables move between these environments, and you want to automate them. Therefore, automating deployment scripts is important, including not only scripts for the test environment, but also deployment scripts for production environments. While we are not deploying to a production environment every day, automated deployment accelerates the deployment process and can reduce deployment errors.

If you have automated deployment to production, you should also consider automated rollback. Failures do happen, and when they do we want to roll back quickly to the last good state. Knowing we can, we deploy with far less fear, so we can release software frequently and users get new features sooner. (The Ruby on Rails community's tool Capistrano is a typical example.)
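The deploy-then-verify-then-roll-back flow can be sketched with injected hooks standing in for the real deployment and health-check commands (a control-flow sketch, not a real deployment tool):

```python
def deploy_with_rollback(deploy, health_check, rollback):
    """Automated deployment sketch: deploy, verify the new version with a
    health check, and roll back automatically if verification fails."""
    deploy()
    if health_check():
        return "deployed"
    rollback()                 # restore the last known-good state
    return "rolled back"
```

In a real script, `deploy`, `health_check`, and `rollback` would wrap the project's own commands (copying a release, hitting a status endpoint, restoring the previous release).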

In a clustered environment I have seen rolling deployment, where only one node is deployed at a time and the whole cluster is migrated gradually over a few hours.

An interesting variation, popular for public-facing web applications, is to deploy a trial version to a subset of users first and use their experience to decide whether to roll it out to everyone. Automated deployment, backed by good CI discipline, handles these approaches well.
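A common way to pick a stable pilot subset is to hash each user id into buckets; this sketch assumes a simple percentage threshold (the function name and scheme are illustrative):

```python
import hashlib

def in_pilot(user_id, percent):
    """Deterministically assign a stable subset of users to the pilot
    deployment by hashing the user id into 100 buckets. The same user
    always lands in the same bucket across requests and restarts."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent
```

Because the assignment is a pure function of the user id, the pilot group stays fixed as `percent` is ramped up from, say, 1 to 100.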

The difficulty of continuous integration

Continuous integration has so many benefits, and the practices are clear, so why do so many teams still not do it well? There are real difficulties in doing continuous integration, which we analyze below.

  1. Many products in maintenance are scattered across many repositories or branches, which makes unified management difficult
    This is especially true for software products that have survived for years and still make money for the company. Another case is software acquired from other companies, which for historical reasons is hard to integrate.
  2. Automated testing
    This is the hardest part. Honestly, look at the teams around you and how far their testing is automated; I'd say more than 80% of projects do poorly here. Automated testing is hard to do well for several reasons. a) The structure of the product itself: if the architecture is very tightly coupled, or too fragmented, tests are hard to automate. b) The form of the product: if it mainly exposes APIs or well-defined services, automation is relatively easy; if it is mainly a web UI, automation is comparatively hard. c) Another important factor is the maintenance cost of automated tests. Test suites are not built overnight; the corresponding test cases have to be written alongside the business code, which is a real cost. Does the team have the time and energy for it? It may be fine at first, but once business pressure mounts, you find many test cases no longer pass, and in the end nothing is left.
  3. Build quickly
    Building takes time and money. There are hardware factors, language factors, and software architecture factors.

    • How much are we willing to spend on machines for this? Some companies compile their apps on top-spec hardware, while others use the kind of Mac mini that doesn't even have an SSD. That is the gap.
    • Some systems are written in interpreted languages such as PHP, Ruby, or Python, and only need compression and obfuscation before going live, so they are naturally fast to build. Many others use C/C++ or Golang, which take longer to compile. That is the language factor.
    • Software architecture has a significant impact on build time. For example, in a large C++ system whose modules are relatively independent with few inter-dependencies, the whole product can be split into many small modules compiled in parallel; the total time is then determined by the slowest module, which can slash the overall build time at a stroke.
    • Many parallel, distributed build systems are commercial, which raises the question of paying for licenses.
  4. Many environments are difficult to simulate
    Although many of today's products are simply a web site or an app, plenty of large systems, such as those in banking, still exist. These companies often have only a single production deployment, and it is hard to set up a staging environment; much code is finished and then tested directly in production. Mistakes are hard to avoid.
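The parallel module compilation mentioned under "build quickly" can be sketched with a thread pool; `compile_module` here stands in for a real compiler invocation, so wall time is bounded by the slowest module rather than the sum of all of them:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_build(modules, compile_module):
    """Compile independent modules concurrently. Assumes the modules have
    no inter-dependencies, so they can all be built at the same time."""
    with ThreadPoolExecutor(max_workers=len(modules)) as pool:
        results = list(pool.map(compile_module, modules))  # preserves order
    return dict(zip(modules, results))
```

For modules with dependencies, a real build tool would first topologically sort them and only parallelize within each level.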

We can now use better and faster CPUs, more memory, faster storage (such as SSDs), and parallel or distributed builds to speed up the build process, but these are visible costs; the automated tests, which consume most of the time, also cost a great deal of labor. And for some special industries it is genuinely hard to provide a pre-production environment, which is awkward.

Summary

Continuous integration is easy to get started with, and genuinely rewarding when done well.


Resources:

  • Code Complete ([US] Steve McConnell)
  • Continuous Integration ([US] Martin Fowler)
  • Continuous Integration ([China] translated by Jack Zhou)