Continuous integration (hereinafter CI), one of the more advanced project practices, has gradually become a focus of software companies in recent years. Many developers, however, may never have heard the term, or understand the concept but have never put it into practice.
So what is continuous integration, and what benefits does it bring to software development?
The concept of continuous integration
As software development grows more complex, how team members collaborate effectively to guarantee the quality of the software has become an unavoidable problem in the development process. In recent years especially, as agile methods have flourished in software engineering, adapting to changing requirements while safeguarding quality has become all the more important.
Continuous integration is a software development practice aimed at exactly this kind of problem: it advocates that team members integrate their work frequently, even several times a day. Each integration is verified by an automated build, including automated compilation, publishing, and testing, so that integration errors are identified quickly and the team can deliver the product faster.
Let's take a project as an example and describe how a typical team uses CI.
First, the "integration" part: all project code is hosted on an SVN or Git server (hereinafter the code server). Each project has a number of unit tests and integration tests. Integration testing is the logical extension of unit testing: on top of the unit tests, modules are assembled into subsystems or whole systems according to the design and tested together. Practice shows that even when individual modules work correctly in isolation, there is no guarantee they will work correctly when connected; problems invisible locally are likely to surface globally (readers can consult the literature for more detail on unit and integration testing).
Simply put, the integration test runs all the unit tests plus whatever other tests can be automated. Only code that passes the integration tests may be uploaded to the code server, ensuring that the uploaded code is sound. "Integration" here generally refers to this integration testing.
As for "continuous", it obviously means running these integration tests over the long term. Since the process is long-term, it is best automated; manual execution cannot be guaranteed and is labor-intensive.
For this purpose, we need a server that periodically pulls the code from the code server, compiles it, and then automatically runs the integration tests, recording the result of each run.
In one project, this period was set to one day, meaning the server automatically runs an integration test of the latest code on the code server every day.
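Such a daily cycle can be sketched as a cron job driving a small build script. This is only a minimal sketch; the paths, schedule, branch name, and build commands are placeholder assumptions, not part of the project described above:

```shell
# /etc/cron.d/nightly-ci -- run the integration build at 02:00 every day
# 0 2 * * *  ci  /opt/ci/nightly-build.sh

# /opt/ci/nightly-build.sh -- a minimal sketch
#!/bin/sh
set -e                          # stop on the first failing step
cd /opt/ci/workspace
git pull origin master          # fetch the latest code from the code server
mvn -B clean verify             # compile and run unit + integration tests
echo "$(date): build OK" >> /opt/ci/results.log   # record the result
```

A real CI server (such as Jenkins, discussed below) replaces this script with managed jobs, history, and notifications.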
Features of continuous integration

Continuous integration is an automated, periodic integration-testing process: pulling code, building, running tests, recording results, and compiling test statistics all happen without manual intervention. It requires a dedicated integration server to perform the integration builds, and the support of a code-hosting tool.
The role of continuous integration

Continuous integration guarantees the quality of the code the team submits and reduces the pressure of software releases; every step is automated, with little manual intervention, cutting repetitive work and saving time, cost, and effort. In practice, the author has summarized the following lessons: The sooner code is pushed out, the sooner users can use it, and the sooner it delivers business value. The sooner users use it, the sooner the team gets feedback, good or bad; if users give no feedback at all, it indicates that we built something users do not want (traceable through use cases) or that the market is poor, which can inform product and marketing decisions. The larger the backlog of unreleased code, the greater the chance of cross-contamination between changes, and the higher the complexity and risk of the next release; the larger the code inventory, the heavier the workflow burden and the greater the management cost.
Having seen the advantages of continuous integration, readers may feel an immediate urge to deploy a CI environment themselves. However, deploying one is not easy.
Take, for example, a typical continuous integration scenario for a Java project. What we need to do is build a continuous integration environment based on Jenkins + Maven + Git (or SVN), add the tests and processes that continuous integration requires, and it will run, roughly.
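As a rough illustration of such a setup, a declarative Jenkins pipeline might look like the following sketch. The repository URL, branch, and polling schedule are placeholder assumptions, not a prescription:

```groovy
pipeline {
    agent any
    triggers { pollSCM('H/15 * * * *') }   // poll the code server periodically
    stages {
        stage('Checkout') {
            steps {
                git url: 'https://git.example.com/demo-app.git', branch: 'master'  // placeholder repo
            }
        }
        stage('Build and Test') {
            steps { sh 'mvn -B clean verify' }   // compile, then run unit and integration tests
        }
    }
    post {
        always { junit '**/target/surefire-reports/*.xml' }  // record test results
    }
}
```

The pipeline simply automates the pull-build-test-record cycle described above.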
But after building a complete continuous integration pipeline, teams often find it is not as smooth as imagined, and sometimes even less efficient than testing locally during development. Why does this happen?
Several problems are commonly encountered in the continuous integration process.
"Compile-time dependencies and run-time dependencies"
These two kinds of dependency are not hard to understand literally. A compile-time dependency is usually also a run-time dependency, but a compile-time dependency is not necessarily needed at run time. For example, some JAR packages that provide only APIs are required during development, while at run time the JARs containing the concrete implementations of those APIs are required instead. Furthermore, each dependency has its own dependencies, giving the project indirect (run-time) dependencies on those packages, and so on, eventually forming a dependency tree. When the project runs, every package in this dependency tree must be in place, or the project will not run.
Maven uses the scope element in the POM to specify the type of each dependency, freeing developers and operations staff from handling the dependency tree manually. At run time, however, all required packages must eventually be installed into the production environment, and that part of the work Maven does not do automatically. A common approach is therefore to copy the run-time dependencies into the project itself, for example into a Java web application's WEB-INF/lib, and package the whole project as one artifact. After the package is installed, environment variables such as CLASSPATH are adjusted to include the corresponding paths.
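For illustration, a POM fragment distinguishing the kinds of dependency via scope might look like this (the artifacts and versions are just common examples, not taken from the project above):

```xml
<dependencies>
  <!-- Compile-time API; the implementation is supplied separately at run time -->
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.36</version>
  </dependency>
  <!-- Needed at run time only, not for compilation -->
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>1.2.11</version>
    <scope>runtime</scope>
  </dependency>
  <!-- Provided by the container (e.g. the servlet API), so not packaged -->
  <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>javax.servlet-api</artifactId>
    <version>3.1.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Maven resolves the tree for compiling and testing, but installing the run-time tree into production remains the team's job.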
As a typical counterexample, modern operating systems and other frameworks already handle the run-time dependency tree: Ubuntu's apt-get, CentOS's yum, Ruby's RubyGems, Node's npm, and so on. When a package is installed, the system automatically downloads and installs all required dependencies in order.
"Complexity of Dependencies"
Besides dependencies on packages, a project also has specific requirements on its operating environment. For example, a web application needs a web server, an application server, and a database server installed and configured, and an enterprise application may need message queues, caches, scheduled jobs, or to expose services to other systems as web services or APIs. These can be seen as the project's system-level, outward-facing dependencies. Some of them can be handled by the project itself, while others cannot, such as the runtime container and the operating system, which together form the project's operating environment.
In summary, the complexity of dependencies has two main sources. First, version compatibility between dependent packages: compatibility problems are a nightmare for software developers. Second, indirect or multiple dependencies. For example: A depends on Python 2.7, and A also depends on B, but B depends on Python 3, while Python 2.7 and Python 3 are incompatible. This is the most painful kind of dependency problem.
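The diamond-dependency conflict above can be made concrete with a tiny resolver sketch. The graph and version strings simply mirror the A/B example; a real package manager does far more, so this is only an illustration:

```python
# A minimal sketch: walk a dependency graph and detect conflicting
# version requirements on the same package (follows the A/B example).
deps = {
    "A": ["python==2.7", "B"],
    "B": ["python==3"],
}

def collect(pkg, seen=None):
    """Collect all direct and indirect version requirements of pkg."""
    if seen is None:
        seen = {}
    for req in deps.get(pkg, []):
        if "==" in req:                      # a pinned package version
            name, ver = req.split("==")
            seen.setdefault(name, set()).add(ver)
        else:                                # another project: recurse
            collect(req, seen)
    return seen

def conflicts(pkg):
    """Return packages required at more than one version."""
    return {n: v for n, v in collect(pkg).items() if len(v) > 1}

print(conflicts("A"))  # A needs python 2.7 directly and python 3 via B
```

Running it shows that installing A forces two incompatible Python versions at once, which is exactly the situation no single shared environment can satisfy.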
In a small project, both the development and operating environments are set up by the developers. As the company grows, the operating environment is built by operations staff, while having operations build every development and test environment is too much work, and having developers build them is complicated and prone to inconsistency. Suppose the company releases new versions of both project A and project B, but the environments and software are not upgraded in step; the likelihood of errors is then very high (think of indirect and multiple dependencies). Does this scene sound familiar? A new version is deployed and a problem appears; the tester or deployer says, "This version has a problem and won't run," and the developer replies, "It works fine on my machine."
If projects were simple, with no legacy projects or code dragging behind and no coupling between projects, only resource management would remain: assigning machines, initializing systems, allocating IP addresses, and so on. The running environment, database, and development environment of each project would be set up manually by its developers, and if an environment went wrong, the simple, brute-force fix would be to reinstall the system.
But while the ideal is plump, the reality is bony: legacy issues often have a great impact on current projects. Multiple languages (Java, PHP, C), multiple systems (various versions of Windows and Linux), multiple build tool versions (Java 7, Java 8), assorted configuration files, and all kinds of hacky patch scripts scattered across the system that nobody can find and nobody understands. Who changed this configuration? Why did this service go down? Nothing is managed.
Having analyzed the problems, how can we overcome them? What tools and methods will get us to silky-smooth continuous integration? Enter the protagonists: Docker, AppSoar, and AppHouse.
Docker is now in full swing in China: build once, run anywhere. Various practices and production deployments have been rolled out, and the author will not repeat Docker basics and benefits here.
AppSoar is built on Docker and aimed at enterprise applications, with security and stability as its goals: a professional, easy-to-use enterprise-grade container cloud solution. It offers a range of features such as an enterprise app store, a friendly graphical management interface, multi-environment management, hybrid cloud support, persistent container storage, container network modes, container scalability, container load balancing and scheduling, API support, and system high availability.
AppHouse provides an enterprise-grade image repository management solution for container platforms, integrating image management, image security, image high availability, and high-speed image access. AppHouse supports HTTPS access, role-based access control, system high availability, flexible deployment, one-click installation and upgrade, a friendly graphical interface, LDAP/AD user integration, and Swift/GlusterFS/public cloud storage backends, and provides a rich API. It can connect to Git source repositories, build code automatically, provide a webhook interface, and support continuous deployment. In short, it solves Docker's build, push, and pull problems as a package.
So, given the goal of smooth continuous integration, why choose Docker, AppSoar, and AppHouse for deployment?
As mentioned above, building a continuous integration platform the traditional way runs into four common problems: compile-time and run-time dependencies, dependency complexity, inconsistent environments, and chaotic deployments. Let us analyze how the Docker + AppSoar + AppHouse combination solves each of them.
First, Docker makes it simple and convenient to deploy an application in "containerized" form. Like a shipping container, it packages all dependencies together, so deploying onto another server is easy; you no longer switch servers only to find configuration files scattered everywhere. This solves the problems of compile-time and run-time dependencies.
Second, Docker's isolation makes each application run as if in a sandbox: each believes itself to be the only program in the system. In the example above, A depends on Python 2.7 while A also depends on B, which depends on Python 3; with Docker we can simply deploy one container based on Python 2.7 and another based on Python 3. Many different environments can coexist on one system this way, solving the dependency-complexity problem. Some readers may point out that virtual machines can solve this too. True, but virtualization requires hardware support and virtualization features enabled in the BIOS, as well as two full operating systems installed on the machine; virtual machines arose to solve the strong coupling between the operating system and the physical machine. Docker is far lighter: it needs only kernel support, without hardware or BIOS requirements, and can quickly deploy multiple different container environments on one system; containers arose to solve the strong coupling between the application and the operating system.
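That isolation can be sketched with two commands (the `python:2.7` and `python:3` tags are public Docker Hub images used purely for illustration; these commands assume a working Docker daemon):

```shell
# Each container sees only its own Python, so the two coexist on one host
docker run --rm python:2.7 python --version
docker run --rm python:3 python --version
```

The first container reports a 2.7.x interpreter and the second a 3.x interpreter, with no conflict between them.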
Because Docker is application-centric, an image packages the application together with the environment it needs: build once, run anywhere. This property perfectly solves the environment-inconsistency problems that traditionally follow an application migration.
At the same time, Docker does not care how the application starts internally; whatever the application does inside, docker start or docker run is the uniform standard. Applications are thus launched in a standardized way, with no need to memorize a pile of different startup commands for different applications.
Based on these features of Docker, a common continuous integration flow using Docker is as follows:
1. The developer commits code.
2. An image build is triggered.
3. The built image is uploaded to the private repository.
4. The image is downloaded to the execution machine.
5. The image is run.
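The middle of that flow corresponds directly to Docker commands; as a rough sketch (the registry address, image name, and tag are placeholders):

```shell
docker build -t registry.example.com/demo-app:1.0 .   # build the image from the Dockerfile
docker push registry.example.com/demo-app:1.0         # upload it to the private registry
# on the execution machine:
docker pull registry.example.com/demo-app:1.0         # download the image
docker run -d registry.example.com/demo-app:1.0       # run it in the background
```

In practice the CI server issues these commands automatically after each commit.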
Its basic topology is shown in Figure 1.
Figure 1 Continuous integration of basic topologies using Docker
Readers familiar with Docker know that Docker starts very fast, in about a second. Of the five steps above, steps 1 and 5 take little time; the bulk of the continuous integration time is concentrated in the middle three steps, that is, docker build, docker push, and docker pull. If these are not pushed to the extreme, smoothness cannot be achieved. Below we analyze where build, push, and pull spend their time, and how to work around it.
Docker build network optimization
Docker Hub's official images are hosted abroad and, for well-known reasons, the network becomes a major bottleneck when building in China; some companies' build machines have no Internet connection at all.
In this case, it is recommended to use a domestic mirror source, or to set up a private registry holding the base images the project needs, so that network transfers during builds stay within the country or on the intranet and the network is no longer a concern.
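One common way to do this (the mirror URL below is a placeholder for whichever domestic mirror or private registry you use) is to configure a registry mirror in /etc/docker/daemon.json and then restart the Docker daemon:

```json
{
  "registry-mirrors": ["https://mirror.example.com"]
}
```

Pulls of base images then go through the mirror first, falling back to Docker Hub only when the mirror misses.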
Using a .dockerignore file

The .dockerignore file excludes unneeded files and directories from the Docker build context, so that the build is as fast and efficient as possible and the resulting image contains no unnecessary "junk".
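As a sketch, a .dockerignore for a typical Maven project might look like this (the entries are illustrative; adjust them to the project):

```
.git
target/
*.log
docs/
README.md
```

Everything listed is left out of the build context, so it is neither sent to the daemon nor accidentally copied into the image.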
Minimizing the number of image layers

Minimizing the number of image layers can speed up container startup, but there is another factor to weigh: Dockerfile readability. You can write a convoluted Dockerfile that produces an image with the fewest possible layers, but its readability suffers. We therefore have to compromise between the number of image layers and Dockerfile readability.
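For example, chaining commands into a single RUN instruction produces one layer instead of three, at some cost to readability (the package is illustrative):

```dockerfile
# Three RUN instructions -> three layers:
#   RUN apt-get update
#   RUN apt-get install -y curl
#   RUN rm -rf /var/lib/apt/lists/*
# One chained RUN instruction -> a single layer, but harder to read:
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
```

Chaining also keeps the apt cache out of the image entirely, since intermediate files never land in a committed layer.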
Docker Registry has been upgraded to V2, which adds a number of security-related checks. The image storage format in V2 is gzip, and the image spends considerable time being compressed. The docker push process breaks down roughly as follows:
1. Buffer to disk: compress the layer's file system into a temporary file.
2. Upload the file to the registry.
3. Compute the digest of the compressed package locally, delete the temporary file, and send the digest to the registry.
4. The registry computes the digest of the uploaded package and verifies it.
5. The registry transfers the compressed package to the backend storage file system.
6. Repeat steps 1-5 until all layers have been transferred.
7. Compute the image's manifest and upload it to the registry, repeating steps 3-5.
This design makes push slow. If you use the official Docker Hub, you also need to consider the network impact mentioned in the Docker build section, and the Docker Hub public image library adds further security checks of its own.
Meanwhile, Docker and the registry impose many security precautions (such as mutual certificate authentication), mainly to prevent image forgery and unauthorized access in a public cloud environment. In a trusted environment, however, where the build and push processes are in your own hands, many of these measures are superfluous.
The speed of docker pull is critical to service startup time. Registry V2 can pull layers in parallel, which greatly improves speed, but some smaller issues still affect startup. Downloading and decompressing an image are serialized, and decompression itself is serial; since V2 layers are all gzip, on an intranet the decompression time can exceed the network transfer time even though downloads are parallel. There is also the communication with the registry: during a pull, the registry does not serve the content itself but only download URLs and authentication, which adds network round-trips, and some metadata lookups go to the backend storage, adding further latency.
From this analysis we can see that the time spent in docker build, push, and pull actually goes mostly to network transfer (the main cost) and security precautions (a slight cost); the transfer can even take longer than all the other steps combined. AppHouse lets us easily set up a local, enterprise-grade image repository, moving all network transfer onto the intranet and putting the build, push, and pull processes fully under our control, which improves efficiency and resolves the security issues at the same time: a double win.
With Docker and AppHouse in place, we are only one step away from the goal of ultimately silky-smooth continuous integration: Docker solves the dependency and environment problems, AppHouse solves the security and fast transfer of images, and what remains is the deployment and management of the containers.
Docker's innovation in the underlying technology frees developers from wrestling with the system, but what hinders enterprises from adopting Docker is the large-scale deployment and management of containers and the lack of enterprise-grade container tools and systems.
Once an image is built, it needs to be published to the test and production environments. Because Docker consumes so few resources, deploying hundreds or thousands of containers on a single server is not unusual. How to use Docker sensibly at this stage is also a challenge, and the development team needs to consider how to build a scalable distribution environment.
AppSoar provides a friendly web management interface, a rich compose file format, and a full-featured API, describing complex application structures with very simple files through its compose implementation, making deployment easier. AppSoar also offers a rich enterprise app store that enables one-click service creation. Applications can thus be assembled quickly, and developers can focus on development itself.
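To give a feel for what such a compose file describes, here is a generic sketch in standard Docker Compose syntax; this is an assumption for illustration only, not AppSoar's exact format, and the image names, ports, and password are placeholders:

```yaml
version: "2"
services:
  web:
    image: registry.example.com/demo-app:1.0   # the application image built by CI
    ports:
      - "8080:8080"
    depends_on:
      - db
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example             # placeholder credential
```

A single file like this declares the whole application topology, which the platform can then deploy in one step.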
With this last link in place, the overall continuous integration platform architecture evolves into the form shown in Figure 2.
Figure 2 Evolution of the overall continuous integration platform architecture
Through the combination of Docker + AppSoar + AppHouse, a development team facing a complex environment can tailor a solution to its own actual situation and so build a silky-smooth continuous integration system.