Since I was first exposed to microservices in 2014, I have found that in a microservice evolution, development, testing, and operations must cooperate closely to achieve the desired results.
This series of articles mainly includes three parts:
Part 1: Microservices and DevOps;
Part 2: Microservices ecosystem;
Part 3: Engineering practice of microservice architecture;
This article focuses on the third part: the engineering practice of microservice architecture.
Part 3: Engineering Practice of Microservice Architecture
Finally, the engineering practice of microservice architecture. This is Netflix's architecture diagram after moving its business from its own data centers to AWS over the seven years from 2009 to 2016. For our own systems, does this mean that if we also split the architecture into 50 or 100 services, we will obtain the same benefits?
This is the first question many organizations and teams consider when adopting microservices, and the answer is no. Netflix's chief cloud architect has said that they also did a great deal of evolution in their processes, tools, and practices.
3.1 Development Practice
Let me start with the good practices we followed in past microservice evolutions. For development, I will mention only three practices here:
1. Independent code repository. The most basic and simple requirement is to build an independent code repository for each service, so that the service is isolated and you do not open the repository only to find a lot of unrelated code.
2. Self-explanatory documentation. In the past I often found documentation written in Word, or in some other format, that involved a great deal of design detail. In practice I do not care much about a service's internal process and design; what I care more about is what this service does and which other services it collaborates with. As a new developer, how quickly can I get this service running? That means: where is the service's code repository, where is its CI, and how do I complete a deployment to the test and production environments?
3. Easy to run locally. When we split services, we found that many of the team's services could not be run independently on a developer's machine. To let developers deliver quickly, there are three commonly used approaches. The first is local environment configuration plus mocks: define your own mocks and quickly bring up only the services you care about locally. The second is docker-compose. The third is a shared environment, where some of the dependencies are deployed in the cloud.
Here is a real case from the past. We built an ETL service whose job was to synchronize the service's data to the original system; this model is very common in microservice evolution. For this kind of data transfer, the most important thing is the checkpoint, i.e. the position up to which data has already been transferred: I want to guarantee that the transfer is never repeated and that no data is lost, so we stored the checkpoint on Amazon S3.
However, when developing locally, access to S3 is slow and development efficiency drops. To solve this, we mocked S3, which can be done with any mock framework, so that when running locally I do not need to care about the actual S3 operations. I only need to care about the checkpoint itself, because S3 is a third-party service and very stable.
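As a rough sketch of that approach (all names here are illustrative, not from the original project): hide the checkpoint storage behind a small interface, let the production implementation talk to S3 via boto3, and swap in an in-memory fake when running locally.

```python
import abc

import boto3  # used only by the production implementation


class CheckpointStore(abc.ABC):
    """Where the ETL job persists its checkpoint (the last synced position)."""

    @abc.abstractmethod
    def load(self) -> str | None: ...

    @abc.abstractmethod
    def save(self, checkpoint: str) -> None: ...


class S3CheckpointStore(CheckpointStore):
    """Production store: keeps the checkpoint in a single S3 object."""

    def __init__(self, bucket: str, key: str):
        self._s3 = boto3.client("s3")
        self._bucket, self._key = bucket, key

    def load(self) -> str | None:
        try:
            obj = self._s3.get_object(Bucket=self._bucket, Key=self._key)
            return obj["Body"].read().decode("utf-8")
        except self._s3.exceptions.NoSuchKey:
            return None  # first run: no checkpoint saved yet

    def save(self, checkpoint: str) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=self._key,
                            Body=checkpoint.encode("utf-8"))


class InMemoryCheckpointStore(CheckpointStore):
    """Local mock: same contract, no network round trip to S3."""

    def __init__(self):
        self._value: str | None = None

    def load(self) -> str | None:
        return self._value

    def save(self, checkpoint: str) -> None:
        self._value = checkpoint
```

Because the ETL job depends only on CheckpointStore, a local run can be wired with InMemoryCheckpointStore and never touches S3 at all.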
3.2 Testing Practice
For testing, the test pyramid must be clear. When we think about testing, it is not just one system-level test: there are unit, integration, component, and end-to-end tests, and each kind brings us different value. The lower the layer, the more efficient the tests are; unit tests are the fastest, but they cannot assess business value from a real user's perspective.
The higher the layer, the easier the tests are to describe. For example: "the user enters a username and password and logs in successfully." But if all your testing is done from the top, the cost is very high, because every test has to open a web page, manually or automatically, and the feedback cycle is slow. You may run many tests through the page only to discover at the end that the failure came from a wrong variable in a single line of a developer's code.
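To make the cost difference concrete, here is a hypothetical bottom-of-the-pyramid unit test (the login rule and names are invented for illustration): it fails in milliseconds and points at the exact line, whereas the equivalent end-to-end check would have to drive a browser through the login page.

```python
# test_login.py -- a pytest-style unit test for invented login validation logic
import pytest


def validate_credentials(username: str, password: str) -> bool:
    """Illustrative business rule: username required, password at least 8 chars."""
    return bool(username) and len(password) >= 8


@pytest.mark.parametrize("username,password,expected", [
    ("alice", "s3cret-pw", True),   # happy path
    ("alice", "short", False),      # password too short
    ("", "s3cret-pw", False),       # missing username
])
def test_validate_credentials(username, password, expected):
    assert validate_credentials(username, password) is expected
```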
For contract-based testing, there is an established practice in the industry called consumer-driven contract testing, which takes the BDD and TDD mindset up to the architecture level. Given two services, a consumer and a provider, the value lies with the consumer, so I drive the contract from the consumer's own logic, and then take that contract to verify the provider. This simplifies things considerably.
The biggest advantage of this approach is that it replaces an online integration test with offline ones. In the past, the easiest way to do integration testing was to deploy all the services and send requests, manually or automatically, letting them flow through the whole process and checking the results; this places very high demands on service stability. With contract testing, the original integration test becomes two independent offline unit tests: the concern on the consumer side is whether the consumer can drive the contract, and the concern on the provider side is whether the contract can be verified against the provider.
The framework we use most is Pact. We chose it for its good multi-language support, and for one more important reason: we can collect all the generated contracts, and since each contract describes a request and response between services, the call graph can easily be drawn from them.
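A minimal consumer-side sketch with pact-python follows (service names, paths, and the port are illustrative assumptions, it relies on the mock service that ships with pact-python, and the exact API differs between Pact versions):

```python
# consumer_pact_test.py -- consumer-driven contract sketch using pact-python
import atexit

import requests
from pact import Consumer, Provider

# The mock provider stands in for the real one; every interaction declared
# here is recorded into a pact file that the provider later verifies offline.
pact = Consumer("OrderService").has_pact_with(Provider("UserService"), port=1234)
pact.start_service()
atexit.register(pact.stop_service)


def test_get_user():
    (pact
     .given("user 42 exists")
     .upon_receiving("a request for user 42")
     .with_request("get", "/users/42")
     .will_respond_with(200, body={"id": 42, "name": "alice"}))

    with pact:
        # The consumer code under test talks to the mock provider.
        resp = requests.get("http://localhost:1234/users/42")

    assert resp.json()["id"] == 42
```

The generated pact file is then replayed against the real UserService in its own build, which is exactly the split into two independent offline tests described above.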
3.3 Deployment Practice
The third area is deployment. Once continuous integration has built the package and stored it somewhere, the deployment pipeline only needs to know where to fetch the package from and how to deploy it to the production-like and production environments.
The core challenge is the deployment pipeline, and here is a small suggestion. There are many pipeline tools, but the core is to know very clearly, first, how packages are managed and versioned, and second, how packages are named; for example, in past projects the naming convention for each service's package was completely explicit.
Second, as far as deployment itself is concerned, we should push as far as we can, especially for colleagues doing operations: a deployment should require only three parameters in a single command: first the service name, second the version number, and third the target environment. This is the core issue to consider as deployment evolves; which tools and methods sit behind the deploy command can be negotiated within the team, and there are many open-source options. A sketch of such an entry point follows.
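Here is a sketch of such a three-parameter entry point (the naming convention and environment names are illustrative assumptions, and the fetch/deploy steps are stubbed out):

```python
#!/usr/bin/env python3
# deploy.py -- deploy entry point taking exactly three parameters:
# service name, version number, and target environment.
import argparse


def package_name(service: str, version: str) -> str:
    # Illustrative naming convention: <service>-<version>.tar.gz
    return f"{service}-{version}.tar.gz"


def deploy(service: str, version: str, env: str) -> None:
    pkg = package_name(service, version)
    # In a real pipeline these prints would be replaced by fetching from the
    # artifact repository and handing off to the team's deployment tooling.
    print(f"fetching {pkg} from the artifact repository")
    print(f"deploying {pkg} to the '{env}' environment")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Deploy one service")
    parser.add_argument("service", help="service name, e.g. order-service")
    parser.add_argument("version", help="package version, e.g. 1.4.2")
    parser.add_argument("env", choices=["test", "staging", "production"],
                        help="target environment")
    args = parser.parse_args()
    deploy(args.service, args.version, args.env)
```

Usage is then, for example, `./deploy.py order-service 1.4.2 staging`, and nothing more.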
3.4 Operations Practice
Finally, operations, which covers three points: monitoring, alerting, and log aggregation. I will mention only the core points. System monitoring cares about CPU, memory, and disk; application monitoring cares about the application itself, such as response time and health checks.
The third part is the extension: we should also consider monitoring the business scenarios of the application itself, for example using collection mechanisms to check during operation whether the business is running normally, rather than only checking availability. For alerting, there must be a clear notification channel and an alert escalation strategy:
on-call engineer, backup engineer, and service owner.
The first step is to evaluate the impact;
The second is to send an email within the group, so that an experienced engineer can help you assess it first;
The third is to mitigate: recover the service in the shortest possible time.
This is a common process for operating online services.
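Going back to the health checks mentioned under application monitoring, here is a minimal endpoint sketch using only Python's standard library (the path, port, and payload fields are assumptions):

```python
# health.py -- a minimal /health endpoint for the monitoring system to poll
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        payload = json.dumps({
            "status": "UP",  # a real check would flip this if a dependency fails
            "uptime_seconds": int(time.time() - START_TIME),
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```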
Log aggregation. With 50, 100, or 200 services, how do you get at the logs of any one of them? You need an independent log aggregation mechanism that can collect the logs across many services' call chains and support analysis and alerting on them; ELK is commonly used and easily supports log output and collection. These three points are the parts of operations and monitoring that I think are most easily overlooked.
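One common way to make logs easy to collect and correlate (a sketch; the field names and service name are illustrative) is to emit one JSON object per line, carrying a correlation id on every hop so the aggregation layer can stitch the call chain back together:

```python
# jsonlog.py -- one JSON object per log line, ready for collection into ELK
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "service": "order-service",                    # illustrative name
            "trace_id": getattr(record, "trace_id", None), # correlation id
            "message": record.getMessage(),
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("order-service")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Passing the same trace_id on every service hop lets the log aggregation
# layer reconstruct the call chain across 50 or 200 services.
logger.info("order created", extra={"trace_id": "abc-123"})
```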