Building Microservices on the Go Technology Stack


When a large system is built as microservices, it is split into many modules. Each module is responsible for a different function; together they compose the system and provide its full functionality. In this style of construction, developers typically focus on decoupling modules as much as possible, to reduce the extra development cost caused by inter-module coupling. At the same time, microservices bring new problems, such as how to deploy such a large number of services and how to operate and maintain them.

This article draws on some of our best practices in development. From the perspectives of development, monitoring, and logging, it introduces our experience building microservices on the Go technology stack.

Development

During microservice development, different modules are owned by different developers, and clearly defined interfaces help delineate each developer's tasks. In the final system, a single business request may involve multiple interface calls, and making accurate, unambiguous calls to remote interfaces is itself a challenge. To address these problems, we use gRPC for defining and invoking the protocol.

Traditional microservices are often based on the HTTP protocol for inter-module invocation; in our build, we chose Google's gRPC framework instead. The following table compares the features of HTTP RPC frameworks with gRPC:

gRPC interfaces must be defined in proto3 (Protocol Buffers version 3), and a call only succeeds when the statically compiled stubs match. This property reduces the communication cost of interface changes. With HTTP RPC, an interface change requires updating the interface document and then notifying every caller; if a caller does not update in time, the error is likely to be discovered only while the service is running. With gRPC, errors caused by interface changes are caught at compile time.
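As an illustration, a minimal proto3 interface sketch (the service and message names here are invented for the example, not taken from our system):

```protobuf
syntax = "proto3";

package demo;

// Changing GetUserReply forces every caller to regenerate and recompile
// its stubs, so interface drift surfaces at build time, not at runtime.
service UserService {
  rpc GetUser (GetUserRequest) returns (GetUserReply);
}

message GetUserRequest {
  int64 user_id = 1;
}

message GetUserReply {
  string name = 1;
}
```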

In terms of performance, gRPC improves significantly on the traditional HTTP RPC protocol (according to this benchmark, https://dev.to/plutov/benchmarking-grpc-and-rest-in-go-565, gRPC is roughly 10 times faster). gRPC uses HTTP/2 for transport; compared with HTTP/1.1, HTTP/2 multiplexes requests over a single TCP connection, reducing the overhead of establishing a TCP connection for each request. It should be pointed out that if raw performance were the only goal, the industry previously tended to build RPC protocols directly on TCP (Thrift, etc.), but a layer-4 protocol cannot easily express transmission control. In contrast, gRPC can place control fields in HTTP headers, and together with a proxy server such as Nginx it can easily implement forwarding, grayscale releases, and similar features.

Next, we describe how we use some of gRPC's features in practice to simplify the development process.

1. Use context to control the life cycle of the request

In gRPC's Go implementation, the first parameter of every RPC request is a context.

The HTTP/2 protocol carries the context in headers and passes it along the call chain, so you can set an expiration time for each request; once the timeout is reached, the initiator stops waiting and returns an error.

ctx := context.Background()                            // blank context
ctx, cancel := context.WithTimeout(ctx, 5*time.Second) // 5-second deadline
defer cancel()
grpc.CallServiceX(ctx, arg1)

In the code above, the initiator sets a deadline of 5 seconds; if the remote call does not return within 5 seconds, the initiator returns an error.

Besides carrying a timeout, the context can carry other content; below we will meet another use of the context.

2. Implementing access control with TLS

gRPC integrates TLS certificate support, which gives us a complete access control scheme. In practice, suppose our system contains a service A that manipulates users' sensitive content; we must ensure that A is not abused by other services within the system.

To avoid misuse, we designed a self-signed two-level certificate system: service A holds the self-signed root certificate and issues a secondary certificate to each service that invokes A. This way, every call to A must be authorized by A in advance, and A can identify the caller of each request, which makes logging, traffic control, and similar operations convenient.
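The two-level scheme can be sketched with the standard crypto/x509 package. This is a minimal illustration of root-signs-leaf verification (names such as service-a-root and service-b are made up); a production setup would additionally wire these certificates into gRPC's TLS credentials:

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// issueAndVerify builds service A's self-signed root, issues a secondary
// certificate to a caller, then verifies the caller against the root.
func issueAndVerify(caller string) (string, bool) {
	// Self-signed root certificate, held only by service A.
	rootKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	rootTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "service-a-root"},
		NotBefore:             time.Now(),
		NotAfter:              time.Now().Add(24 * time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	rootDER, _ := x509.CreateCertificate(rand.Reader, rootTmpl, rootTmpl, &rootKey.PublicKey, rootKey)
	rootCert, _ := x509.ParseCertificate(rootDER)

	// Secondary certificate issued by the root to one caller.
	callerKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	callerTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: caller},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	}
	callerDER, _ := x509.CreateCertificate(rand.Reader, callerTmpl, rootCert, &callerKey.PublicKey, rootKey)
	callerCert, _ := x509.ParseCertificate(callerDER)

	// Service A trusts only its own root; the caller's identity is the CN.
	roots := x509.NewCertPool()
	roots.AddCert(rootCert)
	_, err := callerCert.Verify(x509.VerifyOptions{
		Roots:     roots,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	return callerCert.Subject.CommonName, err == nil
}

func main() {
	name, ok := issueAndVerify("service-b")
	fmt.Println(name, ok) // prints: service-b true
}
```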

3. Tracking requests online using trace

gRPC has a built-in trace system for tracking requests: it can keep detailed log information for the last 10 requests and record statistics over all requests.

After we attach trace logs to a request, the trace system records the last 10 requests for us; as the example trace log shows, it can also track business data.

At the macro level, the trace system records request statistics, such as request counts and the distribution of request latencies.

Note that this system exposes an HTTP service; it can be turned on or off at runtime via a debug switch to reduce resource consumption.

Monitoring

1. Determine monitoring metrics

The first question we faced when tasked with setting up monitoring for the entire system was what to monitor. The Google SRE book gives a very detailed answer: monitor the four golden signals, namely latency, traffic, errors, and saturation.

Latency measures how long a request takes. Note that, because of the long-tail effect, average latency alone is far from sufficient as a latency indicator. Instead, we need the median along with the 90th, 95th, and 99th percentiles to understand the latency distribution; an even better approach is to record the distribution with a histogram.

Traffic measures the demand pressure on the service. Per-API traffic statistics tell us the system's hot paths and help guide optimization.

Error monitoring counts requests that return incorrect results. Each request may return a different error code, and we need to count each code separately. Combined with an alerting system, this kind of monitoring lets us notice errors and intervene as early as possible.

Saturation mainly refers to load monitoring of the system's CPU and memory. This kind of monitoring provides the basis for capacity-expansion decisions.

2. Monitoring selection

When choosing a monitoring scheme, we faced two main options: the company's in-house monitoring system, or a system built on the open-source Prometheus. The differences between the two are listed in the following table.

Given that our entire system has about 100 containers distributed across 30 VMs, Prometheus's single-node storage is not a bottleneck for us. We do not need to keep historical data intact, so the biggest advantage of the in-house system was not enough to attract us. Conversely, because we wanted to derive many indicators from the four golden signals, Prometheus's convenient DSL greatly simplified our metric design.

In the end, we chose Prometheus to build the monitoring system. The architecture of the entire monitoring system is shown in the figure.

Each service registers its own address in Consul; Prometheus automatically pulls the target addresses to be monitored from Consul, then pulls monitoring data from those services and saves it to local storage. In Prometheus's own web UI, you can quickly query statistics with PromQL statements, and you can also feed query statements to Grafana to build fixed dashboards for monitoring metrics.
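For example, assuming latency is recorded in a histogram metric (the name grpc_server_handling_seconds_bucket below is the one exported by the common go-grpc-prometheus interceptor; substitute your own metric name), a PromQL query for 99th-percentile latency over the last five minutes looks like:

```
histogram_quantile(0.99, sum(rate(grpc_server_handling_seconds_bucket[5m])) by (le))
```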

In addition, with the Alertmanager plug-in, we can write alert rules; when the system misbehaves, alerts are sent by phone or email.

Logging

1. Log format

A frequently overlooked issue is the choice of log record format. A good log format makes it easier for downstream tools to parse log content and for log storage to build indexes. We use Logrus to print logs to files; Logrus supports a space-delimited single-line text format, a JSON format, and others.

Text format

time="2015-03-26T01:27:38-04:00" level=debug msg="Started observing beach" animal=walrus number=8
time="2015-03-26T01:27:38-04:00" level=info msg="A group of walrus emerges from the ocean" animal=walrus size=10

JSON format

{"animal":"walrus","level":"info","msg":"A group of walrus emerges from the ocean","size":10,"time":"2014-03-10 19:57:38.562264131-0400 EDT"}
{"level":"warning","msg":"The group's number increased tremendously!","number":122,"omg":true,"time":"2014-03-10 19:57:38.562471297-0400 EDT"}

2. Collecting logs along the end-to-end call chain

In a microservices architecture, a business request passes through multiple services, and collecting logs along the end-to-end call chain helps us determine where an error occurred. In our system, we generate a global ID at the request entrance and pass it along the chain via the gRPC context. Logs from different services are collected into Graylog, and a query on a single ID retrieves all the logs on the chain.

Using the session ID as the ID of the entire call chain, full-chain retrieval can be performed.

Summary

Systems built as microservices also face challenges in deployment, scheduling, service discovery, consistency, and so on, and the Go technology stack has best practices in these areas as well (Docker, Kubernetes, Consul, etcd, and so on). There are excellent tutorials online, which we will not repeat here; readers who need them can look them up.
