The best hands-on technical guide from Docker to Kubernetes

Tags: rabbitmq, etcd, opsgenie


": Https://pan.baidu.com/s/18nnAZJeQvS_wO_SGBx-vvQ"

Docker is the cornerstone of the world-changing container; together with microservices, it is leading the cloud into the 2.0 era.

Docker is used to manage containers; Kubernetes (k8s) is a container orchestration tool, and a very powerful one!

Docker is a challenging and interesting open source project that fully unleashes the power of Linux virtualization and greatly simplifies the supply of cloud computing resources. At the same time, Docker has dramatically reduced the cost of cloud computing, making application deployment, testing, and development easy, efficient, and even fun.

How does the software development landscape change in the era of Docker and Kubernetes? Is it possible to use these technologies to build an architecture once and for all? Is it possible to unify development and integration processes when everything is "packaged" into containers? What do such decisions require? What restrictions do they impose? Do they make life easier for developers, or, on the contrary, add unnecessary complexity?

It is time to work through these and other questions, in text and with original illustrations!

This article will take you on a journey from real life to development processes and architecture, and back to real life, answering the most important questions at each of these stops. We will try to identify some components and principles that should be part of the architecture, and demonstrate a few examples without diving into their implementation.

The conclusions of the article may upset or delight you; it all depends on your experience, your perception of the three chapters below, and even your mood while reading. Post a comment or ask a question below and let me know what you think!

From real life to development workflow

In most cases, every development process I have seen, or had the honor of building, served one simple purpose: to shorten the interval between the birth of an idea and its delivery to production, while maintaining a certain level of code quality.

It does not matter whether the idea is good or bad. Bad ideas come and go quickly, too: you just try them and throw them away. What is worth mentioning is that rolling back from a bad idea can fall on the shoulders of the automated facility that automates your workflow.

Continuous integration and delivery looks like a lifesaver in the world of software development. What could be simpler? If you have an idea and you have the code, then go for it! If it were not for one small problem, the process would be flawless: integration and delivery are relatively difficult to separate from company-specific technology and business processes.

However, despite the complexity of the task, life occasionally offers good ideas that can bring us (well, I am sure of it, anyway) closer to a flawless mechanism that can be useful in almost any situation. For me, the most recent steps toward such a mechanism are Docker and Kubernetes, whose level of abstraction and way of thinking convinced me that 80% of the problems can now be solved in almost the same way.

The remaining 20% of the problems obviously remain, but they are exactly what lets you focus your creative talents on interesting work instead of dealing with repetitive routine. Taking care of the "architecture framework" once lets you forget about the 80% of problems that have already been solved.

What does all this mean, and how exactly does Docker solve the problems of the development workflow? Let's look at a simple process, which is also sufficient for most work environments:

With the right approach, you can automate and integrate everything in the sequence diagram above and forget about it for months to come.
Setting up the development environment

A project should contain a docker-compose.yml file, which saves you from having to think about what needs to be done and how to run the application/service on the local machine. A simple docker-compose up command should start your application with all its dependencies, populate the database with fixtures, upload the local code inside the container, enable code watching for on-the-fly recompilation, and eventually start responding to requests on the expected port. Even when setting up a new service, you should not have to worry about how to start it, where to push changes, or which framework to use. All of this should be described in advance in standard instructions and specified by service templates for the different setups: frontend, backend, and worker.
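A minimal docker-compose.yml sketch of this idea; the service names, images, ports, and startup command below are hypothetical and would come from your service template:

```yaml
version: "3"
services:
  backend:
    build: .
    ports:
      - "8080:8080"                    # respond to requests on the desired port
    volumes:
      - ./src:/app/src                 # upload the local code inside the container
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
    command: ./run-dev.sh              # e.g. load fixtures, start a file watcher, run the app
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
```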
Automated testing

All you want to know about the "black box" (why I call the container that will be explained later in the article) is whether everything inside it is intact: yes or no, 1 or 0. You can execute a limited number of commands inside the container, and docker-compose.yml describes all of its dependencies, so you can easily automate and integrate these tests without paying too much attention to implementation details.

For example, like this:
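A hedged sketch of such a run, where the compose service name and the test entrypoint are assumptions; the only contract with the outside world is the exit code:

```sh
docker-compose build app
docker-compose run --rm app ./run-tests.sh   # exit code 0 or non-zero: "intact, yes or no"
```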

In this case, testing means not only unit testing but also functional testing, integration testing, linting (code style) and duplicate-code checks, analysis of outdated dependencies, license compliance for the packages used, and so on. The point is that all of this should be encapsulated inside your Docker image.
System Delivery

It should not matter when or where you want to install your project. The result, just like the installation process itself, should always be the same. There should also be no difference as to which part of the whole ecosystem you are installing or which git repository the code comes from. The most important property here is idempotence. The only things you should have to specify are the variables that control the installation.

Here is the algorithm that I find most effective at solving this problem:
Build images from all of your Dockerfiles (for example, like this)
Use a meta-project to deliver these images to Kubernetes through the Kube API. Starting a delivery typically requires several input parameters:
The Kube API endpoint
The "confidential" object, which varies per environment (local/test/pre-release/production)
The names of the systems to deploy and the tags of the Docker images for those systems (obtained in the previous step)

As an example of a meta-project that covers all systems and services (in other words, a project that describes how the ecosystem is organized and how updates are delivered to it), I prefer Ansible playbooks integrated with the Kube API through this module. For complex automation, however, you may prefer other options, and I will discuss mine in more detail below. Whatever you choose, you must think about a centralized/unified way of managing the architecture. Such a way lets you conveniently and uniformly manage all services/systems and neutralizes any complications introduced by the upcoming jungle of technologies and systems that perform similar functions.
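A hedged sketch of one such meta-project task using the Ansible Kubernetes module (shown here under its current name, kubernetes.core.k8s; the playbook layout, variable names, and template are assumptions, not the author's actual project):

```yaml
- name: Deliver one system to the cluster through the Kube API
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Apply the rendered manifest for the service
      kubernetes.core.k8s:
        state: present
        kubeconfig: "{{ kube_config_path }}"                    # points at the Kube API endpoint
        definition: "{{ lookup('template', 'service-deployment.yml.j2') }}"
      vars:
        image_tag: "{{ delivered_image_tag }}"                  # Docker image tag from the build step
        env_secrets: "{{ environment_secrets }}"                # the per-environment "confidential" object
```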

In general, the following installation environments are required:
"Test": for manual checks or debugging of the system
"Pre-release": for near-production conditions and integration with external systems (it usually lives in the DMZ, unlike the test environment)
"Production": the actual environment for end users

Continuous integration and delivery

If you have a unified way of testing Docker images, the "black boxes", you can assume that the test results allow you to seamlessly (and with a clear conscience) integrate feature branches into the upstream or master branch of your git repository.

Perhaps the only deal breaker here is the sequence of integration and delivery. If there is no fixed sequence of releases, how do you avoid a "race condition" on one system between a set of parallel feature branches?

Therefore, the process should only be started when there is no other delivery in flight, otherwise the race condition will keep haunting you:
Try to update the feature branch to upstream (git rebase/merge)
Build images from the Dockerfiles
Test all the built images
Start the delivery and wait until the system ships the images built in step 2
If the previous step fails, roll the ecosystem back to its previous state
Merge the feature branch into upstream and push it to the repository

A failure at any step should terminate the delivery process and return the task to the developer to fix the problem, whether it is a failed test or a merge conflict.
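A rough sketch of these six steps as a single, non-parallel shell job; the registry name, the BUILD_TAG variable, the test entrypoint, and the playbook names are hypothetical:

```sh
#!/bin/sh
set -e                                                # any failing step aborts the delivery

git fetch origin
git rebase origin/master                              # 1. update the feature branch to upstream
docker build -t registry.local/app:"$BUILD_TAG" .     # 2. build images from the Dockerfiles
docker run --rm registry.local/app:"$BUILD_TAG" ./run-tests.sh   # 3. test the built images

# 4-5. deliver, and roll the ecosystem back if delivery fails
ansible-playbook deliver.yml -e image_tag="$BUILD_TAG" || {
  ansible-playbook rollback.yml
  exit 1
}

git push origin HEAD:master                           # 6. merge the feature branch upstream
```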

You can use this procedure to operate on multiple repositories. Instead of running the whole process for each repository separately (steps 1-6 for repository A, steps 1-6 for repository B, and so on), you can run each step for all repositories at once (step 1 for repositories A and B, step 2 for repositories A and B, and so on).

In addition, Kubernetes allows you to roll out updates in stages for all kinds of A/B tests and risk analysis. Internally, Kubernetes achieves this by separating services (access points) from the applications behind them. You can always balance old and new versions of a component in whatever proportion you need, which makes it easier to analyze problems and leaves a path open for rollbacks.
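A minimal sketch of that separation: one Service balances traffic across two Deployments (old and new version); the names and labels below are hypothetical, and scaling the Deployments independently gives the desired traffic proportion.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-api
spec:
  selector:
    app: my-api        # no "version" label, so pods of both my-api-v1 and my-api-v2 receive traffic
  ports:
    - port: 80
      targetPort: 8080
# The two Deployments carry labels app: my-api plus version: v1 / version: v2 and
# can be scaled independently, for example:
#   kubectl scale deployment my-api-v2 --replicas=1
#   kubectl scale deployment my-api-v1 --replicas=9
```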
System rollback
One of the mandatory requirements for the architecture framework is the ability to roll back any deployment. This, in turn, involves a number of explicit and implicit nuances. Here are some of the most important ones:
The service should be able to set its environment and roll back changes. For example, database migrations, RabbitMQ schemas, and so on.
If the environment cannot be rolled back, the service should be polymorphic and support older and newer versions of the code. For example, a database migration should not disrupt an older version of the service (typically 2 or 3 previous versions)
Backwards compatibility with any service updates. Typically, this is API compatibility, message format, and so on.

Rolling back state in a Kubernetes cluster is fairly straightforward (run kubectl rollout undo deployment/some-deployment and Kubernetes restores the previous "snapshot"), but for this feature to work your meta-project must contain information about that snapshot. More complex delivery rollback algorithms are daunting, although they are sometimes necessary.
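For example (the deployment name is a placeholder):

```sh
# Roll back to the previous revision
kubectl rollout undo deployment/some-deployment

# Inspect the recorded revisions and roll back to a specific one
kubectl rollout history deployment/some-deployment
kubectl rollout undo deployment/some-deployment --to-revision=2
```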

Here's what you can do to trigger the rollback mechanism:
High percentage of application errors after publishing
Signals from key monitoring points
Failed smoke test
Manual mode-Human factor

Ensuring information security and auditing
No workflow can magically "build in" bulletproof security and protect your ecosystem from external and internal threats, so you need to make sure that your architecture framework follows company standards and security policies at every level and in every subsystem.

I'll discuss all three levels of the solution in a later section on monitoring and alerting, which are themselves key to system integrity.

Kubernetes has a good set of built-in mechanisms for access control, network policies, event auditing, and other powerful tools related to information security, which can be used to build a good perimeter that protects against intrusions and data breaches.
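As one illustration of those built-in mechanisms, here is a minimal NetworkPolicy sketch (the namespace, labels, and port are hypothetical) that lets only the API gateway pods reach the backend pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-gateway-only
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend            # the pods being protected
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api-gateway   # the only allowed client
      ports:
        - port: 8080
```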
From the development process to the architecture
The idea of tightly integrating the development process with the ecosystem should be considered carefully. Adding this requirement to the traditional set of requirements for an architecture (elasticity, scalability, availability, reliability, protection against threats, and so on) greatly increases the value of the architecture framework. It is such a critical aspect that it gave rise to the concept called "DevOps", which is a logical step toward full automation and optimization of the infrastructure. However, with a well-designed architecture and reliable subsystems, DevOps tasks can be minimized.
Micro-Service Architecture
There is no need to discuss the benefits of service-oriented architecture (SOA) in detail, including why services should be "micro". I will only say that if you have decided to use Docker and Kubernetes, then you most likely understand (and accept) that a monolithic application architecture is difficult and even arguably wrong. Designed to run a single process and to persist it, Docker forces us to think within the DDD (Domain-Driven Design) framework. In Docker, packaged code is treated as a black box with some exposed ports.
Key components and solutions for ecosystems
Based on my experience designing systems with high availability and reliability requirements, there are several components that are critical to operating microservices. I will list and discuss them below; I refer to them in a Kubernetes environment, but my list can also serve as a checklist for any other platform.

If you (like me) come to the conclusion that these components should be managed as regular Kubernetes services, then I recommend running them in a separate cluster, apart from the production one, such as the "pre-release" cluster. It can save you time when the production environment is unstable and you desperately need the source of its images, code, or monitoring tools. This, so to speak, solves the chicken-and-egg problem.
Identity verification

As always, everything starts with access: servers, virtual machines, applications, office mail, and so on. If you are, or want to become, a client of one of the big enterprise platforms (IBM, Google, Microsoft), the access issue will be handled by one of the vendors' services. But what if you want your own solution, managed only by you and within your budget?

This list helps you decide on an appropriate solution and estimate the effort needed to set it up and maintain it. Of course, your choice must comply with the company's security policy and be approved by the information security department.
Automated server provisioning

Although Kubernetes requires only a handful of components on each physical machine or cloud VM (docker, kubelet, kube-proxy, and an etcd cluster), adding new machines and managing the cluster still needs to be automated. Here are a few simple ways to do it:
KOPS: this tool lets you install a cluster on one of two cloud providers (AWS or GCE)
Terraform: lets you manage the infrastructure of any environment, following the idea of IaC (Infrastructure as Code)
Ansible: a universal automation tool for any purpose

Personally, I prefer the third option (with a Kubernetes integration module) because it lets me work with both the servers and the Kubernetes objects and perform any kind of automation. However, nothing prevents you from using Terraform and its Kubernetes module. KOPS does not work well with "bare metal", but it is still a great tool to use with AWS/GCE!
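For example, a hedged sketch of bootstrapping a cluster with KOPS on AWS; the state store bucket, cluster name, and zone are placeholders:

```sh
export KOPS_STATE_STORE=s3://my-kops-state-store
kops create cluster --name=k8s.example.internal --zones=eu-west-1a --node-count=3
kops update cluster k8s.example.internal --yes    # actually create the cloud resources
```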
Logging system

For any Docker container, the only correct way to make its logs accessible is to write them to the stdout or stderr of the root process running in the container. The service developer should not care what happens to the log data next; the main thing is that the logs should be available when needed, preferably with records from some point in the past. All responsibility for meeting these expectations lies with Kubernetes and the engineers who support the ecosystem.
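In practice this means the logs are retrieved from the platform rather than from files inside the container, for example (container and pod names are placeholders):

```sh
docker logs <container-id>           # locally, straight from the Docker daemon
kubectl logs -f some-pod             # in the cluster, following the current container
kubectl logs some-pod --previous     # records from the previous (crashed) container instance
```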

In the official documentation you can find a description of the basic (and good) strategies for handling logs, which will help you choose a service for aggregating and storing large amounts of text data.

Among the recommended logging solutions, the same documentation mentions fluentd for collecting data (run as an agent on each node of the cluster) and Elasticsearch for storing and indexing it. Even if you question the efficiency of this solution, I think it is at least a good starting point, given its reliability and ease of use.

Elasticsearch is a resource-intensive solution, but it scales well and has ready-made Docker images that can run both on a single node and in a cluster of the required size.
Tracing system

Even perfect code fails, and then you want to study the failures very carefully in the production environment and try to understand "what went wrong in production when everything worked fine on my local machine?". Slow database queries, broken caches, slow disks or connections to external resources, transactions across the ecosystem, bottlenecks, and under-scaled compute services are some of the reasons why you will have to track and estimate code execution time under real load.

OpenTracing and Zipkin are enough for this task in most modern programming languages and do not add an extra burden once the code is instrumented. Of course, all the collected data should be stored in an appropriate place and consumed as a single component.

The complexity of instrumenting the code and forwarding "Trace IDs" through all services, message queues, databases, and so on is addressed by the development standards and service templates described above. The latter also take care of consistency in the methodology.
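For illustration, a request carrying Zipkin-style B3 trace headers that each service would be expected to forward downstream; the URL and the ID values are placeholders:

```sh
curl https://api.internal.example.com/orders/42 \
  -H "X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7" \
  -H "X-B3-SpanId: e457b5a2e4d86bd1" \
  -H "X-B3-Sampled: 1"
```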
Monitoring and alerting

Prometheus has become the de facto standard in modern monitoring systems, and, more importantly, it has out-of-the-box support in Kubernetes. You can refer to the official Kubernetes documentation for more details on monitoring and alerting.

Monitoring is one of the few auxiliary systems that must be installed inside the cluster, because the cluster is the entity being monitored. But monitoring of the monitoring system (sorry for the tautology) can only be done from outside (for example, from the same "pre-release" environment). In this case, cross-checking is a convenient solution for any distributed environment, and it does not complicate a highly unified ecosystem architecture.

The whole monitoring scope can be divided into three completely logically isolated levels. Here are what I consider the most important tracking points at each level:
Physical level: network resources and their availability; disks (I/O, free space); basic resources of individual nodes (CPU, RAM, LA)
Cluster level: availability of the main cluster systems on each node (kubelet, kube-apiserver, DNS, etcd, and so on); the amount of free resources and their even distribution; monitoring of permitted resources versus the resources actually consumed by services; pod reloading
Service level: any kind of application monitoring, from database contents to API call frequency; number of HTTP errors on the API gateway; queue sizes and worker utilization; multiple metrics for the databases (replication lag, transaction time and count, slow queries, and so on); error analysis for non-HTTP processes; monitoring of requests sent to the log system (any request can be converted into a metric)
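As a sketch of one service-level tracking point, here is a Prometheus alerting rule for the HTTP error rate on the API gateway; the metric, job, and label names are hypothetical:

```yaml
groups:
  - name: service-level
    rules:
      - alert: HighHttpErrorRate
        expr: >
          sum(rate(http_requests_total{job="api-gateway",status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total{job="api-gateway"}[5m])) > 0.05
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of requests on the API gateway are failing"
```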

As for alert notifications at each level, I would recommend using one of the countless external services that can send notifications by email, SMS, or phone call. I will also mention another system, OpsGenie, which integrates tightly with Prometheus's Alertmanager.

OpsGenie is a flexible alerting tool that helps with escalations, 24/7 on-call duty, notification channel selection, and more. It also makes it easy to distribute alerts among teams. For example, different monitoring levels should notify different teams/departments: physical: Infra + DevOps; cluster: DevOps; application: each relevant team.
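A minimal Alertmanager routing sketch along those lines; the label names, the team split, and the API keys are placeholders, and each receiver maps to its own OpsGenie integration:

```yaml
route:
  receiver: devops-opsgenie            # default: physical and cluster alerts
  routes:
    - match:
        level: application
      receiver: app-team-opsgenie      # application alerts go to the owning team
receivers:
  - name: devops-opsgenie
    opsgenie_configs:
      - api_key: <devops-integration-api-key>
  - name: app-team-opsgenie
    opsgenie_configs:
      - api_key: <app-team-integration-api-key>
```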
API Gateway and Single sign-on

To handle tasks such as authorization, authentication, user registration (external users who are the company's customers), and other kinds of access control, you need a highly reliable service that can integrate flexibly with the API gateway. There is no harm in using the same solution as for the identity service, but you may want to separate the two to achieve different levels of availability and reliability.

The integration of internal services should not be complicated, and your services should not have to worry about authorization and authentication of users or of each other. Instead, the architecture and the ecosystem should have a proxy service that handles all access control and HTTP traffic.

Let's consider the form of integration with the API gateway, and hence with the entire ecosystem, that I find best: tokens. This approach works for all three access scenarios: from the UI, from service to service, and from external systems. The task of obtaining a token (based on a login and password) is then performed by the UI itself or by the service developers. It also makes sense to distinguish the lifetime of tokens used in the UI (shorter TTL) from those used in other cases (longer, custom TTLs).

Here are some of the problems that the API gateway solves:
Access to the ecosystem's services from outside and inside (services do not communicate with each other directly)
Integration with the single sign-on service: token translation and additional requests whose headers carry user identification data (ID, roles, and other details) for the requested service; enabling or disabling access control to the requested service based on the roles received from the single sign-on service
A single point of monitoring for HTTP traffic
Combined API documentation from the different services (for example, an aggregated Swagger JSON/YML file)
The ability to manage routing across the entire ecosystem based on domains and requested URIs
A single entry point for external traffic, and integration with the access provider
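For the routing part specifically, here is a sketch expressed as a Kubernetes Ingress; the hosts, paths, and service names are hypothetical, and a real gateway would add the SSO and monitoring duties listed above:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecosystem-gateway
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /billing            # route by requested URI
            pathType: Prefix
            backend:
              service:
                name: billing
                port:
                  number: 80
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: sso
                port:
                  number: 80
```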

Event bus and enterprise integration/Service Bus

If your ecosystem contains hundreds of services that work within a single macro domain, you will have to deal with thousands of possible ways for them to communicate. To streamline the data flows, you need the ability to distribute information to a large number of recipients when certain events occur, regardless of the context of those events. In other words, you need an event bus to publish events and subscribe to them over standard protocols.

As the event bus, you can use any system that can operate as a so-called broker: RabbitMQ, Kafka, ActiveMQ, and so on. In general, high availability and data consistency are critical for microservices, but because of the CAP theorem you will still have to sacrifice something to get the bus properly distributed and clustered.
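As a sketch of such publish/subscribe wiring on RabbitMQ (the exchange, queue, and routing keys are hypothetical), using the stock rabbitmqadmin tool:

```sh
rabbitmqadmin declare exchange name=ecosystem.events type=topic durable=true
rabbitmqadmin declare queue name=billing.events durable=true
rabbitmqadmin declare binding source=ecosystem.events destination=billing.events routing_key="user.registered.#"

# any service can now publish without knowing who consumes the event
rabbitmqadmin publish exchange=ecosystem.events routing_key=user.registered.eu payload='{"user_id": 42}'
```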

Naturally, the event bus should be able to solve all kinds of inter-service communication problems, but as the number of services grows from hundreds to thousands or even tens of thousands, even the best event-bus-based architecture reaches its limits, and you will need to look for another solution. A good example is the integration bus approach, which extends the capabilities of the "dumb pipe, smart consumer" strategy described above.

There are dozens of reasons to use the enterprise integration/service bus approach, which is meant to reduce the complexity of a service-oriented architecture. Here are a few of them:
Aggregation of multiple messages
Splitting one event into several events
Synchronous/transactional analysis of the system's reaction to an event
Interface coordination, which is especially important for integration with external systems
Advanced logic for event routing
Multiple integrations with the same services (from the outside and the inside)
And, as a trade-off, a centralization of the data bus that does not scale easily

As open source software for an enterprise integration bus, you may want to consider Apache ServiceMix, which includes several components that are critical for designing and developing this kind of SOA.
Databases and other stateful services

Like Kubernetes, Docker has changed the rules of the game again and again for services that require data persistence and work closely with disk. Some say that such services should "live" the old way, on physical servers or virtual machines. I respect this opinion and will not go into its pros and cons, but I am fairly certain that it persists only because of a temporary lack of knowledge, solutions, and experience in managing stateful services in a Docker environment.

I should also mention that databases often occupy the central place in the storage world, so the solution you choose should be fully prepared to work in a Kubernetes environment.

Based on my experience and on the state of the market, I distinguish the following groups of stateful services, along with examples of the most suitable Docker-oriented solutions for each:
Database management systems: PostDock is a simple and reliable solution for PostgreSQL in any Docker environment
Queues/message brokers: RabbitMQ is the classic software for building message queuing systems and routing messages. The cluster_formation parameters in the RabbitMQ configuration are essential for a cluster setup (see the sketch after this list)
Cache services: Redis is considered one of the most reliable and flexible data caching solutions
Full-text search: the Elasticsearch stack I already mentioned above, originally used for full-text search but equally good at storing logs and any kind of work with large volumes of text data
File storage: a generalized group of services for any type of file storage and delivery (FTP, SFTP, and so on)
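A sketch of the cluster_formation settings mentioned above, as they might appear in rabbitmq.conf for peer discovery inside Kubernetes; treat the values as typical examples rather than a drop-in configuration:

```ini
cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.address_type = hostname
cluster_formation.node_cleanup.interval = 30
cluster_formation.node_cleanup.only_log_warning = true
cluster_partition_handling = autoheal
```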

Dependency mirroring

If you have never run into a situation where a package or dependency you rely on is deleted or becomes temporarily unavailable, do not assume it will never happen. To avoid unwanted downtime and to provide security for your internal systems, make sure that building and delivering your services does not require an Internet connection. Set up mirroring and replicate all dependencies onto the internal network: Docker images, RPM packages, source repositories, Python/Go/JS/PHP modules.

These and any other kinds of dependencies have their own solutions. The most common ones can be found by searching for "private dependency mirror for ...".
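For Docker images, for example, every host's daemon can be pointed at an internal mirror through /etc/docker/daemon.json (the mirror URL below is a placeholder):

```json
{
  "registry-mirrors": ["https://docker-mirror.internal.example.com"]
}
```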
From architecture to real life

Whether you like it or not, your entire architecture is doomed to become obsolete sooner or later. It always happens: technologies become obsolete quickly (in 1-5 years), methods and approaches a little more slowly (in 5-10 years), and design principles and fundamentals only occasionally (in 10-20 years), but inevitably.

Considering the obsolescence of technology, you need to always try to keep your ecosystem at the peak of technological innovation, plan and launch new services to meet the needs of developers, business and end users, and promote new utilities and knowledge to your stakeholders to drive your team and your company forward.

Stay on top of the ecosystem by taking part in the professional community, reading relevant literature, and talking with colleagues. Be aware of new opportunities in your projects and use new trends wisely. Experiment and apply scientific methods to analyze the results of your research, or rely on the conclusions of other people you trust and respect.

Unless you are an expert in the field, it is difficult to prepare for fundamental changes. All of us will witness only a few major technological changes during our careers, and it is not the amount of knowledge in our heads that makes us professionals and carries us to the top, but the openness of our thinking and the ability to accept metamorphosis.

Back to the question in the title: "Is it possible to build a better architecture once and for all?" The answer is obvious: no, not "once and for all", but if you actively strive for it, then after a certain "fairly short" time in the future you will certainly succeed!

