What architects need to know about architecture optimization and design


Overview

This translation was first published on InfoQ's WeChat public account "Talk About Architecture". Realizing I had been away from my blog for nearly two years, I am reviving it with this piece. It is a translation of an article on architecture that introduces a number of tools and frameworks, to broaden the horizons of readers interested in the subject.

In recent years, with the rapid development of the Internet, new architectural practices have kept emerging, but one thing remains constant: the "way of architecture". When it comes to designing system architectures that are flexible, highly available, and able to adapt quickly to change, we still have plenty of room to maneuver. This article describes several key points about how to build a cutting-edge, maintainable, and secure architecture; you can use it as a guideline for system design, or to verify that an existing architecture is sound.

As we often say: there is no best architecture, only the most appropriate architecture. A good architect designs the optimal scheme based on the specific requirements, the available resources, and other factors. Especially now, with business changing rapidly and data everywhere, technologies and frameworks need to be continuously polished and improved to adapt to new business needs. What was the best architecture at the time will still have to evolve along with the business. That is not a bad thing; we just have to be prepared to respond to change.

It's not about the code

The term "architect" is very broad; in some companies it refers to the person responsible for writing certain modules of the software. Of course, most companies do not have such a position and instead have technical leads in charge of specific functions. The architects we discuss here do not focus on the details of the code, but on how the system works together and interacts, from a more global perspective. They attend to the parts that are easily forgotten yet can badly hurt the system, and their responsibility is to ensure that all functionality is delivered on time and with good quality. Such people play a pivotal role in the success of a software product, yet they often have to look after several projects at once within a company.

Imagine two different architects setting out to build a spaceship. The first chose to build it from paper, which looked nicer, and then placed the finished ship inside a beautiful, snugly fitting glass case to protect it.

The second architect decided to assemble the spaceship from LEGO bricks. It could be reassembled at will and was more robust, so no extra special protection was needed.

Both ships looked pretty good, but the first one took a long time to complete, and later, when improvements to the ship were needed, the problem was revealed.

The first architect nearly exploded, because every change meant removing the case and rebuilding the complete spaceship. Even though he already had all the models and the craft was thoroughly familiar to him, each modification took a long time, and a new protective case had to be built for the new ship.

But the second architect needed none of this. He simply reworked the affected components, built new ones, and added them to the original ship once everything was ready.

Later, the second architect wanted to optimize the production process further, since it was now consuming a lot of time. After a period of research, he decided to try a new material and method: 3D printing. He acquired a 3D printer and produced all the models with it, automating much of the routine work.

Of course, this is just a very simple example. But what can we learn from it? Although both architects successfully completed the initial build, both later faced changes that forced adjustments to their systems. Complexity emerges in the integration phase regardless of the initial goal, and in the end it is the flexibility, adaptability, and modularity of the design that play the critical role.

The architecture of software is critical; good code that merely completes the functionality is not enough to make an excellent solution. It is not just about the code: it is about how the modules we write interact and integrate, how the data is stored, how we develop and test, how easily we can introduce changes, and so on.

These things have little to do with writing code, but we need to take time to think about them, and in the end they are the deciding factor in a system's success.

The details to consider

There are also principles such as: modularity, loose coupling, and share-nothing architecture; reducing the dependencies between components; and watching for the cascading failures and knock-on effects caused by service dependencies.

Domain-driven design (DDD) gives us guidance on how to split components along bounded contexts and business functions.

Separate services provide specific functionality and make it easier to respond to change without affecting other services.

In most cases, if a number of services must be updated in lockstep, the system's coupling is not low enough. Of course, no principle is absolute, and there are exceptions. For example, when deploying a system onto IoT devices, you may want to deploy all components at once. That is acceptable; even so, consider the coupling and flexibility between services so that the system can still be deployed across different platforms.

Even then, coupling cannot be avoided altogether; it will always appear in some scenarios. This requires us to extract abstraction layers so that the interaction between services becomes a contract, avoiding complexity and increasing flexibility. It also requires discernment: the ability to identify functions that must be processed together and should not be pulled apart. If certain features belong together, we can separate them into one microservice, following the principle of high cohesion.
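
As a minimal sketch of such a contract, consider the Go interface below: the calling service depends only on this abstraction, never on the payment service's internals or transport. The payment domain, method name, and parameters are hypothetical, chosen only for illustration.

    package payments

    import "context"

    // PaymentService is a hypothetical contract between an ordering
    // service and a payment service. Callers depend only on this
    // interface, not on how or where the payment service runs.
    type PaymentService interface {
        // Charge debits the given amount (in cents) from an account
        // and returns a receipt ID.
        Charge(ctx context.Context, accountID string, amountCents int64) (receiptID string, err error)
    }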

What we want to remember is that the system should be designed so that it is easy to add or modify components. A stateless architecture is the cornerstone of a highly scalable system.
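
To make "stateless" concrete, here is a minimal Go sketch; the SessionStore interface and the X-Session-ID header are assumptions for illustration. The handler keeps nothing between requests, reading everything from the request itself and a shared external store, so any replica can serve any request and scaling out is just adding instances.

    package orders

    import (
        "fmt"
        "net/http"
    )

    // SessionStore stands in for an external store such as Redis or
    // etcd; the service instances themselves hold no session state.
    type SessionStore interface {
        Get(sessionID string) (userID string, err error)
    }

    func handler(store SessionStore) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            // Everything needed to serve the request comes from the
            // request and the shared store, never from memory that
            // survives between requests.
            userID, err := store.Get(r.Header.Get("X-Session-ID"))
            if err != nil {
                http.Error(w, "unknown session", http.StatusUnauthorized)
                return
            }
            fmt.Fprintf(w, "hello, %s\n", userID)
        }
    }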

Pay particular attention to how services and components interact, and understand the pros and cons of the different protocols, including their speed and availability, to decide which one suits us best.

Infrastructure, configuration, testing, development, operations

Defining policies for configuration management is important: configuration changes affect the whole system at once, so they need to be applied through an automated, global update mechanism.
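
As one plausible way to automate this, the Go sketch below watches a configuration prefix in etcd (one of the tools listed later in this article) using the go.etcd.io/etcd/client/v3 package; the endpoint and the /config/ prefix are assumptions. Every instance watches the same prefix, so a single write propagates everywhere instead of being applied by hand per environment.

    package main

    import (
        "context"
        "log"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Connect to the (assumed) etcd cluster holding the configuration.
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        // Block and react to every change under the shared prefix.
        for resp := range cli.Watch(context.Background(), "/config/", clientv3.WithPrefix()) {
            for _, ev := range resp.Events {
                log.Printf("config changed: %s = %s", ev.Kv.Key, ev.Kv.Value)
                // Reload the affected setting here.
            }
        }
    }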

Today, building a large, data-sensitive solution without automated infrastructure and robust development, testing, and deployment processes amounts to suicide. We need to spend time planning and preparing the development, testing, and production environments, and we may need additional environments to prepare for a rainy day.

Testing processes and policies are also very important. Best practices include blue-green deployment, canary deployment, A/B testing, and more. Keep the test environment as consistent with production as possible, at least in hardware structure. Be sure to run stress and load tests, and test against production as early as possible, which helps us find online problems faster and more accurately.
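
Canary routing is usually the job of a load balancer such as Nginx or HAProxy, but as a self-contained sketch of the idea, the Go reverse proxy below sends a fixed percentage of traffic to the new version; the backend URLs and the 5% figure are hypothetical.

    package main

    import (
        "log"
        "math/rand"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    // canaryProxy sends roughly `percent` of requests to the canary
    // backend and the rest to the stable one.
    func canaryProxy(stable, canary *url.URL, percent int) http.Handler {
        toStable := httputil.NewSingleHostReverseProxy(stable)
        toCanary := httputil.NewSingleHostReverseProxy(canary)
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if rand.Intn(100) < percent {
                toCanary.ServeHTTP(w, r) // small slice of live traffic
                return
            }
            toStable.ServeHTTP(w, r)
        })
    }

    func main() {
        stable, _ := url.Parse("http://orders-v1.internal")
        canary, _ := url.Parse("http://orders-v2.internal")
        log.Fatal(http.ListenAndServe(":8080", canaryProxy(stable, canary, 5)))
    }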

A scalable architecture also means that services can be deployed flexibly and independently, and that basic operations remain simple.

Leveraging the benefits of an immutable infrastructure

Immutable infrastructure means that once a system is deployed, it is never modified in place. When a service or application needs to be upgraded, you simply deploy a new version of the system and destroy the old one. Throughout this process, the system's external service remains virtually uninterrupted. (Translator's note)

Ensure that packaging and continuous integration follow a unified approach and that no changes are made to running services (for example, by disabling SSH). All updates should be applied to every corresponding system through well-defined, automated configuration and packaging operations, to avoid missed configuration: manually modifying the configuration in one environment, for instance, can easily leave the other environments out of sync.

Development teams should not be overly concerned with the infrastructure, because one day the infrastructure may change; business-related development should not be tied to it too tightly.

In-between the code

"In-between the code" can be a unified summary of the capabilities provided by some infrastructure, such as: Service discovery, request routing, network communication layer, agent, load balancing and so on. Many production errors are not caused by the business logic of the code or the problems of each individual component itself, but by some common infrastructure that coordinates the various components.

As the system changes faster and faster, pay closer attention to the components we have changed, with availability and extensibility in mind, and develop a minimum-risk plan for addressing emerging issues.

Pitfalls are everywhere

Be paranoid. Architect for failure: list all the possible failures, brainstorm and challenge them with the team, and then propose a protection plan.

    • What if the connection setup fails?
    • What if it takes longer than expected?
    • What if the request returns unclear data or an incorrect answer?
    • What if the data returned by the request is hard to process?
    • What if there is a high concurrency response?
    • What if a service goes down, or a whole unit, or the entire data center?
    • What if the database is corrupted?
    • What if the deployment fails?
    • What if some features in the production environment fail after a successful deployment?
    • Integration can go wrong in tens of thousands of ways, so how can we avoid them?

Techniques such as circuit breakers, timeouts, handshaking, and bulkheads help us protect systems at the points where they integrate.
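
Timeouts are the simplest of these protections. A minimal Go sketch, with a hypothetical internal URL and an arbitrarily chosen deadline: bounding every downstream call keeps a slow dependency from holding our resources indefinitely.

    package main

    import (
        "context"
        "log"
        "net/http"
        "time"
    )

    // callWithTimeout bounds how long we wait on a downstream service.
    func callWithTimeout(url string) (*http.Response, error) {
        ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
        defer cancel()

        req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
        if err != nil {
            return nil, err
        }
        // The request is abandoned the moment the deadline expires.
        return http.DefaultClient.Do(req)
    }

    func main() {
        if _, err := callWithTimeout("http://orders.internal/health"); err != nil {
            log.Printf("downstream call failed or timed out: %v", err)
        }
    }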

The circuit breaker pattern takes its name from the electrical fuse: if the voltage on a line is too high, the fuse blows and prevents a fire. Applied to our systems: if calls to a target service become slow or time out in large numbers, the breaker trips, and subsequent requests return immediately instead of calling the target service, quickly releasing resources. Calls resume once the target service's condition improves.
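
To make the mechanics concrete, here is a deliberately simplified circuit breaker in Go; real systems would use a library such as Hystrix (described later), and the threshold and cooldown here are illustrative.

    package breaker

    import (
        "errors"
        "sync"
        "time"
    )

    // Breaker opens after maxFails consecutive failures; while open,
    // calls fail fast until cooldown has passed, then a trial call is
    // let through to probe whether the target service has recovered.
    type Breaker struct {
        mu       sync.Mutex
        fails    int
        maxFails int
        cooldown time.Duration
        openedAt time.Time
    }

    var ErrOpen = errors.New("circuit open: failing fast")

    func (b *Breaker) Call(fn func() error) error {
        b.mu.Lock()
        if b.fails >= b.maxFails && time.Since(b.openedAt) < b.cooldown {
            b.mu.Unlock()
            return ErrOpen // fuse is blown: release resources immediately
        }
        b.mu.Unlock()

        err := fn()

        b.mu.Lock()
        defer b.mu.Unlock()
        if err != nil {
            b.fails++
            if b.fails >= b.maxFails {
                b.openedAt = time.Now() // trip (or re-trip) the breaker
            }
            return err
        }
        b.fails = 0 // target recovered: close the breaker
        return nil
    }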

The bulkhead pattern isolates resources or failure units the way a ship's bulkheads do: if one compartment floods, only that compartment is lost. For example, prefer microservices in containers such as Docker: containers are process-isolated, so the failure of a single container does not affect the others. Likewise, large parallel workloads can share the load across multiple thread pools.
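
Thread pools and containers are the usual bulkheads; the Go sketch below expresses the same idea with a bounded semaphore that caps how many concurrent calls one dependency may consume (the pool size is an arbitrary example), so a misbehaving dependency cannot drain every worker in the process.

    package main

    import (
        "errors"
        "log"
        "time"
    )

    // Bulkhead is one isolated "compartment" of capacity.
    type Bulkhead struct {
        slots chan struct{}
    }

    func NewBulkhead(size int) *Bulkhead {
        return &Bulkhead{slots: make(chan struct{}, size)}
    }

    var ErrFull = errors.New("bulkhead full: rejecting call")

    func (b *Bulkhead) Run(fn func()) error {
        select {
        case b.slots <- struct{}{}: // claim a slot in the compartment
            defer func() { <-b.slots }()
            fn()
            return nil
        default: // compartment full: shed load rather than queue forever
            return ErrFull
        }
    }

    func main() {
        bh := NewBulkhead(10) // at most 10 in-flight calls to this dependency
        if err := bh.Run(func() { time.Sleep(100 * time.Millisecond) }); err != nil {
            log.Println(err)
        }
    }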

Of course, if these protections start firing, it shows that the system has a serious problem, and we need to investigate and analyze it.

Watch the components whose code you cannot see, the dependencies, and the shared resources. Besides having the right development and testing processes, try to test with the same data as the real production environment and the same hardware and network configuration.

Track the system's responses to guard against common problems such as service unavailability; watch the system's average response time, and when it turns abnormal, find the cause and act accordingly.

Build an automated platform for logging, monitoring, and system operations. Because microservices are relatively independent, failures are easier to detect and therefore easier to monitor. Good approaches to collecting and analyzing logs include correlation IDs and a common log data format. Note that log data can grow very large, so consider retention periods and define an archiving policy for logs. There are also good tools for visualizing the data on a page, making the important flows easier to see.
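
As a sketch of the correlation-ID practice, assuming the widely used (but non-standard) X-Correlation-ID header: the Go middleware below mints an ID at the edge when one is missing and stamps it on every log line, so a single request can be traced across service boundaries.

    package main

    import (
        "crypto/rand"
        "encoding/hex"
        "log"
        "net/http"
    )

    // withCorrelationID ensures every request carries an ID that is
    // echoed back, logged, and (in a real system) forwarded on every
    // downstream call.
    func withCorrelationID(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            id := r.Header.Get("X-Correlation-ID")
            if id == "" {
                buf := make([]byte, 8)
                rand.Read(buf)
                id = hex.EncodeToString(buf) // mint one at the edge
            }
            w.Header().Set("X-Correlation-ID", id)
            log.Printf("correlation_id=%s method=%s path=%s", id, r.Method, r.URL.Path)
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        http.Handle("/", withCorrelationID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok\n"))
        })))
        log.Fatal(http.ListenAndServe(":8080", nil))
    }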

Versioning of services is also important, to ensure that service updates do not break clients. In some cases it is common to run different versions of a service at the same time, and we need to be ready for long-term backward compatibility.
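
A minimal sketch of running two API versions side by side in Go; the /v1 and /v2 paths and the response shapes are hypothetical.

    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        mux := http.NewServeMux()
        // Legacy shape, kept alive for clients that have not migrated.
        mux.HandleFunc("/v1/orders", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, `{"orders": []}`)
        })
        // New shape served in parallel; old clients are untouched.
        mux.HandleFunc("/v2/orders", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, `{"orders": [], "next_page": null}`)
        })
        log.Fatal(http.ListenAndServe(":8080", mux))
    }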

It's important to remember

In most cases we do not build from scratch but keep building on existing systems, and existing systems have their own problems with development, operations, and architectural flexibility. Many good developers, on encountering this, want to tear the whole system down and rebuild it, but we need to proceed carefully: breaking the system into components or service units along the wrong lines is just as dangerous.

Most systems start out as a monolithic application and are then gradually decomposed into microservices. Here are some basic ideas to use as a reference when doing the decomposition:

    • Understand the specific business requirements and business domains before you start splitting
    • Note the features and data shared with other parts of the business, which need to be properly modularized
    • Complete the migration and upgrade step by step, little by little, doing only what is right for the moment
    • Understand the scope and boundaries of each business domain well before you begin, because adjusting a boundary later is very expensive
    • Be clear about which team adjustments the transformation will involve

The impact of people, teams, and organizations

This topic is big enough to deserve a dedicated article. To summarize briefly: the architectural flexibility and the robust development, testing, and operations processes mentioned in this article all affect the organizational structure of the enterprise. The right organizational structure gives teams greater flexibility and makes continued innovation more likely, and in such an organization teams can work at their own pace.

Organizations should not split teams by technology, such as front-end, mobile, and back-end teams, or by programming language, but by microservice (which can also be understood as splitting by independent business unit). A single team then encompasses a variety of technologies, possibly implemented in different languages, which gives the team more freedom and autonomy.

How to practice?

Containerization and clustering tools

    • Docker
    • Docker Swarm
    • Kubernetes
    • Mesos
    • Serf
    • Nomad

Infrastructure Automation/Deployment

    • Jenkins
    • Terraform
    • Vagrant
    • Packer
    • Otto
    • Chef, Puppet, Ansible

Configuration

    • Edda
    • Archaius
    • Decider
    • ZooKeeper

Service discovery

    • Eureka
    • Prana
    • Finagle
    • ZooKeeper
    • Consul

Routing and load Balancing

    • Denominator
    • Zuul
    • Netty
    • Ribbon
    • HAProxy
    • Nginx

Monitoring, tracking, logging

    • Hystrix
    • Consul Health Checks
    • Zipkin
    • Prometheus
    • Salp
    • Elasticsearch + Logstash

Data protocol

    • Protocol buffers
    • Thrift
    • JSON/XML/other text formats

About some of the tools listed above

Since this article mentions a large number of open source components, here is a brief introduction to some of them for reference (parts of this content are from the Internet).

Docker Swarm

Swarm, announced at DockerCon in December 2014, manages a Docker cluster and exposes it to the user as a single virtual whole. Its architecture and commands are relatively simple, which lowers the barrier for Docker enthusiasts who want to manage Docker clusters.

Kubernetes

Kubernetes is Google's open source container cluster management system. It provides mechanisms for application deployment, maintenance, and scaling, and with Kubernetes you can conveniently manage containerized applications running across machines.

Apache Mesos

Apache Mesos is open source cluster management software first developed at AMPLab at the University of California, Berkeley; it supports application frameworks such as Hadoop, Elasticsearch, Spark, Storm, and Kafka.

Mesos Aurora

Aurora is another Apache open source project: a Mesos framework for long-running services and scheduled jobs. Aurora runs applications and services across a shared pool of machines and is responsible for keeping them running forever. When a machine fails, Aurora intelligently reschedules its jobs onto healthy machines.

Vagrant

Vagrant is a Ruby-based tool for creating and deploying virtualized development environments. It uses Oracle's open source VirtualBox virtualization system and can provision automated virtual environments with tools such as Chef.

Packer

Packer is an open source tool for creating identical machine images for multiple platforms from a single configuration source. Currently supported platforms include Amazon EC2, DigitalOcean, OpenStack, VirtualBox, and VMware.

Terraform

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. It is written in Go. Terraform can manage existing popular services as well as custom solutions.

Consul

Consul is an open source tool from HashiCorp for service discovery and configuration in distributed systems. Unlike other distributed service registration and discovery schemes, Consul is more "one-stop": it has a built-in service registration and discovery framework, a distributed consensus protocol implementation, health checks, key/value storage, and multi-datacenter support, with no need to rely on other tools (such as ZooKeeper). It is also easy to use. Consul is implemented in Go and is therefore naturally portable (supporting Linux, Windows, and Mac OS X); the installation package contains just a single executable, making it easy to deploy, and it works seamlessly with lightweight containers such as Docker.
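
For illustration, a service can register itself with its local Consul agent through Consul's HTTP API (PUT /v1/agent/service/register). In this Go sketch the service name, port, and health-check URL are made up; only the endpoint path and payload shape come from Consul's API.

    package main

    import (
        "bytes"
        "log"
        "net/http"
    )

    func main() {
        // Hypothetical service definition with an HTTP health check.
        payload := []byte(`{
          "Name": "orders",
          "Port": 8080,
          "Check": {"HTTP": "http://localhost:8080/health", "Interval": "10s"}
        }`)
        req, err := http.NewRequest(http.MethodPut,
            "http://localhost:8500/v1/agent/service/register",
            bytes.NewReader(payload))
        if err != nil {
            log.Fatal(err)
        }
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        log.Println("consul registration status:", resp.Status)
    }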

Eureka

Eureka is a REST-based service used primarily in the AWS cloud for locating services, in order to achieve load balancing and failover for middle-tier servers.

Ribbon

Ribbon is an open source Netflix project for cloud mid-tier services; its main function is to provide client-side software load balancing algorithms.

Zuul

Zuul is an edge service that provides dynamic routing, monitoring, resiliency, security, and more. Zuul is the front door for all requests from devices and web sites to the backend of Netflix's streaming application. Zuul can route requests to multiple Amazon Auto Scaling groups as appropriate.

Finagle

Finagle is a Netty-based, protocol-agnostic RPC framework developed by Twitter that supports Twitter's core services.

Zipkin

Zipkin is an open source project from Twitter that lets developers collect monitoring data from Twitter's various services and provides a query interface. Through its web front end, developers can easily collect and analyze data, such as the processing time of each request, and conveniently monitor bottlenecks in the system.

Hystrix

Hystrix is designed to provide greater tolerance of latency and failure by controlling the nodes that access remote systems, services, and third-party libraries. Hystrix offers thread and semaphore isolation with fallback mechanisms and circuit breaker functionality, request caching and request collapsing (i.e., automatic batching; translator's note), and monitoring and configuration features. Hystrix grew out of resilience engineering work begun by the Netflix API team in 2011, and it currently handles tens of billions of thread-isolated and hundreds of millions of semaphore-isolated calls per day at Netflix. Hystrix is an open source library under the Apache License 2.0, currently hosted on GitHub.

ZooKeeper

ZooKeeper is a distributed, open source coordination service for distributed applications. It is an open source implementation of Google's Chubby and an important component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services.

Etcd

etcd is a highly available key-value store used primarily for shared configuration and service discovery. etcd is developed and maintained by CoreOS and was inspired by ZooKeeper and Doozer. It is written in Go and handles log replication through the Raft consensus algorithm to ensure strong consistency. Raft is a consensus algorithm from Stanford designed for log replication in distributed systems; Raft achieves consistency through elections, and in Raft any node may become the leader. etcd is widely used by Google's container cluster management system Kubernetes, the open source PaaS platform Cloud Foundry, and CoreOS's fleet.
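
A small sketch of etcd in the shared-configuration/service-discovery role described above, using the official go.etcd.io/etcd/client/v3 package; the key and the address value are illustrative.

    package main

    import (
        "context"
        "log"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            log.Fatal(err)
        }
        defer cli.Close()

        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()
        // Publish this instance's endpoint under a shared key...
        if _, err := cli.Put(ctx, "/services/orders", "10.0.0.7:8080"); err != nil {
            log.Fatal(err)
        }
        // ...and read it back, as another service would during discovery.
        resp, err := cli.Get(ctx, "/services/orders")
        if err != nil {
            log.Fatal(err)
        }
        for _, kv := range resp.Kvs {
            log.Printf("%s -> %s", kv.Key, kv.Value)
        }
    }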

Protocol buffers

Protocol Buffers is a data description language developed by Google. Like XML, it can serialize structured data for data storage, communication protocols, and more. It is language- and platform-independent and extremely extensible. At this stage, C++, Java, and Python are officially supported, and third-party extension packages can be found for almost every other language.

With it, you can define the structure of your data and generate code for various languages. The data you define can be exchanged and evolved without breaking the programs you already have, and you can update the data formats while leaving existing programs unaffected.

Thrift

Thrift is a scalable software framework for developing cross-language services. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly across C++, C#, Java, Python, PHP, and Ruby. Thrift was developed by Facebook and is now open source. With Thrift you define data types and service interfaces in a simple definition file; taking that file as input, the compiler generates code that makes it easy to build RPC clients and servers that communicate seamlessly across programming languages.

Original address: http://lenadroid.github.io/posts/adjustable-flexible-architecture.html
