After seven years of development, Gilt.com has grown from a startup running Ruby on Rails into a mainstream e-commerce platform built on a Scala microservices architecture. Gilt's limited-time flash-sale business model depends on huge influxes of customers visiting in short windows to buy limited quantities of luxury goods. The microservices architecture provides a combination of scalability, performance, and reliability for our services, as well as autonomy and flexibility for our development teams. Teams are free to select programming languages, frameworks, databases, and build systems to create services for core site functions, mobile applications, personalization algorithms, real-time data feeds, and notifications.
With the explosion of software services came the problem of how to deploy and run the code we created. In the early days, Gilt's software ran on "bare metal", deployed first in one and later in two traditional data centers. Teams had to request new hardware, which was then provisioned through a mix of custom scripts and Capistrano. This worked, but it caused plenty of difficulties. Our first attempt at virtualization showed poor performance at daily traffic peaks: a large number of services ended up entangled on the same physical hosts, and the lack of resource isolation led to performance anomalies and occasional downtime. At peak times, a heavily loaded service consuming multiple threads per request could bring the entire web site down.
We tried a variety of technologies, and built tools for testing and continuous deployment in the production environment, to try to mitigate this problem. But we found that the way we allocated and automated the provisioning of services onto hardware in our data centers remained very time-consuming and difficult.
So we started thinking about moving our large-scale microservices infrastructure to the cloud, primarily the Amazon Web Services (AWS) environment, and about refining the concept of "immutable deployment", inspired by our experience with immutable values in functional programming. We replaced our own data-center-based tool, "ion-cannon", with a set of open-source tools, "Ion-roller", driven by the following goals:
- Allow teams to declaratively specify how their services are deployed to AWS, such as the minimum and maximum number of nodes, the size of each instance, and so on.
- Provide a pipeline that deploys services and applications as immutable Docker images into the production environment, supports staged releases, and transfers traffic from the old version of a service to the new version in phases.
- Support fast rollback after a release by keeping a set of instances running the previous version in a "hot" state.
In this article, we discuss some of the core concepts in Ion-roller, along with the related techniques and implementation approaches, and briefly describe how the tool is used.
Ion-roller and a cloud-based microservices world
We begin by imagining the microservices world as a set of HTTP endpoints built on separate, immutable environments and managed through a REST API.
HTTP endpoints
Gilt uses HTTP as the transport for inter-system communication. To its users, an HTTP endpoint appears as a combination of hostname and port (with a discovery layer above it). The idea is that as an organization's software and configuration evolve over time, the endpoint serves as an organization-level concept.
Although this concept is simple, it lets us achieve many practical goals. With a reasonable application of proxies and/or network configuration, we are able to provide the following features:
- Gradually release the new software version
- Safely roll back to previous versions of software or configuration
- Detection of abnormal conditions through error rate and latency monitoring
- The ability to understand which software provides an endpoint
- The ability to see how software or configuration has changed over time
Configuration is better than code when describing the environment
At very small scale, demanding unrestricted flexibility in the deployment process seems a legitimate claim. But as the scale increases, this approach quickly becomes unmanageable and hard to understand.
Declarative configuration is a useful tool for describing and managing deployments, and it can be maintained in a structured way. Configuration can accomplish fewer tasks than code, but it is also possible to analyze it and report on what is being requested.
Risk Management in software deployment
Software releases involve trade-offs: flexibility versus simplicity, and speed versus safety.
Most production environments are moving toward deployment processes that let users see changes within a shorter period after feature development or bug fixes are complete. This means making changes to production more often. The strategy also has a drawback, because every release carries inherent risk. To counteract this, teams need release processes that mitigate the risk in each release, making releasing a routine, safe operation.
Risk is expressed in several ways: the number and severity of defects, the number and proportion of affected users, and the length of time it takes to recover from a failure.
The release process has less influence on the number and severity of defects than the development process does, but frequent releases often make it faster to find the cause of a problem (because each release contains fewer changes).
What we can do is reduce the number of affected users and the time to repair. Methods such as testing in production, partial releases, and canary deployments apply here. If you can test a new release without sending it any production traffic, you can minimize the impact (even to zero), but the number of problems you can find this way is small. Because many problems occur only under regular traffic, a canary machine (a single instance running the new software) is a valuable tool for understanding whether a new release is fit for use; the impact on users remains very small, depending on how you shard the service. An incremental release process minimizes the impact of bugs by minimizing the proportion of users affected.
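The user-sharding idea behind a canary can be sketched in a few lines. This is an illustrative sketch, not Ion-roller's code; the hashing scheme and percentage-based bucketing are assumptions:

```python
import hashlib

def sees_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically bucket a user into 0..99 by hashing their id.
    The same user always gets the same version while the canary
    percentage stays fixed, so no one flip-flops between versions."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < canary_percent

# With 5% canary traffic, roughly 1 in 20 users hits the new version.
exposed = sum(sees_canary(f"user-{i}", 5) for i in range(10_000))
```

Because the bucketing is deterministic, the affected population stays small and stable while the canary is evaluated, and widening the rollout is just a matter of raising the percentage.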
The time to recover from a failure is where the immutable software approach delivers a major benefit, because the old version of the software remains "usable". Once an anomaly is found, you can return to the previous version in the shortest possible time and with minimal risk. We discuss immutable deployment further below.
Desired state and actual state of running software
We want to continuously monitor the difference between the desired running state of the system behind an endpoint and its actual running state. If a difference exists, we should (gradually) move the actual state closer to the desired state. We continuously run extensive checks on the running environment (including the running software, load-balancer settings, DNS, and so on), and replace anything that fails to meet requirements.
For example, suppose we know that traffic should be handled by a particular version of the software, but none of the servers running that version are available (for any reason, including someone accidentally terminating them); then a new deployment should be triggered.
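The reconciliation idea can be sketched as a pure function from desired and actual state to corrective actions. This is an illustrative sketch; the state fields and action strings are hypothetical, not Ion-roller's API:

```python
from dataclasses import dataclass

@dataclass
class EndpointState:
    version: str
    healthy_instances: int

def reconcile(desired: EndpointState, actual: EndpointState) -> list[str]:
    """Compare desired and actual state and return the corrective
    actions needed to (gradually) converge them."""
    actions = []
    if actual.version != desired.version:
        actions.append(f"deploy {desired.version}")
    if actual.healthy_instances < desired.healthy_instances:
        missing = desired.healthy_instances - actual.healthy_instances
        actions.append(f"launch {missing} instance(s)")
    return actions

# All servers for the desired version disappeared: a deployment is requested.
print(reconcile(EndpointState("1.0.3", 4), EndpointState("1.0.3", 0)))
```

Running such a comparison in a loop is what turns a one-shot deployment tool into a system that continuously keeps the environment honest.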
Immutable deployment
The idea of "immutable deployment" is to deploy software through a well-defined, easy-to-understand configuration that does not change over time. Software and configuration are natural components of a release: you do not change running software by modifying it in place and restarting it; instead, you build a fresh copy that includes the updated details.
This idea produces a more predictable environment, and allows rolling back to an older version of the software, without a restart, when a new release proves problematic. (Even if you can find the old version of the software, in some cases fully starting it is time-consuming, because it may require work such as warming caches.)
In a small-scale environment there may be only a few running servers, so problems in a new release may be discovered only after the rollout has completely finished. In a traditional, non-immutable release, all the older versions of the software have already been replaced by that point.
The implementation of immutable deployment in our current service still has one deficiency. Because all environments are built on demand, build time depends on factors such as the time to start a new EC2 instance and the time to download the appropriate Docker image. In contrast, some other deployment tools keep a pool of machines on hand, so no machine needs to be built before the software is installed. We chose to trade startup latency for rollback latency: rollbacks stay fast, at the cost of a higher initial delay when new software is released.
Docker image
In a microservices environment, abstracting the choice of tools and programming languages away from the deployment environment is a very practical concept: it gives individual teams more autonomy when developing software. While there are many options, from OS-specific package-management systems such as RPM to cloud-oriented tools such as Packer, we ultimately decided to use Docker. Docker provides a high level of flexibility in choosing a deployment environment and strategy, and fits our needs very well.
Reinvent the wheel?
When we looked for easy-to-manage infrastructure based on immutable web servers, we found the choices very limited. Most existing software is optimized for in-place software updates, which is not the kind of update we were interested in. We also wanted to manage the entire lifecycle of web-server traffic, but we found no option optimized for that task.
Another consideration was whether to ask developers to learn and understand an existing deployment toolset, or to deliver greater productivity through a friction-free, streamlined tool for deploying a particular version of software into production. We believe developers should be able to simply release software without having to understand complex underlying mechanisms (while preserving the possibility of customization, which some advanced scenarios require).
Why not Puppet, Chef, and other such tools
Many tools take a machine-centric perspective, with configuration done on a per-machine basis. We take a broader view of system state (for example, we need four copies of a piece of software providing service) and care less about individual machines. We take advantage of the high-level capabilities AWS provides, such as replacing machines that fail runtime health checks and dynamically scaling capacity when an HTTP endpoint's traffic soars.
Why not CodeDeploy
CodeDeploy is Amazon's software-release management system, but it does not meet our need for an immutable infrastructure. Although it is easy to script releases with CodeDeploy, you must build your environment beforehand. In addition, CodeDeploy has no built-in capability for deploying Docker images.
Why not plain Elastic Beanstalk
Elastic Beanstalk provides a variety of features for creating environments that run Docker images, along with a large amount of supporting infrastructure such as EC2 instances, load balancers, and auto-scaling groups (which vary the number of servers based on traffic). It also gives access to log files and provides a degree of management over these systems.
However, its support for the concept of "immutable deployment" is very limited when multiple releases of a piece of software must share a single user-visible endpoint and traffic must be transferred between them gradually. The only capability it supports for this is "swap CNAMEs", which is a very coarse way to transfer traffic, because all traffic moves to the new environment at once. In addition, because of the nature of DNS, it is also problematic in terms of reliability: DNS lookup results remain cached for a period after a DNS change, and some badly behaved clients ignore DNS TTL values, causing traffic to be sent to the old environment for a long time.
In addition, Elastic Beanstalk does not provide a high-level structure to help you understand what is running in production behind an endpoint. "What is running, and how is it configured?" is not an easy question to answer without some higher-level system to assist.
We decided to use Elastic Beanstalk as a practical way to deploy Docker-based software, but to layer management and control on top of it to provide a complete workflow for the user.
Introducing Ion-roller
Ion-roller is a set of services (including an API, a web app, and a command-line interface) that leverages Amazon's Elastic Beanstalk and the underlying CloudFormation framework to deploy Docker images to EC2 instances.
To get started with Ion-roller, you need to do a few simple things. The following steps start the deployment process:
- Prepare an AWS account and authorize Ion-roller to launch instances and access resources in it. (If your organization runs software across multiple AWS accounts, Ion-roller can provide a single deployment view across all of them, as long as it has sufficient permissions.)
- Provide a Docker image through a Docker registry (either hub.docker.com or a private Docker registry service).
- Prepare the deployment specification for the software, including the HTTP endpoint, the number and type of EC2 instances, runtime parameters for the Docker image, environment variables, security settings, and so on. These are provided as part of your service configuration and submitted to Ion-roller through its REST API.
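As a rough illustration only (the field names below are hypothetical, not Ion-roller's actual configuration schema), such a deployment specification might look like:

```json
{
  "service": "user-service",
  "endpoint": "user-service.example.com",
  "docker_image": "gilt/user-service",
  "instances": { "min": 2, "max": 8, "type": "m3.medium" },
  "env_vars": { "JAVA_OPTS": "-Xmx2g" }
}
```

The point of keeping this declarative is that the system, not the engineer, works out the sequence of AWS operations needed to realize it.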
For details and complete documentation on installing Ion-roller with your AWS account, watch the open-source project at https://github.com/gilt/ionroller.
Deploying software using Ion-roller
You can start the deployment process simply by using the built-in command-line tool:
ionroller release <SERVICE_NAME> <VERSION>
This tool can provide immediate feedback during the release process:
[INFO] NewRollout(ReleaseVersion(0.0.17))
[INFO] Deployment started.
[INFO] Added environment: e-k3bybwxy2f
[INFO] createEnvironment is starting.
[INFO] Using elasticbeanstalk-us-east-1-830967614603 as Amazon S3 storage bucket for environment data.
[INFO] Waiting for environment to become healthy.
[INFO] Created security group named: sg-682b430c
[INFO] Created load balancer named: awseb-e-k-AWSEBLoa-A4GOD7JFELTF
You can also trigger the deployment process programmatically: Ion-roller provides a set of REST APIs that give you complete control over your configuration and release process.
Behind the scenes, Ion-roller triggers an Elastic Beanstalk deployment process and takes full advantage of its capabilities, including creating a load balancer, security groups, and auto-scaling groups, setting up CloudWatch monitoring, and fetching the specified Docker image from a Docker registry.
Traffic redirection
Once Ion-roller detects a successful deployment, it can safely and gradually transfer traffic from the old version of the service to the new one.
Traffic redirection is achieved by changing which EC2 instances respond to HTTP requests through the load balancer. As the release progresses, newly deployed instances are gradually added to the load balancer's configuration, and previously deployed instances are gradually removed. The timing of this process is configurable.
Progressive traffic redirection lets you monitor a recent release, quickly detect failures, and roll back if necessary.
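The phased transfer can be pictured as computing batches of instances to attach and detach in lock-step. This is an illustrative simplification, not Ion-roller's implementation, and the function and instance names are hypothetical:

```python
def rollout_steps(old, new, batch_size=1):
    """Return (attach, detach) batches: each step attaches a batch of
    new instances to the load balancer and detaches an equal batch of
    old ones, so serving capacity stays roughly constant throughout."""
    steps = []
    for i in range(0, max(len(old), len(new)), batch_size):
        steps.append((new[i:i + batch_size], old[i:i + batch_size]))
    return steps

old = ["i-old1", "i-old2", "i-old3"]
new = ["i-new1", "i-new2", "i-new3"]
for attach, detach in rollout_steps(old, new):
    print(f"attach {attach}, detach {detach}")
```

Pausing between steps is what gives monitoring a chance to catch a bad release while most traffic is still served by the old version.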
Software rollback
Because the old environment remains available during the release process, with the old version of the software still running, we can safely roll the software back to the older version. Unused old instances are removed after a (configurable) period of time, so thanks to this delay we retain the option of rolling back for a while even after the release completes.
Our goal is to continuously monitor the operation of an endpoint and automatically roll back to the old version if an anomaly is detected. We will use Amazon's CloudWatch alarm feature to signal that a rollback should take place.
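The trigger condition can be approximated by alarm-style logic that requires the error rate to breach a threshold for several consecutive samples, so a single noisy data point does not undo a release. This is an illustrative sketch; the threshold and sample policy are assumptions, not Ion-roller or CloudWatch defaults:

```python
def should_roll_back(error_rates, threshold=0.05, consecutive=3):
    """Return True once the error rate exceeds `threshold` for
    `consecutive` samples in a row, mimicking an alarm that fires
    only on a sustained breach rather than a transient spike."""
    streak = 0
    for rate in error_rates:
        streak = streak + 1 if rate > threshold else 0
        if streak >= consecutive:
            return True
    return False
```

Requiring a sustained breach trades a slightly slower reaction for far fewer false-positive rollbacks.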
Manual rollback is simple; just run:
ionroller release <SERVICE_NAME> <PREVIOUS_VERSION>
If the old instances are still available, it takes only seconds for traffic to revert to the old version. Of course, this holds only if you have done nothing that makes the old version of the software unusable (such as updating the schema of a data-storage system). If you want to rely on this rollback capability, it is important to keep this in mind, no matter what deployment system you use.
Canary releases and testing in production
Ion-roller supports canary releases through configuration of the traffic-transfer process. When the new version has been deployed to some initial instances, the process pauses, and release tests are run against production traffic. After a configurable amount of time, the release continues.
Testing in a production environment
For use cases where you want to test (or demo) new software without sending it any production traffic, Ion-roller needs to build a separate HTTP endpoint that can serve requests before the production endpoint is updated.
Keeping an eye on the system: Ion-trail
Ion-roller can see the environments behind an endpoint and how the software changes over time; while monitoring or making changes, it records events related to the environment lifecycle and deployment activity. This can be used to implement auditing, monitoring, and reporting for the system. Ion-trail is a supporting service that provides an event feed of all recorded deployment activity.
Conclusion: centralized DevOps
We hope readers now understand our approach to deploying a microservices architecture to AWS. Ion-roller lets us centralize DevOps across the organization and, through declarative deployment, makes releasing easier for engineers to master. Ion-roller supports staged releases and hot rollbacks, along with features such as canary releases and testing in production. For more information, follow the Gilt tech blog (http://tech.gilt.com); we will soon announce the Ion-roller open-source project there.
About the authors
Natalia Bartol worked at IBM as an Eclipse support engineer, improving developer tools, and then at Zend Technologies as an Eclipse developer and team leader. She is now a software engineer at Gilt, focusing on microservices deployment and improving developer productivity. Natalia holds a Master of Science in software engineering from Poznan University of Science and Technology.
Gary Coady is a senior software engineer at Gilt Groupe, working on deployment automation, build tooling, and teaching Scala. He previously worked as a site reliability engineer at Google, where he spent much of his time configuring, managing, and troubleshooting large-scale environments. Gary holds a BA (Mod) in computer science from Trinity College, University of Dublin.
Adrian Trenaman is Senior Vice President of Engineering at Gilt.com in Dublin, Ireland. He holds a PhD in computer science from the National University of Ireland, Maynooth, a diploma in business development from the Irish Management Institute, and a BA (Mod, Hons) in computer science from Trinity College, University of Dublin.
View the original English article: Deploying Microservices to AWS at Gilt: Introducing Ion-roller