Deploying 300 Million Docker Containers: Challenges and Solutions

Source: Internet
Author: User
Keywords: cloud computing, Docker, IronWorker, container

Editor's note: IronWorker is a developer-oriented task queue service that lets developers schedule large-scale tasks without setting up or managing any infrastructure. A few months ago, Iron.io began experimenting with Docker and has since deployed more than 300 million Docker containers. This article shares the challenges, solutions, and benefits of IronWorker's Docker-based infrastructure. The original text follows:


IronWorker is a task queue service that allows developers to schedule large-scale tasks without having to set up and manage any infrastructure. When we launched the service more than three years ago, we used LXC containers that bundled all the languages and code packages needed to run tasks. Docker lets us easily upgrade and manage that set of containers, so we can offer customers far more language environments and installed packages.

We started with Docker v0.7.4 and ran into some difficulties along the way (containers failing to shut down cleanly was a big one, but it has since been resolved). We overcame all of them and found that Docker not only meets our needs but exceeds our expectations, so we have been expanding its use across our infrastructure. Based on our experience, it makes sense to do so.


Advantages

Here are a few of the Docker advantages we realize:

Updating and maintaining images is easy

Docker uses a very powerful, git-like approach to managing images, which makes it easy to manage a large and ever-changing set of environments, and its image layering system both saves space and gives us finer-grained images.

We are now able to keep pace with rapidly updating languages, and we can offer specialized stacks, such as a new ffmpeg stack designed specifically for media processing. We currently have up to 15 different stacks and are expanding rapidly.

Resource allocation

LXC is an operating-system-level virtualization approach in which all containers share the host kernel, but each container can be constrained to a specified amount of resources such as CPU, memory, and I/O. Docker builds on this with a REST API, environment versioning, image push/pull, and easy access to statistics. Docker also supports copy-on-write (CoW) file systems for safer isolation of data: all file changes a task makes are stored separately and can be purged with a single command. LXC cannot track such changes.
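As a sketch of the per-container constraints described above, the flags below are from the Docker 1.x-era CLI (`-c` for CPU shares, `-m` for a memory cap); the image name and command are illustrative placeholders, not our actual stack:

```shell
# Launch a task with bounded resources; --rm purges the CoW layer on exit.
# (image and command are hypothetical placeholders)
docker run --rm \
  -c 512 \
  -m 256m \
  ubuntu:14.04 \
  sh -c 'echo "task ran under CPU and memory limits"'
```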

Dockerfiles make integration simple

Our team is spread all over the world. Publishing a simple Dockerfile is enough: when you go off duty, you can be sure that everyone else will produce exactly the same image you did, which overcomes the difficulty of team members working in different places on different schedules. Clean images can be deployed and tested faster, our iterations are quicker, and everyone on the team is happier.
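For illustration, a stack definition can be as small as the sketch below; the base image, package, and tag are hypothetical stand-ins, not one of our actual stacks:

```shell
# Write a minimal stack Dockerfile; contents are illustrative only.
cat > Dockerfile <<'EOF'
FROM ubuntu:14.04
RUN apt-get update && apt-get install -y imagemagick
EOF

# Build and tag the stack image (requires a running Docker daemon):
#   docker build -t example/media-stack .
```

Anyone on the team who checks out this file and builds it gets a byte-for-byte-equivalent image, which is the point being made above.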

Growing communities

Docker is updated very quickly, faster even than Chrome. More importantly, the community involved in adding new features and fixing bugs is growing significantly. Whether contributing images, contributing to Docker itself, or building tools around Docker, a large number of smart people are working on it, so there is no reason for us to stay on the sidelines. We find the Docker community very active and worthwhile, and we are delighted to be part of it.

Docker + CoreOS

We are still in the exploratory phase here, but we find that the combination of Docker and CoreOS seems to be the better choice for us. Docker provides stable image management and containers; CoreOS provides a streamlined cloud operating system, machine-level distributed orchestration, and virtual-machine state management. The combination addresses different aspects of the problem and makes for a more sensible infrastructure stack.

Challenges

Every server-side technology needs fine-tuning and customization at scale, and Docker is no exception. (For perspective, we run just under 50 million tasks and 500,000 compute hours per month, and we update our images constantly.) Here are some of the challenges we encountered while using large numbers of Docker containers:

Insufficient backward compatibility

The rapid pace of innovation in this field, while an advantage, has its drawbacks, one of which is poor backward compatibility. In most cases what we ran into were changes in command-line syntax and API methods, which from a production standpoint are not a serious problem.

In some cases, however, it affected operations. For example, whenever a Docker error is thrown after launching a container, we parse STDERR and respond based on the type of error (for example, by retrying). Unfortunately, the error output format changed between versions, and having to debug against constantly changing output wore us out.
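The gist of that STDERR handling can be sketched as below; `run_with_retry`, the error patterns, and the retry count are hypothetical illustrations, since the real error text varied from version to version (which was exactly the problem):

```shell
# Retry a command when its stderr matches a known transient-error pattern.
# The patterns below are illustrative; real Docker error text changed by version.
run_with_retry() {
  attempts=3
  n=1
  while [ "$n" -le "$attempts" ]; do
    # capture stderr only; stdout passes through to /dev/null
    err=$("$@" 2>&1 >/dev/null) && return 0
    case "$err" in
      *"driver failed"*|*"device is busy"*)
        echo "transient error on attempt $n, retrying: $err" ;;
      *)
        echo "fatal error: $err"
        return 1 ;;
    esac
    n=$((n + 1))
  done
  return 1
}
```

In production this kind of wrapper would sit around `docker run ...`; breaking the `case` patterns on every Docker upgrade is the fatigue described above.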


Docker error rates

The problem itself is relatively easy to solve, but it means every update has to be validated multiple times, and you have to keep developing against the new version until it has rolled out to most of the system. We started on v0.7.4 a few months ago, and our systems are now on v1.2.0; there has been great progress in this area.

Limited tools and libraries

Although Docker released a stable version four months ago, much of the tooling around it remains unstable. Adopting most of the tools in the Docker ecosystem means taking on extra effort: to use the latest features and bug fixes, someone on your team has to track the churn and make frequent changes. That said, we are pleased to see tools being developed around Docker and look forward to some of them standing out; we are most optimistic about etcd, fleet, and Kubernetes.

Overcoming the difficulties

Next, based on our experience, we will talk in more depth about the problems we dealt with and how we solved them. This list of issues comes mainly from Roman Kononov, our lead IronWorker developer and director of engineering operations, and Sam Ward, who has been debugging and streamlining our Docker operations.


Debugging exceptions

A note first: when we encounter problems related to Docker or other systems, we automatically re-run the task with no impact on the user (retries are a built-in feature of the platform).

Long deletion times

Initially, deleting a container took a long time and required too many disk I/O operations. This slowed our systems down noticeably and created bottlenecks; we had to increase the number of available cores far beyond what we should have needed.


A fast container-removal fix

After studying devicemapper (a Docker filesystem driver), we found that setting one option, `--storage-opt dm.blkdiscard=false`, did the trick. It tells Docker to skip the time-consuming disk operations when deleting a container, which greatly speeds up container shutdown. Once the deletion script was modified, the problem was gone.
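The daemon invocation looked roughly like this (spelling per the Docker ~1.x CLI, where `docker -d` started the daemon; on modern Docker the daemon binary and option syntax differ, so treat this as a sketch):

```shell
# Start the Docker daemon with block discards disabled for devicemapper.
# Skipping the discard pass avoids heavy disk I/O when containers are deleted.
docker -d --storage-opt dm.blkdiscard=false
```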


Volumes not unmounting

Containers would not stop correctly because Docker was not unmounting volumes reliably, which caused containers to run forever even after their task had completed. The workaround was to unmount the volumes and delete the folders ourselves with an explicit set of scripts. Fortunately, we hit this problem back on Docker v0.7.6; once v0.9.0 fixed it, we removed the lengthy scripts.
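Our stop-gap looked roughly like the following; the container ID variable and the `/var/lib/docker` path are illustrative, since the exact on-disk layout depends on the storage driver and Docker version:

```shell
# Force-clean a container whose volume failed to unmount (pre-v0.9.0 workaround).
# CID and the /var/lib/docker path below are illustrative placeholders.
CID="<container-id>"
docker stop "$CID" || true
# explicitly unmount anything Docker left behind, then remove the directory
umount "/var/lib/docker/containers/$CID" 2>/dev/null || true
rm -rf "/var/lib/docker/containers/$CID"
docker rm "$CID" 2>/dev/null || true
```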

Memory limits switched

Docker suddenly added native memory-limit options in one of its releases and dropped the corresponding LXC options. As a result, some worker processes hit their memory bounds, and then the whole worker stopped responding. This caught us by surprise because Docker did not error out even when used with options it no longer supported. The fix was simple, namely setting the memory limits through Docker itself, but the change caught us off guard.
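The fix amounted to moving the limit from an LXC passthrough option to Docker's own flag; the image name and command below are placeholders:

```shell
# Before: memory cap passed through to LXC (silently stopped working):
#   docker run --lxc-conf="lxc.cgroup.memory.limit_in_bytes=268435456" ...
# After: set the memory cap natively in Docker:
docker run -m 256m example/worker ./run-task
```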

Future Plans

As you can see, we have invested heavily in Docker, and we continue to invest in it every day. Beyond using it to isolate the code users run in IronWorker, we are preparing to use it in several other areas.

These areas include:

The IronWorker backend

Besides using Docker containers to run tasks, we are also using Docker to manage the worker processes on each server that handle and launch those tasks. Each such process takes a task from the queue, places it into the appropriate Docker container, runs it, monitors it, and then deletes the environment. Interestingly, this means we have containerized code on the same machine managing other containers. Putting our entire infrastructure into Docker containers also makes it easy to run on CoreOS.
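The per-task lifecycle described above can be sketched as a loop like the one below; `queue_pop` and the stack image name are hypothetical placeholders for IronWorker's internal components:

```shell
# Hypothetical per-server worker loop: pull a task, run it in the right
# stack container, wait for completion, then purge the environment.
while task=$(queue_pop); do
  cid=$(docker run -d -m 256m example/ruby-stack ./run "$task")
  docker wait "$cid" >/dev/null   # blocks until the task exits
  docker rm "$cid"                # delete the environment
done
```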

IronWorker, IronMQ, and IronCache APIs

Like any other ops team, no one here likes deployments. We are very excited to package all our services into Docker containers and then deploy them in a simple, deterministic way. No more server configuration: all we need is a server that can run Docker containers. We are replacing our server build procedures with Docker containers that carry our products' environments, giving us a more flexible, simpler, and more reliable stack.

Building and Loading Programs

We are also building and packaging code in IronWorker with Docker containers. A notable improvement this brings is in creating, uploading, and running user task code at scale for specific workloads and workflows. Another benefit is that users can test their programs locally in an environment identical to our production service.

On-Premises Enterprise Version

The on-premises enterprise version of IronMQ uses Docker as its primary distribution method, which simplifies distribution and gives us a simple, generic approach that can be deployed in almost any cloud environment. As with the services we run in the public cloud, customers only need servers that can run Docker containers, and they can relatively easily stand up multi-server deployments in test or production environments.

Original link: Docker in Production: What We've Learned Launching Over 300 Million Containers (translated by Wang, reviewed by Zhou Xiaolu)
