Introduction: Services everywhere are moving into containers, and containerizing stateless services has become the clear trend. The question that keeps troubling architects is whether the database should be containerized too. The author, Mikhail Chinkov, argues that it should not. Translated by High Availability Architecture.
Looking at the technology industry in 2017, containers and Docker are still the hottest buzzwords. We have started packaging the software we develop into Docker containers in every domain. Container technology is used everywhere, from small startups to huge microservices platforms, from CI platforms to the Raspberry Pi, from databases to ...
Database? Are you sure you want to put the database in a container?
Unfortunately, this is not a fictional scenario. I have seen many fast-growing projects that persist data inside containers, and that run compute services and data services on the same machine. I hope experienced engineers will stay away from this approach.
My position: as of today, containerizing a database is very unreasonable.
7 reasons why databases are not suited to containerization
1. Data is not safe
Even if you store Docker data on the host, there is still no guarantee against data loss. Docker volumes were designed to provide persistent storage alongside the UnionFS image layers, but they still offer no real assurance.
With the current storage drivers, Docker remains a reliability risk: if the container crashes and the database is not shut down cleanly, the data may be corrupted.
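If you run a database in a container despite this, the usual mitigation is to bind-mount the data directory from the host so the data outlives the container, and to stop the database cleanly before removing it. A minimal sketch with the official MongoDB image (the host path and image tag are illustrative):

```shell
# Bind-mount a host directory as MongoDB's data directory so the data
# survives container removal (host path and tag are illustrative).
docker run -d --name mongo \
  -v /srv/mongo/data:/data/db \
  mongo:3.4

# Give mongod time to shut down cleanly before the container is killed,
# reducing the risk of on-disk corruption.
docker stop --time 30 mongo
```

This softens the risk but does not remove it: the storage driver and the abrupt-kill failure mode described above still apply.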
2. Environment requirements for running a database
It is common to see DBMS containers running on the same host as other services, yet these services have very different hardware requirements.
Databases (especially relational ones) have high I/O requirements. Database engines typically run in dedicated environments to avoid contending with concurrent workloads for resources. If you put your database in a container, your project's resources will be wasted, because you have to provision a large amount of extra headroom for the instance. On a public cloud, when you need 34 GB of memory, the instance you launch must come with 64 GB. In practice, those resources are never fully used.
How to solve this? Design in tiers, and use fixed-size resources to launch multiple instances at different levels. Horizontal scaling is always better than vertical scaling.
3. Network problems
To understand Docker networking, you need a deep understanding of network virtualization, and you must be prepared to handle unexpected situations, possibly fixing bugs with no support and no additional tooling.
We know that databases need dedicated, sustained throughput to handle high workloads. We also know that a container is an isolation layer sitting behind the hypervisor and the host virtual machine. But the network is crucial for database replication, which requires a stable 24/7 connection between master and slave databases. Some Docker networking issues remained unresolved even as of version 1.9.
Putting these issues together, containerization makes database containers hard to manage. I know you are a top engineer who can solve any problem, but how much time will you spend fixing Docker networking problems? Wouldn't it be better to run the database in a dedicated environment, and save that time for the business goals that really matter?
4. State
Packaging stateless services in Docker, orchestrating the containers, and eliminating single points of failure is cool. But what about the database? Put a database in the same environment and it is stateful, and the blast radius of a system failure becomes much larger. The next time your application instance crashes, the database may be affected.
5. Databases do not benefit from Docker's major features
Before putting a database in a container, let's think about the value a container brings. Take the official Docker definition:
Docker is an open platform for developers and system administrators to build, distribute, and run distributed applications. Docker includes Docker Engine (portable, lightweight runtime and packaging tools) as well as Docker Hub, a cloud service for sharing applications and automating workflows, enabling applications to quickly assemble components and eliminate differences between development, QA, and production environments. As a result, IT can distribute programs faster and run the same applications on laptops, data center VMs, and any cloud.
Based on this definition, we can easily identify Docker's main features:
Easy to build new environments
Easy redeployment (continuous integration)
Easy horizontal scaling (from practice)
Easy to maintain consistent environment
Let's start thinking about how these functions fit into the database world.
Easy to set up a database? Let's see whether there is a huge difference between containerizing the database and running it natively.
docker run -d mongo:3.4
Contrast:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
echo "deb [ arch=amd64,arm64 ] http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
sudo apt-get update && sudo apt-get install -y mongodb-org
Easy to build a new environment? If we are talking about a MongoDB cluster, containers might indeed be more efficient. But what about configuration management systems? They are designed to solve exactly this kind of setup problem with a single command. With Ansible you can easily set up dozens of Mongo instances. As you can see, there is no significant added value.
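As a sketch of the configuration-management alternative mentioned above, the same installation can be pushed to many hosts with Ansible ad-hoc commands. The inventory file and the `mongo` host group here are hypothetical; the `apt` and `service` modules are standard Ansible modules:

```shell
# Install MongoDB on every host in the (hypothetical) "mongo" inventory
# group, then make sure the mongod service is running and enabled.
ansible mongo -i inventory.ini --become \
  -m apt -a "name=mongodb-org state=present update_cache=yes"
ansible mongo -i inventory.ini --become \
  -m service -a "name=mongod state=started enabled=yes"
```

One command per step, regardless of whether it targets three hosts or thirty, which is the author's point: containers add little here.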
Easy redeployment? How often do you redeploy a database to the next version? A database upgrade is not a deployment issue but an engineering problem (that is, preserving availability within the cluster): you have to think about how your application will work with the new engine version, and what problems may arise when the engine is replaced.
Easy horizontal scaling? Do you really want to share the data directory across multiple instances? Aren't you afraid of data concurrency problems and possible data corruption? Isn't it safer to deploy multiple instances, each with its own dedicated data environment, and then set up master-slave replication?
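The replicated setup suggested above can be sketched for MongoDB, where the modern equivalent of master-slave replication is a replica set initialized from the mongo shell. The hostnames below are placeholders for dedicated database machines:

```shell
# Initialize a two-member MongoDB replica set; db1/db2 are placeholder
# hostnames. Each member keeps its own dedicated data directory, so
# nothing is shared at the filesystem level.
mongo --host db1.example.com --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" }
  ]
})'
```

Replication at the database layer gives you scaling and redundancy without two processes ever contending for the same data files.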
Easy to maintain a consistent environment? How often does a database instance's environment actually change? Do you upgrade the operating system every day? Do the database version or its dependencies change frequently? Is it really that hard to reach consensus on this with the engineering team?
At the end of the day, not a single one of these features gives me a reason to containerize a database.
6. Additional isolation is detrimental to the database
I actually touched on this in reasons 2 and 3, but I list it separately because I want to stress the fact again: the more isolation layers we have, the more resources we spend. Easy horizontal scaling can earn back more than a dedicated environment costs, but in Docker horizontal scaling only works for stateless compute services, not for databases.
We gain no useful isolation for the database, so why put it in a container at all?
7. Not a fit for cloud platforms
Most projects start on a public cloud. The cloud hides the complexity of operating and replacing virtual machines, so nobody has to test a new hardware environment at night or on a weekend when no one is around. When we can launch an instance within minutes, why worry about the environment that instance runs in?
That is why we pay cloud providers so much money. But when we put a database container on an instance, the convenience described above disappears: because the data stays behind, a new instance is not interchangeable with the existing one. If you want instances to remain disposable, run the database in a dedicated, non-containerized environment and preserve elasticity only for the compute service layer.
Do these 7 points apply to all databases?
Perhaps not all of them, but they do apply to every database that requires persistent data, and to every database with special hardware requirements.
If we use Redis as a cache or as user session storage, containers should be fine: there is no risk of data loss, because the data never needs to reach disk. But if you are considering Redis as a persistent data store, you had better keep the data outside the container; even if you continuously write RDB snapshots, locating that snapshot in a fast-changing compute cluster can be complicated.
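For the cache-only case just described, persistence can be disabled outright so that nothing of value ever lives inside the container. A minimal sketch with the official Redis image (container name, tag, and limits are illustrative):

```shell
# Run Redis as a pure in-memory cache: RDB snapshots and AOF persistence
# disabled, with a memory cap and LRU eviction so it behaves like a cache.
docker run -d --name session-cache \
  redis:3.2 redis-server \
    --save "" --appendonly no \
    --maxmemory 256mb --maxmemory-policy allkeys-lru
```

With persistence off, losing the container costs you only warm cache entries, which is exactly the failure mode a cache is allowed to have.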
We can also talk about Elasticsearch in containers. Indexes stored in ES can be rebuilt from a persistent data source. But look at the requirements: by default, Elasticsearch needs 2 to 3 GB of memory, and because of Java GC, memory usage is uneven. Are you sure Elasticsearch is a good fit for a resource-throttled container? Wouldn't it be better to run separate Elasticsearch instances on separately sized hardware?
For the local development environment, however, feel free to containerize the database. Running the database in a container locally saves a lot of time and effort and lets you replicate the production operating system: native Postgres on macOS or Windows is not 100% compatible with the Linux version. You can work around that by running a container on the host OS instead of installing the native package.
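A minimal local-development sketch with the official Postgres image (tag, name, and password are illustrative; `POSTGRES_PASSWORD` is the image's standard superuser-password variable):

```shell
# Throwaway Postgres for local development, matching the Linux build
# used in production (tag and credentials are illustrative).
docker run -d --name dev-postgres \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=devpass \
  postgres:9.6
```

Here the data is disposable by design, so every drawback listed in this article stops mattering.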
Conclusion
The Docker hype will cool down one day. That does not mean people will stop using container virtualization; it means we need to keep the value containers were designed to deliver front and center.
A few days ago I watched a talk about how frameworks survive in the messy Ruby world. What I took from it is the technology hype cycle: borrowing from that model, Docker has been sitting at the second stage (the peak of inflated expectations) for too long (High Availability Architecture note: see Resource 1). Things will normalize once Docker reaches the final stage. I think we are all responsible for that process, and we should help speed it up.
Reference Resources
https://www.youtube.com/watch?v=9zc4DSTRGeM#7m40s
English Original: https://myopsblog.wordpress.com/2017/02/06/why-databases-is-not-for-containers/
This article originally appeared at http://myopsblog.wordpress.com/ and was translated by High Availability Architecture; please credit the source when reproducing it.