The actual performance of Docker in the Spring Festival Gala
March 3, 2015 16:02
Docker successfully provided reliable service for 102 million users browsing Weibo and grabbing red envelopes. Following up on the previous article, "Large-scale Docker cluster to meet the peak challenge of the Spring Festival Gala", this post shares how Docker actually performed during the Spring Festival Gala and our ideas for its future development. Space is limited, so this is only meant to get the discussion started; if you want more hands-on details, come find me at Qclub:docker (HTTP://T.CN/RWIBKS0). First, here is the scale of the Docker cluster on the Weibo platform:
- Docker cluster size of up to 1,000 nodes
- Peak QPS of 800K
- Overall service SLA of four nines (99.99%) at 150 ms
- Docker deployments covering 23 services
- Nearly 300 nodes scheduled during the Spring Festival Gala to scale services out dynamically
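Reading "four nines at 150 ms" as "at least 99.99% of requests complete within 150 ms" (our interpretation; the post does not say how latency is sampled or over what window), a quick check against a batch of measured latencies could look like the sketch below. The helper `meets_sla` and its defaults are hypothetical.

```python
from typing import Iterable


def meets_sla(latencies_ms: Iterable[float],
              threshold_ms: float = 150.0,
              target: float = 0.9999) -> bool:
    """Return True if at least `target` of requests completed within `threshold_ms`."""
    samples = list(latencies_ms)
    if not samples:
        return False  # no traffic, nothing to certify
    within = sum(1 for t in samples if t <= threshold_ms)
    return within / len(samples) >= target


if __name__ == "__main__":
    ok = [20.0] * 9_999 + [400.0]       # 1 slow request in 10,000
    bad = [20.0] * 9_998 + [400.0] * 2  # 2 slow requests in 10,000
    print(meets_sla(ok), meets_sla(bad))  # True False
```

At four nines, a window needs at least 10,000 requests before even a single request slower than 150 ms is tolerated, which is why the SLA is only meaningful over high-volume windows.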
Before going further, here is a question to think about: what is the biggest technical challenge once a deployment reaches the million level? Feel free to discuss in the comments. Now let's take a closer look at the technologies currently used in the Weibo platform's Docker deployment environment:
- Host OS: CentOS 6.5, chosen over CentOS 7 after weighing the current production environment together with security, stability, compatibility and other concerns
- Docker version 1.3.2; note that some of the libraries that 1.3.2 depends on conflict with the versions in a default CentOS 6.5 installation
- Networking uses host mode: NAT and the default bridge scheme cannot meet the platform's requirements, and an OVS+VLAN solution is still in development, so production runs containers with --net=host
- cAdvisor + Elasticsearch + Kibana + Graphite as the container monitoring stack
- Storage uses devicemapper. We recommend giving devicemapper a dedicated partition; otherwise, tune the sizes of its sparse files with dm.basesize and dm.loopdatasize according to each application's needs.
- Registry: docker-registry 0.9.1; older versions could hang (become unresponsive) during compression
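The host-network and devicemapper settings above translate into command-line flags. The sketch below builds Docker 1.3-era command lines (the daemon was started with `docker -d` before the `docker daemon` subcommand existed). All sizes, device paths, image names and the `/logs` mount point are hypothetical, and mapping the "dedicated partition" recommendation to `dm.datadev`/`dm.metadatadev` is our assumption. The `-v` log mount corresponds to the data-volume logging practice in the orchestration list that follows.

```python
from typing import List, Optional, Sequence


def daemon_args(data_dev: Optional[str] = None,
                metadata_dev: Optional[str] = None,
                base_size: str = "20G",
                loop_data_size: str = "200G") -> List[str]:
    """Docker 1.3-era daemon command line using the devicemapper storage driver."""
    args = ["docker", "-d", "--storage-driver=devicemapper"]
    if data_dev and metadata_dev:
        # Dedicated block devices: devicemapper uses these instead of loopback sparse files.
        args += ["--storage-opt", f"dm.datadev={data_dev}",
                 "--storage-opt", f"dm.metadatadev={metadata_dev}"]
    else:
        # Loopback sparse files: size the data file for the workload.
        args += ["--storage-opt", f"dm.loopdatasize={loop_data_size}"]
    # Size of each container's base filesystem (applies in both modes).
    args += ["--storage-opt", f"dm.basesize={base_size}"]
    return args


def run_args(name: str, image: str, command: Sequence[str],
             host_log_dir: str, container_log_dir: str = "/logs") -> List[str]:
    """`docker run` command line for one single-process service container."""
    return (["docker", "run", "-d", "--name", name,
             "--net=host",  # share the host network stack (no NAT, no bridge)
             "-v", f"{host_log_dir}:{container_log_dir}"]  # logs land on a host directory
            + [image] + list(command))


if __name__ == "__main__":
    print(" ".join(daemon_args()))
    # docker -d --storage-driver=devicemapper --storage-opt dm.loopdatasize=200G --storage-opt dm.basesize=20G
    print(" ".join(run_args("feed", "registry.example/feed:1.0", ["./server"], "/data1/logs/feed")))
```

Returning argument lists rather than shell strings keeps the values safe to pass straight to `subprocess.run` without quoting issues.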
For container orchestration, we built a solution tailored to the platform's business by drawing on industry practice and combining our own business characteristics with the existing infrastructure:
- One process per container: first, this makes it easy to monitor the service lifecycle; second, it simplifies resource isolation
- Logs are written to mounted data volumes: first, this keeps containers from generating large amounts of data and falling into the devicemapper sparse-file pitfalls; second, it makes it easy to collect and compress the logs outside the container
- Container lifecycle management provides Fig/Compose-like functionality while adding the capacity for large-scale concurrent scheduling
- Service discovery: the implementation borrows the pod and service concepts from Kubernetes, but instead of top-down governance (replication controllers) it relies on containers reporting themselves, which keeps scheduling convenient and flexible.
- Seamless integration: focused on reducing the operational complexity of running Docker-mode and conventional process-mode deployments side by side, and on ensuring that Docker clusters interface seamlessly with other operations subsystems such as the degradation subsystem and the online release subsystem
- Quantifiable: the redundancy of each service's Docker cluster is measured through QPS, service SLA, system load, liveness and other metrics
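The post says lifecycle management adds "large-scale concurrent scheduling" but does not describe how. One common way to start containers on hundreds of nodes at once is a bounded worker pool, sketched below under our own assumptions: `scale_out`, the per-node `start` callable, `max_parallel=50` and the node names are all hypothetical, and the 300-node batch only mirrors the Gala figure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def scale_out(nodes: List[str], start: Callable[[str], None],
              max_parallel: int = 50) -> Dict[str, str]:
    """Run `start(node)` on every node, at most `max_parallel` at a time."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(start, node): node for node in nodes}
        for fut in as_completed(futures):
            node = futures[fut]
            try:
                fut.result()
                results[node] = "ok"
            except Exception as exc:  # one bad node must not abort the whole batch
                results[node] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    def fake_start(node: str) -> None:
        # Stand-in for the real per-node action (e.g. issuing `docker run` remotely).
        if node == "node-007":
            raise RuntimeError("image pull timed out")

    nodes = [f"node-{i:03d}" for i in range(300)]
    status = scale_out(nodes, fake_start)
    failed = sorted(n for n, s in status.items() if s != "ok")
    print(len(status), failed)  # 300 ['node-007']
```

Capping parallelism protects shared resources such as the registry from a thundering herd of image pulls, while per-node status lets failed nodes be retried individually.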
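The service-discovery and quantification bullets can be illustrated together: containers report themselves under a service name (no replication controller), instances that stop reporting drop out, and redundancy is derived from the live set. This is a hypothetical in-memory sketch, not Weibo's implementation; `SelfReportRegistry`, the 10-second TTL, `capacity_qps`, and defining redundancy as live capacity divided by current QPS are all our assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Instance:
    host: str
    port: int
    capacity_qps: float  # hypothetical: QPS this instance can serve within SLA
    last_seen: float     # clock value of the most recent self-report


class SelfReportRegistry:
    """Hypothetical in-memory registry fed by containers reporting themselves."""

    def __init__(self, ttl_s: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._services: Dict[str, Dict[str, Instance]] = {}

    def report(self, service: str, host: str, port: int, capacity_qps: float) -> None:
        # Called periodically by each container as a heartbeat; no central controller.
        key = f"{host}:{port}"
        self._services.setdefault(service, {})[key] = Instance(
            host, port, capacity_qps, self.clock())

    def live(self, service: str) -> List[Instance]:
        # An instance that has not reported within the TTL is treated as dead.
        now = self.clock()
        return [i for i in self._services.get(service, {}).values()
                if now - i.last_seen <= self.ttl_s]

    def redundancy(self, service: str, current_qps: float) -> float:
        # Live capacity divided by current load: 1.0 means zero headroom.
        if current_qps <= 0:
            raise ValueError("current_qps must be positive")
        return sum(i.capacity_qps for i in self.live(service)) / current_qps


if __name__ == "__main__":
    now = [0.0]
    reg = SelfReportRegistry(ttl_s=10.0, clock=lambda: now[0])
    reg.report("feed", "10.0.0.1", 8080, 1000)
    reg.report("feed", "10.0.0.2", 8080, 1000)
    now[0] = 5.0
    reg.report("feed", "10.0.0.1", 8080, 1000)  # only 10.0.0.1 heartbeats again
    now[0] = 12.0
    print(len(reg.live("feed")), reg.redundancy("feed", 500))  # 1 2.0
```

Injecting the clock keeps the liveness logic testable without sleeping; a production version would also need persistence and replication, which this sketch omits.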
As for how Docker will evolve on the platform, and in answer to the question posed above, I personally think that when deployments reach the million level, the biggest technical challenges are the following. The platform will keep exploring these areas and hopes to give its experience back to the community:
- Network bottlenecks: container deployments at this scale will inevitably strain the existing network infrastructure, and the resulting scale problems need to be solved with technologies such as SDN
- Integration with existing infrastructure may be one of the most frustrating problems most teams face when bringing Docker into production, and we hope to provide some standardized solutions that make the migration smoother
- Everything in containers: some technical problems remain, such as reliably managing containers from inside containers and managing short-lived containers
- At present we are still in the "primary stage of socialism" (an early stage), where everything relies on "central" command and we cannot yet manage clusters dynamically at this level. Kubernetes, Mesos and Swarm make it possible, but we are still working out an overall solution and would love to hear your ideas