A number of requirements for OpenStack Enterprise Private Cloud (6): Large scale scalability support


This series introduces several requirements for an OpenStack enterprise private cloud:

    • Auto-scaling support
    • Multi-tenancy and tenant isolation
    • Hybrid cloud support
    • Mainstream hardware support, rapid cloud delivery, and SLA assurance
    • Large-scale scalability support
    • Private cloud perimeter support (including CDN, commercial SDN controllers, firewalls, VPN/leased lines, etc.)
    • Upward scaling (support for PaaS and SaaS)
    • Enterprise data center IT environment support (including bare metal, F5, GPU, cross-cloud network connectivity, tenant billing, backup, etc.)
    • Industry solutions
    • Independent services, including training, operations, etc.

Scalability is one of the fundamental attributes of a cloud, and the OpenStack cloud is no exception.

On one hand, compared with the already very mature public cloud and proprietary private cloud offerings, OpenStack still has many shortcomings in scalability, which causes considerable problems for large-scale deployments. On the other hand, scalability by itself may not be a huge problem in practice: whether a cloud can support 200 nodes or 300 nodes may not matter that much to a given customer. Personally, however, I think scalability is closely tied to product quality. A well-scaling OpenStack private cloud product is easily associated with high quality, and a system that supports large-scale deployments tends to be better in stability, reliability, and usability.

This article starts from some currently public results on OpenStack private cloud scalability and, combined with my own understanding, discusses how to set scalability goals for an OpenStack enterprise private cloud and how to achieve them technically.

1. Scope of scalability and some public data

1.1 Scope of scalability

The OpenStack cloud includes storage, compute, and networking, where:

    • Storage is usually external, including open-source options such as Ceph as well as commercial storage from vendors such as EMC and IBM, so storage scalability can be discussed separately.
    • Network scalability falls within the scope of OpenStack scalability.
    • Compute scalability falls within the scope of OpenStack scalability.
1.2 Some comparable public data

1.2.1 Scalability limit of a single HP Helion private cloud: 200 compute nodes, 1,000 virtual machines

Source

1.2.2 Maximum scalability for a single Huawei private cloud: 1024 compute nodes, 80000 virtual machines

(from Huawei website)

1.2.3 VMware vSphere 6.0 scalability: up to 1,000 nodes and 8,000 virtual machines per vCenter

(from VMware website)

1.2.4 According to the OpenStack community user survey, 56% of users run 50 or fewer virtual machines, and 30% run 500 or fewer.

2. Some public large-scale test cases

2.1 Mirantis tests

2.1.1 Environment configuration for the 200- and 350-node tests on SoftLayer
    • Total resources: vCPUs, 6,250 GB of RAM, and 200 TB of disk space
    • Hardware servers: up to 400
    • CPU oversubscription ratio 1:+, RAM oversubscription ratio 1:1.5
    • OpenStack 2013.2 (Havana)
    • Only three OpenStack services: Compute (Nova), Image (Glance), and Identity (Keystone)
    • Networking: the Nova Network service in FlatDHCP mode with the multi-host feature enabled
    • Standard Mirantis OpenStack HA architecture, including a synchronously replicated multi-master database (MySQL + Galera), a software load balancer for the API services (HAProxy), and Corosync/Pacemaker for IP-level HA
    • Rally as the benchmarking tool
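
The tests were driven by Rally. As a rough illustration only (not taken from the Mirantis report), a boot-and-delete scenario of the kind used in such benchmarks can be described in a Rally task file like the one below; the flavor name, image name, and iteration counts are placeholders:

    {
        "NovaServers.boot_and_delete_server": [
            {
                "args": {
                    "flavor": {"name": "m1.small"},
                    "image": {"name": "cirros"}
                },
                "runner": {
                    "type": "constant",
                    "times": 500,
                    "concurrency": 50
                }
            }
        ]
    }

The task is then started with "rally task start boot-and-delete.json", and Rally reports the success rate and latency percentiles for each iteration.
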
2.1.2 200-node test results

With only minor configuration changes (increasing the SQLAlchemy connection pool size from 5 to 100 and raising the HAProxy connection limit to 16,000), the virtual machine creation success rate reached 99.96%.

In other words, the OpenStack community version supports 200 compute nodes essentially out of the box, with no code modification or special optimization required.
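
To make those two tweaks concrete, they roughly correspond to the following settings; exact option names vary by release, so treat this as a hedged sketch rather than a verified recipe:

    # nova.conf (oslo.db / SQLAlchemy settings) -- enlarge the database connection pool
    [database]
    max_pool_size = 100

    # haproxy.cfg -- raise the global connection limit
    global
        maxconn 16000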

2.1.3 Test results for 350 node environments

The success rate decreased slightly, but the decline was limited.

2.2 CERN Private Cloud uses Nova Cell to manage more than 5,000 compute nodes

They use 33 cells, each child cell containing roughly 200 compute nodes, to manage more than 5,000 compute nodes in total. For more information, refer to "Large-scale OpenStack private cloud case (1): CERN's 5,000+ compute node private cloud".

3. Some techniques to improve scalability

3.1 Scaling compute resources: Nova cells

3.1.1 The layered architecture of Nova cells

OpenStack introduced Nova cells in the Grizzly release; the feature can be summarized as follows:

    • Each cell has its own database and AMQP broker
    • A new nova-cells service is introduced, responsible for:
      • message routing
      • cell scheduling (distinct from host scheduling; child cells also report their resources up to the parent cell)
    • Communication between cells is via RPC
    • Cells form a tree structure
      • The top level is the API cell, which is unaware of the underlying physical hosts and hypervisors
      • Child cells run no nova-api service
      • Child cells have no quota concept (NoopQuota driver)

As a result, Nova cells allow an OpenStack compute cloud to be scaled in a more flexible and distributed fashion without resorting to more complicated technologies such as database and message queue clustering; the feature is designed to support very large deployments. When it is enabled, the hosts in the OpenStack compute cloud are partitioned into groups called cells, and the cells are organized as a tree. The top-level (API) cell runs the nova-api service but need not run any nova-compute service, while each child cell runs all the usual nova-* services of a regular OpenStack compute deployment except nova-api. Each cell in the tree can be thought of as a normal OpenStack compute deployment in its own right, because it has its own database and message queue service.
The nova-cells service handles communication between cells and selects the cell in which a new instance will be created; every cell must run this service. Inter-cell communication is pluggable, but currently only an RPC-based implementation exists. Cell scheduling is separate from host scheduling: the nova-cells service first selects a cell (currently chosen at random; filter/weighting support is planned, and the choice can also take into account capacity and capabilities obtained via broadcasts). Once a suitable cell has been selected and the request to create a new instance reaches that cell's nova-cells service, the request is handed to that cell's host scheduler, which then creates the instance.
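
As a rough sketch of how cells v1 is wired up (the option names below date from the cells v1 era and should be checked against the documentation of the release actually deployed), the API cell and each child cell enable the nova-cells service in nova.conf and are registered with each other via nova-manage:

    # nova.conf on the API (top-level) cell
    [cells]
    enable = True
    name = api
    cell_type = api

    # nova.conf on a child cell (runs the usual nova-* services, but no nova-api)
    [cells]
    enable = True
    name = cell1
    cell_type = compute

    # register the child cell with the API cell; the credentials point at the
    # child cell's own AMQP broker
    nova-manage cell create --name cell1 --cell_type child \
        --username guest --password guest --hostname cell1-rabbit \
        --port 5672 --virtual_host /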

3.1.2 Status and problems of Nova cells

Nova cells exists in two versions, v1 and v2. Development of v1 has been frozen and a large number of its bugs will not be fixed; v2 is still under development, has not formally reached GA, and its architecture is not yet stable, so the community does not recommend deploying it in production. A vendor that needs Nova cells therefore has to do its own development and maintenance. At present, however, cells seem to be the only practical way to scale compute resources, which is why many large deployments, such as CERN, Tianhe-2, Rackspace, and eBay, use them.

3.2 Scaling network resources

3.2.1 Standard Neutron

The standard Neutron scales poorly because all cross-network traffic passes through the network node, which becomes a bottleneck that prevents the cloud from growing any further.

HP ran OpenStack on its public cloud, with a Neutron based on the Icehouse release. They summarized a number of recommendations, including:

    • Upgrade Neutron (Icehouse is better than Havana, which is better than Grizzly, ...)
    • Make sure your Neutron server is properly provisioned and tuned
    • Make sure the metadata agent is properly tuned
    • Upgrade your kernel (newer is generally better)
    • Make sure sudo is properly versioned and tuned
    • Expect improved stability, performance, and scalability in Juno

Please refer to the original text for complete information.
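
"Properly provisioned and tuned" for neutron-server usually comes down to a handful of worker and heartbeat settings. A hedged example is shown below; the values are illustrative and are not taken from the HP material:

    # neutron.conf on the server side
    [DEFAULT]
    api_workers = 8        # scale API worker processes with the available cores
    rpc_workers = 8        # likewise for RPC worker processes
    agent_down_time = 75   # tolerate slower agent heartbeats at scale

    # agent-side configuration (e.g. neutron.conf on the agent nodes)
    [agent]
    report_interval = 30   # report state less frequently to reduce server and MQ load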

3.2.2 Neutron DVR

DVR was introduced in the Juno release. For a detailed explanation, refer to "Understanding OpenStack High Availability (HA) (3): Neutron Distributed Virtual Routing".

DVR is an optimization of the standard Neutron; it brings some advantages but also introduces significant new problems:

Advantages:

    • Distributes east-west traffic and DNAT to the compute nodes
    • Significantly reduces the load on the network nodes

Disadvantages:

    • Puts heavy pressure on the message queue (for example, synchronizing ARP entries to all nodes)
    • More complex to manage
    • Large code churn, which affects code stability
    • Performance degradation from traversing the Linux TCP/IP stack
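
For reference, enabling DVR (Juno and later) is itself only a few configuration switches; the real cost lies in the operational complexity listed above. A minimal sketch:

    # neutron.conf on the controller: new routers are created as distributed
    [DEFAULT]
    router_distributed = True

    # l3_agent.ini on compute nodes
    [DEFAULT]
    agent_mode = dvr

    # l3_agent.ini on network nodes (still needed for centralized SNAT)
    [DEFAULT]
    agent_mode = dvr_snat
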
3.2.3 SDN: Solves problems in some scenarios, but does not solve all problems

In essence, SDN is a centralized solution; it is aimed not so much at Neutron's scalability problems as at its performance and manageability problems.

Below is an SDN solution from Centec, a Chinese vendor.

It can eliminate the single-point bottleneck to a certain extent and thus improve scalability.

However, SDN relies on a logically centralized SDN controller; even if the controller is physically distributed, its own scalability is limited.

3.2.4 Dragonflow

Dragonflow is said to address the problems of Neutron DVR, and it claims several advantages over DVR.

For more detailed instructions on Dragonflow, please read the official documentation. Currently, its development is led by Huawei and is still in progress, so it cannot be used in production systems.

3.2.5 Neutron optimization: ongoing, with a long road still ahead

Perhaps you can consider the following ideas:

    • For environments up to roughly 300 nodes, a carefully tested and tuned standard Neutron can be used
    • For 300- to 500-node environments, consider using SDN
    • For environments with more than 500 nodes, a distributed solution is needed, such as distributed SDN or an optimized Neutron DVR

3.3 Scalability of the control service

Because most of the control services are stateless, they can be scaled horizontally. Take Tianhe-2 as an example: the following configuration is used to manage an environment of more than 6,000 nodes:

    • OpenStack Icehouse (2014.1)
    • 8 Nova control nodes: run nova-api and nova-cells
    • 8 image service nodes: run the glance-* services
    • 8 volume service nodes: run the cinder-* services
    • 8 network control nodes: run the neutron-server service
    • 16 network service nodes: run the neutron-*-agent services
    • 8 authentication service nodes: run the Keystone service
    • 6 message queue nodes: run RabbitMQ
    • 6 database nodes: run MySQL
    • 4 load balancer nodes: LVS + Keepalived for scheduling, distributing, and providing high availability for the API nodes
    • 2 Horizon nodes
    • 8 Ceph monitoring nodes: run the Ceph Mon service
    • 16 monitoring nodes: deployed as the server side of Ganglia and Nagios to monitor and alert on the system state
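
Horizontal scaling of the stateless API services is typically achieved by putting several instances of each service behind a load balancer: Tianhe-2 uses LVS + Keepalived, while HAProxy (as in the Mirantis setup above) is the other common choice. A minimal HAProxy sketch for nova-api, with placeholder addresses:

    # haproxy.cfg -- spread nova-api requests across several control nodes
    listen nova-api
        bind 10.0.0.100:8774
        balance roundrobin
        option httpchk
        server control1 10.0.0.11:8774 check
        server control2 10.0.0.12:8774 check
        server control3 10.0.0.13:8774 check
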
4. How to set the scalability goals for the OpenStack Enterprise private cloud

Judging from HP's and Huawei's products, the gap between their respective scalability caps is very large; and from the customers' point of view, requirements range from a few compute nodes to thousands of them. So how do we set a scalability cap that is reasonable in terms of both technology and cost? Combining the analysis above, I think the following targets are reasonable:

    • Small scale (up to 200 compute nodes): standard Neutron for networking and standard Nova for compute; implementation cost is low; customer demand is large (suitable for small customers); market competition is very intense.
    • Medium scale (up to 500 compute nodes): an optimized centralized networking solution, such as a tuned and trimmed standard Neutron, or SDN; a layered compute scheme similar to Nova cells; implementation cost is fairly low; customer demand is moderate (suitable for mid-sized customers); market competition is moderate.
    • Large scale (up to 1,000 compute nodes): a distributed networking solution, such as distributed SDN or an optimized Neutron DVR; a layered compute scheme similar to Nova cells; implementation cost is fairly high; customer demand is small (suitable for large customers or IDCs); market competition is low.
    • Super large scale (public cloud scale, thousands or even tens of thousands of nodes): requires optimization in every respect, including architectural changes, code rewriting, new features, and third-party components; implementation cost is high; such clouds are usually built by the public cloud providers themselves, and only a few vendors can achieve this.
