How to get one million TPS for $1.68 per hour


Cloud infrastructure such as Amazon EC2 has proven its value worldwide. Easy scaling, out-of-the-box availability, and pay-as-you-go billing have freed developers' creativity, but it is worth remembering that virtualized environments were once considered a performance killer for applications and databases.

Cloud vendors keep looking for ways to improve performance, but as users, our own performance tuning is just as essential. Aerospike has already demonstrated a peak of one million TPS on physical servers, and is now working on improving performance in the cloud, to dispel the notion that the cloud cannot deliver high performance.

The authors compared a variety of cloud instance types, and here they show how to get one million database transactions per second on a c3.8xlarge for $1.68 per hour.

The translated post follows:


[Chart: higher is better.]

This report documents a series of experiments on Amazon EC2 instances, and describes the parameters and commands we tuned to reach this level of price/performance.

Unless otherwise noted, the experiments use the following setup:

A single Aerospike in-memory NoSQL database server runs on a single Amazon instance.

The data set is 10 million objects of 100 bytes each (10 bins, or columns, of 10 bytes each). We kept the objects small so that the whole data set fits in memory.

Load is generated with the Aerospike Java client and its Java benchmark tool. The client processes run on instances separate from the server.

Client load is pushed as high as possible, with clients connecting to the Aerospike server in parallel. The number of clients depends on the server instance type.

To put maximum load on the Amazon infrastructure, the workload is 100% reads, served from memory and distributed across the whole key space.

We use Spot Instances to save costs. Spot and On-Demand instances performed essentially the same in our tests.
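As a quick sanity check on the setup above, the raw payload size can be worked out from the numbers given. This is only a back-of-the-envelope sketch in Python: it counts payload bytes and ignores index and per-record overhead, which the article does not quantify. The 54 GB figure is the namespace size mentioned in the summary steps at the end of the article.

```python
# Raw payload size of the test data set described above.
OBJECTS = 10_000_000      # 10 million objects
BINS_PER_OBJECT = 10      # ten bins (columns) per object
BYTES_PER_BIN = 10        # 10 bytes per bin

object_bytes = BINS_PER_OBJECT * BYTES_PER_BIN
raw_bytes = OBJECTS * object_bytes
raw_gb = raw_bytes / 1e9

# Namespace size taken from the summary steps (54 GB on a 60 GB c3.8xlarge).
NAMESPACE_GB = 54

print(f"object size: {object_bytes} bytes")
print(f"raw data:    {raw_gb:.1f} GB, "
      f"{raw_gb / NAMESPACE_GB:.1%} of the {NAMESPACE_GB} GB namespace")
```

Even with generous overhead, the data set is far smaller than the memory available, which is consistent with the goal of stressing the network and CPU rather than storage.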

Step 1: Select an instance type

When choosing an instance, the basic guidelines are that it must have enough memory to hold the entire database, and enough network and CPU capacity to sustain the transaction rate.

We did not consider the i2 and hs1 instances: their strength is high storage capacity, which does not help an in-memory workload.

We investigated a number of EC2 instance types:

R: memory-optimized

C: compute-optimized

M: a balance between R and C

T1: entry-level instances

Different instance types have different network capabilities, so we examined each type and measured its maximum bandwidth, using a packet-generation tool to push large packets from as many clients as possible. The results were as follows:

Moderate (caps at about 100 Mbps)

High (ranges from 300 Mbps to 1.8 Gbps)

10 Gigabit (peaks at around 8.8 Gbps)

We ruled out t1 and m1.small, whose network performance Amazon labels "Low". On instances labeled "Low" or "Moderate", network performance frequently becomes the bottleneck. In addition, the M series had no Enhanced Networking option at the time of testing.

Step 2: Choose a virtualization type

Amazon uses Xen, an open-source virtualization platform that supports both paravirtualization (PV) and full hardware virtualization (HVM), and Amazon offers both options. We tried both:

Paravirtual (PV) machines. We first chose the c3.8xlarge: it is expensive, but it looked promising. We started with a PV AMI in the EC2-Classic network, and found that this nominally 10 Gbps machine only delivered 1.8 to 2.2 Gbps and was limited to roughly 85,000 packets per second. Testing a c3.2xlarge showed the same rate limit. Looking at CPU usage, we found that on a paravirtualized host the virtualization layer consumed as much as 40% of the CPU under heavy load. On these hosts Aerospike reached about 85,000 TPS, hitting the same rate limit again.

Hardware virtual machines (HVM). According to the AWS documentation, HVM instances with Enhanced Networking provide higher network performance and more packets per second. In our tests, TPS roughly tripled with no other changes. With the same client setup, Aerospike reached 215,000 TPS on a c3.4xlarge with Enhanced Networking.

Given the performance difference between PV and HVM instances at the same price, all our subsequent tests use only HVM.

Step 3: Use placement groups

A placement group is a logical grouping of instances within a single Availability Zone. Using a placement group in Amazon Virtual Private Cloud (VPC) lets applications take part in a low-latency, full-bisection 10 Gbps network.

We use placement groups to maximize performance, and a VPC is a prerequisite for HVM instances with Enhanced Networking.

Step 4: Test tenancy

We tested both dedicated and shared tenancy. We saw no apparent difference, so we chose shared tenancy.

Step 5: Minimize CPU stealing

The biggest risk of running an application on AWS is the unpredictability of compute resources. When the CPU is overused, for example when a thread suddenly takes up too much CPU, EC2 ends the thread's time slice, and in the next time slice Amazon gives it fewer resources. This leads to ups and downs in application throughput. By controlling the CPU usage of the application threads, we can keep throughput more stable.

In our tests, Aerospike avoided overusing the CPU. This gives us predictable performance: even under heavy load, the CPU steal we monitored stayed low (between 0.2 and 0.7).
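On Linux, steal time is exposed as one of the counters on the aggregate cpu line of /proc/stat, so it can be tracked by sampling that line twice and comparing. The Python sketch below shows one way to do it. It is an illustration only: the article does not say which tool produced the 0.2 to 0.7 figures or what unit they are in, the function names are made up for this example, and the two sample lines are invented values rather than measurements.

```python
def parse_cpu_line(line: str) -> list:
    """Return the numeric counters of the aggregate 'cpu' line from /proc/stat."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(a, b)]
    # Counter order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight counters make up the total.
    total = sum(deltas[:8])
    steal = deltas[7]
    return 100.0 * steal / total if total else 0.0


# Invented sample lines; on a real host, read the first line of /proc/stat
# twice with a pause in between.
before = "cpu  1000 0 500 8000 100 0 50 10 0 0"
after = "cpu  1600 0 800 8900 120 0 75 15 0 0"

print(f"steal: {steal_percent(before, after):.2f}%")
```

Sampling this at a fixed interval during a benchmark run makes it easy to see whether throughput dips line up with steal spikes.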

Step 6: Network

After choosing the instance type, virtualization type, placement group, and tenancy, and after tuning, TPS was still limited. Looking at the instance, we found that the system was bottlenecked on interrupt handling on a single core. Each NIC appears to provide only one interrupt queue, which is bound to a single core by default. We needed a way around this, and tried four approaches:

1. IRQ distribution: We tried to force the system to spread the IRQ across multiple cores (disabling irqbalance and running echo ffff > *smp_affinity), but found that it still stayed bound to a single core. A single IRQ cannot be distributed across multiple cores.

2. Interrupt coalescing: On EC2, interrupt coalescing slightly improved CPU utilization, but this did not translate into better throughput.

3. More NICs: With those two options exhausted, Elastic Network Interfaces (ENIs) were the obvious next route. ENIs let users add multiple (virtual) NICs to an instance. A single NIC peaks at around 250,000 TPS, and adding more interfaces scales throughput further: with 4 interfaces and 4 clients running on two instances, we reached 960,000 TPS on a single c3.8xlarge instance. Making each client push to its own dedicated interface ensures that all NICs and CPUs are used. Using ENIs with private IPs adds no cost.

4. Receive Packet Steering: Another simple approach is to use RPS (echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus) to distribute the IRQ load over multiple cores. This avoids multiple NICs/ENIs and the management complexity that comes with them, while delivering TPS similar to adding more ENIs. A single NIC configured with RPS can push TPS to 800,000 with interrupts spread across 4 cores.
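The hexadecimal values in the commands above (ffff and f) are CPU bitmasks: bit N set means core N may handle the work. The Python sketch below computes such masks and writes one to an rps_cpus file. It is a minimal illustration, not the authors' tooling: the names cpu_mask and write_rps_mask are made up for this example, masks wider than 32 bits are not handled, and the demo writes into a temporary directory standing in for /sys so that it runs without root.

```python
import os
import tempfile


def cpu_mask(cores) -> str:
    """Hex bitmask with one bit set per core index."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core
    return format(mask, "x")


def write_rps_mask(sysfs_root: str, iface: str, rx_queue: int, cores) -> str:
    """Write the RPS mask for one receive queue and return the path written."""
    path = os.path.join(sysfs_root, "class", "net", iface,
                        "queues", f"rx-{rx_queue}", "rps_cpus")
    with open(path, "w") as f:
        f.write(cpu_mask(cores) + "\n")
    return path


# Dry run: a temporary directory plays the role of /sys.
with tempfile.TemporaryDirectory() as fake_sys:
    os.makedirs(os.path.join(fake_sys, "class", "net", "eth0", "queues", "rx-0"))
    path = write_rps_mask(fake_sys, "eth0", 0, range(4))
    with open(path) as f:
        written = f.read().strip()

print("cores 0-3: ", cpu_mask(range(4)))
print("cores 0-15:", cpu_mask(range(16)))
print("written to rps_cpus:", written)
```

On a real host the first argument would be "/sys" and the script would need root privileges. The setting does not survive a reboot, so it would normally be applied from a boot script.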

After testing different combinations of NICs and RPS, and with some Aerospike tuning (such as setting the service thread configuration appropriately), we reached very high performance: 5 clients (c3.2xlarge) plus a single-node Aerospike cluster running on a c3.8xlarge can reach one million TPS for just $1.68 per hour.
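To put these figures in perspective, the arithmetic behind the headline number can be written out. The Python sketch below uses only values quoted in this article and treats $1.68 as the hourly cost of the single server instance, as the cost notes suggest. Client instances are not included, so this is a rough illustration rather than a full cost model.

```python
PRICE_PER_HOUR_USD = 1.68     # hourly price quoted in the article
TPS = 1_000_000               # throughput reached on one c3.8xlarge

tx_per_hour = TPS * 3600
tx_per_dollar = tx_per_hour / PRICE_PER_HOUR_USD
usd_per_billion_tx = 1e9 / tx_per_dollar

# Scaling across ENIs, from the figures in approach 3:
SINGLE_NIC_PEAK_TPS = 250_000
FOUR_ENI_TPS = 960_000
eni_efficiency = FOUR_ENI_TPS / (4 * SINGLE_NIC_PEAK_TPS)

print(f"transactions per hour:   {tx_per_hour:,}")
print(f"transactions per dollar: {tx_per_dollar:,.0f}")
print(f"USD per billion reads:   {usd_per_billion_tx:.2f}")
print(f"4-ENI scaling efficiency: {eni_efficiency:.0%}")
```

The scaling efficiency line shows that four interfaces delivered close to four times the single-NIC peak, which is why the single-core interrupt bottleneck mattered so much.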


** DI = On-Demand Instance

** RI = Reserved Instance

* All costs are for a single instance of the given type. The 3-year and 1-year Reserved Instance figures assume heavy utilization (100% usage).

NB: Instances sometimes perform better or worse than shown. These numbers come from more than a month of runs.

* Core bottleneck: We tried a variety of NIC and RPS combinations, but typically saw a large amount of %hi with one core saturated, while peak overall CPU utilization was only 50%.

Summary

The experiments above show that Amazon EC2 can already serve as a platform for high-performance databases: using Aerospike on AWS, you can get one million TPS for $1.68 per hour. The steps are:

1. Use the AWS Marketplace to set up an Aerospike HVM c3.8xlarge instance in a VPC as the server.

2. Attach 4 additional ENIs to the server instance. No additional Elastic IPs are needed.

3. Run afterburner.sh (cd /opt/aerospike/bin; sudo ./afterburner.sh) to optimize the number of Aerospike threads.

4. Use the AWS Marketplace to set up 5 Aerospike c3.2xlarge instances as clients, using the same HVM AMI and VPC, and placed in the same placement group as the server.

5. Make sure the security group has TCP ports 3000-3003 open for communication between nodes, and TCP port 8081 open to the Internet in order to use AMC.

6. Allocate 54 GB of RAM to the in-memory namespace (c3.8xlarge has 60 GB).

7. Start the Aerospike server.

8. Load the data through the Java benchmark client:

    cd <java client>/benchmarks
    ./run_benchmarks -z 40 -n test -w I -o S:10 -b 10 -l 23 -k 10000000 -latency 5,1 -h YOUR_AWS_INTERNAL_IP

9. Run the 100% read load against the server, with each client assigned to a private IP on a separate ENI:

    cd <java client>/benchmarks
    ./run_benchmarks -z 40 -n test -w RU,100 -o S:10 -b 10 -l 23 -k 10000000 -latency 5,1 -h YOUR_AWS_INTERNAL_IP

10. With 5 clients, you can get one million TPS.
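Because each client has to target its own interface on the server, it can help to generate the per-client command lines rather than type them by hand. The Python sketch below assembles the read-load command with the flags quoted above, once per server IP. The IP addresses are hypothetical placeholders (one for the primary interface plus one for each of the 4 additional ENIs), and benchmark_command is a helper name invented for this example. The flags themselves are copied from the article and not checked against the benchmark tool's documentation.

```python
import shlex


def benchmark_command(host: str, workload: str) -> list:
    """Assemble the run_benchmarks argument list with the flags quoted in the article."""
    return [
        "./run_benchmarks",
        "-z", "40", "-n", "test", "-w", workload,
        "-o", "S:10", "-b", "10", "-l", "23",
        "-k", "10000000", "-latency", "5,1",
        "-h", host,
    ]


# Hypothetical private IPs of the server's interfaces; replace with real ones.
SERVER_IPS = ["10.0.0.10", "10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]

commands = [benchmark_command(ip, "RU,100") for ip in SERVER_IPS]

for client_id, cmd in enumerate(commands, start=1):
    print(f"client {client_id}: {shlex.join(cmd)}")
```

Each printed line is meant to be run from the benchmarks directory on a different client instance, so that the five clients spread their traffic over five server interfaces.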

This is only the first step. In the next article we will go on to evaluate the price/performance of 4 Amazon instance types, using a 4-node in-memory Aerospike cluster and 5 different read/write workloads.


Original link: http://highscalability.com/blog/2014/8/18/1-aerospike-server-x-1-amazon-ec2-instance-1-million-tps-for.html


(Translator: Shirongyang; Editor: yuping)
