Research on high-concurrency, large-capacity NoSQL solutions

In the era of big data, companies demand more and more from their DBAs. At the same time NoSQL, as a technology that has matured in recent years, has been receiving more and more attention. This article is based on the DBA work and big data operations experience of Mr. Meng Xianhao, and covers two main topics: first, how the company's KV storage architecture evolved and the problems that had to be solved along the way; second, some thoughts on how to choose a NoSQL product and where NoSQL is heading.

According to official statistics, as of April 20, 2018 there were 225 NoSQL solutions. Any single company uses only a very small subset of them; the products marked in blue on the original slide are the ones we currently use.

The origins of NoSQL

The first general-purpose computer was born in 1946, but it was not until the RDBMS appeared around 1970 that people had a common data storage solution. In the 21st century, the DT (data technology) era made data capacity the toughest problem, and Google and Amazon proposed their own NoSQL solutions, for example Google's BigTable in 2006. At a technical conference in 2009 the term "NoSQL" was formally introduced, and there are now 225 such solutions.

NoSQL differs from the RDBMS mainly in two ways. First, it is schemaless, which allows flexible schema changes. Second, scalability: a native RDBMS is only suited to a single machine or a small cluster, whereas NoSQL was distributed from the outset and addresses both read/write scalability and capacity scalability. These two points are also the root cause of NoSQL's rise.

There are two main ways to achieve distribution: replication and sharding. Replication can address read scalability and HA (high availability), but it cannot address write scalability or capacity. Sharding can address the scaling of reads, writes, and capacity. A typical NoSQL solution combines the two.

Sharding mainly solves the problem of partitioning the data, and is usually based either on range partitioning (such as HBase's rowkey-based regions) or on hash partitioning. To solve the monotonicity and balance problems of hash partitioning, the industry mainly uses virtual nodes; Codis, described below, also uses virtual nodes (slots). A virtual node establishes a layer of virtual mapping between the data shards and the servers that manage them.
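To make the virtual-node idea concrete, here is a minimal sketch in Python (our own illustration, not code from the talk); the server names and the number of virtual nodes per server are arbitrary:

    import bisect
    import hashlib

    class VirtualNodeRing:
        """Hash ring that maps keys to servers through a layer of virtual nodes."""

        def __init__(self, servers, vnodes_per_server=128):
            self._ring = []  # sorted list of (hash, server) pairs
            for server in servers:
                for i in range(vnodes_per_server):
                    h = self._hash(f"{server}#vnode-{i}")
                    self._ring.append((h, server))
            self._ring.sort()
            self._hashes = [h for h, _ in self._ring]

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def get_server(self, key):
            """Route a key to the server owning the next virtual node on the ring."""
            h = self._hash(key)
            idx = bisect.bisect(self._hashes, h) % len(self._ring)
            return self._ring[idx][1]

    ring = VirtualNodeRing(["redis-01:6379", "redis-02:6379", "redis-03:6379"])
    print(ring.get_server("user:10086"))

Because each physical server owns many small virtual slices of the ring, adding or removing a server only remaps the keys on its own slices, which is exactly the balance property the virtual-node layer provides.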

At present, NoSQL products are mainly classified by their data model and access pattern.

A few commonly used NoSQL solutions

Our Redis system has grown to a considerable scale. Here are some of the problems we encountered while operating it.

The first topic is how the technical architecture evolved. The company provides message push services for app developers. Before 2012 the business volume was relatively small, and we used Redis as a cache with MySQL for persistence. From 2012 to 2016 the business grew rapidly and a single node could no longer cope. Since MySQL could not deliver the required QPS and TPS, we developed our own Redis sharding scheme. In addition, we wrote our own Redis client to provide basic clustering capabilities, custom read/write ratios, monitoring and isolation of failed nodes, slow-query monitoring, and health checks for every node. However, this architecture paid little attention to operational efficiency and lacked operations tooling.

Just as we were planning to improve our operations tooling, the Wandoujia (Pea Pod) team open-sourced Codis, which gave us a good option.

The advantages of Codis and our Codis+ enhancements

Codis is a proxy-based architecture that supports native clients, provides a web interface for cluster operations and monitoring, and also integrates Redis Sentinel. It improved our operational efficiency and made HA much easier to put in place.

But in the course of using it we also found some limitations, so we proposed Codis+, a set of enhancements on top of Codis.

First, a 2N+1 replica scheme, to remove the master as a single point during failures.

Second, Redis quasi-semi-synchronous replication: set a threshold so that, for example, a slave is readable only while it is within 5 seconds of the master (a sketch of this idea follows this list).

Third, resource pooling: the ability to expand capacity simply by adding machines, the way HBase scales by adding RegionServers.

In addition, it adds rack awareness and cross-IDC features. Redis itself was designed for a single machine room and does not take these issues into account.
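The quasi-semi-synchronous read above can be sketched as follows (our own illustration, not Codis+ code): before serving a read from a slave, check its replication state via INFO replication and fall back to the master when the slave is too far behind. The 5-second threshold comes from the example above; the hosts, ports, and key are placeholders.

    import redis

    MAX_LAG_SECONDS = 5  # threshold from the example above

    master = redis.Redis(host="redis-master", port=6379)
    slave = redis.Redis(host="redis-slave", port=6379)

    def read_key(key):
        """Read from the slave only if its replication link is healthy and fresh."""
        info = slave.info("replication")
        # Seconds since the slave last heard from its master.
        lag = info.get("master_last_io_seconds_ago", MAX_LAG_SECONDS + 1)
        link_up = info.get("master_link_status") == "up"
        if link_up and lag <= MAX_LAG_SECONDS:
            return slave.get(key)
        return master.get(key)  # fall back to the master when the slave lags

    print(read_key("user:10086"))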

So, why don't we use native Redis Cluster? There are three reasons. First, in native Redis Cluster the routing/forwarding function and the actual data management function live in the same component, so a problem in one can lead to data problems. Second, in a large cluster, the peer-to-peer architecture takes a long time to converge to a consistent state, whereas Codis's tree-shaped architecture does not have this problem. Third, Redis Cluster had not yet been endorsed by any large platform.

Beyond Redis, we have recently been evaluating a new NoSQL solution, Aerospike, which we position as a replacement for part of our Redis clusters. The problem with Redis is that the data lives in memory, which is expensive; we expect Aerospike to reduce the TCO. Aerospike has the following characteristics:

First, Aerospike can keep data in memory or on SSD, and it is optimized for SSDs.

Second, resource pooling, which keeps driving operations costs down.

Third, it supports rack awareness and cross-IDC synchronization, although these are enterprise-edition features.

At present two of our internal businesses are running on Aerospike. Measured on a single physical machine with a single Intel 4600-series SSD, it can reach close to 100K QPS. For businesses with large capacity but modest QPS requirements, Aerospike is a good way to save TCO.
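For reference, here is a minimal sketch of reading and writing Aerospike from Python with the official aerospike client; the host address, namespace, and set name are placeholders, not values from our deployment:

    import aerospike

    # Placeholder cluster address; adjust to your environment.
    config = {"hosts": [("127.0.0.1", 3000)]}
    client = aerospike.client(config).connect()

    # An Aerospike key is (namespace, set, primary key).
    key = ("test", "users", "user:10086")

    # Write a record (a dict of bins), then read it back.
    client.put(key, {"name": "alice", "score": 42})
    _, _, bins = client.get(key)
    print(bins)

    client.close()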

In the course of this NoSQL evolution, we also ran into a number of operations and maintenance problems.

Standardized installation

We standardized three things: the OS, Redis files and directories, and Redis parameters, all implemented with SaltStack + CMDB.

Capacity expansion and reduction

As the technical architecture evolved, expanding and shrinking capacity became progressively easier, partly because Codis alleviated some of the pain. Of course, if you choose Aerospike, these operations become very easy.

Monitoring and reducing operations cost

Most operations engineers should read "Site Reliability Engineering: How Google Runs Production Systems" carefully; it offers a great deal of valuable methodology at both the theoretical and the practical level. Strongly recommended.

The complexity of our Redis monitoring

Three cluster architectures: our self-developed sharding, Codis 2, and Codis 3. These three architectures collect data in different ways.

Three types of monitored objects: cluster, instance, and host. Metadata is needed to maintain the logical relationships among them and to aggregate metrics globally.

Three kinds of personalized configuration: among our Redis clusters, some need multiple replicas and some do not; some nodes may run as a full cache and some may not; persistence strategies also differ, with some clusters not persisting at all, some persisting, and some persisting plus off-site backup. These business characteristics place high demands on the flexibility of our monitoring.

Zabbix is a very complete monitoring system, and for about three years I used it as the main monitoring platform. But it has two drawbacks: first, it uses MySQL as its backend storage, so its TPS has an upper limit; second, it is not flexible enough. For example, aggregating metrics for a cluster spread across 100 machines is very difficult.

Xiaomi's Open-Falcon solves this problem, but it introduces some new ones: its alerting features are limited, it does not support strings, and it sometimes requires extra manual work. We later added the missing functionality ourselves and have not hit major problems since.

The following is the operations platform we built.

The first part is the IT hardware resource platform, which maintains physical information at the host level: which rack a host is in, which switch it connects to, which floor of which machine room it sits on, and so on. This is the foundation for rack awareness, cross-IDC features, and the like.

The second part is the CMDB, which maintains software information about the hosts: which instances are installed on a host, which clusters those instances belong to, which ports are used, and what personalized parameters each cluster has, including different alerting rules. All of this is handled through the CMDB. The CMDB's data consumers include the Grafana monitoring system and the metrics collector we developed ourselves, which keeps the CMDB data alive: data that is only static and never consumed inevitably drifts out of sync with reality.

The Grafana monitoring system aggregates data from multiple IDCs, so we only need to glance at the big dashboard every day.

SaltStack is used for automated releases, standardization, and improving productivity.

The collection program was developed in-house and is highly customized to the characteristics of our business. We also use ELK (with Filebeat instead of Logstash) as the log center.

With these pieces, we built the company's entire monitoring system.

Here are some of the pitfalls we ran into along the way.

First, a full master-slave resynchronization (resync) makes the pressure on the master node spike until it can no longer serve requests.

There are many reasons a full master-slave resync can be triggered.

A low Redis version makes a resync more likely: the probability is significantly lower in Redis 3 than in Redis 2, and Redis 4 supports partial resynchronization after a node restart, which is a big improvement in Redis itself.

We are currently mainly on 2.8.20, which is relatively prone to full resyncs.

A full master-slave resync is typically triggered by one of the following conditions (a configuration sketch follows the list).

1. repl-backlog-size is too small. The default is 1 MB; with a large write volume it is easy to overrun this buffer.
2. repl-timeout. By default the Redis master and slave ping each other every 10 seconds, and if no ping gets through for 60 seconds a full resync is triggered. The cause may be network jitter, or a master under so much pressure that it cannot answer the ping.
3. tcp-backlog. The default is 511, but the operating system caps it at 128 by default. This can be raised moderately; we raised it to 2048, which gives some tolerance for packet loss on the network.
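As an illustration (our own sketch, not configuration from the talk), the first two parameters can be raised at runtime with CONFIG SET; the concrete values below are assumed examples, not numbers given in the article:

    import redis

    r = redis.Redis(host="redis-master", port=6379)

    # Enlarge the replication backlog so bursts of writes do not overrun it
    # (64 MB is an assumed example value).
    r.config_set("repl-backlog-size", 64 * 1024 * 1024)

    # Give the master/slave ping more headroom before a resync is triggered
    # (120 seconds is an assumed example value).
    r.config_set("repl-timeout", 120)

    # tcp-backlog can only be set in redis.conf at startup, and the OS limit
    # (net.core.somaxconn) must be raised as well, e.g. to 2048 as in the article.
    print(r.config_get("repl-backlog-size"), r.config_get("repl-timeout"))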

Those are the causes of a full resync; the consequences are very serious. The master's pressure spikes and it stops serving, so from the business's point of view the node is unavailable, and response times rise for every node on the host where that master resides.

Second, nodes that are too large, which is partly a man-made problem: nodes were split too slowly, far slower than the growth of the business. Another cause is having too few shards. Our self-developed scheme has 500 shards, Codis has 1024, and native Redis Cluster has 16,384; too few shards is a problem in itself. If you build your own distributed solution, make the shard count a bit larger than you think you need, to allow for the business growing beyond your expectations. When a node is too large, persistence takes longer. For a 30 GB node to persist, the host needs more than 30 GB of free memory; otherwise swap is used and the persistence time grows dramatically, up to 4 hours for a 30 GB node. Such high load can in turn trigger a full resync, causing a ripple effect.


Having listed the pitfalls, let's share a few practical cases.

The first case is a full master-slave resync. It happened two days before Spring Festival, which is a peak period for the message push business. Briefly reconstructing the failure: first, large-scale pushes drove the load up; then the pressure on the Redis master increased, TCP packets backed up, the OS started dropping packets, the master-slave ping packets were lost, the 60-second repl-timeout threshold was hit, and a full resync started. At the same time, because the node was too large, swap kicked in and I/O saturation approached 100%. The remedy was simple: we temporarily disconnected the slave from the master. The root cause was unreasonable parameters, mostly left at their defaults, with the oversized node amplifying the impact of the failure.

The second case is a problem we recently ran into with Codis, a typical failure scenario. A host died and Codis performed the master-slave switchover; after the switchover the business was not affected, but when we went to re-attach a slave it would not connect, and every attempt reported an error. The error was not hard to track down: a parameter was set too small, again a default value. While the slave is pulling the full dataset from the master, new writes pile up in the master's output buffer for that slave; if the slave has not finished syncing by the time the buffer exceeds its limit, the master drops the link, the resync starts over, and it loops forever.
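The buffer involved is governed by Redis's client-output-buffer-limit for replica connections. As an illustration (the limits below are assumed examples, not the values we actually used), it can be raised at runtime so the buffer can absorb the writes that arrive during a full sync:

    import redis

    r = redis.Redis(host="redis-master", port=6379)

    # Allow the per-replica output buffer to grow to 1 GB, and only cut the link
    # if it stays above 512 MB for 120 seconds (all three values are examples).
    r.config_set("client-output-buffer-limit",
                 "slave 1073741824 536870912 120")

    print(r.config_get("client-output-buffer-limit"))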


Based on these cases, we compiled a set of best practices.

First, configure CPU affinity. Redis is single-threaded, and competing with other processes for CPU hurts its efficiency.

Second, keep the node size under 10 GB.

Third, keep the host's free memory larger than the largest node size plus 10 GB. A full resync needs roughly the same amount of memory again; if there is not enough and swap is used, the resync is unlikely to succeed.

Fourth, avoid swap as far as possible. It is better for the node to go down than to take 500 milliseconds to answer a request.

Fifth, moderately increase tcp-backlog, repl-backlog-size, and repl-timeout.

Sixth, the master does no persistence, and the slave does AOF plus scheduled AOF rewrites (a small sanity-check sketch based on these rules follows).
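A minimal sketch (our own, not tooling from the talk) that checks a few of these rules against a running instance; the 10 GB limit comes from the list above, while the host name and free-memory figure are placeholders:

    import redis

    MAX_NODE_SIZE = 10 * 1024 ** 3   # rule 2: keep nodes under 10 GB
    HEADROOM = 10 * 1024 ** 3        # rule 3: free memory > largest node + 10 GB

    def check_node(host, port=6379, host_free_memory=None):
        """Return a list of best-practice violations for one Redis node."""
        r = redis.Redis(host=host, port=port)
        info = r.info()
        problems = []

        used = info["used_memory"]
        if used > MAX_NODE_SIZE:
            problems.append(f"node uses {used / 1024 ** 3:.1f} GB, above the 10 GB limit")

        if host_free_memory is not None and host_free_memory < used + HEADROOM:
            problems.append("host free memory is below node size + 10 GB")

        # Rule 6: the master itself should not persist (AOF belongs on the slave).
        aof = r.config_get("appendonly").get("appendonly")
        if info["role"] == "master" and aof == "yes":
            problems.append("AOF is enabled on a master")

        return problems

    print(check_node("redis-01", host_free_memory=64 * 1024 ** 3))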

Finally, some personal thoughts and suggestions. Choose the NoSQL that suits you; there are five points in the selection principles:

1. Business characteristics. First understand your own data and workload: for example, a KV workload should be matched against KV stores, and likewise for the other data models. That alone narrows the candidate range by 70%-80%.

2. Load characteristics: QPS, TPS, and response time. When choosing a NoSQL solution, measure how much performance a single machine can deliver under a given configuration; for example, a single Redis node can comfortably handle 40K-50K QPS when the host has enough free memory.

3. Data size. The larger the data, the more issues you have to consider and the fewer choices you have. At hundreds of terabytes or petabytes, there is little choice besides the Hadoop ecosystem.

4. Operations cost: whether it can be monitored, and whether capacity can be expanded and reduced easily.

5. Other factors: whether there are successful cases, whether documentation and the community are mature, and whether there is official or corporate support. It is far easier to walk a road that others have already smoothed out; after all, the cost of stepping into a pit yourself is quite high.

Conclusion: about the definition of NoSQL there is a joke on the web: from "Know SQL" in 1980, to "Not only SQL" in 2005, to today's "No SQL!" The development of the Internet is accompanied by the renewal of technical concepts and the improvement of the related capabilities. And behind the technical progress lies every engineer's continuous learning, careful thinking, and unremitting attempts.
