Design and implementation of Codis part 3


Performance, HA (high availability), operations, etc.

"For a well-designed distributed system, single-machine performance is never the problem." -- me

As the first article explained, Codis uses a proxy-based design, so some loss of single-machine performance is unavoidable. In our tests, without pipelining, the loss is roughly 40%. But Redis itself is frighteningly fast, so even after losing 40% the absolute numbers are still large.
Another advantage is that Codis can make full use of multiple cores (thanks to Golang). With multi-threaded clients it is not limited the way Twemproxy is, which can saturate at most one CPU. (You can of course deploy multiple Twemproxy instances, but that does raise the cost of operations.) And don't forget that Codis scales out by running multiple proxies in parallel, which multiplies throughput. One machine's CPU is saturated? Fine, start a proxy on another machine. One Redis instance is saturated? No problem, migrate part of its data to another Redis instance. So we never think single-machine performance says much. The biggest benefit of Codis is elastic scaling for your cache, not squeezing the last bit of performance out of the underlying Redis. This is why we chose Go instead of C.

We ran detailed performance tests (results: Benchmark). With 2 proxies, a single machine can reach 200,000 QPS.

HA (Redis HA, Proxy HA)

Redis HA is something I was very torn about. Anyone who has used Codis carefully will have noticed that when the master of a server group dies, none of its slaves is promoted to master automatically, even if the group has several. The feature would not be hard to implement, but I think this situation should be made explicit to the administrator and handled manually. If we switched to a slave automatically, data that the old master had not yet replicated to the slave could conflict, and if the old master then came back, resolving those conflicts would be painful. Rather than automate this, it is better to return a failure to the client and let the administrator handle it. Only the slots served by that machine fail, and with enough instances there is no fatal single point of failure.
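To make the "only that group's slots fail" point concrete, here is a minimal sketch in Go. It assumes the 1024-slot layout with `crc32(key) % 1024` that the Codis documentation describes; `Router`, `ErrGroupDown`, and `newRouter` are hypothetical names for illustration, not the proxy's real types.

```go
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
)

// numSlots is an assumption of this sketch: a fixed 1024-slot key space,
// with slot = crc32(key) % 1024.
const numSlots = 1024

// ErrGroupDown is a hypothetical error returned while a group has no master.
var ErrGroupDown = errors.New("server group has no live master")

// Router is a toy routing table: slot -> group ID, plus master health.
type Router struct {
	slotToGroup [numSlots]int
	masterAlive map[int]bool
}

func slotOf(key string) int {
	return int(crc32.ChecksumIEEE([]byte(key)) % numSlots)
}

// Route returns the group that should serve key, or ErrGroupDown when that
// group's master is dead. Keys that map to other groups are unaffected.
func (r *Router) Route(key string) (int, error) {
	g := r.slotToGroup[slotOf(key)]
	if !r.masterAlive[g] {
		return g, ErrGroupDown
	}
	return g, nil
}

// newRouter spreads the slots evenly over the given number of groups.
func newRouter(groups int) *Router {
	r := &Router{masterAlive: make(map[int]bool)}
	for s := 0; s < numSlots; s++ {
		r.slotToGroup[s] = s % groups
	}
	for g := 0; g < groups; g++ {
		r.masterAlive[g] = true
	}
	return r
}

func main() {
	r := newRouter(4)
	r.masterAlive[2] = false // the master of group 2 dies; nothing is promoted

	failed := 0
	for s := 0; s < numSlots; s++ {
		if !r.masterAlive[r.slotToGroup[s]] {
			failed++
		}
	}
	fmt.Printf("%d of %d slots fail until an administrator steps in\n", failed, numSlots)
}
```

With more groups, each dead master takes a smaller share of the key space with it, which is the trade-off the paragraph above argues for.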

Proxy HA comes for free once you choose a stateless proxy design, so there is little to say here. There are many ways to use it: smart DNS, HAProxy, or a client-side proxy connection pool driven by ZooKeeper.
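As a rough illustration of the client-side pool option, the sketch below keeps a list of live proxies and hands them out round-robin. `ProxyPool` and its methods are hypothetical; in a real client the `Set` call would be driven by a ZooKeeper watch on the proxy list, which is left out here, and the addresses are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ProxyPool is a hypothetical client-side pool over stateless proxies.
// Because the proxies hold no state, any live one can serve any request.
type ProxyPool struct {
	mu    sync.Mutex
	addrs []string
	next  int
}

// Set replaces the list of live proxies, e.g. after the watched proxy list
// changes. The slice is copied so the caller cannot mutate pool state.
func (p *ProxyPool) Set(addrs []string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.addrs = append([]string(nil), addrs...)
	p.next = 0
}

// Pick returns the live proxies in round-robin order.
func (p *ProxyPool) Pick() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.addrs) == 0 {
		return "", errors.New("no live proxy")
	}
	addr := p.addrs[p.next%len(p.addrs)]
	p.next++
	return addr, nil
}

func main() {
	pool := &ProxyPool{}
	pool.Set([]string{"10.0.0.1:19000", "10.0.0.2:19000"})
	a, _ := pool.Pick()
	b, _ := pool.Pick()
	fmt.Println("balanced across:", a, b)

	// One proxy goes away; clients simply stop picking it.
	pool.Set([]string{"10.0.0.2:19000"})
	c, _ := pool.Pick()
	fmt.Println("after failover:", c)
}
```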

On the operations side, almost every Codis operation is initiated through codis-config. Before any operation, codis-config takes a lock in ZooKeeper to guarantee that only one operating instance is active, which protects the routing table from being corrupted by concurrent changes. This matters most for sensitive operations such as migration: multiple slots must never be in the migrating state at the same time, so the lock is held for the entire migration.

So what if, while a slot is being migrated, I forcibly kill codis-config (kill -9) so that it never releases the migration lock? Will that cause a deadlock?

The answer is: No.

Why? First, interrupting a migration at any stage is harmless, because at the Redis level each step migrates a single key atomically. Killing codis-config only stops the migration commands from being sent. The slot is left partially migrated, and in ZooKeeper it stays marked as migrating indefinitely, because with codis-config dead nobody sends migration commands and nobody updates the slot state on completion. If a client request arrives in the meantime, the proxy first sends a migratekey command of its own to force that key across, so clients are not affected.
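The proxy behavior described above can be sketched as follows. This is a toy model, not the Codis proxy: `Store` stands in for a Redis instance, and `Proxy`, `migrateKey`, and `Get` are hypothetical names.

```go
package main

import "fmt"

// Store is a toy key-value backend standing in for one Redis instance.
type Store map[string]string

// Proxy holds just enough state for one slot that is being migrated from
// src to dst.
type Proxy struct {
	migrating bool
	src, dst  Store
}

// migrateKey moves a single key from src to dst. In Redis this step is
// atomic per key; here it is a plain map move.
func (p *Proxy) migrateKey(key string) {
	if v, ok := p.src[key]; ok {
		p.dst[key] = v
		delete(p.src, key)
	}
}

// Get serves a read. While the slot is migrating, the key is first forced
// over to the destination, so the answer does not depend on how far the
// background migration got before it was interrupted.
func (p *Proxy) Get(key string) (string, bool) {
	if p.migrating {
		p.migrateKey(key)
		v, ok := p.dst[key]
		return v, ok
	}
	v, ok := p.src[key]
	return v, ok
}

func main() {
	p := &Proxy{
		migrating: true,
		src:       Store{"a": "1", "b": "2"},
		dst:       Store{},
	}
	p.migrateKey("a") // the background migration moves one key, then is killed

	for _, k := range []string{"a", "b"} {
		v, ok := p.Get(k)
		fmt.Println(k, v, ok)
	}
}
```

Both keys are readable even though the background migration stopped halfway, because each request finishes the move for the key it touches.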

codis-config has one more feature. Every time it starts, it registers an ephemeral node in ZooKeeper that records its PID and machine name, and every lock it takes is signed with that machine name and PID. On startup it also scans all locks that have not been released; if the ephemeral node of a lock's owning process no longer exists, the lock is released directly, which avoids the deadlock.
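The stale-lock cleanup can be sketched with an in-memory stand-in for ZooKeeper. Everything here is an assumption made for illustration: `Coordinator` replaces the real ZooKeeper client, the `live` map plays the role of ephemeral nodes, and `Crash` simulates the session expiry that follows a kill -9.

```go
package main

import "fmt"

// Owner identifies a codis-config process the way the text describes:
// machine name plus PID.
type Owner struct {
	Host string
	PID  int
}

// Coordinator is an in-memory stand-in for ZooKeeper. live plays the role
// of ephemeral nodes; locks maps a lock name to the owner that signed it.
type Coordinator struct {
	live  map[Owner]bool
	locks map[string]Owner
}

func NewCoordinator() *Coordinator {
	return &Coordinator{live: map[Owner]bool{}, locks: map[string]Owner{}}
}

// Register creates the "ephemeral node" for a process.
func (c *Coordinator) Register(o Owner) { c.live[o] = true }

// Crash removes the node, as happens when a killed process's session expires.
func (c *Coordinator) Crash(o Owner) { delete(c.live, o) }

// TryLock acquires the named lock for o, failing if it is already held.
func (c *Coordinator) TryLock(name string, o Owner) bool {
	if _, held := c.locks[name]; held {
		return false
	}
	c.locks[name] = o
	return true
}

// ReleaseStale drops every lock whose owner no longer has a live node and
// returns how many were released. It is meant to run on every startup.
func (c *Coordinator) ReleaseStale() int {
	n := 0
	for name, o := range c.locks {
		if !c.live[o] {
			delete(c.locks, name)
			n++
		}
	}
	return n
}

func main() {
	zk := NewCoordinator()
	old := Owner{Host: "host-a", PID: 1234}
	zk.Register(old)
	zk.TryLock("migrate", old)
	zk.Crash(old) // kill -9: the lock is never released explicitly

	fresh := Owner{Host: "host-a", PID: 5678}
	zk.Register(fresh)
	fmt.Println("blocked before cleanup:", !zk.TryLock("migrate", fresh))
	fmt.Println("stale locks released:", zk.ReleaseStale())
	fmt.Println("acquired after cleanup:", zk.TryLock("migrate", fresh))
}
```

Signing the lock with the owner is what makes the cleanup safe: a lock is only dropped when its specific owner is known to be gone.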

The only catch is that before the next new migration task you must first finish the half-migrated slot by submitting a migration task whose from and to slots are both that slot's ID and whose new group ID is the group ID of the unfinished task. This returns the system to a clean state before the next new migration starts. It is a deliberately imposed restriction.
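A minimal sketch of that restriction as a validation step, assuming a task is described by a from slot, a to slot, and a target group. `Task`, `Pending`, and `Validate` are hypothetical names, and the `From > To` check is an extra sanity check added for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Task is a hypothetical migration request: move slots From..To to Group.
type Task struct {
	From, To int
	Group    int
}

// Pending describes a slot left in the migrating state by an interrupted run.
type Pending struct {
	Slot  int
	Group int
}

// Validate enforces the restriction from the text: while a slot is still
// marked as migrating, the only task accepted is one that finishes it.
func Validate(t Task, pending *Pending) error {
	if t.From > t.To {
		return errors.New("from must not exceed to")
	}
	if pending == nil {
		return nil // clean state: any well-formed task is fine
	}
	if t.From != pending.Slot || t.To != pending.Slot || t.Group != pending.Group {
		return fmt.Errorf("slot %d is still migrating to group %d; finish it first",
			pending.Slot, pending.Group)
	}
	return nil
}

func main() {
	pending := &Pending{Slot: 7, Group: 3}

	// A new, unrelated task is rejected while slot 7 is half-migrated.
	fmt.Println(Validate(Task{From: 0, To: 10, Group: 2}, pending))

	// The cleanup task (from == to == 7, same group) is accepted.
	fmt.Println(Validate(Task{From: 7, To: 7, Group: 3}, pending))
}
```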

While using Codis in production, we developed a tool called redis-port for some particularly lazy business teams (too lazy to rebuild their caches). It acts as a fake slave attached to an existing Redis, syncs the master's data over, and writes it into the Codis cluster. The business side does not need to rebuild its cache at all: once the sync finishes, they change the address and restart the service. This was a killer feature that made promoting the project inside the company much easier. :P

Codis also differs from other backend middleware in one more way: besides a complete Unix CLI, it has a cool dashboard and a complete RESTful API! In production we found the dashboard safer and more convenient, with fewer operator mistakes and a clear real-time view of the system state. These days you'd be embarrassed to open-source something without a good-looking dashboard ... (OK, really I had just learned AngularJS and wanted to practice ... but the end result is good.)

That wraps up the three articles. If you have questions, post an issue on GitHub, or contact me on Weibo at @Dongxu_Huang or @goroutine; email works too. If you are interested in the Wandoujia (Pea Pod) infrastructure team, you are welcome to send your resume. We are heavy Golang users :) and open source fanatics, and we would be glad to have the chance to work with you. Email: huangdongxu1987@gmail.com.

The authors of Codis:

@Dongxu_huang (aka c4pt0r): codis-config, distributed protocol implementation, data migration and auto rebalancer, dashboard, unit tests ...
@goroutine (aka Ngaut): codis-proxy, Redis protocol parsing, router, zkhelper, unit tests ...
@spinlock9: Redis patch, redis-port, benchmark, tests ...
