This column, reposted from haier_jiang, describes a distributed cache implementation based on Redis.

Source: Internet
Author: User

In brief, this article is a summary of my recent work. I hope it is helpful, and it serves as a small record of what I have learned.

 

I. Why Redis?

 

Redis, rather than Memcached, is used as the cache in this project, for two main reasons:

1. Redis supports rich data structures: its hashes, lists, sets, and feature-rich strings are very useful in real projects. (See the official site, redis.io.)

2. Single-node Redis performance is also very good (tests with this project's data showed it outperforming Memcached).

Based on these considerations, Redis was chosen as the cache.

II. Architecture Design of the Distributed Cache

1. Architecture Design

Because a single Redis instance is a single point of failure, the project must handle distribution across multiple instances itself. The basic architecture diagram is as follows:

 

 

2. Distributed implementation

Keys are distributed across Redis nodes through consistent hashing of the key.

The consistent hash implementation:

- Hash calculation: both MD5 and MurmurHash are supported. MurmurHash is the default because its hash computation is fast.

- Ring construction: a Java TreeMap simulates the ring structure to achieve an even distribution of keys across nodes.
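The two points above can be sketched as follows: a TreeMap maps hash positions on the ring to node names, and a key is served by the first node position clockwise from its hash. This is a minimal illustration, not the project's code: MD5 (shipped with the JDK) stands in for the default MurmurHash, and the virtual-node count is an assumed tuning parameter used to even out the distribution.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Ring of virtual-node positions -> physical node name.
class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Place `virtualNodes` points on the ring for this physical node.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // Remove all of this node's points, e.g. when monitoring flags it as down.
    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Walk clockwise: first position >= hash(key), wrapping to the ring start.
    String getNode(String key) {
        if (ring.isEmpty()) return null;
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // MD5-based 64-bit hash; the project defaults to MurmurHash for speed,
    // MD5 is used here only because it ships with the JDK.
    private long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With virtual nodes, removing a failed node only remaps the keys that were on it; all other keys keep their assignments, which is the point of using consistent hashing here.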

3. Client Selection

Jedis is used as the client, with its sharding module modified to support partitioning based on a bufferkey and per-node shard information. Different ShardInfo instances can be initialized, and the underlying JedisPool was modified so that the connection pool supports constructing cache keys and values. Different ShardInfo instances create different Jedis connection clients, which gives the application layer a transparent partitioning effect when it calls the cache.
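A minimal, hypothetical sketch of that partitioning idea: route a key to a dedicated node group by its bufferkey prefix, falling back to a default group otherwise. All class and method names here are illustrative; the real implementation lives inside the modified Jedis sharding classes described above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative router: bufferkey prefix -> dedicated node group.
class PrefixShardRouter {
    private final Map<String, List<String>> groups = new HashMap<>();
    private final List<String> defaultGroup;

    PrefixShardRouter(List<String> defaultGroup) {
        this.defaultGroup = defaultGroup;
    }

    // Dedicate a node group to keys carrying this prefix,
    // e.g. important cache data on its own Redis nodes.
    void registerGroup(String prefix, List<String> nodes) {
        groups.put(prefix, nodes);
    }

    // Pick the node group by key prefix, then a node within the group.
    String route(String key) {
        List<String> nodes = defaultGroup;
        for (Map.Entry<String, List<String>> e : groups.entrySet()) {
            if (key.startsWith(e.getKey())) {
                nodes = e.getValue();
                break;
            }
        }
        // A simple stable hash stands in here; the real implementation
        // would pick within the group via the consistent-hash ring.
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

In the actual client this routing would sit in front of the Jedis connection pool, so each group resolves to its own set of ShardInfo-backed connections.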

4. Module description

- The dirty-data module handles cache operations that fail to execute.

- The failure-monitoring module watches for exceptions in Jedis operations; when a node misbehaves, it can trigger control actions such as removing that Redis node from rotation.

The distributed module as a whole uses HornetQ to propagate removal of abnormal Redis nodes. New nodes can also be added through a reload method. (Adding new nodes is easy to implement in this module.)

This distributed architecture meets the project's needs. In addition, dedicated Redis nodes can be set aside for important cache data, giving that data a specific priority. The cache interface design also covers the requirements, with both basic interfaces and some special-purpose logical interfaces. CAS-style operations and some transactional operations can be implemented through Redis's WATCH mechanism. (See my earlier introduction to Redis transactions.)

 

That concludes the introduction to the Redis-based distributed architecture. However, reads and writes in the application share the same path: writes flush or update the cache after the application operation, which creates a degree of coupling. To enable read/write splitting and reduce the coupling between the cache module and the application, the MySQL binlog is used to refresh the cache. The following sections analyze binlog-based refreshing and its caveats.

 

III. Feasibility Analysis of Refreshing the Cache from the Binlog

 

1. For an introduction to the MySQL log formats, see my earlier article.

2. The mixed log format records the SQL statements of database operations. Problems with this format:

- SQL statements that end up updating nothing (for example, because the WHERE condition matches no rows) are still recorded in the binlog.

- The recorded SQL statements may not capture all of the actual updates.

- In some distributed databases, a statement whose WHERE condition is not on the sharding key may expand into multiple SQL statements, depending on the design.

For these reasons, parsing the binlog in mixed format is not feasible. (The official documentation states that "failed statements are not logged", but that covers only errors such as syntax errors; statements whose update conditions match no rows are still logged.)

3. Use the row log format

In this format every changed row has a corresponding record, which makes parsing and collecting the data very convenient; only with this format can a binlog-based cache-refresh design be built. However, this format also has issues:

- Check whether the project performs large batch update operations. In row format, each modified row of a batch operation produces one log record, so the log volume generated by large batches, and whether the resulting I/O overhead is acceptable, must be considered.

After this analysis, the project still settled on refreshing the cache based on the row log format. One more issue must be considered: after the application layer updates the database, the binlog arrives with some delay. If the binlog-processing module runs normally, the latency is very low; at the millisecond level, users will not perceive it. But the binlog module has multiple points of failure, and when it misbehaves, latency will certainly grow, so the cache will certainly contain stale data for a time.

Still, with this scheme the data achieves eventual consistency; how to weigh that trade-off needs to be considered per project.

The analysis above is the basis for deciding whether to use the binlog to refresh cache data.

 

IV. Notes on Cache Refreshing Based on the Binlog

1. If you develop this in Java, you can use the open-source Tungsten API.

2. Binlog parsing follows the MySQL master/slave replication process: one thread pulls the log, and another thread parses it.

3. The design is split into a binlog-processing module and a cache-processing module connected by SqlEvent objects: the binlog processor parses the log into SqlEvents, and the cache refresher consumes and processes them. This is a simple producer-consumer pattern.

4. Multiple binlog-processing modules can run as a single point, or be managed with a coordination tool such as ZooKeeper, depending on requirements.

5. With a distributed cache, binlog-driven refreshes raise the problem of loading data on a cache miss. To reduce extra pressure on the database, the load can be performed when getting the cache data. Depending on requirements, accepting the extra database pressure of fully separated reads and writes is also feasible.

6. For data with high cache-consistency requirements, a version number can be used. This introduces some coupling at the application layer: a mark is written during the DB operation, the cache refresh carries a mark as well, and the get operation compares the two version numbers to achieve data consistency. (This relates to point 5: whether reads and writes are completely isolated, and how consistency is implemented.)
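The producer-consumer design from point 3 can be sketched as follows: a BlockingQueue sits between the binlog parser (producer) and the cache refresher (consumer). SqlEvent's fields and the in-memory map standing in for the Redis cluster are assumptions made for illustration, not the project's actual classes.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

// Minimal producer-consumer sketch: binlog parser -> queue -> cache refresher.
class BinlogCacheRefresher {
    // A parsed row event: which cache key changed, and its new value.
    static class SqlEvent {
        final String cacheKey;
        final String newValue; // null means the row was deleted

        SqlEvent(String cacheKey, String newValue) {
            this.cacheKey = cacheKey;
            this.newValue = newValue;
        }
    }

    private final BlockingQueue<SqlEvent> queue = new ArrayBlockingQueue<>(1024);
    // Stand-in for the Redis cluster; the real version would call the sharded client.
    final Map<String, String> cache = new ConcurrentHashMap<>();

    // Producer side: the binlog-parsing thread calls this for each row change.
    boolean publish(SqlEvent e) {
        return queue.offer(e); // non-blocking here; a real producer might block with put()
    }

    // Consumer side: drain events in order and refresh the cache.
    Thread startConsumer() {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    SqlEvent e = queue.take();
                    if (e.newValue == null) cache.remove(e.cacheKey);
                    else cache.put(e.cacheKey, e.newValue);
                }
            } catch (InterruptedException ignored) {
                // shutting down
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Because a single consumer drains the queue in FIFO order, cache updates are applied in binlog order, which is what keeps the cache eventually consistent with the database.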

V. Some Experiences

 

I have spent more than a year on Redis research and use, on implementing the distributed cache, and on the binlog-based modifications. I learned a lot during this time, and the above is a record of that part of the work. Many more problems came up during implementation.

Do the research and related work carefully, and make sure you understand the relevant details thoroughly. Otherwise, a small problem can make the entire solution unusable, or trigger even bigger problems in a chain reaction!

When I have time, I plan to write an article about Bloom filters and d-left Bloom filters; saying so here gives me more motivation to finish it. A d-left Bloom filter is implemented in the project, but I found no existing implementation online, so after some optimization I will post some notes about it on the blog.

 

