Optimization of counting scenarios

Source: Internet
Author: User
Tags mysql insert mysql update
Preface

How can we measure your influence on public opinion when social network information surges? Count!

Counting is the most convenient and simple data mining application of the application. Taking Facebook as an example, how many times your feed is commented, how many times it is forwarded, how many times it is liked, and how many times it is browsed, how many friends do you have. It can be said that the value counted may exceed you or the content you publish.

Today, we will focus on the application of redis in counter scenarios.

Body

There must be more or less questions about counters.

Q1: where does the Count come from?

The social media content we publish is usually stored in the database, the most common is MYSQL:

Update index: insert into user_message (UID, messageid) values ('XX', 'xx ')

Update content: insert into message_2013_05 (messageid, message) values ('XX', 'xx ');

Why do we need to differentiate between indexes and content? I will not comment on it here. In this case, we need to calculate how many messages I have sent, and then select count (messageid) in the MySQL database) where uid = xxx can get the Count we want, that isIn most cases, it comes from our index data..

Q2: What are the characteristics of counting compared to other data?

1.Frequent reading and writing of a single keyIn our opinion, the total read volume may be more frequent than the content module. The addition, deletion, modification, and query of indexes will lead to frequent increase and decrease of counters. Especially when a Twitter, feed, or Weibo article is very popular, the update of a single key will become more popular.

2.Persistence requiredAll users may need to know their own counts. This data is as important as the content itself.

From the above two requirements, redis is the best choice, and hotkey replacement from MySQL update count + 1 & memcache to redis incr is much more elegant. Reduces the risk of data consistency.

Optimization ideas:

  • Independent maintenance count, from acquiring update complexity O (n) to O (1)

We know that as the number of messages in a single uid increases, and the logic complexity of Count (message_id) is O (n), the cost of getting this count increases. How can I change the result to O (1 )? In fact, it is very simple. We only need to maintain this count separately. Here is a simple example:

Suppose we have a field, we need to frequently get and update the length of this field, reference redisbook (http://www.redisbook.com /) the description of the SDS used by redis to store the key value seems to be a simple description of this matter.

"

For example,Hello WorldIt can be expressed"Hello world \ 0".

This simple string can meet requirements in most cases. However, it does not support length calculation and append operations efficiently:

  • Calculate the string length (Strlen (s)).
  • To append a string for n times, you must re-allocate the string for n times (Realloc).

Struct sdshdr {

Len = 11;

Free = 0;

Buf = "Hello world \ 0"; // The actual Buf length is Len + 1

};

PassLenAttribute,SdshdrIt can be used to calculate the length of θ (1.

"

  • Which of the following is better?

Front-end maintenance: When message_id is increased or decreased, the client submits a transaction, Index Update, and count update. The advantage is that the logic is relatively simple, that is, a single write turns into multiple writes, but the disadvantage is that it is easy to cause data inconsistency.

Backend maintenance: When the database receives a message index increase or decrease, corresponding operation logs are generated. For MySQL, BINLOG is used. We increase or decrease the index count based on MySQL insert and delete.

MySQL and MySQL triggers are not recommended for the following reasons:

1. Mysql itself may become the update bottleneck. MySQL update is actually a type of reupdate. When the buffer pool does not hit, you need to load the page with where id = xxx from the disk to the memory, this may become a bottleneck when the hot key of large concurrency is updated.

2. Efficiency of MySQL replication. MySQL replication also becomes a bottleneck when the above data is frequently written and updated. When you send a feed, you will find that the data is updated only after one minute of counting, this must be intolerable.

3. MySQL trigger-related bugs:

The following is a synchronization interruption problem caused by a trigger in our online environment: http://bugs.mysql.com/bug.php? Id = 53079

Backend Maintenance

Detailed visibility @ jackbillow using redis method in http://www.programmer.com.cn/14577/

In fact, the choice of frontend maintenance or backend maintenance depends on the maturity of your company's frontend and backend technologies. If the backend technologies are mature enough, therefore, the backend maintenance saves a lot of development time and reduces the risk of data consistency. On the contrary, it can be maintained at the front end.

 

  • Redis memory capacity optimization

The redis kV data structure is as follows:

For a simple K-V scenario, the best optimization idea is to remove the dict (16 bytes) and redis object (16 bytes) created for universality in the figure, the extra SDS bytes are removed. Assume that your key occupies 10 bytes, and the value occupies 4 bytes using INT:

The previously occupied memory is removed:

Memory size of string type = number of key values * (dictentry size 16 bytes + redisobject size 16 bytes + SDS size containing key + SDS size containing value) + number of buckets * 4

After removal:

Memory size of string type =Number of key values *(Key size: 10 bytes + 4 bytes of value) + number of buckets * 4

When you have hundreds of millions of keys, the saved content capacity will be considerable. Of course, the actual optimization scenario will certainly be more complex than the several formulas listed here, and there will be loss.

  • Imagination in future counting scenarios

Redis data cannot smooth the old data over to the disk, and cannot be applied to high-performance devices such as SSD.

Http://nosql.mypopescu.com/post/3745773666/redis-and-hbase-for-mozilla-grouperfish-storage this article gives a good way of thinking.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.