Easily building a microblogging system with a ten-million-scale data volume using NoSQL

Tags: message queue, unique id, redis cluster

Original: http://www.cnblogs.com/imxiu/p/3505213.html

A microblog is in fact a product with a relatively simple structure but a very large data volume. The "ten-million scale" in the title refers not to 10 million microblog posts but to publishing across tens of millions of subscription relationships. Before reading this article, most people will have seen the talk that Yang Weihua of Sina gave at the Weibo developer conference. I won't repeat it here; I'll just pick out the key points.

As everyone knows, the hard part of a microblog is the star-member problem. What is it? Suppose Andy Lau opens a microblog on our site and has several million fans subscribed to him. When he posts one message, it has to be delivered to those millions of fans at once. If Leon Lai, Aaron Kwok and the rest of the Four Heavenly Kings all opened microblogs here, the site would go down. This is where the message queue comes in. My architecture has an asynchronous publish cluster, and the publish workers read their tasks from a ZeroMQ queue. ZeroMQ is the fastest message queue I know of; you can search for the details yourself. One problem with ZeroMQ is that it cannot persist data, so you have to implement persistent storage yourself.

Back to the star members: their fans are graded by "activity". Based on login frequency, login time, posting behavior and other factors, fans are roughly divided into three categories (hardcore fans, indifferent, and half-dead), and each category is assigned to a different distribution cluster. The asynchronous publish cluster for hardcore fans is, of course, the fastest.

Microblog messages are saved to MySQL through HandlerSocket. Each message ID is a unique 64-bit integer built by concatenating an rdtsc value with a 2-bit random integer, which avoids the ID consistency problems across multiple servers that come with auto-increment IDs. When publishing, the cluster sends only the ID of the microblog message to the subscribers' lists in Redis, so publishing is very fast. And because each subscriber's list stores only IDs, memory usage is not very high.
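The publish path described above can be sketched roughly as follows. This is a single-process illustration under stated assumptions, not the author's code: `time.perf_counter_ns()` stands in for the rdtsc counter, in-memory deques stand in for the ZeroMQ queues and the Redis lists, and the activity thresholds, the numeric activity score and all function names are hypothetical.

```python
import random
import time
from collections import defaultdict, deque

RANDOM_BITS = 2  # the article mentions a "2-bit random integer"; treated here as 2 bits
TIERS = ("hardcore", "indifferent", "half_dead")  # highest priority first


def make_post_id() -> int:
    """Concatenate a counter value (stand-in for rdtsc) with a few random bits."""
    counter = time.perf_counter_ns()
    rand = random.getrandbits(RANDOM_BITS)
    return ((counter << RANDOM_BITS) | rand) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits


def tier_for(activity_score: float) -> str:
    """Map a hypothetical 0..1 activity score to one of the three fan tiers."""
    if activity_score >= 0.7:
        return "hardcore"
    if activity_score >= 0.3:
        return "indifferent"
    return "half_dead"


def enqueue_publish(post_id, followers, queues):
    """Queue one (follower, post ID) task per follower on that follower's tier queue."""
    for follower_id, score in followers.items():
        queues[tier_for(score)].append((follower_id, post_id))


def drain(queues, timelines):
    """Deliver queued tasks tier by tier; only the post ID reaches each timeline."""
    for tier in TIERS:
        queue = queues[tier]
        while queue:
            follower_id, post_id = queue.popleft()
            timelines[follower_id].appendleft(post_id)  # comparable to a Redis LPUSH


if __name__ == "__main__":
    queues = {tier: deque() for tier in TIERS}
    timelines = defaultdict(deque)
    post_id = make_post_id()
    enqueue_publish(post_id, {"alice": 0.9, "bob": 0.5, "carol": 0.1}, queues)
    drain(queues, timelines)
    print(post_id, dict(timelines))
```

In a real deployment each tier would presumably have its own workers and queue rather than one loop, so that slow tiers never delay hardcore fans; the single `drain` loop here only shows the ordering idea.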

Let me show you my MySQL and Redis data structures.

Another important role in my architecture is the "key GPS server" (KGS for short). In short, it is the central index server for distributed data storage. All data is stored and retrieved through KGS. KGS supports multiple servers and backup storage across multiple data centers. KGS is a socket server whose storage is a Tokyo Cabinet hash DB that records the mapping between keys and servers. Its only job is to say which servers a key should be stored on, or which servers a key is already stored on; it provides no other service, which greatly reduces the load on KGS.
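As a rough illustration of the KGS idea, the sketch below keeps a key-to-servers index and answers only two questions: where a key should go, and where it already lives. It assumes a plain dict in place of the Tokyo Cabinet hash DB, no socket layer, and a hypothetical hash-based placement rule; the class and method names are invented for this example, and the article does not describe the author's actual key distribution strategy.

```python
import hashlib


class KeyGpsServer:
    """Toy central index: records which servers hold each key, nothing else."""

    def __init__(self, servers, replicas=2):
        if replicas > len(servers):
            raise ValueError("replicas cannot exceed the number of servers")
        self.servers = list(servers)
        self.replicas = replicas
        self.index = {}  # key -> list of servers (stand-in for the TC hash DB)

    def assign(self, key):
        """Tell the caller which servers a key should be stored on."""
        if key not in self.index:
            # Hypothetical placement rule: hash the key, take consecutive servers.
            digest = hashlib.md5(key.encode("utf-8")).digest()
            start = int.from_bytes(digest[:8], "big") % len(self.servers)
            self.index[key] = [
                self.servers[(start + i) % len(self.servers)]
                for i in range(self.replicas)
            ]
        return list(self.index[key])

    def locate(self, key):
        """Tell the caller which servers a key is stored on (empty if unknown)."""
        return list(self.index.get(key, []))


if __name__ == "__main__":
    kgs = KeyGpsServer(["redis-a", "redis-b", "redis-c"], replicas=2)
    print(kgs.assign("user:42:timeline"))
    print(kgs.locate("user:42:timeline"))
```

Because the index is the only source of truth for placement, a client asks KGS once and then talks to the storage servers directly, which matches the article's point that KGS does no other work.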

Next, the Redis cluster. Redis runs in pure in-memory mode with its hot backup turned off (Redis does not do this very well). I wrote my own backend server instead: a backend socket process runs on every machine that runs Redis, stores its data in a Tokyo Cabinet (TC) hash DB, and backs up the Redis data of the current server. When Redis restarts, all data is loaded from the local backend DB. The Redis cluster is distributed by sharding at the user level.
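A minimal sketch of this arrangement, under loud assumptions: dicts stand in for both Redis memory and the TC hash DB backup, "user-level sharding" is interpreted as picking a node by user ID modulo the node count, and the names `RedisNode`, `latest` and `node_for_user` are hypothetical.

```python
class RedisNode:
    """One machine: an in-memory store plus a local backup written alongside it."""

    def __init__(self):
        self.memory = {}  # stand-in for Redis running purely in memory
        self.backup = {}  # stand-in for the backend process's TC hash DB

    def lpush(self, key, value):
        """Prepend a value to a list and mirror the list into the local backup."""
        self.memory.setdefault(key, []).insert(0, value)
        self.backup[key] = list(self.memory[key])

    def latest(self, key, count):
        """Return up to `count` newest entries of a list."""
        return self.memory.get(key, [])[:count]

    def restart(self):
        """Simulate a restart: memory is lost, then reloaded from the local backup."""
        self.memory = {}
        self.memory = {key: list(values) for key, values in self.backup.items()}


def node_for_user(user_id, nodes):
    """Assumed sharding rule: all data of one user lives on one node."""
    return nodes[user_id % len(nodes)]


if __name__ == "__main__":
    nodes = [RedisNode() for _ in range(3)]
    node = node_for_user(7, nodes)
    node.lpush("timeline:7", 1001)
    node.lpush("timeline:7", 1002)
    node.restart()
    print(node.latest("timeline:7", 10))
```

Keeping each user's lists on a single node means a timeline read touches one machine, at the cost of uneven load if some users are far more active than others.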

Now for MySQL. This architecture basically eliminates the caching problem, because every service in the cluster runs at high speed. The only cache is the local eAccelerator cache on the PHP side. eAccelerator is based on shared memory, so it is many times faster than socket-based caches. eAccelerator caches each user's top N microblog messages and the results of KGS queries. At this point someone will ask: you put user information and microblog messages in MySQL, so how can you do without a cache? Because I use HandlerSocket (HS). HS is a MySQL plugin written by Japanese developers. It bypasses the MySQL communication protocol and reads and writes the MySQL storage engine directly. In an environment with multiple cores, large memory and the InnoDB engine, its performance can surpass memcached. HS can read and write the MySQL engine directly in key-value mode.
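The role of the local cache can be illustrated with a small in-process sketch. It is written in Python rather than PHP, does not use eAccelerator's API, and assumes a time-to-live expiry policy that the article does not mention; `LocalCache`, `get_or_load` and the loader function are hypothetical names.

```python
import time


class LocalCache:
    """In-process cache with a time-to-live, consulted before any remote call."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expiry time, value)

    def get_or_load(self, key, loader):
        """Return a fresh cached value, or call `loader(key)` and cache its result."""
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self.entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    def slow_kgs_lookup(key):
        # Placeholder for a KGS query or a top-N timeline fetch.
        return ["redis-a", "redis-b"]

    cache = LocalCache(ttl_seconds=30)
    print(cache.get_or_load("user:42:timeline", slow_kgs_lookup))
    print(cache.get_or_load("user:42:timeline", slow_kgs_lookup))  # served locally
```

The second call never reaches the loader, which is the effect the article attributes to caching KGS results and top N messages close to the PHP process.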

Summary

Google's chief scientist once said that a large, complex system should be broken down into many small services. My architecture likewise consists of small clusters working together to handle the publishing of a large volume of data. Some people ask why I don't use MongoDB. MongoDB is a general-purpose distributed NoSQL database, whereas we have our own key distribution strategy, so it is not a good fit. Readers who are unfamiliar with how relationships are stored in Redis can look at Retwis first; Retwis is a simple microblog implemented purely with Redis.

For the detailed architecture diagram, flowchart and PPT file, please download the attachment: http://code.google.com/p/php-tokyocabinet/downloads/detail?name=micro-blog-qiye.tar.bz2&can=2&q=#makechanges

My QQ: 531020471; email: lijinxing#gmail.com.

Two related projects: Retwis-py and Retwis-rb (Python and Ruby implementations of this idea).

From:

http://www.oschina.net/question/12_36573?sort=default&p=1

http://bbs.chinaunix.net/thread-1835220-1-1.html
