In to-B (enterprise-facing) services, cloud computing, mobile Internet, and professional cloud platform services, distributed technology is one of the key technologies that keep the supporting platform running normally. From the perspective of commercial profit and operating cost, how fully each server's performance is used largely determines the business value of the site, so performance is a very important consideration for a distributed architecture. From the perspective of users, especially the enterprise users who are the main source of revenue, correct business processing and uninterrupted service (high availability) are what sustain user confidence. High performance, high availability, and correctness are therefore the key technical factors of a distributed architecture.
For a website product's architecture, you can either adopt open source or develop independently. If you embrace open source, the initial development cycle may shorten by about 1/3, but the price is that you must understand these open source solutions thoroughly before you can put them to use and, if necessary, modify their source code, and that cost is difficult to estimate. There is also the risk that the chosen open source solution later turns out not to meet some requirement and has to be thrown out and replaced. If you develop independently, the project development cycle is longer, and you may be suspected of reinventing the wheel.
The important open source tools used include ZooKeeper, Solr, Openfire, Redis (on which a distributed NoSQL database cluster was developed), Nginx, HAProxy, Keepalived, and MySQL (on which ShardDB was developed). The distributed middleware includes a distributed file system (CloudFS), distributed instant messaging (CloudIM), a distributed message queue (CloudMQ), distributed task scheduling (CloudJob), a distributed retrieval platform (CloudIndex), and a distributed NoSQL database cluster (CloudRedis).
Distributed File System (CloudFS): a distributed file system developed after repeated study of HDFS, TFS, GridFS (MongoDB), and FastDFS. The storage architecture is similar to FastDFS and has three levels: data node (node), data group (group), and partition (region). Data nodes are the final physical storage nodes, and the nodes of the same group synchronize data in real time, that is, the data on nodes in the same group is eventually consistent. Files within the same region are automatically deduplicated, that is, files with identical content are stored only once per region. CloudFS uses the message queue (CloudMQ) to synchronize the sequence of data operations between the nodes of a group and so ensure eventual consistency, which is similar to the bidirectional binlog synchronization of FastDFS. CloudFS uses ZooKeeper to monitor and maintain storage node state and its changes, uses CloudRedis to store file index information, and uses Nginx as the file download server. Performance is excellent: according to online monitoring data for a single server, file upload IOPS can reach 1,200 at an upload rate of 25 MB/s, file copy TPS exceeds 9,000, and file create and delete TPS exceeds 30,000. CloudFS supports dynamic joining and leaving of storage nodes, online expansion, data backup, and dynamic load balancing across nodes. In theory it can support hundreds of billions of files and more than 3,000 nodes.
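The per-region deduplication described above can be illustrated with a content-addressed store. This is only a minimal sketch: the article does not say how CloudFS detects identical content, so the use of SHA-256 digests, the `Region` class, and its `put`/`get` methods are all assumptions made for illustration.

```python
import hashlib


class Region:
    """Toy model of a region that keeps one physical copy per distinct content.

    Hypothetical illustration only; not the actual CloudFS data structures.
    """

    def __init__(self):
        self.blobs = {}  # content digest -> file bytes (single physical copy)
        self.index = {}  # file name -> content digest

    def put(self, name: str, data: bytes) -> bool:
        """Store a file and return True if a new physical copy was written."""
        digest = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # The name always points at the shared copy, new or existing.
        self.index[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Return the content stored under the given file name."""
        return self.blobs[self.index[name]]


region = Region()
wrote_first = region.put("a.txt", b"hello")   # new content, stored
wrote_second = region.put("b.txt", b"hello")  # same content, only indexed
```

Both names resolve to the same bytes, but the region holds a single physical copy. A real system would also need reference counting so that deleting one name does not remove content still used by another.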
Distributed Instant Messaging (CloudIM): platform-level middleware that hosts push services. It is based on Openfire, but apart from the basic connection handling and inter-node communication, which were kept, everything has been reworked. CloudIM provides a convenient API for third-party applications, so developers can easily implement real-time features such as instant messaging, collaborative office work, and instant reminders without dealing with complex requirements such as keeping long-lived connections alive, line failures, server load, and user status change notifications. CloudIM performs well: the TPS of all the APIs it provides is between 15,000 and 35,000. According to online monitoring, a single server can hold more than 70,000 connections with message latency below 50 ms, and a cluster can support thousands of nodes. It supports dynamic joining and leaving of nodes as well as online expansion.
Distributed Message Queue (CloudMQ): a distributed message queue (MQ) implementation whose design draws on the Apache open source project Kafka and the Taobao open source project Metamorphosis, inheriting their distributed, high-performance, high-throughput characteristics. CloudMQ is simple to implement and has a brokerless design. It preserves message order, uses a pure pull model plus a notification mechanism to almost eliminate consumption delay, and uses multiple partitions to scale system throughput. It can hold messages on the scale of billions. It also supports delayed messages: for example, if an item changes 100 times within 5 minutes, only the one final result is pushed, which drastically reduces duplicate messages. Producer TPS is between 35,000 and 45,000, and consumer TPS is around 40,000.
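The delayed-message behavior (many changes in a window, one final push) can be sketched as a small coalescing buffer. This is a hedged illustration, not the CloudMQ implementation: the `DelayCoalescer` class, its method names, and the choice to start the window at the first event for a key are assumptions.

```python
class DelayCoalescer:
    """Buffers events per key and releases only the latest one after a delay.

    Hypothetical sketch; time is passed in explicitly to keep it deterministic.
    """

    def __init__(self, delay_seconds: float):
        self.delay = delay_seconds
        self.pending = {}  # key -> (deadline, latest payload)

    def publish(self, key, payload, now: float) -> None:
        # The first event for a key opens the window; later events within the
        # window keep the original deadline and only overwrite the payload.
        deadline, _ = self.pending.get(key, (now + self.delay, None))
        self.pending[key] = (deadline, payload)

    def flush(self, now: float) -> list:
        """Return (key, payload) pairs whose delay window has elapsed."""
        due = [k for k, (deadline, _) in self.pending.items() if deadline <= now]
        return [(k, self.pending.pop(k)[1]) for k in due]


coalescer = DelayCoalescer(delay_seconds=300)  # 5-minute window
for i in range(100):                           # 100 changes to one item
    coalescer.publish("doc-42", {"version": i}, now=float(i))

early = coalescer.flush(now=299.0)  # window still open, nothing released
late = coalescer.flush(now=300.0)   # one message carrying the final state
```

Here 100 publishes collapse into a single delivery containing the last payload. An alternative design would restart the window on every event (debouncing), which delays delivery further while changes keep arriving.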
Distributed Task Scheduling (CloudJob): a scheduling framework that assigns tasks to multiple nodes according to specific rules (load balancing, geographic proximity, and so on). It supports standard crontab repetition and timing rules, supports mass scheduled tasks (TENS), guarantees timely and ordered task processing, and supports querying task status or aborting a task in real time. Task scheduling throughput reaches up to 20,000 per second.
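As an illustration of the load-balancing rule mentioned above, the following sketch assigns each task to the node that currently has the fewest tasks. The article does not describe CloudJob's actual assignment algorithm, so the `assign_tasks` function and the "one task equals one unit of load" model are assumptions.

```python
import heapq


def assign_tasks(tasks, nodes):
    """Assign tasks in order, always to the currently least-loaded node.

    Hypothetical sketch: load is simply the number of tasks assigned so far,
    and ties are broken by node name.
    """
    heap = [(0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    assignment = {node: [] for node in nodes}
    for task in tasks:
        load, node = heapq.heappop(heap)        # least-loaded node
        assignment[node].append(task)           # tasks keep submission order
        heapq.heappush(heap, (load + 1, node))  # account for the new task
    return assignment


plan = assign_tasks(["t1", "t2", "t3", "t4", "t5"], ["n1", "n2"])
```

With equal-cost tasks this degenerates to round-robin. A geographic rule could be layered on top by first filtering `nodes` to those in the task's region.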
Distributed Retrieval Platform (CloudIndex): a massive-scale real-time retrieval system that handles word-segmented (tokenized) search. It is built mainly on Solr and CloudMQ: CloudMQ provides update performance and guarantees that all cluster nodes receive the same update sequence, while Solr provides word segmentation and real-time retrieval. It supports index sharding (by hash or by numeric range), custom word segmentation, and node load balancing. Index read and write latency is below 200 ms, and a single index can hold on the order of hundreds of billions of entries.
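The two sharding methods named above, hash and numeric range, can be sketched as two small routing functions. This is an illustration under assumptions: the function names, the use of CRC32 as the hash, and the convention that range boundaries are exclusive upper bounds are not taken from CloudIndex.

```python
import bisect
import zlib


def hash_shard(doc_id: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the document id (CRC32 assumed)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def range_shard(value: int, upper_bounds) -> int:
    """Pick a shard by numeric range.

    upper_bounds is a sorted list of exclusive upper bounds; values at or
    above the last bound fall into one extra, final shard.
    """
    return bisect.bisect_right(upper_bounds, value)


bounds = [1000, 2000]               # shard 0: <1000, shard 1: <2000, shard 2: rest
low = range_shard(999, bounds)      # falls in shard 0
edge = range_shard(1000, bounds)    # falls in shard 1
high = range_shard(5000, bounds)    # falls in shard 2
```

Hash sharding spreads documents evenly but scatters range queries across all shards, while range sharding keeps neighboring values together at the risk of hot shards.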
Distributed NoSQL Database Cluster (CloudRedis): a database cluster based on Redis, compatible with all Redis data structures and most of its command set. The client uses a consistent hashing algorithm to route each request to a node in the cluster according to the hash of its key, and binlog-based synchronization of the operation sequence keeps the data on different service nodes consistent. When the set of service nodes changes, the client proactively detects the change and recalculates the hash. The other service nodes in the cluster also become aware of the change and make sure the binlog has been fully consumed before they continue to serve updates, which guarantees data consistency across node changes. Performance is excellent: non-batch read and write commands reach more than 100,000 operations per second, exceeding native Redis, and the cluster can store data on the scale of 1 billion entries or more.
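The client-side routing described above can be sketched with a standard consistent hash ring using virtual nodes. This is a generic textbook construction, not the CloudRedis client: the `HashRing` class, the SHA-256 hash, and the default of 100 virtual nodes per server are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes, replicas: int = 100):
        self.replicas = replicas  # virtual nodes per physical node
        self.ring = []            # sorted list of (point, node name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _point(label: str) -> int:
        # Map a label to a position on the ring.
        return int(hashlib.sha256(label.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._point(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        if not self.ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self.ring, (self._point(key), ""))
        return self.ring[idx % len(self.ring)][1]  # wrap around at the end


ring = HashRing(["redis-a", "redis-b", "redis-c"])
keys = [f"user:{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("redis-c")  # simulate a node leaving the cluster
after = {k: ring.node_for(k) for k in keys}
```

After `redis-c` leaves, only the keys that were mapped to it move to another node; every other key keeps its original node, which is what limits data movement when the node set changes.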