MySpace DataRelay distributed data cache source code analysis


MySpace is one of the most successful examples of a .NET architecture on an Internet platform. One of its most important systems, the DataRelay distributed data cache, has also been open-sourced. DataRelay provides a high-performance cache system and message processing mechanism. It supports custom computing components and clustering, and has complete replication and load-balancing mechanisms. All components run as Windows services and can be deployed flexibly, and the client communicates with the server over sockets. It can also be extended easily with custom components, for example components based on Memcached, which is widely used for caching, or on Redis, which has recently become popular.

Although DataRelay is open source, it lacks good documentation to learn from. The following code analysis is intended to give you a full understanding of DataRelay, a rare gem on the .NET platform.

CodePlex code: http://datarelay.codeplex.com

A presentation at MIX 10: Robots at MySpace: Massive Scaling a .NET Website with the Microsoft Robotic Studio http://ecn.channel9.msdn.com/o9/mix/10/pptx/EX04.pptx

DataRelay Architecture Analysis

This section analyzes the DataRelay architecture in detail: the features of DataRelay, the physical deployment architecture of the system, its internal structure, and the interfaces defined by the component specification.

1. Basic Features of DataRelay

DataRelay is a distributed cache system designed and implemented on the .NET platform, drawing on the functions and design concepts of various existing data caches. It has the following features:

1) It uses existing cache solutions to provide the local cache function. Berkeley DB, Memcached, and local cache modules can all be plugged into the system to serve as the local cache mechanism.

2) Custom serialization and deserialization interfaces reduce storage space and improve network transmission efficiency.

3) Service deployment is simple, and service nodes can be hot-swapped.

4) Modules that comply with the DataRelay component interface definition can be updated dynamically on the server through the unified component interface management module.

5) Standardized component development interfaces greatly simplify component development and improve scalability.

6) Network message distribution and synchronization combine the Replicated Cache and Distributed Cache modes to keep the system running reliably.

7) Asynchronous, concurrent, coordinated, and failed message processing is managed with the Microsoft CCR component (Concurrency and Coordination Runtime), which ensures system efficiency and stability.

2. Physical architecture of DataRelay

Figure 1 shows the physical architecture and the position of DataRelay in the overall website system. DataRelay sits in the middle layer. Unlike a typical middle-layer design, the Web server connects to both the database server and the middle layer. This design prevents a single point of failure. If the Web server connected only to the middle layer, the entire website would stop working as soon as the middle-layer server went down. With the design in the figure, if the middle-layer server goes down, the Web server can still access the database server directly, so the website keeps working. When the Web server requests a cached business object, it first asks the DataRelay system. If the data exists in DataRelay, it is returned directly to the Web server. If it does not, the request is redirected to the database; the data retrieved is first saved to DataRelay and then returned to the Web server.
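The read path described above can be sketched as follows. This is a minimal illustration, not DataRelay code: `read_object`, `CacheUnavailable`, and the three callables are hypothetical names, and the sketch assumes the Web server itself performs the fallback to the database.

```python
from typing import Callable, Optional


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) cache client when the middle layer is down."""


def read_object(object_id: int,
                cache_get: Callable[[int], Optional[bytes]],
                cache_put: Callable[[int, bytes], None],
                db_load: Callable[[int], bytes]) -> bytes:
    """Return the object, preferring the cache and falling back to the database."""
    try:
        cached = cache_get(object_id)
        if cached is not None:
            return cached              # cache hit: return directly
    except CacheUnavailable:
        return db_load(object_id)      # middle layer down: read the database directly

    value = db_load(object_id)         # cache miss: load from the database
    try:
        cache_put(object_id, value)    # save to the cache before returning
    except CacheUnavailable:
        pass                           # a failed cache write must not fail the request
    return value
```

For example, with a plain dictionary standing in for the cache, `read_object(7, cache.get, cache.__setitem__, db.__getitem__)` loads object 7 from `db` on the first call and leaves a copy in `cache` for later calls.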



Figure 1 Website physical architecture

Figure 2 shows the deployment of the entire DataRelay cluster. The organization of DataRelay servers is defined as follows:

1) Groups: different groups store different data. In the DataRelay system, you can define multiple groups and set the access mode for each group.

2) Clusters: a group contains multiple clusters. Cached data objects are allocated in Distributed Cache mode, which selects the cluster where each object is saved.

3) Servers: the servers in a DataRelay cluster use Replicated Cache mode, so data is synchronized between the servers in each cluster.

Figure 2 DataRelay cluster deployment

In this structure, the servers in each cluster synchronize data and hold the same copy of it. When the Web server requests data, it selects the server node with the following algorithm:

Cluster Index = ObjectID % (# Cluster)

Server Node = Random (Cluster Index)

Note: ObjectID indicates the type ID of the stored data. # Cluster indicates the number of clusters in a Group.

After the Cluster Index is determined, an available node server is randomly taken from the Cluster to process data requests.
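The two-step selection above can be written as a short function. This is a sketch only: `select_node` is a hypothetical name, and the node lists are assumed to contain only nodes that are currently available.

```python
import random
from typing import Sequence


def select_node(object_id: int, clusters: Sequence[Sequence[str]], rng=random) -> str:
    """Pick a server node: modulo chooses the cluster, random choice picks the node."""
    cluster_index = object_id % len(clusters)        # Cluster Index = ObjectID % (# Cluster)
    return rng.choice(list(clusters[cluster_index])) # Server Node = Random(Cluster Index)
```

Because every node in a cluster holds the same data, any node of the selected cluster can answer the request, which is what makes the random choice safe.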

3. DataRelay internal modules

DataRelay coordinates several modules to keep the system running normally, and each module has its own responsibilities. The internal modules and their main responsibilities are as follows:

1) DataRelay.Client: the interface the system exposes to clients. The client performs all data operations through this interface.

2) DataRelay.Server: the server-side management component. It controls the service lifecycle and the hot swapping of extension components.

Figure 3 Internal module structure of DataRelay

3) DataRelay.Transports.Socket: manages the TCP connection pool between the client and the server.

4) DataRelay.Common: encapsulates common operations and interface definitions in the DataRelay system, including:

   a) RelayComponent.Interface: defines the DataRelay component interface specification, which every extension component must implement.

   b) RelayMessage: defines the message types exchanged between the server and the client. It is the basis for communication across the entire system.

   c) RelayConfiguration Schemas: validates the format of the system's configuration files to ensure that the configuration is correct.

5) DataRelay.Components: the component modules, including basic modules and extension modules.

   a) Storage: where the cache is actually stored. Several storage media are available. Berkeley DB can be used to store data persistently, and memory can be used to store the cache for high performance. This part follows the DataRelay component design specification, so an appropriate storage component can be added based on the cached data type and the data operation mode.

   b) Forwarding: the network message distribution component and the core component of DataRelay. It is responsible for RelayMessage transmission and message processing, and consists of the following core modules:

      • CCR: an asynchronous programming component provided by Microsoft. In Forwarding, it manages asynchronous, concurrent, coordinated, and failed message processing.

      • NodeManager: manages DataRelay server nodes, so that Forwarding can allocate and call nodes to distribute and synchronize network messages.

      • PerfCounter: performance counters [10], mainly responsible for monitoring the service status of each node on the server.

6) DataRelay.Logging: records DataRelay logs.

4. Serialization and deserialization

DataRelay implements custom serialization and deserialization for cached business objects to improve serialization efficiency. The custom serialized data structure is very compact. As shown in Figure 4, a 32-bit integer (int32) occupies only 4 bytes, a Boolean (bool) occupies 1 byte, and a 16-bit integer array of length 2 (int16[2]) occupies 8 bytes in total: 4 bytes for the array length and 2 bytes for each 16-bit element.

Figure 4 custom serialized data structure
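The byte counts above can be reproduced with a small sketch. It is not DataRelay's serializer: the function names are hypothetical, and little-endian byte order and a signed 32-bit length prefix are assumptions made for illustration.

```python
import struct
from typing import List, Sequence


def write_int32(value: int) -> bytes:
    """A 32-bit integer is written as exactly 4 bytes."""
    return struct.pack("<i", value)


def write_bool(value: bool) -> bytes:
    """A Boolean is written as exactly 1 byte."""
    return struct.pack("<?", value)


def write_int16_array(values: Sequence[int]) -> bytes:
    """A 4-byte length prefix followed by 2 bytes per 16-bit element."""
    return struct.pack("<i", len(values)) + struct.pack(f"<{len(values)}h", *values)


def read_int16_array(data: bytes) -> List[int]:
    """Inverse of write_int16_array: read the length, then that many elements."""
    (count,) = struct.unpack_from("<i", data, 0)
    return list(struct.unpack_from(f"<{count}h", data, 4))
```

With this layout, an int16 array of length 2 takes 4 + 2 × 2 = 8 bytes, matching the figure. No type names or field names are written to the stream, which is where the space saving over a general-purpose serializer comes from.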

A comparative test was run on a data object containing a series of System.Int32 values. With the built-in .NET serialization, the resulting byte stream is 190 KB. With custom serialization it is only 14 KB, a reduction of more than 85%, and the serialization time is reduced by 14.4 s. The smaller byte stream and shorter serialization time reduce the volume of network traffic and significantly improve network transmission performance.

5. DataRelay message (RelayMessage)

RelayMessage is the communication data basis of the DataRelay framework. It is responsible for carrying the data to be cached and interacting between the server and the client. The RelayMessage design has the following features:

1) Message types are standardized and include get, update, save, and delete. Extended types can be added as the framework grows.

2) To improve transmission performance and reduce the amount of network traffic, messages are serialized into byte arrays and stored on the server. The client must deserialize the data after retrieving it.

3) Each message has a unique ID. If the ID alone cannot be guaranteed to be unique, it can be combined with an ExtendedID.

4) Each type of message is assigned a TypeID, which is used to locate where the cached data is stored.
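The four features above suggest a message shape like the one below. This is an illustrative sketch, not the real RelayMessage class: the class name, field names, and the `key` method are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class MessageType(Enum):
    """The four standard message types named in the text."""
    GET = "get"
    UPDATE = "update"
    SAVE = "save"
    DELETE = "delete"


@dataclass(frozen=True)
class RelayMessageSketch:
    message_type: MessageType
    type_id: int                         # identifies the kind of cached data
    object_id: int                       # the message ID
    extended_id: Optional[bytes] = None  # used when object_id alone is not unique
    payload: Optional[bytes] = None      # already-serialized business object

    def key(self) -> Tuple[int, int, Optional[bytes]]:
        """Identity of the cached item: type, ID, and optional extended ID."""
        return (self.type_id, self.object_id, self.extended_id)
```

Two messages with the same `object_id` but different `extended_id` values produce different keys, which is the role the text gives to ExtendedID.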

6. DataRelay Components

DataRelay has a component-based architecture. Network message distribution is a component, persistent storage is a component, and memory storage is a component. Every function in DataRelay is developed as a component, which gives the system good scalability.

The components themselves are highly autonomous. Each component can define its own configuration file, and the configuration file names, through reflection, the class that processes the component's configuration information. For example, the Berkeley DB storage component has a complicated configuration, so DataRelay manages the configuration of this component separately.

The DataRelay component interface definition mainly defines the interface through which a component processes messages and the component's own runtime information. Its features are:

1) The service framework depends on the component interface to operate on RelayMessage.

2) A component can define its own configuration file. The service framework obtains the component configuration information through reflection.

3) When a component's configuration file changes, the service framework automatically reads the configuration information again.
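A component contract with these three properties could look like the sketch below. It is only an illustration of the idea: `ComponentSketch`, `MemoryStorage`, and the method names are hypothetical and are not taken from the DataRelay source, and messages are modeled as plain dictionaries.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class ComponentSketch(ABC):
    """Hypothetical contract between the service framework and a component."""

    @abstractmethod
    def handle_message(self, message: Dict[str, Any]) -> Optional[bytes]:
        """Process one message and return a payload for 'get', else None."""

    @abstractmethod
    def reload_config(self, config: Dict[str, Any]) -> None:
        """Called by the framework when the component's configuration changes."""

    @abstractmethod
    def runtime_info(self) -> Dict[str, Any]:
        """Expose the component's own runtime information."""


class MemoryStorage(ComponentSketch):
    """An in-memory storage component implementing the contract."""

    def __init__(self) -> None:
        self._data: Dict[int, bytes] = {}
        self._config: Dict[str, Any] = {}
        self._handled = 0

    def handle_message(self, message: Dict[str, Any]) -> Optional[bytes]:
        self._handled += 1
        op, key = message["type"], message["id"]
        if op == "get":
            return self._data.get(key)
        if op in ("save", "update"):
            self._data[key] = message["payload"]
            return None
        if op == "delete":
            self._data.pop(key, None)
            return None
        raise ValueError(f"unknown message type: {op}")

    def reload_config(self, config: Dict[str, Any]) -> None:
        self._config = dict(config)

    def runtime_info(self) -> Dict[str, Any]:
        return {"handled": self._handled, "config": dict(self._config)}
```

The framework only ever calls the three interface methods, so a Berkeley DB or Memcached component could replace `MemoryStorage` without changes to the framework.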

7. DataRelay component container

The DataRelay system is built from component modules, which need an environment to run in. DataRelay provides a component container for this purpose. The main responsibilities of the component container are to maintain the lifecycle of the components and to schedule the passing of messages among them. The container class, RelayNode, implements two interfaces: IRelayNode and IDataHandler.

1) IRelayNode: defines the lifecycle and configuration information of the component nodes in the container. Through this interface, you can obtain the current running status of each component in the container and its related configuration information.

2) IDataHandler: the interface definition for message transmission. It is also included in the component interface definition, and it defines how messages are passed throughout the system.

The server sends every message it receives to the component container for distribution. The RelayNode class therefore has to handle a large number of highly concurrent messages, and it also uses the CCR component to manage them.

8. DataRelay network message distribution mechanism

Network message distribution is handled by the Forwarding component module in DataRelay. Forwarding is a core module of DataRelay and must be used on both the server and the client. It performs the network message distribution and synchronization of the DataRelay distributed cache system. Message distribution and synchronization work in two ways: real-time message operations and asynchronous message operations. Obtaining cached data requires a real-time operation. Updating, storing, and deleting cached data can use asynchronous operations, depending on the business scenario.

The Distributed Cache implemented by DataRelay in the system is a combination of Replicated Cache and Distributed Cache:

1) Across the clusters in the same group, cached objects are stored in Distributed Cache mode. The cluster that stores an object is located by a modulo calculation.

2) Across the nodes (machines) in the same cluster of the same group, Replicated Cache mode is used. This means that every node in the same cluster of the same group holds the same cached data.

The Forwarding component module processes cached data in two ways: obtaining data and updating data. Figure 10 shows the logical process of saving and obtaining. Assume that the current DataRelay system has four server nodes divided into two clusters, and that in the same App group there are two cached business objects to process, with IDs 120 and 121. The following describes the logical process of obtaining and saving data.

1) Get the cached data whose object ID is 121:

   a) Get the group name set in the configuration for object ID 121: App.

   b) Select the Cluster Index. Using the Cluster Index algorithm, Mod(business object ID, number of clusters), Cluster Index = 1 (121 % 2 = 1).

   c) Randomly select a server node from that cluster and obtain the data from it.

2) Save the cached data whose object ID is 120:

   a) Get the group name set in the configuration for object ID 120: App.

   b) Select the Cluster Index. Using the Cluster Index algorithm, Cluster Index = 0 (120 % 2 = 0).

   c) Randomly select a server node from that cluster and save the data to it.

   d) The server node asynchronously sends network messages to synchronize the cached data to the other nodes in the cluster.

Figure 10 network message distribution model
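The walk-through above, with four nodes in two clusters, can be simulated in a few lines. This is a sketch under stated assumptions: `Node` and `ForwarderSketch` are hypothetical names, each node is modeled as a dictionary, and replication is performed inline for simplicity, whereas the text describes it as asynchronous.

```python
import random
from typing import Dict, List, Optional


class Node:
    """One server node, modeled as an in-memory dictionary."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[int, bytes] = {}


class ForwarderSketch:
    """Distributes objects across clusters and replicates them within a cluster."""

    def __init__(self, clusters: List[List[Node]], rng=random) -> None:
        self.clusters = clusters
        self.rng = rng

    def _cluster_for(self, object_id: int) -> List[Node]:
        return self.clusters[object_id % len(self.clusters)]

    def save(self, object_id: int, payload: bytes) -> Node:
        cluster = self._cluster_for(object_id)
        primary = self.rng.choice(cluster)       # step c: random node in the cluster
        primary.store[object_id] = payload
        for peer in cluster:                     # step d: sync to the other nodes
            if peer is not primary:              # (inline here; asynchronous in the text)
                peer.store[object_id] = payload
        return primary

    def get(self, object_id: int) -> Optional[bytes]:
        node = self.rng.choice(self._cluster_for(object_id))
        return node.store.get(object_id)
```

With clusters `[[n0, n1], [n2, n3]]`, saving object 120 writes to both nodes of cluster 0 and to neither node of cluster 1, so a later `get(120)` succeeds regardless of which node of cluster 0 is chosen.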

DataRelay has the following features in the Forwarding design:

1) By combining the characteristics of Replicated Cache and Distributed Cache, it handles the distribution and synchronization of cached data across the cluster well.

2) Integrating the CCR component module enables efficient and reliable network message processing.

3) Through configuration, messages can be packaged into batches and submitted together to reduce network communication.
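The batching idea in the last item can be sketched as follows. This is a generic illustration and not the Forwarding component's implementation: `BatchingSender`, `send_batch`, and `batch_size` are hypothetical names, and a real implementation would presumably also flush on a timer.

```python
from typing import Callable, List


class BatchingSender:
    """Collects messages and hands them to send_batch once batch_size is reached."""

    def __init__(self, send_batch: Callable[[List[bytes]], None], batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._pending: List[bytes] = []

    def submit(self, message: bytes) -> None:
        """Queue one message; send the whole batch when it is full."""
        self._pending.append(message)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is pending as one batch, if anything is pending."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send_batch(batch)
```

With `batch_size` set to 3, four submitted messages result in one network call carrying three messages, and the fourth is sent on the next `flush`.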

9. DataRelay service deployment

DataRelay is also distinctive in service deployment: it uses .NET AppDomains to make components hot-swappable. The DataRelay system framework and the component modules are loaded into different AppDomains. All component module assemblies are loaded into the component container in an independent AppDomain. When a component DLL or configuration file is added or updated, DataRelay can dynamically unload that AppDomain, create a new AppDomain, and load the current components into it.

In this way, components can be updated without restarting the DataRelay service, which makes online operations and maintenance very convenient. When dozens of Relay servers need a component update, deployment is convenient and efficient.
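The unload, create, and load sequence can be illustrated with a small analogy. Python has no AppDomain, so this sketch only mirrors the order of operations: `ComponentSet` stands in for the isolated unit that holds all components, and every name here is hypothetical.

```python
from typing import Callable


class ComponentSet:
    """Stand-in for one isolated unit holding all loaded components."""

    def __init__(self, version: str) -> None:
        self.version = version
        self.unloaded = False

    def handle(self, message: str) -> str:
        if self.unloaded:
            raise RuntimeError("component set already unloaded")
        return f"{self.version}:{message}"

    def unload(self) -> None:
        self.unloaded = True


class ContainerSketch:
    """Keeps serving messages while the component set behind it is replaced."""

    def __init__(self, load: Callable[[], ComponentSet]) -> None:
        self._load = load
        self._current = load()

    def handle(self, message: str) -> str:
        return self._current.handle(message)

    def on_components_changed(self) -> None:
        """Called when a component file or configuration file is added or updated."""
        self._current.unload()        # unload the old isolated unit
        self._current = self._load()  # create a new one and load current components
```

The container object itself is never restarted. Only the component set it delegates to is replaced, which corresponds to the service process staying up while the AppDomain is recycled.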

Published in Programmer, Issue 1. Author: Zhang Qinghua

Original article: http://www.tita.com/blog/tech/myspace-datarelay-distributed data storage source code analysis
