In this post I discuss restructuring an enterprise's core business data, based on my recent thinking, and how the existing architecture can be reworked around memcached. These are purely my personal views, and they may be limited by a narrow perspective and a shallow understanding of the business. Discussion is welcome, and so is criticism!
Background
The company develops GPRS in-vehicle terminal products. To go with them, the software side mainly builds vehicle management information systems that provide comprehensive services for vehicles fitted with the terminals.
From this it should be clear that the core entity here is the vehicle. Most functions revolve around vehicles. The main value of the system is to give manufacturers who purchase our in-vehicle terminals effective management and control of their vehicles.
One such requirement is a system for the service department of a vehicle manufacturer, to improve their efficiency when handling vehicle repairs (I am currently involved in developing this system, including its interaction model).
The most important part is the service process: the owner reports a fault from a mobile terminal -> a task is created on the Web side -> the task is dispatched on the Web side -> the maintenance staff use another system to collaborate at any time -> the owner can track service progress at any time and rate the service. We need [service vehicles] to provide repair services for [engineering vehicles]. Both kinds of vehicle carry our own vehicle terminals. However, the maintenance and management of both are handled by other systems that do not belong to the same product line. Different service vehicles may also be in different product lines, and engineering vehicles from different manufacturers are stored in different databases.
What is the problem?
To put it bluntly, managing the service process is not the tricky part. The difficulty lies in obtaining the vehicle, location, and other related information. For the current access method, see my previous blog. Some people think that wherever systems interact, Web Services can do everything. In some cases, when you need a service from another system, Web Services are indeed appropriate. However, they are not a panacea. If the foundation of a system and the acquisition of its core data depend on them, the efficiency of the whole system is easy to imagine. Core data is the cornerstone of the system, so it should never rely heavily on Web Services. Doing so brings not only performance problems but also data synchronization and security issues. If you find that you sometimes have to build a system this way, I think the design has a "bad smell".
Independent governance or unification?
Once the business has grown to a certain point, I think restructuring is to some extent inevitable. The protocol parsing and data processing that each business system used to perform have been consolidated onto a unified platform, and the registration of in-vehicle devices, previously handled by each business system, is now also managed centrally by a dedicated system. This is a trend, though not an absolute rule. Still, the right adjustments help make the structure clearer.
Rethinking database design: relation-oriented versus object-oriented
I have read a lot about database design experience. Relational databases have been around for a long time, and their value is self-evident. Many experienced developers and designers work like this: tell me what you need, and I can immediately say which fields go in which table and which table's primary key serves as a foreign key in another. This is itself a degree of abstraction: the business abstraction is mapped onto relational tables. Our existing vehicle table has about 50 or 60 fields, plus dozens of blank fields reserved for possible future business needs (in fact, the vehicle table design is similar across product lines). Setting the business aside, I logically divide vehicle-related attributes into three categories:
(1) Basic attributes of vehicles
(2) Extended attributes of vehicles
(3) Unknown business attributes
Divided into these three categories, the structure becomes very clear.
Basic vehicle attributes: the attributes that interact with the system most, and that every platform has in common.
Extended vehicle attributes: these too are almost always common to every platform. However, they interact with the system only moderately. They are modified far less often than they are read, and they are accessed far less often than the basic attributes.
Unknown business attributes: these correspond to the reserved blank fields mentioned above. They may come from business expansion or from implementation needs (this is also the main difference among the various business systems).
For a large object with heavyweight attributes such as a vehicle, I think lumping all attributes into a single table causes performance loss. A major principle of object-oriented design applies here: composition is better than inheritance. I personally think the vehicle table should be composed according to the classification above, which is more lightweight and flexible. That is, a vehicle table like ours can be split into three tables. To match the attribute categories above, we call them table 1, table 2, and table 3. Since table 1 and table 2 hold attributes common to vehicles on every platform, their structure is in theory the same across all platforms. Table 3 generally differs from one business platform to another. For details, see:
Benefits?
(1) Optimize database operations
(2) Give memcached a clear cache structure
(3) Facilitate data interaction with other platforms
(4) Let memcached apply different cache invalidation policies
I will analyze the benefits of this split one by one:
(1) Optimize database operations. The classification above reflects how often different fields are read and written, which makes a degree of read/write separation possible. Table 1 clearly interacts with applications most often, and some operations lock the table. Isolating it optimizes database access to a certain extent.
(2) Give memcached a clear cache structure. Refer to the following structure:
Physically: multiple databases and multiple tables. Logically: a single table per category.
Distribution keeps the physical layout transparent to external access. Logically, all of this data forms one set in memory. For different business platforms, data access often needs both isolation and interaction, and a distributed cache fits both.
Isolation: the data is physically isolated to begin with. Although it is logically one big data set, it is completely partitioned by the unique basekey that identifies the platform each entry belongs to.
Interaction: logically it is one big data set, which makes interaction convenient. So how should systems interact? They are connected through the basekey. Normally each system accesses its own data (using its own basekey for its own data set) without interfering with the others. But suppose your business needs support from another system, as with the unified service platform I am currently working on. If I obtain your system's basekey through authorization, what is the difference between your data and mine? With the basekey, data is shared.
(3) Facilitate data interaction with other platforms. This has already been explained under Interaction in (2).
In fact, what we did to the vehicle table above is vertical partitioning. Horizontal partitioning applies to a special part of our business: trajectories. The advantage of partitioning tables horizontally and vertically is that each table becomes more lightweight. Vertical partitioning helps with read/write separation, while horizontal partitioning helps optimize queries (some queries require a full table scan, so smaller tables greatly reduce the pressure on the database; imagine the pressure on a track table that receives longitude and latitude data every few dozen seconds).
(4) Let memcached apply different cache invalidation policies. I will discuss this below.
Notes
The three split vehicle information tables must live in the same database, and they share a generated GUID as their primary key to tie them together. Insertion is not a concern. In theory, table 1 holds the inherent attributes of a vehicle and receives its insert when the vehicle is registered. At that point only table 1 is involved (and the GUID that identifies the vehicle is generated then). Data in table 2 may be inserted later as supplementary information after registration, and the same is true for table 3.
Splitting data means consistency has to be maintained. Obviously, we cannot allow a vehicle's data in table 1 and table 2 to be deleted while its data in table 3 is left behind. In fact, for a base table of the core business such as the vehicle table, deletion is unlikely (too much business information is associated with it). If you really need to remove a record (for example, when a vehicle is retired or its service is stopped), use logical deletion only (through a dedicated flag field). Data inconsistency is not allowed.
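As a rough illustration of the notes above, the following sketch uses Python's built-in sqlite3 module to split a vehicle table into three tables that share one GUID, and to retire a vehicle by logical deletion. The column names vi_guid, vi_sim, vi_terminal and vi_type come from the query shown later in this post. The table names, the color and ext1 columns, and the is_deleted flag are assumptions made up for this example, not the real schema.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_basic (        -- table 1: basic attributes
        vi_guid     TEXT PRIMARY KEY,
        vi_sim      TEXT,
        vi_terminal TEXT,
        vi_type     INTEGER,
        is_deleted  INTEGER NOT NULL DEFAULT 0  -- hypothetical logical-delete flag
    );
    CREATE TABLE vehicle_extended (     -- table 2: extended attributes
        vi_guid TEXT PRIMARY KEY REFERENCES vehicle_basic(vi_guid),
        color   TEXT                    -- hypothetical column
    );
    CREATE TABLE vehicle_business (     -- table 3: platform-specific attributes
        vi_guid TEXT PRIMARY KEY REFERENCES vehicle_basic(vi_guid),
        ext1    TEXT                    -- hypothetical column
    );
""")


def register_vehicle(sim, terminal, vtype):
    """Registration touches table 1 only; the shared GUID is generated here."""
    guid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO vehicle_basic (vi_guid, vi_sim, vi_terminal, vi_type) "
        "VALUES (?, ?, ?, ?)",
        (guid, sim, terminal, vtype),
    )
    return guid


def add_extended(guid, color):
    """Supplementary data is inserted into table 2 later, under the same GUID."""
    conn.execute(
        "INSERT INTO vehicle_extended (vi_guid, color) VALUES (?, ?)",
        (guid, color),
    )


def logical_delete(guid):
    """Flag the vehicle as deleted instead of removing rows from any table."""
    conn.execute(
        "UPDATE vehicle_basic SET is_deleted = 1 WHERE vi_guid = ?", (guid,)
    )


def load_vehicle(guid):
    """Return (guid, sim, terminal, color) for an active vehicle, else None."""
    return conn.execute(
        "SELECT b.vi_guid, b.vi_sim, b.vi_terminal, e.color "
        "FROM vehicle_basic b "
        "LEFT JOIN vehicle_extended e ON e.vi_guid = b.vi_guid "
        "WHERE b.vi_guid = ? AND b.is_deleted = 0",
        (guid,),
    ).fetchone()
```

Because the flag lives only in table 1, the rows in tables 2 and 3 are never orphaned: a logically deleted vehicle simply stops appearing in reads that filter on the flag.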
Memcache Usage Analysis
You may have noticed that two kinds of tables are accessed most often in the system: (1) relationship tables (such as the vehicle-institution relationship table); (2) entity tables (such as the vehicle table and the trajectory table discussed here). Looking at how the two are used, we believe that ID sets are accessed more often than any single entity table. [In fact, you will find that access to an entity table is usually driven by a lookup in a relationship table.] Therefore, for relationship tables we can use a vector cache to cache the ID sets they hold. For the split vehicle tables above, we then create row caches with different invalidation policies (the vector cache and row cache mechanisms can also be applied to the track table). A typical memory-based cache is, of course, the open-source project memcached. For a mere tens of thousands or even hundreds of thousands of vehicles, our vector cache and row cache can reach a hit rate close to 100%!
From the analysis above, we can identify the following cache types:
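The vector cache and row cache idea can be sketched roughly as follows. A plain dict stands in for memcached, and two dicts stand in for the relationship table and the vehicle table. All names, key prefixes and sample values are assumptions for illustration only.

```python
# A plain dict stands in for memcached here (assumption for illustration).
cache = {}

# Hypothetical "database" contents: one relationship table, one entity table.
ORG_VEHICLES = {"org-1": ["v-1", "v-2"]}
VEHICLES = {
    "v-1": {"vi_sim": "13800000001", "vi_terminal": "T-001"},
    "v-2": {"vi_sim": "13800000002", "vi_terminal": "T-002"},
}

db_reads = []  # records every lookup that had to go to the "database"


def vehicle_ids_for_org(org_id):
    """Vector cache: cache only the ID set held by the relationship table."""
    key = f"vec:org_vehicle:{org_id}"
    ids = cache.get(key)
    if ids is None:
        db_reads.append(key)
        ids = list(ORG_VEHICLES.get(org_id, []))
        cache[key] = ids
    return ids


def vehicle_row(guid):
    """Row cache: cache one entity record per primary key."""
    key = f"row:vehicle:{guid}"
    row = cache.get(key)
    if row is None:
        db_reads.append(key)
        row = VEHICLES.get(guid)
        cache[key] = row
    return row


def vehicles_for_org(org_id):
    """Entity access is driven by the relationship: IDs first, then rows."""
    return [vehicle_row(guid) for guid in vehicle_ids_for_org(org_id)]
```

The first call to vehicles_for_org("org-1") goes to the database once for the ID set and once per row. Repeating the call is served entirely from the cache, and a row cached for one organization is reused by any other relationship that references the same vehicle.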
(1) Relationship tables (for example, the vehicle-institution relationship table and the organization-user relationship table)
(2) Common core business data (table 1, table 2, and the trajectory data analyzed above)
(3) Business queries (select vi_sim, vi_terminal, vi_guid from vehicleinfo where vi_type = 2)
For their keys, refer to the following generation scheme (the keys below correspond to the cache types above):
First, assume that each platform has its own platform identifier, called the basekey for short [generated by some algorithm].
(1) Depends on the relationship type:
1. [One-to-one] No separate relationship entry is needed
2. [One-to-many] Key: MD5(basekey_ID of the "one" side); Value: the ID set of the "many" side
3. [Many-to-many] Key: MD5(basekey_table name); Value: the relationship set
(2) Key: MD5(basekey_primary key); Value: the data record for that primary key
(3) Key: MD5(basekey_business SQL); Value: the business result set
NOTE: For the third kind of key, avoid generating too many keys for similar logic. For example, one SQL statement only looks up a vehicle's terminal number, while another only looks up its SIM card number. Following the ORM approach, use a single select * for such operations and wrap the result in an object. Because the data is served from memory, the larger data size is offset by the access speed when compared with traditional database access.
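The key scheme above can be sketched with Python's hashlib module. The function names and the whitespace normalization of the SQL text are assumptions added for this example. Only the MD5(basekey_...) shape comes from the scheme itself.

```python
import hashlib


def _md5(text):
    """Hex MD5 digest of a string: always 32 characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def one_to_many_key(basekey, one_side_id):
    """Type (1), one-to-many: the value is the ID set of the 'many' side."""
    return _md5(f"{basekey}_{one_side_id}")


def many_to_many_key(basekey, table_name):
    """Type (1), many-to-many: the value is the whole relationship set."""
    return _md5(f"{basekey}_{table_name}")


def row_key(basekey, primary_key):
    """Type (2): the value is the record for this primary key."""
    return _md5(f"{basekey}_{primary_key}")


def query_key(basekey, sql):
    """Type (3): the value is the result set of a business query.

    Collapsing runs of whitespace is an assumption of this sketch, so that
    trivially different spellings of one statement share a single key.
    """
    normalized = " ".join(sql.split())
    return _md5(f"{basekey}_{normalized}")
```

Hashing keeps every key at a fixed 32 characters however long the SQL text is, which stays well inside memcached's 250-byte key limit. Two platforms with different basekeys can never produce the same key for the same record, which is what gives the isolation described earlier. One design point to watch: the one-to-many key and the row key have the same shape, so if an ID on the "one" side could ever equal a primary key in a row cache, a type prefix would be needed to keep them apart.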
Cache invalidation policy analysis: another benefit of splitting the table
Tables 1, 2, and 3 below refer to the split vehicle information tables described above.
Table 1 holds the basic attributes of a vehicle, which may be added, modified, or queried frequently (deletion is not discussed here, because I think logical deletion is more appropriate). These operations require strong consistency between the data in memory and the data in the database. The cache invalidation policy here is therefore forced invalidation. For example, when an update is performed, immediately expire the key-value pairs for the related selects on that table. The next query then reads from the database and adds the result to memcache again.
Table 2 holds the extended attributes of vehicles. It is likely to see more selects and fewer updates, so its cache expiration time can be longer than that of table 1 (as explained above, although table 2 also stores vehicle attributes, its data is usually added to the database later, and the table is not queried as frequently).
For table 3, the cache policy can go as far as simply setting a long automatic expiration time.
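The three policies can be sketched as follows. TTLCache is a tiny in-process stand-in for memcached with per-entry expiry, and the TTL values are made-up numbers chosen only to show the ordering table 1 < table 2 < table 3. None of them come from the real system.

```python
import time


class TTLCache:
    """Minimal in-process stand-in for memcached with per-entry expiry."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


# Hypothetical expiry times in seconds, one per split table.
TTL_SECONDS = {"table1": 300, "table2": 3600, "table3": 86400}


def cache_row(cache, table, key, row):
    """Store a row with the expiry time that matches its table."""
    cache.set(key, row, TTL_SECONDS[table])


def update_table1(cache, db, key, new_row):
    """Table 1 policy: write to the database, then force the entry to expire."""
    db[key] = new_row
    cache.delete(key)  # the next read misses and reloads from the database
```

Table 1 entries are removed on every write regardless of their remaining lifetime, while entries for tables 2 and 3 simply age out, with table 3 kept the longest.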
I share your concern: security
Sometimes the environment forces you to ask whether your data should stay within the boundaries you have set. The nature of the business and of the industry may weigh heavily here. You have to consider whether you can let things leave your full control! Personally, I think our track and vehicle data can be kept inside the company as long as the boundaries of the data set can be defined. Some information is useless once it loses its context and relationships (unless someone obtains all of it). Secondly, all the information comes from the service data the company provides to customers. It is a company resource. Whether the company stores it in ten resource pools or in one large pool is transparent to customers, and everything serves the goal of giving customers correct data.
Still worried about security? The data stored in memcache is only a copy, or a snapshot, of your real tables. Besides, nothing requires you to use the same basekey for all three kinds of cache described above. The authorization design, however, must be considered carefully!
Below is the architecture prototype I have conceived for the entire system:
Note:
[Node 1-Node N]: the individual nodes of the deployed distributed memcache;
[Business DB1-Business DBN]: the databases of each business platform;
[Memcache server]: the external access point for all nodes, which hides the physical deployment details;
[Listener/Interactive Server]: it has two responsibilities. First, it behaves somewhat like a listener (simply put, it polls), continually hitting the memcache server for the keys that other systems need for interaction. If the business data for an interaction is missing, it fetches that data from the relevant business database and writes it to the memcache server. Second, memory is volatile storage and cannot be fully trusted. If the memcache server is found to be down, this server takes over data access for the interacting systems and serves the data directly. Note that this server should hold only limited permissions on the data, for example query only, with no delete permission (this depends on the role of the database account the server uses);
[Authentication/authorization server]: used when systems need to interact with each other. It grants an access token (the basekey mentioned above) to systems that need business data from other platforms;
[Web/Application Server]: as the memcache client, it has two data access scenarios. (1) Access to its own system's data: first query the memcache server and use the data directly on a hit. On a miss (or for operations such as insert or update), operate on the database directly, and delete any entries in the memcache server that have become inconsistent. (2) Access to another system's data, which follows the red line in the figure: first request an access token from the authentication/authorization server. Once authorized, access the memcache server directly. The access token gives access to the data under specific keys. If a key misses, in most cases the memcache server is down, because the primary role of the listener/interactive server is to keep those specific key-values present on the memcache server. Whatever the cause, when data cannot be obtained, we can request it directly from the listener/interactive server using the access token.
[Mobile Terminal (Supplement)]: since the figure does not show this well, here is a brief description of how mobile terminals access services. Mobile terminals still need HTTP GET or Web Services. Therefore, all the web services they use are published on the interactive server, which registers every service. The access path of a mobile terminal is then almost the same as scenario (2) of the Web/Application Server: authorization comes first, but the request goes directly to the listener/interactive server, which then decides whether to read from memcached or from the individual business platform's database.
In fact, the model diagram only separates logic and responsibilities. In practice, some machines can be combined for economic reasons. For example, a business DB can share a machine with a node, and the listener/interactive server and the authentication/authorization server can be built on the same physical machine.
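The two access scenarios of the Web/Application Server can be sketched as follows. FakeCache is an in-process stand-in for the memcache server, and authorize and listener_fetch are hypothetical callables standing in for the authentication/authorization server and the listener/interactive server. The key is left unhashed here to keep the example readable.

```python
class CacheDown(Exception):
    """Raised by the stand-in cache when it is unreachable."""


class FakeCache:
    """In-process stand-in for the memcache server; `up` simulates an outage."""

    def __init__(self):
        self.data = {}
        self.up = True

    def _check(self):
        if not self.up:
            raise CacheDown()

    def get(self, key):
        self._check()
        return self.data.get(key)

    def set(self, key, value):
        self._check()
        self.data[key] = value

    def delete(self, key):
        self._check()
        self.data.pop(key, None)


def read_own(cache, db, key):
    """Scenario (1), read: cache first, own database on a miss or outage."""
    try:
        value = cache.get(key)
    except CacheDown:
        value = None
    if value is None:
        value = db.get(key)
        if value is not None:
            try:
                cache.set(key, value)  # repopulate for the next reader
            except CacheDown:
                pass
    return value


def write_own(cache, db, key, value):
    """Scenario (1), write: update the database, drop the stale cache entry."""
    db[key] = value
    try:
        cache.delete(key)
    except CacheDown:
        pass


def read_other(cache, authorize, listener_fetch, requester, platform, record_id):
    """Scenario (2): get a basekey, try the cache, fall back to the listener."""
    basekey = authorize(requester, platform)  # hypothetical: None if refused
    if basekey is None:
        raise PermissionError(f"{requester} may not read {platform}")
    try:
        value = cache.get(f"{basekey}_{record_id}")
    except CacheDown:
        value = None
    if value is None:
        value = listener_fetch(basekey, record_id)
    return value
```

The calling system never touches another platform's database itself. Without a basekey it cannot even form the cache key, and when the cache cannot answer, the read-only listener/interactive server is the only fallback.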
Cost of restructuring: complexity and workload
If you are certain that your private data will never need to serve interactions with other systems, there is nothing wrong with independent governance. But once such a business need arises, restructuring the platform's core data clearly gives those businesses a more convenient way to obtain information. This restructuring is a huge amount of work. The only practical way to carry it out is to run the old and new approaches in parallel and replace the old one gradually.
Summary
Taking some specific data tables from the company's business as an example, this article shows how to restructure the data and explains the benefits of doing so. It then describes, in combination with memcache, how the restructured data can provide interactive services. Next time I will talk about a modified three-tier architecture (consider adding a data adaptation layer or some configurable form of routing) that hides the details of multiple data storage forms (distributed memory, DB, NoSQL) and accesses data through a unified API.