This article describes the decision-making process of architecture design by analyzing the design of a provincial highway network toll operation management platform.
1. Business background
The highways in the province are divided into nearly 100 road sections, built and operated by different companies. As a result, when a vehicle crosses from one road section to another, it has to stop at a toll station to pay and exchange cards, which reduces the traffic efficiency of the highways.
With the development of information technology, the conditions for networked toll collection across the province have matured. After the transformation, a vehicle takes a card once when it enters the highway and pays once for the whole journey when it leaves at its destination. As information technology advances further, vehicles will eventually be charged only once for highway travel nationwide.
To support the province's transformation to networked toll collection, it became urgent to centralize the paid transaction flows of all road sections, to support searching HD checkpoint flows and images, and to provide transaction re-splitting, video surveillance, and toll-evasion inspection of vehicles across the province. The biggest improvement over the previous situation is that data scattered across the road section systems is now centralized at the provincial center, which provides shared information queries and statistical analysis reports.
2. Key requirements
Based on the business needs, the data flow between the different levels of systems is as follows:
1) The provincial road network has about 100 road section centers. Each road section center manages several toll stations, ranging from a few to dozens. Each toll station has on average four entrance lanes and two exit lanes; in total there are about 2,000 entrance lanes and 1,000 exit lanes in the province.
2) When a vehicle enters the highway through a toll station, an entry flow record and HD checkpoint information (including images) are generated. When the vehicle pays at the exit toll station, an exit flow record is generated. The status of the devices in each lane is reported to the road section center at regular intervals.
3) The data of each road section center must be collected by the provincial operation management platform in a timely manner. When a vehicle pays at the exit of a toll station, the toll station system queries the provincial operation platform for the vehicle's entry flow record and HD checkpoint flow record; if necessary, it also retrieves the HD checkpoint image for visual comparison.
4) The provincial toll settlement center splits the fees based on the transaction flows provided by the road section centers and delivers the splitting results to the provincial operation platform, which then distributes them to all road section centers.
5) The provincial operation platform issues basic rates, device control instructions, and operation parameters to the road section centers, which pass them down in turn to the toll stations.
The system has the following features:
1) Real-time requirements are not high in most scenarios. Only the entry transaction flow query needs to respond within 1 second.
2) Concurrent access traffic is low. There are fewer than 2,000 regular users and fewer than 500 concurrent queries on the application interfaces (not counting public-facing mobile apps).
3) There is little complex business logic; the requirements are concentrated on data query, data analysis, and information management. Only the transaction flow splitting rules are complex.
4) Querying the entry transaction flow and the HD checkpoint flow requires high availability. Otherwise the toll station software cannot calculate the charges for vehicles, and the resulting manual intervention would greatly reduce charging efficiency and easily cause traffic jams.
5) The transaction flow and HD checkpoint flow data volumes are large, each averaging a million or more records per day. The business requires that transaction flows be kept for more than one year and HD checkpoint flows for more than three months. Each HD checkpoint image is about 200 KB, so storing them centrally on the provincial platform would require enormous capacity.
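To put that in rough perspective (assuming, say, a million checkpoint images per day, in line with the volumes above): 1,000,000 images/day × 200 KB × 90 days of retention is on the order of 18 TB for the images alone, before any flow records are counted.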
3. Major decisions
To make the major decisions in the system design process, you must clearly analyze the main functional and quality requirements, take the status quo of the existing systems into account, and strike a balance among economic cost, the skills of the technical team, the network environment, business management constraints, and schedule requirements.
During implementation, the biggest practical constraint was the tight construction schedule. Many implementing parties were involved, and development and testing were compressed into less than two months, so progress had to be tightly controlled and could not be allowed to affect the joint debugging tests with other units. The architecture design also had to build on the technical team's existing knowledge and skills: pre-research on new technology costs time and introduces risk, so it had to be approached cautiously.
3.1 High performance
3.1.1 Divide databases by business and partition large tables
The system has many business functions and is divided into more than 15 subsystems. Shared information such as users, organizations, roads, toll stations, and basic dictionaries is placed in a public database (schema); data that is not shared with other business systems goes into separate databases, and key business functions get dedicated databases. Each business system is assigned an independent database user for ease of use and management.
Make full use of the Oracle partitioned table mechanism: partition large tables such as the entry flow, exit flow, and HD checkpoint flow by date (for example, every 3 or 10 days), so that each partition holds fewer than 100 million rows. In this way, as long as the indexes are properly built and server resources are sufficient, Oracle can answer a single-table query within one partition in under 1 second. Historical data can also be cleared by dropping partitions directly, which greatly improves the performance of deleting large amounts of data. A sketch of this scheme follows.
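As a rough sketch only (the table name, columns, and date boundaries are invented for illustration, not taken from the project), a date-partitioned entry flow table in Oracle could be created and pruned like this through JDBC:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class EntryFlowPartitionSketch {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//db-host:1521/tollsvc", "flow_user", "secret");
             Statement stmt = conn.createStatement()) {

            // Range-partition the entry flow by date so each partition stays
            // well under 100 million rows.
            stmt.execute(
                "CREATE TABLE entry_flow (" +
                "  flow_id      VARCHAR2(32)," +
                "  plate_no     VARCHAR2(16)," +
                "  station_code VARCHAR2(8)," +
                "  entry_time   DATE" +
                ") PARTITION BY RANGE (entry_time) (" +
                "  PARTITION p_20150101 VALUES LESS THAN (DATE '2015-01-04')," +
                "  PARTITION p_20150104 VALUES LESS THAN (DATE '2015-01-07'))");

            // A LOCAL index keeps lookups inside a single partition fast.
            stmt.execute(
                "CREATE INDEX idx_entry_flow_plate ON entry_flow (plate_no, entry_time) LOCAL");

            // Expired history is removed by dropping whole partitions
            // instead of running a huge DELETE.
            stmt.execute("ALTER TABLE entry_flow DROP PARTITION p_20150101");
        }
    }
}
```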
3.1.2 Use a memory database to improve query performance
Even with these database optimizations, the toll software must query the entry flow and HD checkpoint flow whenever a vehicle pays at an exit. If the number of concurrent queries grows, the database may not be able to respond within 1 second. The response time of this function is the users' pain point and directly determines whether the project will be accepted.
Therefore, we had to consider introducing a NoSQL database that loads the flow data of the last three days into memory, handles the concurrent query requests from the toll stations, and completes each query within 10 to 100 milliseconds.
In our business scenario the data is already persisted in Oracle; a distributed cache or NoSQL product is introduced only so that data can be served directly from memory for better performance. With that in mind, I took a broad look at several open-source distributed cache and in-memory database products.
Memcached is suitable for distributed caching, but it provides no retrieval or query functions. It keeps all data in memory with no file persistence mechanism, and in a cluster deployment the nodes do not communicate with each other, so when a node goes down all data on that node is lost. Implementing query and retrieval on top of it would require a lot of application development work, so it does not fit our scenario.
Both Redis and MongoDB provide file persistence mechanisms, and Redis has better query performance than MongoDB. MongoDB, however, offers many convenient SQL-like query functions in its API, and in a cluster deployment it can shard data and fail over automatically, so it requires less application development work. Comparing the two, I leaned toward MongoDB. But with such a short schedule there was neither the staff nor the time for in-depth pre-research, and research that is not deep enough would only add risk.
Later, VMware recommended an in-memory database product called GemFire, saying it had achieved good results on the 12306 ticketing website. The pre-sales team worked hard on us and demonstrated a number of success stories, which increased our confidence in the product. What impressed me most is that it is positioned as an in-memory computing platform: computation can run directly on the data nodes (similar to defining functions in the database and calling them from the application), and the results are merged and returned to the application. This is what distributed computing is about.
Because a commercial product comes with reliability guarantees and saves our own manpower, and the project budget was sufficient, we chose it. The subsequent implementation proved that the product performs well and is stable, reducing query time to tens of milliseconds. Of course, it also consumes memory and CPU (three servers with 10 GB of memory each).
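As a minimal sketch of how the interface query service could read a recent entry flow record from the in-memory cluster, the example below uses the Apache Geode (open-sourced GemFire) client API; the region name, key format, and locator address are assumptions, not the project's actual configuration:

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;

public class EntryFlowLookupSketch {
    public static void main(String[] args) {
        // Connect to the in-memory cluster through a locator (address is a placeholder).
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("gemfire-locator", 10334)
                .create();

        // A PROXY region keeps no local copy: every get() goes to the servers,
        // which hold the last three days of flow data in memory.
        Region<String, String> entryFlow = cache
                .<String, String>createClientRegionFactory(ClientRegionShortcut.PROXY)
                .create("entryFlow");

        // Hypothetical key: plate number plus entry date.
        String record = entryFlow.get("AB1234_20150101");
        System.out.println(record != null ? record : "no entry flow found");

        cache.close();
    }
}
```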
3.1.3 Purchase high-end hardware and network equipment
Internet applications, with their huge server fleets, have to go the de-IOE route to save costs, using large numbers of cheap servers and open-source software. Enterprise applications are much smaller; the server resources needed are not that large, and high-end hardware is affordable for an enterprise. This project uses IBM servers, EMC storage, Cisco switches, and an F5 hardware load balancer.
Improving performance through hardware saves effort when the budget allows. Dedicated storage already implements redundancy and provides a disaster recovery solution, so the application layer does not have to worry about data disaster recovery: if a hard disk fails, you simply plug in a new one, because RAID 10/5 ensures that no data is lost. Otherwise, by the time we had spent several months studying something like HDFS and dared to use it in the project, the project would already be over.
3.2 Reliability and high availability
3.2.1 Use commercial middleware to improve reliability
An Oracle RAC cluster provides high availability for the database.
Important business systems, such as the interface query subsystem, are deployed on IBM WebSphere, which provides soft load balancing and clustering and can centrally manage and monitor the application servers through a web console. This also takes care of high availability at the application layer. (In our key business scenario, interface queries are stateless, so sessions do not need to be replicated or made sticky.)
3.2.2 Use message-oriented middleware for communication
In our business scenario, the provincial operation platform needs to send control commands, operation parameters, and other information down to each road section. At the same time, the large volume of flow data uploaded by the road sections can first be buffered on the message server; the splitting subsystem then reads from the message server and processes the data with multiple threads, or as multiple single-threaded copies, so the deployment mode is flexible. Introducing message-oriented middleware also reduces the coupling between subsystems.
Among open-source products there are ActiveMQ and RabbitMQ; on the commercial side I also looked at IBM MQ. Surprisingly, the concurrency benchmarks of IBM MQ were very poor, weaker than the two open-source products.
In the end we adopted the open-source RabbitMQ, mainly for its concurrent performance and its support for high availability and disaster tolerance. The broker is deployed only on the provincial operation platform, and the road sections connect with clients only. This keeps deployment and O&M costs down: the technical level of the road section partners is limited, and scattered broker deployments would not be conducive to fault diagnosis. A consumer on the splitting side might look roughly like the sketch below.
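For illustration only (the queue name, prefetch count, and broker address are assumptions), a splitting worker consuming buffered flow records from RabbitMQ could look roughly like this; several such workers, or several single-threaded copies, can share one queue:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

public class SplitWorkerSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("mq.province-platform");   // placeholder broker address

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Durable queue so buffered flow records survive a broker restart.
        channel.queueDeclare("exit.flow", true, false, false, null);
        // Give each worker at most 50 unacknowledged records at a time.
        channel.basicQos(50);

        DeliverCallback onRecord = (consumerTag, delivery) -> {
            String flowRecord = new String(delivery.getBody(), StandardCharsets.UTF_8);
            // ... apply the splitting rules and write the result to the business database ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };

        // Manual ack: a record leaves the queue only after it has been split successfully.
        channel.basicConsume("exit.flow", false, onRecord, consumerTag -> { });
    }
}
```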
3.3 Scalability
3.3.1 Divide the system into multiple subsystems for decoupling
In general, a large system needs to be divided into multiple subsystems, which makes concurrent development by several teams easier and allows fast deployment and upgrades. Core business systems can be deployed as clusters with more instances and more hardware resources, while less important systems such as configuration management can be deployed directly to Tomcat, even as a single instance. Some background services are single Java processes tied to the database and are designed to run as only one instance.
Subsystems are decoupled from each other through the message server, so one subsystem going down does not affect the operation of another.
Throughout the implementation, we introduced a portal, unified user management, and single sign-on to integrate the different web systems at the page level.
3.4 O&M
3.4.1 Virtualize the infrastructure to simplify O&M management
Drawing on the company's successful experience in building a private cloud (based on VMware product components), we believed that virtualizing the servers would make full use of resources, reduce costs, and make centralized O&M management easier.
During implementation, we virtualized three top-spec physical servers into a pool of virtual machines and reserved two high-spec servers for the Oracle cluster.
Without virtualization, more servers would be needed, or several applications would have to share one server, which makes O&M monitoring very troublesome. After virtualization, server resources can be allocated dynamically according to the business scenario, which is also a characteristic of cloud computing.
It is worth noting that when deploying clustered applications on virtual machines, you must make sure the nodes are spread across different physical machines to preserve high availability.
3.5 Comprehensive considerations
3.5.1 HD images are not centrally stored on the provincial platform
As mentioned above, the HD images are very large in volume. Storing them centrally on the provincial platform would consume a lot of storage capacity, and keeping them on the dedicated storage would be a waste; transmitting them from the road sections to the provincial platform would also consume a great deal of communication bandwidth.
In actual business scenarios, the toll station calls the image query interface only in a few specific situations, so the access volume is low; only a tiny fraction of the images is ever read. We therefore defined an HTTP interface specification, with the service end implemented by each road section and called by the provincial platform. Because an HTTP interface is simple and easy to implement, the workload for each road section is very small, as the sketch below illustrates.
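The interface specification itself is not reproduced here; as a purely hypothetical shape (the host, path, and ID format are invented), the provincial platform might fetch a single checkpoint image from a road section's HTTP service with the JDK HTTP client like this:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class CheckpointImageClientSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical endpoint exposed by one road section: one image per flow record ID.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://section-12.example/checkpoint/images/20150101-000123"))
                .GET()
                .build();

        // Stream the JPEG straight to disk; only a tiny fraction of images is ever requested.
        HttpResponse<Path> response = client.send(
                request, HttpResponse.BodyHandlers.ofFile(Path.of("checkpoint-20150101-000123.jpg")));

        System.out.println("HTTP " + response.statusCode() + " -> saved to " + response.body());
    }
}
```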
In other words, we traded some image query performance for large savings in bandwidth and storage.
3.5.2 Extract data directly from the road sections' intermediate databases
The existing systems' interfaces are implemented through intermediate databases. Introducing a new interface style, given how many road sections there are, would greatly raise the technical demands on the partner companies and the integration workload. So we followed the previous approach and defined an intermediate table specification; each road section is responsible for providing an intermediate database and writing its data into it.
Because the existing systems run on various types of databases, Kettle is used for the data migration. The volume of extracted data is large and the business table structures differ; developing a custom interface program would mean a lot of work and stability that is hard to guarantee.
To keep extraction efficient, the road sections are divided into groups of 10, with one Kettle extraction job deployed per group.
In other words, given the actual technical conditions, we used the simplest and most suitable method to implement the business function. In practice this data extraction approach saved the project a great deal of time.
3.5.3 Subsystems cache user information
User and organization information is stored in the public database, but user roles, accessible resources, resource URLs, and similar information differ in each subsystem. To improve performance, each subsystem caches user and organization information in memory for authorization and access control, reusing the same code at the class file or JAR package level.
In fact, as the number of subsystems grows, user authentication, authorization, and user information queries could be turned into an HTTP REST service or an RPC service. Given the project schedule and development cost, we did not go that far; each subsystem simply accesses the database directly and caches user information to improve performance.
Note that cached user information must be given an expiry time, for example 5 minutes, as in the sketch below.
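A minimal plain-JDK sketch of such a time-limited cache follows (the loader function is a stand-in for a query against the public user database, and the 5-minute value is just the example above; the real subsystems may use a framework-provided cache instead):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Tiny time-limited cache: entries older than ttlMillis are reloaded on access. */
public class UserInfoCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<K, V> loader;   // e.g. a lookup in the public user database

    public UserInfoCache(long ttlMillis, Function<K, V> loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    public V get(K key) {
        long now = System.currentTimeMillis();
        Entry<V> cached = entries.get(key);
        if (cached != null && now - cached.loadedAt < ttlMillis) {
            return cached.value;                      // still fresh, serve from memory
        }
        V fresh = loader.apply(key);                  // expired or missing: reload
        entries.put(key, new Entry<>(fresh, now));
        return fresh;
    }
}
```

A subsystem would construct one instance per cached type, for example with a 5-minute TTL and a DAO method as the loader (new UserInfoCache<>(5 * 60 * 1000, userDao::findByLoginName), where userDao is hypothetical).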
4. Overall architecture
The following describes the overall architecture of the system, covering the division and layering of subsystems and business components. It is not yet detailed enough to guide development.
4.1 Logical architecture
1) The infrastructure layer includes servers, switches, routers, firewalls, security protection, tape backup, and other hardware network devices. It also includes virtualization software and operating systems.
2) The middleware layer includes the commercial database, application server, and in-memory database, plus the open-source message middleware and data extraction software.
3) The service component layer consists of reusable service components and the secondary development framework.
4) The business application layer comprises the individual business subsystems.
4.2 Component dependencies
1) Kettle saves the extracted flow data to the database and writes it to the RabbitMQ message server.
2) GemFire loads the last few days of flow data from the flow database into memory.
3) The interface query service calls the GemFire client API to query entry flow records.
4) The transaction flow splitting service reads data from the message server, processes it, and writes the splitting results to the business database.
5) Each web business subsystem relies on the CAS service for single sign-on.
6) The unified user management system centrally manages user and organization information.
7) The single sign-on service queries user information from the user database for centralized authentication.
8) The portal integrates the various web business subsystems.
4.3 Physical deployment
4.4 Subsystem design
Since the subsystem design involves the company's specific business, it is not described further here. During the design, it is enough to divide the different business functions into subsystems according to their general purpose.
In general, a subsystem design should clearly describe the functional module design, business processes, package division, domain model, key class diagrams, and the internal and external interface specifications.
5. Conclusion
Anyone who has played real-time strategy games knows the questions: when to mine, how many workers give the best efficiency, when to build which troops to counter the opponent, and when to scout and expand; only by combining multiple unit types can you gain the greatest advantage. Architecture design is the process of executing such a strategy: specific technologies are the weapons, and patterns and theory are the tactics.