Zhu Ye's Internet Architecture Experience S1E2: The Three Carriages of a Proven Architecture

The three carriages are microservices, message queues, and scheduled tasks. As the architecture diagram shows, a typical Internet project is a carriage pulled by these three horses together. Whether the project is large or small, the pattern of this template does not change once it has taken shape. What differs is scale: a bigger project has more services with more complex call chains, more complex message flows, and more jobs. The architecture scales out without losing its shape, and it can run for a long time without major adjustments.

In the diagram, a dotted box marks a module or project that carries little business logic. It is purely a thin shell that invokes services but never touches a database. Black arrows represent dependencies. Green arrows show the direction in which MQ messages are sent, and red arrows show the direction in which they are subscribed to. These are explained in more detail below.


Microservices

Microservices are not a new concept. I began practicing this architectural style 10 years ago and have fully implemented it in projects at four companies. I am increasingly convinced that it is a very good fit for Internet projects. The point is not that our services must be invoked remotely across physical machines. The point is that a deliberate design forces us to split the business by domain from the very beginning. That split gives us a fuller understanding of the business and makes it easy to work on separate business modules in later iterations. As a result, development gets progressively easier, for several reasons:

1. To do microservices at all, we must go through a fairly thorough product requirements discussion and domain division. Each service must also carefully design the table structure for its own domain. This design process is very important, because it determines whether the technical architecture matches the product architecture. An all-in-one architecture tends to skip this process: code is written wherever the requirement happens to land.

2. If the division of services and their responsibilities is clear, we know exactly where to change the code for a new requirement. Without copy-and-paste, there are far fewer pitfalls.

3. Most of our business logic has already been developed and can be reused directly, so a new business is just an aggregation of existing logic. After the PRD review, developers may find that the new feature only needs three things. It must combine calls to methods X, Y, and Z of services A, B, and C. It must also add one branch to method Z in service C. The refreshing feeling of building new logic this way is hard to imagine until you have experienced it.

4. When performance becomes a significant bottleneck, we can scale out specific services by adding machines. Because the system is divided into services, we also understand its bottlenecks better. Finding the one line that causes a performance problem in 10,000 lines of code is hard. If those 10,000 lines already form 10 services, we can first locate the service with the problem and then analyze only that service. This reduces the complexity of locating the problem.

5. If a significant part of the business is taken offline, we can be confident that the underlying public services will not be retired. We simply stop the traffic entry points of the corresponding aggregated business services. Then we take the related interfaces of the basic services offline. With a mature service governance platform, you may not even need to change any code.

At the same time, this approach requires us to follow several principles:

1. Service granularity needs to be well controlled. My habit is to divide by domain first, which is hard to get wrong. Then I split into finer-grained services gradually as the project progresses. For example, a peer-to-peer (P2P) Internet finance business could initially be divided into:

  • A. Partner investment service PartnerInvestService: the main flow for traffic from the third-party wealth-management platforms we cooperate with
  • B. Normal investment service NormalInvestService: the main flow for the most common form of assets
  • C. Reserved investment product service ReserveInvestService: the main flow for assets that require an investment reservation
  • D. Automatic investment plan service AutoInvestService: the main flow for financial products that are periodically and automatically reinvested
  • E. Investor trading service TradeService: handles investors' trading behavior, such as investing
  • F. Borrower trading service LoanService: handles borrowers' trading behavior, such as repayment
  • G. User service UserService: handles user registration, login, and so on
  • H. Asset service ProjectService: handles assets and their underlying listings
  • I. Accounting service AccountService: handles user accounts and bookkeeping
  • J. Marketing activity service ActivityService: handles marketing campaigns and the user points system
  • K. Membership service VipService: handles the user membership growth system
  • L. Bank custody service BankService: dedicated to integrating with the bank's custody (depository) system
  • M. Electronic signature service DigSignService: dedicated to integrating with third-party digital seal systems
  • N. Message push service MessageService: dedicated to integrating with third-party SMS channels and push SDKs

2. Services must be layered in three dimensions rather than sitting on a single level. For example, our services have three levels:

    • Aggregated business services: high-level business services that string together an entire process in a complete business form. Unlike basic business services, they describe the business in full, and they are usually assembled from a variety of basic businesses. We cooperate with different external partners in different ways and offer users products in different forms, so the processes of aggregated business services inevitably differ. If these differences were pushed down into the basic business services, those services would fill up with if-else logic that branches on product type, user type, and so on. As partnerships start and end and requirements change, the basic business services would rot badly. To avoid this, we put the frequently changing, varied business logic into separate aggregated business services. In general, aggregated business services do not call each other, because each represents an independent business process. Each of them does call many different basic business services. Services A-D in point 1 are of this kind. The logic at this level expresses the complexity and differences of business processes. It is not concerned with how account information, bookkeeping, or user information are handled, or with the details of investor and borrower transactions. The reserved investment form, for example, cares only that assets are reserved first and then invested automatically by the system. Underneath, it relies entirely on the investor trading service to carry out the whole transaction.
    • Basic business services: business services for a single domain. These services may call each other. For example, the investor and borrower trading services inevitably talk to the user service, the asset service, and the accounting service to query user information, query listing information, and post bookkeeping entries. The investor and borrower trading services are positioned as basic business services because each handles one specific aspect of the business rather than a whole process. At this level of abstraction, the business does not change easily. Complex business forms, such as reserved transactions, automatic reinvestment transactions, or equal-installment (equal principal and interest) transactions, are built as aggregated business services on top of them. Services E-K in point 1 are of this kind. This level contains a lot of business logic, but it already reuses the public basic services heavily. Logic that is only weakly related to its own business does not accumulate here; more specialized basic business services take it on instead.
    • Public basic services: responsible for one underlying aspect of the business, with no domain business logic inside. They either handle that aspect autonomously or communicate with external parties to provide some function. These services do not call each other, but aggregated business services and basic business services call them. Services L-N in point 1 are of this kind. Suppose our cooperation with an external party changes in the future. Because we have defined the service's external contract, we can easily swap in a different implementation for a different third party, and the rest of the system needs almost no modification. I recommend pulling every third-party integration out into its own public basic service. The same function may integrate with several third-party channels, such as push notifications through both Aurora (JPush) and Getui. In that case, the public basic service can itself be an abstract aggregated push service that routes to the concrete Aurora and Getui push services.

I hope this clarifies things. Deciding how to divide services, and how to sort them into the three levels, is both interesting and necessary. After the division, it is best to have a clear document describing each service's responsibilities. With it, anyone can locate the service behind a piece of business without reading its API, and the whole complex system becomes easy to understand.
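
To make the public-basic-service idea concrete, here is a minimal Java sketch of the push example. It uses hypothetical names (`PushService`, `PushChannel`, `RoutingPushService`) and stub channels in place of the real Aurora (JPush) or Getui SDKs, whose APIs are not shown here. Callers depend only on the `PushService` contract, so switching vendors touches nothing outside this service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Contract used by aggregated and basic business services. Callers never
// learn which third-party vendor actually delivers the notification.
interface PushService {
    String push(String userId, String text);
}

// Hypothetical adapter interface; a real adapter would wrap a vendor SDK.
interface PushChannel {
    String name();

    String send(String userId, String text);
}

// Stub channel that records what it was asked to send (sketch only).
class StubChannel implements PushChannel {
    private final String name;
    final List<String> sent = new ArrayList<>();

    StubChannel(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public String send(String userId, String text) {
        sent.add(userId + ":" + text);
        return name;
    }
}

// Public basic service: no domain logic, only routing to the active channel.
final class RoutingPushService implements PushService {
    private final Map<String, PushChannel> channels;
    private volatile String active;

    RoutingPushService(Map<String, ? extends PushChannel> channels, String active) {
        this.channels = Map.copyOf(channels);
        switchTo(active);
    }

    // Switching vendors is a configuration change that callers never see.
    void switchTo(String channelName) {
        if (!channels.containsKey(channelName)) {
            throw new IllegalArgumentException("unknown channel: " + channelName);
        }
        active = channelName;
    }

    @Override
    public String push(String userId, String text) {
        return channels.get(active).send(userId, text);
    }
}
```

Because routing is a configuration switch inside the service, adding a third vendor would only mean registering another `PushChannel`.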

3. The data tables behind each service are independent and do not overlap. A service never exposes its data structures directly, and any other service that needs the data must go through the service's interface. The benefits are the same as those of encapsulation in object-oriented design:

    • It is easy to refactor the underlying data structures, or even replace the data source. As long as the interface does not change, the outside world does not notice.
    • When performance problems require adding a cache, sharding tables or databases, or archiving data, the work is much easier, because nothing outside the service depends on the data source.

In short, this is my data, I am in charge of it, and I can do whatever I want with it. When nothing else depends on the underlying data, refactoring becomes much easier, and so do advanced technical architectures such as multi-site active-active deployment. The downside is that a data operation spanning service calls can no longer complete in a single database transaction. This is not a big problem, for two reasons. First, our splitting approach does not make the granularity too fine, so most business logic completes within a single business service. Second, as discussed later, cross-service invocations always have compensation to reach eventual consistency, whether they go through MQ or are direct calls.
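
Here is a minimal sketch of the encapsulation benefit. It assumes a hypothetical `UserQueryService` interface, with an in-memory map standing in for the user service's private table. A cache is added later as a decorator, and callers never notice because the interface does not change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Public contract of the user service; other services only see this.
interface UserQueryService {
    Optional<String> findNickname(long userId);
}

// Owns the user "table"; the map stands in for the service's private database.
class DbUserQueryService implements UserQueryService {
    private final Map<Long, String> userTable = new HashMap<>();
    int queries = 0; // counts simulated database round trips

    void insert(long userId, String nickname) {
        userTable.put(userId, nickname);
    }

    @Override
    public Optional<String> findNickname(long userId) {
        queries++;
        return Optional.ofNullable(userTable.get(userId));
    }
}

// Added later as a purely internal change: same interface, so no caller is
// modified. (No invalidation here; a real cache would need an expiry policy.)
class CachedUserQueryService implements UserQueryService {
    private final UserQueryService delegate;
    private final Map<Long, Optional<String>> cache = new ConcurrentHashMap<>();

    CachedUserQueryService(UserQueryService delegate) {
        this.delegate = delegate;
    }

    @Override
    public Optional<String> findNickname(long userId) {
        return cache.computeIfAbsent(userId, delegate::findNickname);
    }
}
```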

4. Consider how much less reliable cross-machine, cross-process service calls are than in-process method calls. For an in-process call, we need to handle exceptions, but we almost never need to consider timeouts, lost requests, or duplicate calls. For remote service calls, all of these need close attention. Otherwise the system is only barely usable: it works fine in the test environment but keeps running into trouble in production. So we must ask more questions about each service's availability and calling characteristics. We must also consider carefully what happens when a network problem causes a method to execute several times or only partially:

    • When we provide a service, we should tell callers what business capability it offers and also what characteristics it has. Is it idempotent? For order-type operations, repeating the same operation on the same order should be idempotent, so callers can retry or compensate without worry. Do callers need to compensate externally? You may ask why, when the service could compensate for itself. For its own internal sub-steps it certainly can, but sometimes a request never reaches the server because of the network, and a server that never saw a call cannot compensate for it. Are there rate limits or permission restrictions? How is degradation handled?
    • Conversely, when we call other services, we should ask more about the target service's characteristics and design the matching compensation, consistency, and degradation logic. We must allow for the case where the server is not at fault at all and the request simply never reached it.
    • A service often has complex logic of its own and acts as a client of many external services, so the roles of server and client are not fixed. When a service acts internally as the client of many other services, we need to think through each sub-step. What happens if this step fails? Is this part of the logic idempotent? Is it eventually consistent?

You might say that with so many services, it is very hard to consider all these points during implementation. Can you skip distributed transactions, idempotency, and compensation? It is no exaggeration that we sometimes spend 20% of the time on the business logic and 80% on the reliability logic around it. Skipping them is not forbidden, but the business will be full of holes once it runs in production. You can defer these points if the business does not have high reliability requirements, or if it is not user-facing and so will not draw complaints. Any business that cannot tolerate inconsistency, such as the order business, must take all of them fully into account.
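
The sketch below shows why idempotency by order number makes retries safe. The names `AccountService`, `LossyClient`, and `RemoteTimeoutException` are hypothetical. The lossy client simulates the case where the request executes on the server but the response is lost, so the caller sees only a timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Unchecked stand-in for a remote-call timeout (hypothetical type).
class RemoteTimeoutException extends RuntimeException {
    RemoteTimeoutException(String message) {
        super(message);
    }
}

// Server side: a debit is idempotent per order number, so the same orderNo
// can be retried or compensated without charging twice.
class AccountService {
    private final Map<String, Long> resultByOrderNo = new ConcurrentHashMap<>();
    private long balance;

    AccountService(long balance) {
        this.balance = balance;
    }

    synchronized long debit(String orderNo, long amount) {
        Long previous = resultByOrderNo.get(orderNo);
        if (previous != null) {
            return previous; // duplicate request: replay the original result
        }
        balance -= amount;
        resultByOrderNo.put(orderNo, balance);
        return balance;
    }

    synchronized long balance() {
        return balance;
    }
}

// Simulated network: each request reaches the server and executes, but the
// first N responses are lost, so the caller only sees timeouts.
class LossyClient {
    private final AccountService server;
    private int responsesToDrop;

    LossyClient(AccountService server, int responsesToDrop) {
        this.server = server;
        this.responsesToDrop = responsesToDrop;
    }

    long debit(String orderNo, long amount) {
        long result = server.debit(orderNo, amount);
        if (responsesToDrop > 0) {
            responsesToDrop--;
            throw new RemoteTimeoutException("response lost for " + orderNo);
        }
        return result;
    }
}

final class RetryingCaller {
    private RetryingCaller() {}

    // Retrying is safe only because the target operation is idempotent.
    static long debitWithRetry(LossyClient client, String orderNo, long amount, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RemoteTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return client.debit(orderNo, amount);
            } catch (RemoteTimeoutException e) {
                last = e; // outcome unknown: the server may already have executed it
            }
        }
        throw last;
    }
}
```

Without the per-order check in `debit`, the retry after the lost response would charge the account twice.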

5. Consider how much more it costs to pass data across machines and processes than within a process. In a local method call with object arguments and return values, most languages pass a pointer (or a copy of a pointer) to an object already allocated on the heap. The cost of passing the data is almost negligible, and there is no serialization or deserialization overhead. For cross-process service calls, this cost is often not negligible. When an interface needs to return a lot of data, its definition usually needs special treatment:

    • Use paging: return a fixed amount of data per call, and let the client pull more on demand.
    • Accept a parameter similar to an EnumSet that lets the client tell the server which parts of the data it needs. For example, a GetUserInfo interface can offer BasicInfo, VipInfo, InvestData, RechargeData, and WithdrawData, and a client can request just BasicInfo | VipInfo.

6. This also raises the question of method granularity. We can define a single GetUserInfo that returns different combinations of data depending on the parameters passed in. Alternatively, we can define fine-grained interfaces such as GetUserBasicInfo, GetUserVipInfo, and GetUserInvestData. The right granularity depends on how consumers will use the data, for example whether they usually need one type of data at a time or a combination.
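
Here is a minimal sketch of the two techniques from points 5 and 6, built around a hypothetical `UserInfoService`. `java.util.EnumSet` plays the role of the "BasicInfo | VipInfo" selector, and a separate method pages through investment records. The data loaders are placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Blocks of user data a caller can ask for (names follow the GetUserInfo example).
enum UserInfoPart { BASIC_INFO, VIP_INFO, INVEST_DATA, RECHARGE_DATA, WITHDRAW_DATA }

class UserInfoService {
    // A read-only convenience combination for callers that only need a summary.
    static final Set<UserInfoPart> SUMMARY =
            Collections.unmodifiableSet(EnumSet.of(UserInfoPart.BASIC_INFO, UserInfoPart.VIP_INFO));

    // Only the requested blocks are loaded and serialized, so a caller asking
    // for BASIC_INFO | VIP_INFO does not pay for the investment history.
    Map<UserInfoPart, String> getUserInfo(long userId, Set<UserInfoPart> parts) {
        Map<UserInfoPart, String> result = new LinkedHashMap<>();
        for (UserInfoPart part : parts) {
            result.put(part, load(userId, part));
        }
        return result;
    }

    // Paging: at most pageSize records starting at offset. A real service
    // would push offset/limit down into its query instead of loading all rows.
    List<String> getInvestRecords(long userId, int offset, int pageSize) {
        List<String> all = loadAllInvestRecords(userId);
        int from = Math.min(Math.max(offset, 0), all.size());
        int to = (int) Math.min((long) from + Math.max(pageSize, 0), all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    // Placeholder data access (hypothetical).
    private String load(long userId, UserInfoPart part) {
        return part.name().toLowerCase(Locale.ROOT) + "#" + userId;
    }

    private List<String> loadAllInvestRecords(long userId) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            records.add("invest-" + userId + "-" + i);
        }
        return records;
    }
}
```

Whether to offer one selector-driven method or several fine-grained ones is the trade-off point 6 describes. The selector keeps the interface count low, and fine-grained methods make each call's cost obvious.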

7. Next, consider interface upgrades. Interface changes should preferably remain compatible with the previous version. To retire an interface, first make sure that all callers have moved to the new one. Then watch the old interface's traffic stay at zero for a period of time before removing it from the code. Once a service is published, adjusting its interface definitions is hard, and retiring them is harder, so design external APIs with care.
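
One common way to keep an upgrade compatible is to add the new signature alongside the old one and have the old one delegate to it. In this hypothetical `InvestService`, the deprecated method also counts its calls. The counter stands in for the traffic monitoring you would use to confirm that the old interface has stayed at zero before removing it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Version 2 adds a channel parameter. The version 1 method stays, delegates
// with a default value, and counts its remaining callers (hypothetical example).
class InvestService {
    private final AtomicLong legacyCalls = new AtomicLong();

    /**
     * @deprecated use {@link #invest(long, long, String)}; kept for old callers.
     */
    @Deprecated
    String invest(long userId, long amount) {
        legacyCalls.incrementAndGet();
        return invest(userId, amount, "DEFAULT");
    }

    String invest(long userId, long amount, String channel) {
        return userId + ":" + amount + ":" + channel;
    }

    // Exposed to monitoring; remove the old method only after this stays flat at zero.
    long legacyCallCount() {
        return legacyCalls.get();
    }
}
```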

8. Finally, once the whole company has adopted microservices, agreeing on APIs for cross-department service calls inevitably leads to some wrangling. Do I push the data to you, or do you pull it? This data is useless to me, so why should it be stored here? Setting aside the non-technical side of the matter, technical means can resolve some of this wrangling:

    • Clear service responsibilities also make clear what each service should and should not be aware of.
    • The interface for cross-department service interaction can be very light. It can be an interface that carries only an order number, or an MQ notification plus a data pull-back policy. Under that policy, whoever owns more of the data provides the data interface, rather than pushing the data downstream.
    • The data provider can build a general-purpose data interface that meets the needs of several departments without per-department customization. The interface can even offer both persisted and non-persisted transfer modes.
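
Here is a minimal sketch of the "MQ notification + data pull-back" policy. It uses hypothetical `OrderPaidEvent`, `OrderService`, and `PointsModule` types, with an in-process subscriber list standing in for the MQ. The event carries only the order number, and the consumer pulls the fields it needs from the data owner. The points rule (one point per 100 units) is invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// The event carries only the order number.
record OrderPaidEvent(String orderNo) {}

// Full order data stays with its owner and is served through a query API.
record OrderDetail(String orderNo, long userId, long amount) {}

class OrderService {
    private final Map<String, OrderDetail> orders = new HashMap<>();
    private final List<Consumer<OrderPaidEvent>> subscribers = new ArrayList<>();

    void subscribe(Consumer<OrderPaidEvent> subscriber) {
        subscribers.add(subscriber);
    }

    void pay(String orderNo, long userId, long amount) {
        orders.put(orderNo, new OrderDetail(orderNo, userId, amount));
        OrderPaidEvent event = new OrderPaidEvent(orderNo);
        subscribers.forEach(s -> s.accept(event)); // stand-in for an MQ publish
    }

    // Data interface provided by the party that owns the data.
    OrderDetail getOrder(String orderNo) {
        return orders.get(orderNo);
    }
}

// Downstream module: pulls back only the fields it needs.
class PointsModule {
    final Map<Long, Long> points = new HashMap<>();
    private final OrderService orderService;

    PointsModule(OrderService orderService) {
        this.orderService = orderService;
    }

    void onOrderPaid(OrderPaidEvent event) {
        OrderDetail detail = orderService.getOrder(event.orderNo());
        if (detail == null) {
            return; // unknown order: a real consumer would log or retry
        }
        // Hypothetical rule: one point per 100 units of amount.
        points.merge(detail.userId(), detail.amount() / 100, Long::sum);
    }
}
```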

You may feel dizzy at this point. Why do microservices require so many extra considerations, and why does the implementation complexity suddenly rise? My answer is that we need to look at this from a different perspective:

1. We do not need to consider all the logic at the outset; covering the core process and core logic first is enough. Crossing service boundaries turns us into providers and consumers of services. Many people other than myself will rely on my service's capabilities and will ask all kinds of questions about it, and those questions help in designing reliable methods.

2. Even when we stack all the logic together without cross-service invocations, the logic is not automatically transactional or tightly implemented. Cross-service invocation only amplifies the likelihood of problems that already exist.

3. We also have a service framework. Together with the operations system, a service framework usually provides many integrated features for monitoring and tracing. These features expose logical breakpoints that would otherwise stay hidden inside in-process methods. In a microservice system with a mature monitoring platform, you will often be glad during troubleshooting that the call was a remote service call.

4. The biggest benefit comes once we have a three-dimensional service system with clear business logic. Any requirement can then be broken down into a very small number of code changes plus some combination of service calls. You can be confident that nothing will break, because the underlying services A through G have been tested over time. Once you have had this refreshing experience, you will enjoy it every time.

However, the design can go wrong in several ways:

  • The service granularity or the layering is unreasonable.
  • The underlying data sources overlap.
  • Network call failures or data volume were not considered.
  • Interfaces are badly defined.
  • Version upgrades are too reckless.

In any of these cases, the whole system suffers from extensibility problems, performance problems, and bugs, which is a real headache. This is also why we need a mature service framework to help us spot unreasonable designs. A later article on middleware will focus specifically on service governance.

Message Queuing

Message queues (MQ) bring several benefits, and these are the usual reasons for introducing them:

1. Asynchronous processing: a process such as ordering usually has a core flow that drives the core order's state machine and must complete synchronously as fast as possible. Around the order, a series of follow-up processing arises, related to the user, to inventory, and so on. That follow-up does not need to block at the moment the user clicks Submit Order. Placing the order only needs to confirm that the order is valid. Much of the follow-up work can flow slowly through dozens of modules, and even if it takes 5 minutes, the user does not need to notice.

2. Traffic peak shaving: Internet projects regularly run consumer-facing (ToC) promotions, which create traffic peaks. A message queue between modules acts as a buffer. Backend services then consume data passively at a rate they are comfortable with and are not overwhelmed by the traffic. Good monitoring is of course essential, as discussed below.

3. Module decoupling: as a project grows more complex, it produces many events from inside and outside the project, such as user registration and login, investment, and withdrawal. More and more modules (marketing, campaigns) need to react to these important events. If the core business system called all of these modules directly, the whole system would become entangled internally, which is clearly inappropriate. Decoupling through MQ is the better practice: events flow through the system loosely coupled, and modules are not aware of each other.

4. Message broadcast: some messages have multiple receivers, and the number of receivers may change over time. The receivers may even form something like a chain of responsibility. One-to-many coupling between upstream and downstream is troublesome in this situation, so MQ decoupling is a better fit. The upstream simply sends a message saying what has just happened. However many downstream consumers care about it, the upstream is unaware of them.
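
An in-process toy can illustrate these four benefits. It uses hypothetical `MiniTopicBroker` and `PacedConsumer` classes built on `java.util.concurrent.BlockingQueue`. It is not a real MQ client and provides no persistence or delivery guarantees. Each subscriber gets its own bounded queue, which gives broadcast and decoupling. `publish` returns immediately, which gives asynchronous processing. A burst waits in the queue until the consumer drains it at its own pace, which gives peak shaving.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy broker: one bounded queue per subscriber of a topic.
class MiniTopicBroker {
    private final Map<String, List<BlockingQueue<String>>> topics = new ConcurrentHashMap<>();
    private final int capacity;

    MiniTopicBroker(int capacity) {
        this.capacity = capacity;
    }

    BlockingQueue<String> subscribe(String topic) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        topics.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(queue);
        return queue;
    }

    // The publisher does not know how many subscribers exist. Returns false if
    // any subscriber's buffer is full, which is exactly the backlog to monitor.
    boolean publish(String topic, String message) {
        boolean allAccepted = true;
        for (BlockingQueue<String> q : topics.getOrDefault(topic, List.of())) {
            allAccepted &= q.offer(message); // non-blocking; offer runs for every queue
        }
        return allAccepted;
    }
}

// A consumer drains at its own comfortable rate, e.g. a fixed batch per tick.
final class PacedConsumer {
    private PacedConsumer() {}

    static List<String> drain(BlockingQueue<String> queue, int maxBatch) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, maxBatch);
        return batch;
    }
}
```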

These requirements are essential in Internet projects, which makes message queuing a very important architectural tool. Several points deserve attention in practice:

1. I prefer a dedicated listener project, rather than merging listening into the server, whose only job is to listen for messages. This module contains very little logic. After receiving a specific message, it simply calls the corresponding service API to process it. Depending on the MQ product, you can start several listener instances for load balancing, but the load here is light, so this is not strictly necessary. Not every service needs a matching listener project. Most public basic services have no listener, because they are independent and do not need to know about business events outside themselves. Some basic business services also lack a listener for similar reasons.

2. Important MQ messages should have a compensation path as a backup, in case the MQ cluster hiccups or goes down entirely. I have used RabbitMQ in a business with tens of thousands of orders per day. Its QPS was in the hundreds, far below the tens of thousands of QPS RabbitMQ can withstand, yet it still lost roughly one message in ten thousand. (I have also used Alibaba's RocketMQ, but the volume was small, so I have not observed a similar problem so far.) The compensation path immediately handled these lost messages. In one extreme case, an entire RabbitMQ cluster went down, and messages sent by service A could not reach service B. The compensation job then kicked in and periodically pulled messages from service A in batches to deliver them to service B. The messages were processed in batches, but at least they were processed. This backup matters because no middleware is 100% available.

3. The compensation mechanism itself contains no business logic. Here is how it works. Suppose service A produces the message and B-listener listens for it. When B-listener receives a message, it calls the specific method handleXxMessage(XxMessage message) in B-server, which executes the business logic. When MQ stops working, a job takes over; its compensation window and per-pull batch size are configurable. It periodically calls a dedicated method on service A, getXxMessages(LocalDateTime from, LocalDateTime to, int batchSize), to pull messages. It then calls handleXxMessage in B-server, possibly concurrently, to process them. The compensation job can be made generic and configurable instead of being hand-written for every message type. The only requirement is that service A provides an interface for pulling messages. You might object that service A now has to maintain a database-backed message queue of its own, in effect a passive pull-based queue. In fact, the "message" here is usually just a conversion. Service A has necessarily persisted the data that changed during a given period, so it only needs to convert that data into message objects. Because handleXxMessage in B-server is idempotent, it does not matter if a message is processed more than once. In an emergency, the job simply reprocesses all data from a recent period wholesale.

4. The consumer side should preferably handle each message idempotently. Some MQ products claim exactly-once processing, but making the handler idempotent yourself still makes things easier.

5. Some scenarios need delayed messages or delayed message queues, and products such as RabbitMQ and RocketMQ implement these in different ways.

6. MQ messages generally come in two types. One type should (preferably) be consumed by only one consumer, and only once. The other can be handled by all subscribers, however many there are. MQ middleware implements these two forms in different ways: some products use message types, some use different exchanges, and some use consumer groups, where different groups can each consume the same message. Most products support both forms. When using a specific product, study its documentation and run experiments to confirm that both kinds of message are handled correctly. Otherwise strange problems can appear.

7. Monitor your messages well, and above all watch for message backlog. When a backlog appears, you may need to increase downstream processing capacity by adding machines or threads. A more thorough setup builds a heat map of all message flow rates, so you can see at a glance which messages are under pressure. You might think a backlog does no harm, since messages are not lost inside the MQ system. Messages can indeed pile up to some degree, but not excessively, for two reasons. If the MQ system then has a storage problem, losing a large backlog is much more troublesome. Also, some business systems are time-sensitive and treat a late message as a business violation and ignore it.

8. The diagram shows two MQ clusters, one internal and one external. We can be relatively relaxed about permissions on the internal cluster. On the external cluster, every topic must be clearly defined and maintained by designated people, and nobody may delete topics at will and cause confusion. Hard isolation of internal and external messages also helps performance, so I recommend separate internal and external MQ clusters in production.
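
The listener and compensation arrangement from points 1 to 4 can be sketched as follows. The method names `handleXxMessage` and `getXxMessages` mirror the ones described above. Everything else (`AService`, `BServer`, `BListener`, `CompensationJob`, and the in-memory storage) is a hypothetical stand-in. The listener only forwards, the handler is idempotent, and the compensation job replays a time window without any business logic of its own.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// A message converted from data that service A has already persisted.
record XxMessage(String bizId, LocalDateTime changedAt) {}

// Service A: re-reads its own changed rows and exposes them as messages,
// so no separate message store is needed for compensation.
class AService {
    private final List<XxMessage> changedRows = new ArrayList<>();

    void recordChange(String bizId, LocalDateTime at) {
        changedRows.add(new XxMessage(bizId, at));
    }

    // Rows changed in [from, to), at most batchSize of them.
    List<XxMessage> getXxMessages(LocalDateTime from, LocalDateTime to, int batchSize) {
        List<XxMessage> result = new ArrayList<>();
        for (XxMessage m : changedRows) {
            if (result.size() >= batchSize) {
                break;
            }
            if (!m.changedAt().isBefore(from) && m.changedAt().isBefore(to)) {
                result.add(m);
            }
        }
        return result;
    }
}

// B-server: owns the business logic and is idempotent per bizId. A real
// service would enforce this with a unique key in its own database.
class BServer {
    private final Set<String> handled = ConcurrentHashMap.newKeySet();
    private final AtomicInteger effects = new AtomicInteger();

    boolean handleXxMessage(XxMessage message) {
        if (!handled.add(message.bizId())) {
            return false; // duplicate from MQ redelivery or compensation: no-op
        }
        effects.incrementAndGet(); // the actual business effect would go here
        return true;
    }

    int effects() {
        return effects.get();
    }
}

// B-listener: no logic of its own, just forwards MQ messages to B-server.
class BListener {
    private final BServer server;

    BListener(BServer server) {
        this.server = server;
    }

    void onMessage(XxMessage message) {
        server.handleXxMessage(message);
    }
}

// Generic compensation job: pulls one window from A and replays it into B.
// Window and batch size would come from configuration; a production job
// would also page through windows larger than one batch.
class CompensationJob {
    private final AService source;
    private final BServer target;
    private final int batchSize;

    CompensationJob(AService source, BServer target, int batchSize) {
        this.source = source;
        this.target = target;
        this.batchSize = batchSize;
    }

    int runOnce(LocalDateTime from, LocalDateTime to) {
        int applied = 0;
        for (XxMessage m : source.getXxMessages(from, to, batchSize)) {
            if (target.handleXxMessage(m)) {
                applied++;
            }
        }
        return applied;
    }
}
```

Because the handler ignores duplicates, running the job over a window the MQ already delivered is harmless. That is what allows the job to reprocess a recent period wholesale in an emergency.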

Scheduled Tasks

Scheduled tasks typically address three types of requirements:

1. As mentioned earlier, MQ notifications and cross-service calls inevitably fail to arrive sometimes, and we need a mechanism to compensate.

2. Some operations are driven by task tables. The design of a task table is described in detail below.

3. Some business processing runs on a schedule and does not need to happen in real time. Examples include notifying users that a red envelope is about to expire, end-of-day reconciliation with the bank, and sending bills to users. Unlike type 2, these tasks run at varied times and frequencies, whereas type 2 generally runs at a fixed frequency.

Let me explain in more detail what task-table driving means. We create task tables in the database and use them to drive the core data-processing system. This passive mode of operation is the most reliable, more reliable than MQ-driven or service-call-driven processing. By its nature it must support load balancing and idempotent processing, and it must keep compensating until each task is done. A task table can have the following fields:

    • Auto-increment ID
    • Task type: the specific task type. Alternatively, you can create one task table per task type.
    • External order number: a unique document number that links the task to external business logic.
    • Execution status:
        • pending: awaiting processing
        • processing: prevents other jobs from grabbing it
        • success: final success
        • failed: temporary failure, will be retried
        • manual intervention: never changes again, must be handled manually, and requires an alert
    • Retry count: tasks processed too many times, or failing repeatedly, can be classified as dead letters. A dedicated dead-letter task retries them separately a number of times, and if they still fail, an alert triggers manual intervention.
    • Processing history: the result of each processing attempt, stored as a JSON list for reference
    • Last processing time: when the task was last executed
    • Last processing result: the result of the last execution
    • Creation time: maintained by the database
    • Last modified time: maintained by the database

Besides these fields, you can add business fields of your own as redundant data, such as order status and user ID. The task table can be archived to reduce its data volume. Because the task table acts as a message queue, we need monitoring that alerts on three conditions: data backlog, an imbalance between enqueue and dequeue rates, and dead-letter data. Suppose a process is handled as a sequence of tasks A, B, C, and D. Each task has its own polling interval, so the system may waste a little time, and it will not be as efficient as chaining the steps in real time through MQ. However, task processing usually fetches data in bulk and executes it in parallel, unlike MQ's one-message-at-a-time processing. Overall throughput is therefore not very different, and only the latency of a single item differs. Given the passive stability of task-table-driven execution, it is a valid option for some businesses.
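
Here is a minimal in-memory sketch of the task-table state machine. The hypothetical `TaskRow` and `TaskWorker` classes mirror a subset of the fields listed above. For brevity, the sketch moves a task straight to manual intervention after `maxRetries` failures, instead of routing it through a separate dead-letter task first. The claim step is single-threaded here; a comment notes how a database would make it atomic.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Execution states from the field list above.
enum TaskStatus { PENDING, PROCESSING, SUCCESS, FAILED, MANUAL }

// One task-table row (database-maintained timestamps and business columns omitted).
class TaskRow {
    final long id;
    final String taskType;
    final String externalOrderNo;
    TaskStatus status = TaskStatus.PENDING;
    int retryCount = 0;
    final List<String> history = new ArrayList<>(); // would be a JSON column
    LocalDateTime lastProcessedAt;
    String lastResult;

    TaskRow(long id, String taskType, String externalOrderNo) {
        this.id = id;
        this.taskType = taskType;
        this.externalOrderNo = externalOrderNo;
    }
}

class TaskWorker {
    private final int maxRetries;

    TaskWorker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Claim a row so other workers skip it. This in-memory version is not
    // thread-safe; in a database, one common approach is a conditional
    // UPDATE ... SET status = 'PROCESSING' WHERE id = ? AND status IN
    // ('PENDING', 'FAILED'), then checking the affected-row count.
    boolean claim(TaskRow row) {
        if (row.status != TaskStatus.PENDING && row.status != TaskStatus.FAILED) {
            return false;
        }
        row.status = TaskStatus.PROCESSING;
        return true;
    }

    // The handler must be idempotent, because a row can be retried.
    void process(TaskRow row, Predicate<TaskRow> handler) {
        if (!claim(row)) {
            return;
        }
        boolean ok;
        try {
            ok = handler.test(row);
        } catch (RuntimeException e) {
            ok = false;
        }
        row.lastProcessedAt = LocalDateTime.now();
        row.lastResult = ok ? "OK" : "ERROR";
        row.history.add(row.lastResult);
        if (ok) {
            row.status = TaskStatus.SUCCESS;
        } else if (++row.retryCount >= maxRetries) {
            row.status = TaskStatus.MANUAL; // never retried again: alert a human
        } else {
            row.status = TaskStatus.FAILED; // picked up again on the next tick
        }
    }
}
```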

Here are some design principles for jobs:

1. Various scheduling frameworks, such as ElasticJob and Quartz, can drive jobs. Jobs should live in a separate project rather than being mixed into services; otherwise, problems often arise once multiple instances are deployed and started. Implementing a task scheduling framework yourself is not very hard either. Such a framework can decide at execution time which machine runs a job, which spreads resource usage across the cluster more sensibly. Put plainly, there are two forms. In the first, the framework triggers the job wherever it is deployed. In the second, the code simply sits where it is, and the framework assigns it to a process to run.

2. The job project is just a thin shell that holds at most some consolidated configuration. It should contain no real business logic and should not touch the database. In most scenarios, it simply calls the API of a specific service. The job project is responsible only for configuration and frequency control.

3. Compensation jobs must limit the number of compensation attempts, so that dead-letter data cannot leave the whole task stuck.
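
Here is a sketch of principles 2 and 3. A business service exposes a hypothetical `RepaymentCompensationApi`, shown with an in-memory implementation. The job shell holds only its batch limit and schedule and calls the service API. The service stops offering rows that have exhausted their attempts, so one poison order cannot block the rest. `ScheduledExecutorService` is used purely as a stand-in for a framework such as ElasticJob or Quartz.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// API exposed by a business service (hypothetical). The job project depends
// only on this interface and never touches the service's database.
interface RepaymentCompensationApi {
    List<String> findPendingOrderNos(int limit);

    boolean compensate(String orderNo);
}

// Service-side stand-in: rows that used up their attempts are no longer
// offered, so a poison order cannot occupy every batch forever.
class InMemoryRepaymentApi implements RepaymentCompensationApi {
    private final Map<String, Integer> attempts = new LinkedHashMap<>();
    private final Set<String> poison; // orders that always fail
    private final int maxAttempts;

    InMemoryRepaymentApi(List<String> pending, Set<String> poison, int maxAttempts) {
        pending.forEach(orderNo -> attempts.put(orderNo, 0));
        this.poison = poison;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public List<String> findPendingOrderNos(int limit) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : attempts.entrySet()) {
            if (result.size() >= limit) {
                break;
            }
            if (e.getValue() < maxAttempts) { // exhausted rows are left for manual handling
                result.add(e.getKey());
            }
        }
        return result;
    }

    @Override
    public boolean compensate(String orderNo) {
        if (poison.contains(orderNo)) {
            attempts.merge(orderNo, 1, Integer::sum);
            return false;
        }
        attempts.remove(orderNo); // done: no longer pending
        return true;
    }
}

// Job project: configuration and frequency control only, no business logic.
class CompensationJobShell {
    private final RepaymentCompensationApi api;
    private final int batchLimit; // caps the work done per tick

    CompensationJobShell(RepaymentCompensationApi api, int batchLimit) {
        this.api = api;
        this.batchLimit = batchLimit;
    }

    int tick() {
        int done = 0;
        for (String orderNo : api.findPendingOrderNos(batchLimit)) {
            if (api.compensate(orderNo)) {
                done++;
            }
        }
        return done;
    }

    // Stand-in for a scheduling framework; the caller must shut the executor down.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::tick, 0, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```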

That covers the three carriages. Finally, here are the modules of a whole project built on this architecture:

  • Site:
    • front
    • console
    • app-gateway
  • Façade Service:
    • partnerinvestservice-api
    • partnerinvestservice-server
    • partnerinvestservice-listener
    • normalinvestservice-api
    • normalinvestservice-server
    • normalinvestservice-listener
    • reserveinvestservice-api
    • reserveinvestservice-server
    • reserveinvestservice-listener
    • autoinvestservice-api
    • autoinvestservice-server
    • autoinvestservice-listener
  • Business Service:
    • tradeservice-api
    • tradeservice-server
    • tradeservice-listener
    • loanservice-api
    • loanservice-server
    • loanservice-listener
    • userservice-api
    • userservice-server
    • projectservice-api
    • projectservice-server
    • accountservice-api
    • accountservice-server
    • accountservice-listener
    • activityservice-api
    • activityservice-server
    • activityservice-listener
    • vipservice-api
    • vipservice-server
    • vipservice-listener
  • Foundation Service:
    • bankservice-api
    • bankservice-server
    • digsignservice-api
    • digsignservice-server
    • messageservice-api
    • messageservice-server
  • Job:
    • scheduler-job
    • task-job
    • compensation-job

Each of these modules can be packaged independently, and they do not all have to live in one project space. They can be split into about 20 projects, with each service's api, server, and listener kept together in one project. This layout actually helps CI/CD. The drawback is that changing code may require opening N projects.

As I said at the beginning, this simple architecture extends very well, and it costs little more in complexity or workload than an all-in-one architecture. You may not agree with this view. It really depends on the team's experience. A team that knows this architecture well and has worked with microservices for years considers many of these issues naturally while coding, and much of the design becomes a matter of practiced skill. With enough practice, you know what belongs where, and when to split or combine, so the extra time cost is small. I believe these three carriages form a simple, practical architecture for most Internet projects, although some projects will lean more heavily on one aspect and less on another. I hope this article is useful to you.
