How does a high-frequency transaction system choose between multi-thread and port communication?

Source: Internet
Author: User
Tags: thread, logic
For example, when exchanging data between tick data processing and the subsequent trade-signal analysis, a multi-threaded system has low latency but makes it harder to separate code by function, while port communication between multiple programs gives clearer code but noticeably higher latency. How should one choose?

Reply content: If the communication mechanism is the same, for example shared memory in both cases, the main difference between multi-threading and multi-processing is only the isolation behavior on a crash: when one thread crashes, the whole process dies; when one process crashes, only that process is affected.

If we also consider software engineering, different processes usually mean different programs or even different projects, which reduces operational risk. This is an important reason why many trading systems use two processes instead of one: the risk-control process and the strategy process are isolated from each other in the software-engineering sense, so modifying the strategy does not increase the chance of erroneous order submission.

The fewer processes, the better. The less synchronization between different logical streams (threads), the better. When the computational workload is low, the lowest latency generally comes from a single thread of logic.

For low latency, we also need to consider the time it takes for a thread blocked on I/O to be woken up and scheduled for execution. Avoid sockets and other mechanisms that require syscalls and context switches.

Of course, all of this applies to high-frequency trading, or more precisely to the field of low-latency trading. Not all electronic trading systems are in this field, and architectures elsewhere vary with requirements. I agree that there are many ways to decouple software; splitting by process is only one of them, and not necessarily the best. Improving development efficiency and reducing maintenance difficulty is an art that requires constant exploration based on the technical strengths of team members. Do not lock your thinking into one model in advance; try other approaches (such as OO or functional) and you may get better results.

In my years of experience, trading systems also fit the "surgical team" development model recommended in The Mythical Man-Month. For this kind of high-precision system, adding more people has a negative effect; the best approach is to have one or two first-rate programmers do the design and development. So there is an ultimate answer to this question: get the boss to raise salaries and hire a first-class technical expert. Once you have a top-notch expert, the problem changes from multi-person collaboration into a matter of personal taste, and you simply pick the solution that suits that expert's taste.

Of course, I understand that once you are in the jianghu, you cannot always act as you wish. For a project that is already halfway through its history, the advice above is distant water that cannot quench a present thirst. So here are some practical suggestions.

First, there are many methods of inter-process communication (IPC), and port communication (sockets) is definitely not the fastest. Both named pipes and shared memory are improvements. In fact, the fastest approach is shared memory plus an exclusively owned CPU core; for details, refer to Java Chronicle, which achieves communication latencies of tens of nanoseconds. See also "Fastest (low latency) method for Inter Process Communication between Java and C/C++".

To understand why port communication is slow, read "Know your TCP system call sequences":
Even when you use a local interface (localhost), every call crosses into kernel space via a system call and wastes time on memory copies, protocol checks, and so on. None of these problems exists in a shared-memory solution. So a simple remedy is to rewrite your communication modules to adopt these faster mechanisms.

If your codebase is such a mess that you cannot afford to modify the communication code, there is also a trick called replacing the system calls. In short, you redirect the system-call entry points into your own code by swapping function pointers, for example implementing the port-communication APIs on top of shared memory. That way you change the communication mechanism, and gain speed, without modifying the existing program. For details, refer to the popular open-source implementation OpenOnload; it mainly replaces the TCP/IP protocol stack, but the idea is the same.

This, in fact, is where Java and other VM-based bytecode languages shine. In Java it is easy to replace such calls, and if even that is too much trouble you can use JMockit directly; libraries like these are very simple to use (please allow me to express my sympathy for those still struggling in C++).

Looking back at my answer, I paid too much attention to the modular-programming perspective and did not connect it with the industry's actual needs; it was a bit simplistic, so let me revise it.
@Dong Keren and @Lu Wangbin answered in great detail, covering how to handle legacy systems and the trade-offs between multi-threading and inter-program communication.

In addition, the OP gives little specific information about the high-frequency system and does not say whether it connects to an exchange, so most people can only answer in terms of generic low-latency systems. I will diverge a little here and add some background for further discussion.

First, modularizing the system's business functions does not conflict with how the main program is deployed and run (multi-threaded or multi-process). As long as every component uses the same programming language, you can easily have both at once; this is the decoupling mentioned in other answers. However, if a system is developed in multiple languages that must exchange data, or multiple strategies must share one front-end data source (for example, the exchange allows only one connection while several strategy systems use it), it may not be possible to deploy everything on one machine. In that case network communication is unavoidable, and it can be improved through the internal network and machine hardware. In other words, you must analyze and optimize the specific problem.

Second, we need to analyze the main source of the program's latency, or, from the requirements side, where low latency actually matters. Every requirement has a price, and some non-functional requirements can sharply increase the cost of the whole system. In high-frequency trading, to what extent is network transmission latency the biggest bottleneck? Is the traffic on an internal network or the public internet? If it is on the internet, colocated machines can eliminate most network transmission latency; if it is on an internal network, upgrading network devices and machines is also feasible. All of these links need to be analyzed. One of our customers agonized over this for a long time, but we later found that a few improvements to the algorithm cut hundreds of milliseconds of latency at once. Grasping the main contradiction is the key to solving the problem.

Third, back to the OP's specific problem: how to exchange data between the tick-data collection module and the analysis module. The flow of high-frequency market data is one-way: ticks are collected by a producer and analyzed by a consumer. This is a very simple computing model and can be solved with shared memory. Because the exchange pushes data continuously, the analysis must finish before the next tick arrives; otherwise the buffer accumulates or data is lost. We recommend a two-stage processing mode similar to the operating system's handling of hardware interrupts: break the analysis task down, decide when enough data has arrived to start analyzing, and strike a balance between reading data and processing it. The key is to improve the analysis algorithm and reduce its latency; if its computing time is less than the interval between ticks, a lot of complicated optimization work can be avoided.

Finally, we recommend that strategies written at the R-language level be single-threaded only, while the backend, which must support market-data sources and trading-channel interfaces for multiple exchanges, be multi-threaded. Connect the frontend and backend through a C++ extension package for R, using shared memory for communication.

------- The following is the original answer ------------

As a programmer, I do not understand the OP's problem. Why would multiple threads make it hard to separate code by function? Modularization has nothing to do with whether you use multiple threads or deploy multiple programs; it depends mainly on the programmer's understanding of the business and whether that understanding can be expressed elegantly with programming techniques.
A single program, even one whose source code is a single file, can still be well modularized. Let me share some experience:
1. A trading system, or a high-frequency trading system, must be multi-threaded. The reasons are omitted; I will add them if I have time.
2. Let me directly share my experience writing high-frequency programs (as much as I can without exposing my strategy), for your reference only.

All of the following layers run independently, with no dependencies between them; they share a common data zone.


A: one-way data acquisition layer (depth, traded volume, etc.; source: network; latency: uncertain). Basically, each data item gets its own "infinite" loop that fetches data into the public data zone (in-memory data; if the volume is huge, consider an in-memory database).

B: data processing layer. Data source: memory (from A); latency: basically none. Output goes back into the public data zone, in memory.

C: main logic layer. Data source: memory (data processed by B: averages, spreads, extreme values, and so on). According to your strategy, it takes buy/sell and other actions (spawning a thread in layer D), while the main logic layer keeps doing its own thing.

D1: order tracking layer. Data source: network; tracks order status and follow-up actions.
D2: …
D3: …
D4: …
P.S. If by "port" the OP means a local network port rather than communication between threads, note that an ordinary PC has tens of thousands of those. If not, we can discuss further.

1. Why does it feel like the technical level of questions under the high-frequency trading topic is so low? Is it an illusion?

2. Since the main requirements are extremely high performance and extremely low latency, the design principle of the whole system must be: whatever gives the highest performance, done as simply as possible. Therefore:

3. Code-function separation (that is, modularity, low coupling, and little hard-coding) and port communication between multiple programs (do you mean port multiplexing, or TCP/UDP-based inter-process communication?) are not needed; these techniques obviously increase latency.

Even... or even ......

4. Best of all, replace low-performance ordinary PCs, workstations, or servers with FPGAs and DSPs. Research institutes or companies with real R&D strength can even use the HyperIFR51 (an 8051 single-chip microcomputer based on optical-quantum technology) for accelerated computing. Well-resourced units can use microwave communication instead of ordinary optical-fiber networks, and high-end enterprises with plenty of money can even use their core contacts to rent the latest and most secure military-grade P3OEMod25k (a 25k express dialing network based on nuclear-resonance technology) for accelerated communication...

The OP should read The Art of Unix Programming, which takes the process as the basic building block; tunnel vision and object-orientation dogma are poisonous.

1. Do not over-pursue a "distributed architecture".
2. Really, do not over-pursue it.
3. Split your code into different modules.
3.1 You can learn how C projects do this; even non-OO languages can manage it.
4. For the real substance, read Martin Fowler's Microservices.
4.1 Once you understand it thoroughly, you will also know which parts should be "independent" processes and which should live in one process, handled by multiple threads.

PS: Unlike the answers below, I think this is a good question; in my experience, coordinating market data with strategy orders is always one of the difficulties.

> Multithreaded systems have low latency, but it is not conducive to separating code by function.
Why is it not conducive to code function differentiation?

> The code for port communication between multiple programs is clearer, but the latency is obviously higher.
How much latency does it add? Is that unacceptable? Does your system really need microsecond-level latency?

Your goal is to solve the problem, not to invent problems that do not exist. The systems we develop are basically socket-based, with development cost taken into account. When data volume is large, other issues such as thread mutexes occasionally appear; that is unavoidable. Exchange data is slow anyway, so shaving a few dozen milliseconds locally does not visibly improve the overall system, and other approaches require extensive testing. In summary, sockets are easy to use and cheap, some problems can be optimized away algorithmically, and as long as the system is stable it is basically fine in China, where market data is slow; you don't have to overthink it.

All processes should be single-threaded.
You should use CPU isolation to pin each process to a core.
They should communicate through shared memory if they are execution-related processes; otherwise use multicast, as for risk-related processes.
