Building a High-Concurrency, Low-Latency System

Source: Internet
Author: User

First, a caveat: the "high concurrency" discussed here is relative to the hardware, not absolute high concurrency. The latter must be implemented in a distributed manner and is out of scope; this article focuses on high concurrency on a single machine.

Recently we developed a voice communication system, with targets for the number of online users and for 1K concurrent calls. The hardware is two servers, each with multi-core CPUs, 4 GB of memory, and a 1 Gb NIC (the best hardware I have ever used, so it should not be the problem).

The system's other key metrics are call setup latency and voice latency; these are central to this system. In the end, when the user tested our system on site, the results were so good that they initially doubted the test. In fact, latency is not hard to reduce as long as you handle a few key areas well. Here is a summary.

 

1. Overall Structure:

The architecture separates control from bearer. The control part is responsible for call control, including session establishment, call handling, and voice resource management; it is the core of the system. The bearer part is responsible for speech processing, including speech encoding/decoding, encryption/decryption, recording, and forwarding. The advantages: 1) it reduces the overall complexity of the system; 2) it improves scalability; in particular, if the number of users grows, this structure extends well.

This is the classic softswitch structure from the communications world. Two servers: one for control, the other for bearer, communicating with each other over the network.

One control program (a single process) can manage multiple bearer programs.

2. Process:

To reduce latency, the key is the design of the call flow. Eliminate unnecessary steps and interactions between network elements: if the data can be conveyed in one exchange, do not use two. When necessary, sacrifice protocol standards for latency and use a private protocol (at least for now, this system is an end-to-end closed system).

3. Development language:

The control layer is implemented in Python. The control logic is complex, and Python is good at describing logic. I was slightly worried about Python's runtime efficiency, but it turned out not to matter: the load falls on the bearer part, not the control part, which is never under much pressure. Besides, the CPU is powerful enough, and the latency bottleneck lies in I/O. Python also reuses the protocol codec library we had previously implemented in C.
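The article does not show how the C library is reused from Python, so here is a minimal, hypothetical sketch of the general technique using ctypes. Since the in-house codec library's name and functions are not given, the sketch loads the C runtime (always present on Linux) and calls its `abs` function purely as a stand-in:

```python
import ctypes

# Hypothetical sketch: calling into an existing C library from Python.
# CDLL(None) loads the symbols of the running process (libc included on
# Linux); a real system would load its own codec .so by name instead.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments correctly.
libc.abs.restype = ctypes.c_int
libc.abs.argtypes = [ctypes.c_int]

print(libc.abs(-42))  # 42
```

The same pattern (declare `restype`/`argtypes`, then call) applies to any exported C function, which is presumably how the protocol codec library was wrapped.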

The bearer part is implemented in C.

4. Use multiple cores

Use multiple processes to exploit multiple cores. On the bearer server, two processes run concurrently, each handling 500 calls. Thread switching may be cheaper, but multithreaded programming is more complex; for threads, I use only the simplest model.
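The layout described above (two share-nothing worker processes, each owning half the calls) can be sketched like this; the function body is a placeholder, since the real bearer work is done in C:

```python
import multiprocessing

def bearer_process(call_ids):
    """Stand-in for one bearer process: each worker owns its own slice
    of calls, so the processes share nothing and each can use one core."""
    return sum(1 for _ in call_ids)  # pretend to service each call

if __name__ == "__main__":
    calls = list(range(1000))
    # Two processes, 500 calls each, mirroring the bearer server layout.
    with multiprocessing.Pool(processes=2) as pool:
        handled = pool.map(bearer_process, [calls[:500], calls[500:]])
    print(handled)  # [500, 500]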

The control part runs as a single process, so it does not fully exploit the multi-core server. For now that is acceptable: it meets the requirements well.

5. Network Communication

A large part of the server load comes from network communication. Given our traffic model, 1000 concurrent calls means processing at least 1000 voice packets every 20 ms (in the worst case 2000 packets, counting both sending and receiving).
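A quick back-of-the-envelope check of those numbers, assuming 20 ms voice frames as the article states:

```python
# Packet budget for 1000 concurrent calls with 20 ms voice frames.
calls = 1000
frame_ms = 20
pkts_per_call_per_sec = 1000 // frame_ms        # 50 packets/s per direction

one_way_pps = calls * pkts_per_call_per_sec     # 50,000 packets/s
worst_case_pps = 2 * one_way_pps                # send + receive: 100,000 packets/s
budget_us = 1_000_000 / worst_case_pps          # time budget per packet

print(one_way_pps, worst_case_pps, budget_us)   # 50000 100000 10.0
```

So in the worst case the bearer server has roughly 10 microseconds of budget per packet, which is why every blocking operation must be kept off the fast path.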

Libevent is open source and known to be lightweight, high-performance, and widely used; it may be a good choice. But to me it still felt too large: I would not use many of its features (cross-platform support, multiple event models, and so on).

The Linux epoll interface is simple and easy to use. It lets you attach user data to each event, including function pointers, so it is easy to build an event-driven network module on top of it while keeping the code simple.
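The idea can be sketched in Python. One caveat: in C, `epoll_event.data` can carry a pointer to arbitrary user data (the function pointers mentioned above), while Python's `select.epoll` tracks only file descriptors, so this sketch emulates that with a dict mapping fd to handler. The names here (`EventLoop`, etc.) are illustrative, not from the actual system, and `select.epoll` is Linux-only:

```python
import select
import socket

class EventLoop:
    """Minimal epoll-based event loop: each fd is registered together
    with a callback, emulating the user-data pointer of the C epoll API."""

    def __init__(self):
        self.epoll = select.epoll()
        self.handlers = {}  # fd -> callable(fd)

    def register(self, fileobj, handler):
        fd = fileobj.fileno()
        self.handlers[fd] = handler
        self.epoll.register(fd, select.EPOLLIN)

    def poll_once(self, timeout=1.0):
        """Dispatch one batch of ready events to their handlers."""
        for fd, events in self.epoll.poll(timeout):
            if events & select.EPOLLIN:
                self.handlers[fd](fd)

# Usage: deliver whatever arrives on one end of a socket pair.
a, b = socket.socketpair()
loop = EventLoop()
received = []
loop.register(b, lambda fd: received.append(b.recv(64)))
a.send(b"hello")
loop.poll_once()
print(received)  # [b'hello']
```

The dispatch table keyed by fd is what keeps the module small: adding a new kind of socket is just one more `register` call with its own handler.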

6. File Read/Write

The system involves two major kinds of file I/O: recordings and logs. The usual file operation interfaces are all blocking: the process (or thread) is suspended until the read or write completes, then resumes. As everyone knows, disk operations are much slower than memory, so this is a bottleneck for request latency.

Asynchronous I/O can solve this problem (reference: http://www.ibm.com/?works/cn/linux/l-async/). However, some reports online say the AIO interface has bugs. With limited time and no chance for in-depth study, I conservatively gave up on this idea: new technology carries risk and should be used with caution.

Libeio is another option (reference: http://rdc.taobao.com/blog/cs?p=1524). It uses a thread pool to simulate asynchronous I/O. The problem is that our program mostly writes files and rarely needs to know the result; in that case, why bother with libeio?

Our final solution, inspired by libeio, is to dedicate one thread in the bearer process to writing files. The main thread handles speech encoding/decoding and forwarding, completely non-blocking, to ensure low latency. Any file write is handed off to the other thread, which calls the blocking I/O interface on its behalf. The interface between the threads is very simple: the content to be written, plus a path name.
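A minimal sketch of that dedicated writer thread, using a thread-safe queue as the hand-off (the class name and `(path, data)` interface are illustrative; the real system is in C):

```python
import os
import queue
import tempfile
import threading

class FileWriter:
    """Dedicated writer thread: the main thread enqueues (path, data)
    pairs and never blocks; only this thread touches blocking file I/O."""

    def __init__(self):
        self.q = queue.Queue()
        self.t = threading.Thread(target=self._run, daemon=True)
        self.t.start()

    def write(self, path, data):
        self.q.put((path, data))        # non-blocking hand-off

    def _run(self):
        while True:
            path, data = self.q.get()
            if path is None:            # shutdown sentinel
                break
            with open(path, "ab") as f: # blocking I/O happens here only
                f.write(data)

    def close(self):
        self.q.put((None, None))
        self.t.join()

# Usage: recordings are appended in order, off the fast path.
w = FileWriter()
path = os.path.join(tempfile.mkdtemp(), "recording.bin")
w.write(path, b"frame1")
w.write(path, b"frame2")
w.close()
with open(path, "rb") as f:
    print(f.read())  # b'frame1frame2'
```

Because the queue is FIFO, writes land on disk in the order they were issued, which matters for recordings.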

7. Database Operations

Our database is MySQL. Like file I/O, database operations are a bottleneck for request latency, and a single call flow reads and writes data several times. Our approach: after the system starts, load all the data needed at runtime into memory, then read directly from memory from that point on. Fortunately the data set is small, so this is simple. Any insert, update, or delete is carried out by the other thread against the database, which then notifies the main thread to update the in-memory copy.

The end result is that the main thread is completely non-blocking; every blocking operation is moved to the other thread. The two threads share no global data and interact only through a FIFO.
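The update path can be sketched as follows. The table contents and names (`users`, `db_thread`) are made up for illustration; the point is the shape: only the worker thread touches the (blocking) database, and the main thread learns of changes through the FIFO without ever blocking on it:

```python
import queue
import threading

# In-memory copy of a table; only the main thread reads or writes it.
users = {}
updates = queue.Queue()  # the FIFO between the two threads

def db_thread(changes):
    """Stand-in for the DB worker: it would apply each change to MySQL
    (blocking), then notify the main thread via the FIFO."""
    for key, value in changes:
        # ... blocking INSERT/UPDATE/DELETE against MySQL goes here ...
        updates.put((key, value))

def drain_updates():
    """Called from the main loop: apply pending updates without blocking."""
    while True:
        try:
            key, value = updates.get_nowait()
        except queue.Empty:
            break
        if value is None:
            users.pop(key, None)  # None signals a delete
        else:
            users[key] = value

t = threading.Thread(target=db_thread,
                     args=([("alice", 1001), ("bob", 1002)],))
t.start()
t.join()
drain_updates()
print(users)  # {'alice': 1001, 'bob': 1002}
```

`get_nowait` is what keeps the main loop honest: it either applies a ready update or returns immediately, never waiting on the database thread.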

Redis may be worth considering: it keeps data in memory, and its read/write performance is very good. But it adds complexity, and NoSQL is unfamiliar to our developers. The "principle of least surprise" applies not only to program interfaces but also to systems.

 

After all these considerations and optimizations, the goal is basically achieved, and the system remains simple enough.

 

Squeezing more out of the servers:

After the optimizations above, the system basically meets the user's needs. But I know the servers' capabilities (both CPU and I/O) are not fully used. To squeeze out more, the per-process capacity on the bearer server could be doubled, so that each process handles 1000 calls; a few more processes could also be run.

The control server does not fully use its multiple cores either; one option is to run two bearer programs on the control server as well.

In this case, it is estimated that both the number of registered users and the number of concurrent calls can be increased, the latter to at least 3K.

Increase absolute capacity and concurrency:

Due to different business characteristics, high-concurrency solutions in the communications industry may differ from those in the Internet industry. The telephone network we use every day is an example: it achieves its scale through exchanges distributed across the country, combining powerful routing capabilities with strictly standardized end-to-end protocols.

A similar approach could also increase this system's capacity and concurrency. However, the current capacity will meet the needs of our company's market for several years, so there is no need for further improvement: keep it simple.

 

Summary:

In many cases, using open-source software is a very good way to avoid "reinventing the wheel". It is also tempting as something extra to put on your resume.

But sometimes what you need is not a "wheel" at all. Think of fitting car wheels to a skateboard.

 

What kind of solution is a good solution?

1. Meet current needs, plus those future needs with at least a 50% chance of materializing. When a foreseeable requirement has better than even odds of occurring, it is worth designing for extensibility; otherwise you over-design and hurt simplicity.

2. Keep it simple.

 

Finally, once more: the KISS principle. Keep it simple, stupid!
