1. Choose the right language
Scripting languages need not apply. Although they keep getting faster, when you are trying to shave off the last few milliseconds of latency you cannot afford the overhead of an interpreted language. You will also want a strong memory model that enables lock-free programming, so the languages of choice are Java, Scala, C++11, or Go.
2. Put everything in memory
I/O will kill your latency, so make sure all of your data is in memory. This generally means managing your own in-memory data structures and maintaining a persistent log, so you can rebuild the state after a machine restart. Options for a persistent log include Bitcask, Krati, LevelDB, and BDB-JE. Alternatively, you may be able to get away with running a locally persisted in-memory database such as Redis or MongoDB (with memory >> data). Note that both sync data to disk in the background, so a crash can lose some data.
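As a minimal sketch of this pattern in Java (the `InMemoryStore` class, its `put`/`get` API, and the tab-separated log format are all hypothetical; a real system would use one of the log stores named above), each update is appended to the log before it is applied in memory, and the log is replayed on startup:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Minimal write-ahead-log sketch: every update is appended to a log file
// before mutating the in-memory map, so the state can be rebuilt on restart.
public class InMemoryStore {
    private final Map<String, String> state = new HashMap<>();
    private final Path logPath;
    private final BufferedWriter log;

    public InMemoryStore(Path logPath) throws IOException {
        this.logPath = logPath;
        replay();                       // rebuild state from the existing log
        this.log = Files.newBufferedWriter(logPath,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void put(String key, String value) throws IOException {
        log.write(key + "\t" + value + "\n");  // persist first...
        log.flush();                            // ...then apply in memory
        state.put(key, value);
    }

    public String get(String key) { return state.get(key); }

    private void replay() throws IOException {
        if (!Files.exists(logPath)) return;
        for (String line : Files.readAllLines(logPath)) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) state.put(kv[0], kv[1]);
        }
    }
}
```

Note that `flush()` here does not force an fsync; deferring the actual disk sync is exactly why a crash can lose the most recent writes, as mentioned above.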
3. Keep data and processing colocated
Network hops are faster than disk seeks, but even so they add a lot of overhead. Ideally, your data should fit entirely in memory on a single host. If you must run across multiple hosts, make sure your data and requests are properly partitioned so that all the data needed to satisfy a given request is available locally.
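A minimal sketch of request routing under such a scheme (the `PartitionRouter` class and node names are hypothetical; real deployments typically use consistent hashing so that adding a node does not remap every key):

```java
import java.util.List;

// Sketch of key-based partitioning: route each request to the node that
// owns the key, so all data the request needs is local to that node.
public class PartitionRouter {
    private final List<String> nodes;   // one entry per partition owner

    public PartitionRouter(List<String> nodes) { this.nodes = nodes; }

    // Deterministic mapping from key to owning node.
    public String ownerOf(String key) {
        int bucket = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(bucket);
    }

    public static void main(String[] args) {
        PartitionRouter router =
                new PartitionRouter(List.of("node-a", "node-b", "node-c"));
        System.out.println(router.ownerOf("user:42")); // always the same node for this key
    }
}
```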
4. Keep the system underutilized
Low latency requires always having resources available to process the request. Don't try to run your hardware/software at the limit of what it can handle. Always leave plenty of headroom for bursts.
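One way to enforce headroom in code is to bound your worker pool and shed load rather than saturate. A sketch, with the pool size and queue depth as illustrative values only:

```java
import java.util.concurrent.*;

// Sketch of keeping headroom: size the pool below the machine's capacity and
// use a bounded queue that sheds load instead of letting the system saturate.
public class HeadroomExecutor {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int workers = Math.max(1, cores / 2);     // deliberately below capacity
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(128),            // bounded: no endless backlog
                new ThreadPoolExecutor.AbortPolicy());    // reject bursts instead of queueing forever

        try {
            pool.execute(() -> System.out.println("handled"));
        } catch (RejectedExecutionException e) {
            System.out.println("shed load: system at its configured limit");
        } finally {
            pool.shutdown();
        }
    }
}
```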
5. Minimize context switches
Context switches are a sign that you are doing more compute work than you have resources for. You will want to limit the number of threads to the number of CPU cores and, ideally, pin each thread to its own core.
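A sketch of the threading side of this advice in Java, sizing the pool to the core count. (The JDK has no portable API for pinning a thread to a core; that usually requires a native library such as OpenHFT's Java-Thread-Affinity, or an OS tool like taskset.)

```java
import java.util.concurrent.*;

// Sketch: create exactly one worker thread per core so the scheduler has no
// reason to context-switch between competing application threads.
public class OneThreadPerCore {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            int id = i;
            pool.execute(() -> System.out.println("worker " + id + " running"));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```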
6. Keep your reads sequential
All forms of storage, whether flash-based or in memory, perform significantly better when used sequentially. Issuing sequential reads against memory triggers prefetching at the RAM level as well as at the CPU cache level; done properly, the next piece of data will be in the L1 cache right before you need it. The easiest way to help this along is to make heavy use of arrays of primitive types. Following pointers, whether through linked lists or arrays of objects, should be avoided at all costs.
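A toy illustration of the difference (not a rigorous benchmark; use JMH for real measurements): summing a primitive `long[]` walks contiguous memory that the prefetcher can stream, while summing a `LinkedList<Long>` chases a pointer per element:

```java
import java.util.LinkedList;
import java.util.List;

// Contrast a primitive array (sequential, prefetch-friendly) with a
// LinkedList of boxed values (pointer chasing, cache-hostile).
public class SequentialAccess {
    public static void main(String[] args) {
        int n = 1_000_000;
        long[] array = new long[n];
        List<Long> list = new LinkedList<>();
        for (int i = 0; i < n; i++) { array[i] = i; list.add((long) i); }

        long t0 = System.nanoTime();
        long sumA = 0;
        for (int i = 0; i < n; i++) sumA += array[i];   // sequential: prefetcher wins
        long t1 = System.nanoTime();
        long sumL = 0;
        for (long v : list) sumL += v;                  // one pointer chase per element
        long t2 = System.nanoTime();

        System.out.printf("array: %d us, linked list: %d us (sums %d/%d)%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000, sumA, sumL);
    }
}
```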
7. Batch your writes
This may sound counterintuitive, but you can get significant performance improvements by batching writes. There is a misconception that batching means the system should pause and wait for some number of operations to accumulate. Instead, a thread should spin in a tight loop performing I/O: each write batches all the data that arrived since the last write was issued, which makes for a very fast and adaptive system.
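A minimal sketch of such a writer (the `BatchingWriter` class and its `submit` method are hypothetical): one dedicated thread spins on a queue, and each pass writes everything that has arrived since the previous write, so the batch size adapts to load without any artificial delay:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ConcurrentLinkedQueue;

// Adaptive write batching: under light load batches are tiny (low latency),
// under heavy load they grow automatically (high throughput). No timers.
public class BatchingWriter implements Runnable {
    private final ConcurrentLinkedQueue<byte[]> pending = new ConcurrentLinkedQueue<>();
    private final OutputStream out;
    private volatile boolean running = true;

    public BatchingWriter(OutputStream out) { this.out = out; }

    public void submit(byte[] record) { pending.offer(record); }

    @Override
    public void run() {
        while (running) {
            byte[] record = pending.poll();
            if (record == null) continue;   // spin tightly until work arrives
            try {
                do {                        // drain the entire backlog...
                    out.write(record);
                } while ((record = pending.poll()) != null);
                out.flush();                // ...then issue one flush for the batch
            } catch (IOException e) {
                running = false;
            }
        }
    }

    public void stop() { running = false; }
}
```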
8. Respect your cache
With all of these optimizations in place, memory access quickly becomes the bottleneck. Pinning threads to their own cores helps reduce CPU cache pollution, and sequential I/O helps preload the cache. Beyond that, keep your memory footprint down by using primitive data types so that more data fits in cache. Additionally, you can look into cache-oblivious algorithms, which are designed to keep the data they are working on in cache.
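One concrete way to keep the footprint small is a structure-of-arrays layout built from primitives. A sketch, using a hypothetical order-book record:

```java
// Cache-friendly layout: instead of an array of (hypothetical) Order objects,
// store each field in its own primitive array. Fields that are scanned
// together sit contiguously, so far more of them fit in each cache line.
public class OrderBookColumns {
    private final long[] ids;
    private final double[] prices;
    private final int[] quantities;
    private int size;

    public OrderBookColumns(int capacity) {
        ids = new long[capacity];
        prices = new double[capacity];
        quantities = new int[capacity];
    }

    public void add(long id, double price, int qty) {
        ids[size] = id; prices[size] = price; quantities[size] = qty; size++;
    }

    // Scanning the columns touches only dense, sequential, prefetch-friendly
    // memory, unlike chasing a pointer per Order object.
    public double notional() {
        double total = 0;
        for (int i = 0; i < size; i++) total += prices[i] * quantities[i];
        return total;
    }
}
```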
9. Non-blocking as much as possible
Make friends with non-blocking and wait-free data structures and algorithms. Every time you take a lock, the call has to go down the stack into the operating system to mediate it, which is a huge overhead. Often, if you know what you are doing, you can get around locks by understanding the memory model of the JVM, C++11, or Go.
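A minimal example of the lock-free style on the JVM (the class itself is a hypothetical illustration): a compare-and-swap retry loop on an `AtomicLong` updates shared state without ever parking a thread in the kernel:

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free update via compare-and-swap: no lock is ever taken, so no thread
// is suspended and the OS scheduler is never asked to mediate contention.
public class LockFreeHighWaterMark {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    public void record(long value) {
        long current;
        do {
            current = max.get();
            if (value <= current) return;              // nothing to update
        } while (!max.compareAndSet(current, value));  // retry on contention
    }

    public long get() { return max.get(); }
}
```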
10. Async as much as possible
Any processing, and particularly any I/O, that is not absolutely necessary for building the response should be done asynchronously, outside the critical path.
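A sketch of moving work off the critical path in Java (the audit-logging example is hypothetical): the response is computed and returned immediately, while the non-essential I/O runs on a background thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The critical path builds the response; anything else is handed to a
// background executor so the caller never waits on it.
public class AsyncOffCriticalPath {
    private static final ExecutorService BACKGROUND = Executors.newSingleThreadExecutor();

    static String handle(String request) {
        String response = "echo:" + request;    // critical path: compute the response
        CompletableFuture.runAsync(             // off critical path: audit logging
                () -> System.out.println("audit: " + request), BACKGROUND);
        return response;                        // returns without waiting on the log
    }

    public static void main(String[] args) {
        System.out.println(handle("ping"));
        BACKGROUND.shutdown();                  // let the JVM exit in this demo
    }
}
```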
11. Parallelize as much as possible
Any processing, and particularly any I/O, that can happen in parallel should happen in parallel. For instance, if your high availability strategy includes logging transactions to disk and sending transactions to a secondary server, those actions can occur in parallel.
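A sketch of that example (both I/O methods are hypothetical stubs): the disk write and the network send are launched together, so the commit waits for max(disk, network) rather than their sum:

```java
import java.util.concurrent.CompletableFuture;

// Persisting a transaction locally and shipping it to a secondary are
// independent, so they can run in parallel.
public class ParallelHA {
    static void writeToDisk(String txn)     { /* fsync a local log entry */ }
    static void sendToSecondary(String txn) { /* replicate over the network */ }

    static void commit(String txn) {
        CompletableFuture<Void> disk = CompletableFuture.runAsync(() -> writeToDisk(txn));
        CompletableFuture<Void> net  = CompletableFuture.runAsync(() -> sendToSecondary(txn));
        CompletableFuture.allOf(disk, net).join();  // wait for both, running concurrently
    }

    public static void main(String[] args) {
        commit("txn-1");
        System.out.println("committed");
    }
}
```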