Best practices for low-latency systems


1. Choose the right language


Scripting languages need not apply. Even though they keep getting faster, when you are trying to shave the last few milliseconds off your processing time you cannot afford the overhead of an interpreted language. You will also want a strong memory model that enables lock-free programming, so look at Java, Scala, C++11, or Go.


2. Put everything in memory


I/O will kill your latency, so make sure all of your data is in memory. This generally means managing your own in-memory data structures and maintaining a persistent log, so you can rebuild the state after a machine restart. Options for a persistent log include Bitcask, Krati, LevelDB, and BDB-JE. Alternatively, you may be able to get away with running a local, persisted in-memory database such as Redis or MongoDB (with memory >> data). Note that these sync to disk in the background, so a crash can lose some data.
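
A minimal sketch of this pattern in Java, assuming a simple key-value workload (the tab-separated record format and the class name are illustrative, not prescribed by the article):

    import java.io.*;
    import java.nio.file.*;
    import java.util.*;

    // Minimal sketch: in-memory state backed by an append-only log that is
    // replayed on startup to rebuild the state after a restart.
    public class InMemoryStore {
        private final Map<String, String> data = new HashMap<>();
        private final PrintWriter log;

        public InMemoryStore(Path logFile) throws IOException {
            // Replay the log to rebuild the in-memory state.
            if (Files.exists(logFile)) {
                for (String line : Files.readAllLines(logFile)) {
                    String[] kv = line.split("\t", 2);
                    data.put(kv[0], kv[1]);
                }
            }
            log = new PrintWriter(new FileWriter(logFile.toFile(), true), true);
        }

        public void put(String key, String value) {
            log.println(key + "\t" + value); // persist first...
            data.put(key, value);            // ...then serve from memory
        }

        public String get(String key) {
            return data.get(key); // no I/O on the read path
        }
    }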


3. Keep data and processing colocated


A network hop is faster than a disk seek, but even so it adds a lot of overhead. Ideally, your data should fit entirely in memory on a single host. If you need to run on more than one host, make sure your data and requests are properly partitioned, so that all the data needed to satisfy a given request is available locally.
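
A minimal Java sketch of request partitioning, assuming a fixed host list (the modulo-hash scheme is illustrative; a real system would likely use consistent hashing so that adding or removing hosts moves less data):

    import java.util.List;

    // Minimal sketch: route each request to the host that owns its key,
    // so all data needed for the request is local to that host.
    public class Partitioner {
        private final List<String> hosts;

        public Partitioner(List<String> hosts) {
            this.hosts = hosts;
        }

        public String hostFor(String key) {
            // Math.floorMod avoids negative indices for negative hash codes.
            return hosts.get(Math.floorMod(key.hashCode(), hosts.size()));
        }
    }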


4. Keep the system underutilized


Low latency requires always having resources free to handle the request. Don't try to run your hardware/software at the limit of what it can provide. Leave plenty of headroom for bursts.


5. Minimize context switching


Context switches are a sign that you are doing more compute work than you have resources for. Limit the number of threads to the number of CPU cores on your system, and pin each thread to its own core.
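
A minimal Java sketch of sizing the worker pool to the core count. Note that the JDK has no portable core-pinning API; actual pinning would come from the OS (e.g. taskset on Linux) or a library such as OpenHFT's Java-Thread-Affinity:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Minimal sketch: no more worker threads than cores, so each runnable
    // thread can stay on its own core instead of being switched out.
    public class Workers {
        private static final int CORES = Runtime.getRuntime().availableProcessors();
        private static final ExecutorService POOL = Executors.newFixedThreadPool(CORES);

        public static void main(String[] args) {
            for (int i = 0; i < CORES; i++) {
                POOL.submit(() -> {
                    // long-lived per-core worker loop would go here
                });
            }
            POOL.shutdown();
        }
    }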


6. Keep your reads sequential


All forms of storage, whether rotational, flash-based, or memory, perform significantly better when used sequentially. Issuing sequential reads to memory triggers prefetching at the RAM level as well as at the CPU cache level; done properly, the next piece of data will already be in L1 cache right before you need it. The easiest way to help this along is to make heavy use of arrays of primitive data types. Following pointers, whether through linked lists or arrays of objects, should be avoided at all costs.
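
A minimal Java sketch of the difference, assuming a simple summation workload:

    import java.util.List;

    // Minimal sketch: summing a primitive array walks memory sequentially
    // and benefits from hardware prefetching; a LinkedList of boxed values
    // chases a pointer per node and defeats the prefetcher.
    public class Sequential {
        static long sumArray(long[] values) {
            long sum = 0;
            for (long v : values) sum += v; // contiguous, prefetch-friendly
            return sum;
        }

        static long sumList(List<Long> values) {
            long sum = 0;
            for (long v : values) sum += v; // pointer chase per element, cache-hostile
            return sum;
        }
    }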


7. Batch your writes


This may sound counterintuitive, but you can gain significant performance by batching writes. There is a misconception that this means the system should wait an arbitrary amount of time before writing. Instead, one thread should spin in a tight loop doing I/O: each write batches all the data that arrived since the last write was issued. This makes for a very fast and adaptive system.
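
A minimal Java sketch of an adaptive batching writer (class and method names are illustrative). For simplicity it parks on take() when the queue is empty; a strictly latency-critical version might busy-spin with poll() instead:

    import java.io.*;
    import java.util.concurrent.*;

    // Minimal sketch: one writer thread loops, draining whatever arrived
    // since the last write and flushing it as a single batch, so the batch
    // size adapts to load with no artificial delay.
    public class BatchingWriter implements Runnable {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final BufferedWriter out;

        public BatchingWriter(Writer sink) {
            this.out = new BufferedWriter(sink);
        }

        public void submit(String record) {
            queue.add(record);
        }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Wait for the first record, then grab everything queued.
                    String record = queue.take();
                    do {
                        out.write(record);
                        out.newLine();
                    } while ((record = queue.poll()) != null);
                    out.flush(); // one flush per batch
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }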


8. Respect your cache


With all of these optimizations in place, memory access quickly becomes the bottleneck. Pinning threads to their own cores helps reduce CPU cache pollution, and sequential I/O helps preload the cache. Beyond that, keep memory footprints down by using primitive data types so that more data fits in cache. You can also look into cache-oblivious algorithms, which recursively break the data down until it fits in cache and then do the necessary processing.
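
A minimal Java sketch of the footprint difference between primitive and boxed storage (the sizes in the comments are typical for a 64-bit JVM, not guaranteed by the specification):

    // Minimal sketch: primitive storage is contiguous payload; boxed storage
    // is a pointer per slot plus a separately allocated object with its own
    // header, so far fewer useful values fit in each cache line.
    public class CacheFriendly {
        // 1,000,000 longs in one contiguous block (~8 MB): every cache line
        // fetched is entirely useful data.
        static final long[] compact = new long[1_000_000];

        // 1,000,000 references to scattered Long objects: each access may
        // cost an extra dereference and drags object headers into cache.
        static final Long[] boxed = new Long[1_000_000];
    }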


9. Be as non-blocking as possible


Make friends with non-blocking and wait-free data structures and algorithms. Every time you use a lock, you have to go down the stack to the operating system to mediate it, which is a huge overhead. Often, if you know what you are doing, you can get around locks by understanding the memory model of the JVM, C++11, or Go.
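
A minimal Java sketch of lock-free programming with a compare-and-set retry loop (AtomicLong.incrementAndGet would do the same job; the explicit loop just makes the mechanism visible):

    import java.util.concurrent.atomic.AtomicLong;

    // Minimal sketch: a lock-free counter. No thread ever holds a lock;
    // on contention a thread simply retries its compare-and-set, relying
    // on the JVM memory model instead of OS-mediated locking.
    public class LockFreeCounter {
        private final AtomicLong value = new AtomicLong();

        public long increment() {
            long current, next;
            do {
                current = value.get();
                next = current + 1;
            } while (!value.compareAndSet(current, next)); // retry if raced
            return next;
        }
    }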


10. Be as asynchronous as possible


Any processing, and especially any I/O, that is not absolutely necessary for building the response should be done asynchronously, outside the critical path.
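
A minimal Java sketch, assuming a hypothetical audit write that is not needed to build the response:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Minimal sketch: the response is computed on the critical path; the
    // audit write (illustrative) is handed off to a background executor.
    public class Handler {
        private final ExecutorService background = Executors.newSingleThreadExecutor();

        public String handle(String request) {
            String response = compute(request); // critical path only
            CompletableFuture.runAsync(() -> audit(request), background); // off-path I/O
            return response;
        }

        private String compute(String request) {
            return "ok:" + request;
        }

        private void audit(String request) {
            // e.g. write to a log store; never blocks the response
        }
    }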


11. Parallelize as much as possible


Any processing, and especially any I/O, that can happen in parallel should happen in parallel. For example, if your high-availability strategy includes logging transactions to disk and sending transactions to a secondary server, those two steps can run in parallel.
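
A minimal Java sketch of that example (logToDisk and sendToSecondary are illustrative placeholders):

    import java.util.concurrent.CompletableFuture;

    // Minimal sketch: the disk write and the send to the secondary server
    // are independent, so they run in parallel and we wait for both.
    // Total latency is the max of the two steps, not their sum.
    public class Replicator {
        public void commit(byte[] txn) {
            CompletableFuture<Void> disk =
                CompletableFuture.runAsync(() -> logToDisk(txn));
            CompletableFuture<Void> replica =
                CompletableFuture.runAsync(() -> sendToSecondary(txn));
            CompletableFuture.allOf(disk, replica).join();
        }

        private void logToDisk(byte[] txn) {
            // e.g. append and fsync a transaction log
        }

        private void sendToSecondary(byte[] txn) {
            // e.g. network send to the standby server
        }
    }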
