1. Choose the right language
Scripting languages need not apply. Although they keep getting faster, when you are trying to shave off the last few milliseconds of latency you cannot afford the overhead of an interpreted language. You will also want a strong memory model that enables lock-free programming, so the languages of choice are Java, Scala, C++11, or Go.
2. Put everything in memory
I/O will kill your latency, so make sure all of your data is in memory. This generally means managing your own in-memory data structures and maintaining a persistent log, so you can rebuild the state after a machine restart. Options for a persistent log include Bitcask, Krati, LevelDB, and BDB-JE. Alternatively, you may be able to get away with running a locally persisted in-memory database such as Redis or MongoDB (with memory >> data). Note that both sync data to disk in the background, so a crash can lose some data.
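As a minimal sketch of this pattern in Java (the `InMemoryStore` class, its `put`/`get` API, and the tab-separated log format are all hypothetical; a real system would use one of the log stores named above), each update is appended to the log before it is applied in memory, and the log is replayed on startup:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Minimal write-ahead-log sketch: every update is appended to a log file
// before mutating the in-memory map, so the state can be rebuilt on restart.
public class InMemoryStore {
    private final Map<String, String> state = new HashMap<>();
    private final Path logPath;
    private final BufferedWriter log;

    public InMemoryStore(Path logPath) throws IOException {
        this.logPath = logPath;
        replay();                       // rebuild state from the existing log
        this.log = Files.newBufferedWriter(logPath,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public void put(String key, String value) throws IOException {
        log.write(key + "\t" + value + "\n");  // persist first...
        log.flush();                            // ...then apply in memory
        state.put(key, value);
    }

    public String get(String key) { return state.get(key); }

    private void replay() throws IOException {
        if (!Files.exists(logPath)) return;
        for (String line : Files.readAllLines(logPath)) {
            String[] kv = line.split("\t", 2);
            if (kv.length == 2) state.put(kv[0], kv[1]);
        }
    }
}
```

Note that `flush()` here does not force an fsync; deferring the actual disk sync is exactly why a crash can lose the most recent writes, as mentioned above.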
3. Keep data and processing colocated
Network hops are faster than disk seeks, but even so they add a lot of overhead. Ideally, your data should fit entirely in memory on a single host. If you must run across multiple hosts, make sure your data and requests are properly partitioned so that all the data needed to satisfy a given request is available locally.
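A minimal sketch of request routing under such a scheme (the `PartitionRouter` class and node names are hypothetical; real deployments typically use consistent hashing so that adding a node does not remap every key):

```java
import java.util.List;

// Sketch of key-based partitioning: route each request to the node that
// owns the key, so all data the request needs is local to that node.
public class PartitionRouter {
    private final List<String> nodes;   // one entry per partition owner

    public PartitionRouter(List<String> nodes) { this.nodes = nodes; }

    // Deterministic mapping from key to owning node.
    public String ownerOf(String key) {
        int bucket = Math.floorMod(key.hashCode(), nodes.size());
        return nodes.get(bucket);
    }

    public static void main(String[] args) {
        PartitionRouter router =
                new PartitionRouter(List.of("node-a", "node-b", "node-c"));
        System.out.println(router.ownerOf("user:42")); // always the same node for this key
    }
}
```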
4. Keep the system underutilized
Low latency requires always having resources available to process the request. Don't try to run your hardware/software at the limit of what it can handle. Always leave plenty of headroom for bursts.
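One way to enforce headroom in code is to bound your worker pool and shed load rather than saturate. A sketch, with the pool size and queue depth as illustrative values only:

```java
import java.util.concurrent.*;

// Sketch of keeping headroom: size the pool below the machine's capacity and
// use a bounded queue that sheds load instead of letting the system saturate.
public class HeadroomExecutor {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int workers = Math.max(1, cores / 2);     // deliberately below capacity
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(128),            // bounded: no endless backlog
                new ThreadPoolExecutor.AbortPolicy());    // reject bursts instead of queueing forever

        try {
            pool.execute(() -> System.out.println("handled"));
        } catch (RejectedExecutionException e) {
            System.out.println("shed load: system at its configured limit");
        } finally {
            pool.shutdown();
        }
    }
}
```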
5. Minimize context switches
Context switches are a sign that you are doing more compute work than you have resources for. You will want to limit the number of threads to the number of CPU cores and, ideally, pin each thread to its own core.
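A sketch of the threading side of this advice in Java, sizing the pool to the core count. (The JDK has no portable API for pinning a thread to a core; that usually requires a native library such as OpenHFT's Java-Thread-Affinity, or an OS tool like taskset.)

```java
import java.util.concurrent.*;

// Sketch: create exactly one worker thread per core so the scheduler has no
// reason to context-switch between competing application threads.
public class OneThreadPerCore {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        for (int i = 0; i < cores; i++) {
            int id = i;
            pool.execute(() -> System.out.println("worker " + id + " running"));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```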
6. Keep your reads sequential
All forms of storage, whether flash-based or in memory, perform significantly better when used sequentially. Issuing sequential reads against memory triggers prefetching at the RAM level as well as at the CPU cache level; done properly, the next piece of data will be in the L1 cache right before you need it. The easiest way to help this along is to make heavy use of arrays of primitive types. Following pointers, whether through linked lists or arrays of objects, should be avoided at all costs.
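A toy illustration of the difference (not a rigorous benchmark; use JMH for real measurements): summing a primitive `long[]` walks contiguous memory that the prefetcher can stream, while summing a `LinkedList<Long>` chases a pointer per element:

```java
import java.util.LinkedList;
import java.util.List;

// Contrast a primitive array (sequential, prefetch-friendly) with a
// LinkedList of boxed values (pointer chasing, cache-hostile).
public class SequentialAccess {
    public static void main(String[] args) {
        int n = 1_000_000;
        long[] array = new long[n];
        List<Long> list = new LinkedList<>();
        for (int i = 0; i < n; i++) { array[i] = i; list.add((long) i); }

        long t0 = System.nanoTime();
        long sumA = 0;
        for (int i = 0; i < n; i++) sumA += array[i];   // sequential: prefetcher wins
        long t1 = System.nanoTime();
        long sumL = 0;
        for (long v : list) sumL += v;                  // one pointer chase per element
        long t2 = System.nanoTime();

        System.out.printf("array: %d us, linked list: %d us (sums %d/%d)%n",
                (t1 - t0) / 1_000, (t2 - t1) / 1_000, sumA, sumL);
    }
}
```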
7. Batch your writes
This may sound counterintuitive, but you can get significant performance improvements by batching writes. There is a misconception that batching means the system should pause and wait for some number of operations to accumulate. Instead, a thread should spin in a tight loop performing I/O: each write batches all the data that arrived since the last write was issued, which makes for a very fast and adaptive system.
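A minimal sketch of such a writer (the `BatchingWriter` class and its `submit` method are hypothetical): one dedicated thread spins on a queue, and each pass writes everything that has arrived since the previous write, so the batch size adapts to load without any artificial delay:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ConcurrentLinkedQueue;

// Adaptive write batching: under light load batches are tiny (low latency),
// under heavy load they grow automatically (high throughput). No timers.
public class BatchingWriter implements Runnable {
    private final ConcurrentLinkedQueue<byte[]> pending = new ConcurrentLinkedQueue<>();
    private final OutputStream out;
    private volatile boolean running = true;

    public BatchingWriter(OutputStream out) { this.out = out; }

    public void submit(byte[] record) { pending.offer(record); }

    @Override
    public void run() {
        while (running) {
            byte[] record = pending.poll();
            if (record == null) continue;   // spin tightly until work arrives
            try {
                do {                        // drain the entire backlog...
                    out.write(record);
                } while ((record = pending.poll()) != null);
                out.flush();                // ...then issue one flush for the batch
            } catch (IOException e) {
                running = false;
            }
        }
    }

    public void stop() { running = false; }
}
```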
8. Respect your cache
With all of these optimizations in place, memory access quickly becomes the bottleneck. Pinning threads to their own cores helps reduce CPU cache pollution, and sequential I/O helps preload the cache. Beyond that, keep your memory footprint down by using primitive data types so that more data fits in cache. Additionally, you can look into cache-oblivious algorithms, which are designed to keep the data they are working on in cache.
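One concrete way to keep the footprint small is a structure-of-arrays layout built from primitives. A sketch, using a hypothetical order-book record:

```java
// Cache-friendly layout: instead of an array of (hypothetical) Order objects,
// store each field in its own primitive array. Fields that are scanned
// together sit contiguously, so far more of them fit in each cache line.
public class OrderBookColumns {
    private final long[] ids;
    private final double[] prices;
    private final int[] quantities;
    private int size;

    public OrderBookColumns(int capacity) {
        ids = new long[capacity];
        prices = new double[capacity];
        quantities = new int[capacity];
    }

    public void add(long id, double price, int qty) {
        ids[size] = id; prices[size] = price; quantities[size] = qty; size++;
    }

    // Scanning the columns touches only dense, sequential, prefetch-friendly
    // memory, unlike chasing a pointer per Order object.
    public double notional() {
        double total = 0;
        for (int i = 0; i < size; i++) total += prices[i] * quantities[i];
        return total;
    }
}
```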
9. Non-blocking as much as possible
Make friends with non-blocking and wait-free data structures and algorithms. Every time you take a lock, the call has to go down the stack into the operating system to mediate it, which is a huge overhead. Often, if you know what you are doing, you can get around locks by understanding the memory model of the JVM, C++11, or Go.
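A minimal example of the lock-free style on the JVM (the class itself is a hypothetical illustration): a compare-and-swap retry loop on an `AtomicLong` updates shared state without ever parking a thread in the kernel:

```java
import java.util.concurrent.atomic.AtomicLong;

// Lock-free update via compare-and-swap: no lock is ever taken, so no thread
// is suspended and the OS scheduler is never asked to mediate contention.
public class LockFreeHighWaterMark {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    public void record(long value) {
        long current;
        do {
            current = max.get();
            if (value <= current) return;              // nothing to update
        } while (!max.compareAndSet(current, value));  // retry on contention
    }

    public long get() { return max.get(); }
}
```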
10. Async as much as possible
Any processing, and particularly any I/O, that is not absolutely necessary for building the response should be done asynchronously, outside the critical path.
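A sketch of moving work off the critical path in Java (the audit-logging example is hypothetical): the response is computed and returned immediately, while the non-essential I/O runs on a background thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The critical path builds the response; anything else is handed to a
// background executor so the caller never waits on it.
public class AsyncOffCriticalPath {
    private static final ExecutorService BACKGROUND = Executors.newSingleThreadExecutor();

    static String handle(String request) {
        String response = "echo:" + request;    // critical path: compute the response
        CompletableFuture.runAsync(             // off critical path: audit logging
                () -> System.out.println("audit: " + request), BACKGROUND);
        return response;                        // returns without waiting on the log
    }

    public static void main(String[] args) {
        System.out.println(handle("ping"));
        BACKGROUND.shutdown();                  // let the JVM exit in this demo
    }
}
```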
11. Parallelize as much as possible
Any processing, and particularly any I/O, that can happen in parallel should happen in parallel. For instance, if your high availability strategy includes logging transactions to disk and sending transactions to a secondary server, those actions can occur in parallel.
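A sketch of that example (both I/O methods are hypothetical stubs): the disk write and the network send are launched together, so the commit waits for max(disk, network) rather than their sum:

```java
import java.util.concurrent.CompletableFuture;

// Persisting a transaction locally and shipping it to a secondary are
// independent, so they can run in parallel.
public class ParallelHA {
    static void writeToDisk(String txn)     { /* fsync a local log entry */ }
    static void sendToSecondary(String txn) { /* replicate over the network */ }

    static void commit(String txn) {
        CompletableFuture<Void> disk = CompletableFuture.runAsync(() -> writeToDisk(txn));
        CompletableFuture<Void> net  = CompletableFuture.runAsync(() -> sendToSecondary(txn));
        CompletableFuture.allOf(disk, net).join();  // wait for both, running concurrently
    }

    public static void main(String[] args) {
        commit("txn-1");
        System.out.println("committed");
    }
}
```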