10 Commandments for Stifling Server Performance (repost, continued from the previous article)

You should allocate and free lots of objects

In reality, you should avoid excessive allocation of memory, because memory allocation is costly. Freeing blocks of memory can be even more expensive, because most allocators always attempt to coalesce adjacent freed blocks into larger blocks. Until Windows NT 4.0 Service Pack 4, the system heap usually performed badly under multithreading: the heap was protected by a single global lock and did not scale on multiprocessor systems.
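One common way to sidestep both costs is to recycle objects instead of returning them to the heap. Below is a minimal sketch of the classic per-class free-list idiom; the `Request` class, its payload size, and the unsynchronized list are illustrative assumptions, and a real server would use per-thread lists or a lock to avoid the very heap-lock contention described above.

```cpp
#include <cstddef>
#include <new>

class Request {
public:
    void* operator new(std::size_t size) {
        if (freeList != nullptr) {        // reuse a recycled block
            Request* r = freeList;
            freeList = r->next;
            return r;
        }
        return ::operator new(size);      // fall back to the heap allocator
    }

    void operator delete(void* p, std::size_t) noexcept {
        // Push the block onto the free list instead of freeing it,
        // avoiding the allocator's coalescing work entirely.
        Request* r = static_cast<Request*>(p);
        r->next = freeList;
        freeList = r;
    }

private:
    Request* next = nullptr;              // link used only while on the free list
    static Request* freeList;             // NOTE: not thread-safe; a sketch only
    char payload[248];                    // placeholder request data
};

Request* Request::freeList = nullptr;
```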

You should not consider the processor cache

Most people know that hard page faults caused by the virtual memory subsystem are costly and best avoided, but many believe that all other memory accesses cost the same. That view has been wrong since the 80486. Modern CPUs are so much faster than RAM that they need at least two levels of memory cache: a fast L1 cache holds 8 KB of data and 8 KB of instructions, while a slower L2 cache holds several hundred KB of mixed code and data. A reference to a memory location in the L1 cache costs one clock cycle, a reference to the L2 cache costs 4 to 7 clock cycles, and a reference to main memory costs many dozens of processor cycles; that last figure will soon exceed 100 cycles. In many ways, the caches are like a small, fast virtual memory system.

The basic unit of memory with respect to the cache is not a byte but a cache line. The Pentium cache line is 32 bytes wide; the Alpha cache line is 64 bytes wide. This means there are only 512 line slots in the L1 cache for code and data. If data that is used together (temporal locality) is not stored together (spatial locality), performance suffers. Arrays have excellent locality, while linked lists and other pointer-based data structures tend to have poor locality.
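A minimal sketch of that contrast (the function and type names are illustrative): summing a contiguous array touches consecutive cache lines, while summing a linked list chases pointers whose nodes may be scattered across the heap, so each node can cost a cache miss.

```cpp
#include <cstddef>

struct Node {
    int   value;
    Node* next;
};

// Good spatial locality: the elements are contiguous, so one 32-byte
// Pentium cache line holds eight consecutive ints.
long sumArray(const int* a, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

// Poor locality: the nodes were allocated separately and may sit anywhere,
// so each dereference of `next` can pull in a fresh cache line.
long sumList(const Node* head) {
    long total = 0;
    for (const Node* p = head; p != nullptr; p = p->next)
        total += p->value;
    return total;
}
```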

Packing data into the same cache line usually improves performance, but on multiprocessor systems it can hurt. The memory subsystem works hard to keep the caches coherent across processors. If read-only data used by all processors shares a cache line with data that one processor updates frequently, the caches will spend a long time updating their copies of that line. This high-speed game of ping-pong is often called "cache sloshing". If the read-only data lives in a different cache line, the sloshing is avoided.
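A minimal sketch of one remedy, assuming C++11 or later: push the frequently updated field onto its own cache line with `alignas`. The 64-byte line size and the field names are assumptions; real code would use the line size of the target processor.

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kCacheLine = 64;

struct ServerStats {
    // Read by every processor, never written after startup.
    int maxConnections;
    int configVersion;

    // Force the hot counter onto its own cache line, so updates to it
    // do not invalidate the line holding the read-only fields above.
    alignas(kCacheLine) std::atomic<long> requestsServed{0};
};
```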

Optimizing code for space is closely tied to optimizing it for speed. The less code you have, the fewer pages it occupies, the fewer working-set adjustments and resulting page faults you incur, and the fewer cache lines it consumes. However, certain core functions should still be optimized for speed; use a profiler to identify them.

Never cache frequently used data.

Software caching can be used in all kinds of applications: when a calculation is expensive, you save a copy of the result. This is the classic space-time tradeoff, sacrificing some storage space to save time. Done well, it can be extremely effective.

You must cache correctly, though. If the wrong data is cached, storage space is wasted. If you cache too much, little memory is left for everything else. If you cache too little, efficiency drops because you keep recomputing data that missed the cache. If time-sensitive data stays cached too long, it becomes stale. Servers generally care more about speed than space, so they cache more aggressively than desktop systems. Be sure to flush unused cache entries periodically, or you will have working-set problems.
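One standard way to bound the cache and flush stale entries at the same time is a least-recently-used (LRU) policy. Below is a minimal sketch, not from the article; the capacity, the string key/value types, and the `compute` placeholder are all assumptions.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached value for `key`, recomputing and evicting as needed.
    std::string get(const std::string& key) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            // Hit: move the entry to the front (most recently used).
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        // Miss: recompute, insert at the front, evict from the back.
        std::string value = compute(key);
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
        if (order_.size() > capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        return value;
    }

private:
    // Placeholder for the expensive calculation whose results are cached.
    std::string compute(const std::string& key) { return key + "-result"; }

    using Entry = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;                                      // MRU at front
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```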

You should create lots of threads; the more, the better.

In reality, it is important to tune the number of worker threads in a server. If the threads are I/O-bound, they spend much of their time blocked waiting for I/O to complete, and a blocked thread does no useful work. Adding threads can increase throughput, but adding too many degrades server performance, because context switching becomes a significant overhead. The context-switch rate should be kept low for three reasons: context switches are pure overhead that contribute nothing to the application's work; they consume valuable clock cycles; and, worst of all, they fill the processor's caches with useless data, which is costly to replace.

Much depends on your threading architecture. One thread per client is absolutely inappropriate, because it does not scale to large numbers of clients: context switching becomes unbearable and Windows NT runs out of resources. A thread-pool model, in which a pool of worker threads services a request queue, works better; Windows 2000 provides suitable APIs for this, such as QueueUserWorkItem.
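A minimal sketch of that thread-pool approach using the QueueUserWorkItem API named above: the system supplies and sizes the pool of worker threads, so the server never creates one thread per client. The `ClientRequest` type and the request-handling logic are assumptions for illustration.

```cpp
#include <winsock2.h>
#include <windows.h>

struct ClientRequest {
    SOCKET socket;
    // ... request data ...
};

// Worker callback: runs on one of the system's pool threads.
static DWORD WINAPI HandleRequest(LPVOID context) {
    ClientRequest* req = static_cast<ClientRequest*>(context);
    // ... parse the request and send the response on req->socket ...
    delete req;
    return 0;
}

// Called by the accept loop for each incoming request.
void DispatchRequest(ClientRequest* req) {
    if (!QueueUserWorkItem(HandleRequest, req, WT_EXECUTEDEFAULT)) {
        // Queueing failed; handle inline rather than leaking the request.
        HandleRequest(req);
    }
}
```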


