10 secrets to killing IIS server performance


Each of the "rules" below will hurt the performance and scalability of your code if you follow it. In other words: don't follow them! Each section explains why the rule is harmful and how to break it to improve performance and scalability.

1. Allocate and free many objects.

You should avoid excessive memory allocation, because heap allocation can be expensive. Freeing memory can be even more expensive, because most allocators try to coalesce adjacent freed blocks into larger blocks. Through Windows NT 4.0 Service Pack 4, the system heap often performed poorly in multithreaded code: it is protected by a single global lock and does not scale on multiprocessor systems.
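
One common way to break this rule is to recycle blocks instead of calling the allocator per request. The sketch below (my illustration, not code from the article; single-threaded for brevity) keeps freed fixed-size buffers on a free list and only hits the heap on a miss:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Illustrative sketch: a free-list pool that recycles fixed-size blocks
// instead of allocating and freeing on every request.
class BufferPool {
public:
    explicit BufferPool(std::size_t blockSize) : blockSize_(blockSize) {}

    std::unique_ptr<char[]> acquire() {
        if (!free_.empty()) {
            auto buf = std::move(free_.back());  // reuse a recycled block
            free_.pop_back();
            return buf;
        }
        return std::make_unique<char[]>(blockSize_);  // allocate only on a miss
    }

    void release(std::unique_ptr<char[]> buf) {
        free_.push_back(std::move(buf));  // keep the block for next time
    }

    std::size_t cached() const { return free_.size(); }

private:
    std::size_t blockSize_;
    std::vector<std::unique_ptr<char[]>> free_;
};
```

In a real server the pool would need its own (cheap, ideally per-thread) synchronization, precisely to avoid recreating the global heap lock the text describes.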

2. Pay no attention to the processor cache.

Most people know that the hard page faults caused by the virtual-memory subsystem are expensive and best avoided, but many assume that all other memory accesses cost the same. That has not been true since the 80486. Modern CPUs are so much faster than RAM that they need at least two levels of memory cache. On Pentium-class machines, the fast L1 cache holds 8 KB of data and 8 KB of instructions, while the slower L2 cache holds several hundred kilobytes of mixed code and data. A reference that hits the L1 cache costs one clock cycle, a reference to the L2 cache costs 4 to 7 clock cycles, and a reference to main memory costs dozens of clock cycles, soon to be more than 100. In many ways, the cache hierarchy is like a small, high-speed virtual-memory system.

The basic unit of memory as far as the cache is concerned is not the byte but the cache line. A Pentium cache line is 32 bytes wide; an Alpha cache line is 64 bytes wide. This means there are only 512 slots in the L1 cache for code and data. If data that is used together (temporal locality) is not stored together (spatial locality), performance suffers. Arrays have excellent spatial locality; linked lists and other pointer-based data structures usually have poor spatial locality.

Packing data into the same cache line usually improves performance, but on multiprocessor systems it can hurt. The memory subsystem must keep the caches coherent across processors. If read-only data used by all processors shares a cache line with data frequently updated by one processor, the other processors must repeatedly re-fetch their copies of the line. This high-speed game of Ping-Pong is often called "cache sloshing." It can be avoided by keeping the read-only data in a different cache line.
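
The usual fix is explicit padding or alignment. The sketch below is my illustration (assuming 64-byte cache lines, which is common today but not universal): `alignas` forces the hot counter and the read-mostly field onto different lines so they cannot slosh.

```cpp
#include <cstddef>

// Illustrative sketch, assuming 64-byte cache lines: align each field to
// its own line so a hot, frequently written counter does not share a
// cache line with read-mostly data used by every processor.
struct SharedState {
    alignas(64) long readMostlyConfig;  // read by all threads
    alignas(64) long hotCounter;        // written constantly by one thread
};

static_assert(alignof(SharedState) >= 64,
              "struct is padded out to a cache-line boundary");
```

The cost is a little wasted space per field, which is usually a cheap price for eliminating the coherence traffic.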

Optimizing code for space is more effective than optimizing it for speed. Less code occupies fewer pages, which means a smaller working set, fewer page faults, and fewer cache lines occupied. However, a few core functions should be optimized for speed; use a profiler to identify them.

3. Never cache frequently used data.

Software caching is useful in all kinds of applications. When a computation is expensive, you keep a copy of the result. This is the classic space-time trade-off: sacrifice some storage to save time. Done well, it can be extremely effective.

You must cache the right data. Caching the wrong data wastes space. If the cache is too large, it starves other operations of memory. If it is too small, it is ineffective, because you must constantly recompute data that has fallen out of the cache. If sensitive data is cached too long, it goes stale. Servers generally care more about speed than space, so they cache more aggressively than desktop systems. Purge unused cache entries regularly; otherwise you will have working-set problems.
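
A bounded LRU (least-recently-used) cache is one standard way to honor both constraints above: the size limit keeps it from starving the rest of the server, and the eviction order keeps recently used data in. This is my own illustrative sketch, not code from the article:

```cpp
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative sketch: a bounded LRU cache. When full, the
// least-recently-used entry is evicted to make room.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    std::optional<std::string> get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;        // cache miss
        order_.splice(order_.begin(), order_, it->second);  // mark most recent
        return it->second->second;
    }

    void put(const std::string& key, const std::string& value) {
        auto it = index_.find(key);
        if (it != index_.end()) {                           // update in place
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {                   // evict LRU entry
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
    }

private:
    using Entry = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

A server cache would additionally need locking and an expiry policy for the stale-data problem; both are omitted here for clarity.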

4. Create lots of threads. The more, the better.

It is important to tune the number of active threads in the server. If threads are I/O-bound, they spend much of their time blocked waiting for I/O to complete, and a blocked thread is a thread doing no useful work. Adding threads can increase throughput, but adding too many degrades server performance, because context switching becomes a significant overhead. The context-switch rate should be kept low for three reasons: context switches are pure overhead that contributes nothing to the application's work; they consume valuable clock cycles; and, worst of all, they fill the processor caches with useless data, which is costly to replace.

Much depends on your threading architecture. One thread per client is almost never appropriate: it scales poorly to large numbers of clients, context switching becomes unbearable, and Windows NT runs out of resources. A thread-pool model, in which a pool of worker threads drains a request queue, works better; Windows 2000 provides APIs for this, such as QueueUserWorkItem.

5. Use a single global lock for your data structures.

The simplest way to make data thread-safe is to put it behind one big lock: for simplicity, everything uses the same lock. There is a problem with this approach: serialization. Every thread that wants to touch the data must wait in line to acquire the lock, and a thread blocked on a lock is doing nothing useful. This is rarely a problem when the server is lightly loaded, because only one thread at a time is likely to want the lock. Under heavy load, fierce contention for the lock can become a major problem.

Imagine an accident on a multilane highway that diverts all traffic onto a narrow road. With light traffic, the effect on travel time is negligible. With heavy traffic, the backup stretches for miles as cars slowly merge into the single lane.

Several techniques can reduce lock contention:

· Do not over-protect the data; not everything needs a lock. Hold a lock only when necessary, and not for long. Avoid holding locks around large blocks of code or frequently executed code.
· Partition the data so that each partition is protected by its own lock. For example, a symbol table can be partitioned by the first letter of the identifier, so that modifying the value of a symbol whose name starts with Q does not block reading the value of a symbol whose name starts with H.
· Use the Interlocked family of APIs (InterlockedIncrement, InterlockedCompareExchangePointer, and so on) to modify data atomically without taking a lock at all.
· When data is modified infrequently, use a multi-reader/single-writer lock. You get better concurrency, although the lock operations cost more and you risk starving the writers.
· Use spin counts on critical sections. See the SetCriticalSectionSpinCount API in Windows NT 4.0 Service Pack 3.
· If you cannot acquire a lock, use TryEnterCriticalSection and do some other useful work in the meantime.
High contention leads to serialization, serialization leads to low CPU utilization, low CPU utilization tempts users to add even more threads, and things get even worse.

6. Pay no attention to multiprocessor machines.

It can be dismaying to find that your code runs worse on a multiprocessor system than on a uniprocessor. The natural expectation is that an N-way system will run N times faster. The reason it does not is contention: lock contention, bus contention, and/or cache-line contention. The processors fight over ownership of shared resources instead of doing more work.

If you must write multithreaded applications, you should stress-test and performance-test them on multiprocessor boxes. A uniprocessor system provides only an illusion of concurrency by time-slicing threads; a multiprocessor box has true concurrency, so races and contention show up far more readily.


7. Always use blocking calls; they're fun.

Synchronous blocking calls for I/O are fine for most desktop applications, but they make poor use of the CPU(s) on a server. An I/O operation takes millions of clock cycles to complete, and those cycles could be put to better use. With asynchronous I/O you can get dramatically higher client request rates and I/O throughput, at the cost of extra complexity.

If you have blocking calls or I/O operations that take a long time, think about how many resources you are willing to dedicate to them. All of your threads? That is usually a bad idea. Build a small pool of threads with a queue, and use the queue to schedule the work for the blocking calls. That way, the other threads remain free to pick up and process non-blocking requests.
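
As a minimal illustration of keeping a slow blocking call off the main path (my sketch, not the article's code; `slowLookup` and `handleRequest` are hypothetical names, and real server code would use a bounded pool or overlapped I/O rather than spawning per call):

```cpp
#include <future>
#include <string>

// Hypothetical slow, blocking operation (imagine a database or file read).
std::string slowLookup() {
    return "result";
}

// Offload the blocking call so the calling thread can do other useful
// work while it is in flight, then collect the result when needed.
std::string handleRequest() {
    auto pending = std::async(std::launch::async, slowLookup);
    // ... handle other, non-blocking work here while the call proceeds ...
    return pending.get();
}
```

The point is the shape, not std::async itself: the blocking operation runs on a separate, bounded resource, and the result is joined only at the last moment.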

8. Do not measure.

When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.

-Lord Kelvin (William Thomson)

Without measurement, you cannot understand your application's behavior. You are groping in the dark, half guessing. If you cannot identify the performance problems, you cannot make improvements or plan for your workload.

Measurement includes black-box measurement and profiling. Black-box measurement means collecting the data exposed by performance counters (memory usage, context switches, CPU utilization, and so on) and by external testing tools (throughput, response time, and so on). To profile your code, you compile an instrumented version, run it under a variety of conditions, and collect statistics on execution times and call frequencies.

Measurements are useless without analysis. Measurements tell you where a problem occurs, and can even help you localize it, but they cannot tell you why it occurs. Analyze the problem so that you can fix it correctly; treat the root cause, not the symptoms.

After you make a change, measure again. You need to know whether the change helped. A change may also expose other performance problems, and the measure-analyze-fix-measure cycle starts over. You should also measure regularly to catch performance regressions.

9. Test with a single user and a single request.

A common mistake when writing ASP and ISAPI applications is to test with only one browser. When developers deploy their applications on the Internet, they discover that the applications cannot handle high loads and that throughput and response times are miserable.

Testing with one browser is necessary but not sufficient. If the browser responds sluggishly, you know you are in trouble. But even if it responds quickly, you have no idea how much load the application can handle. What happens when a dozen users make requests at the same time? A hundred? What throughput can your application sustain? What response time does it provide? What do these numbers look like under light load? Medium load? Heavy load? What happens on a multiprocessor machine? Stress-testing your application is fundamental to flushing out bottlenecks and performance bugs.
Similar load-testing considerations apply to all server applications.
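
A real stress test would use a tool that drives the server over the network, but the shape of the idea can be sketched in a few lines (my hypothetical harness, not a real load-testing tool): run a handler from many concurrent "clients" at once instead of one, and count completed requests.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical in-process load driver: N concurrent "clients" each issue
// a number of requests against a handler, and completions are counted.
long runLoad(std::size_t clients, std::size_t requestsPerClient,
             const std::function<void()>& handler) {
    std::atomic<long> completed{0};
    std::vector<std::thread> threads;
    for (std::size_t c = 0; c < clients; ++c)
        threads.emplace_back([&] {
            for (std::size_t r = 0; r < requestsPerClient; ++r) {
                handler();  // one simulated client request
                completed.fetch_add(1);
            }
        });
    for (auto& t : threads) t.join();
    return completed.load();
}
```

Timing the whole run (as in the measurement section) turns the completion count into a throughput figure, and varying `clients` answers the "dozen users? a hundred?" questions above.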

10. Do not use real-world scenarios.

People tend to tune their applications against a few specific, artificial scenarios (such as benchmarks). Instead, choose a variety of scenarios that correspond to real-world usage, and optimize for a broad range of operations.
