IIS Server Performance
Each of the following commandments will hurt the performance and scalability of your code. In other words, break them whenever you can! Below, I explain why each one causes damage and how to avoid it.
1. You should allocate and free lots of objects
You should avoid excessive memory allocation, because allocating memory can be costly. Freeing blocks of memory can cost even more, because most allocators try to coalesce adjacent freed blocks into larger chunks. Until Windows NT 4.0 Service Pack 4, the system heap performed poorly under multithreading: it is protected by a single global lock and does not scale on multiprocessor systems.
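One common way to sidestep repeated allocation is to pool and reuse objects. Here is a minimal sketch, illustrated in Python for brevity (a native Win32 server would pool raw buffers or use per-thread heaps instead); the class name and parameters are illustrative, not from the article.

```python
# A minimal buffer-pool sketch: reusing buffers avoids the cost of repeated
# allocation and of the allocator coalescing freed blocks.
class BufferPool:
    def __init__(self, buffer_size, max_pooled=32):
        self._buffer_size = buffer_size
        self._max_pooled = max_pooled
        self._free = []  # free list of reusable buffers

    def acquire(self):
        # Reuse a pooled buffer if one is available; allocate otherwise.
        if self._free:
            return self._free.pop()
        return bytearray(self._buffer_size)

    def release(self, buf):
        # Return the buffer for reuse instead of freeing it.
        if len(self._free) < self._max_pooled:
            self._free.append(buf)
```

A server would acquire a buffer per request and release it afterward, so steady-state traffic causes no allocations at all.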
2. You should not give a thought to processor caches
Most people know that hard page faults caused by the virtual memory subsystem are costly and best avoided, but many believe that all other memory accesses cost about the same. That has not been true since the 80486. Modern CPUs are much faster than RAM, so they need at least two levels of memory cache: a fast L1 cache holding 8KB of data and 8KB of instructions, and a slower L2 cache holding several hundred KB of mixed data and code. A reference to memory in the L1 cache costs one clock cycle, a reference to the L2 cache costs 4 to 7 cycles, and a reference to main memory costs many processor cycles; the latter figure will soon exceed 100 cycles. In many ways, a cache is like a small, high-speed virtual memory system.
The basic unit of memory for a cache is not a byte but a cache line. A Pentium cache line is 32 bytes wide; an Alpha cache line is 64 bytes. This means there are only 512 cache lines in the L1 cache for code and data combined. If data that is used together (temporal locality) is not stored together (spatial locality), performance suffers. Arrays have excellent locality, while linked lists and other pointer-based data structures tend to have poor locality.
Packing data into the same cache line usually improves performance, but on multiprocessor systems it can destroy it. The memory subsystem works hard to keep the caches coherent across processors. If a piece of read-only data used by all processors shares a cache line with data that one processor updates frequently, the caches will spend a long time updating their copies of that line. This high-speed game of ping-pong is often called "cache sloshing". Placing the read-only data in a different cache line avoids the sloshing.
Optimizing code for space is more effective than optimizing it for speed. The less code you have, the fewer pages it occupies, the smaller your working set, the fewer page faults you take, and the fewer cache lines your code consumes. However, some core functions should still be optimized for speed; use a profiler to identify them.
3. You should never cache frequently used data.
Software caching can be used by all kinds of applications: when a computation is expensive, you save a copy of the result. This is the classic space-time tradeoff: sacrifice some storage to save time. Done well, it can be extremely effective.
You must cache the right things. Cache the wrong data and you waste storage. Cache too much and little memory is left for everything else. Cache too little and efficiency drops, because you keep recomputing data that missed the cache. Cache time-sensitive data for too long and it goes stale. Servers generally care more about speed than space, so they cache more aggressively than desktop systems do. Be sure to evict unused cache entries periodically, or you will have working set problems.
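The three rules above (bound the size, evict the unused, expire the stale) can be sketched in a few lines. This is a minimal illustration in Python, not anything from the article; the class name, capacity, and TTL are assumptions.

```python
import time
from collections import OrderedDict

# A small cache with a size bound (evicting the least-recently-used entry)
# and a time-to-live so stale data gets recomputed.
class TimedLRUCache:
    def __init__(self, max_entries=128, ttl_seconds=60.0, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # key -> (value, expiry time)

    def get(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self._entries.move_to_end(key)     # mark as recently used
            return entry[0]
        value = compute(key)                   # miss or stale: recompute
        self._entries[key] = (value, now + self._ttl)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

Tuning `max_entries` and `ttl_seconds` is exactly the cache-too-much / cache-too-long balancing act the paragraph describes.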
4. You should create lots of threads; the more the better.
Tuning the number of worker threads in a server matters. If a thread is I/O-bound, it spends much of its time blocked waiting for I/O to complete, and a blocked thread does no useful work. Adding threads can increase throughput, but adding too many degrades server performance, because context switching becomes significant overhead. The context switch rate should be kept low, for three reasons: context switches are pure overhead that contributes nothing to the application's work; they consume valuable clock cycles; and, worst of all, they fill the processor's caches with useless data, which is expensive to replace.
Much depends on your threading architecture. One thread per client is flatly inappropriate: it does not scale to large numbers of clients, context switching becomes unbearable, and Windows NT runs out of resources. A thread-pool model works better: a pool of worker threads services a request queue. Windows 2000 provides APIs for this, such as QueueUserWorkItem.
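The thread-pool model is easy to sketch. The following uses Python's standard library as a stand-in for what QueueUserWorkItem provides on Windows 2000; the handler and pool size are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request):
    # Placeholder request handler; a real server would parse and respond.
    return f"handled:{request}"

def serve(requests, workers=4):
    # A small, fixed pool of worker threads drains a shared request queue.
    # This avoids the thread-per-client model, whose context switching and
    # per-thread resource costs overwhelm the server at scale.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))
```

The key property: the number of threads stays fixed no matter how many requests arrive, so context-switch overhead stays bounded.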
5. You should use a single global lock on your data structures
The easiest way to make data thread-safe is to slap one big lock around it: for simplicity's sake, lock everything with the same lock. There is a problem with this approach: serialization. Every thread that touches the data must queue up to acquire the lock, and a thread blocked on a lock is doing nothing useful. This is rarely a problem when the server is lightly loaded, because only one thread is likely to want the lock at a time. Under heavy load, fierce contention for the lock can become a serious problem.
Imagine an accident on a multilane freeway that diverts all traffic into one narrow lane. With little traffic, the effect on flow is negligible. With heavy traffic, the jam stretches for miles as vehicles slowly merge into the single lane.
Several techniques can reduce lock contention:
· Be protective, but not overprotective. Lock data only when you need to, and hold locks only as long as necessary. Don't needlessly wrap locks around large blocks of code or around frequently executed code.
· Partition the data so that it can be protected by separate locks. For example, a symbol table can be partitioned by the first letter of its identifiers, so that modifying the value of a symbol whose name begins with Q does not block reading the value of a symbol whose name begins with H.
· Use the Interlocked family of APIs (InterlockedIncrement, InterlockedCompareExchangePointer, etc.) to modify data atomically without taking a lock at all.
· Use multi-reader/single-writer locks when the data is read often but modified rarely. You get better concurrency, although the lock operations cost more and you risk starving a writer.
· Use spin counts in critical sections. See the SetCriticalSectionSpinCount API introduced in Windows NT 4.0 Service Pack 3.
· Use TryEnterCriticalSection and do some other useful work if you cannot acquire the lock.
High contention leads to serialization, serialization leads to low CPU utilization, low utilization prompts users to add more threads, and things get even worse.
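The partitioned symbol table from the list above can be sketched as lock striping. This is a Python illustration of the idea (a Win32 server would use critical sections); partitioning by first letter is the article's own example, while the class and method names are mine.

```python
import threading

# Lock striping: the table is partitioned by the first letter of the symbol
# name, and each partition has its own lock, so updating a "Q" symbol never
# blocks a reader of an "H" symbol.
class StripedSymbolTable:
    def __init__(self):
        self._tables = {}   # first letter -> dict of symbols in that partition
        self._locks = {}    # first letter -> lock guarding that partition
        for i in range(26):
            letter = chr(ord('a') + i)
            self._tables[letter] = {}
            self._locks[letter] = threading.Lock()

    def _partition(self, name):
        return name[0].lower()

    def set(self, name, value):
        p = self._partition(name)
        with self._locks[p]:    # only this one partition is serialized
            self._tables[p][name] = value

    def get(self, name):
        p = self._partition(name)
        with self._locks[p]:
            return self._tables[p].get(name)
```

Under heavy load, threads contend only when they happen to touch the same partition, instead of all queuing on one global lock.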
6. You should pay no attention to multiprocessor machines
It can be sickening to discover that your code runs worse on a multiprocessor system than on a uniprocessor. The natural expectation is that it should run N times better on an N-processor system. Poor scaling is caused by contention: lock contention, bus contention, and/or cache line contention. The processors fight over ownership of shared resources instead of getting more work done.
If you must write multithreaded applications, you should stress-test and performance-test them on multiprocessor boxes. A uniprocessor system provides only an illusion of concurrency by time-slicing its threads; a multiprocessor box has true concurrency, so race conditions and contention are far more likely to show up.
7. You should always use blocking calls; they're fun.
Synchronous blocking calls for I/O operations suit most desktop applications well, but they are a poor way to use the CPU(s) on a server. An I/O operation takes millions of clock cycles to complete, and those cycles could have been put to better use. With asynchronous I/O you can achieve markedly higher user request rates and I/O throughput, at the cost of extra complexity.
If you have blocking calls or I/O operations that take a long time, consider how many resources to devote to them. Should you use all your threads, or cap them? Generally, a limited number of threads is better: build a small thread pool and a queue, and use the queue to hand the blocking work to the pool's threads. Other threads then remain free to pick up and service non-blocking requests.
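Confining long blocking calls to a small dedicated pool might look like the following sketch, in Python for illustration; the pool size, function names, and sleep stand-in are assumptions, not the article's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_blocking_io(item):
    time.sleep(0.01)          # stands in for a long blocking I/O call
    return item * 10

def process(items, io_workers=2):
    done_elsewhere = []
    # Only io_workers threads ever block on the slow calls; everything else
    # stays free to service other requests.
    with ThreadPoolExecutor(max_workers=io_workers) as io_pool:
        futures = [io_pool.submit(slow_blocking_io, i) for i in items]
        # The calling thread is free to do other useful work here...
        done_elsewhere.append("other work")
        # ...and collects the blocking results only when it needs them.
        results = [f.result() for f in futures]
    return results, done_elsewhere
```

The cap (`io_workers`) is the "limited number of threads" decision the paragraph recommends making deliberately.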
8. You should not measure
When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science.
-Lord Kelvin (William Thomson)
You cannot understand an application's behavior without measuring it; otherwise you are groping in the dark, half guessing. Without identifying the performance problems, you can neither make improvements nor do capacity planning.
Measurement includes black-box measurement and profiling. Black-box measurement means collecting the data shown by performance counters (memory usage, context switches, CPU utilization, and so on) and by external instrumentation (throughput, response time, and so on). To profile your code, you build an instrumented version of it, run it under various conditions, and collect statistics on execution time and procedure call frequency.
Measurement is not useful unless you analyze it. Measurement tells you that there is a problem, and can even help you find where it is, but it cannot tell you why. Analyze the problems so that you can fix them properly, attacking the root cause rather than the surface symptoms.
When you make changes, measure again. You need to know whether your changes actually helped. The changes may also expose other performance problems, and the measure-analyze-fix-measure cycle starts over. You should also measure regularly to catch performance regressions.
9. You should use single-user, single-request testing.
A common pitfall in writing ASP and ISAPI applications is to test the application with just one browser. When the applications are deployed on the Internet, their authors discover that they cannot handle high load, and that throughput and response time are pathetic.
Testing with one browser is necessary but not sufficient. If the browser feels sluggish, you know you are in trouble; but even if it feels fast, you do not know how the application handles load. What happens when more than 10 users make requests at the same time? Or 100? What throughput can your application sustain? What response time does it deliver? What do these numbers look like under light load? Medium load? Overload? And what happens on a multiprocessor machine? Stress-testing your application is fundamental to finding bugs and uncovering performance problems.
Similar load test considerations apply to all server applications.
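A toy load test along the lines the section describes might look like the sketch below: fire many concurrent simulated clients at a handler and record throughput and worst-case latency. A real test would drive the server over the network with a purpose-built tool; the handler, client count, and sleep stand-in here are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handler(request_id):
    time.sleep(0.001)         # stands in for real request processing
    return request_id

def load_test(num_clients=50):
    latencies = []            # list.append is thread-safe in CPython

    def one_client(i):
        start = time.perf_counter()
        result = handler(i)
        latencies.append(time.perf_counter() - start)
        return result

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        results = list(pool.map(one_client, range(num_clients)))
    elapsed = time.perf_counter() - start
    throughput = num_clients / elapsed    # requests per second
    return results, throughput, max(latencies)
```

Rerunning with 10, 100, and more clients answers the light/medium/overload questions posed above, and running it on a multiprocessor box exposes the contention problems of commandment 6.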
10. You should not use real-world scenarios.
People tend to tune their applications against a few specific, artificial scenarios (such as benchmarks). It is important to pick a variety of scenarios that correspond to real-world use and to optimize across that range of operations. If you don't, your users and reviewers certainly will, and they will judge your application accordingly.