Original: http://www.javaranger.com/archives/1264
Throughput and latency are two concepts you must understand before doing any performance optimization. They seem simple, but they are not: many people assume that high throughput implies low latency, and that high latency implies low throughput. The following analogy shows why that assumption is wrong.
We can compare sending network packets to withdrawing money from an ATM on the street. Suppose each customer takes one minute from starting to use the ATM to finishing the withdrawal; the latency here is 60 seconds. And the throughput? 1/60 of a customer per second, of course. Now the bank upgrades the ATM's operating system so that everyone can complete a withdrawal in just 30 seconds. The latency is now 30 seconds and the throughput is 1/30 of a customer per second. Easy to understand so far, but the original question still stands, right? Don't worry, read on.
Because more and more people withdraw money here, the bank decides to install a second ATM, for a total of two. Now four people can complete a withdrawal each minute, even though whoever steps up to an ATM still spends 30 seconds. In other words, the latency has not changed, but the throughput has increased. So throughput can be raised without lowering latency.
Now the bank makes another decision to improve its service: every customer who withdraws money must fill out a questionnaire beside the ATM, which takes 30 seconds. So from the moment you start using the ATM to the moment you finish the questionnaire, 60 seconds have passed; the latency is now 60 seconds. Yet the throughput has not changed at all: four people can still finish each minute. So latency can increase while throughput stays the same.
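The arithmetic behind the analogy can be sketched as two independent formulas. In this hypothetical model (the function names are mine, not from the article), latency is what one customer experiences, while throughput depends only on how many ATMs there are and how long each one is occupied:

```python
def latency_s(service_time_s, extra_time_s=0):
    # One customer's total time: time at the ATM plus any extra step
    # (e.g. the questionnaire) that happens outside the ATM.
    return service_time_s + extra_time_s

def throughput_per_s(service_time_s, n_machines=1):
    # The system completes n_machines customers every service_time_s
    # seconds; steps that don't occupy an ATM don't appear here.
    return n_machines / service_time_s

# One ATM, 60 s per withdrawal:
assert latency_s(60) == 60 and throughput_per_s(60) == 1 / 60
# Faster ATM, 30 s per withdrawal:
assert latency_s(30) == 30 and throughput_per_s(30) == 1 / 30
# Two fast ATMs: latency unchanged, throughput doubles.
assert latency_s(30) == 30 and throughput_per_s(30, 2) == 2 / 30
# Add a 30 s questionnaire beside the ATM: latency up, throughput same.
assert latency_s(30, 30) == 60 and throughput_per_s(30, 2) == 2 / 30
```

Note that the questionnaire appears only in the latency formula, and the number of machines only in the throughput formula; that is exactly why the two metrics can move independently.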
As this metaphor shows, latency measures the time each customer (each application) experiences, while throughput measures the efficiency of the whole bank (the whole system). They are two completely different concepts.
Just as a bank must not only improve the efficiency of the branch as a whole but also minimize the time each customer spends on their business, a system should not only maximize throughput but also make the latency of each request as small as possible. These are two different goals.