Scalability (extensibility) is a design metric for the processing capacity of a software system. High scalability is a kind of elasticity: as the system grows, the software stays viable, and with very few changes, or even just by adding hardware, the processing capacity of the whole system grows linearly, delivering high throughput and low latency.
Scalability differs fundamentally from pure performance tuning. Scalability is a balanced combination of high performance, low cost, and maintainability that yields smooth, linear performance gains. It emphasizes horizontal scaling, that is, distributed computing across inexpensive servers, whereas general performance optimization only improves the performance metrics of a single machine. What the two have in common is a deliberate trade-off between throughput and latency based on the characteristics of the application. Once a system is partitioned for horizontal scaling, it also becomes subject to the constraints of the CAP theorem.
Designing software for scalability is very important but hard to master. The industry is trying cloud computing, languages built for high concurrency, and other approaches to save developers' effort. But whatever the technology, if the application is monolithic, for example relying heavily on the database, then once traffic reaches a certain scale the load concentrates on one or two database servers, where scaling is difficult. As Gavin King, creator of the Hibernate framework, says: relational databases are the least scalable.
Performance and Scalability
What is a performance problem? If your system is slow for a single user, that is a performance problem. What is a scalability problem? If your system is fast for a single user but slows down as the number of users and the traffic grow, that is a scalability problem.
Latency and Throughput
Latency and throughput are a pair of metrics for measuring scalability, and we want a system architecture with low latency and high throughput. Latency is the response time the user actually perceives, for example how many seconds a Web page takes to open: the shorter the time, the lower the latency. Throughput indicates how many users can enjoy that low latency at the same time. If pages open slowly when the number of concurrent users is very large, the throughput of the system architecture needs to be improved.
The goal of scalability is to achieve maximum throughput with acceptable latency. The goal of reliability (availability) is to achieve consistent data updates with acceptable latency.
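The relationship between the two metrics can be illustrated with a deliberately simplified model. The sketch below is an assumption made for illustration, not a measurement of any real system: it supposes a server with a fixed pool of workers, a fixed service time per request, and users who always have exactly one request outstanding. The names closed_system, workers, and service_time are hypothetical.

```python
def closed_system(users, workers, service_time):
    """Return (latency_seconds, throughput_per_second) for a toy model.

    Assumptions (illustrative only): each worker handles one request at a
    time in `service_time` seconds, and every user always has exactly one
    request outstanding (no think time between requests).
    """
    if users < 1 or workers < 1 or service_time <= 0:
        raise ValueError("users and workers must be >= 1, service_time > 0")
    # At most `workers` requests are being served at any moment.
    busy_workers = min(users, workers)
    throughput = busy_workers / service_time
    # Little's law for this closed system: users = throughput * latency.
    latency = users / throughput
    return latency, throughput


if __name__ == "__main__":
    for users in (1, 10, 50, 100, 500):
        latency, throughput = closed_system(users, workers=50, service_time=0.2)
        print(f"{users:4d} users: latency {latency:.2f} s, "
              f"throughput {throughput:.0f} req/s")
```

In this model latency stays at the service time while there are idle workers. Once every worker is busy, throughput stops rising and each additional user only adds waiting time, which is the point at which users begin to feel that pages open slowly.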
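The scalability goal can be sketched as a small capacity calculation: given a latency limit, find the largest number of concurrent users the system can serve, and see how that number changes as servers are added. This is a minimal sketch under idealized assumptions (identical servers, perfect load balancing, no shared database bottleneck, a fixed service time); the function names and the default values of 50 workers per server and 200 ms per request are hypothetical.

```python
def latency_ms(users, servers, workers_per_server=50, service_ms=200):
    """Latency in milliseconds under an idealized, perfectly balanced model."""
    workers = servers * workers_per_server
    # Requests beyond the worker count wait, stretching latency proportionally.
    busy_workers = min(users, workers)
    return users * service_ms / busy_workers


def max_users(latency_limit_ms, servers):
    """Largest number of concurrent users whose latency stays within the limit."""
    users = 0
    while latency_ms(users + 1, servers) <= latency_limit_ms:
        users += 1
    return users


if __name__ == "__main__":
    for servers in (1, 2, 4, 8):
        capacity = max_users(latency_limit_ms=1000, servers=servers)
        print(f"{servers} server(s): up to {capacity} users within 1000 ms")
```

Under these assumptions, doubling the number of servers doubles the number of users served within the same latency limit, which is the linear growth that horizontal scaling aims for. A real system reaches this only to the extent that no shared component, such as a single database server, becomes the bottleneck.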