1. Features of large-scale website software systems
Large-scale Internet application systems have the following characteristics:
- High concurrency, high traffic: the system must serve a large number of concurrent users and withstand heavy traffic.
- High availability: the system must provide 24x7 service.
- Massive data: the system must store and manage massive amounts of data.
- Widely distributed users, complex network conditions: many large sites serve users around the globe, and network conditions vary greatly from region to region.
- Hostile network environment: because the Internet is open, websites are especially vulnerable to attack.
- Rapidly changing requirements, frequent releases: unlike traditional software release cycles, Internet products release very frequently in order to adapt quickly to the market and meet user demand.
- Progressive development: almost all large Internet sites started as small websites and grew gradually.
2. The evolution of large-scale website architecture
The technical challenges of large-scale websites come mainly from the huge user base, high concurrent access, and massive data, and large-scale website architecture exists primarily to solve these problems.
- Initial phase of the site architecture
Small sites have little traffic and require only one server. All resources, such as the application, database, and files, reside on a single server.
- Separation of application services and data services
As user traffic grows, performance degrades, and growing data exhausts storage. At this point, the application and the data need to be separated, and the website uses three servers: an application server, a file server, and a database server. The application server handles a lot of business logic and therefore needs a faster, more powerful CPU; the database server needs fast disk retrieval and data caching, so it requires faster disks and more storage; the file server stores the many files uploaded by users and therefore needs larger disks. After separating application and data, the site's concurrency handling and data storage capacity are greatly improved.
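The three-server split above can be captured as a minimal service-to-host map. This is only an illustrative sketch: the host names and the `host_for` helper are hypothetical, not part of any real deployment.

```python
# Hypothetical host names; after separation, each service runs on
# hardware tuned for its workload, as described in the text.
SERVICES = {
    "application": {"host": "app1.example.com", "needs": "fast, powerful CPU"},
    "database": {"host": "db1.example.com", "needs": "fast disks, more storage"},
    "files": {"host": "file1.example.com", "needs": "large disks"},
}

def host_for(service):
    """Look up which server a given service runs on."""
    return SERVICES[service]["host"]
```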
- Using caching to improve website performance
Excessive database load causes access delays, which in turn degrade the performance of the entire site. Website access follows the 80/20 rule: 80% of business accesses are concentrated on 20% of the data. The caches used by a website fall into two kinds: a local cache on the application server, and a remote cache on dedicated distributed cache servers. The local cache is faster to access but is constrained by the application server's memory, so the amount of cached data is limited and the cache competes with the application for memory. A remote distributed cache can be deployed as a cluster of large-memory servers dedicated to caching, so in principle the cache service is not limited by memory capacity.
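The two cache tiers can be sketched as a cache-aside read path: check the local cache first, then the remote distributed cache, and only then hit the database. This is a minimal sketch under assumed names; `LocalCache`, `read_through`, and the remote cache's `get`/`put` interface are illustrative, not any specific product's API.

```python
import time

class LocalCache:
    """In-process cache: fastest, but bounded by app-server memory."""
    def __init__(self, max_entries=1000, ttl=60):
        self.max_entries = max_entries
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:
            del self._store[key]  # expired: drop it
            return None
        return value

    def put(self, key, value):
        if len(self._store) >= self.max_entries:
            # Limited memory: evict the oldest entry (dicts keep insert order)
            self._store.pop(next(iter(self._store)))
        self._store[key] = (value, time.time() + self.ttl)

def read_through(key, local_cache, remote_cache, load_from_db):
    """Cache-aside read: local cache -> remote distributed cache -> database."""
    value = local_cache.get(key)
    if value is not None:
        return value
    value = remote_cache.get(key)  # e.g. a memcached/Redis-style client
    if value is None:
        value = load_from_db(key)  # the slow path: hit the database
        remote_cache.put(key, value)
    local_cache.put(key, value)
    return value
```

A usage note: with the 80/20 access pattern, caching the hot 20% of the data lets most reads return from `local_cache` or `remote_cache` without ever reaching `load_from_db`.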
- Improving website concurrency with application server clusters
Once caching is in place, data access pressure is effectively relieved, but a single application server can handle only a limited number of request connections; during peak traffic, the application server becomes the bottleneck of the whole site. Clustering is the common means of solving high-concurrency and massive-data problems. As far as site architecture is concerned, as long as load pressure can be relieved by adding a server, servers can keep being added in the same way to keep improving system performance, thereby making the system scalable.
A load-balancing scheduler distributes access requests from users' browsers across the servers in the application server cluster; when there are more users, more application servers are added to the cluster, so the application servers no longer become the bottleneck of the whole website.
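The scheduler's dispatch logic can be sketched as a simple round-robin balancer; the class and method names here are hypothetical, and real schedulers (hardware or software) use many other policies as well.

```python
import itertools

class RoundRobinBalancer:
    """Minimal round-robin dispatch over an application server cluster."""
    def __init__(self, servers):
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def add_server(self, server):
        # Scaling out: adding a server raises the cluster's total capacity,
        # so the application tier stops being the bottleneck.
        self._servers.append(server)
        self._cycle = itertools.cycle(self._servers)

    def dispatch(self, request):
        """Send the request to the next server in rotation."""
        server = next(self._cycle)
        return server, request
```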
- Database read/write separation
With caching, the vast majority of data accesses complete without touching the database, but a small portion of reads and all writes must still reach the database; once the site grows to a certain scale, the database's heavy load makes it the system bottleneck. Most mainstream databases provide master-slave hot standby: by configuring two databases in a master-slave relationship, data updates on one database server can be synchronized to the other.
When the application server writes data, it accesses the master database, which synchronizes the updates to the slave database through the master-slave replication mechanism; when the application server reads data, it can then fetch it from the slave database. To make it easy for applications to use the read/write-separated databases, the application server usually employs a dedicated data-access module, making the database read/write separation transparent to the application.
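The data-access module's routing logic can be sketched as below, assuming connection objects that expose an `execute(sql, params)` method; the class and connection names are hypothetical stand-ins, not a real driver's API.

```python
class ReadWriteRouter:
    """Data-access module that hides master/slave read-write separation.

    Writes go to the master; reads are spread round-robin across slaves,
    so the split is transparent to application code.
    """
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next = 0

    def execute(self, sql, params=()):
        # Route by statement type: anything that mutates goes to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            conn = self._pick_slave()
        else:
            conn = self.master
        return conn.execute(sql, params)

    def _pick_slave(self):
        if not self.slaves:
            return self.master  # no replica configured: fall back to master
        conn = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return conn
```

One caveat worth noting in any real design: replication is asynchronous in many setups, so a read routed to a slave immediately after a write may see stale data.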
Evolution of large-scale website architecture