Analysis of efficient and stable large-scale website system architecture

Source: Internet
Author: User

As informatization accelerates at large IT enterprises in China, the data volume and traffic of most applications have grown sharply. Large enterprise websites face pressure on performance and from high data access volume, which in turn places higher demands on storage, security, and information retrieval.

 
Websites visited by tens of millions of people at the same time generally have many databases working in parallel; put plainly, this means database clusters and concurrency control. Such websites are also close to real-time. They share some common features: large data volume, many online users, many concurrent requests, high page views, and fast response times. The following summarizes the architecture of the major websites, focusing on how they improve efficiency and stability:

1. Program

Program development is one side of the problem; system architecture design (hardware + network + software) is the other.

In terms of software architecture, a website needs many web servers to store static resources such as movies, videos, and static pages. Do not put static resources on the same machines as the application servers.

A program written by a good programmer is concise and performs well. A junior programmer may make many low-level mistakes, which is one of the factors that hurt website performance.

 
Making a website efficient is not only the programmers' job. Both database optimization and program optimization are required, and the two must go hand in hand. Caching should be introduced at the same time. First, database caching and database optimization are done by DBAs (there is great potential to be tapped here, but we tend to ignore it because we are all programmers). Second, program optimization calls for care. For example, it is important to standardize SQL statements: use IN less and OR more, use PreparedStatement more, and avoid redundant work in the program, such as nested loops for looking up data. In addition, support from a good open-source framework is the most important point. You can choose Spring + iBATIS, because iBATIS operates on SQL directly and has a caching mechanism. The benefits of Spring need no explanation; its IoC mechanism can avoid creating objects with new and saves that cost. According to my analysis, most of the overhead is incurred when objects are created with new and when connecting to the database, so avoid both as much as possible. You can also use memory testing tools in a demo to see which of Hibernate and iBATIS is faster. For the front end, use whatever you like, such as Struts or WebWork. If you feel confident, try Tapestry.
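
The two program-level points above (prepared statements, and avoiding nested loops when looking up data) can be sketched as follows. The article refers to Java's PreparedStatement; this sketch uses the analogous parameterized queries in Python's sqlite3 module. The table, column, and function names are made up for illustration.

```python
import sqlite3

def find_news(conn, category, limit):
    # The "?" placeholders keep the SQL text constant, so the statement can be
    # prepared once and reused, and user input never becomes part of the SQL.
    sql = "SELECT id, title FROM news WHERE category = ? ORDER BY id LIMIT ?"
    return conn.execute(sql, (category, limit)).fetchall()

def attach_authors(articles, authors):
    # Build a lookup table once instead of scanning `authors` for every
    # article (a nested loop): O(n + m) instead of O(n * m).
    name_by_id = {a["id"]: a["name"] for a in authors}
    return [(art["title"], name_by_id.get(art["author_id"])) for art in articles]

# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, category TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO news (category, title) VALUES (?, ?)",
    [("sports", "Match report"), ("tech", "New chip"), ("sports", "Transfer news")],
)
rows = find_news(conn, "sports", 10)
```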

A database alone may not be able to handle huge traffic. Likewise, the disk seek time for serving a static file is not necessarily shorter than a database lookup, and indexing the data takes additional time. I personally think that on a portal most clicks go to the day's popular material, so caching at most 1 ~ 2 GB of data is enough. For example:

Take a Netease news URL as an example: http://news.163.com/07/0606/09/3GA0D10N00011229.html

The pattern is easy to read: http://domain name/year/month/day/news category/news ID.html

Build a hashtable (key: year-month-day-category-ID, value: news object) and keep it statically in memory; this is faster than seeking static pages on the hard disk.
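
A minimal sketch of such an in-memory table, assuming the URL layout described above. The regular expression, the `NewsCache` class, and the `load_from_db` callback are illustrative names, not part of any real Netease system.

```python
import re
from urllib.parse import urlparse

# Assumed layout: /year/monthday/category/id.html, as the article reads it.
_PATH = re.compile(r"^/(\d{2})/(\d{2})(\d{2})/(\w+)/(\w+)\.html$")

def cache_key(url):
    """Return 'year-month-day-category-ID' for a news URL, or None."""
    match = _PATH.match(urlparse(url).path)
    if match is None:
        return None
    return "-".join(match.groups())

class NewsCache:
    def __init__(self, load_from_db):
        self._items = {}            # key -> news object, held in memory
        self._load = load_from_db   # hypothetical fallback loader

    def get(self, url):
        key = cache_key(url)
        if key is None:
            return None
        if key not in self._items:
            # Cache miss: load once, then serve from memory afterwards.
            self._items[key] = self._load(key)
        return self._items[key]
```

Only a miss reaches the loader; repeated requests for the same popular article are answered from the dictionary.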

Generally, Oracle stored procedures and two WebLogic instances are used. The update mechanism is similar: every time a piece of news is published, a static page is generated and pushed to the front-end web servers, which are load balanced. In addition, a scheduled job regenerates the pages automatically every 5-15 minutes. Data is cached when news is published. Of course, the cache does not grow without bound; expired data is removed at a fixed time (such as early morning). Building a large website is far less simple than imagined, and it involves hundreds of servers.
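
The "remove expired data at a fixed time" step can be sketched like this. The `ExpiringCache` class and its one-day limit in the usage note are assumptions for illustration; a real deployment would call `purge_expired` from its own scheduler.

```python
import time

class ExpiringCache:
    def __init__(self, max_age_seconds):
        self._max_age = max_age_seconds
        self._items = {}  # key -> (stored_at, value)

    def put(self, key, value, now=None):
        # `now` can be passed explicitly to make the behaviour testable.
        stored_at = time.time() if now is None else now
        self._items[key] = (stored_at, value)

    def get(self, key):
        entry = self._items.get(key)
        return None if entry is None else entry[1]

    def purge_expired(self, now=None):
        """Drop entries older than max_age; return how many were removed."""
        now = time.time() if now is None else now
        stale = [k for k, (stored_at, _) in self._items.items()
                 if now - stored_at > self._max_age]
        for key in stale:
            del self._items[key]
        return len(stale)
```

For example, a cache created with `ExpiringCache(24 * 3600)` and purged once each early morning keeps roughly one day of news in memory.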

In this way, the processing capacity of a single machine can be greatly increased. If one machine cannot handle the load, use an HTTP server cluster.

2. Network
China's networks are split between China Telecom in the south and China Netcom in the north. Requests must be directed to different networks depending on whether the visitor's IP address belongs to the northern or the southern network.
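
A rough sketch of this kind of routing, using Python's standard ipaddress module. The address blocks below are reserved documentation ranges and the host names are placeholders; real carrier ranges would have to come from an up-to-date IP database.

```python
import ipaddress

# Hypothetical mapping from address block to server pool (placeholder values).
NETWORK_POOLS = [
    (ipaddress.ip_network("203.0.113.0/24"), "south.example.com"),
    (ipaddress.ip_network("198.51.100.0/24"), "north.example.com"),
]
DEFAULT_POOL = "south.example.com"

def pick_pool(client_ip):
    """Return the server pool on the same network as the visitor."""
    address = ipaddress.ip_address(client_ip)
    for network, pool in NETWORK_POOLS:
        if address in network:
            return pool
    return DEFAULT_POOL
```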

3. Clusters
CDN, GSLB, and DNS load balancing are usually used. Each region has a group of front-end servers; Netease and Baidu, for example, use DNS load balancing. In one arrangement each channel has its own group of front-end servers; in another, a search site uses DNS load balancing and all channels share one front-end server cluster.

Websites use load balancing and failure recovery based on Linux clusters, covering both application servers and database servers, with service status detection and high availability based on Linux-HA.

Application server clusters can use Apache + Tomcat clusters or WebLogic clusters. Web server clusters can use a reverse proxy, NAT, or multi-domain name resolution. Squid can also be used. There are many methods; choose as needed.
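
As an illustration of load balancing with failure handling, here is a minimal round-robin sketch that skips back ends marked as down. It is not how Linux-HA, Squid, or any product named above works internally; the class and method names are invented, and the health status is assumed to be fed in by a separate monitor.

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, backends):
        self._backends = list(backends)
        self._down = set()
        self._cycle = itertools.cycle(self._backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) health checker when a server fails.
        self._down.add(backend)

    def mark_up(self, backend):
        self._down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call, skipping unhealthy ones.
        for _ in range(len(self._backends)):
            backend = next(self._cycle)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy backend available")
```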

4. Databases

 
Because tens of millions of people access the website at the same time, many databases usually work at the same time. In other words, database clusters and concurrency control are used, and data is distributed to data centers in different geographical locations to guard against power failure. Another point is that the static web pages of those websites are not truly static; they are dynamic pages made to look static, which can be implemented with an open-source URL rewriter such as urlrewrite. This kind of website is also close to real-time, because copying data between databases takes time. Technically, Hibernate and Ehcache can be used, but to make such a website work better, you can support it with EJB and a large application server such as WebSphere or WebLogic, and use a large database such as Oracle.

 
MySQL is not recommended for large portal websites unless you are very familiar with MySQL optimization. With MySQL in master-slave mode, the database servers synchronize data between the master and the slaves. The application writes only to the master. For reads, a slave or the master is chosen based on load. Data is also divided across different servers according to different policies, to distribute the database pressure.
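
The read/write split described above can be sketched as a small router. The server names and the `load` mapping are hypothetical; how the load figures are collected is left out.

```python
class ReadWriteRouter:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def for_write(self):
        # All writes go to the master so replication has a single source.
        return self.master

    def for_read(self, load):
        # `load` maps server name -> current load (for example, connections).
        # Pick the least loaded of the master and the slaves; servers with
        # no reported load are treated as idle.
        candidates = [self.master] + self.slaves
        return min(candidates, key=lambda server: load.get(server, 0))
```

Because replication is not instantaneous, a read sent to a slave may briefly return older data than the master holds.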

Large websites use Oracle, and data operations should be done through stored procedures as much as possible to improve performance. DBAs should also optimize the database; an optimized database performs very differently from an unoptimized one. Distributed databases are a further way to scale and deserve more study in the future.

5. Page

From the very beginning, consider using virtual storage or a cluster file system. It lets you serve a large amount of concurrent I/O without any restructuring.

Page data calls must be designed carefully. Some queries do not need to go to the database; where real-time results are not essential, Lucene can be used. Even when real-time results are required, Lucene can still be used; Lucene + Compass works very well.

News websites can use static page storage with a regular update mechanism to reduce the load on servers. Each small module on the home page can be cached with OSCache so that its data is not fetched on every request.

 
A front-end web accelerator based on static page caching can be used; the main product is Squid. Squid caches most static resources (images, JS, CSS, and so on) and returns them directly to visitors, reducing the load on the application servers. The pages themselves are not truly static; dynamic pages are made to look static. This can be achieved with an open-source URL rewriter such as urlrewrite. An .htm or .html suffix does not mean the program has generated a static page; URL rewriting is used to improve your website's coverage in search engines.
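
To illustrate the idea of URL rewriting (not the configuration format of the urlrewrite filter itself), here is a small sketch that maps static-looking paths onto dynamic ones. The rules and the `.do` targets are invented examples.

```python
import re

# Hypothetical rules: (pattern for the public URL, internal dynamic URL).
REWRITE_RULES = [
    (re.compile(r"^/news/(\d+)\.html$"), r"/news.do?id=\1"),
    (re.compile(r"^/channel/(\w+)/page(\d+)\.html$"), r"/channel.do?name=\1&page=\2"),
]

def rewrite(path):
    """Return the internal path for a public path, or the path unchanged."""
    for pattern, target in REWRITE_RULES:
        new_path, count = pattern.subn(target, path)
        if count:
            return new_path
    return path
```

A visitor or search engine sees `/news/42.html`, while the application still receives an ordinary dynamic request.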

The servers that generate static pages and the WWW servers are two separate groups. After the pages are generated, they are pushed to the WWW servers. Some of the databases are not relational, which is better suited to information derivation. There are many WWW servers, mail servers, and routers. Load balancing is mainly used to relieve access bottlenecks.

Disadvantages of static pages:

1) Increases the complexity of the program.

2) Makes information management harder.

3) Is not the fastest option.

4) Wears out the hard disk.

6. Cache

Caching should be used from the very beginning. A high-speed cache is a better place for temporary data, such as the temporary data generated while tracking a specific user's session on a website; that data no longer needs to be recorded in the database.

 
If you do not use Lucene, you can use a cache instead, with memcached for distributed caching. If you have the budget, use 10 machines for caching; more than 10 GB of storage space is sufficient. If you do not, work on page caching and data caching. OSCache, Ehcache, and SwarmCache are also options, although their synchronization is said to be not very good.
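
When a cache is spread over several machines, each key has to map to one of them. A simple sketch, assuming plain hash-modulo placement and made-up node names; memcached client libraries have their own placement schemes, which this does not reproduce.

```python
import hashlib

def pick_node(key, nodes):
    """Map a cache key to one node, the same way on every app server."""
    # hashlib gives a hash that is stable across processes and machines,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

# Ten hypothetical cache machines.
CACHE_NODES = ["cache%02d.example.com" % i for i in range(1, 11)]
```

With modulo placement, adding or removing a machine remaps most keys, so the cache is largely cold after a resize.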

You can use memcache for caching, with a large amount of memory holding all the data that rarely changes; when data is modified, the cache is notified so that the entry expires. memcache is a distributed cache product developed by LJ (LiveJournal), and many large websites use it. The cache server and the app server can be installed on the same machine: the cache server does not consume much CPU, and with the cache server in place the app server does not need much memory, so the two can coexist and use resources more effectively.
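
The "notify the cache when data is modified" pattern can be sketched as below. Plain dictionaries stand in for both the database and the memcache client, so no real client API is assumed; `CachedStore` is an illustrative name.

```python
class CachedStore:
    def __init__(self, database):
        self._db = database   # dict standing in for the database
        self._cache = {}      # dict standing in for the cache server
        self.db_reads = 0     # counts how often the database was hit

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: read from the database and remember the result.
        self.db_reads += 1
        value = self._db.get(key)
        self._cache[key] = value
        return value

    def write(self, key, value):
        self._db[key] = value
        # Notify the cache: drop the stale entry so the next read reloads it.
        self._cache.pop(key, None)
```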

The immature ideas above can be refined step by step and, at a certain level, can help improve a product's performance indicators.

From: http://tech.it168.com/a2008/1231/262/000000262036.shtml
