After setting up a dial-up access platform in CERNET, we were engaged in front-end development of search engines in Yahoo & 3721, and processed architecture upgrades in large-scale community sows in mop, at the same time, I have worked with and developed many large and medium-sized website modules. Therefore, I have accumulated some experience in large websites to cope with high-load and concurrent solutions. I can discuss this with you.
A small website, such as a personal website, can be implemented using the simplest HTML static page. With some images for beautification, all the pages are stored in a directory, such websites have very simple requirements on system architecture and performance. With the increasing diversity of Internet services, website-related technologies have been subdivided into many aspects after years of development, especially for large websites, the technology is widely used. From hardware to software, programming language, database, Webserver, firewall and other fields, there are high requirements, it is no longer comparable to the original simple HTML static website.
Large websites, such as portal websites. In the face of a large number of user access and high concurrency requests, the basic solution focuses on the following aspects: use high-performance servers, high-performance databases, high-efficiency programming languages, and high-performance WEB containers. However, in addition to these aspects, it is impossible to fundamentally solve the high load and high concurrency problems faced by large websites.
The solutions provided above also mean a greater investment to a certain extent, and these solutions have bottlenecks and do not have good scalability, I will talk about some of my experiences from the perspectives of low cost, high performance, and high scalability.
1. HTML static
As we all know, the most efficient and least consumed HTML pages are purely static html pages, so we try our best to make the pages on our website adopt static pages, this simplest method is actually the most effective method. However, for websites with a large amount of content and frequent updates, we cannot manually implement them all, so we have a common information publishing system CMS, news channels such as the portals we often visit, and even other channels, are managed and implemented through the information publishing system, the information publishing system can automatically generate static pages based on the simplest information input. It can also provide channel management, permission management, automatic crawling, and other functions. For a large website, having an efficient and manageable CMS is essential.
In addition to portal and information publishing websites, static websites with high interaction requirements are also a necessary means to improve performance, static Community posts and articles in real time, and re-static updates are also a lot of use strategies, such as mop, the same is true for the Netease community. At present, many blogs are also static. WordPress, the blog program I use, is not static yet. Therefore, www.toplee.com cannot afford high-load access.
At the same time, HTML static is also a method used by some caching policies. For applications that frequently use database queries but have little content updates in the system, you can consider using HTML static, for example, the public setting information of the Forum in the forum. Currently, mainstream forums can perform background management and store the information in the database. In fact, this information is frequently called by foreground programs, but the update frequency is very small, you can consider static content during background updates to avoid a large number of database access requests.
When making HTML static, you can use a compromise method, that is, the front-end uses dynamic implementation to regularly perform static and timed judgment calls under certain policies, this can achieve a lot of flexible operations. I developed the billiards website www.8zone.cn to use this method, I set some HTML static time intervals to cache dynamic website content, so as to share most of the pressure on static pages, which can be applied to the architecture of Small and Medium websites. Address of the Origin Site: http://www.8zone.cn, by the way, have friends like billiards a lot of support me this free website :)
2. image server Separation
As we all know, for Web servers, images, whether Apache, IIS or other containers, consume the most resources. Therefore, it is necessary to separate images from pages, this is basically a policy adopted by large websites. They all have independent image servers and even many image servers. This architecture reduces the pressure on the server system that provides page access requests, and ensures that the system does not crash due to image problems.
Different configuration optimizations can be performed on the application server and image server. For example, Apache may support as few loadmodules as possible when configuring contenttype, ensures higher system consumption and execution efficiency.
My billiards website's hometown 8zone.cn also uses the separation of image server architecture. Currently, it is only the separation of architecture and physical isolation, because there is no money to buy more servers :), you can see that the image links on the old people are URLs similar to img.9tmd.com or img1.9tmd.com.
In addition, Lighttpd can be used to replace Apache for processing access to static pages, images, and JS. It provides more lightweight and efficient processing capabilities.
3. Database Cluster and database table hash
Large websites have complex applications, and these applications must use databases. In the face of a large number of accesses, database bottlenecks will soon become apparent. At this time, a database will soon fail to satisfy applications, therefore, we need to use a database cluster or database table hash.
In terms of database clusters, many databases have their own solutions, and Oracle and Sybase all have good solutions. The commonly used MySQL Master/Slave is also a similar solution, you can refer to the corresponding solutions to implement the database.
As the database cluster mentioned above is limited by the DB type used in terms of architecture, cost, and expansion, we need to consider improving the system architecture from the perspective of applications, database table hashing is a common and most effective solution. We install business and application or function modules in the application to separate the database. Different modules correspond to different databases or tables, then, according to a certain policy, conduct a smaller database hash for a page or function, such as a user table and table hash by user ID, in this way, the system performance can be improved at a low cost and the scalability can be improved. Sohu's Forum adopts this architecture to separate the database of Forum users, settings, posts, and other information, and then hash the databases and tables of posts and users according to sections and IDs, the simple configuration in the configuration file allows the system to add a low-cost database at any time to supplement the system performance.
4. Cache
The word cache has been used in many areas. The cache in website architecture and website development is also very important. Here we will first describe the two most basic caches. Advanced and distributed caching are described later.
For architecture caching, anyone familiar with Apache can know that Apache provides its own mod_proxy cache module, or use an additional squid for caching, both methods can effectively improve Apache's access response capabilities.
The memcached provided on Linux is a common cache solution. Many Web Programming Languages provide memcache access interfaces, which are available in PHP, Perl, C, and Java, it can be used in Web development. It can cache data, objects, and other content in real time or cron, with flexible policies. Some large communities use this architecture.
In addition, when developing using the Web language, various languages basically have their own cache modules and Methods. php has the pear cache module and the eaccelerator acceleration and cache module, well-known APC and xcache (developed by Chinese people, supported !) PHP cache module, Java is more,. NET is not very familiar with, I believe there are certainly.
5. Images
Images are often used by large websites to improve performance and data security. The image technology can solve the differences in user access speed caused by different network access providers and regions, for example, the difference between Chinanet and EduNet has prompted many websites to set up image sites in CERNET to regularly update or update data in real time. In terms of image details, I will not elaborate too deeply here. There are many professional solutions and product options. There are also low-cost software implementation ideas, such as rsync on Linux and other tools.
6. Server Load balancer
Server Load balancer is the ultimate solution for large websites to solve high-load access and a large number of concurrent requests.
Server Load balancer has been developing for many years. There are many professional service providers and products to choose from. I personally have some solutions, including two architectures for your reference. In addition, we will not talk much about the primary Server Load balancer DNS round robin and the professional CDN architecture.
6.1 hardware layer-4 Switching
The layer-4 Exchange uses the header information of the layer-3 and layer-4 information packets to identify business flows based on the Application interval and distribute the business flows of the entire interval segment to appropriate application servers for processing. The layer-4 switching function is like a virtual IP address pointing to a physical server. Its transmission services are subject to a variety of protocols, including HTTP, FTP, NFS, telnet, or other protocols. These services require complex load balancing algorithms based on physical servers. In the IP address world, the service type is determined by the TCP or UDP port address of the terminal. The application interval in the layer-4 switch is determined by the same source and terminal IP addresses, TCP and UDP ports.
In the field of hardware layer-4 switching products, there are some well-known products to choose from, such as Alteon and F5. These products are expensive, but value for money, it provides excellent performance and flexible management capabilities. Yahoo China used three or four Alteon servers on nearly 2000 servers.
6.2 software layer-4 Switching
After learning about the principle of the hardware layer-4 switch, the four-layer switch based on the OSI model came into being. Such a solution achieves the same principle, but has a poor performance. However, it is easy to meet a certain amount of pressure. Some people say that the software implementation method is actually more flexible, and the processing capability depends entirely on the familiarity of your configuration.
We can use LVS, which is commonly used in Linux for software layer-4 Switching. LVS is a Linux virtual server. It provides a real-time disaster response solution based on heartbeat to improve system robustness, at the same time, it provides flexible virtual VIP configuration and management functions, and can meet a variety of application requirements at the same time, which is essential for distributed systems.
A typical load balancing strategy is to build a squid Cluster Based on layer-4 software or hardware exchanges. This idea is adopted on many large websites, including search engines, this architecture is low-cost, high-performance, and highly scalable. It is easy to increase or decrease nodes in the architecture at any time. I have prepared a special detail for this architecture and will discuss it with you.
Summary:
For large websites, each method mentioned above may be used at the same time. Michael is a simple introduction here. Many details of the implementation process require you to be familiar with them, sometimes a very small squid parameter or Apache parameter setting will have a great impact on the system performance. I hope you can discuss it together to make it effective.
The web program obtains information mainly by querying the database. When the database is not very large, there will be no major problems. however, with the development of the website, when the database grows in a geometric level, there will be bottlenecks. so the PHP cache technology was born.
. When the PHP cache technology works, when the program queries data, it serializes the corresponding results and saves them to the file. In the future, the same query statement can be used without directly querying the database, instead, it is obtained from the cache file. This improvement greatly improves the program running speed. currently, the popular methods for using PHP cache technology are gold partners such as ADODB and smarty. working principle of PHP Cache Technology: first look at the data cache function provided by ADODB: 1 <? PHP 2 include ('ADODB. inc. PHP '); # Load Code common to ADODB 3 $ adodb_cache_dir ='/usr/adodb_cache '; 4 $ conn = & adonewconnection ('mysql '); # create a connection 5 $ Conn-> pconnect ('', 'userid','', 'agora '); # connect to MySQL, agora dB 6 $ SQL = 'select mermername, customerid from MERs '; 7 $ rs = $ Conn-> cacheexecute (15, $ SQL); 8?> As shown above, each time data is queried, the corresponding results are serialized and saved to the file. In the future, the same query statement can be obtained from the cache file instead of directly querying the database. Let's take a look at the page cache function provided by smarty: 1 <? PHP 2 require ('smarty. Class. php'); 3 $ smarty = new smarty; 4 $ smarty-> caching = true; 5if (! $ Smarty-> is_cached ('index. TPL ') {6 // No cache available, do variable assignments here. 7 $ contents = get_database_contents (); 8 $ smarty-> assign ($ contents); 9} 10 $ smarty-> display ('index. TPL '); 11?> 12 As shown above, each time you access the page, you will first check whether the corresponding cache exists. If it does not exist, connect to the database and obtain the data. After assigning values to template variables, the page is displayed, the cache file is generated at the same time, so that the cache file will play a role the next time you access it, instead of executing the if block data query statement. Of course, there are many things to consider in actual use, such as setting the validity period and cache group. For details, refer to the related sections on cache (caching) in the smarty manual. The two popular php component caching methods have different focuses. For the ADODB cache, it caches data and for the smarty cache, it caches pages. There are many other components that provide the caching function (for example, pear: cache_lite). The specific analysis of the specific solution used in actual programming may also be used in combination. One obvious advantage of using the built-in caching scheme of these components is that their implementation is transparent to the client. You only need to make the necessary settings (such as the cache time and cache directory), instead of having to think too much about the cache details, the system will automatically manage the cache according to the settings. However, its disadvantages are also obvious, because every request still needs to be parsed using PHP, and the efficiency is still greatly reduced compared with pure static, which still cannot meet the requirements in the face of large PVS. In this case, static caching is required because Dynamic Caching is not enough. PHP, a web design scripting language that has emerged in recent years. Due to its powerful and scalability, PHP has developed considerably in recent years. Compared with traditional ASP websites, it has an absolute advantage in speed. If you want MSSQL to convert 60 thousand pieces of data to PhP, it will take 40 seconds, and ASP will take less than 2 minutes. however, due to the increasing number of website data, we are eager to call data more quickly. We do not need to call data from the database every time. We can choose from other places, such as a file, or a memory address, which is the Cache Technology of PHP. database Technology: Database statement optimization, partitioning, configuration of Master/Slave servers, text storage technology, text concurrency is higher than the database performance