Ten Rules for Large, High-Performance Websites
Source: Programmer magazine (translated from the original article).
At our company, ChinaNetCloud, we have seen many different types of websites and systems, both good and bad. Some have a well-designed server and network architecture and are properly tuned and monitored. Most, however, have security and performance problems that keep them from running well and from growing more popular.
In China, the open-source LAMP stack is the most popular web architecture: applications written in PHP running on the Apache web server, with MySQL as the database, all on Linux. It is a solid platform that runs well, and it is now the most popular Internet system architecture in the world. It is, however, hard to scale correctly and keep secure, because each layer of the stack has its own problems, pitfalls, and best practices. Our job is to help businesses build and run high-performance, scalable, and secure systems at the lowest operating cost, so we have a great deal of experience with these problems.
In reality, many websites are built quickly and cheaply by developers. Often there are no IT staff or system managers; the programmers run the systems themselves. The result is a site that costs very little and does start up and run, but that hits real problems once a large number of users arrives and it needs to scale. After all, China has roughly 300 million Internet users; even a tiny fraction of them visiting a site can easily generate enormous page traffic. Problems then appear at every level. The rules below summarize the most common issues, explain why each rule matters, and describe the best ways to put it into practice. Websites that follow these suggestions will gain in scalability, security, and operational stability.
1. Use appropriate session management
The first way to scale a system is to add more hardware, for example two web servers instead of one. This sounds straightforward, but it creates a potential problem: session management. Sessions are a serious issue for Java applications, and can cause scalability problems in PHP as well, especially through database load.
A session is a single end user's login or connection over a period of time. It usually spans many TCP/IP HTTP connections and several web pages, and each page typically includes dozens or even hundreds of page elements such as frames, menus, and Ajax updates. All of these HTTP requests need to know who the user is, both to meet security requirements and to send the right content, because they are all part of the session. Each session usually has associated session data, such as the user name, user ID, history, shopping cart, and statistics.
The problem arises when there are two web servers and many HTTP connections: user traffic is spread across, and moves between, the two servers. It becomes hard for a server to know who the user is and to track all the session data, because each page or page component may be served by a different server. In PHP, a session ID is created on the first connection or login and stored in a cookie, and the cookie is then sent along with every HTTP request.
This creates a problem: every PHP script then has to look up the session data by that ID. Since PHP does not keep state between requests (unlike Java), the session data must be stored somewhere, usually in the database. But if a complex page looks up session data ten times while being built (which is common), every page load performs ten SQL queries, which puts a heavy load on the database.
At the traffic levels described above, session management alone can easily generate hundreds of database queries per second. The solution is to keep only the session ID in the cookie and cache the session data in a service such as memcached for high performance.
Pay attention to security here, because a hacker can impersonate another user by forging that user's session ID, which is easily sniffed or seen, especially on public Wi-Fi. The solution is to properly encrypt or sign the session ID and bind it to a validity period, the client IP address, and other key details such as the browser. There are many good examples of sound session management on the Internet; pick the one that best fits your needs.
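As a minimal sketch of signing a session ID, the helper below computes an HMAC over the ID plus the bound fields. The secret, field layout, and bound attributes (client IP, expiry time) are illustrative assumptions, not any specific framework's format:

```shell
#!/bin/sh
# Sketch: HMAC-sign a session ID so a forged or stolen cookie fails
# validation. SECRET and the bound fields are illustrative.
SECRET="replace-with-a-long-random-server-side-secret"

sign_session() {
    # $1 = session id, $2 = client IP, $3 = expiry (epoch seconds)
    printf '%s|%s|%s' "$1" "$2" "$3" |
        openssl dgst -sha256 -hmac "$SECRET" -r | cut -d' ' -f1
}

# Cookie value = id.expiry.signature; on each request the server
# recomputes the HMAC from the claimed id, the client IP, and the
# expiry, and rejects the session if it does not match.
sid="abc123"; ip="203.0.113.7"; exp="1900000000"
cookie="$sid.$exp.$(sign_session "$sid" "$ip" "$exp")"
echo "$cookie"
```

Binding the IP and expiry into the signature means a cookie copied off public Wi-Fi stops working from another address or after the window closes.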
2. Always consider security
Writing secure code involves many issues, such as preventing SQL injection and logging securely, but unfortunately almost no one thinks about security, and those who do often do not understand it well. This article focuses on operational system security, in three areas: firewalls, service users, and file access permissions.
In addition to any dedicated hardware firewall (such as Cisco's ASA), every server should also run a software firewall such as iptables. It protects the server from threats and attacks coming from the public Internet, from other servers, and from the local network, including developers and operators connecting over VPN or SSH tunnels. Only the required ports should be open, and only to the specified IP addresses. iptables can be complex, but there are many good templates; we use ours to help customers build their rulesets. The default Red Hat or CentOS firewall configuration is only about 10 lines and is not very useful; our iptables best-practice configuration runs about five pages and covers the most advanced security protection Linux provides.
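As a sketch of what such a ruleset looks like (in iptables-restore format), the fragment below drops everything by default and opens only HTTP to the world and SSH to an illustrative admin network; the addresses and ports are assumptions, not a complete production policy:

```
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Web traffic open to everyone:
-A INPUT -p tcp --dport 80 -j ACCEPT
# SSH only from the office/VPN range (example address block):
-A INPUT -p tcp -s 203.0.113.0/24 --dport 22 -j ACCEPT
COMMIT
```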
All public services should run as dedicated users, such as apache. Never run them as root, because that lets anyone who breaks into Apache take over the entire server. If Apache runs as the apache or nobody user, a break-in is contained and far less damaging.
The files a web server runs or serves (such as .php and .html files) should not be writable by the web server's user. That is, the apache or nginx user should not have write permission on the web directories. There are several ways to arrange this; the simplest is to have the files owned by a different user, with apache or nginx in a group that can read them via 640 permissions. This blocks almost all hacks and page defacements.
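A sketch of that layout: code owned by a deploy user, with the web server's group granted read-only access. The user/group names are illustrative, and the chown step needs root, so it is shown commented out:

```shell
#!/bin/sh
# Sketch: make the web tree readable but not writable by the web server.
# 640 on files and 750 on directories let the owner (a deploy user)
# write, and the group (e.g. apache) only read and traverse.
lock_down_docroot() {
    # chown -R deploy:apache "$1"     # needs root; run separately
    find "$1" -type f -exec chmod 640 {} +
    find "$1" -type d -exec chmod 750 {} +
}

# Example usage (path illustrative):
# lock_down_docroot /var/www/html
```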
Also, never upload files with FTP, especially over public Wi-Fi, because hackers can easily steal the user name and password; use SFTP instead. And every employee should have their own user ID and a random password.
3. Use standard paths and installation configurations
An annoying deployment problem is that developers rarely think about where their software will live on the production web servers, or how it will get there. We have seen many large systems with their PHP code deployed under /home/Xiaofeng or /web/code. Both paths are highly nonstandard and cause operational and security problems. As these systems move from development to test to production, problems keep appearing because every installation is nonstandard, and developers end up adjusting the configuration just to make things work.
Always install servers such as Apache from standard packages and binaries. Do not compile from source or install from tarballs, because that causes long-term stability and management problems, and installing multiple versions on one server creates confusion.
Websites should always be tested and deployed under the standard path for the chosen Linux distribution, such as /var/www/html on Red Hat or CentOS. This makes permission management, backup, configuration, monitoring, and other operations on the system much easier.
Web server logs should go in /var/log or /var/log/app_name, not in the main code area. This matters not only because these standard paths are important in themselves, but also because a properly configured server has /var as a separate file system. If an application suddenly writes a huge volume of logs and fills the disk, that configuration keeps the system from crashing or suffering other serious damage; with the logs elsewhere, it would.
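One line in /etc/fstab is enough to give /var its own file system; the device name here is illustrative (an LVM volume on this hypothetical host):

```
/dev/vg0/var    /var    ext4    defaults    1 2
```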
4. Always use logs
A web system can hardly have too much logging. Every system should write important data to logs, whether its own log files or the system syslog. Cron jobs and other shell scripts or C programs have standard, simple logging facilities; in a shell script, the logger command is all you need. Log when a script starts and stops, at important execution steps, and whenever real-time data is produced; then a glance at the main system log shows what is going on.
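A minimal sketch of such a helper for cron jobs and shell scripts: each message goes to syslog via logger(1) when available, and is also echoed with a timestamp into the job's own output. The tag handling is illustrative:

```shell
#!/bin/sh
# Sketch: simple logging helper for shell scripts and cron jobs.
log() {
    msg="$1"
    # Send to syslog if logger(1) exists; ignore failures so the
    # script keeps running even without a syslog socket.
    command -v logger >/dev/null 2>&1 &&
        logger -t "${0##*/}" -p user.notice "$msg" 2>/dev/null || true
    # Always echo a timestamped copy for the job's own output.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$msg"
}

log "backup started"
# ... real work here ...
log "backup finished"
```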
Large systems often log through a dedicated syslog facility such as local5, with syslog or syslog-ng configured to route those messages to a separate file, which makes them much easier to work with. Note that the syslog tools and the logger command (and any syslog call) default to the user.notice facility and priority; adjust this if necessary.
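A syslog-ng fragment for routing local5 to its own file might look like the sketch below; the source name s_sys and the file path are assumptions that vary by distribution:

```
filter f_myapp      { facility(local5); };
destination d_myapp { file("/var/log/myapp.log"); };
log { source(s_sys); filter(f_myapp); destination(d_myapp); };
```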
A well-built system lets you turn logging on and off in the application, ideally with different log levels per module or function. That allows very detailed, powerful logging for analyzing and debugging problems in production.
5. Good database design and SQL
In almost any system, the database is the biggest performance bottleneck, and the two biggest factors are the database design and the quality of the SQL code. Many systems have a good, or at least workable, database design, but the SQL is usually poor for lack of proper performance testing. Such SQL may run very fast in development, against a small data set under minimal load, and then collapse when thousands of users are reading millions of records at the same time.
Unfortunately, these problems are not obvious at first; they surface only after the system has grown, when it suddenly falls over. While the system is small, the database seems fast (the data fits in memory and there are few concurrent queries) and responses feel quick, even though it is working very inefficiently inside. The key is to find and fix these problems before the system grows into them.
There are many good books and sites on analyzing these problems. The key tools are the slow query log, the InnoDB status output, and the MySQL statistics that describe current performance. We have seen systems reading 500,000 rows per second, a clear sign of bad SQL, while the company knew nothing about it until the servers began to collapse.
MySQL systems should use the InnoDB storage engine for all data: InnoDB is faster and more stable than MyISAM, and managing and backing it up is easier and quicker. InnoDB should be set as the default engine in the main configuration file, and the system should be checked from time to time for accidentally created MyISAM tables.
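A few my.cnf lines cover both this rule and the slow query log mentioned above; option names are as in MySQL 5.1+, and the MyISAM check is shown as a comment to run through the mysql client:

```
[mysqld]
default-storage-engine = InnoDB
slow_query_log         = 1
slow_query_log_file    = /var/log/mysql/slow.log
long_query_time        = 2

# Periodic check for accidentally created MyISAM tables:
#   SELECT table_schema, table_name FROM information_schema.tables
#   WHERE engine = 'MyISAM'
#     AND table_schema NOT IN ('mysql', 'information_schema');
```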
6. Always have a good DB configuration and backups
Many companies have no sound backup mechanism and do not know how to build one. mysqldump is not enough: the best backup method is to use LVM snapshots of a running InnoDB system, which gives hot backups that are both very fast and highly reliable.
In addition, all backup files should be compressed and encrypted before being transferred off the server. Also make sure MySQL itself is well configured: the default installation uses only 5 to 10 lines of configuration, suitable only for development. The best-practice document we give customers runs about 10 pages, with roughly 100 useful settings for security, performance, and stability, including protection against data corruption; many of them are very important.
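The compress-and-encrypt step can be sketched as below. It assumes a dump file already exists (for example, copied out of an LVM snapshot of the InnoDB data directory); the passphrase-file handling and the off-site transfer command are illustrative:

```shell
#!/bin/sh
# Sketch: compress and encrypt a backup file before it leaves the server.
# encrypt_backup DUMPFILE PASSFILE -> writes DUMPFILE.gz.enc
encrypt_backup() {
    dump="$1"; passfile="$2"
    gzip -cf "$dump" > "$dump.gz"
    openssl enc -aes-256-cbc -pbkdf2 -salt \
        -pass "file:$passfile" \
        -in "$dump.gz" -out "$dump.gz.enc" && rm -f "$dump.gz"
}

# Example usage (paths illustrative); ship over SSH, never plain FTP:
#   encrypt_backup /var/backups/db-20240101.sql /etc/backup.pass
#   scp /var/backups/db-20240101.sql.gz.enc backup@offsite:/srv/backups/
```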
7. Use read/write database separation
As systems grow, and especially when they have poor SQL, a single database server is usually no longer enough to handle the load. Multiple databases mean replication, unless you partition the data. The usual approach is a master/slave replication system: the application sends all updates, inserts, and deletes to the master database, and performs all selects against a slave (or several slaves).
The concept is simple, but implementing it properly and accurately is not, and it may require a lot of code changes. So even if you start with a single database server, plan from the beginning to use separate DB connections in PHP for reads and writes. Done correctly, this lets the system scale out to 2, 3, or even a dozen database servers with high availability and stability.
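One way to plan ahead is to keep two named connection settings in the application's configuration from day one, even while both still point at the same server; the hostnames and user names here are illustrative:

```
[db_write]                      ; all INSERT / UPDATE / DELETE go here
host = db-master.internal
user = app_rw

[db_read]                       ; all SELECT traffic goes here
host = db-slave1.internal
user = app_ro
```

When the slave is eventually added, only this configuration changes, not the application code.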
8. Use a database cache like memcached
Even with good database design, good SQL, and read/write separation, a large system still needs more speed, particularly for session state, friend lists, and BBS text. The answer is a data cache such as memcached, a simple, high-performance cache used by all of the largest websites. But do not depend 100% on a single memcached server for performance, because if it dies it will drag down the whole system. Instead, run two or three memcached servers as a cluster, and optionally add a cache-warming process so that if a cache server restarts and must be reloaded, it can be refilled quickly.
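The pattern behind memcached use is cache-aside: check the cache first, and only on a miss do the expensive database work and store the result. The sketch below illustrates the pattern only, with a local directory standing in for memcached and fetch_from_db() standing in for a real SQL query (both are illustrative stand-ins):

```shell
#!/bin/sh
# Sketch of the cache-aside pattern (directory-as-cache stand-in).
CACHE_DIR="${CACHE_DIR:-/tmp/session-cache}"

fetch_from_db() {            # pretend this is an expensive SQL lookup
    printf 'user=%s;cart=empty\n' "$1"
}

get_session() {              # $1 = session id
    mkdir -p "$CACHE_DIR"
    f="$CACHE_DIR/$1"
    if [ -f "$f" ]; then
        cat "$f"                        # cache hit: no database work
    else
        fetch_from_db "$1" | tee "$f"   # miss: load once, then cache
    fi
}

# Example: the second call is served from the cache, not the "database".
# get_session abc123
# get_session abc123
```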
9. Build a test and development environment
Many companies have only the developers' desktops and their production servers. As the system grows bigger and more complex, this makes testing and code management a serious problem. Best practice is to have two test systems: one where developers integrate and test their code and features, and one that mirrors production for a seamless transition. Fortunately, cloud computing (or a private cloud) makes this easy. For a production environment of 5 to 10 servers, a single server in the office or IDC can host a copy for testing, and that one server can serve multiple customers' projects.
10. Use Version Control
Finally, use version control, and use it for deployment to test and production as well. Many developers use SVN or similar tools, ideally for all code, scripts, HTML, images, configurations, documentation, and tests. Version control should be the only way code reaches the test environment; plain copies or tar files are unreliable. Developers should check everything in and tag it, the tag should be checked out onto the test system, and once it passes, the same tagged version should be checked out into production.
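An illustrative SVN release sequence for this flow (it assumes an existing repository with a trunk/tags layout and the version number is hypothetical, so treat it as a sketch rather than runnable as-is):

```
svn commit -m "ready for release"              # developers check everything in
svn copy ^/trunk ^/tags/rel-1.4 -m "tag 1.4"   # tag the release
svn checkout ^/tags/rel-1.4 /var/www/html      # deploy the tag, never raw copies
```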
Summary
Building a reliable, high-performance web system involves many considerations in both development and operations. This article has tried to cover the most important ones from the standpoint of operability and reliability. Keep these issues in mind as you build and manage your site; following these rules will help your system run well for a long time.
About the author:
Steve Mushero, co-founder, CEO, and CTO of ChinaNetCloud, has over 20 years of technology management experience around the world. He has served as CTO at companies including Tudou, Intermind, and Advanced Management Systems.
About the translator:
Hou Bowei was born in Fengcheng, studied in Chuncheng, and has worked on both domestic and Japanese projects; he now works at an insurance company in Dalian. He enjoys studying technology, keeps up with business knowledge, thinks hard about problems, and likes to communicate and share with others.