PHP application security: you need a solid understanding of security, whether you are developing, interviewing, or taking part in technical discussions.
Objectives
This tutorial shows you how to protect your own Web applications. It explains how to defend against the most common security threats: SQL injection, GET and POST variable manipulation, buffer overflow attacks, cross-site scripting attacks, browser data manipulation, and remote form submission.
Quick introduction to security
What is the most important part of a Web application? The answer depends on whom you ask. Business people need reliability and scalability. The IT support team needs robust, maintainable code. End users need an attractive user interface and good performance when carrying out tasks. Mention "security", however, and everyone will agree that it matters to a Web application.
Most discussions stop there, however. Although security is on the project checklist, it is often treated as something to sort out just before delivery. A surprising number of Web application projects work this way: after months of work, developers bolt on security features at the very end so that the application can be opened to the public.
The results are often messy and can even require rework, because the code has already been tested, unit-tested, and integrated into a larger framework before the security features are added. Once security is added, major components may stop working. Integrating security at that stage adds extra burden or steps to a process that used to run smoothly (but insecurely).
This tutorial offers a better way to integrate security into PHP Web applications. It covers several general security topics and then looks in depth at the main security vulnerabilities and how to close them. After completing it, you will have a better understanding of security.
Topics include:
SQL injection attacks
GET string manipulation
Buffer overflow attacks
Cross-site scripting (XSS)
Data manipulation in the browser
Remote form submission
Web Security 101
Before getting into the details, it is worth looking at Web application security from a higher level. This section describes some basic principles of a security philosophy that you should keep in mind no matter what kind of Web application you are building. Some of these ideas come from Chris Shiflett (whose book on PHP security is an invaluable resource), some from Simson Garfinkel (see references), and some from years of accumulated experience.
Rule 1: never trust external data or input
The first thing to realize about Web application security is that external data must not be trusted. External data is any data that the programmer did not type directly into the PHP code. Until you have taken steps to make it safe, any data from any other source (such as GET variables, form POST, databases, configuration files, session variables, or cookies) is untrusted.
For example, the following data elements can be considered safe because they are set in PHP.
Listing 1. Safe and flawless code
[php]
<?php
$myUsername = 'tmyer';
$arrayUsers = array('tmyer', 'Tom', 'Tommy');
define("GREETING", 'Hello There' . $myUsername);
[/php]
The following data elements, however, are tainted.
Listing 2. Insecure, tainted code
[php]
<?php
$myUsername = $_POST['username'];                  // tainted!
$arrayUsers = array($myUsername, 'Tom', 'Tommy');  // tainted!
define("GREETING", 'Hello There' . $myUsername);   // tainted!
[/php]
Why is the first variable, $myUsername, tainted? Because it comes straight from a form POST. A user can type any string into that input field, including malicious commands that delete files or run previously uploaded files. You might ask, "Can't I avoid this risk with a client-side (JavaScript) form validation script that accepts only the letters A-Z?" That is always a good step, but as you will see later, anyone can download a form to their own machine, modify it, and resubmit whatever they like.
The solution is simple: you must run cleanup code on $_POST['username']. If you don't, the taint spreads to every other place where $myUsername is used (for example, in an array or a constant).
A simple way to clean user input is to run it through a regular expression. In this example, only letters are expected. It may also be a good idea to limit the string to a specific number of characters or to require that all letters be lowercase.
Listing 3. Making user input safe
[php]
<?php
$myUsername = cleanInput($_POST['username']);      // clean!
$arrayUsers = array($myUsername, 'Tom', 'Tommy');  // clean!
define("GREETING", 'Hello There' . $myUsername);   // clean!

function cleanInput($input) {
    $clean = strtolower($input);                     // lowercase everything
    $clean = preg_replace("/[^a-z]/", "", $clean);   // keep letters only
    $clean = substr($clean, 0, 12);                  // cap at 12 characters
    return $clean;
}
[/php]
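To see what this kind of whitelist cleaning does to a hostile value, here is a minimal sketch. The function repeats the lowercase, letters-only, 12-character rules described above; the sample input is a made-up stand-in for $_POST['username'].

```php
<?php
// Whitelist cleaning as described above: lowercase, letters only, max 12 chars.
function cleanInput($input)
{
    $clean = strtolower($input);
    $clean = preg_replace('/[^a-z]/', '', $clean);
    return substr($clean, 0, 12);
}

// Hypothetical hostile value standing in for $_POST['username'].
$hostile = "Tom'; DROP TABLE users--";
echo cleanInput($hostile), "\n";   // prints "tomdroptable"
```

Everything that is not a lowercase letter is dropped, so the quote, semicolon, and spaces never reach later code. The cost is that legitimate names containing digits or punctuation are altered too, so the pattern has to match what the field is really expected to hold.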
Rule 2: disable PHP settings that make security difficult
You already know that you cannot trust user input. You should also know that you cannot trust the way PHP is configured on the machine. For example, make sure register_globals is disabled. With register_globals enabled, it is easy to be careless and use a bare $variable that a GET or POST parameter of the same name can overwrite. With the setting disabled (it was removed entirely in PHP 5.4), PHP forces you to reference the correct variable in the correct namespace. To use a variable from a form POST, you reference $_POST['variable'], so that particular variable cannot be mistaken for a cookie, session, or GET variable.
The second setting to check is the error reporting level. During development you want as many error reports as possible, but once the project is delivered you want errors recorded in a log file rather than displayed on screen. Why? Because malicious hackers use error report information (such as SQL errors) to guess what the application is doing. This kind of reconnaissance helps them break into the application. To close this hole, edit php.ini, give the error_log entry an appropriate destination, and set display_errors to Off.
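When php.ini cannot be edited (on shared hosting, for example), the same settings can usually be applied at run time near the top of a script. The following is a sketch only: the log path is an assumption chosen so the example runs anywhere, and a real deployment would point it at a location outside the web root.

```php
<?php
// Assumed log location; pick a path outside the web root in a real deployment.
$logFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'php_app_errors.log';

error_reporting(E_ALL);              // still record every class of error
ini_set('display_errors', '0');      // never echo errors to the browser
ini_set('log_errors', '1');          // write them to the log instead
ini_set('error_log', $logFile);      // destination for the log entries

error_log('Sample entry: this goes to the log file, not the page.');
```

Run-time settings do not catch errors that occur before the script starts executing (parse errors, for instance), which is one reason the php.ini route remains preferable where it is available.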
Rule 3: If you cannot understand it, you cannot protect it.
Some developers use strange syntax or pack statements tightly to produce short but ambiguous code. This approach may be efficient, but if you do not understand what the code is doing, you cannot decide how to protect it.
For example, which of the following two pieces of code do you prefer?
Listing 4. Making code easier to protect
[php]
<?php
// obfuscated code
$input = (isset($_POST['username']) ? $_POST['username'] : '');

// unobfuscated code
$input = '';
if (isset($_POST['username'])) {
    $input = $_POST['username'];
} else {
    $input = '';
}
[/php]
In the second, clearer code segment, it is easy to see that $input is tainted and needs to be cleaned before it can be processed safely.
Rule 4: "Defense in depth" is your new mantra
This tutorial uses examples to show how to protect an online form while also taking the necessary measures in the PHP code that processes the form. Similarly, even if you use a PHP regex to make sure a GET variable is entirely numeric, you can still take steps to ensure that the SQL query uses escaped user input.
Defense in depth is not just a good idea; it keeps you out of serious trouble.
Now that the basic rules are covered, let's look at the first threat: SQL injection attacks.
Prevent SQL injection attacks
In a SQL injection attack, the user manipulates a form or GET query string to add information to a database query. For example, assume you have a simple login database. Each record in it has a username field and a password field. You create a login form to let users log in.
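Listing 5. Simple login form
[php]
<form action="verify.php" method="post">
<p><label for="user">Username</label>
<input type="text" name="user" id="user"/></p>
<p><label for="pw">Password</label>
<input type="password" name="pw" id="pw"/></p>
<p><input type="submit" value="Login"/></p>
</form>
[/php]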
This form accepts the user name and password that the user enters and submits them to a file called verify.php. In that file, PHP processes the data from the login form, as shown below:
Listing 6. Insecure PHP form processing code
[php]
<?php
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where
        username='" . $username . "' and password='" . $pw . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)) {
    if ($data->ctr == 1) {
        // they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay) {
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
} else {
    header("Location: login.php");
}
?>
[/php]
This code looks fine, right? Hundreds or even thousands of PHP/MySQL sites around the world use code like it. So what is wrong? Remember: user input cannot be trusted. Nothing that comes from the user is escaped here, which leaves the application vulnerable. Specifically, any kind of SQL injection attack is possible.
For example, if the user enters foo as the user name and ' or '1'='1 as the password, the following string is built by PHP and the query is then passed to MySQL:
$sql = "select count(*) as ctr from users where
username='foo' and password='' or '1'='1' limit 1";
Because '1'='1' is always true, this query matches rows regardless of the password. Whenever the resulting count is 1, PHP grants access. By injecting some malicious SQL at the end of the password string, a hacker can masquerade as a legitimate user.
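The effect of the injected password can be reproduced without a database by building the query string the same way the form handler does. This is only an illustration; buildLoginQuery() is a hypothetical helper name.

```php
<?php
// Builds the login query by plain string concatenation, with no escaping.
function buildLoginQuery($username, $pw)
{
    return "select count(*) as ctr from users where username='"
        . $username . "' and password='" . $pw . "' limit 1";
}

// A normal login attempt and the injection attempt described above.
echo buildLoginQuery('foo', 'secret'), "\n";
echo buildLoginQuery('foo', "' or '1'='1"), "\n";
```

In the second line of output, the attacker's quote closes the password literal early, and the remaining text is read by the database as SQL rather than as data.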
To solve this problem, wrap every piece of user input in PHP's built-in mysql_real_escape_string() function. It escapes special characters in a string, such as the apostrophe, so that they reach MySQL as literal data rather than as SQL syntax. (The mysql_* functions were removed in PHP 7.0; the mysqli extension offers mysqli_real_escape_string() as the equivalent.) Listing 7 shows the code with escaping applied.
Listing 7. Safe PHP form processing code
[php]
<?php
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where
        username='" . mysql_real_escape_string($username) . "'
        and password='" . mysql_real_escape_string($pw) . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)) {
    if ($data->ctr == 1) {
        // they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay) {
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
} else {
    header("Location: login.php");
}
?>
[/php]
Wrapping user input in mysql_real_escape_string() keeps malicious SQL in that input from being injected. If a user tries to pass the malformed password from the SQL injection example, the following query is sent to the database:
select count(*) as ctr from users where
username='foo' and password='\' or \'1\'=\'1' limit 1
Nothing in the database matches that password. With one simple step, you have plugged a major hole in the Web application. The lesson here is that user input for SQL queries should always be escaped.
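Escaping is one way to keep user data out of the SQL text. A different technique, not the one this tutorial uses, is the parameterized query offered by PDO, where the data is sent separately from the statement. The sketch below assumes the pdo_sqlite extension and uses an in-memory SQLite database as a stand-in for the MySQL table so that it is self-contained; checkLogin() is a hypothetical helper, and the plain-text password column simply mirrors the example table above.

```php
<?php
// In-memory SQLite stands in for the tutorial's MySQL database (an assumption
// made so the sketch is self-contained; requires the pdo_sqlite extension).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('create table users (username text, password text)');
$pdo->exec("insert into users values ('foo', 'secret')");

// Hypothetical helper: returns true when exactly one row matches.
function checkLogin(PDO $pdo, $username, $pw)
{
    // Placeholders keep the data separate from the SQL text.
    $stmt = $pdo->prepare(
        'select count(*) as ctr from users where username = :user and password = :pw'
    );
    $stmt->execute(array(':user' => $username, ':pw' => $pw));
    return (int) $stmt->fetchColumn() === 1;
}

var_dump(checkLogin($pdo, 'foo', 'secret'));        // bool(true)
var_dump(checkLogin($pdo, 'foo', "' or '1'='1"));   // bool(false)
```

With placeholders, the injected string is compared against the password column as an ordinary value, so it matches nothing.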
However, several other security holes still need to be plugged. Next up: manipulation of GET variables.
Prevent users from manipulating variables
In the previous section, you prevented users from logging in with malformed passwords. If you are smart, you will apply what you learned to make sure all user input in SQL statements is escaped.
Now, however, the user is safely logged in. Having a valid password does not mean the user will play by the rules; there are plenty of opportunities to cause damage. For example, an application may let users view special content, with all links pointing to locations such as template.php?pid=33 or template.php?pid=321. The part of the URL after the question mark is called the query string. Because the query string is placed directly in the URL, it is also called the GET query string.
In PHP, with register_globals disabled, you access this value with $_GET['pid']. On the template.php page, you might do something like what is shown in Listing 8.
Listing 8. Sample template.php
[php]
<?php
$pid = $_GET['pid'];
// we create an object of a fictional class Page
$obj = new Page;
$content = $obj->fetchPage($pid);
// and now we have a bunch of PHP that displays the page
// ......
// ......
?>
[/php]
What is wrong here? First, the code implicitly trusts that the GET variable pid from the browser is safe. What can happen? Most users are not sophisticated enough to construct semantic attacks. However, if they notice pid=33 in the browser's URL field, they may start experimenting. If they enter another number, that is probably fine; but what happens if they enter something else, such as a SQL command, the name of a file (such as /etc/passwd), or a value 3,000 characters long?
In this case, remember the basic rule: do not trust user input. The application developer knows that the page identifier (PID) accepted by template.php should be a number, so PHP's is_numeric() function can be used to make sure non-numeric PIDs are not accepted, as shown below:
Listing 9. Using is_numeric() to restrict the GET variable
[php]
<?php
$pid = $_GET['pid'];
if (is_numeric($pid)) {
    // we create an object of a fictional class Page
    $obj = new Page;
    $content = $obj->fetchPage($pid);
    // and now we have a bunch of PHP that displays the page
    // ......
    // ......
} else {
    // didn't pass the is_numeric() test, do something else!
}
?>
[/php]
This approach seems to work, but all of the following inputs pass the is_numeric() check:
100 (valid)
100.1 (should not have a decimal part)
+0123.45e6 (scientific notation: not good)
0xff33669f (hexadecimal: dangerous! PHP 7.0 and later no longer treat hexadecimal strings as numeric, but earlier versions do)
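The gap between "numeric" and "digits only" is easy to see by running a few of these inputs through both kinds of check. In this sketch, isAllDigits() is a hypothetical helper; the D modifier makes $ match only at the very end of the string, so a trailing newline is rejected as well.

```php
<?php
// Hypothetical helper: true only for a non-empty string of ASCII digits.
function isAllDigits($value)
{
    return is_string($value) && preg_match('/^[0-9]+$/D', $value) === 1;
}

// Print both verdicts side by side for each candidate value.
foreach (array('100', '100.1', '+0123.45e6') as $candidate) {
    printf(
        "%-12s is_numeric=%s digits-only=%s\n",
        $candidate,
        var_export(is_numeric($candidate), true),
        var_export(isAllDigits($candidate), true)
    );
}
```

Only the first value passes the digits-only test, while is_numeric() accepts all three.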
So what should a security-conscious PHP developer do? Years of experience show that the best practice is to use a regular expression to ensure that the entire GET variable consists of digits, as shown below:
Listing 10. Using a regular expression to restrict the GET variable
[php]
<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    // ereg() was removed in PHP 7.0; preg_match() performs the same check
    if (!preg_match('/^[0-9]+$/D', $pid)) {
        // do something appropriate, like maybe logging
        // them out or sending them back to home page
    }
} else {
    // empty $pid, so send them back to the home page
}
// we create an object of a fictional class Page, which is now
// moderately protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
// and now we have a bunch of PHP that displays the page
// ......
// ......
?>
[/php]
All you need to do is use strlen() to check whether the variable has a non-zero length; if it does, use an all-digits regular expression to make sure the data element is valid. If the PID contains letters, slashes, dots, or anything resembling hexadecimal, this routine catches it and blocks the page from user activity. If you look behind the scenes of the Page class, you will see that a security-conscious PHP developer has escaped the user input $pid, thereby protecting the fetchPage() method, as shown below:
Listing 11. Escaping in the fetchPage() method
[php]
<?php
class Page {
    function fetchPage($pid) {
        $sql = "select pid, title, desc, kw, content, status
                from page where pid='" . mysql_real_escape_string($pid) . "'";
        // etc, etc....
    }
}
?>
[/php]
You may ask, "Since you have already made sure the PID is a number, why escape it as well?" Because you do not know in how many different contexts and situations fetchPage() will be used. It must be protected everywhere it is called, and escaping inside the method is what defense in depth means.
What happens if the user tries to enter a very long value, say 1,000 characters, in an attempt to launch a buffer overflow attack? The next section discusses this in more detail, but for now you can add another check to make sure the PID has the correct length. You know that the pid field in the database is at most 5 digits long, so you can add the following check.
Listing 12. Using a regular expression and a length check to restrict the GET variable
[php]
<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    // reject the value if it is non-numeric OR longer than 5 digits
    if (!preg_match('/^[0-9]+$/D', $pid) || strlen($pid) > 5) {
        // do something appropriate, like maybe logging
        // them out or sending them back to home page
    }
} else {
    // empty $pid, so send them back to the home page
}
// we create an object of a fictional class Page, which is now
// even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
// and now we have a bunch of PHP that displays the page
// ......
// ......
?>
[/php]
Now no one can cram a 5,000-character value into your database application, at least not where GET strings are involved. Imagine a hacker gritting his teeth in frustration while trying to break into your application! And because error reporting is turned off, reconnaissance is harder as well.
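The digit check and the length check can be folded into one small function so that every page applies them the same way. This is a sketch under stated assumptions: validatePid() is a hypothetical helper, the five-digit limit comes from the pid field length mentioned above, and the rejection policy is left to the caller.

```php
<?php
// Hypothetical helper combining the digit check and the length check.
// Returns the pid as an int, or null when the input must be rejected.
function validatePid($raw, $maxLength = 5)
{
    if (!is_string($raw) || $raw === '' || strlen($raw) > $maxLength) {
        return null;                       // missing, empty, or too long
    }
    if (preg_match('/^[0-9]+$/D', $raw) !== 1) {
        return null;                       // contains something other than digits
    }
    return (int) $raw;
}

// In a page script, the raw value would come from $_GET['pid'].
$pid = validatePid('321');
if ($pid === null) {
    // Assumed policy: stop here and send the visitor to the home page.
    echo "rejected\n";
} else {
    echo "fetching page $pid\n";
}
```

Returning null instead of a cleaned-up value forces the calling code to decide explicitly what happens to a bad request, rather than carrying on with whatever is left.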
Buffer overflow attacks
A buffer overflow attack attempts to overflow a memory allocation buffer in a PHP application (or, more precisely, in Apache or the underlying operating system). Remember that you may be writing your Web application in a high-level language like PHP, but in the end you are still calling C (in the case of Apache). Like most low-level languages, C has strict rules for memory allocation.
A buffer overflow attack sends so much data to a buffer that part of it spills over into adjacent memory, corrupting that memory or rewriting logic. This can cause a denial of service, corrupt data, or execute malicious code on the remote server.
At the application level, the way to guard against buffer overflow attacks is to check the length of all user input. For example, if a form element asks for the user's name, add a maxlength attribute with a value of 40 to that field and check the value with substr() on the back end. Listing 13 gives a brief example of the form and the PHP code.
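Listing 13. Checking the length of user input
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    $name = substr($_POST['name'], 0, 40);
    // continue processing....
}
?>
<form action="" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]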
Why provide the maxlength attribute and also perform the substr() check on the back end? Because defense in depth is always good. The browser keeps users from entering overly long strings that PHP or MySQL cannot handle safely (imagine someone trying to enter a name 1,000 characters long), while the back-end PHP check ensures that no one is manipulating the form data remotely or in the browser.
As you can see, this approach is similar to using strlen() in the previous section to check the length of the GET variable pid. In that example, any input longer than five digits was rejected, but you could just as easily truncate the value to an appropriate length, as shown below:
Listing 14. Changing the length of the GET variable
[php]
<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    if (!preg_match('/^[0-9]+$/D', $pid)) {
        // if non numeric $pid, send them back to home page
    }
} else {
    // empty $pid, so send them back to the home page
}
// we have a numeric pid, but it may be too long, so let's check
if (strlen($pid) > 5) {
    $pid = substr($pid, 0, 5);
}
// we create an object of a fictional class Page, which is now
// even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
// and now we have a bunch of PHP that displays the page
// ......
// ......
?>
[/php]
Note that buffer overflow attacks are not limited to long strings of digits or letters. You may also see long hexadecimal strings (which often look like \xA3 or \xFF). Remember, the goal of any buffer overflow attack is to flood a specific buffer and place malicious code or instructions into the next buffer, thereby corrupting data or executing malicious code. The simplest way to deal with hex buffer overflows is, again, not to allow input to exceed a certain length.
If you are dealing with a form field that allows longer entries in the database, you cannot easily limit the length of the data on the client. Once the data reaches PHP, you can use a regular expression to strip out anything that looks like a hexadecimal string.
Listing 15. Stripping hexadecimal strings
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    $name = substr($_POST['name'], 0, 40);
    // clean out any potential hexadecimal characters
    $name = cleanHex($name);
    // continue processing....
}

function cleanHex($input) {
    // remove a literal backslash, x or X, and one to three hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
[/php]
You may find this series of operations a little too strict. After all, hexadecimal strings have legitimate uses, such as printing characters in a foreign language. How you deploy the hex regex is up to you. A better strategy is to strip hexadecimal strings only when a line contains too many of them or when the string exceeds a certain number of characters (such as 128 or 255).
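One way to express the "only when there are too many" strategy is to count the hexadecimal sequences first and strip them only above a threshold. The function name, the threshold of 8, and the one-to-three-digit pattern are all assumptions made for this sketch.

```php
<?php
// Hypothetical variant of cleanHex(): strip \xNN sequences only when the
// input contains more of them than an assumed threshold.
function cleanHexIfExcessive($input, $maxSequences = 8)
{
    // Literal backslash, x or X, then one to three hex digits.
    $pattern = '/\\\\[xX][A-Fa-f0-9]{1,3}/';
    $count = preg_match_all($pattern, $input);
    if ($count > $maxSequences) {
        return preg_replace($pattern, '', $input);
    }
    return $input;
}

// A single sequence is left alone; a long run of them is removed.
echo cleanHexIfExcessive('caf\xE9'), "\n";
echo cleanHexIfExcessive(str_repeat('\xFF', 20) . 'tail'), "\n";
```

The threshold trades false positives against leniency, so it is worth tuning against the kind of text the field is really expected to carry.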
Cross-Site Scripting
In a cross-site scripting (XSS) attack, a malicious user typically enters information in a form (or through some other user input mechanism) that inserts malicious client-side markup into the process or database. For example, suppose the site has a simple guestbook program that lets visitors leave a name, an email address, and a short message. A malicious user can take the opportunity to insert something other than a short message, such as an image that is inappropriate for other users, JavaScript that redirects users to another site, or code that steals cookie information.
Fortunately, PHP provides the strip_tags() function, which removes anything surrounded by HTML tags. The strip_tags() function also accepts a list of allowed tags.
Listing 16 provides an example that builds on the previous one.
Listing 16. Stripping HTML tags from user input
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    // strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name, 0, 40);
    // clean out any potential hexadecimal characters
    $name = cleanHex($name);
    // continue processing....
}

function cleanHex($input) {
    // remove a literal backslash, x or X, and one to three hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
[/php]
From a security standpoint, using strip_tags() on public user input is a must. If the form is in a protected area (such as a content management system) and you trust the users to carry out their tasks correctly (such as creating HTML content for a Web site), using strip_tags() may be unnecessary and may hurt productivity.
One more thing: if you accept user input, such as comments on posts or guestbook entries, and need to display that input to other users, be sure to wrap the output in PHP's htmlspecialchars() function. This function converts the ampersand, <, and > characters into HTML entities. For example, the ampersand (&) becomes &amp;. That way, even if malicious content slips past strip_tags() on the way in, htmlspecialchars() neutralizes it on the way out.
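Escaping at output time can be wrapped in a tiny helper so it is hard to forget. The helper name e() and the sample comment are made up for illustration; the ENT_QUOTES flag additionally converts single and double quotes, which matters when the value ends up inside an HTML attribute.

```php
<?php
// Hypothetical helper for escaping untrusted text at output time.
function e($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Made-up guestbook comment containing markup and an ampersand.
$comment = '<script>alert("x")</script> Tom & Jerry';
echo '<p>', e($comment), "</p>\n";
```

The browser then displays the script tag as literal text instead of executing it.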
Data manipulation in the browser
There is a class of browser plug-ins that lets users tamper with header and form elements on a page. With Tamper Data (a Mozilla plug-in), it is easy to manipulate a simple form containing many hidden text fields in order to send instructions to PHP and MySQL.
Before clicking Submit on the form, the user starts Tamper Data. On submission, the user sees a list of the form's data fields. Tamper Data lets the user alter that data, and the browser then completes the form submission.
Let's go back to the example built earlier. It already checks string length, strips HTML tags, and removes hexadecimal characters. Now, however, some hidden text fields are added, as shown below:
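Listing 17. Hidden variables
[php]
<?php
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    // strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name, 0, 40);
    // clean out any potential hexadecimal characters
    $name = cleanHex($name);
    // continue processing....
}

function cleanHex($input) {
    // remove a literal backslash, x or X, and one to three hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
<form action="" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="table" value="users"/>
<input type="hidden" name="action" value="create"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]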
Notice that one of the hidden variables exposes the table name: users. You will also see an action field with a value of create. Anyone with basic SQL experience can tell that these commands probably control a SQL engine in the middleware. Someone who wants to wreak havoc only has to change the table name or supply another option, such as delete.
Figure 1 illustrates the scope of damage that Tamper Data makes possible. Note that Tamper Data gives the user access not only to form data elements but also to HTTP headers and cookies.
Figure 1. Tamper Data window
The simplest defense against tools like this is to assume that any user may be using Tamper Data (or a similar tool). Provide only the minimum amount of information the system needs to process the form, and submit the form to dedicated logic. For example, a registration form should be submitted only to registration logic.
What if you have already built a general-purpose form processing function that many pages use? What if you use hidden variables to control the flow? For example, you may specify which database table to write to or which file repository to use in hidden form variables. There are four options:
Change nothing and quietly pray that there are no malicious users on the system.
Rewrite the functionality to use safer, dedicated form processing functions and avoid hidden form variables.
Use md5() or another mechanism to obscure table names or other sensitive information in hidden form variables. Because md5() is a one-way hash, do not try to decrypt the value on the PHP side; compare it against the hashes of the values you expect.
Obscure the meaning of the values with abbreviations or nicknames, and translate them back in the PHP form processing function. For example, to reference the users table, you could use u or an arbitrary string (such as u8y90x0jkL).
The last two options are not perfect, but they are much better than letting users easily guess your middleware logic or data model.
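The nickname option amounts to a lookup table on the server: the form only ever carries a short key, and anything not in the table is refused. In the sketch below, resolveTable(), the keys, and the hidden field name are hypothetical; the table names come from the examples in this tutorial.

```php
<?php
// Hypothetical nickname-to-table map. Only keys listed here are accepted,
// so a tampered hidden field can never name an arbitrary table.
function resolveTable($nickname)
{
    $allowed = array(
        'u' => 'users',
        'p' => 'page',
    );
    if (!is_string($nickname) || !isset($allowed[$nickname])) {
        return null;
    }
    return $allowed[$nickname];
}

// In a form handler the nickname would come from a hidden field,
// for example $_POST['t'] (an assumed field name).
foreach (array('u', 'users', 'u; drop table users') as $submitted) {
    $table = resolveTable($submitted);
    echo $submitted, ' => ', ($table === null ? 'rejected' : $table), "\n";
}
```

Because the real table name is chosen by server-side code, changing the hidden value with a tool such as Tamper Data can at most select another entry that the developer already allowed.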
What are the remaining problems? Remote form submission.
Remote form submission
The good thing about the Web is that you can share information and services. The bad thing is also that you can share information and services, because some people act without scruples.
Take forms, for example. Anyone can visit a Web site and use File > Save As in the browser to make a local copy of a form. He can then change the action parameter to point to a fully qualified URL (not formHandler.php but http://www.yoursite.com/formhandler.php, because the form lives on that site), make any changes he likes, and click Submit; the server receives the form data as a legitimate communication stream.
You might first consider checking $_SERVER['HTTP_REFERER'] to determine whether the request came from your own server. This blocks most malicious users but does not stop the most sophisticated hackers, who are smart enough to tamper with the referrer information in the header so that the remote copy of the form appears to have been submitted from your server.
A better way to handle remote form submission is to generate a token based on a unique string or timestamp and place that token in both a session variable and the form. When the form is submitted, check whether the two tokens match. If they do not, you know someone is trying to send data from a remote copy of the form.
To create a random token, you can use PHP's built-in md5(), uniqid(), and rand() functions, as shown below:
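Listing 18. Defending against remote form submission
[php]
<?php
session_start();
if (isset($_POST['submit']) && $_POST['submit'] == "go") {
    // check token: both must be present and identical
    if (isset($_POST['token'], $_SESSION['token'])
        && $_POST['token'] === $_SESSION['token']) {
        // strip_tags
        $name = strip_tags($_POST['name']);
        $name = substr($name, 0, 40);
        // clean out any potential hexadecimal characters
        $name = cleanHex($name);
        // continue processing....
    } else {
        // stop all processing! remote form posting attempt!
    }
}
$token = md5(uniqid(rand(), true));
$_SESSION['token'] = $token;

function cleanHex($input) {
    // remove a literal backslash, x or X, and one to three hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
<form action="" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="token" value="<?php echo $token; ?>"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>
[/php]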
This technique works because PHP session data cannot be migrated between servers. Even if someone obtains your PHP source code, moves it to their own server, and submits information to your server, all your server receives is an empty or malformed session token alongside the originally supplied form token. The two do not match, and the remote form submission fails.
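On newer PHP versions, the same token scheme can be sketched with random_bytes() (PHP 7.0 and later) for generation and hash_equals() (PHP 5.6 and later) for comparison. This is a substitution for the md5(), uniqid(), and rand() combination rather than the approach described above, and both helper names are hypothetical.

```php
<?php
// Hypothetical helpers; random_bytes() needs PHP 7.0+, hash_equals() PHP 5.6+.
function newFormToken()
{
    return bin2hex(random_bytes(32));   // 64 hex characters
}

function tokenMatches($sessionToken, $postedToken)
{
    // Both values must be non-empty strings before they are compared.
    return is_string($sessionToken) && is_string($postedToken)
        && $sessionToken !== ''
        && hash_equals($sessionToken, $postedToken);
}

// In a real page these values would live in $_SESSION['token'] and $_POST['token'].
$issued = newFormToken();
var_dump(tokenMatches($issued, $issued));        // bool(true)
var_dump(tokenMatches($issued, 'forged-value')); // bool(false)
```

The explicit type and emptiness checks mean that a request arriving with no token at all can never match a session that has no token either.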
Conclusion
This tutorial has covered a lot of ground:
Use mysql_real_escape_string() to prevent SQL injection.
Use regular expressions and strlen() to make sure GET data has not been tampered with.
Use regular expressions and strlen() to make sure user-submitted data does not overflow memory buffers.
Use strip_tags() and htmlspecialchars() to keep users from submitting potentially harmful HTML tags.
Keep the system from being compromised by tools such as Tamper Data.
Use unique tokens to keep users from submitting forms to the server remotely.
This tutorial does not cover more advanced topics such as file injection, HTTP header spoofing, and other vulnerabilities. However, what you have learned can help you add enough security right away to make your current project safer.