Security rules PHP developers cannot afford to break: filtering user input

Source: Internet
Author: User
As a PHP programmer, and especially as a newcomer, it is easy to know too little about how hostile the Internet can be. Intrusions are hard to handle when you do not know how attackers get in: through upload vulnerabilities, SQL injection, cross-site scripting, and so on. As the most basic precaution, pay attention to everything submitted from outside and treat input handling as your first line of defense, your own firewall.
Rule 1: never trust external data or input
The first thing to realize about Web application security is that external data must not be trusted. External data is any data that is not typed directly into the PHP code by the programmer. Until you have taken steps to make it safe, data from any other source (GET variables, form POST data, databases, configuration files, session variables, or cookies) is untrusted.
For example, the following data elements can be considered safe because they are set in PHP.

The code is as follows:


<?php
$myUsername = 'tmyer';
$arrayUsers = array('tmyer', 'Tom', 'Tommy');
define("GREETING", 'Hello There' . $myUsername);
?>


However, the following data elements are all tainted.
Listing 2. Insecure, tainted code

The code is as follows:


<?php
$myUsername = $_POST['username']; // tainted!
$arrayUsers = array($myUsername, 'Tom', 'Tommy'); // tainted!
define("GREETING", 'Hello There' . $myUsername); // tainted!
?>


Why is the first variable, $myUsername, tainted? Because it comes directly from a form POST. A user can type any string into this input field, including malicious commands that delete files or run previously uploaded files. You might ask, "Can't I avoid this risk with a client-side (JavaScript) form validation script that accepts only the letters A-Z?" That is always a good step, but as you will see later, anyone can download a form to their own machine, modify it, and resubmit whatever they like.
The solution is simple: run cleanup code on $_POST['username']. If you do not, you risk contaminating every other place where $myUsername is used (such as in an array or a constant).
A simple way to clean user input is to run it through a regular expression. In this example, only letters are allowed. It may also be a good idea to limit the string to a specific number of characters, or to require that all letters be lowercase.
Listing 3. Making user input secure

The code is as follows:


<?php
$myUsername = cleanInput($_POST['username']); // clean!
$arrayUsers = array($myUsername, 'Tom', 'Tommy'); // clean!
define("GREETING", 'Hello There' . $myUsername); // clean!
function cleanInput($input) {
    $clean = strtolower($input);
    $clean = preg_replace("/[^a-z]/", "", $clean);
    $clean = substr($clean, 0, 12);
    return $clean;
}
?>


Rule 2: disable PHP settings that make security difficult
You already know that you cannot trust user input. You should also know that you cannot trust the way PHP is configured on the machine. For example, make sure register_globals is disabled. With register_globals enabled, you can do careless things such as using $variable in place of the GET or POST value of the same name. With the setting disabled, PHP forces you to reference the right variable in the right namespace. To use a variable from a form POST, you reference $_POST['variable'], so that it cannot be mistaken for a cookie, session, or GET variable.
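As a rough sketch of reading a value from one explicit source, the helper below takes the source array as a parameter, so it can be exercised without a real request, and falls back to a default when the field is missing or is not a string. The name postString() and its default value are assumptions made for illustration; they do not come from the original article. Note that register_globals was removed entirely in PHP 5.4, so on current PHP versions the explicit form is the only one available.

```php
<?php
// Hypothetical helper: read one field from an explicit source array.
// Pass $_POST (or $_GET, $_COOKIE) so the origin of the value is never ambiguous.
function postString(array $source, string $name, string $default = ''): string
{
    // Missing keys and non-string values (for example arrays sent as name[]=...)
    // fall back to the default.
    if (!isset($source[$name]) || !is_string($source[$name])) {
        return $default;
    }
    return $source[$name];
}

// In a real request handler this would be: postString($_POST, 'username')
$username = postString(['username' => 'tmyer'], 'username');
```

The returned value is still tainted; the helper only makes its origin explicit, and cleaning remains a separate step.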
Rule 3: If you cannot understand it, you cannot protect it.
Some developers use strange syntax, or pack statements tightly to produce short but ambiguous code. This may be efficient, but if you do not understand what the code is doing, you cannot decide how to protect it.
For example, which of the following two pieces of code do you prefer?
Listing 4. Making code easy to protect

The code is as follows:


<?php
// Obfuscated code
$input = (isset($_POST['username']) ? $_POST['username'] : '');
// Unobfuscated code
$input = '';
if (isset($_POST['username'])) {
    $input = $_POST['username'];
} else {
    $input = '';
}
?>


In the second, clearer code segment, it is easy to see that $input is tainted and needs to be cleaned before it can be processed safely.
Rule 4: "Defense in depth" is your new mantra
This tutorial uses examples to show how to protect online forms while also taking the necessary measures in the PHP code that processes them. Likewise, even if you use a PHP regular expression to ensure that a GET variable is entirely numeric, you can still make sure the SQL query escapes the user input.
Defense in depth is not just a good idea; it keeps you out of serious trouble.
Now that we have covered the basic rules, let's look at the first threat: SQL injection attacks.
Prevent SQL injection attacks
In an SQL injection attack, a user manipulates a form or GET query string to add information to a database query. For example, assume there is a simple login database. Each record in it has a username field and a password field. You create a login form to let users log in.
Listing 5. Simple login form

The code is as follows:




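<html>
<head>
<title>Login</title>
</head>
<body>
<form action="verify.php" method="post">
<p><label for="user">Username</label>
<input type="text" name="user" id="user"/></p>
<p><label for="pw">Password</label>
<input type="password" name="pw" id="pw"/></p>
<p><input type="submit" value="login"/></p>
</form>
</body>
</html>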






This form accepts the user name and password entered by the user and submits them to the file verify.php. In that file, PHP processes the data from the login form, as shown below:
Listing 6. Insecure PHP form processing code

The code is as follows:


<?php
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where username='" . $username . "' and password='" . $pw . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)) {
    if ($data->ctr == 1) {
        // They're okay to enter the application!
        $okay = 1;
    }
}
if ($okay) {
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
} else {
    header("Location: login.php");
}
?>


This code looks okay, right? Hundreds or even thousands of PHP/MySQL sites around the world use code like it. So what is wrong? Remember: user input cannot be trusted. Nothing that comes from the user is escaped here, so the application is vulnerable, specifically to SQL injection.
For example, if a user enters foo as the user name and ' or '1'='1 as the password, the following string is built in PHP and then passed to MySQL as the query:

The code is as follows:


<?php
$sql = "select count(*) as ctr from users where username='foo' and password='' or '1'='1' limit 1";
?>


This query always returns a count of 1, so PHP allows access. By injecting some malicious SQL at the end of the password string, an attacker can pose as a legitimate user.
To solve this problem, wrap every piece of user input in PHP's built-in mysql_real_escape_string() function. This function escapes special characters in a string, such as apostrophes, so that MySQL treats them as data instead of acting on them. Listing 7 shows the code with escaping applied.
Listing 7. Safe PHP form processing code

The code is as follows:


<?php
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where username='" . mysql_real_escape_string($username) . "' and password='" . mysql_real_escape_string($pw) . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)) {
    if ($data->ctr == 1) {
        // They're okay to enter the application!
        $okay = 1;
    }
}
if ($okay) {
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
} else {
    header("Location: login.php");
}
?>


Wrapping user input in mysql_real_escape_string() keeps malicious SQL in that input from being injected. If a user tries to pass a malformed password through SQL injection, the following query is passed to the database:
select count(*) as ctr from users where username='foo' and password='\' or \'1\'=\'1' limit 1
Nothing in the database matches such a password. One simple step has closed a major hole in the Web application. The lesson here is that user input to SQL queries should always be escaped.
However, several security holes still need to be closed. The next one is manipulation of GET variables.
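The mysql_* functions used in these listings were removed in PHP 7.0. On current PHP, the same goal is usually reached with prepared statements, which send the query text and the user-supplied values separately. The following is only a minimal sketch under stated assumptions: an in-memory SQLite database (via PDO) stands in for the MySQL users table, the helper name loginOkay() is hypothetical, and passwords are stored in plain text only to mirror the article's example (a real application would store hashes, for instance with password_hash()).

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the MySQL "users" table.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (username TEXT, password TEXT)');
$db->exec("INSERT INTO users (username, password) VALUES ('foo', 'secret')");

// Hypothetical helper: count matching rows using bound parameters.
function loginOkay(PDO $db, string $username, string $pw): bool
{
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM users WHERE username = :username AND password = :pw'
    );
    // The values travel as data; they are never spliced into the SQL text.
    $stmt->execute([':username' => $username, ':pw' => $pw]);
    return (int) $stmt->fetchColumn() === 1;
}

$legit    = loginOkay($db, 'foo', 'secret');       // matches the stored row
$injected = loginOkay($db, 'foo', "' or '1'='1");  // treated as a literal password
```

With bound parameters, the injected string is compared as an ordinary (and wrong) password, so the second call does not log the user in.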
Prevent users from manipulating GET variables
In the previous section, you prevented users from logging in with malformed passwords. If you are smart, you will apply what you learned to make sure that all user input in SQL statements is escaped.
But now the user is safely logged in. Having a valid password does not mean a user will play by the rules; there are many opportunities to cause damage. For example, an application may let users view special content, with all links pointing to locations such as template.php?pid=33 or template.php?pid=321. The part of the URL after the question mark is called the query string. Because the query string is placed directly in the URL, it is also called the GET query string.
With register_globals disabled in PHP, you access this value as $_GET['pid']. On the template.php page, you might do something similar to Listing 8.
Listing 8. Sample template.php

The code is as follows:


<?php
$pid = $_GET['pid'];
// We create an object of a fictional class Page
$obj = new Page;
$content = $obj->fetchPage($pid);
// And now we have a bunch of PHP that displays the page
?>


What is wrong here? First, the code implicitly trusts that the GET variable pid from the browser is safe. What could happen? Most users are not savvy enough to construct a semantic attack. But if they notice pid=33 in the browser's URL field, they may start to experiment. If they enter another number, that is probably fine; but what happens if they enter something else, such as an SQL command, the name of a file (such as /etc/passwd), or a value 3,000 characters long?
In this case, remember the basic rule: do not trust user input. The application developer knows that the pid accepted by template.php should be a number, so PHP's is_numeric() function can be used to reject a non-numeric pid, as shown below:
Listing 9. Using is_numeric() to restrict GET variables

The code is as follows:


<?php
$pid = $_GET['pid'];
if (is_numeric($pid)) {
    // We create an object of a fictional class Page
    $obj = new Page;
    $content = $obj->fetchPage($pid);
    // And now we have a bunch of PHP that displays the page
} else {
    // Didn't pass the is_numeric() test, do something else!
}
?>


This approach seems valid, but all of the following inputs pass the is_numeric() check easily:
100 (valid)
100.1 (should not have decimal places)
+0123.45e6 (scientific notation, not good)
0xff33669f (hexadecimal, dangerous; accepted by is_numeric() only before PHP 7)
So what should a security-conscious PHP developer do? Years of experience show that the best practice is to use a regular expression to ensure that the entire GET variable consists of digits, as shown below:
Listing 10. Using regular expressions to restrict GET variables

The code is as follows:


<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    if (!ereg("^[0-9]+$", $pid)) {
        // Do something appropriate, like maybe logging them out or sending them back to the home page
    }
} else {
    // Empty $pid, so send them back to the home page
}
// We create an object of a fictional class Page, which is now
// moderately protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
// And now we have a bunch of PHP that displays the page
?>


All you need to do is use strlen() to check whether the variable has a non-zero length; if it does, an all-digit regular expression makes sure the data element is valid. If the pid contains letters, slashes, dots, or anything resembling hexadecimal, this routine catches it and blocks the page from the user's activity. If you look behind the scenes of the Page class, you will see that the security-conscious PHP developer has also escaped the user input $pid, protecting the fetchPage() method, as shown below:
Listing 11. Escaping inside the fetchPage() method

The code is as follows:


<?php
class Page {
    function fetchPage($pid) {
        $sql = "select pid, title, desc, kw, content, status from page where pid='" . mysql_real_escape_string($pid) . "'";
    }
}
?>


You may ask, "If I have already made sure the pid is numeric, why escape it as well?" Because you do not know in how many different contexts and situations fetchPage() will be used. It must be protected everywhere the method is called, and escaping inside the method is what defense in depth means.
What happens if a user tries to enter a very long value, say 1,000 characters, in an attempt to start a buffer overflow attack? The next section discusses this in more detail, but for now you can add another check to make sure the pid has the correct length. You know that the pid field in the database holds at most five digits, so you can add the following check.
Listing 12. Using regular expressions and length checks to restrict GET variables

The code is as follows:


<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    // Reject the value if it is not all digits OR if it is longer than five digits
    if (!ereg("^[0-9]+$", $pid) || strlen($pid) > 5) {
        // Do something appropriate, like maybe logging them out or sending them back to the home page
    }
} else {
    // Empty $pid, so send them back to the home page
}
// We create an object of a fictional class Page, which is now
// even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
// And now we have a bunch of PHP that displays the page
?>


Now nobody can cram a 5,000-digit value into your database application, at least not through the GET string. Imagine an attacker gritting their teeth in frustration while trying to break into your application! And with error reporting turned off, reconnaissance becomes harder too.
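The ereg() function used in these listings was removed in PHP 7.0; preg_match() is its usual replacement. The sketch below folds the digit check and the length check into one pattern and returns null for anything invalid, so the caller cannot accidentally carry on with a bad value. The helper name validPid() and its null-on-failure convention are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: accept only 1 to $maxLen ASCII digits, otherwise return null.
function validPid($raw, int $maxLen = 5): ?int
{
    if (!is_string($raw)) {
        return null; // for example, pid[]=1 arrives as an array
    }
    // \A and \z anchor the whole string; unlike $, \z does not tolerate a trailing newline.
    if (preg_match('/\A[0-9]{1,' . $maxLen . '}\z/', $raw) !== 1) {
        return null;
    }
    return (int) $raw;
}

// In template.php this might be used as:
//   $pid = validPid($_GET['pid'] ?? null);
//   if ($pid === null) { /* send the user back to the home page and stop */ }
$pid = validPid('33');
```

Each of the troublesome inputs listed earlier (100.1, +0123.45e6, 0xff33669f) fails this check, as does any value longer than five digits.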
Buffer overflow attacks
A buffer overflow attack attempts to overflow a memory allocation buffer in a PHP application (or, more precisely, in Apache or the underlying operating system). Remember that you may be writing your Web application in a high-level language such as PHP, but ultimately it calls into C (in the case of Apache). Like most low-level languages, C has strict rules for memory allocation.
A buffer overflow attack sends so much data to a buffer that part of it spills into adjacent memory, corrupting that memory or overwriting program logic. This can cause a denial of service, corrupt data, or execute malicious code on the remote server.
The only way to prevent buffer overflow attacks is to check the length of all user input. For example, if a form element asks for the user's name, add a maxlength attribute with a value of 40 to the field and enforce the same limit with substr() on the backend. Listing 13 shows a brief example of the backend check.
Listing 13. Checking the length of user input

The code is as follows:


<?php
if ($_POST['submit'] == "go") {
    $name = substr($_POST['name'], 0, 40);
}
?>


Why provide both the maxlength attribute and the substr() check on the backend? Because defense in depth is always good. The browser keeps users from entering overly long strings that PHP or MySQL cannot handle safely (imagine someone trying to enter a 1,000-character name), while the backend PHP check makes sure nobody is manipulating the form data remotely or in the browser.
As you can see, this is similar to using strlen() in the previous section to check the length of the GET variable pid. In that example, any input value longer than five digits was rejected, but you could just as easily truncate the value to an appropriate length, as shown below:
Listing 14. Changing the length of the input GET variable

The code is as follows:


<?php
$pid = $_GET['pid'];
if (strlen($pid)) {
    if (!ereg("^[0-9]+$", $pid)) {
        // If non-numeric $pid, send them back to the home page
    }
} else {
    // Empty $pid, so send them back to the home page
}
// We have a numeric pid, but it may be too long, so let's check
if (strlen($pid) > 5) {
    $pid = substr($pid, 0, 5);
}
// We create an object of a fictional class Page, which is now
// even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
// And now we have a bunch of PHP that displays the page
?>


Note that buffer overflow attacks are not limited to long strings of digits or letters. You may also see long hexadecimal strings (which often look like \xA3 or \xFF). Remember, the purpose of any buffer overflow attack is to flood a specific buffer and place malicious code or instructions in the next one, corrupting data or executing malicious code. The simplest way to deal with hexadecimal buffer overflows is, again, not to allow input longer than a certain length.
If a form text area is allowed a long entry in the database, you cannot easily limit the data length on the client. Once the data reaches PHP, you can use a regular expression to strip out anything that looks like a hexadecimal string.
Listing 15. Preventing hexadecimal strings

The code is as follows:


<?php
if ($_POST['submit'] == "go") {
    $name = substr($_POST['name'], 0, 40);
    // Clean out any potential hexadecimal characters
    $name = cleanHex($name);
    // Continue processing....
}
function cleanHex($input) {
    // Matches a literal backslash, then x or X, then one to three hex digits
    $clean = preg_replace("![\\\\][xX]([A-Fa-f0-9]{1,3})!", "", $input);
    return $clean;
}
?>


You may find this series of operations a bit too strict. After all, hexadecimal strings have legitimate uses, such as producing characters in a foreign language. How you deploy the hexadecimal regex is up to you. A better strategy is to delete hexadecimal strings only when a line contains too many of them, or when the string exceeds a certain number of characters (such as 128 or 255).
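One way the "only when there are too many" strategy might look is sketched below. The helper name cleanHexIfExcessive() and the threshold of 8 sequences are arbitrary assumptions chosen for illustration; a suitable limit depends on the kind of text a field is expected to hold.

```php
<?php
// Hypothetical helper: strip \xNN escape sequences only when there are suspiciously many.
function cleanHexIfExcessive(string $input, int $maxSequences = 8): string
{
    // In a single-quoted PHP string, '\\\\' is two backslashes, which the regex
    // engine reads as one literal backslash.
    $pattern = '/\\\\[xX][0-9A-Fa-f]{1,2}/';
    if (preg_match_all($pattern, $input) > $maxSequences) {
        // preg_replace() returns null only on a regex error; keep the input in that case.
        return preg_replace($pattern, '', $input) ?? $input;
    }
    return $input;
}

$few  = cleanHexIfExcessive('caf\xE9 menu');                  // one sequence: left alone
$many = cleanHexIfExcessive(str_repeat('\x90', 20) . 'end');  // twenty sequences: stripped
```

Input with an occasional escape sequence passes through unchanged, while a long run of them is removed.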
Cross-site scripting
In a cross-site scripting (XSS) attack, a malicious user typically enters information in a form (or through some other user input mechanism) that inserts malicious client-side markup into the process or database. For example, assume the site has a simple guestbook program that lets visitors leave a name, an email address, and a short message. A malicious user can use it to insert something other than a short message, such as pictures that are inappropriate for other users, JavaScript that redirects users to another site, or code that steals cookie information.
Fortunately, PHP provides the strip_tags() function, which removes HTML tags (anything enclosed in angle brackets) from a string. strip_tags() also accepts a list of allowed tags, such as <b> or <i>.
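A minimal sketch of how strip_tags() might be applied to a guestbook message follows. The helper names cleanMessage() and escapeForHtml() and the 200-character limit are assumptions for illustration. The second helper uses htmlspecialchars(), a different technique from the one described above: it escapes markup at output time instead of removing it.

```php
<?php
// Hypothetical helper: prepare a guestbook message for storage.
function cleanMessage(string $raw, int $maxLen = 200): string
{
    // Remove every tag except <b> and <i>; the text between tags is kept.
    $text = strip_tags($raw, '<b><i>');
    return substr($text, 0, $maxLen);
}

// Escaping at output time turns any remaining markup into inert text.
function escapeForHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$message = cleanMessage('<script>alert(1)</script>Hi <b>there</b>');
$escaped = escapeForHtml('<img src=x onerror="alert(1)">');
```

Note that strip_tags() leaves the attributes of allowed tags untouched, so an allowed tag can still carry an event handler attribute. Keeping the allowed list empty, or escaping on output, avoids that gap.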
Data manipulation in the browser
Browser plug-ins exist that let users tamper with the headers and form elements on a page. With Tamper Data (a Mozilla plug-in), it is easy to manipulate even simple forms containing many hidden text fields and so send instructions to PHP and MySQL.
Before clicking Submit on a form, the user can start Tamper Data. On submission, they see a list of the form's data fields. Tamper Data lets them change that data, and then the browser completes the form submission.
Let's go back to the example built earlier. It already checks string length, strips HTML tags, and removes hexadecimal characters. Now, however, some hidden text fields have been added, as shown below:
Listing 17. Hidden variables

The code is as follows:


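<?php
if ($_POST['submit'] == "go") {
    // Strip tags
    $name = strip_tags($_POST['name']);
    $name = substr($name, 0, 40);
    // Clean out any potential hexadecimal characters
    $name = cleanHex($name);
    // Continue processing....
}
function cleanHex($input) {
    // Matches a literal backslash, then x or X, then one to three hex digits
    $clean = preg_replace("![\\\\][xX]([A-Fa-f0-9]{1,3})!", "", $input);
    return $clean;
}
?>
<form method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="table" value="users"/>
<input type="hidden" name="action" value="create"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>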


Notice that one of the hidden variables exposes the table name: users. You can also see an action field with the value create. Anyone with basic SQL experience can see that these commands probably control an SQL engine in the middleware. To do serious damage, someone only has to change the table name or supply another option, such as delete.
What problem remains? Remote form submission.
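One possible defense against tampered hidden fields is to compare them with a server-side list of permitted values before acting on them. The sketch below assumes hidden fields named table and action and a hypothetical helper hiddenFieldsOkay(); the permitted values are taken from the example above. Where possible, it is safer still not to send such values to the browser at all.

```php
<?php
// Hypothetical helper: returns true only when both hidden fields hold an expected value.
// The default allowlists are assumptions based on the example form.
function hiddenFieldsOkay(
    array $post,
    array $allowedTables = ['users'],
    array $allowedActions = ['create']
): bool {
    $table  = $post['table']  ?? null;
    $action = $post['action'] ?? null;
    // The third argument requests strict comparison, avoiding loose type juggling.
    return in_array($table, $allowedTables, true)
        && in_array($action, $allowedActions, true);
}

// In a real handler this would be: hiddenFieldsOkay($_POST)
$untouched = hiddenFieldsOkay(['table' => 'users', 'action' => 'create']);
$tampered  = hiddenFieldsOkay(['table' => 'users', 'action' => 'delete']);
```

A request whose action was changed to delete fails the check, so the handler can stop before any query is built.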
Remote form submission
The advantage of the Web is that information and services can be shared. The downside is also that information and services can be shared, because some people do things without any scruples.
Take forms as an example. Anyone can visit a Web site and use File > Save As in the browser to create a local copy of a form. They can then change the action parameter to point to a fully qualified URL (not formHandler.php but http://www.yoursite.com/formhandler.php, because that is where the form handler lives), make any other changes they want, and click Submit. The server receives the form data as a legitimate stream of communication.
Your first thought may be to check $_SERVER['HTTP_REFERER'] to determine whether the request came from your own server. This blocks most malicious users but not the most sophisticated attackers, who are smart enough to tamper with the referrer information in the header so that the remote copy of the form appears to have been submitted from your server.
A better way to handle remote form submission is to generate a token from a unique string or timestamp and place it in both a session variable and the form. When the form is submitted, check whether the two tokens match. If they do not, you know that someone is trying to send data from a remote copy of the form.
To create a random token, you can use PHP's built-in md5(), uniqid(), and rand() functions, as shown below:
Listing 18. Defending against remote form submission

The code is as follows:


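<?php
session_start();
if ($_POST['submit'] == "go") {
    // Check token
    if ($_POST['token'] == $_SESSION['token']) {
        // Strip tags
        $name = strip_tags($_POST['name']);
        $name = substr($name, 0, 40);
        // Clean out any potential hexadecimal characters
        $name = cleanHex($name);
        // Continue processing....
    } else {
        // Stop all processing! Remote form posting attempt!
    }
}
// Issue a fresh token for the next submission
$token = md5(uniqid(rand(), true));
$_SESSION['token'] = $token;
function cleanHex($input) {
    // Matches a literal backslash, then x or X, then one to three hex digits
    $clean = preg_replace("![\\\\][xX]([A-Fa-f0-9]{1,3})!", "", $input);
    return $clean;
}
?>
<form method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="token" value="<?php echo $token; ?>"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>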


This technique works because session data in PHP cannot be migrated between servers. Even if someone obtains your PHP source code, moves it to their own server, and submits information to your server, your server receives only an empty or malformed session token alongside the form token supplied earlier. The two do not match, so the remote form submission fails.
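On PHP 7 and later, the token can instead be drawn from the operating system's secure random source and compared in constant time. The sketch below swaps random_bytes() and hash_equals() in for md5(uniqid(rand(), true)) and ==. The helper names issueToken() and tokenMatches() are hypothetical, and a plain array stands in for $_SESSION so that the code runs without a real request.

```php
<?php
// Hypothetical helpers; $session stands in for $_SESSION.
function issueToken(array &$session): string
{
    // 32 random bytes from the OS CSPRNG, hex-encoded to 64 characters.
    $session['token'] = bin2hex(random_bytes(32));
    return $session['token'];
}

function tokenMatches(array $session, array $post): bool
{
    if (!isset($session['token'], $post['token']) || !is_string($post['token'])) {
        return false;
    }
    // hash_equals() compares in constant time, so the comparison does not
    // reveal how many leading characters matched.
    return hash_equals($session['token'], $post['token']);
}

$session = [];
$token   = issueToken($session); // embed this value in a hidden form field
```

A submission from a remote copy of the form carries either no token or a stale one, so tokenMatches() returns false and processing can stop.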
