A Closer Look at Wide-Byte Injection and the Principle Behind Its Fix

Source: Internet
Author: User

0. Preface

Recently, while automating the collection of PHP vulnerabilities and sorting through the injection cases, I realized I did not really understand how the wide-byte injection caused by iconv works. The articles online did not explain it clearly either. So I read an earlier post by Rongo (LXSEC), /8611.html, and after a discussion between the two of us, ended up with this in-depth study.

1. Overview

Wide-byte injection arises mainly from the use of a wide-byte (multi-byte) character encoding such as GBK.

What is a character set?

A character set is the mapping between the character glyph a computer displays and the binary encoding used when that character is stored.

In ASCII, for example, the glyph A corresponds to the encoding 01000001 (65).
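The mapping can be inspected with PHP's built-in string functions. This is a minimal sketch; the GBK bytes E5 5C are written out by hand (they are the double-byte character the article returns to in section 5), so no conversion library is involved.

```php
<?php
// A character set maps a displayed character to the bytes stored for it.
$char = 'A';

// In ASCII (and in UTF-8 and GBK, which share the ASCII range) 'A' is one byte.
$code = ord($char);                 // numeric value of the byte
$bits = sprintf('%08b', $code);     // the same value as 8 binary digits
echo "$char => $code => $bits\n";

// A wide-byte encoding such as GBK stores a Chinese character in two bytes.
// The pair E5 5C is entered by hand here; note that its second byte has the
// same value as the ASCII backslash.
$gbkBytes = "\xe5\x5c";
echo strlen($gbkBytes), ' bytes: ', bin2hex($gbkBytes), "\n";
var_dump($gbkBytes[1] === '\\');
```

The last line is the seed of the whole problem: a byte-oriented function sees 0x5C as a backslash, while a GBK-aware reader sees it as half of a character.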

For MySQL, character sets come into play in two places, storage and transmission:

(1) Which encoding the data is stored in on the server

(2) Which encoding is used to transmit data when the client interacts with the server

2. MySQL server-side storage character sets

When data is stored on the MySQL server, a character set can be specified at the following levels:

(1) Server character set (character_set_server)

(2) Database character set

(3) Table character set

(4) Column character set

Precedence, from most to least specific: column > table > database > server. A level that does not declare a character set inherits it from the level above.

The corresponding syntax is:

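CREATE TABLE test (
    name   VARCHAR(20) CHARSET gbk,  -- column-level charset overrides the table default
    number VARCHAR(20),              -- no charset given: inherits the table's utf8
    age    INT                       -- lengths and the INT type are illustrative; the originals were lost
) ENGINE=InnoDB CHARSET=utf8;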
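The precedence rule amounts to "take the most specific level that declares a charset". The sketch below models only that rule; `effective_charset` is a hypothetical helper, and the server default `latin1` is an assumed value, not something the table definition above specifies.

```php
<?php
// Toy model of charset precedence: column > table > database > server.
// null means "this level does not declare a charset".
function effective_charset(?string $column, ?string $table, ?string $database, string $server): string
{
    return $column ?? $table ?? $database ?? $server;
}

// Columns from the CREATE TABLE example (table default utf8).
$nameCharset   = effective_charset('gbk', 'utf8', null, 'latin1'); // column override wins
$numberCharset = effective_charset(null, 'utf8', null, 'latin1');  // falls back to the table
echo "name: $nameCharset, number: $numberCharset\n";
```

In MySQL the defaults are resolved when the table is created, so this is a model of the outcome, not of the server's internals.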

3. Character sets for data transmitted between client and server

The storage character set is fixed once the data is stored and does not affect the character sets used while the client and server interact.

MySQL also has an intermediate layer responsible for the connection between the client and the server, called the connection layer.

The interaction proceeds as follows:

(1) The client sends the server an SQL statement generated in some character set. That character set is effectively arbitrary; when PHP is the MySQL client, it is the default encoding of the PHP file.

(2) The server converts the statement into the character set of the connection layer. But how does MySQL know which encoding the statement is in? It relies on two internal variables: character_set_client (the client's character set) and character_set_connection (the connection layer's character set). You can inspect them with SHOW VARIABLES LIKE 'character_set_%';

In the author's environment, both the client character set and the connection-layer character set were GBK.

When the two match, nothing goes wrong; when they differ, the text comes out garbled.

These variables can be changed with MySQL's SET command. For example, to switch the client character set to UTF-8 (which MySQL names utf8):

SET character_set_client = utf8;

(3) The server converts the statement into its internal character set (that of the stored data) and then operates on the stored data.

(4) After processing, the server converts the results into an encoding the client can recognize (GBK in the example above) and returns them. character_set_results determines which encoding is returned to the client.

The SET NAMES utf8 statement commonly issued from PHP is equivalent to executing the following three statements at once:

(1) SET character_set_client = utf8

(2) SET character_set_connection = utf8

(3) SET character_set_results = utf8
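Whether SET NAMES took effect can be checked by comparing the three variables it sets. The helper below is a hypothetical sketch that works on name/value pairs as returned by SHOW VARIABLES LIKE 'character_set_%'; the sample values are assumed, not captured from a real server.

```php
<?php
// Returns true when the three variables changed by SET NAMES all agree.
// $vars maps Variable_name => Value. With a live mysqli connection $db it
// could be filled like this:
//   foreach ($db->query("SHOW VARIABLES LIKE 'character_set_%'") as $row) {
//       $vars[$row['Variable_name']] = $row['Value'];
//   }
function names_consistent(array $vars): bool
{
    $keys = ['character_set_client', 'character_set_connection', 'character_set_results'];
    $values = [];
    foreach ($keys as $key) {
        if (!isset($vars[$key])) {
            return false; // cannot confirm consistency if a variable is missing
        }
        $values[] = strtolower($vars[$key]);
    }
    return count(array_unique($values)) === 1;
}

// Assumed values after SET NAMES gbk; character_set_server is not touched by it.
$afterSetNames = [
    'character_set_client'     => 'gbk',
    'character_set_connection' => 'gbk',
    'character_set_results'    => 'gbk',
    'character_set_server'     => 'utf8',
];
var_dump(names_consistent($afterSetNames));
```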

4. Why text gets garbled

Setting all three character sets to the same value is what prevents garbled text on the database side. A web page can still appear garbled because a PHP script prints data to the browser, and the browser decodes it with a character set of its own: if the encoding of the PHP response matches the browser's, the text displays correctly; otherwise it is garbled. In PHP you can declare the encoding of the response with header().
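A small sketch of the mismatch: the same idea a browser applies, checked with PCRE, whose /u modifier makes preg_match() return false on input that is not valid UTF-8. The byte strings are hand-written examples taken from later in the article (E9 8C A6 in UTF-8, E5 5C in GBK).

```php
<?php
// Bytes that are valid UTF-8 versus bytes that are only meaningful as GBK.
$utf8Bytes = "\xe9\x8c\xa6";
$gbkBytes  = "\xe5\x5c";

// preg_match() with /u returns false when the subject is not valid UTF-8,
// which is the situation where a UTF-8 page would show mojibake.
var_dump(preg_match('//u', $utf8Bytes));
var_dump(preg_match('//u', $gbkBytes));

// A script that really emits GBK should say so before any output, e.g.:
// header('Content-Type: text/html; charset=GBK');
```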

5. How wide-byte injection works

There are three scenarios:

(1) Scenario one: PHP calls mysql_query("SET NAMES gbk"), which sets all three character sets (client, connection layer, results) to GBK.

Scenario code:

mysql_query("SET NAMES gbk");                 // client, connection and results are all GBK
$bar = addslashes($_GET['bar']);              // escapes ' and \ byte by byte, unaware of GBK
$sql = "SELECT password FROM user WHERE bar='{$bar}'";
$res = mysql_query($sql);

Submitted request: Http://

At this point, the following transformation occurs:

%df%27 ==(addslashes)==> %df%5c%27 ==(GBK)==> 運'

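Substituted into the SQL statement, this gives:

SELECT password FROM user WHERE bar='運''

Because GBK reads %df%5c as the single character 運, the backslash disappears and the quote that follows it closes the string literal, so anything submitted after %df%27 is parsed as SQL rather than data. To avoid this, websites typically switch to UTF-8 and then escape and filter input. However, careless character set conversion can still reintroduce the vulnerability.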
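The difference between the byte view and the GBK view can be reproduced offline. The sketch below is a toy model, not MySQL's real lexer: `literal_end` is a hypothetical function that returns where a single-quoted literal ends, using the GBK byte ranges (lead 0x81-0xFE, trail 0x40-0xFE except 0x7F) when `$gbk` is true. An extra byte `x` is appended to the submitted %df%27 so the literal boundary is visible.

```php
<?php
// Where does the literal that starts at $open end? Returns null if it never closes.
function literal_end(string $sql, int $open, bool $gbk): ?int
{
    $n = strlen($sql);
    $i = $open + 1;
    while ($i < $n) {
        $b = ord($sql[$i]);
        if ($gbk && $b >= 0x81 && $b <= 0xFE && $i + 1 < $n) {
            $t = ord($sql[$i + 1]);
            if ($t >= 0x40 && $t <= 0xFE && $t !== 0x7F) {
                $i += 2; // whole double-byte character: its 2nd byte is never a backslash
                continue;
            }
        }
        if ($sql[$i] === '\\') {
            $i += 2; // backslash escapes the following byte
            continue;
        }
        if ($sql[$i] === "'") {
            if ($i + 1 < $n && $sql[$i + 1] === "'") {
                $i += 2; // '' is an escaped quote inside the literal
                continue;
            }
            return $i; // unescaped quote closes the literal
        }
        $i++;
    }
    return null;
}

// Scenario one: %df%27 plus an extra 'x', escaped with addslashes().
$bar    = addslashes("\xdf'x");
$prefix = "SELECT password FROM user WHERE bar='";
$sql    = $prefix . $bar . "'";
$open   = strlen($prefix) - 1;

$endBytewise = literal_end($sql, $open, false); // the template's closing quote
$endGbk      = literal_end($sql, $open, true);  // the injected quote; 'x' is now SQL
echo bin2hex($bar), ' ', $endBytewise, ' ', $endGbk, "\n";
```

Read byte by byte, the escaped input stays inside the literal; read as GBK, the literal closes two bytes into the user input.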

(2) Scenario two:

The connection is set to UTF-8 with SET NAMES utf8, and input is escaped with an escape function. Sometimes, to avoid garbled text, the application first converts GBK characters submitted by the user to UTF-8 with iconv (or mb_convert_encoding) and then splices them into the SQL statement.

Scenario code:

mysql_query("SET NAMES utf8");
$bar = iconv("GBK", "UTF-8", addslashes($_GET['bar']));  // escape first, then convert GBK -> UTF-8
$sql = "SELECT password FROM user WHERE bar='{$bar}'";
$res = mysql_query($sql);

To keep the character set of the SQL statement consistent, applications commonly convert input with iconv or similar functions. The problem lies in the GBK-to-UTF-8 conversion.

Submitted request: Http://

Transformation (E5 5C converts to UTF-8 as E9 8C A6):

e55c27 ==(addslashes)==> e55c5c5c27 ==(iconv)==> e98ca65c5c27

The conversion absorbs the first 5C into the character, so the remaining extra 5C escapes the escape character (the backslash) itself, and the trailing %27 takes effect as a real quote.
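The same transformation can be replayed with PHP's own addslashes() and iconv(). This sketch assumes the iconv extension is available with GBK support; it counts the backslashes in front of the quote, where an even count means they only escape each other.

```php
<?php
// Scenario two: escape the raw GBK bytes first, convert to UTF-8 afterwards.
$input   = "\xe5\x5c\x27";                   // %e5%5c%27 as submitted
$escaped = addslashes($input);               // both 5c and 27 get a backslash
$utf8    = iconv('GBK', 'UTF-8', $escaped);  // e5 5c becomes one character (e9 8c a6 per the article)

// Count consecutive backslashes directly before the last quote.
$quote   = strrpos($utf8, "'");
$slashes = 0;
while ($quote - $slashes - 1 >= 0 && $utf8[$quote - $slashes - 1] === '\\') {
    $slashes++;
}

echo bin2hex($escaped), ' -> ', bin2hex($utf8), " ($slashes backslashes)\n";
```

With two backslashes the pair collapses to one literal backslash in MySQL and the quote is left unescaped.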


(3) Scenario three: iconv converts input from UTF-8 to GBK, and SET NAMES sets the connection to GBK. Submit %e9%8c%a6.

This scenario requires that conversion happen first and escaping afterwards:

e98ca6 ==(iconv)==> e55c ==(addslashes)==> e55c5c

Because E5 5C forms one GBK character, the added 5C stands alone as an extra backslash that can be exploited in the same way. The details are omitted here because the conditions for this variant are more restrictive.
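A byte-level replay of scenario three, again assuming iconv with GBK support. The query template is the article's; only the bytes are inspected, nothing is sent to a server.

```php
<?php
// Scenario three: convert first (UTF-8 -> GBK), escape afterwards.
$input   = "\xe9\x8c\xa6";                  // %e9%8c%a6 as submitted (UTF-8)
$gbk     = iconv('UTF-8', 'GBK', $input);   // e5 5c according to the article
$escaped = addslashes($gbk);                // addslashes sees a lone 5c and doubles it

// On a GBK connection, e5 5c is read as one character, leaving the added
// 5c as a stray backslash right before the template's closing quote.
$sql = "SELECT password FROM user WHERE bar='" . $escaped . "'";
echo bin2hex(substr($sql, -4)), "\n";
```

In this query the stray backslash escapes the closing quote, so the literal keeps running into whatever text follows it.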

6. Mitigation

The best fix when a wide-byte encoding is in use is:

(1) Use mysql_set_charset('gbk') to specify the character set

(2) Use mysql_real_escape_string to escape input

Unlike addslashes, mysql_real_escape_string takes the current character set into account, so it does not create the problem of E5 and 5C being joined into one wide character. But how is this "current character set" determined?

It is the one specified with mysql_set_charset. A SET NAMES query only changes the server-side variables, whereas mysql_set_charset also informs the client library, which is what mysql_real_escape_string consults.

The two conditions must be met together; neither works without the other.
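The mysql_* functions were removed in PHP 7.0, so the sketch below expresses the same two steps with their mysqli counterparts, mysqli::set_charset() and mysqli::real_escape_string(). The function names and connection parameters are hypothetical, and running the usage lines requires a reachable MySQL server.

```php
<?php
// (1) Set the charset through the client API, once, right after connecting.
//     Unlike a "SET NAMES gbk" query, this also tells the client library which
//     charset real_escape_string() must assume.
function open_gbk_connection(string $host, string $user, string $pass, string $dbName): mysqli
{
    $db = new mysqli($host, $user, $pass, $dbName);
    if ($db->connect_errno) {
        throw new RuntimeException('connect failed: ' . $db->connect_error);
    }
    if (!$db->set_charset('gbk')) {
        throw new RuntimeException('set_charset failed: ' . $db->error);
    }
    return $db;
}

// (2) Escape with the charset-aware function instead of addslashes().
function find_password(mysqli $db, string $bar): ?string
{
    $escaped = $db->real_escape_string($bar);
    $result  = $db->query("SELECT password FROM user WHERE bar='{$escaped}'");
    if ($result === false) {
        throw new RuntimeException('query failed: ' . $db->error);
    }
    $row = $result->fetch_row();
    return is_array($row) ? $row[0] : null;
}

// Usage (credentials are placeholders):
// $db = open_gbk_connection('localhost', 'app', 'secret', 'test');
// var_dump(find_password($db, $_GET['bar'] ?? ''));
```

Keeping the charset call inside the connection helper means every later escape on that connection sees the same charset, which is exactly the "and" relationship described above.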



The effect is obvious.
