Character Encoding and SQL Injection in White-Box Auditing

Source: Internet
Author: User
Tags sql error



Although everyone is now urged to use Unicode, and websites are converging on UTF-8 as a single international standard, many CMSs at home and abroad (especially in non-English-speaking countries) still default to a national encoding such as GBK. Some CMSs also ship both a GBK edition and a UTF-8 edition to accommodate existing users.



We will use GBK for the demonstrations, so let's raise the curtain. GBK is a multi-byte encoding; look up its formal definition yourself if you need it. One point deserves particular attention:



Typically, a GBK-encoded Chinese character takes 2 bytes, while a UTF-8-encoded Chinese character takes 3 bytes. In PHP we can test this by outputting



echo strlen("和");


The output is 2 when the file is saved as GBK and 3 when it is saved as UTF-8.
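The byte counts can also be checked without re-saving the file in different encodings. The sketch below is only an illustration: it assumes the iconv extension is loaded, and it uses 和 as the sample character, written as explicit UTF-8 bytes so the result does not depend on how the script itself is saved.

```php
<?php
// "和" as explicit UTF-8 bytes (sample character chosen for illustration).
$utf8 = "\xE5\x92\x8C";

// The same character converted to GBK (assumes the iconv extension).
$gbk = iconv('UTF-8', 'GBK', $utf8);

// strlen() counts bytes, not characters.
printf("UTF-8: %d bytes (%s)\n", strlen($utf8), bin2hex($utf8));
printf("GBK:   %d bytes (%s)\n", strlen($gbk), bin2hex($gbk));
```

The point is that strlen() reports the size of the encoded form, which is why the same character gives different answers under the two encodings.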



GBK is not the only such encoding: the other double-byte ANSI encodings also use 2 bytes per character. "ANSI" is only a label; on different computers it may stand for different encodings, and on a Simplified Chinese system it means GBK.



That is a little background on multi-byte encodings. Only when we understand how they are built and how they behave can we properly analyze the problems they cause.



Enough preamble. Now let's look at the various problems that character encoding causes in SQL injection.


0x01 Wide-character injection in MySQL



This is an old topic that has been worked over countless times. But as the prelude to this article, it is the foundation that has to be covered.



Let's build a test environment first. Call it the Phithon Content Management System v1.0. First create a new database and import the SQL file from the archive below:



Test code and database: http://pan.baidu.com/s/1eQmUArw Extraction password: 75tu



The Phithon Content Management System will be improved step by step later on, but it always uses this same table.



The source code is very simple (make sure magic_quotes_gpc is turned off in your PHP environment first):


<?php
// Connect to the database; note the GBK encoding. Fill in your own credentials.
// (The mysql_* functions used throughout this article need PHP 5.x.)
$conn = mysql_connect('localhost', 'root', 'toor!@#$') or die('bad!');
mysql_query("SET NAMES 'gbk'");
mysql_select_db('test', $conn) or die('Failed to select the database; check the name you filled in');
// Build and execute the SQL statement
$id = isset($_GET['id']) ? addslashes($_GET['id']) : 1;
$sql = "SELECT * FROM news WHERE tid='{$id}'";
$result = mysql_query($sql, $conn) or die(mysql_error()); // print SQL errors so they are easy to observe
?>
<!DOCTYPE html>
<html>
<head>
<meta charset="gbk" />
<title>News</title>
</head>
<body>
<?php
$row = mysql_fetch_array($result, MYSQL_ASSOC);
echo "<h2>{$row['title']}</h2><p>{$row['content']}</p>\n";
mysql_free_result($result);
?>
</body>
</html>



The SQL statement is SELECT * FROM news WHERE tid='{$id}', which fetches an article from the news table by its ID.



Before this statement runs, the value of $id is escaped with addslashes. This is how a CMS usually defends against SQL injection: as long as the input sits inside single quotes and cannot break out of them, nothing can be injected.
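As a small illustration of that defence, the sketch below runs a typical injection attempt through addslashes and prints the resulting statement. The input string is made up for the example; the query template is the one from the source code above.

```php
<?php
// A typical injection attempt against the id parameter (illustrative input).
$input = "1' or '1'='1";

// addslashes() puts a backslash in front of ', ", \ and the NUL byte.
$id = addslashes($input);

// Every quote from the input is now escaped, so the string literal
// still ends at the closing quote written by the programmer.
$sql = "SELECT * FROM news WHERE tid='{$id}'";
echo $sql, "\n";
```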









So how do we get past addslashes? As is well known, addslashes turns ' into \', so the quote no longer acts as a string delimiter and is just an ordinary character. The usual ways around it all target the \ in front of the ':


1. Find a way to put one more \ (or any odd number of them) in front of the \, giving \\', so that the \ itself is escaped and the ' breaks free.
2. Find a way to make the \ disappear.



The wide-byte injection here exploits a MySQL feature: when the connection uses GBK, MySQL treats two bytes as one Chinese character (provided the first byte has a value above 128, which puts it in the range used for Chinese characters). Let's see what happens if we enter %df':




We get an error. The message reports a mistake in the SQL statement, which tells us injection is possible.



Why does merely putting %df in front of the ' (that is, %27) cause an error? The error message shows one single quote too many, and the backslash that should precede it is gone.



This is the MySQL feature at work. Because GBK is a multi-byte encoding, MySQL takes two bytes to be one Chinese character, so %df and the %5c that follows it become the single character "運", and the ' escapes.
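The byte-level picture can be reproduced without a database. The sketch below is only an illustration and assumes the iconv extension is available: it applies addslashes to the decoded input %df%27, dumps the bytes, and then decodes the first two bytes as GBK.

```php
<?php
// What PHP sees for id=%df%27 after URL decoding: bytes df 27.
$input = urldecode('%df%27');

// addslashes() works byte by byte and inserts 0x5c before the quote.
$escaped = addslashes($input);
echo bin2hex($escaped), "\n";           // df5c27

// Read as GBK, the first two bytes form a single character (assumes iconv),
// which leaves the final 0x27 on its own.
$firstChar = iconv('GBK', 'UTF-8', substr($escaped, 0, 2));
echo bin2hex($firstChar), "\n";         // UTF-8 bytes of that one character
```

The backslash is still physically present in the string; it has simply become the second half of a wide character, so it no longer escapes anything.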



Since two bytes make one Chinese character, we can try "%df%df%27":



No error this time. %df%df forms one Chinese character, so %5c%27 is not absorbed into a character and remains \'.
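The pairing of bytes can be sketched with a small tokenizer. This is a simplified model, not MySQL's actual parser: the function name gbk_tokens is hypothetical, and the lead and trail byte ranges (0x81-0xFE, then 0x40-0x7E or 0x80-0xFE) are the standard GBK ranges rather than anything stated in this article.

```php
<?php
// Simplified model of a GBK-aware reader: a byte in 0x81-0xFE followed by a
// byte in 0x40-0x7E or 0x80-0xFE is taken as one two-byte character.
function gbk_tokens(string $bytes): array
{
    $tokens = [];
    $len = strlen($bytes);
    for ($i = 0; $i < $len; ) {
        $lead = ord($bytes[$i]);
        $trail = $i + 1 < $len ? ord($bytes[$i + 1]) : -1;
        $isLead = $lead >= 0x81 && $lead <= 0xFE;
        $isTrail = ($trail >= 0x40 && $trail <= 0x7E) || ($trail >= 0x80 && $trail <= 0xFE);
        $width = ($isLead && $isTrail) ? 2 : 1;
        $tokens[] = bin2hex(substr($bytes, $i, $width));
        $i += $width;
    }
    return $tokens;
}

// %df%27 : the backslash is paired with df, the quote stands alone.
echo implode(' ', gbk_tokens(addslashes("\xdf'"))), "\n";      // df5c 27
// %df%df%27 : df pairs with df, so the backslash keeps guarding the quote.
echo implode(' ', gbk_tokens(addslashes("\xdf\xdf'"))), "\n";  // dfdf 5c 27
```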



So how does MySQL decide whether bytes form a Chinese character? Under GBK it is basically enough for the first byte to have a value above 128. For example, we don't have to use %df; %a1 works too:



%a1%5c may not be an actual Chinese character, but MySQL still treats it as one wide character, which lets the %27 that follows escape.



With that I can construct an exploit that queries the administrator's account and password:






0x02 The difference between GB2312 and GBK



One question puzzled me for a long time.



GB2312 and GBK both belong to the wide-byte family. But let's do a small experiment: change SET NAMES in the Phithon Content Management System to gb2312:









The result is that injection fails:



If you don't believe it, change the database encoding to gb2312 as well; it still fails.



Why? The answer lies in the value ranges of GB2312. Its high byte ranges over 0xA1~0xF7 and its low byte over 0xA1~0xFE, and \ is 0x5c, which is outside the low-byte range. So 0x5c is never part of a GB2312 character and naturally does not get swallowed.



Extending this idea to every multi-byte encoding in the world, we can expect wide-character injection to work whenever the low-byte range of the encoding includes 0x5c.
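That rule of thumb can be turned into a quick check. The sketch below is an illustration under assumptions: the GB2312 range comes from the text above, while the trail-byte ranges for GBK, Big5, Shift_JIS and EUC-KR are taken from the published definitions of those encodings, not from this article, and the helper name can_swallow_backslash is hypothetical.

```php
<?php
// Trail-byte (low-byte) ranges of some double-byte encodings.
// Only the GB2312 entry comes from the article; the rest are added here.
$trailRanges = [
    'GB2312'    => [[0xA1, 0xFE]],
    'GBK'       => [[0x40, 0x7E], [0x80, 0xFE]],
    'Big5'      => [[0x40, 0x7E], [0xA1, 0xFE]],
    'Shift_JIS' => [[0x40, 0x7E], [0x80, 0xFC]],
    'EUC-KR'    => [[0xA1, 0xFE]],
];

// True when 0x5c (the backslash) can be the second byte of a character.
function can_swallow_backslash(array $ranges): bool
{
    foreach ($ranges as [$low, $high]) {
        if (0x5C >= $low && 0x5C <= $high) {
            return true;
        }
    }
    return false;
}

foreach ($trailRanges as $name => $ranges) {
    printf("%-10s %s\n", $name, can_swallow_backslash($ranges) ? 'at risk' : 'not affected');
}
```

An "at risk" result only says the encoding allows the trick in principle; whether a given application is exploitable still depends on how it escapes input and configures its connection.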



0x03 Does mysql_real_escape_string solve the problem?



Some CMSs are aware of wide-byte injection and have looked for a fix. The PHP documentation lists a function, mysql_real_escape_string, which according to the manual takes the current character set of the connection into account.









So some CMSs replaced addslashes with mysql_real_escape_string to defend against wide-character injection. Let's continue the experiment with Phithon Content Management System v1.2, which filters input with mysql_real_escape_string:



Let's try to inject:



It injects without any difficulty. Why? I clearly used mysql_real_escape_string, yet it still fails to stop wide-character injection.



The reason is that the character set of PHP's connection to MySQL was never specified. We need to call mysql_set_charset before executing the SQL statement to set the character set of the current connection to gbk.
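A minimal sketch of that call order follows. It is an assumption-laden illustration rather than the article's own code: it uses the mysqli equivalents (mysqli_set_charset, mysqli_real_escape_string) because the mysql_* functions no longer exist in PHP 7 and later, and fetch_news is a hypothetical helper. The function is only defined here; calling it needs a live MySQL connection.

```php
<?php
// Hypothetical helper: fetch one news row with charset-aware escaping.
// mysqli stands in for the article's mysql_* API (an assumption).
function fetch_news(mysqli $conn, string $rawId): ?array
{
    // Tell the client library itself, not just the server, that the
    // connection is GBK, so the escaping function knows the encoding.
    if (!mysqli_set_charset($conn, 'gbk')) {
        throw new RuntimeException('could not set connection charset');
    }

    // Escape only after the charset has been set.
    $id = mysqli_real_escape_string($conn, $rawId);

    $result = mysqli_query($conn, "SELECT * FROM news WHERE tid='{$id}'");
    if ($result === false) {
        throw new RuntimeException(mysqli_error($conn));
    }

    $row = mysqli_fetch_assoc($result);
    mysqli_free_result($result);
    return $row ?: null;
}
```

The ordering is the whole point: a SET NAMES query changes the server side only, whereas the set-charset call also updates what the escaping function believes about the connection.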






This avoids the problem:





0x04 Fixing wide-character injection


Section 0x03 covered one fix: call mysql_set_charset to set the connection character set to gbk, then filter user input with mysql_real_escape_string.



That works, but some older CMSs call addslashes in many places, and we cannot go through and change every addslashes to mysql_real_escape_string. The second solution is to set character_set_client to binary.



Just declare, before any SQL statement runs, that the client data is binary:



mysql_query("SET character_set_connection=gbk, character_set_results=gbk, character_set_client=binary", $conn);



What do these variables mean?



When MySQL receives data from a client, it assumes the data is encoded in character_set_client and converts it to character_set_connection. When the data is then stored in a particular table and column, it is converted again to that column's encoding.



When a query result is produced, it is converted from the table and column encoding to character_set_results and returned to the client.



So once character_set_client is set to binary, there is no wide-byte or multi-byte interpretation at all: the data is handled as raw bytes, which effectively prevents wide-character injection.
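The effect of that setting can be modelled in a few lines. The sketch below is a simplified stand-in for how a server might scan a string literal, not MySQL's real parser; has_unescaped_quote is a hypothetical name and the GBK byte ranges are the standard ones, assumed here.

```php
<?php
// Simplified scan of the bytes inside a quoted literal. Returns true when
// a quote is reached that is neither escaped nor part of a wide character.
function has_unescaped_quote(string $bytes, bool $clientIsGbk): bool
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; ) {
        $b = ord($bytes[$i]);
        $next = $i + 1 < $len ? ord($bytes[$i + 1]) : -1;
        $isGbkPair = $clientIsGbk
            && $b >= 0x81 && $b <= 0xFE
            && (($next >= 0x40 && $next <= 0x7E) || ($next >= 0x80 && $next <= 0xFE));
        if ($isGbkPair) {            // one wide character, skip both bytes
            $i += 2;
            continue;
        }
        if ($b === 0x5C) {           // backslash escapes the next byte
            $i += 2;
            continue;
        }
        if ($b === 0x27) {           // a bare quote ends the literal
            return true;
        }
        $i++;
    }
    return false;
}

$payload = addslashes("\xdf'");                  // bytes: df 5c 27
var_dump(has_unescaped_quote($payload, true));   // client treated as GBK
var_dump(has_unescaped_quote($payload, false));  // client treated as binary
```

With the GBK interpretation the quote is exposed; with the byte-wise interpretation the backslash keeps its meaning and the quote stays escaped.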



For example, v2.0 of our Phithon Content Management System is updated as follows:



Injection is no longer possible:



In the code I have audited, most CMSs avoid wide-character injection this way. The method is effective, but if the developer then adds something superfluous, all the earlier effort comes to nothing.


0x05 The fatal consequences of iconv


The GBK editions of many CMSs (more than one; I won't name them) have injection holes caused by character encoding. Yet some readers say they tested these CMSs for wide-character injection and nothing happened. Were they doing it wrong?



Of course not. In fact this section is no longer about wide-character injection, because the problem lies not in MySQL but in PHP.



Many CMSs (really a lot; search online if you don't believe me) call a function like this on incoming data to convert its encoding:



iconv('utf-8', 'gbk', $_GET['word']);



The purpose is usually to avoid garbled text, especially in search boxes.



For example, our Phithon Content Management System v3.0:



We can see that it sets character_set_client to binary before executing the SQL statement, which avoids wide-character injection. But afterwards it calls iconv to convert the already filtered parameter $id.



Let's try to inject now:



It actually reports an error, which means injection is possible. And all I entered was "錦'". Why?



Let's analyze it. The character "錦" is 0xe98ca6 in UTF-8 and 0xe55c in GBK.



Some readers may have worked it out already: the low byte of the GBK code is 0x5c, the ASCII code of \. When iconv converts "錦" from UTF-8 to GBK it becomes %e5%5c, and the ' after it has already been turned into %5c%27 by addslashes. Together that makes %e5%5c%5c%27. The two %5c form \\, so the backslash is itself escaped and the ' breaks out of the single quotes, producing an injection.
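The same byte sequence can be reproduced offline. The sketch below is an illustration that assumes the iconv extension; it applies addslashes first and iconv second, the order described for v3.0, to the character analysed above (錦, UTF-8 e9 8c a6, GBK e5 5c) followed by a quote.

```php
<?php
// The character from the analysis followed by a quote, as UTF-8 bytes.
$input = "\xE9\x8C\xA6'";

// Step 1: the filter. Only the quote gets a backslash.
$escaped = addslashes($input);
echo bin2hex($escaped), "\n";   // e98ca65c27

// Step 2: the conversion that runs after filtering (assumes iconv).
// The three UTF-8 bytes shrink to e5 5c, and that 5c is a second backslash.
$gbk = iconv('UTF-8', 'GBK', $escaped);
echo bin2hex($gbk), "\n";       // e55c5c27
```

Read byte by byte, e5 5c 5c 27 is an ordinary byte, an escaped backslash, and then a bare quote, which is why the statement breaks even with character_set_client set to binary.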



This uses the first of the two addslashes bypasses I described earlier: escaping the \ itself.



So what if I use iconv to convert GBK to UTF-8 instead?



Let's try it:



Sure enough, it works again. This time it is a straightforward wide-character injection, except that the problem is in PHP rather than MySQL. A Chinese character takes 2 bytes in GBK and 3 bytes in UTF-8, so when converting GBK to UTF-8, PHP converts two bytes at a time. If an odd number of high (above 128) bytes sits in front of the \, the \ is bound to be swallowed and the ' escapes.
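A short sketch of the GBK to UTF-8 direction, again assuming the iconv extension and using %df%27 as an illustrative payload: the backslash added by addslashes is consumed as the second byte of a GBK character during conversion.

```php
<?php
// Payload id=%df%27 after URL decoding.
$input = "\xdf'";

// The filter inserts a backslash: df 5c 27.
$escaped = addslashes($input);

// iconv reads df 5c as ONE GBK character and emits its 3-byte UTF-8 form,
// so the backslash no longer exists as a separate byte afterwards.
$utf8 = iconv('GBK', 'UTF-8', $escaped);

echo bin2hex($utf8), "\n";
echo strpos($utf8, "\\") === false ? "backslash gone\n" : "backslash kept\n";
```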



So why couldn't this trick be used earlier, when converting UTF-8 to GBK?



This follows from the rules of UTF-8, which are simple. There are only two:


1) For a single-byte symbol, the first bit of the byte is 0 and the remaining 7 bits are the symbol's Unicode code. So for English letters, UTF-8 is identical to ASCII.
2) For an n-byte symbol (n>1), the first n bits of the first byte are 1 and bit n+1 is 0; the first two bits of every following byte are 10. All the remaining bits hold the symbol's Unicode code.
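The two rules can be sketched as a hand-written encoder. This is for illustration only (utf8_encode_codepoint is a hypothetical helper, not a PHP built-in); it prints the bit patterns for the backslash and for one three-byte character.

```php
<?php
// Hand-written UTF-8 encoder following the two rules (up to U+10FFFF).
function utf8_encode_codepoint(int $cp): string
{
    if ($cp < 0x80) {                       // rule 1: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {                      // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {                    // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))          // 11110xxx and three 10xxxxxx bytes
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

foreach ([0x5C, 0x9326] as $cp) {
    $bits = [];
    foreach (str_split(utf8_encode_codepoint($cp)) as $byte) {
        $bits[] = sprintf('%08b', ord($byte));
    }
    printf("U+%04X -> %s\n", $cp, implode(' ', $bits));
}
```

For U+005C this prints 01011100, and for U+9326 it prints 11101001 10001100 10100110: every byte of the multi-byte form has its top bit set, so none of them can equal 0x5c.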


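From rule 2 we can see that in a multi-byte symbol every byte has its first bit set to 1 (the 2nd, 3rd and 4th bytes all start with 10), so \ (0x5c, binary 01011100) can never appear inside a multi-byte UTF-8 sequence. Therefore, when converting UTF-8 to GBK, PHP reports an error if a \ shows up where a continuation byte is expected: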
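The failure is easy to observe. The sketch below assumes the iconv extension and feeds it the earlier GBK-style payload while claiming the input is UTF-8; the @ operator is used only to keep the notice out of the output.

```php
<?php
// The GBK-style payload after addslashes(): df 5c 27.
$payload = addslashes("\xdf'");

// 0xdf announces a two-byte UTF-8 sequence, but 0x5c does not start with
// the bits 10, so the input is not valid UTF-8 and the conversion fails.
// @ hides the notice PHP raises; iconv() then returns false.
$converted = @iconv('UTF-8', 'GBK', $payload);

var_dump($converted === false);
```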






But because GBK codes can contain \, the conversion can still be exploited, just in a different way.



All in all, once MySQL's wide-character injection has been dealt with, don't assume you are safe. Be careful when calling iconv and avoid unnecessary trouble.


0x06 Summary


With internationalization steadily advancing, adopting UTF-8 is the general trend. From a security standpoint too, I think UTF-8 avoids many of the problems caused by multi-byte encodings.



This is not only about GBK; I simply used GBK out of habit as the typical example. The world has many multi-byte encodings, and CMSs from Korea, Japan and other non-English-speaking countries may have security problems caused by character encoding as well, so we should think beyond this one case.



To summarize the security issues caused by character encoding in this article, and their solutions:


1. Wide-character injection caused by GBK encoding. The solution is to set character_set_client=binary.
2. A common misunderstanding of mysql_real_escape_string: calling SET NAMES gbk and then mysql_real_escape_string does not prevent wide-character injection. You must also call mysql_set_charset to set the character set.
3. Be careful with iconv when converting string encodings; it easily causes problems. As long as the front-end HTML/JS/CSS are all encoded as GBK and MySQL/PHP are set to GBK as well, there will be no garbled text. There is no need to add a superfluous iconv call and invite trouble.


This article is a short summary of my own white-box auditing experience. There is much I have yet to learn, so the techniques described here inevitably contain flaws and mistakes. I hope fellow enthusiasts will point them out so we can improve together.



Unlike my earlier XSS article, this one cannot cite many 0day examples to show the damage wide characters cause. There are two reasons:


1. Wide-character problems are not as common as rich-text XSS, and the share of CMSs using GBK is fairly small. I can only blame my own limited knowledge for not finding a matching real-world example for every section.

2. Injection is far more harmful than XSS, and releasing it as a 0day would have a very bad impact. But I did find encoding problems in many CMSs while writing this article and in earlier audits.

So I used experiments instead and wrote small PHP files as examples for everyone. I hope the lack of real cases does not hurt the learning.



Example PHP and SQL files, packaged for download:



Link: http://pan.baidu.com/s/1eQmUArw Extraction password: 75tu



PDF version of this article: Link: http://pan.baidu.com/s/1eprLs Password: yoyw


