The Art of Language Compilation and Parsing: Hidden Security Techniques and Security Issues

Last Update:2013-11-20 Source: Internet

Author: User



This afternoon a friend asked me some PHP questions, which reminded me that QZ has recently been writing a series of articles on PHP variable security.

So I went and read them. The addresses are as follows:

Talking about PHP variable security: http://www.bkjia.com/Article/201110/108389.html
PHP variable safety continued: http://www.bkjia.com/Article/201110/108536.html
Talking about PHP variable security Plugin: http://www.bkjia.com/Article/201110/108551.html

In these articles, what QZ mainly wants to address is:

In what scenarios does a variable end up being executed as code? PHP security can be understood by going back to how the language itself works.

Having learned a little about the basics of compiler principles,

let me extend this with something interesting :)

---------------------------------------------------------------------------------------------------------------

Contents

0x01 Language Features

0x02 Current web defense layer

0x03 Language security: the next battlefield

---------------------------------------------------------------------------------------------------------------

[*] 0x01 Language Features

Deliberately obfuscated programs written in one language or another circulate widely on the Internet.

Some well-known examples:

Six obfuscated C Hello World programs
http://www.bkjia.com/kf/201110/108778.html

Obfuscated JavaScript code (jjencode)

http://utf-8.jp/public/jjencode.html

Non-alphanumeric code in PHP

http://www.thespanner.co.uk/2011/09/22/non-alphanumeric-code-in-php/

Their ancestor seems to be Brainfuck, which I have run into before when playing wargames.

http://zh.wikipedia.org/zh/Brainfuck

Even more unusual is metaprogramming,

http://zh.wikipedia.org/zh-cn/%E5%85%83%E7%BC%96%E7%A8%8B

where a program written in one language generates code in the same language or in another.
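
As a minimal sketch of what metaprogramming means here, the following Python snippet builds the source text of a function at runtime and then compiles and runs it. The helper name `make_adder` is purely illustrative and does not come from the original article.

```python
def make_adder(n):
    """Generate the source of a function at runtime, then compile and load it."""
    # The program writes program text: this string is brand-new source code.
    source = f"def add_{n}(x):\n    return x + {n}\n"
    namespace = {}
    # exec compiles and runs the generated source inside an isolated namespace.
    exec(source, namespace)
    return source, namespace[f"add_{n}"]


source, add_5 = make_adder(5)
print(source)      # the generated code, which existed nowhere before the call
print(add_5(10))
```

The name `add_5` and its body never appear in the file as written, which is why a scanner that only looks at static text has nothing to match against.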

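When using a webshell against PHP, we may encounter the following situation:

A page, vul.php, has a code injection vulnerability.

However, it may also have malicious code detection

that checks whether dangerous names such as eval or system are passed in.

We can use vul.php to write a new PHP file,

using fopen, fwrite and other functions that are not filtered,

and write a new webshell file, shell.php.

As for the eval detection,

we can also use string concatenation to bypass signature detection.

$a = 'ev';

$b = 'al';

$c = $a . $b; // $c now holds the string 'eval', which never appears literally in the source

(eval is a language construct rather than a function, so PHP cannot call it through a variable such as $c(); the snippet only shows how a signature string can be kept out of the source.)

Many variations on this kind of combination are possible.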
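
To see why a literal signature misses this, here is a small Python sketch of a naive scanner. The blacklist `DANGEROUS_NAMES` and both function names are assumptions made up for illustration; they do not describe any real detection product.

```python
import re

# Hypothetical blacklist used by a naive signature scanner.
DANGEROUS_NAMES = ("eval", "system")


def naive_scan(source):
    """Return the blacklisted names that appear literally in the source text."""
    return [name for name in DANGEROUS_NAMES if name in source]


def fold_concatenations(source):
    """Join adjacent single-quoted literals, so 'ev' . 'al' becomes 'eval'."""
    pattern = re.compile(r"'([^']*)'\s*\.\s*'([^']*)'")
    previous = None
    while previous != source:  # repeat until no more literal pairs can be joined
        previous = source
        source = pattern.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", source)
    return source


direct = "$c = 'eval';"
split = "$c = 'ev' . 'al';"

print(naive_scan(direct))                      # the literal name is found
print(naive_scan(split))                       # nothing is found
print(naive_scan(fold_concatenations(split)))  # found again after folding
```

Folding only handles a literal joined directly to a literal. The three-variable form shown above would need the scanner to track values through assignments, which is already a step away from pattern matching and toward interpreting the language.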

These examples show that

we do not know the features of a language as well as we think we do.

Usually, though, we pay no attention to them,

for the following reasons:

These ways of writing code are rarely used in normal development.

Code written this way is hard to read and maintain,

and it goes against software engineering principles.

But note that we usually look at these language features from the perspective of a programmer or developer.

What do they look like from the perspective of security personnel?

1. Special notation to bypass detection (obfuscated JavaScript bypassing pattern detection, shellcode using explicit character encoding to bypass IDS detection of dangerous characters, and so on)

2. Special notation to hide from signature scanning tools (encodings like those used by Tiny PHP Webshell can bypass many webshell detection tools)

3. Metaprogramming to bypass detection (using code to generate new code that evades signature detection)

4. Security problems caused by the language features themselves (for example, the problems with variable variables that QZ wrote about)

---------------------------------------------------------------------------------------------------------------

[*] 0x02 Current web defense layer

Going from fuzzing to "I know all of it" is the ultimate goal of a hacker's pursuit of knowledge.

I once had an idea for testing based on compiler principles (I first discussed it with Dr. Shi)

http://www.bkjia.com/ebook/201110/29961.html

The idea at the time was black-box testing for SQL injection (for either attack or defense).

The general idea is this:

Currently, web defense generally relies on pattern detection rules written from security experience.

If a brilliant hacker comes up with a special technique that bypasses this pattern detection,

the web defense becomes ineffective.

For a malicious user's input to result in SQL injection,

that input must be a fragment of the SQL language:

it is part of the language and can be recognized as such.

A script takes the user input and passes it to the database application, which interprets and executes it.

In this process, the input must pass through the database application's lexical scanning and semantic recognition.

So, if I have the complete lexical scanning path map and semantic tree,

I can fully understand how user input is recognized as a token, and finally as a statement.

Note that this covers every case, whatever the encoding or format.

I know how a piece of data comes to be recognized as language.

Conversely, I can identify all data that would be recognized as language, and use that complete set of signatures to shut out injection attacks for good.

For example, write a plug-in for the database software,

sitting between the scripting language and the database's command interpretation and execution component,

that can fully distinguish data from code

and block malicious user input.

The key to this kind of security problem lies in how user input is recognized:

as data or as code.
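
A toy version of this idea can be sketched in Python: tokenize the query once with a harmless placeholder and once with the real input, and compare the resulting token sequences. Everything below is an assumption for illustration: the token classes, the names `token_shape` and `changes_structure`, and the `users` table. A real database lexer, such as MySQL's, is far larger and handles many cases this sketch ignores.

```python
import re

# Toy token classes, tried in this order; not the grammar of any real database.
TOKEN_SPEC = [
    ("STRING", r"'(?:[^'\\]|\\.)*'"),
    ("NUMBER", r"\d+"),
    ("WORD", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("COMMENT", r"--[^\n]*|#[^\n]*|/\*.*?\*/"),
    ("SPACE", r"\s+"),
    ("OP", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC), re.S
)


def token_shape(sql):
    """Reduce a query to its token sequence, hiding the content of literals."""
    shape = []
    for match in TOKEN_RE.finditer(sql):
        kind = match.lastgroup
        if kind == "SPACE":
            continue
        if kind == "WORD":
            shape.append(match.group().upper())  # keywords and identifiers
        elif kind == "OP":
            shape.append(match.group())
        else:
            shape.append(kind)  # STRING, NUMBER, COMMENT: keep only the class
    return shape


def changes_structure(template, user_input, placeholder="x"):
    """True if the input yields a different token shape than a harmless value."""
    baseline = token_shape(template.format(value=placeholder))
    actual = token_shape(template.format(value=user_input))
    return baseline != actual


TEMPLATE = "SELECT * FROM users WHERE name = '{value}'"
print(changes_structure(TEMPLATE, "alice"))
print(changes_structure(TEMPLATE, "select"))
print(changes_structure(TEMPLATE, "' OR '1'='1"))
```

The input `select` is not flagged, because inside the quotes it is lexed as part of a string literal, that is, as data. The last input is flagged because it closes the literal and adds new tokens, that is, it is recognized as code.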

In fact, today's network interaction and communication rest on a system that combines many protocols, many environments, and many languages.

Language recognition is everywhere,

for compiled and interpreted languages alike.

At bottom, recognizing a language is a process of lexical scanning and semantic recognition.

The following figure shows a simple lexical recognition path.
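
A lexical recognition path of this kind can also be written down as a small state machine. The Python sketch below is a toy that recognizes only identifiers and integers; its states and transitions are assumptions chosen for illustration and are not taken from the figure in the original article.

```python
# Transition table of a toy DFA: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "identifier",
    ("start", "digit"): "number",
    ("identifier", "letter"): "identifier",
    ("identifier", "digit"): "identifier",
    ("number", "digit"): "number",
}
ACCEPTING = {"identifier", "number"}


def char_class(ch):
    """Map a character to the class used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def recognize(text):
    """Walk the DFA; return (token kind or None, list of states visited)."""
    state = "start"
    path = [state]
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no edge for this character: recognition fails
            return None, path
        path.append(state)
    return (state if state in ACCEPTING else None), path


print(recognize("id1"))
print(recognize("42"))
print(recognize("4x"))
```

The returned path is the sequence of states the scanner passed through, which is the kind of information a complete lexical scanning path map would record for every token of a real language.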

The mainstream web security detection we see on the market is pattern detection.

(It usually recognizes language keywords or malicious code, for example by checking whether user input contains the SQL keyword "select".)
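
The weakness of keyword matching shows up in both directions. The Python sketch below uses two hypothetical filters, `keyword_filter` and `normalized_filter`, invented here for illustration, and runs them over a harmless sentence and two variants of an injection attempt.

```python
from urllib.parse import unquote


def keyword_filter(user_input):
    """Naive pattern detection: flag input containing the literal 'select'."""
    return "select" in user_input


def normalized_filter(user_input):
    """The same check after URL-decoding and lowercasing the input."""
    return "select" in unquote(user_input).lower()


benign = "please select a colour"
upper = "1 UNION SELECT password FROM users"
encoded = "1 union %73elect password from users"  # %73 decodes to 's'

for text in (benign, upper, encoded):
    print(keyword_filter(text), normalized_filter(text), text)
```

The naive filter flags the harmless sentence and misses both variants. Normalizing the input catches the variants but still flags the harmless sentence, because matching on text cannot tell whether the word will be treated as data or as code.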

Detection has not yet risen to the level of dynamically distinguishing data from code,

which is why code injection attacks can occur.

Typical code injection:

SQL injection (for example, MySQL injection)

Command injection (for example, OS command injection)

Script injection (for example, PHP code injection)

Causes of this kind of security issue:

1. The language or environment allows code to be passed in and interpreted dynamically (this is for ease of use, and is understandable and unavoidable)

2. There is no mechanism that distinguishes data from code to defend against user code injection (none is available on the market yet)
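
The first cause can be seen concretely with Python's built-in `sqlite3` module, used here only because it runs without a server; the article's own example is MySQL, and the table and payload below are made up. Parameter binding, which many database drivers offer, is one existing way to keep input away from the lexer. It is shown for contrast and is not the detection component this article proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")]
)

payload = "nobody' OR '1'='1"

# Concatenation: the payload becomes part of the text that is lexed as SQL.
concatenated = conn.execute(
    "SELECT name FROM users WHERE name = '" + payload + "'"
).fetchall()

# Binding: the payload is handed over as one value and is never lexed as SQL.
bound = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()

print(len(concatenated))  # every row matches, since the OR clause ran as code
print(bound)              # no row is literally named "nobody' OR '1'='1"
```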

If we reclassify the levels of security,

we arrive at a new classification:

Protocol security (security between protocols: for example, DDoS caused by ICMP)

Environment security (operating system security: overflows, privilege escalation, system file format features, and other special features of the system)

(The security of application software itself: this is the junction between system security and language security, and both kinds of problem can occur there)

Language security (user input security: malicious code injection)

---------------------------------------------------------------------------------------------------------------

[*] 0x03 Language security: the next battlefield

At present there seems to be no dedicated security research in the field of language recognition.

Of course, the cost of studying it is indeed very high.

But once someone does study it,

it will set off a new kind of security storm.

Let's talk about the cost first,

taking my idea of testing based on compiler principles again.

At first I discussed the feasibility of implementing it with Dr. Shi.

As a defense component, however, it is still unknown whether its overhead would be acceptable under heavy traffic.

And the cost is unacceptable for a company or department that does not do research.

What would it take to complete it?

Collect the complete lexical scanning path tables and semantic trees.

Take MySQL injection defense as an example.

MySQL is open source, so this is relatively feasible:

you can find its lexer source files and so on.

However, it takes time to rebuild them into a very large and complete lexical scanning path table and semantic tree.

It would take at least 1-2 months (for me, at any rate).

People with a background in compiler principles may be more efficient at this.

What about closed-source applications like MSSQL?

Restoring the complete lexical scanning path table and semantic tree

is, I'm afraid, not that simple.

Of course, we can solve MySQL first and then work by analogy;

MSSQL should at least not be too far off.

For open-source software such as PHP and MySQL, it comes down to whether someone is willing to bear the cost, or has an efficient team to do it.

The irony is that the creators of this open-source software are actually the best candidates to restore these resources ......

After all, they designed and implemented the original lexical scanning path tables and semantic trees.

Once restored, as I mentioned earlier, both attack and defense will be comprehensively upgraded,

because we will understand every way in which data is recognized as code.

From these rules, all kinds of variants can be derived to bypass the experience-based pattern detection on the market.

The security attack and defense of the entire Internet will enter a brand-new field.

Finally, personally, I am very optimistic about language security.

Comments and corrections from everyone are welcome :)

From: hi.baidu.com/hackercasper

