Introduction to Dynamic Taint Analysis

Source: Internet
Author: User
Tags: taint, valgrind
Principle

Dynamic taint analysis is an approach proposed in recent years to detect worms effectively and to extract signatures automatically for IDS and IPS. The principle has two parts: dynamic taint marking with illegal operation detection, and more accurate signature extraction.

  1. Dynamic taint marking and illegal operation detection:

    The main principle is to mark data from untrusted channels, such as the network, as "tainted". Data derived from it through arithmetic and logical operations inherits the tainted attribute. Once tainted data is about to be used as the target address of a jump (jmp), a call or return (call, ret), or a data move, or is otherwise about to be loaded into the EIP register, the operation is regarded as illegal. The system then raises an alarm and takes snapshots of the current memory, the registers, and the network data streams of the recent period, and passes them to the signature generation server as the raw data for generating a signature. At the same time, the service is either terminated immediately or kept running in the honeypot environment to capture further intrusion data.

In this process, virtual machine technology is used to give special handling to specific instructions, for example updating the bitmap that records whether the corresponding memory is tainted, or checking whether a jump is safe.

  2. More precise signature extraction:

    The raw data from which the signature is extracted in the steps above is a snapshot taken at the moment the overflow (or other attack) occurs. Only the tainted data is extracted (or even only the data near the value that tainted EIP), rather than the shellcode that runs after the overflow succeeds. This data is therefore highly stable and accurate, which makes it much easier for the signature extraction subsystem to produce general yet precise signatures and reduces the probability of false positives.
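
The marking and checking described in part 1 can be illustrated with a toy register machine. This is only a minimal sketch under stated assumptions: the class `TinyMachine`, its methods, and the `TaintAlert` exception are hypothetical names invented for illustration and do not come from Argos or TaintCheck.

```python
# Toy model of taint marking and illegal-operation detection.
# Every name here is illustrative; none comes from a real tool.

class TaintAlert(Exception):
    """Raised when tainted data is about to become a control-flow target."""


class TinyMachine:
    def __init__(self):
        self.regs = {"eax": 0, "ebx": 0, "eip": 0}
        self.tainted = set()          # names of registers currently tainted

    def recv(self, reg, value):
        # Data from an untrusted channel (e.g. the network) is marked tainted.
        self.regs[reg] = value
        self.tainted.add(reg)

    def load_const(self, reg, value):
        # A constant from the program itself is clean and clears the mark.
        self.regs[reg] = value
        self.tainted.discard(reg)

    def mov(self, dst, src):
        # dst inherits the taint state of src.
        self.regs[dst] = self.regs[src]
        if src in self.tainted:
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)

    def add(self, dst, src):
        # The result is tainted if either operand was tainted.
        self.regs[dst] = (self.regs[dst] + self.regs[src]) & 0xFFFFFFFF
        if src in self.tainted:
            self.tainted.add(dst)

    def jmp(self, reg):
        # Filling EIP with tainted data is treated as an illegal operation.
        if reg in self.tainted:
            raise TaintAlert(f"jump target in {reg} is tainted: {self.regs[reg]:#x}")
        self.regs["eip"] = self.regs[reg]
```

A jump through a register loaded with a program constant succeeds, while the same jump after network data has been added into that register raises `TaintAlert`, which is the point at which a real system would take its snapshots.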

Representative systems built on these principles are TaintCheck [/Ref {bib: taint}] and Argos [/Ref {bib: Argos}]. The details, advantages, and disadvantages of the two implementations are discussed in the following sections. Because [/Ref {bib: Argos}] describes the whole process in more detail, we start with Argos.

Argos Implementation Method

Argos consists of three parts: dynamic taint tracking and alerting, signature extraction, and further consolidation of signatures.

The first part, dynamic taint tracking and alerting, is built as an extension of the well-known open-source virtual machine QEMU. Argos uses a one-to-one mapped bitmap to record whether the corresponding physical memory and registers are tainted. Each byte of memory and each register uses either one bit or one byte (both implementations exist) to indicate whether it is clean. QEMU translates the instructions of the various guest processors into instructions of the processor it runs on (that is, the current host). Argos's job is to handle the propagation of the taint mark for each kind of instruction (for example, in mov dst, src, dst inherits the taint attribute of src) and, when instructions such as jmp and call appear, to check whether EIP would be set to tainted data or whether tainted data is used as a system call parameter. If so, the server is considered to be under attack, and Argos raises an alarm and produces a snapshot of the current memory and of the network stream. This prepares for the next step, signature extraction.
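
The one-bit-per-byte variant of the taint map can be sketched as follows. This is an illustration only, assuming a flat address space of a fixed size; the class `TaintBitmap` and its methods are hypothetical and are not taken from the Argos source.

```python
class TaintBitmap:
    """One taint bit per byte of memory (illustrative layout, not Argos's code)."""

    def __init__(self, mem_size):
        # Eight memory bytes share one bitmap byte.
        self.bits = bytearray((mem_size + 7) // 8)

    def set_range(self, addr, length, tainted):
        for a in range(addr, addr + length):
            if tainted:
                self.bits[a >> 3] |= 1 << (a & 7)
            else:
                self.bits[a >> 3] &= ~(1 << (a & 7)) & 0xFF

    def is_tainted(self, addr, length=1):
        # True if any byte in [addr, addr + length) carries the taint bit.
        return any(self.bits[a >> 3] & (1 << (a & 7))
                   for a in range(addr, addr + length))

    def copy(self, dst, src, length):
        # Memory-to-memory move: each destination byte inherits the source bit.
        flags = [self.is_tainted(src + i) for i in range(length)]
        for i, flag in enumerate(flags):
            self.set_range(dst + i, 1, flag)
```

The bitmap costs one eighth of the tracked memory. Using a whole byte per tracked byte instead trades memory for simpler indexing, which is the choice between the two implementations mentioned above.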

The second part is signature extraction. The first step is to save the current environment as a snapshot, which serves as the raw data for signature extraction and for later analysis of the attack. The snapshot includes the values of the current registers, process information, the relevant memory images, and the network data streams transmitted in the recent period. The register values are read directly from the virtual machine software. The process information is obtained by injecting into the current process, and executing, a piece of shellcode that dumps process and port information. The memory snapshot is triggered at the moment tainted data reaches EIP: Argos determines whether the current process is running in ring 0 or ring 3 and dumps only the memory of the corresponding level that is marked as tainted, which is far less than dumping everything. A separate program (such as tcpdump) stores the network data streams of specific ports for a period of time.

With this raw information, Argos uses another program to generate signatures, applying both the LCS (longest common subsequence) method and the authors' own CREST method. The principle of CREST is to compare the data in the region of memory that holds the tampered EIP value with the region of the network flow where that value appears, take the data segment common to both as the signature, and generate a rule in Snort format based on the port and the protocol used.
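
The matching idea behind CREST, as described above, can be sketched as locating the bytes that overwrote EIP in the captured trace and then growing the match outward while memory and trace agree. This is a simplified illustration, not the Argos implementation: the function names, the parameters, the message text, and the `sid` value are all assumptions made for the example.

```python
def crest_like_signature(trace, memory, eip_offset, eip_len=4):
    """Return the bytes common to `trace` and `memory` around the tampered EIP value.

    trace      -- captured network bytes
    memory     -- dumped tainted memory region
    eip_offset -- offset in `memory` of the value that overwrote EIP
    """
    target = memory[eip_offset:eip_offset + eip_len]
    pos = trace.find(target)
    if pos < 0:
        return None                       # value never appeared in the trace

    # Extend the match to the left while both buffers agree.
    left = 0
    while (pos - left > 0 and eip_offset - left > 0
           and trace[pos - left - 1] == memory[eip_offset - left - 1]):
        left += 1

    # Extend the match to the right in the same way.
    right = eip_len
    while (pos + right < len(trace) and eip_offset + right < len(memory)
           and trace[pos + right] == memory[eip_offset + right]):
        right += 1

    return trace[pos - left:pos + right]


def to_snort_rule(signature, proto, port, sid=1000001):
    # Snort writes binary content as hex bytes between pipe characters.
    hex_bytes = " ".join(f"{b:02X}" for b in signature)
    return (f'alert {proto} any any -> any {port} '
            f'(msg:"taint-derived signature"; content:"|{hex_bytes}|"; '
            f'sid:{sid}; rev:1;)')
```

Note that the sketch only works when the bytes in memory are identical to the bytes on the wire. If the program decodes its input first, `trace.find` returns -1 and no signature is produced.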

The third part is the further consolidation of signatures. It is implemented by a program called SweetBait, which applies LCS to the signatures generated from multiple attacks to extract what they share, strips out specifics such as the target IP address, generates general detection rules, and submits them to the IDS/IPS.
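
Consolidating several signatures with LCS can be sketched with the classic dynamic-programming longest common subsequence, folded pairwise over the list. This is an illustration under assumptions: `lcs` and `merge_signatures` are hypothetical names, the pairwise fold is a simplification that does not always yield the longest subsequence common to all inputs, and the real SweetBait may work differently.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two byte strings, O(len(a) * len(b)) DP."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back through the table to recover one such subsequence.
    out = bytearray()
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return bytes(reversed(out))


def merge_signatures(signatures):
    # Keep only what every signature shares, two at a time.
    return reduce(lcs, signatures)
```

Bytes that differ between attacks, such as padding or per-target values, drop out of the merged result, which is why the consolidated signature tends to be more general than any single one.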


Advantages and disadvantages of Argos

The advantages of Argos are accuracy, independence from source code, and automatic generation of signatures with a low false positive rate. Earlier approaches mostly required the source code of the program to be protected and secured the server through static checks for insecure code. That requirement is unacceptable for commercial software, and those approaches have had a high false positive rate. Argos's accuracy shows in two respects: accurate detection of attacks and generation of signatures with a low false positive rate. The reasons are given in the description of the Argos implementation above.

The disadvantage of Argos is equally obvious: it is inefficient. According to the authors' experiments [/Ref {bib: Argos}], a program running on Argos is 10 to 30 times slower than on the real host. The authors stress, however, that Argos is designed as a honeypot, so service speed is not the first priority, and that given the complexity of the network environment (network latency, for example) this loss of speed will not necessarily become the bottleneck of the system. Argos also still has plenty of room for optimization.

The second disadvantage lies in signature extraction. With my limited knowledge, I cannot fully understand how the authors separate "relevant" memory from "irrelevant" memory and match it against the network stream. I suspect the dumped memory will still be very large. And if the program decodes the network stream after receiving it, the data in memory will no longer be identical to the data on the wire, so would the match not fail?

I think another disadvantage of Argos is that the range of attacks it detects is too narrow, even though the authors state that other types of attacks "are beyond the scope of our ...". At the cost of so much speed, it detects only heap overflows, stack overflows, and format string attacks, which is quite limited. Moreover, [/Ref {bib: Other}] shows that dynamic taint analysis is not powerless against SQL injection and XSS (cross-site scripting) attacks.

TaintCheck Implementation Method

TaintCheck is also based on virtual machine technology, but on a different open-source x86 emulator, Valgrind. Valgrind translates machine instructions into its own unified instruction set, UCode, and passes the UCode to TaintCheck. Depending on the instruction type, TaintCheck then either performs operations similar in principle to those of Argos or raises an alarm.

TaintCheck consists of four components: TaintSeed, TaintTracker, TaintAssert, and the Exploit Analyzer.

TaintSeed is responsible for marking all data from untrusted sources as "tainted". Each tainted byte has a corresponding pointer to a structure that stores the taint information for that byte (if the byte is not tainted, the pointer is null). The authors say they use a technique similar to a page table to keep these labels from occupying much space, but I do not know how it is implemented. TaintSeed checks each system call to determine which memory becomes tainted as a result of the call, and then allocates a structure for that memory to record the system call number, a snapshot of the current stack, the data written, and other information (a waste of memory). The pointers mentioned above then point to this data structure. TaintCheck can also be configured to record less detail and, like Argos, record only whether memory is tainted. In terms of function, I think TaintSeed can be regarded as tainting at the system call level.

TaintTracker performs tainting at the instruction level. It can either make the pointer of the newly tainted memory produced by an instruction point to the Taint structure of the source, or make it point to a new Taint data structure that records the instruction and a stack snapshot. The latter obviously consumes much more memory.
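
The per-byte pointer scheme and the page-table-like layout can be sketched as below. Since the source does not explain how TaintCheck implements it, this is only one plausible reading: the names `Taint`, `ShadowMemory`, `seed`, and `track_copy`, the 4 KiB page size, and the record fields are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

PAGE_BITS = 12                      # 4 KiB shadow pages; an illustrative choice
PAGE_SIZE = 1 << PAGE_BITS


@dataclass
class Taint:
    syscall: str                                  # call that introduced the data
    stack: list = field(default_factory=list)     # stack snapshot at that moment


class ShadowMemory:
    def __init__(self):
        # page number -> list holding one Taint-or-None entry per byte
        self.pages = {}

    def get(self, addr):
        page = self.pages.get(addr >> PAGE_BITS)
        return None if page is None else page[addr & (PAGE_SIZE - 1)]

    def set(self, addr, taint):
        page_no = addr >> PAGE_BITS
        page = self.pages.get(page_no)
        if page is None:
            if taint is None:
                return              # clean byte in an untouched page: store nothing
            page = self.pages[page_no] = [None] * PAGE_SIZE
        page[addr & (PAGE_SIZE - 1)] = taint


def seed(shadow, buf_addr, length, syscall, stack):
    # TaintSeed-style step: one record shared by every byte the call wrote.
    record = Taint(syscall, list(stack))
    for a in range(buf_addr, buf_addr + length):
        shadow.set(a, record)
    return record


def track_copy(shadow, dst, src, length):
    # TaintTracker-style step, cheap variant: reuse the source's record.
    for i in range(length):
        shadow.set(dst + i, shadow.get(src + i))
```

Shadow pages are allocated only where tainted data actually lives, so a mostly clean address space costs almost nothing. The expensive variant described above would allocate a fresh `Taint` record inside `track_copy` for every instruction instead of reusing the source's record.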

TaintAssert is responsible for checking the various dangerous uses of tainted data: instructions that affect EIP and would set EIP to tainted data, and the %n conversion appearing in calls to the printf family of functions. I think some source code is needed here; otherwise, how could the wrapper mentioned in the paper be implemented? The authors also point out one weakness of TaintCheck: if the program implements its own printf-like functions, no wrapper can be written for them. Even so, a format string attack will eventually overwrite EIP with tainted data (or do I remember wrong?).
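
The check a printf-family wrapper would perform can be sketched as follows: scan the format string for conversions and raise an alert if a %n conversion is built from tainted bytes. This is an illustrative sketch, not TaintCheck's code; `assert_format_safe`, `TaintedFormatError`, and the simplified conversion pattern are assumptions.

```python
import re

# Simplified pattern for a printf conversion: "%%" or "%" + flags/width/length + letter.
_CONVERSION = re.compile(rb"%(?:%|[-+ #0-9.$*hlLqjzt]*[A-Za-z])")


class TaintedFormatError(Exception):
    """A %n conversion was found in tainted format-string bytes."""


def assert_format_safe(fmt, tainted):
    """fmt: bytes of the format string; tainted: one bool per byte of fmt."""
    for match in _CONVERSION.finditer(fmt):
        spec = match.group()
        # Only %n (including forms like %hn or %7$n) writes to memory.
        if spec.endswith(b"n") and any(tainted[match.start():match.end()]):
            raise TaintedFormatError(f"tainted {spec!r} at offset {match.start()}")
```

A %n that the program itself wrote into a clean format string passes the check. Only a %n that arrived from an untrusted source is reported.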

When TaintAssert raises an alert, the Exploit Analyzer uses the related tainted memory and other information to generate a more accurate signature automatically and provide it to the IDS and similar systems. The authors only outline a general research direction here and give no concrete implementation.


Advantages and disadvantages of TaintCheck

TaintCheck has the same advantages and disadvantages as Argos in principle.

However, because TaintCheck records taint-related information in a Taint data structure, it can be expected to reproduce attacks and extract signatures more accurately. The same structure also brings higher memory consumption and time cost.

If TaintCheck is to be used for more than honeypots, as its authors hope, efficiency matters even more for it than for Argos.

An improvement for them

I searched online databases and Google for papers and found a proposal for optimizing TaintCheck [/Ref {bib: optim}], but it is not detailed, and I have not found any further information.


Monitor other types of attacks with dynamic taint tracking

I also thought that dynamic taint analysis should be able to detect other kinds of attacks, so I searched online for related papers and eventually found [/Ref {bib: Other}]. The authors' idea is to modify the source code of the interpreters that parse scripting languages, such as the PHP parser and bash, and add dynamic taint tracking to them. The implementation also builds a bitmap of memory taint marks (but without a virtual machine): in the interpreter's source files, each assignment statement is followed by a statement that updates the corresponding tag. Across function calls, two stacks carry the taint information of the parameters and of the return value. For external functions, a wrapper is written by hand for each function to set the taint information of its return value.
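
The two mechanisms described above, a tag update accompanying each assignment and a hand-written wrapper for an external function, can be sketched at the level of string values. This is an illustration only: the class `TStr`, the helpers `wrap_external` and `check_sql`, and in particular the policy that tainted characters must not be quotes or semicolons are assumptions made for the example, not the paper's design.

```python
class TStr:
    """A string plus one taint flag per character (illustrative layout)."""

    def __init__(self, text, tainted=False, flags=None):
        self.text = text
        self.flags = list(flags) if flags is not None else [tainted] * len(text)

    def __add__(self, other):
        # The tag update that accompanies the assignment `c = a + b`.
        return TStr(self.text + other.text, flags=self.flags + other.flags)


def wrap_external(func, result_tainted):
    # Hand-written wrapper for a function whose source is not instrumented:
    # result_tainted(args) decides the taint of the return value.
    def wrapper(*args):
        out = func(*(a.text for a in args))
        return TStr(out, tainted=result_tainted(args))
    return wrapper


def check_sql(query):
    # Assumed policy: untrusted characters may be data but not delimiters.
    for ch, flag in zip(query.text, query.flags):
        if flag and ch in "'\";":
            raise ValueError(f"tainted metacharacter {ch!r} in query")
```

For example, wrapping `str.upper` with a rule that the result is tainted whenever any argument is tainted lets the taint survive a call into uninstrumented code, and `check_sql` then rejects a query in which user input supplied a quote character.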

This implementation is currently very inefficient, depends on the source code being available, and produces many false positives. The manual steps it requires also slow down deployment. In addition, the method does not properly handle the various kinds of implicit assignment, and it may raise false alarms on statements that branch on input.


My opinion

The above three implementations have already been described in the analysis of their advantages and disadvantages. Next I just want to talk about my views on this aspect.

From my current understanding of this field, I think three areas leave a lot to explore:

  1. Automatic extraction of signatures: [/Ref {bib: taint}] only points out a direction for signature extraction, and [/Ref {bib: Argos}] only proposes a simple algorithm for it. How to make full use of the dynamically tracked data to generate more precise signatures is the most meaningful direction to study.

  2. Further optimization of taint tracking: this mainly means improving the efficiency of the system, that is, how to handle taint propagation and log generation faster.

  3. Detection of other types of attacks: [/Ref {bib: Other}] only proposes one feasible solution, and it is certainly not the best one. This research seems to require a deeper look into the nature of injection attacks, cross-site attacks, and other attack methods.

References
  1. /Label {bib: taint}James Newsome, Dawn Song.
    Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature
    Generation of Exploits on Commodity Software. NDSS 2005.
  2. /Label {bib: Argos}Georgios Portokalidis, Asia Slowinska, Herbert Bos.
    Argos: an Emulator for Fingerprinting Zero-Day Attacks. EuroSys 2006.
  3. /Label {bib: Other}Wei Xu, Sandeep Bhatkar, R. Sekar.
    Practical Dynamic Taint Analysis for Countering Input Validation Attacks on Web Applications. SECLAB-05-04.
  4. /Label {bib: optim}Mohammed Ajmal, Aly Merchant.
    Optimizing TaintCheck.
