Malicious PDF file detection ideas

Last Update: 2013-11-29 Source: Internet

Author: Cryin
Link: http://hi.baidu.com/justear/blog

Overview

To parse PDF files, you first need to be familiar with the PDF file format. The official documentation is available only in English; if you are comfortable reading it, see the PDF reference [1]. Otherwise, you are limited to the few related materials written in Chinese. Once you know the format, how should a file be parsed? My current method is to search the PDF file for keywords. The drawback is that content stored inside a stream within an object cannot be searched this way. In addition, some exploit PDF files use obfuscation, and I have no good way to parse such files yet. For example:

%PDF-1.5

1 0 obj

</#54#79P#65 R 0 5 O#70e#6e#41c#74i#6fn 3 Pages C#61ta#6c#6f#67>

endobj
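
The obfuscation in this example relies on the fact that a PDF name may write any character as # followed by two hexadecimal digits. A minimal sketch of undoing that encoding before a keyword search is shown below; the function name decode_pdf_name is hypothetical, and the sketch assumes the name has already been extracted from the file as text.

```python
import re

# A PDF name may encode any character as '#' plus two hex digits.
_HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")

def decode_pdf_name(name: str) -> str:
    """Replace every #xx escape in a PDF name with the character it encodes."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)

if __name__ == "__main__":
    for raw in ("/O#70e#6e#41c#74i#6fn", "/C#61ta#6c#6f#67"):
        print(raw, "->", decode_pdf_name(raw))
```

Running the decoder over every name before matching keywords means an obfuscated /OpenAction is counted the same as a plain one.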

Keywords

Here we consider ordinary malicious PDF files. The main task is to find and parse the following keywords (which, in my opinion, do not depend on any particular vulnerability):

· obj

· endobj

· stream

· endstream

· xref

· trailer

· startxref

· /Page

· /Encrypt

· /ObjStm

· /JS

· /JavaScript

· /AA

· /OpenAction

· /AcroForm

· /URI

· /Filter

· /JBIG2Decode

· /RichMedia

· /Launch
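
The keyword search can be sketched roughly as follows. This is an illustrative sketch rather than the exact tool used for the tests in this article: the function name count_keywords is hypothetical, /AcroForm is used as the standard PDF name for interactive forms, and the word-boundary rules (so that obj is not counted inside endobj, and /Page is not counted inside /Pages) are an assumption about how the counts should be taken.

```python
import re

# Structural keywords (case-sensitive, no leading slash).
BARE_KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream",
                 b"xref", b"trailer", b"startxref"]

# Name keywords (leading slash).
NAME_KEYWORDS = [b"/Page", b"/Encrypt", b"/ObjStm", b"/JS", b"/JavaScript",
                 b"/AA", b"/OpenAction", b"/AcroForm", b"/URI", b"/Filter",
                 b"/JBIG2Decode", b"/RichMedia", b"/Launch"]

def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword in the raw bytes of a PDF file."""
    counts = {}
    for kw in BARE_KEYWORDS:
        # Reject matches glued to other letters, e.g. 'obj' inside 'endobj'.
        pattern = rb"(?<![A-Za-z])" + re.escape(kw) + rb"(?![A-Za-z])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    for kw in NAME_KEYWORDS:
        # Reject longer names that merely start with the keyword, e.g. '/Pages'.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    return counts

if __name__ == "__main__":
    sample = (b"%PDF-1.5\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R "
              b"/OpenAction 3 0 R >>\nendobj\n"
              b"3 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
              b"xref\n0 4\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF\n")
    for name, n in count_keywords(sample).items():
        print(name, n)
```

Note that this search sees only the raw bytes, so keywords inside compressed streams or hex-escaped names are missed unless the data is decoded first.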

Analysis ideas

Almost every PDF file contains the first seven keywords, although some contain no stream or endstream. A few PDF files have no xref or trailer, but this is rare. The absence of xref or trailer by itself does not indicate that a PDF file is malicious.

The xref cross-reference table records the object number, generation number, and absolute byte offset of each indirect object. The first entry in the table must be object 0, whose generation number is 65535. Two numbers follow the xref keyword: the first is the number of the first indirect object in the table (that is, object 0), and the second is the number of entries in the table.
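
As an illustration of this layout, the sketch below parses a classic cross-reference table that has already been cut out of the file as text. The function name parse_xref_section is hypothetical, and the sketch assumes a single subsection; it does not handle cross-reference streams or tables with several subsections.

```python
def parse_xref_section(text: str) -> list:
    """Parse one classic xref subsection.

    Returns a list of (object_number, byte_offset, generation, in_use) tuples.
    """
    lines = text.strip().splitlines()
    if lines[0].strip() != "xref":
        raise ValueError("section does not start with the xref keyword")
    # First number: first object number. Second number: entry count.
    first_obj, count = (int(part) for part in lines[1].split())
    entries = []
    for index, line in enumerate(lines[2:2 + count]):
        offset, generation, flag = line.split()
        # 'n' marks an object in use, 'f' marks a free entry.
        entries.append((first_obj + index, int(offset), int(generation), flag == "n"))
    return entries

if __name__ == "__main__":
    section = ("xref\n"
               "0 3\n"
               "0000000000 65535 f \n"
               "0000000009 00000 n \n"
               "0000000074 00000 n \n")
    for entry in parse_xref_section(section):
        print(entry)
```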

/Page gives the number of pages in a PDF file. Most malicious PDF files have only one page.

/Encrypt indicates that the PDF file is encrypted, meaning it carries usage restrictions (DRM) or requires a password to open.

/ObjStm gives the number of object streams. An object stream is a stream object that can contain other objects.

/JS and /JavaScript indicate that the PDF file contains JavaScript code. Almost all the malicious PDF files I have seen embed JavaScript, which is usually used either to exploit a vulnerability in the JavaScript engine or to perform a heap spray. Note, however, that JavaScript also appears in many normal PDF files.

/AA, /OpenAction, and /AcroForm indicate that an action runs automatically when the PDF file or one of its pages is viewed. Almost all malicious PDF files with embedded JavaScript also include an action that runs that JavaScript automatically. If a PDF file contains /AA or /OpenAction together with JavaScript code, it is likely to be malicious.
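
This rule can be written as a small predicate over keyword counts. The sketch below assumes the counts are already available as a dictionary mapping each keyword to its number of occurrences; the function name likely_malicious is hypothetical, and the result is only a hint, since normal PDF files can also combine JavaScript with automatic actions.

```python
AUTO_ACTION_KEYS = ("/AA", "/OpenAction")
SCRIPT_KEYS = ("/JS", "/JavaScript")

def likely_malicious(counts: dict) -> bool:
    """Return True when an automatic action and JavaScript both occur."""
    has_auto_action = any(counts.get(key, 0) > 0 for key in AUTO_ACTION_KEYS)
    has_script = any(counts.get(key, 0) > 0 for key in SCRIPT_KEYS)
    return has_auto_action and has_script

if __name__ == "__main__":
    print(likely_malicious({"/OpenAction": 1, "/JS": 1, "/JavaScript": 1}))
    print(likely_malicious({"/JS": 2}))
```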

/URI is the keyword required to open a web page from within a PDF file.

/Filter specifies how a stream is encoded. The value is usually FlateDecode, which means the stream is compressed with the zlib (deflate) algorithm. For details, see [2].
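
Because FlateDecode is plain zlib compression, stream contents can be inflated and then searched like any other text, which addresses the drawback that a raw keyword search cannot see inside streams. The sketch below uses Python's standard zlib module; the function name inflate_streams is hypothetical, and it simply tries zlib on every stream instead of reading each object's /Filter entry, which is a simplification.

```python
import re
import zlib

# Everything between 'stream' and 'endstream'; the lookbehind skips 'endstream'.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def inflate_streams(data: bytes) -> list:
    """Return the body of each stream, zlib-decompressed when possible."""
    bodies = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes after the data.
            bodies.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (or another filter is used): keep the raw bytes.
            bodies.append(raw)
    return bodies

if __name__ == "__main__":
    payload = zlib.compress(b"app.alert(1)")
    pdf_object = (b"4 0 obj\n<< /Filter /FlateDecode /Length "
                  + str(len(payload)).encode() + b" >>\nstream\n"
                  + payload + b"\nendstream\nendobj\n")
    print(inflate_streams(pdf_object))
```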

/JBIG2Decode indicates that the PDF file uses JBIG2 compression. Although the JBIG2 decoder itself has had a vulnerability (CVE-2009-0658), the /JBIG2Decode keyword alone does not indicate that the PDF file is suspicious.

/RichMedia indicates embedded Flash content.

/Launch gives the number of launch actions.

The final task is to check whether the PDF file and its objects conform to Adobe's PDF file format specification, and then, based on the keywords described above, to judge whether the PDF file may be malicious.
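
A few basic conformance checks can be sketched as follows. The specific checks chosen here (header present, obj and endobj counts equal, stream and endstream counts equal, startxref and %%EOF present) are assumptions about what a minimal check might cover, not a full validation against the specification, and the function name structural_issues is hypothetical.

```python
import re

def _count_word(word: bytes, data: bytes) -> int:
    """Count a keyword that is not glued to other letters."""
    pattern = rb"(?<![A-Za-z])" + re.escape(word) + rb"(?![A-Za-z])"
    return len(re.findall(pattern, data))

def structural_issues(data: bytes) -> list:
    """Return a list of human-readable structural problems found in the file."""
    issues = []
    if not data.lstrip().startswith(b"%PDF-"):
        issues.append("missing %PDF- header")
    if _count_word(b"obj", data) != _count_word(b"endobj", data):
        issues.append("obj/endobj counts differ")
    if _count_word(b"stream", data) != _count_word(b"endstream", data):
        issues.append("stream/endstream counts differ")
    if b"startxref" not in data:
        issues.append("missing startxref")
    if b"%%EOF" not in data[-1024:]:
        issues.append("missing %%EOF marker near end of file")
    return issues

if __name__ == "__main__":
    truncated = b"%PDF-1.5\n1 0 obj\n<< >>\n"
    print(structural_issues(truncated))
```

A file that fails these checks is not necessarily malicious, but the result can be weighed together with the keyword counts.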

Conclusion

Using the ideas above, my tests so far show quite good accuracy in detecting malicious PDF files, but the method cannot detect them with certainty. In particular, there is still no good solution for PDF files that have been obfuscated or specially processed. If you have any good ideas, please feel free to contact me. There are certainly better ideas and methods for detecting malicious PDF files more accurately. I am very curious about how antivirus software does it. Maybe one day I can try working at an antivirus company!

Reference

[1] http://www.adobe.com/devnet/pdf/pdf_reference.html
[2] http://www.zlib.com/

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
