Using estools to assist in deobfuscating JavaScript

Source: Internet
Author: User
Tags mathematical functions

0x00 Preface

JavaScript is a scripting language that runs on the client, so its source code is fully visible to users. But not every JS developer wants their code to be read directly, malware authors for example. To make code analysis harder, obfuscation tools have been applied to a lot of malware (such as 0-day Trojans and cross-site scripting attacks). To expose the malware, analysts must first deobfuscate the script.

This article introduces some common obfuscation methods and shows how estools can be used for static code analysis.

0x01 Common obfuscation methods

Encryption

The core idea of this kind of obfuscation is to encode the code that is to be executed, restore it at run time to a valid script the browser can execute, and then run it. It resembles the packing of executable files. JavaScript can execute strings as code: a string can be passed to the JS engine for parsing and execution through the Function constructor, eval, setTimeout, or setInterval. The most common variant is base62 encoding, whose most obvious signature is that the generated code starts with eval(function(p,a,c,k,e,r.

No matter how the code is transformed, it eventually has to call eval or a similar function once. Decrypting it requires no analysis of the algorithm: simply find that final call and change it to console.log or another method that outputs the decoded program as a string. Automated implementations have been described in many articles and are not repeated here.
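As a minimal sketch of "replace the final call", the snippet below shadows eval with a recording function. The packed string and the captureEval helper are made up for illustration and are not taken from any real packer. The decoder part of the sample still runs, so something like this should only be tried in an isolated environment.

```javascript
// Hypothetical stand-in for a packed sample: a decoder expression feeding eval.
const packed = "eval(')1(trela'.split('').reverse().join(''))";

function captureEval(source) {
  const captured = [];
  // Shadow `eval` with a function parameter, so the sample's final call
  // records the decoded string instead of executing it.
  const run = new Function("eval", source);
  run((code) => {
    captured.push(String(code));
  });
  return captured;
}

console.log(captureEval(packed)); // [ 'alert(1)' ]
```

This only catches calls written literally as eval(...) in non-strict code; samples that reach the Function constructor or setTimeout would need the same treatment for those names.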

Steganography

Strictly speaking, this cannot be called obfuscation. It just hides the JS code in a particular carrier: for example, using the least significant bit (LSB) algorithm to embed it in the RGB channels of an image, hiding it in the EXIF metadata of an image, or hiding it in HTML whitespace characters.

For example, this sensational talk: [A picture shows you black: embedding a malicious program in an image]. Its slides show that it uses the least significant bit plane algorithm. Combined with HTML5 canvas or typed arrays (TypedArray) for processing binary data, a script can extract the hidden data (such as code) from the carrier.

Steganography also requires the program to decode the payload and execute it dynamically, so it is cracked the same way as the previous method: in the browser context, hijack and replace the key function calls, changing them to text output, to obtain the code hidden in the carrier.
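The LSB idea can be sketched on a plain byte array. This is an illustrative assumption about the bit layout (one message bit per byte, most significant bit first), not the scheme used in the talk mentioned above; embedLsb and extractLsb are hypothetical names. In a browser the bytes would typically come from the canvas getImageData API, whose data is RGBA, so a real extractor would also have to decide which channels to read.

```javascript
// Write each message bit into the lowest bit of successive channel bytes.
function embedLsb(channels, message) {
  const bytes = Buffer.from(message, "utf8");
  if (bytes.length * 8 > channels.length) throw new RangeError("carrier too small");
  const out = Uint8Array.from(channels);
  for (let i = 0; i < bytes.length * 8; i++) {
    const bit = (bytes[i >> 3] >> (7 - (i & 7))) & 1;
    out[i] = (out[i] & 0xfe) | bit;
  }
  return out;
}

// Read the lowest bit of each channel byte back into a string.
function extractLsb(channels, byteLength) {
  const bytes = new Uint8Array(byteLength);
  for (let i = 0; i < byteLength * 8; i++) {
    bytes[i >> 3] |= (channels[i] & 1) << (7 - (i & 7));
  }
  return Buffer.from(bytes).toString("utf8");
}

const carrier = new Uint8Array(64).fill(200); // stand-in for pixel data
const stego = embedLsb(carrier, "alert(1)");
console.log(extractLsb(stego, 8)); // alert(1)
```

Each carrier byte changes by at most 1, which is why the payload is invisible to the eye.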

Complex expressions

Code obfuscation does not necessarily call eval. It can also add useless instructions to the code to increase its complexity and greatly reduce readability. JavaScript has many quirky features, and they can be combined to turn originally simple literals (Literal), member accesses (MemberExpression), function calls (CallExpression), and other code fragments into something hard to read.

Literals in JavaScript include strings, numbers, and regular expressions.

The following is a simple example.

There are two ways to access a member of an object: the dot operator and the subscript operator. A call to the eval method of window can be written as window.eval() or window['eval']();

To make the code nastier, the obfuscator picks the second form and then works on the string literal. First split the string into several parts: 'e' + 'v' + 'al';

This is still quite obvious, so apply a number conversion trick: 14..toString(15) + 31..toString(32) + 0xf1.toString(22);

To go all the way, expand the numbers too: (0b1110).toString(4 << 2) + (' '.charCodeAt() - 1).toString(Math.log(0x100000000) / Math.log(2)) + 0xf1.toString(11 << 1);

Final effect: window[(2 * 7).toString(4 << 2) + (' '.charCodeAt() - 1).toString(Math.log(0x100000000) / Math.log(2)) + 0xf1.toString(11 << 1)]('alert(1)')

Many such mutually inverse operations can be found in JS. By combining them with a random generator, a simple expression can be made arbitrarily complicated.
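To check the trick, the fragments can be evaluated on their own. The sketch below only evaluates the string-building part and does not call eval. The names step1 and step2 are introduced here for illustration. 14 in base 15 is "e", 31 in base 32 is "v", and 0xf1 (241 = 10 * 22 + 21) in base 22 is "al".

```javascript
// Step 1: number-to-string conversions in unusual radixes.
const step1 = 14..toString(15) + 31..toString(32) + 0xf1.toString(22);

// Step 2: the same numbers and radixes, themselves written as expressions.
const step2 =
  (0b1110).toString(4 << 2) +
  (" ".charCodeAt() - 1).toString(Math.log(0x100000000) / Math.log(2)) +
  0xf1.toString(11 << 1);

console.log(step1); // eval
console.log(step2 === step1);
```

A static deobfuscator reverses exactly these steps: it evaluates each constant subexpression and substitutes the result.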

0x02 Static analysis implementation

Parsing and transforming code

The approach to JavaScript deobfuscation in this article is to simulate execution of the parts of the code whose results are predictable: write a simple script execution engine that only executes code blocks matching certain predefined rules, then replace the original verbose code with the computed results, thereby simplifying the expressions.

If you have a basic understanding of how a script engine's interpreter works, you know that in order to "read" the code, the interpreter performs lexical analysis and syntax analysis on the source, converting the code string into an abstract syntax tree (AST).

Take this code for example:

var a = 42; var b = 5; function addA(d) { return a + d; } var c = addA(2) + b;

Corresponding syntax tree

(Generated by JointJS's online tool)
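For readers without the picture, the node for the first statement looks roughly like the object below. It is written out by hand in the ESTree style that esprima emits, with location data and the surrounding Program node omitted.

```javascript
// ESTree-style node for `var a = 42;`, written out by hand.
const declaration = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "a" },
      init: { type: "Literal", value: 42, raw: "42" },
    },
  ],
};

const { id, init } = declaration.declarations[0];
console.log(`${id.name} = ${init.value}`); // a = 42
```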

Without JIT technology, an interpreter can start from the root of the syntax tree, traverse all nodes depth first, and execute the instructions found on the nodes one by one, until the script ends and a result is returned.

There are many tools that generate an abstract syntax tree from JS code, such as the parser bundled with UglifyJS, and esprima, which is used in this article.

Esprima provides a very simple interface:

var ast = require('esprima').parse(code)

In addition, Esprima provides an online tool that parses arbitrary (valid) JavaScript code into an AST and displays it: http://esprima.org/demo/parse.html

Combined with several of the estools helper libraries, you can perform static code analysis on JS:

escope: JavaScript scope analysis tool

esutil: helper function library for checking whether syntax tree nodes meet certain conditions

estraverse: syntax tree traversal helper library; its interface is somewhat similar to SAX-style XML parsing

esrecurse: another syntax tree traversal tool, which uses recursion

esquery: uses CSS selector syntax to extract matching nodes from the syntax tree

escodegen: the counterpart of esprima; it turns the syntax tree back into code

The traversal tool used in the project is estraverse. It provides two static methods: estraverse.traverse and estraverse.replace. The former only traverses the AST nodes, and the return value of the callback controls whether traversal continues down toward the leaf nodes. The replace method can modify the AST directly during traversal, which makes code refactoring possible. For specific usage, refer to its official documentation or the sample code that accompanies this article.
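The enter/leave visitor pattern can be sketched without any dependency. The traverse function and SKIP symbol below are hypothetical stand-ins written for this example. They imitate the shape of the pattern, not estraverse's actual implementation, which signals skipping through its own VisitorOption values.

```javascript
const SKIP = Symbol("skip"); // stand-in for a "do not visit children" signal

// Depth-first walk over an ESTree-style tree, calling enter before the
// children of a node and leave after them.
function traverse(node, visitor, parent = null) {
  if (visitor.enter && visitor.enter(node, parent) === SKIP) return;
  for (const key of Object.keys(node)) {
    const value = node[key];
    for (const child of Array.isArray(value) ? value : [value]) {
      if (child && typeof child.type === "string") traverse(child, visitor, node);
    }
  }
  if (visitor.leave) visitor.leave(node, parent);
}

// Hand-written AST for the statement: f(1 + 2)
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "CallExpression",
    callee: { type: "Identifier", name: "f" },
    arguments: [
      {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Literal", value: 1 },
        right: { type: "Literal", value: 2 },
      },
    ],
  },
};

const visited = [];
traverse(ast, {
  enter(node) {
    visited.push(node.type);
    if (node.type === "BinaryExpression") return SKIP; // do not descend into 1 + 2
  },
});
console.log(visited.join(" > "));
// ExpressionStatement > CallExpression > Identifier > BinaryExpression
```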

Rule Design

Let's start from real code. Recently I ran into several XSS worms whose code was obfuscated in a style similar to the following:

Looking at the code style, the obfuscator did the following:

String literal obfuscation: first extract all strings and build a string array in the global scope, escaping the characters to make them harder to read, then replace each occurrence of a string with a reference to an array element.

Variable name obfuscation: unlike the shortened names produced by minifiers, names here are built from underscores and digits. The variables are hard to tell apart, which makes them harder to read than single letters.

Member operator obfuscation: replace the dot operator with the string subscript form, then obfuscate the string.

Removal of unnecessary whitespace: reduces the file size. This is something every minifier does.

After some searching, this code was most likely generated by the free version of javascriptobfuscator.com. The three options available in the free version (Encode Strings/Replace Names) also match what was observed.

Among these transformations, variable name obfuscation is irreversible. A tool that could name variables intelligently would be welcome. For example, the jsnice website provides an online tool that analyzes what each variable is used for and renames it automatically. Even where that falls short, we can fall back on the refactoring features of an IDE (such as WebStorm) and rename by hand, based on an analysis of the code's behavior.

Next, look at how strings are handled. Because the strings are extracted into a global array, the syntax tree shows a recognizable feature: a VariableDeclarator appears in the global scope whose init property is an ArrayExpression, and all of its elements are Literal nodes. This means every element of the array is a constant, so we can simply evaluate it and associate it with the variable name (identifier). Note: to simplify processing, the scope chain of variable names is not taken into account. In JS, the scope chain decides which binding a variable name resolves to; for example, a global variable name can be redefined by a local variable. If the obfuscator is nasty enough to reuse the same variable name in different scopes, and the deobfuscator does not handle scopes, the code will be resolved incorrectly.
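The string-array rule can be sketched on a hand-built ESTree-style tree. This is a simplified illustration, not the code from the demo repository: collectStringTables and inlineLookups are hypothetical names, and, like the article's approach, the sketch ignores scopes and reassignment of the array.

```javascript
const lit = (value) => ({ type: "Literal", value });
const id = (name) => ({ type: "Identifier", name });
const at = (object, property) => ({ type: "MemberExpression", computed: true, object, property });

// Find top-level `var name = [literal, literal, ...]` declarations.
function collectStringTables(program) {
  const tables = new Map();
  for (const stmt of program.body) {
    if (stmt.type !== "VariableDeclaration") continue;
    for (const d of stmt.declarations) {
      const init = d.init;
      if (d.id.type === "Identifier" && init && init.type === "ArrayExpression" &&
          init.elements.every((e) => e && e.type === "Literal")) {
        tables.set(d.id.name, init.elements.map((e) => e.value));
      }
    }
  }
  return tables;
}

// Return a copy of the tree in which table[index] lookups become literals.
function inlineLookups(node, tables) {
  if (Array.isArray(node)) return node.map((n) => inlineLookups(n, tables));
  if (!node || typeof node.type !== "string") return node;
  if (node.type === "MemberExpression" && node.computed &&
      node.object.type === "Identifier" && tables.has(node.object.name) &&
      node.property.type === "Literal" && typeof node.property.value === "number") {
    const table = tables.get(node.object.name);
    if (node.property.value in table) return lit(table[node.property.value]);
  }
  const copy = {};
  for (const key of Object.keys(node)) copy[key] = inlineLookups(node[key], tables);
  return copy;
}

// var _0x1 = ["log", "hi"]; console[_0x1[0]](_0x1[1]);
const program = {
  type: "Program",
  body: [
    { type: "VariableDeclaration", kind: "var", declarations: [
      { type: "VariableDeclarator", id: id("_0x1"),
        init: { type: "ArrayExpression", elements: [lit("log"), lit("hi")] } },
    ] },
    { type: "ExpressionStatement", expression: {
      type: "CallExpression",
      callee: at(id("console"), at(id("_0x1"), lit(0))),
      arguments: [at(id("_0x1"), lit(1))],
    } },
  ],
};

const cleaned = inlineLookups(program, collectStringTables(program));
const call = cleaned.body[1].expression;
console.log(call.callee.property.value, call.arguments[0].value); // log hi
```

After this pass the call reads console["log"]("hi"); turning the subscript back into a dot access is a separate, purely cosmetic step.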

In the test program, I set up the following replacement rules:

A string array declared as a global variable, whose values are referenced directly by numeric subscript in the code

Binary operations whose result is determined, such as 1 * 2 + 3 / 4 - 6 % 5

The source of a regular expression literal and the length of a string literal

Return values of join, reverse, slice, and similar methods on constant arrays

Return values of methods such as substr and charAt on string constants

Calls to global functions such as decodeURIComponent whose arguments are all constants, which are replaced with their return values

Calls to mathematical functions whose results are constants, such as Math.sin(3.14)
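Two of these rules, constant binary operations and the length of a string literal, can be sketched as a bottom-up fold over ESTree-style nodes. This is a minimal illustration under the assumption that only number and string literals are folded; the fold function and the BINARY table are hypothetical and are not the demo project's code.

```javascript
const lit = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

// Operators this sketch is willing to evaluate.
const BINARY = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
  ["/", (a, b) => a / b],
  ["%", (a, b) => a % b],
]);

const isConst = (n) => n.type === "Literal" && ["number", "string"].includes(typeof n.value);

// Fold children first, then the node itself if it has become constant.
function fold(node) {
  if (Array.isArray(node)) return node.map(fold);
  if (!node || typeof node.type !== "string") return node;
  const out = {};
  for (const key of Object.keys(node)) out[key] = fold(node[key]);
  if (out.type === "BinaryExpression" && BINARY.has(out.operator) &&
      isConst(out.left) && isConst(out.right)) {
    return lit(BINARY.get(out.operator)(out.left.value, out.right.value));
  }
  if (out.type === "MemberExpression" && !out.computed && out.property.name === "length" &&
      out.object.type === "Literal" && typeof out.object.value === "string") {
    return lit(out.object.value.length);
  }
  return out;
}

// 1 * 2 + 3 / 4 - 6 % 5
const expr = bin("-",
  bin("+", bin("*", lit(1), lit(2)), bin("/", lit(3), lit(4))),
  bin("%", lit(6), lit(5)));
console.log(fold(expr)); // { type: 'Literal', value: 1.75 }

// x + 2 * 3 keeps the unknown identifier and folds only the constant part.
const partial = fold(bin("+", { type: "Identifier", name: "x" }, bin("*", lit(2), lit(3))));
console.log(partial.right.value); // 6
```

Because folding runs bottom-up, nested constant expressions collapse in a single pass, while anything that depends on a non-constant value is left untouched.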

As for restoring indentation, that is built into escodegen. Just use the default configuration when calling escodegen.generate to generate code (that is, omit the second parameter).

DEMO program

This deobfuscator prototype is on GitHub: https://github.com/ChiChou/etacsufbo

For the running environment and usage, see README of the repository.

I took a piece of code from You Might Not Need jQuery and ran it through javascriptobfuscator.com to test the obfuscation:

Deobfuscating the obfuscated result:

Although the variable names are still hard to read, the behavior of the code can now be made out.

The demo program has a lot of limitations at present. It can only be regarded as a semi-automatic auxiliary tool, and there are many unimplemented functions.

Some obfuscators protect string literals in more complex ways, converting a string into the form f(x), where f is a decryption function and x is the ciphertext string. Another variant generates an anonymous function that returns the string. The function expressions used this way are context independent: the return value depends only on the function's input and not on the surrounding context (such as class members or values read from the DOM). See the xor function in the following snippet:

var xor = function (str, a, b) {
  // XOR characters at odd indexes with a and at even indexes with b.
  return String.fromCharCode.apply(null, str.split('').map(function (c, i) {
    var ascii = c.charCodeAt(0);
    return ascii ^ (i % 2 ? a : b);
  }));
};

How can we tell whether a function has this property? First, some library functions are known to have it, such as btoa, escape, and String.fromCharCode: as long as the input is constant, the return value is fixed. Build a whitelist of such built-in functions, then traverse the AST of the function expression. If none of the values the function computes with come from the outer context, and every CallExpression callee in it is on the whitelist, the function qualifies. The check can be applied recursively.
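A rough version of this check can be sketched as "no free identifiers outside a whitelist". The PURE_GLOBALS set and isContextFree are hypothetical, the whitelist contents are only examples, and the sketch makes two simplifying assumptions that should be stated loudly: method calls on local values (such as str.split) are treated as pure, and nested scopes are flattened into one set of local names.

```javascript
// Hypothetical whitelist of globals assumed to be pure for constant inputs.
const PURE_GLOBALS = new Set(["btoa", "escape", "String", "decodeURIComponent"]);

const isNode = (v) => v !== null && typeof v === "object" && typeof v.type === "string";

function walk(node, visit, parent = null, key = null) {
  visit(node, parent, key);
  for (const k of Object.keys(node)) {
    const value = node[k];
    for (const child of Array.isArray(value) ? value : [value]) {
      if (isNode(child)) walk(child, visit, node, k);
    }
  }
}

function isContextFree(fn) {
  // Pass 1: collect parameter and variable names declared inside the function.
  const locals = new Set();
  walk(fn, (node) => {
    if (Array.isArray(node.params)) {
      for (const p of node.params) if (p.type === "Identifier") locals.add(p.name);
    }
    if (node.type === "VariableDeclarator" && node.id.type === "Identifier") locals.add(node.id.name);
  });
  // Pass 2: every referenced identifier must be local or whitelisted.
  let free = true;
  walk(fn, (node, parent, key) => {
    if (node.type === "ThisExpression") free = false;
    if (node.type !== "Identifier") return;
    const isPropertyName = parent !== null && parent.type === "MemberExpression" &&
      key === "property" && !parent.computed;
    if (!isPropertyName && !locals.has(node.name) && !PURE_GLOBALS.has(node.name)) free = false;
  });
  return free;
}

const id = (name) => ({ type: "Identifier", name });
const fnReturning = (argument) => ({
  type: "FunctionExpression",
  params: [id("s")],
  body: { type: "BlockStatement", body: [{ type: "ReturnStatement", argument }] },
});

// function (s) { return btoa(s); }
const pure = fnReturning({ type: "CallExpression", callee: id("btoa"), arguments: [id("s")] });
// function (s) { return s + document.cookie; }
const impure = fnReturning({
  type: "BinaryExpression", operator: "+", left: id("s"),
  right: { type: "MemberExpression", computed: false, object: id("document"), property: id("cookie") },
});
console.log(isContextFree(pure), isContextFree(impure)); // true false
```

The check is deliberately conservative: anything it cannot classify (for example object literal keys) makes it answer false, which only means a foldable call is left in place.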

Some obfuscators create a large number of references to the same variable, that is, many aliases for one object, which badly hurts readability. A scope analysis tool can be used to analyze the data flow of the variable identifiers and replace them with the correct values. There is also obfuscation based on mathematical identities. Suppose a variable a is declared: if a is a finite Number, the expressions a - a and a * 0 always evaluate to 0. However, if a satisfies isNaN(a), or a is infinite, the expressions evaluate to NaN. Cleaning up such code also requires data flow analysis.
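The pitfall is easy to reproduce. In the sketch below, the helper names are introduced for illustration only, and foldsToZero expresses the condition under which rewriting a - a to 0 preserves the value.

```javascript
const selfMinus = (a) => a - a;
const timesZero = (a) => a * 0;

console.log(selfMinus(5), timesZero(5)); // 0 0
console.log(selfMinus("abc"), timesZero("abc")); // NaN NaN
console.log(selfMinus(Infinity), timesZero(Infinity)); // NaN NaN

// Rewriting `a - a` to 0 is only safe when a is known to be a finite number.
const foldsToZero = (knownValue) =>
  typeof knownValue === "number" && Number.isFinite(knownValue);

console.log(foldsToZero(5), foldsToZero(NaN), foldsToZero(Infinity)); // true false false
```

For negative finite numbers a * 0 is -0 rather than 0, which rarely matters in practice but is one more reason to fold only when the value is actually known.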

So far I have not seen any JS obfuscation samples that use control flow flattening. I think this may be related to the usage scenarios and characteristics of the language: JavaScript code is generally business oriented and does not involve complicated control flow or algorithms, so the obfuscation result might not be satisfactory.

0x03 Conclusion

JavaScript really is a magical language, and you often run into surprising tricks. Decrypting protected code is also interesting. Reportedly, several major technology companies are planning a general bytecode standard for browser applications, WebAssembly. Once that is realized, code protection could bring in true "packing" or virtual machine protection, and the contest between protection and analysis will move to a new level.

Demo project code hosted on GitHub: https://github.com/ChiChou/etacsufbo
