Android Reverse Engineering Journey: Parsing the Compiled AndroidManifest File Format

Source: Internet
Author: User

I. Preface

It's Saturday again and I have some free time, so it's time for another article. Today we continue with reverse engineering and look at the format of the AndroidManifest file in Android. Some readers may wonder what there is to say about AndroidManifest: isn't it just about how to use its tags and attributes? Not at all. That material is well covered elsewhere, rather dull, and irrelevant to reversing. What we want to study is the format of AndroidManifest.xml after it has been compiled into an APK. First, a quick reminder: an APK is really just a compressed (ZIP) archive, so we can unpack it with any archive tool:


II. Technical Introduction

We can see that it contains three files, which we will explain in detail later: AndroidManifest.xml, classes.dex, and resources.arsc.

Anyone who has decompiled an APK knows the tool Apktool. Its working principle is essentially to parse these three file formats: when an app is packaged into an APK, Android compiles these files into its own binary formats, which look like garbage when opened as plain text. We need to parse them back into something we can read. Starting with this article, we will cover the format of each of these three files in turn, so that when an error occurs while decompiling an APK later, you can pinpoint the problem accurately.

Today, let's look at the AndroidManifest.xml format:


Opened in a hex viewer, the file is nothing but hexadecimal data, so we need to parse it, just as I did earlier when parsing .so files:

Every file has its own format. When the file is compiled into an APK, it is stored in a binary format defined by Google; once we know that format, we can parse the file in detail:


Exciting diagram, isn't it? This is a well-known chart that details the format of the compiled AndroidManifest.xml file. It is hard to understand just by looking at it, though, so we will work through a concrete file to understand it thoroughly; the diagram remains the foundation. Here is the case study:

Examples are easy to come by: create any simple APK, open it with an archive tool, extract AndroidManifest.xml, and start reading and parsing its contents:

III. Format Parsing

1. Header information

Every file format has header information. The header is important, and it usually has a fixed layout.


The header contains the following fields:

1. File magic number: four bytes

2. File size: four bytes
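A minimal sketch of a header check, assuming the whole file is already in a byte[]. The magic value 0x00080003 is not given in the text above; it comes from AOSP's ResourceTypes.h (chunk type RES_XML_TYPE = 0x0003 with an 8-byte header), read as one little-endian int. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: checks the 8-byte header of a compiled XML file.
final class AxmlHeaderCheck {
    // Bytes 03 00 08 00 on disk, read as a little-endian int (value assumed from AOSP ResourceTypes.h).
    static final int XML_MAGIC = 0x00080003;

    static boolean isValidHeader(byte[] file) {
        if (file.length < 8) {
            return false; // too short to hold magic + size
        }
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int magic = buf.getInt(0);
        int declaredSize = buf.getInt(4);
        // Strict check: the declared size must equal the real file length.
        return magic == XML_MAGIC && declaredSize == file.length;
    }
}
```

Rejecting a size mismatch is a design choice; since hardened APKs may tamper with fields like this, a more tolerant parser might only log the mismatch.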

Next we parse the Chunks. Every Chunk has something in common, namely its header:

ChunkType (four bytes) and ChunkSize (four bytes)
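The text treats ChunkType as a single four-byte value. In AOSP's ResourceTypes.h the same four bytes are two 16-bit fields, a type and a header size (this layout comes from AOSP, not from the article). It explains the constants used below: 0x001C0001 is type 0x0001 with a 0x1C = 28-byte header, exactly the seven four-byte StringChunk fields that precede StringOffsets. A sketch that reads a chunk header this way (class name hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: one 8-byte chunk header, split the way AOSP's ResChunk_header does.
final class ChunkHeader {
    final int offset;      // where the chunk starts in the file
    final int type;        // low 16 bits of ChunkType, e.g. 0x0001 for the string chunk
    final int headerSize;  // high 16 bits of ChunkType, e.g. 0x001C = 28 bytes
    final int size;        // ChunkSize: the whole chunk, including these 8 bytes

    ChunkHeader(byte[] file, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int chunkType = buf.getInt(offset);
        this.offset = offset;
        this.type = chunkType & 0xFFFF;
        this.headerSize = chunkType >>> 16;
        this.size = buf.getInt(offset + 4);
    }

    // The next chunk begins right after this one.
    int nextOffset() {
        return offset + size;
    }
}
```

Because ChunkSize covers the whole chunk, a parser can skip any chunk it does not understand by jumping to nextOffset().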

2. StringChunk contents

This Chunk stores all the strings used in the AndroidManifest file.


1. ChunkType: StringChunk type, fixed four bytes: 0x001C0001

2. ChunkSize: size of the StringChunk, four bytes

3. StringCount: Number of strings in StringChunk, four bytes

4. StyleCount: number of styles in the StringChunk, four bytes. In the files parsed here this value is always 0x00000000.

5. Unknown: four bytes that are skipped during parsing. (In AOSP's ResourceTypes.h this is the flags field; bit 0x100 marks a string pool stored as UTF-8 rather than UTF-16.)

6. StringPoolOffset: offset of the string pool, four bytes. The offset is relative to the start of the StringChunk header.

7. StylePoolOffset: offset of the style pool, four bytes. There are no styles here, so this field can be ignored.

8. StringOffsets: the offset of each string, so its size is StringCount * 4 bytes

9. StyleOffsets: the offset of each style, so its size is StyleCount * 4 bytes

The string content and style content will start later.
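A sketch of reading these header fields, assuming file holds the whole file and chunkStart is the chunk's offset (8 for AndroidManifest.xml). Treating the "Unknown" field as flags with a UTF-8 bit (0x100) follows AOSP's ResourceTypes.h rather than the article; it becomes relevant later, when other XML files fail to parse. Names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: the StringChunk header fields, read from chunkStart.
final class StringPoolHeader {
    static final int UTF8_FLAG = 1 << 8; // bit in the "Unknown" field, per AOSP ResourceTypes.h

    final int chunkSize;
    final int stringCount;
    final int styleCount;
    final int flags;          // the four bytes the text calls "Unknown"
    final int stringsStart;   // StringPoolOffset, relative to chunkStart
    final int stylesStart;    // StylePoolOffset, relative to chunkStart
    final int[] stringOffsets;

    StringPoolHeader(byte[] file, int chunkStart) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        chunkSize = buf.getInt(chunkStart + 4);
        stringCount = buf.getInt(chunkStart + 8);
        styleCount = buf.getInt(chunkStart + 12);
        flags = buf.getInt(chunkStart + 16);
        stringsStart = buf.getInt(chunkStart + 20);
        stylesStart = buf.getInt(chunkStart + 24);
        // StringOffsets follows the 28-byte header; each entry is relative to stringsStart.
        stringOffsets = new int[stringCount];
        for (int i = 0; i < stringCount; i++) {
            stringOffsets[i] = buf.getInt(chunkStart + 28 + i * 4);
        }
    }

    boolean isUtf8() {
        return (flags & UTF8_FLAG) != 0;
    }
}
```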


Next, let's look at the code. Because it is fairly long, we will go through it in segments; the complete project will be shown later.

1. First, read the AndroidManifest.xml file into a byte array:

// Requires: java.io.ByteArrayOutputStream, java.io.FileInputStream, java.io.IOException
byte[] byteSrc = null;
// try-with-resources closes both streams, even when opening or reading fails
try (FileInputStream fis = new FileInputStream("xmltest/AndroidManifest1.xml");
     ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
    byte[] buffer = new byte[1024];
    int len;
    while ((len = fis.read(buffer)) != -1) {
        bos.write(buffer, 0, len);
    }
    byteSrc = bos.toByteArray();
} catch (IOException e) {
    System.out.println("parse xml error:" + e.toString());
}
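On Java 7 and later, the manual buffer loop above can be replaced by Files.readAllBytes. A small sketch; the class and method names are illustrative, and failures are rethrown rather than printed:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: same result as the stream loop, using java.nio.
final class ManifestLoader {
    static byte[] load(String path) {
        try {
            // readAllBytes opens, reads and closes the file in one call
            return Files.readAllBytes(Paths.get(path));
        } catch (IOException e) {
            throw new UncheckedIOException("cannot read " + path, e);
        }
    }
}
```

Usage: `byte[] byteSrc = ManifestLoader.load("xmltest/AndroidManifest1.xml");`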
2. Next, parse the header information:

/**
 * Parse XML header information
 * @param byteSrc
 */
public static void parseXmlHeader(byte[] byteSrc) {
    byte[] xmlMagic = Utils.copyByte(byteSrc, 0, 4);
    System.out.println("magic number:" + Utils.bytesToHexString(xmlMagic));
    byte[] xmlSize = Utils.copyByte(byteSrc, 4, 4);
    System.out.println("xml size:" + Utils.bytesToHexString(xmlSize));

    xmlSb.append("");
    xmlSb.append("\n");
}
Nothing special here: we simply parse the fields in the format described above.
3. Parse the StringChunk information

/**
 * Parse StringChunk
 * @param byteSrc
 */
public static void parseStringChunk(byte[] byteSrc) {
    // Tag of the String Chunk
    byte[] chunkTagByte = Utils.copyByte(byteSrc, stringChunkOffset, 4);
    System.out.println("string chunktag:" + Utils.bytesToHexString(chunkTagByte));
    // Chunk size
    byte[] chunkSizeByte = Utils.copyByte(byteSrc, stringChunkOffset + 4, 4);
    int chunkSize = Utils.byte2int(chunkSizeByte);
    System.out.println("chunk size:" + chunkSize);
    // String count
    byte[] chunkStringCountByte = Utils.copyByte(byteSrc, stringChunkOffset + 8, 4);
    int chunkStringCount = Utils.byte2int(chunkStringCountByte);
    System.out.println("count:" + chunkStringCount);

    stringContentList = new ArrayList<>(chunkStringCount);

    // Skip the next 8 bytes: StyleCount (always 0 here) and the Unknown field.
    // StringPoolOffset is relative to the start of the String Chunk (file offset 8)
    byte[] chunkStringOffsetByte = Utils.copyByte(byteSrc, stringChunkOffset + 20, 4);

    int stringContentStart = stringChunkOffset + Utils.byte2int(chunkStringOffsetByte);
    System.out.println("start:" + stringContentStart);

    // String content: from the start of the string pool to the end of the chunk
    byte[] chunkStringContentByte = Utils.copyByte(byteSrc, stringContentStart,
            stringChunkOffset + chunkSize - stringContentStart);

    /*
     * Strings can be encoded as UTF-8 or UTF-16:
     * a UTF-8 string ends with 00, a UTF-16 string ends with 00 00
     */

    /*
     * The code below parses AndroidManifest.xml (UTF-16 strings)
     */
    // Layout: a 2-byte length (in characters), then the string content, then the terminator 00 00
    byte[] firstStringSizeByte = Utils.copyByte(chunkStringContentByte, 0, 2);
    // One character takes two bytes
    int firstStringSize = Utils.byte2Short(firstStringSizeByte) * 2;
    System.out.println("size:" + firstStringSize);
    byte[] firstStringContentByte = Utils.copyByte(chunkStringContentByte, 2, firstStringSize + 2);
    String firstStringContent = new String(firstStringContentByte);
    stringContentList.add(Utils.filterStringNull(firstStringContent));
    System.out.println("first string:" + Utils.filterStringNull(firstStringContent));

    // Put all remaining strings into the list
    int endStringIndex = 2 + firstStringSize + 2;
    while (stringContentList.size() < chunkStringCount) {
        // One character takes two bytes, so multiply by two
        int stringSize = Utils.byte2Short(Utils.copyByte(chunkStringContentByte, endStringIndex, 2)) * 2;
        String str = new String(Utils.copyByte(chunkStringContentByte, endStringIndex + 2, stringSize + 2));
        System.out.println("str:" + Utils.filterStringNull(str));
        stringContentList.add(Utils.filterStringNull(str));
        endStringIndex += (2 + stringSize + 2);
    }

    /*
     * The code below parses compiled resource XML files (UTF-8 strings)
     */
    /*
    int stringStart = 0;
    int index = 0;
    while (index < chunkStringCount) {
        byte[] stringSizeByte = Utils.copyByte(chunkStringContentByte, stringStart, 2);
        int stringSize = (stringSizeByte[1] & 0x7F);
        System.out.println("string size:" + Utils.bytesToHexString(Utils.int2Byte(stringSize)));
        if (stringSize != 0) {
            // Note: UTF-8 encoded here
            String val = "";
            try {
                val = new String(Utils.copyByte(chunkStringContentByte, stringStart + 2, stringSize), "utf-8");
            } catch (Exception e) {
                System.out.println("string encode error:" + e.toString());
            }
            stringContentList.add(val);
        } else {
            stringContentList.add("");
        }
        stringStart += (stringSize + 3);
        index++;
    }

    for (String str : stringContentList) {
        System.out.println("str:" + str);
    }
    */

    resourceChunkOffset = stringChunkOffset + Utils.byte2int(chunkSizeByte);
}
A few points need explaining here:

1. In the format description above there is an Unknown field of four bytes, which we need to skip.

2. Each string ends with the terminator 0x0000.

3. The first two bytes of each string give its length in characters.

So we have the offset and size of each string, and parsing the string content is simple:

Here we see 0x000B, the length of the string (11 characters). The bytes appear reversed, as 0B 00, because the file is little-endian and the low byte comes first. The string ends with 0x0000.
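As an alternative to decoding with the platform default charset and then stripping zero bytes (the filterStringNull approach described next), a string entry can be decoded directly as UTF-16LE, which also keeps non-ASCII text such as Chinese app labels intact. A hedged sketch; the long-string rule (high bit of the length set) is taken from AOSP, not from the article, and the names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes one UTF-16 string entry starting at pos.
final class Utf16StringReader {
    static String read(byte[] pool, int pos) {
        // Length in UTF-16 code units, stored as a little-endian u16 (0B 00 -> 11).
        int len = (pool[pos] & 0xFF) | (pool[pos + 1] & 0xFF) << 8;
        int start = pos + 2;
        if ((len & 0x8000) != 0) {
            // Long strings (AOSP convention): high bit set means a second u16 follows.
            int low = (pool[pos + 2] & 0xFF) | (pool[pos + 3] & 0xFF) << 8;
            len = ((len & 0x7FFF) << 16) | low;
            start += 2;
        }
        // Decode as UTF-16LE; the trailing 00 00 terminator is simply not included.
        return new String(pool, start, len * 2, StandardCharsets.UTF_16LE);
    }
}
```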


Each character takes two bytes, and there is a helper method here, Utils.filterStringNull(firstStringContent):

 

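public static String filterStringNull(String str) {
    if (str == null || str.length() == 0) {
        return str;
    }
    byte[] strByte = str.getBytes();
    ArrayList<Byte> newByte = new ArrayList<>();
    for (int i = 0; i < strByte.length; i++) {
        if (strByte[i] != 0) newByte.add(strByte[i]); // drop the 0x00 bytes
    }
    byte[] result = new byte[newByte.size()];
    for (int i = 0; i < result.length; i++) result[i] = newByte.get(i);
    return new String(result);
}
The logic is very simple: it filters out null bytes (NUL in C, a 0x00 byte in Java). Without this filtering, the output looks like this: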
 


Every character appears as a wide character, which is hard to read: the strings are UTF-16LE encoded, so each ASCII character is followed by a 00 byte. Once those bytes are filtered out, the text is fine:

This is much better.

We have now parsed all the strings in AndroidManifest.xml. We store them in a global string list, because the later chunks refer to strings by their index in this list.
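The parseStringChunk code walks the pool sequentially. Since later chunks refer to strings by index, another option is to look each one up through the StringOffsets table, which does not assume the strings are stored back to back in index order. A sketch under stated assumptions (a UTF-16 pool, strings shorter than 0x8000 characters, hypothetical names):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: random access to string i via the StringOffsets table.
final class StringPoolLookup {
    static String get(byte[] file, int chunkStart, int index) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int stringCount = buf.getInt(chunkStart + 8);
        if (index < 0 || index >= stringCount) {
            return null; // e.g. index -1 means "no string"
        }
        int stringsStart = buf.getInt(chunkStart + 20);
        int entryOffset = buf.getInt(chunkStart + 28 + index * 4);
        int pos = chunkStart + stringsStart + entryOffset;
        // Assumes a UTF-16 pool and short strings (no high-bit length extension).
        int len = buf.getShort(pos) & 0xFFFF;
        return new String(file, pos + 2, len * 2, StandardCharsets.UTF_16LE);
    }
}
```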

 

3. Parsing ResourceIdChunk
This chunk stores the resource IDs of the system attributes used in AndroidManifest, such as versionCode in android:versionCode (android is the namespace prefix, described later).

1. ChunkType: type of the ResourceIdChunk, fixed four bytes: 0x00080180

2. ChunkSize: the size of the ResourceIdChunk, four bytes

3. ResourceIds: the resource IDs, four bytes each, so there are (ChunkSize - 8) / 4 of them, where 8 is the size of the header (ChunkType and ChunkSize)
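A sketch of reading the ID array, assuming file holds the whole file and chunkStart points at this chunk. The type constant 0x00080180 is AOSP's RES_XML_RESOURCE_MAP_TYPE (0x0180) with an 8-byte header. Also from AOSP rather than the article: entry i of this array is the resource ID of string i in the string pool, which is how an attribute name such as versionCode is tied to its ID. Names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads the ResourceIds array of a ResourceIdChunk at chunkStart.
final class ResourceIdChunkReader {
    static final int RESOURCE_ID_CHUNK_TYPE = 0x00080180;

    static int[] readIds(byte[] file, int chunkStart) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(chunkStart) != RESOURCE_ID_CHUNK_TYPE) {
            throw new IllegalArgumentException("not a ResourceIdChunk at " + chunkStart);
        }
        int chunkSize = buf.getInt(chunkStart + 4);
        int count = (chunkSize - 8) / 4; // drop the 8-byte header, 4 bytes per ID
        int[] ids = new int[count];
        for (int i = 0; i < count; i++) {
            ids[i] = buf.getInt(chunkStart + 8 + i * 4);
        }
        return ids;
    }
}
```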

 

/**
 * Parsing Resource Chunk
 * @param byteSrc
 */
public static void parseResourceChunk(byte[] byteSrc) {
    byte[] chunkTagByte = Utils.copyByte(byteSrc, resourceChunkOffset, 4);
    System.out.println(Utils.bytesToHexString(chunkTagByte));
    byte[] chunkSizeByte = Utils.copyByte(byteSrc, resourceChunkOffset + 4, 4);
    int chunkSize = Utils.byte2int(chunkSizeByte);
    System.out.println("chunk size:" + chunkSize);
    // chunkSize includes the 8 bytes of ChunkTag and ChunkSize, so subtract them
    byte[] resourceIdByte = Utils.copyByte(byteSrc, resourceChunkOffset + 8, chunkSize - 8);
    ArrayList<Integer> resourceIdList = new ArrayList<>(resourceIdByte.length / 4);
    // Each resource ID takes four bytes
    for (int i = 0; i < resourceIdByte.length; i += 4) {
        int resId = Utils.byte2int(Utils.copyByte(resourceIdByte, i, 4));
        System.out.println("id:" + resId + ", hex:" + Integer.toHexString(resId));
        resourceIdList.add(resId);
    }
    // The rest of this listing is cut off in the source; the next chunk starts at resourceChunkOffset + chunkSize
}

The analysis result:
 


What do the parsed IDs mean?

First, a piece of background knowledge:

When writing Android programs, you will have noticed the R file, which contains the ID of every resource. How are these ID values formed? Each one is a 32-bit integer of the form 0xPPTTEEEE, made up of a Package ID (PP), a Type ID (TT) and an Entry ID (EEEE):

Package ID acts like a namespace that identifies where a resource comes from. The Android system currently defines two resource namespaces: the system resource namespace, whose Package ID is 0x01, and the application resource namespace, whose Package ID is 0x7f. All Package IDs in the range [0x01, 0x7f] are legal; those outside it are illegal. The Package ID of the system resource package package-export.apk mentioned earlier is 0x01, and the Package ID of the resources we define in an application is 0x7f, which you can verify in the generated R.java file.
Type ID is the ID of the resource type. Resource types include animator, anim, color, drawable, layout, menu, raw, string, xml, and so on; each type is assigned an ID.
Entry ID is the order in which a resource appears within its type. Resources of different types may share the same Entry ID, but since their types differ, their full resource IDs still distinguish them.
For more on resource IDs and resource references, see the README file in the frameworks/base/libs/utils directory.
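The three parts described above sit in fixed bit ranges of the 32-bit ID (0xPPTTEEEE). A small sketch for splitting an ID; the class and method names are illustrative:

```java
// Hypothetical helper: splits a resource ID of the form 0xPPTTEEEE.
final class ResourceIdParts {
    static int packageId(int resId) {
        return resId >>> 24;           // PP: 0x01 system, 0x7f application
    }

    static int typeId(int resId) {
        return (resId >>> 16) & 0xFF;  // TT: which resource type
    }

    static int entryId(int resId) {
        return resId & 0xFFFF;         // EEEE: position within that type
    }

    static String describe(int resId) {
        return String.format("package=0x%02x type=0x%02x entry=0x%04x",
                packageId(resId), typeId(resId), entryId(resId));
    }
}
```

For example, 0x7F070001 (a value that shows up later in the parsed manifest) splits into package 0x7f, type 0x07, entry 0x0001.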

The XML file that lists the IDs of the system resources is frameworks/base/core/res/res/values/public.xml

Then we look up the ID parsed above in public.xml:

It turns out to be versionCode. This system resource ID file, public.xml, is important: we will use it again when explaining the resources.arsc file format.

4. Parsing StartNamespaceChunk
This chunk holds a namespace declaration from the AndroidManifest file. XML in Android uses Schema-style namespaces, so each declaration has a Prefix and a Uri.

Some background: there are two ways to define the structure of XML documents, DTD and Schema; readers who are unfamiliar with them can read up on the topic.

1. ChunkType: the type of Chunk, fixed four bytes: 0x00100100

2. ChunkSize: the size of Chunk, four bytes

3. LineNumber: line number in the AndroidManifest file, four bytes

4. Unknown: four bytes, normally 0xFFFFFFFF (AOSP calls this the comment field)

5. Prefix: the prefix of the namespace (index into the string list), four bytes, for example: android

6. Uri: the uri of the namespace (index into the string list), four bytes, for example: http://schemas.android.com/apk/res/android

Parsing code:

/**
 * Parsing StartNamespace Chunk
 * @param byteSrc
 */
public static void parseStartNamespaceChunk(byte[] byteSrc) {
    // Get ChunkTag
    byte[] chunkTagByte = Utils.copyByte(byteSrc, 0, 4);
    System.out.println(Utils.bytesToHexString(chunkTagByte));
    // Get ChunkSize
    byte[] chunkSizeByte = Utils.copyByte(byteSrc, 4, 4);
    int chunkSize = Utils.byte2int(chunkSizeByte);
    System.out.println("chunk size:" + chunkSize);

    // parse line number
    byte[] lineNumberByte = Utils.copyByte(byteSrc, 8, 4);
    int lineNumber = Utils.byte2int(lineNumberByte);
    System.out.println("line number:" + lineNumber);

    // parse the prefix (the four bytes after the line number are FFFFFFFF and are skipped)
    byte[] prefixByte = Utils.copyByte(byteSrc, 16, 4);
    int prefixIndex = Utils.byte2int(prefixByte);
    String prefix = stringContentList.get(prefixIndex);
    System.out.println("prefix:" + prefixIndex);
    System.out.println("prefix str:" + prefix);

    // parse the Uri
    byte[] uriByte = Utils.copyByte(byteSrc, 20, 4);
    int uriIndex = Utils.byte2int(uriByte);
    String uri = stringContentList.get(uriIndex);
    System.out.println("uri:" + uriIndex);
    System.out.println("uri str:" + uri);

    uriPrefixMap.put(uri, prefix);
    prefixUriMap.put(prefix, uri);
}

The results of the analysis are as follows:




These values are indices into the string list we parsed earlier. Note that an XML file may declare several namespaces, so we use Maps to store the relationship between Prefix and Uri; they are used when parsing the contents of the tags.
5. StartTagChunk
This Chunk stores the tag information in AndroidManifest.xml. It is the core content and, naturally, the most complex part.



1. ChunkType: the type of the Chunk, fixed four bytes: 0x00100102

2. ChunkSize: the size of the Chunk, four bytes

3. LineNumber: the corresponding line number in AndroidManifest, four bytes

4. Unknown: unknown field, four bytes (normally 0xFFFFFFFF)

5. NamespaceUri: index of the Uri of the namespace used by this tag, four bytes. For example, a tag written with the android prefix refers to the Uri http://schemas.android.com/apk/res/android; the value is -1 if the tag has no namespace.

6. Name: the tag name (index into the string list), four bytes

7. Flags: four bytes. In AOSP's ResourceTypes.h these are two 16-bit fields, attributeStart and attributeSize, and the value is normally 0x00140014; start and end tags are told apart by ChunkType, not by this field.

8. AttributeCount: the number of attributes in the tag, four bytes

9. ClassAttribute: the class attribute of the tag, four bytes

10. Attributes: the attribute contents. Each attribute is an entry of five values, four bytes each:

[NamespaceUri, Name, ValueString, Type, Data]. Pay attention to the fourth value: it must be shifted right by 24 bits, because the type is stored in its highest byte. The size of this field is therefore AttributeCount * 5 * 4 bytes.
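A sketch of decoding the attribute entries, assuming chunk holds a single StartTagChunk (as byteSrc does in parseStartTagChunk), so the attributes start at byte 36. The reason for the 24-bit shift comes from AOSP's Res_value struct, not from the article: the fourth word packs a 16-bit size, a reserved byte and the 8-bit data type, so the type sits in the highest byte. This sketch uses >>> instead of >>; both give the same result for the small type values that occur in practice. Names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: one 20-byte attribute entry [NamespaceUri, Name, ValueString, Type, Data].
final class AttributeEntry {
    final int namespaceUri;  // string index, -1 if the attribute has no namespace
    final int name;          // string index of the attribute name
    final int valueString;   // string index of the raw value, -1 if none
    final int type;          // data type, from the top byte of the 4th word
    final int data;          // the value itself, interpreted according to type

    AttributeEntry(byte[] chunk, int pos) {
        ByteBuffer buf = ByteBuffer.wrap(chunk).order(ByteOrder.LITTLE_ENDIAN);
        namespaceUri = buf.getInt(pos);
        name = buf.getInt(pos + 4);
        valueString = buf.getInt(pos + 8);
        // Low 16 bits are a size field (normally 8); only the highest byte is the type.
        type = buf.getInt(pos + 12) >>> 24;
        data = buf.getInt(pos + 16);
    }

    // Attributes start at byte 36 of a StartTagChunk, 20 bytes each.
    static AttributeEntry[] readAll(byte[] chunk, int attrCount) {
        AttributeEntry[] entries = new AttributeEntry[attrCount];
        for (int i = 0; i < attrCount; i++) {
            entries[i] = new AttributeEntry(chunk, 36 + i * 20);
        }
        return entries;
    }
}
```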



Parsing code:


 


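/**
 * Parsing StartTag Chunk
 * @param byteSrc
 */
public static void parseStartTagChunk(byte[] byteSrc) {
    // parse ChunkTag
    byte[] chunkTagByte = Utils.copyByte(byteSrc, 0, 4);
    System.out.println(Utils.bytesToHexString(chunkTagByte));

    // parse ChunkSize
    byte[] chunkSizeByte = Utils.copyByte(byteSrc, 4, 4);
    int chunkSize = Utils.byte2int(chunkSizeByte);
    System.out.println("chunk size:" + chunkSize);

    // parse line number
    byte[] lineNumberByte = Utils.copyByte(byteSrc, 8, 4);
    int lineNumber = Utils.byte2int(lineNumberByte);
    System.out.println("line number:" + lineNumber);

    // parse the NamespaceUri index (offset 16; offset 12 is the Unknown field)
    byte[] uriByte = Utils.copyByte(byteSrc, 16, 4);
    int uriIndex = Utils.byte2int(uriByte);
    // This may be -1; if so, the tag has no prefix
    if (uriIndex != -1 && uriIndex < stringContentList.size()) {
        System.out.println("uri str:" + stringContentList.get(uriIndex));
    }

    // parse the tag name (offset 20)
    int tagNameIndex = Utils.byte2int(Utils.copyByte(byteSrc, 20, 4));
    String tagName = stringContentList.get(tagNameIndex);
    System.out.println("tag name:" + tagName);

    // skip Flags (offset 24) and read AttributeCount (offset 28)
    int attrCount = Utils.byte2int(Utils.copyByte(byteSrc, 28, 4));
    System.out.println("attr count:" + attrCount);

    // parse attributes: they start at offset 36, after ClassAttribute
    // Each attribute unit has five four-byte elements: namespaceuri, name, valuestring, type, data
    // When getting the type value, shift right by 24 bits
    ArrayList<AttributeData> attrList = new ArrayList<>(attrCount);
    for (int i = 0; i < attrCount; i++) {
        Integer[] values = new Integer[5];
        AttributeData attrData = new AttributeData(); // author's helper class, not shown
        for (int j = 0; j < 5; j++) {
            int value = Utils.byte2int(Utils.copyByte(byteSrc, 36 + i * 20 + j * 4, 4));
            switch (j) {
                case 0: attrData.nameSpaceUri = value; break; // field names for cases 0-2 are assumed
                case 1: attrData.name = value; break;
                case 2: attrData.valueString = value; break;
                case 3:
                    value = (value >> 24);
                    attrData.type = value;
                    break;
                case 4:
                    attrData.data = value;
                    break;
            }
            values[j] = value;
        }
        attrList.add(attrData);
    }

    // The rest of the listing (a further loop over the attributes) is cut off in the source.
}
The code is a bit long, so let's go through it. Parsing the attributes: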

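// parse attributes
// Each attribute unit has five four-byte elements: namespaceuri, name, valuestring, type, data
// When getting the type value, shift right by 24 bits
ArrayList<AttributeData> attrList = new ArrayList<>(attrCount);
for (int i = 0; i < attrCount; i++) {
    Integer[] values = new Integer[5];
    AttributeData attrData = new AttributeData();
    for (int j = 0; j < 5; j++) {
        // attributes start at offset 36; each one is 20 bytes
        int value = Utils.byte2int(Utils.copyByte(byteSrc, 36 + i * 20 + j * 4, 4));
        switch (j) {
            case 0: attrData.nameSpaceUri = value; break; // field names for cases 0-2 are assumed
            case 1: attrData.name = value; break;
            case 2: attrData.valueString = value; break;
            case 3:
                value = (value >> 24);
                attrData.type = value;
                break;
            case 4:
                attrData.data = value;
                break;
        }
        values[j] = value;
    }
    attrList.add(attrData);
}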

Note that the fourth value needs extra processing: it must be shifted right by 24 bits, because only its highest byte holds the type.
After parsing the attributes, we have the tag name and each attribute's name and value:






See the results of the analysis:



 


The attributes of the manifest tag:



A few points need explaining here:


1. Why do we see three attributes in the source, but five in the parsed output?


Because the build adds two attributes when compiling the APK: platformBuildVersionCode and platformBuildVersionName


These are the version code and version name of the platform the APK was built against



This is the result after parsing


2. When there is no prefix such as android, NamespaceUri is null (its index is -1)



3. Depending on the dataType, the data value has a different meaning:



This method converts the data value into a string according to its type; we will use it again later when parsing resources.arsc.
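The conversion method itself is not reproduced in the text, so here is a hedged sketch of what such a method can look like. The type constants are the Res_value data types from AOSP's ResourceTypes.h; the output formats (@, ?, #, 0x) follow common decompiler conventions rather than anything in the article. Dimension and fraction values need extra unit decoding and fall through to the raw-value branch. Names are hypothetical.

```java
import java.util.List;

// Hypothetical helper: turns an attribute's (type, data) pair into display text.
final class AttributeValueFormatter {
    // Data types from AOSP Res_value.
    static final int TYPE_NULL = 0x00;
    static final int TYPE_REFERENCE = 0x01;
    static final int TYPE_ATTRIBUTE = 0x02;
    static final int TYPE_STRING = 0x03;
    static final int TYPE_FLOAT = 0x04;
    static final int TYPE_INT_DEC = 0x10;
    static final int TYPE_INT_HEX = 0x11;
    static final int TYPE_INT_BOOLEAN = 0x12;
    static final int TYPE_FIRST_COLOR_INT = 0x1c;
    static final int TYPE_LAST_COLOR_INT = 0x1f;

    static String format(int type, int data, List<String> strings) {
        switch (type) {
            case TYPE_NULL:
                return "";
            case TYPE_REFERENCE:
                return String.format("@%08X", data);          // resource ID, e.g. @7F070001
            case TYPE_ATTRIBUTE:
                return String.format("?%08X", data);          // theme attribute reference
            case TYPE_STRING:
                return strings.get(data);                     // data is the string-pool index
            case TYPE_FLOAT:
                return String.valueOf(Float.intBitsToFloat(data));
            case TYPE_INT_DEC:
                return Integer.toString(data);
            case TYPE_INT_HEX:
                return "0x" + Integer.toHexString(data);
            case TYPE_INT_BOOLEAN:
                return data != 0 ? "true" : "false";
            default:
                if (type >= TYPE_FIRST_COLOR_INT && type <= TYPE_LAST_COLOR_INT) {
                    return String.format("#%08X", data);      // color
                }
                // dimensions, fractions and unknown types: show the raw value
                return String.format("<type 0x%02X: 0x%08X>", type, data);
        }
    }
}
```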


4. In principle, every attribute carries a NamespaceUri, which determines the attribute's Prefix. Usually this is android, but when we write a custom view we import our own NamespaceUri and Prefix. So an XML file may contain several namespaces, and each attribute records its own NamespaceUri.


At this point most of the work is done. EndTagChunk is very similar to StartTagChunk, so we will not explain it in detail:


/**
 * Parsing EndTag Chunk
 * @param byteSrc
 */
public static void parseEndTagChunk(byte[] byteSrc) {
    byte[] chunkTagByte = Utils.copyByte(byteSrc, 0, 4);
    System.out.println(Utils.bytesToHexString(chunkTagByte));
    byte[] chunkSizeByte = Utils.copyByte(byteSrc, 4, 4);
    int chunkSize = Utils.byte2int(chunkSizeByte);
    System.out.println("chunk size:" + chunkSize);

    // parse line number
    byte[] lineNumberByte = Utils.copyByte(byteSrc, 8, 4);
    int lineNumber = Utils.byte2int(lineNumberByte);
    System.out.println("line number:" + lineNumber);

    // parse the NamespaceUri index (offset 16; offset 12 is the Unknown field)
    int uriIndex = Utils.byte2int(Utils.copyByte(byteSrc, 16, 4));
    // This may be -1; if so, there is no prefix
    if (uriIndex != -1 && uriIndex < stringContentList.size()) {
        System.out.println("uri str:" + stringContentList.get(uriIndex));
    }

    // parse the tag name (offset 20)
    int tagNameIndex = Utils.byte2int(Utils.copyByte(byteSrc, 20, 4));
    System.out.println("end tag name:" + stringContentList.get(tagNameIndex));

    // The rest of the original listing is cut off in the source.
}

When parsing, though, we need to drive all of this from a loop:

There are many ways to parse XML on Android, but we use none of them here; the parser is written in plain code. We traverse the chunks in a loop and handle each tag as we reach it, which is similar to SAX-style XML parsing. The ChunkType field is what makes this work: it tells us whether the current chunk is a start tag, an end tag, and so on.
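The loop itself is not reproduced in the text. A minimal sketch of the same SAX-like idea, assuming the string list has already been parsed: walk the chunks one after another, switch on ChunkType, and track the nesting depth for indentation. START_TAG is the StartTagChunk type given above; END_TAG (0x00100103) is assumed from AOSP's RES_XML_END_ELEMENT_TYPE. Only tag names are emitted, and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver: walks all chunks and emits an indented open/close trace of the tags.
final class AxmlDriver {
    static final int START_TAG = 0x00100102; // from the StartTagChunk section
    static final int END_TAG = 0x00100103;   // assumed from AOSP (RES_XML_END_ELEMENT_TYPE)

    static List<String> trace(byte[] file, List<String> strings) {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        List<String> lines = new ArrayList<>();
        int depth = 0;
        int offset = 8; // skip the file header
        while (offset + 8 <= file.length) {
            int chunkType = buf.getInt(offset);
            int chunkSize = buf.getInt(offset + 4);
            if (chunkSize < 8) {
                break; // corrupt size: stop rather than loop forever
            }
            if (chunkType == START_TAG) {
                lines.add(indent(depth) + "<" + strings.get(buf.getInt(offset + 20)) + ">");
                depth++;
            } else if (chunkType == END_TAG) {
                depth--;
                lines.add(indent(depth) + "</" + strings.get(buf.getInt(offset + 20)) + ">");
            }
            // String, ResourceId and namespace chunks would go to their own parsers here.
            offset += chunkSize;
        }
        return lines;
    }

    private static String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }
}
```

Tracking depth in the loop is one simple way to get the indentation used when formatting the output; the LineNumber field could be used instead to reproduce the original layout.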

We also format the parsed XML:

This is not difficult, so I will not explain it further. One possible improvement is to use the LineNumber field to reproduce the original line layout exactly, but that is a fair amount of work and I did not do it here; interested readers can try it. The result after formatting:

Pretty cool, isn't it? The hexadecimal content from earlier has been turned back into readable XML, and the sense of achievement is bursting ~~

One thing remains: there are many values like @7F070001. These are resource IDs; to map them back to the actual resources we need to parse the resources.arsc file, which we will do later. For now, just be aware of it.

There is one more point: since this code can parse the AndroidManifest file, it should also be able to parse other XML files:

Oops: when we parse another XML file, an error is reported. Tracking it down shows that the error occurs while parsing the StringChunk, so let's modify that part:

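The string format in other XML files differs from the one in AndroidManifest.xml (their strings are UTF-8 encoded, which is what the commented-out "resource file xml" block in parseStringChunk handles), so they need to be parsed separately: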

It can be done after modification.
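For reference, here is a sketch of decoding a single entry of a UTF-8 string pool. The length rules come from AOSP, not from the article: each entry starts with two lengths (the UTF-16 length, then the UTF-8 byte length), each stored in one byte or, if its high bit is set, in two bytes. The commented-out code reads only the second byte and advances 3 bytes plus the content per string, which matches this layout as long as both lengths fit in a single byte. Names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes one UTF-8 string entry starting at pos.
final class Utf8StringReader {
    static String read(byte[] pool, int pos) {
        int p = pos;
        // 1) length in UTF-16 units: 1 byte, or 2 bytes when the high bit is set (not needed here)
        p += ((pool[p] & 0x80) != 0) ? 2 : 1;
        // 2) length in UTF-8 bytes, same 1-or-2-byte encoding
        int byteLen = pool[p] & 0xFF;
        if ((byteLen & 0x80) != 0) {
            byteLen = ((byteLen & 0x7F) << 8) | (pool[p + 1] & 0xFF);
            p += 2;
        } else {
            p += 1;
        }
        // 3) the encoded bytes; a single 00 terminator follows them
        return new String(pool, p, byteLen, StandardCharsets.UTF_8);
    }
}
```

Which reader applies can be decided from the pool's flags field (the "Unknown" field), rather than by trying one decoding and falling back to the other.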

IV. Technology Extension
When decompiling, sometimes we only want the AndroidManifest content, and Apktool is a bit heavyweight for that. Someone on the Internet has written a tool for exactly this, AXMLPrinter.jar, which is very handy: java -jar AXMLPrinter.jar xxx.xml > demo.xml

Looking at its project structure, we can see that it uses the Pull parser that ships with Android. The main function is:

V. Why write this article
Now we no longer need that tool, because we have written our own parser. Pretty cool, right? But is this article only about parsing AndroidManifest? Definitely not. It also prepares for decompiling APKs later. Many of you have noticed that apktool often reports errors when decompiling an APK. Hardening (protection) tools deliberately look for weaknesses in apktool and modify the APK so that decompilation fails. To deal with this, we need to understand apktool's source code and parsing principles, so that when decompilation fails we can locate the problem and fix apktool.

The way apktool works is actually simple: it parses AndroidManifest.xml, then parses resources.arsc into public.xml (after decompilation this file is usually stored under the values folder and lists the IDs of all resources in the project), then classes.dex, and then the remaining layouts, resource XML files, and so on. This article covers the first of these problems: parsing binary XML files. Later articles will cover the resources.arsc and classes.dex formats. I will also write one on how modifying the AndroidManifest file achieves a hardening effect, and how to repair it to defeat that protection.

VI. Summary
That's it for this article; writing it was a bit tiring. The parsing code is all here. If anything is unclear, you can contact me by following the public account and leaving a message, and I will reply when I can. Thanks, and watch for the next two articles, which parse the resources.arsc and classes.dex file formats. Thanks ~~

PS: Follow the account for real-time updates on the latest Android technology
