UMD File Format

Source: Internet
Author: User
UMD files come in two formats: text and cartoon. Because text is the main use case, this article describes the text format.
The file begins with a header. Most files rely on a header to identify their format, and UMD is no exception. UMD's header is the 32-bit value 0xde9a9b89, stored low byte first, so the first four bytes of the file are 0x89, 0x9b, 0x9a, 0xde. The multi-byte values described below follow the same convention.
The next byte is 0x23, the character '#'. UMD uses this character as the separator between function blocks.
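The next four bytes are fixed, and the last of them is 0x08. The first three are probably 0x01, 0x00, 0x00, which would match the "file start" data type 0x01 in the appendix, but this is an inference. Their specific significance is unknown.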
Next is the 10th byte, which determines whether the UMD is text or cartoon: 0x01 indicates text and 0x02 indicates cartoon.
Then come 2 bytes holding a random number.
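The header layout described so far can be checked with a short Python sketch. The function name parse_umd_header and the returned dictionary are hypothetical, and reading the 2-byte random number low byte first is an assumption based on how the header value is stored. The four fixed bytes are returned untouched because their meaning is unknown.

```python
import struct

UMD_MAGIC = 0xDE9A9B89  # header value from the text, stored low byte first


def parse_umd_header(data: bytes) -> dict:
    """Read the first 12 bytes of a UMD file (hypothetical helper)."""
    if len(data) < 12:
        raise ValueError("too short to be a UMD file")
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != UMD_MAGIC:
        raise ValueError("not a UMD file: bad header")
    if data[4:5] != b"#":
        raise ValueError("missing '#' separator after the header")
    fixed = data[5:9]  # four fixed bytes, significance unknown
    kind = {0x01: "text", 0x02: "cartoon"}.get(data[9], "unknown")
    # Assumption: the 2-byte random number is also stored low byte first.
    (random_number,) = struct.unpack_from("<H", data, 10)
    return {"fixed": fixed, "kind": kind, "random": random_number}
```

Calling parse_umd_header on the first 12 bytes of a file is enough to tell a text UMD from a cartoon one before reading any further blocks.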
Once the random number has been written, the basic attributes of the UMD follow, such as the title and author. Every attribute is written in the same format:
Separator '#': 1 byte
Attribute category: 2 bytes, one of the following: title -- 0x02, author -- 0x03, year -- 0x04, month -- 0x05, day -- 0x06, novel type -- 0x07, publisher -- 0x08, retailer -- 0x09
0x00: 1 byte
Content length: 1 byte. This length is unusual: it is not the number of characters but the number of characters * 2 + 5. For the string "hello", for example, the length is 5 * 2 + 5 = 15. The extra 5 matches the five bytes that precede the content (separator, category, 0x00, and the length byte itself).
Content: n bytes. The content is written in Unicode encoding at 2 bytes per character, which is why the length above is multiplied by 2.
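The attribute format above can be sketched in Python as follows. The name encode_attribute and the ATTRIBUTE_TYPES table keys are hypothetical, and UTF-16 little-endian is an assumption: the text only says the content is Unicode at 2 bytes per character.

```python
import struct

# Attribute categories listed in the text.
ATTRIBUTE_TYPES = {
    "title": 0x02, "author": 0x03, "year": 0x04, "month": 0x05,
    "day": 0x06, "novel_type": 0x07, "publisher": 0x08, "retailer": 0x09,
}


def encode_attribute(name: str, text: str) -> bytes:
    """Build one attribute block (hypothetical helper)."""
    # Assumption: "Unicode" means UTF-16 little-endian, 2 bytes per character.
    content = text.encode("utf-16-le")
    # Equals number of characters * 2 + 5 for ordinary (BMP) characters.
    length = len(content) + 5
    if length > 0xFF:
        raise ValueError("attribute too long for a 1-byte length field")
    category = struct.pack("<H", ATTRIBUTE_TYPES[name])  # 2 bytes, low byte first
    return b"#" + category + b"\x00" + bytes([length]) + content
```

For the title "hello" this yields a 15-byte block whose length byte is also 15, which agrees with the worked example in the text.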
After the basic attributes come the blocks that describe the body.
The first block starts with the separator '#', followed by the 2-byte data type 0x0b, which indicates that this block records the content length. (The separator '#' is always followed by a data type; the currently known data types are summarized at the end of this article.) The next two bytes are 0x0900, followed by the 4-byte content length, which is the sum of the body lengths of all chapters.
After the content length come the chapter offsets. The block starts with the separator '#', followed by the 2-byte data type 0x83, 2 bytes 0x0901, and a 4-byte random number. The next byte is 0x24, followed by another 4-byte random number. This random number must be the same as the previous one; otherwise the generated UMD fails verification in some parsers. The next 4 bytes depend on the number of chapters: their value is (number of chapters * 4) + 9, so a novel with two chapters gives 17. Then comes the offset of each chapter, 4 bytes each. The offset of Chapter 1 is of course 0, the offset of Chapter 2 is the body length of Chapter 1 * 2 (because the text is Unicode encoded), and so on.
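A minimal Python sketch of the chapter offset block, under two assumptions: every multi-byte value is written low byte first (as with the file header), and each character occupies 2 bytes. The function name chapter_offsets_block is hypothetical.

```python
import struct


def chapter_offsets_block(chapter_texts, random_number: int) -> bytes:
    """Build the 0x83 chapter offset block (hypothetical helper)."""
    offsets, position = [], 0
    for text in chapter_texts:
        offsets.append(position)
        position += len(text) * 2  # body length * 2, as described in the text
    rand = struct.pack("<I", random_number)
    block = b"#" + struct.pack("<H", 0x83) + struct.pack("<H", 0x0901) + rand
    # 0x24 is '$'; the same random number is repeated after it.
    block += b"$" + rand + struct.pack("<I", len(offsets) * 4 + 9)
    block += b"".join(struct.pack("<I", offset) for offset in offsets)
    return block
```

With two chapters of 3 and 2 characters, the 4-byte count field holds 17 and the offsets are 0 and 6.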
The next data type is the chapter titles. As before, it starts with '#', followed by the 2-byte data type 0x84, 2 bytes 0x0901, a 4-byte random number, 0x24, and the same 4-byte random number again. Then come 4 bytes that depend on the lengths of the chapter titles: their value is (Chapter 1 title length * 2 + 1) + (Chapter 2 title length * 2 + 1) + ... + 9. After that, each chapter title is written in the following format: first 1 byte holding the title length * 2, then (title length * 2) bytes holding the Unicode encoding of the chapter title.
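The length arithmetic for the title block is easy to get wrong, so here is a hedged Python sketch. The name chapter_titles_block is hypothetical, and UTF-16 little-endian with low-byte-first integers is assumed, as the text does not name the exact encoding.

```python
import struct


def chapter_titles_block(titles, random_number: int) -> bytes:
    """Build the 0x84 chapter title block (hypothetical helper)."""
    payload = b""
    for title in titles:
        encoded = title.encode("utf-16-le")  # assumption: 2 bytes per character
        if len(encoded) > 0xFF:
            raise ValueError("chapter title too long for a 1-byte length field")
        payload += bytes([len(encoded)]) + encoded  # title length * 2, then the title
    rand = struct.pack("<I", random_number)
    header = b"#" + struct.pack("<H", 0x84) + struct.pack("<H", 0x0901) + rand
    # len(payload) is the sum of (title length * 2 + 1) over all chapters.
    return header + b"$" + rand + struct.pack("<I", len(payload) + 9) + payload
```

For the titles "One" and "Two!" the payload is (3 * 2 + 1) + (4 * 2 + 1) = 16 bytes, so the 4-byte field holds 25.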
With the titles written, the body comes next. Note that the text is not written chapter by chapter. Instead, the text of all chapters is first concatenated into one long string, which is then split into n data blocks, and each data block is compressed with zlib. Splitting into blocks of a fixed size is recommended. UMD also appears to limit the maximum size of a data block, and exceeding the limit causes problems during parsing, so a block size of 32768 bytes is recommended. Each data block is written as follows: first 1 byte 0x24 (note: not '#'), then a 4-byte random number, then 4 bytes whose value is the compressed data block length + 9, and finally the compressed data itself. After writing each data block, you may additionally write one or both of the following (deciding at random is recommended):
1. Write 1 byte '#', 2 bytes 0xf1, 2 bytes 0x1500, 16 bytes of 0x00
2. Write 1 byte '#', 2 bytes 0x0a, 2 bytes 0x0900, a 4-byte random number
After all body data blocks have been written, write 1 byte '#' and the 2-byte data type 0x81, which indicates that the body is complete. Then write 2 bytes 0x0901, a 4-byte random number, 1 byte 0x24, and the same 4-byte random number again, followed by 4 bytes whose value is the number of data blocks * 4 + 9. Recall that a 4-byte random number was generated before each data block was written: write those random numbers now in reverse order, starting from the last block's, 4 bytes each. This completes the body.
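The body-writing procedure can be sketched in Python with the standard zlib module. This is only an illustration under stated assumptions: the text is encoded as UTF-16 little-endian, the 32768-byte limit is applied to the uncompressed data, integers are written low byte first, and the two optional blocks after each data block are skipped. The name write_body is hypothetical.

```python
import random
import struct
import zlib

BLOCK_SIZE = 32768  # recommended data block size from the text


def write_body(text: str, rng: random.Random) -> bytes:
    """Write all data blocks plus the 0x81 end-of-body block (hypothetical helper)."""
    raw = text.encode("utf-16-le")  # all chapters joined into one long string
    out = bytearray()
    block_randoms = []
    for start in range(0, len(raw), BLOCK_SIZE):
        compressed = zlib.compress(raw[start:start + BLOCK_SIZE])
        block_random = rng.getrandbits(32)
        block_randoms.append(block_random)
        out += b"$" + struct.pack("<I", block_random)  # 0x24, not '#'
        out += struct.pack("<I", len(compressed) + 9) + compressed
    # End-of-body block: the same random number appears before and after 0x24.
    rand = struct.pack("<I", rng.getrandbits(32))
    out += b"#" + struct.pack("<H", 0x81) + struct.pack("<H", 0x0901) + rand
    out += b"$" + rand + struct.pack("<I", len(block_randoms) * 4 + 9)
    for block_random in reversed(block_randoms):  # last block's number first
        out += struct.pack("<I", block_random)
    return bytes(out)
```

Passing a seeded random.Random makes the output reproducible, which is convenient when comparing generated files byte for byte.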
Next is the cover: 1 byte '#', the 2-byte data type 0x82, 3 bytes 0x010a01, a 4-byte random number, 1 byte 0x24, a 4-byte random number, and 4 bytes whose value is the number of cover bytes + 9. Then write the cover's byte data as is; no compression is required.
At this point the main data has been written. However, some UMD generation tools also emit blocks of data type 0x87. Their exact function is unclear; they probably tell the UMD parser which page offsets to use when displaying the content. In my tests, files parse normally even when this part is omitted. If you are interested in this part, refer to the code I have provided.
The file ends with: 1 byte '#', the 2-byte data type 0x0c marking the end of the file, 2 bytes 0x0901, and 4 bytes holding the file length + 4. That completes the file.
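The end block can be sketched in Python as below. The phrase "file length + 4" is read here as the number of bytes written so far plus the 4 bytes of the length field, so that the stored value equals the final file size; that reading is an assumption, as is the low-byte-first byte order. The name append_end_block is hypothetical.

```python
import struct


def append_end_block(file_bytes: bytes) -> bytes:
    """Append the 0x0c end-of-file block (hypothetical helper)."""
    out = file_bytes + b"#" + struct.pack("<H", 0x0C) + struct.pack("<H", 0x0901)
    # Assumption: bytes written so far + 4 for this field = final file size.
    return out + struct.pack("<I", len(out) + 4)
```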
Appendix: data type table
0x01 -- file start
0x02 -- title
0x03 -- author
0x04 -- year
0x05 -- month
0x06 -- day
0x07 -- novel type
0x08 -- publisher
0x09 -- retailer
0x0b -- content length
0x83 -- chapter offset
0x84 -- chapter title, body
0x81 -- the body has been written
0x82 -- cover
0x87 -- page offset
0x0c -- end of file
