Chapter 7. EPUB File Processing: Parsing .xhtml Files (I)
This chapter describes how the code uses the ZLTextPlainModel class to store the text information and the tag information of .xhtml files separately.
The core classes involved in this chapter are ZLTextPlainModel, ZLTextWritablePlainModel, CachedCharStorage, and the implementation classes of the XHTMLTagAction interface.
An .xhtml file contains two kinds of information: text information and tag information. The structure described by the tags must be parsed correctly so that the text can be displayed correctly on the screen.
As an example (this example is text from The Three-Body Problem, Vol. 1):
The program needs to know that there are four kinds of tags here (h1, h2, b, p), and that each kind represents a different format. Only if the program renders each tag's format correctly will the user see the text as intended.
Before describing how the text and tag information in .xhtml files is processed, we first introduce the three core classes used in that process: ZLTextWritablePlainModel, CachedCharStorage, and the implementation classes of the XHTMLTagAction interface.
ZLTextWritablePlainModel class:
ZLTextWritablePlainModel is a subclass of ZLTextPlainModel. It holds three int arrays and one CachedCharStorage instance.
The myStartEntryIndices property points to an int array that records, for each paragraph, which char array inside CachedCharStorage the paragraph is stored in;
the myStartEntryOffsets property points to an int array that records, for each paragraph, the position in that char array where the paragraph begins;
the myParagraphLengths property points to an int array that records, for each paragraph, how much of that char array it occupies;
finally, the myStorage property points to the CachedCharStorage instance, whose char arrays are where the text and tag information are actually stored.
PS: In FBReader, a pair of p tags represents one paragraph.
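To make the role of the three int arrays concrete, here is a minimal Java sketch under stated assumptions. ParagraphIndexSketch and its methods are hypothetical names, not FBReader's API. Following the description above, it measures a paragraph's length in chars, stores paragraph content as plain chars, and assumes a paragraph never spans two blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not FBReader's code): three parallel int arrays locate each
// paragraph inside a list of fixed-size char blocks.
class ParagraphIndexSketch {
    private final int blockSize;
    private final List<char[]> blocks = new ArrayList<>();
    // Per paragraph: which block holds it, where it starts, how many chars it uses.
    private int[] startEntryIndices = new int[8];
    private int[] startEntryOffsets = new int[8];
    private int[] paragraphLengths = new int[8];
    private int paragraphCount = 0;
    private int blockOffset = 0; // next free position in the last block

    ParagraphIndexSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    // Records a new paragraph in the three arrays and copies its chars into a block.
    void addParagraph(String content) {
        char[] data = content.toCharArray();
        if (data.length > blockSize) {
            throw new IllegalArgumentException("paragraph larger than one block");
        }
        // Open a new block when there is none yet or the current one is too full.
        if (blocks.isEmpty() || blockOffset + data.length > blockSize) {
            blocks.add(new char[blockSize]);
            blockOffset = 0;
        }
        // Grow the three index arrays together when they are full.
        if (paragraphCount == startEntryIndices.length) {
            int newSize = paragraphCount * 2;
            startEntryIndices = Arrays.copyOf(startEntryIndices, newSize);
            startEntryOffsets = Arrays.copyOf(startEntryOffsets, newSize);
            paragraphLengths = Arrays.copyOf(paragraphLengths, newSize);
        }
        startEntryIndices[paragraphCount] = blocks.size() - 1;
        startEntryOffsets[paragraphCount] = blockOffset;
        paragraphLengths[paragraphCount] = data.length;
        System.arraycopy(data, 0, blocks.get(blocks.size() - 1), blockOffset, data.length);
        blockOffset += data.length;
        paragraphCount++;
    }

    // Direct access to paragraph i: no scan over the paragraphs before it.
    String getParagraph(int i) {
        char[] block = blocks.get(startEntryIndices[i]);
        return new String(block, startEntryOffsets[i], paragraphLengths[i]);
    }

    int blockCount() {
        return blocks.size();
    }
}
```

Because a lookup reads three array slots and one block, fetching paragraph i costs the same no matter how many paragraphs precede it.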
CachedCharStorage class:
This class has two important properties: myArray and myBlockSize.
The myArray property points to an ArrayList of char arrays. The char arrays are held through weak references (WeakReference), so the virtual machine can reclaim them when needed and they do not consume too much memory. The elements of these char arrays represent the text information and tag information of the .xhtml file.
The myBlockSize property is an int (65536), and no char array is longer than this value. Once this length would be exceeded, the code creates a new char array and persists the old one so that it can be reused later.
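The weak-reference behaviour of myArray can be sketched as follows. WeakBlockListSketch, its loader parameter, and forget() are hypothetical. The only assumption is that a reclaimed block can be rebuilt from some persistent copy, which is what persisting the old arrays makes possible.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.function.IntFunction;

// Hypothetical sketch: blocks are reachable only through WeakReferences, so the GC may
// reclaim them; a reclaimed block is rebuilt by a caller-supplied loader.
class WeakBlockListSketch {
    private final ArrayList<WeakReference<char[]>> array = new ArrayList<>();
    // Hypothetical: rebuilds block i, for example from a persisted file.
    private final IntFunction<char[]> loader;

    WeakBlockListSketch(IntFunction<char[]> loader) {
        this.loader = loader;
    }

    void add(char[] block) {
        array.add(new WeakReference<>(block));
    }

    char[] block(int index) {
        char[] b = array.get(index).get();
        if (b == null) {
            // Cleared by the GC (or by forget()): reload and re-wrap it.
            b = loader.apply(index);
            array.set(index, new WeakReference<>(b));
        }
        return b;
    }

    // Test hook that simulates the GC clearing the reference to block i.
    void forget(int index) {
        array.get(index).clear();
    }

    int size() {
        return array.size();
    }
}
```

A caller must keep a strong reference to a block while it works with that block. The WeakReference alone does not keep the array alive.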
XHTMLTagAction interface implementation classes:
An EPUB file contains many kinds of tags, and different tags represent different structures, so FBReader provides a separate handler class for each kind of tag. These handler classes are the implementation classes of the XHTMLTagAction interface.
Tags usually come in pairs (a start tag and an end tag), and the two methods of the XHTMLTagAction interface, doAtStart and doAtEnd, correspond to them.
Which class handles which tag is defined by the fillTagTable method of the XHTMLReader class.
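A tag table in the spirit of fillTagTable can be sketched as a map from tag name to handler object. TagActionSketch, ParagraphActionSketch, and TagTableSketch are hypothetical names. The real XHTMLTagAction methods have different parameters from the log list used here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for XHTMLTagAction: one hook for the start tag, one for the end tag.
interface TagActionSketch {
    void doAtStart(List<String> log);
    void doAtEnd(List<String> log);
}

// One action class can serve several tags by being parameterized with a kind.
class ParagraphActionSketch implements TagActionSketch {
    private final String kind;

    ParagraphActionSketch(String kind) {
        this.kind = kind;
    }

    public void doAtStart(List<String> log) {
        log.add("begin " + kind);
    }

    public void doAtEnd(List<String> log) {
        log.add("end " + kind);
    }
}

class TagTableSketch {
    private final Map<String, TagActionSketch> table = new HashMap<>();

    // Plays the role described for fillTagTable: register which action handles which tag.
    void fillTagTable() {
        table.put("h1", new ParagraphActionSketch("H1"));
        table.put("h2", new ParagraphActionSketch("H2"));
        table.put("p", new ParagraphActionSketch("REGULAR"));
    }

    // Dispatch by tag name; tags with no registered action are ignored.
    void startTag(String tag, List<String> log) {
        TagActionSketch action = table.get(tag.toLowerCase(Locale.ROOT));
        if (action != null) action.doAtStart(log);
    }

    void endTag(String tag, List<String> log) {
        TagActionSketch action = table.get(tag.toLowerCase(Locale.ROOT));
        if (action != null) action.doAtEnd(log);
    }
}
```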
With the three core classes introduced, we can now walk through how the text information and tag information in .xhtml files is processed.
Take the processing of one tag pair (a start tag and an end tag) as an example. The doIt method of the ZLXMLParser class walks over the char array with a for loop, and at the following points it calls into the XHTMLReader class:
Just after the "<" of the start tag:
Record the current offset in the char array, and call the characterDataHandlerFinal method of the ZLXMLReader interface (XHTMLReader does not implement this method, so it can be ignored).
Just after the ">" of the start tag:
Record the offset again and take the content between the two offsets, which is the name of the current tag.
ZLXMLParser.processStartTag → XHTMLReader.startElementHandler → doAtStart of the XHTMLTagAction implementation registered for that tag name (for h1, XHTMLTagParagraphWithControlAction)
Just before the "<" of the end tag:
Record the offset and take the content between the two offsets, which is the text inside the tag.
XHTMLReader.characterDataHandler → BookReader.addData, which stores the text in the myTextBuffer property of the BookReader class.
Just after the ">" of the end tag:
ZLXMLParser.processEndTag → XHTMLReader.endElementHandler → doAtEnd of the XHTMLTagAction implementation registered for that tag name, which writes the text held in BookReader's myTextBuffer property into the model.
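The offset bookkeeping described above can be illustrated with a deliberately simplified scanner. MiniXmlScannerSketch and ReaderSketch are hypothetical. Unlike ZLXMLParser, this sketch ignores attributes, comments, entities, and self-closing tags, and it reports text when it reaches the next "<".

```java
// Hypothetical callbacks, loosely modeled on the XHTMLReader handler methods named above.
interface ReaderSketch {
    void startElement(String tag);
    void characterData(char[] data, int start, int length);
    void endElement(String tag);
}

// Hypothetical, greatly simplified scanner: no attributes, comments, entities or <x/> tags.
final class MiniXmlScannerSketch {
    private MiniXmlScannerSketch() {}

    static void doIt(char[] buf, ReaderSketch reader) {
        int textStart = 0; // offset just after the previous '>'
        int tagStart = -1; // offset just after the current '<', or -1 outside a tag
        for (int i = 0; i < buf.length; i++) {
            char c = buf[i];
            if (c == '<') {
                // Everything between the previous '>' and this '<' is text.
                if (i > textStart) reader.characterData(buf, textStart, i - textStart);
                tagStart = i + 1;
            } else if (c == '>' && tagStart >= 0) {
                // The chars between the two recorded offsets hold the tag name.
                boolean isEnd = tagStart < i && buf[tagStart] == '/';
                int nameStart = isEnd ? tagStart + 1 : tagStart;
                String name = new String(buf, nameStart, i - nameStart).trim();
                if (isEnd) {
                    reader.endElement(name);
                } else {
                    reader.startElement(name);
                }
                tagStart = -1;
                textStart = i + 1;
            }
        }
        // Trailing text after the last tag, if any.
        if (tagStart < 0 && textStart < buf.length) {
            reader.characterData(buf, textStart, buf.length - textStart);
        }
    }
}
```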
Let's use a passage from The Three-Body Problem (Vol. 1) to walk through the process in detail:
h1 tag processing flow
Just after the "<" of the start tag:
Record the offset in the char array.
Just after the ">" of the start tag:
Record the offset in the char array, take the content between the two offsets, and get the name of the current tag.
ZLXMLParser.processStartTag → XHTMLReader.startElementHandler → XHTMLTagParagraphWithControlAction.doAtStart
doAtStart method
The doAtStart method calls two methods of the BookReader class: pushKind and beginParagraph.
pushKind method of BookReader:
This method adds FBTextKind.H1 to the myKindStack property. myKindStack already contains FBTextKind.REGULAR (0), which is set in the readBook method of the OEBBookReader class.
beginParagraph method of BookReader:
This method calls the createParagraph method of ZLTextWritablePlainModel, then iterates over the myKindStack property with a for loop, calling the addControl method of ZLTextWritablePlainModel for each entry.
The createParagraph method updates the three int arrays of ZLTextWritablePlainModel; later, these three arrays are used to quickly locate a paragraph in the char arrays of CachedCharStorage.
The addControl method appends two constants that together represent a tag to the char array in CachedCharStorage.
PS: Each call to addControl writes the constant ZLTextParagraph.Entry.CONTROL (3), which acts as a marker. A similar marker is ZLTextParagraph.Entry.TEXT (1). Both markers are used in the next chapter; see Chapter 10 for details on how they work.
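The interplay of pushKind, beginParagraph, and addControl can be sketched as follows. ControlEntrySketch is hypothetical, a StringBuilder stands in for the char blocks, and the value used for H1 is a placeholder. Packing the kind into the low byte, with 0x100 as a start flag, is an assumption about how the "two constants" are laid out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pushKind / beginParagraph / addControl; not FBReader's code.
class ControlEntrySketch {
    static final char CONTROL = 3; // ZLTextParagraph.Entry.CONTROL, per the text above
    static final byte REGULAR = 0; // FBTextKind.REGULAR, per the text above
    static final byte H1 = 101;    // placeholder, NOT FBReader's real FBTextKind.H1 value

    private final List<Byte> kindStack = new ArrayList<>();
    private final StringBuilder entries = new StringBuilder(); // stands in for the char blocks
    private final List<Integer> paragraphStarts = new ArrayList<>();

    ControlEntrySketch() {
        // Seeded up front, like readBook putting REGULAR on the stack.
        kindStack.add(REGULAR);
    }

    void pushKind(byte kind) {
        kindStack.add(kind);
    }

    void beginParagraph() {
        // createParagraph: remember where the new paragraph starts.
        paragraphStarts.add(entries.length());
        // One control entry per kind currently on the stack, bottom first.
        for (byte kind : kindStack) {
            addControl(kind, true);
        }
    }

    // Two chars: the CONTROL marker, then the kind in the low byte plus 0x100 for a start.
    void addControl(byte kind, boolean isStart) {
        entries.append(CONTROL);
        entries.append((char) ((kind & 0xFF) | (isStart ? 0x100 : 0)));
    }

    String entries() {
        return entries.toString();
    }

    int paragraphCount() {
        return paragraphStarts.size();
    }
}
```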
Just before the "<" of the end tag:
Record the offset in the char array, take the content between the two offsets, and get the text inside the tag.
The characterDataHandler method of XHTMLReader stores that text in the myTextBuffer property.
Just after the ">" of the end tag:
ZLXMLParser.processEndTag → XHTMLReader.endElementHandler → XHTMLTagParagraphWithControlAction.doAtEnd
The doAtEnd method calls the addText method of ZLTextWritablePlainModel, which adds three kinds of information to the char array in CachedCharStorage:
1. the constant ZLTextParagraph.Entry.TEXT (1), a marker similar to ZLTextParagraph.Entry.CONTROL (3);
2. the length of the text between the tags;
3. the actual text between the tags.
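The TEXT entry layout can be sketched as follows, together with a reader that shows why the markers matter when the stream is read back. TextEntrySketch is hypothetical. Storing the length as two chars (low and high 16 bits) is an assumption, chosen so that a reader can skip a text entry without scanning it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a TEXT entry writer plus a reader that walks the entry stream.
final class TextEntrySketch {
    static final char TEXT = 1;    // ZLTextParagraph.Entry.TEXT, per the text above
    static final char CONTROL = 3; // ZLTextParagraph.Entry.CONTROL, per the text above

    private TextEntrySketch() {}

    // Appends [TEXT marker][length low 16 bits][length high 16 bits][text chars].
    static void addText(StringBuilder out, char[] text) {
        int len = text.length;
        out.append(TEXT);
        out.append((char) (len & 0xFFFF));
        out.append((char) (len >>> 16));
        out.append(text);
    }

    // The marker tells the reader how many chars the following entry occupies.
    static List<String> decode(CharSequence data) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < data.length()) {
            char marker = data.charAt(i++);
            if (marker == TEXT) {
                int len = data.charAt(i) | (data.charAt(i + 1) << 16);
                i += 2;
                result.add("text:" + data.subSequence(i, i + len));
                i += len;
            } else if (marker == CONTROL) {
                // A control entry is the marker plus exactly one char.
                result.add("control:" + (int) data.charAt(i++));
            } else {
                throw new IllegalStateException("unknown entry marker " + (int) marker);
            }
        }
        return result;
    }
}
```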
The h2 and p tags are handled almost exactly like h1. The only difference is that, after the ">" of the start tag, addControl adds different constants, which are defined in the FBTextKind interface.
For h2, FBTextKind.REGULAR (0) and FBTextKind.H2 are added; for p, only FBTextKind.REGULAR (0) is added. Each time one of these constants is added, the marker ZLTextParagraph.Entry.CONTROL (3) is written with it.
The b tag differs from the other three tags: it triggers the addControl method twice, and the two calls differ only in their parameters.
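The difference between the paragraph-level tags and b can be shown symbolically. TagControlSketch is hypothetical, and its entries are readable strings rather than encoded chars. BOLD is an assumed kind name. The sketch also assumes that the two addControl calls for b happen at its start tag and its end tag, and that buffered text is flushed before each call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which control entries h2, p and b produce (symbolic, not encoded).
final class TagControlSketch {
    private final List<String> entries = new ArrayList<>();
    private final StringBuilder textBuffer = new StringBuilder();

    // Writes pending text as one TEXT entry, then empties the buffer.
    private void flushText() {
        if (textBuffer.length() > 0) {
            entries.add("TEXT(" + textBuffer + ")");
            textBuffer.setLength(0);
        }
    }

    void characters(String s) {
        textBuffer.append(s);
    }

    // Paragraph-level tags open a paragraph: REGULAR always, plus H2 for h2.
    void startParagraphTag(String tag) {
        entries.add("PARAGRAPH");
        entries.add("CONTROL(REGULAR,start)");
        if (tag.equals("h2")) {
            entries.add("CONTROL(H2,start)");
        }
    }

    void endParagraphTag() {
        flushText();
    }

    // b is inline: one control entry at its start tag, another at its end tag.
    void startBold() {
        flushText();
        entries.add("CONTROL(BOLD,start)");
    }

    void endBold() {
        flushText();
        entries.add("CONTROL(BOLD,end)");
    }

    List<String> entries() {
        return entries;
    }
}
```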
Adding a new char array in CachedCharStorage
Here we need to add detail about how CachedCharStorage adds a char array. When introducing the CachedCharStorage class, I said: "The myBlockSize property of CachedCharStorage is an int, and no char array is longer than this value (65536). Once this length would be exceeded, the code creates a new char array, and the old array is persisted for later reuse."
Adding a new char array is done by the createNewBlock method of CachedCharStorage.
Persisting the old char array is done by the freezeLastBlock method of CachedCharStorage.
The persisted files can be found in the Books/.FBReader folder on the SD card.
This location is obtained through the Paths class in the BookModel constructor.
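createNewBlock and freezeLastBlock can be sketched with java.nio. PersistingStorageSketch is hypothetical. The file name pattern cache.N and the raw UTF-16 layout are assumptions, not FBReader's on-disk format. Writing the chars through a CharBuffer view preserves every char value, including the marker and length values, exactly as stored.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of createNewBlock / freezeLastBlock; file naming and layout are assumptions.
class PersistingStorageSketch {
    private final Path dir;
    private final int blockSize;
    private final List<char[]> blocks = new ArrayList<>();
    private int used; // chars already written into the last block

    PersistingStorageSketch(Path dir, int blockSize) {
        this.dir = dir;
        this.blockSize = blockSize;
    }

    // Appends data, starting a new block when the current one cannot hold it.
    void append(char[] data) throws IOException {
        if (data.length > blockSize) {
            throw new IllegalArgumentException("entry larger than one block");
        }
        if (blocks.isEmpty() || used + data.length > blockSize) {
            createNewBlock();
        }
        System.arraycopy(data, 0, blocks.get(blocks.size() - 1), used, data.length);
        used += data.length;
    }

    // Persists the current last block (if any), then opens a fresh one.
    void createNewBlock() throws IOException {
        if (!blocks.isEmpty()) {
            freezeLastBlock();
        }
        blocks.add(new char[blockSize]);
        used = 0;
    }

    // Writes the used part of the last block as raw big-endian UTF-16 code units,
    // so any char value (markers, lengths, even lone surrogates) round-trips exactly.
    void freezeLastBlock() throws IOException {
        int index = blocks.size() - 1;
        ByteBuffer bytes = ByteBuffer.allocate(used * 2);
        bytes.asCharBuffer().put(blocks.get(index), 0, used);
        Files.write(fileFor(index), bytes.array());
    }

    // Reads a persisted block back, e.g. after the in-memory copy was reclaimed.
    char[] readFrozen(int index) throws IOException {
        CharBuffer chars = ByteBuffer.wrap(Files.readAllBytes(fileFor(index))).asCharBuffer();
        char[] result = new char[chars.remaining()];
        chars.get(result);
        return result;
    }

    Path fileFor(int index) {
        return dir.resolve("cache." + index);
    }

    int blockCount() {
        return blocks.size();
    }
}
```

Only full blocks are persisted automatically. In this sketch, the last and partially filled block must be persisted explicitly once parsing finishes.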
In fact, we can change the persistence code to write the char arrays in UTF-8 encoding and change the suffix of the resulting files to .txt.
Open such a .txt file and compare it with the original XML file: the text of the book is readable, and the strange symbols scattered through it represent the tag information.
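For the UTF-8 experiment described above, a small debugging helper is enough. DebugDumpSketch and the blockN.txt file name are hypothetical. UTF-8 is convenient for viewing, but unpaired surrogate chars are replaced during encoding. Such a dump is therefore only for inspection and should not be read back as cache data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical debugging helper: dump the used chars of a block as UTF-8 into a .txt file.
final class DebugDumpSketch {
    private DebugDumpSketch() {}

    static Path dumpAsTxt(Path dir, int blockIndex, char[] block, int used) throws IOException {
        Path out = dir.resolve("block" + blockIndex + ".txt");
        // Text stays readable in any editor; marker chars such as 1 and 3 show up as odd symbols.
        // Unpaired surrogates would be replaced, so this output is lossy by design.
        Files.write(out, new String(block, 0, used).getBytes(StandardCharsets.UTF_8));
        return out;
    }
}
```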
So far, we have stored the text information and tag information of the .xhtml file in ZLTextPlainModel. For the program to finally display the text in the correct format, this must be combined with Chapter 8 (locating a specified paragraph) and Chapter 9 (displaying the .html file) so that the user sees the text correctly formatted.
Off Topic
Finally, a short digression with some of my own thoughts:
FBReader stores the content and structure of the XML file in char arrays, and relies on int arrays that record each paragraph's position in those char arrays to quickly retrieve a given paragraph. Choosing arrays is reasonable, because locating a part of an array is fast. At the same time, because arrays are used, the program is relatively memory-intensive. In general, an e-book reader repeatedly takes a small part of a large body of data to display on the screen, and a tree data structure also suits this need very well. Speaking of trees, Android phones already ship with SQLite, whose indexes provide exactly such a tree structure. Imagine using SQLite to store and retrieve the content and structure of the XML file: what would the benefits be compared with arrays? I think there are at least three.
First, it saves phone memory: once the SQLite database is indexed, the data no longer needs to be loaded into memory as arrays in order to be retrieved.
Second, it makes cross-platform development easier. SQLite is supported on Linux, iOS, and HTML5. If SQLite were used to store and retrieve the XML file, Android programmers, iOS programmers, and PC front-end programmers would only need to follow agreed SQL statements, instead of developing three separate implementations or building a shared low-level library in C or C++.
Third, it makes integration with the server easier. When the client needs to collect and analyze users' reading records, using the same or a similar SQL structure on the client and the server greatly reduces the integration effort.