I. Introduction to e-books
When reprinting, please cite the source: http://www.cnblogs.com/xckk/p/6020324.html
EPUB (Electronic Publication) is a free and open e-book standard.
Its content is reflowable: the text can be automatically re-laid-out to fit the device it is displayed on.
Reflowing is possible because the content is represented internally in XHTML, while a set of CSS style sheets defines the formatting and layout. This separates the content from its presentation.
EPUB currently consists of three main specifications:
Open Publication Structure (OPS) 2.0, which defines the formatting of the content;
Open Packaging Format (OPF) 2.0, which defines the XML-based structure of the .epub file;
Open Container Format (OEBPS Container Format, OCF) 3.0, which collects all the related files into a ZIP archive.
The purpose of the OPS specification is to give content providers (publishers, authors and so on) and publishing-tool providers a minimal set of common guidelines, so that electronic content is rendered consistently across different reading systems.
OPS uses XHTML to build the content of the book and CSS to define its formatting and layout.
The OPF specification defines the mechanism by which the various components of an OPS publication are tied together, and provides additional structure and semantics. OPF is kept separate from OPS in order to modularize the content-description technology and the packaging-description technology.
OCF defines a standard packaging mechanism that collects all the components of an electronic publication into a single file for distribution, delivery and archiving.
Related specification documents:
OPS 2.0 specification: http://www.idpf.org/epub/20/spec/OPS_2.0.1_draft.htm
OPF 2.0 specification: http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm
OCF specification: http://www.idpf.org/epub/301/spec/epub-ocf.html
II. EPUB basics
How to open an EPUB
An EPUB can be opened with the PC version of Stanza, or read in Firefox or Chrome.
Firefox EPUBReader extension: EPUBReader is a Firefox extension for reading EPUB files. No additional software is needed; EPUB files can be read directly in the Firefox browser. This is the recommended option.
File parsing
An EPUB is simply a ZIP file (with an .epub extension) whose files are arranged in a predefined way. Beyond that, EPUB is very simple: change the extension to .zip (or .rar, for archive tools that detect the format from the file contents) and unzip it to see the contents.
The ZIP archive of an EPUB e-book roughly contains the following:
1. The mimetype file, which must be the first file in the archive. Note that mimetype must be stored uncompressed.
2. The META-INF directory, which contains at least a container.xml file.
3. The OEBPS directory (another name may be used, but this one is recommended), which contains:
a) An images subdirectory (not always present) that holds all the image files.
b) content.opf: the file name may differ, but the extension must be .opf. It is an XML file that lists the files in the package.
c) toc.ncx: the table-of-contents file, a "logical directory" that controls navigation.
d) Some XHTML or HTML files, which hold the content of the book.
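The packaging rule in item 1 can be checked with the JDK's java.util.zip classes. The following is a minimal sketch, not part of the original program: it builds a tiny archive in memory (a hypothetical stand-in for a real .epub, so it runs without any file on disk) and verifies that mimetype is the first entry and is stored uncompressed. The class and method names are illustrative, and the mimetype content application/epub+zip comes from the EPUB specification rather than from this article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class EpubZipCheck {

    /** True if the first ZIP entry is named "mimetype" and is stored without compression. */
    static boolean hasValidMimetypeEntry(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry first = zip.getNextEntry();
            return first != null
                    && "mimetype".equals(first.getName())
                    && first.getMethod() == ZipEntry.STORED;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a two-entry sample archive in memory (hypothetical stand-in for a real .epub). */
    static byte[] buildSample(boolean storeMimetype) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            byte[] mimetype = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
            ZipEntry entry = new ZipEntry("mimetype");
            if (storeMimetype) {
                // STORED entries need their size and CRC set before they are written.
                CRC32 crc = new CRC32();
                crc.update(mimetype);
                entry.setMethod(ZipEntry.STORED);
                entry.setSize(mimetype.length);
                entry.setCrc(crc.getValue());
            }
            zip.putNextEntry(entry);
            zip.write(mimetype);
            zip.closeEntry();

            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write("<container/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(hasValidMimetypeEntry(buildSample(true)));  // prints true
        System.out.println(hasValidMimetypeEntry(buildSample(false))); // prints false (deflated)
    }
}
```

To check a real book, the same method could be given the bytes of the .epub file, for example from Files.readAllBytes.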
Directory and file structure of a simple EPUB archive:
mimetype
META-INF/
    container.xml
OEBPS/
    content.opf
    title.html
    content.html
    stylesheet.css
    toc.ncx
    images/
        cover.png
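Because the .opf file may have any name, a reading system does not guess where it is. It reads META-INF/container.xml, whose rootfile element gives the location in its full-path attribute. The sketch below shows this lookup with the JDK's DOM parser. The class name and the hand-written container.xml (matching the tree above) are illustrative assumptions, not taken from the sample project.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ContainerXml {

    /** Hand-written sample; a real book's container.xml may list a different path. */
    static final String SAMPLE =
          "<container version='1.0' xmlns='urn:oasis:names:tc:opendocument:xmlns:container'>"
        + "  <rootfiles>"
        + "    <rootfile full-path='OEBPS/content.opf'"
        + "              media-type='application/oebps-package+xml'/>"
        + "  </rootfiles>"
        + "</container>";

    /** Returns the full-path of the first rootfile element, i.e. the location of the .opf file. */
    static String findOpfPath(String containerXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(containerXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("container.xml could not be parsed", e);
        }
        // "*" matches the element regardless of its namespace.
        NodeList rootfiles = doc.getElementsByTagNameNS("*", "rootfile");
        if (rootfiles.getLength() == 0) {
            throw new IllegalArgumentException("container.xml has no rootfile element");
        }
        return ((Element) rootfiles.item(0)).getAttribute("full-path");
    }

    public static void main(String[] args) {
        System.out.println(findOpfPath(SAMPLE)); // prints OEBPS/content.opf
    }
}
```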
A complete EPUB e-book typically includes an .opf file and an .ncx file in the OEBPS directory.
1. The .opf file
It contains four elements: metadata, manifest, spine and guide.
(1) Metadata: the EPUB metadata, such as title, language, identifier and cover. Of these, title and identifier are required. According to the EPUB specification, the identifier is defined by the creator of the digital book and must be unique. For book publishers, this field typically holds an ISBN or a Library of Congress number; it may also be a URL or a randomly generated unique ID. Note: the value of the unique-identifier attribute on the package element must match the id attribute of the dc:identifier element.
(2) Manifest: lists all the files contained in the package (XHTML, CSS, PNG, NCX and so on). EPUB encourages the use of CSS to style book content, so the manifest also lists the CSS files. Note: every file that goes into a digital book must be listed in the manifest. The manifest only lists the files; it does not record the order in which they are read.
(3) Spine: the linear reading order of all the XHTML documents. The toc attribute of the spine element must contain the id of the .ncx file as listed in the manifest. The OPF spine can be thought of as the order of the "pages" in the book; a parser reads the spine from top to bottom in document order.
Each itemref element in the spine must have an idref attribute that matches an id in the manifest.
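The three cross-references described above (unique-identifier to dc:identifier, spine toc to the manifest, and each itemref idref to the manifest) can be illustrated with a small DOM-based sketch. This is an assumption-laden example rather than the article's own program: the class name, method names and the sample OPF document (including its urn:uuid value) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class OpfReader {
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Hypothetical OPF document used only for this illustration. */
    static final String SAMPLE_OPF =
          "<package xmlns='http://www.idpf.org/2007/opf' unique-identifier='BookId' version='2.0'>"
        + "<metadata xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "  <dc:title>Sample Book</dc:title><dc:language>en</dc:language>"
        + "  <dc:identifier id='BookId'>urn:uuid:hypothetical-0001</dc:identifier>"
        + "</metadata>"
        + "<manifest>"
        + "  <item id='ncx' href='toc.ncx' media-type='application/x-dtbncx+xml'/>"
        + "  <item id='title' href='title.html' media-type='application/xhtml+xml'/>"
        + "  <item id='content' href='content.html' media-type='application/xhtml+xml'/>"
        + "  <item id='css' href='stylesheet.css' media-type='text/css'/>"
        + "</manifest>"
        + "<spine toc='ncx'><itemref idref='title'/><itemref idref='content'/></spine>"
        + "</package>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("OPF document could not be parsed", e);
        }
    }

    /** True if package@unique-identifier names the id of some dc:identifier element. */
    static boolean identifierMatches(Document opf) {
        String uniqueId = opf.getDocumentElement().getAttribute("unique-identifier");
        NodeList identifiers = opf.getElementsByTagNameNS(DC_NS, "identifier");
        for (int i = 0; i < identifiers.getLength(); i++) {
            String id = ((Element) identifiers.item(i)).getAttribute("id");
            if (!uniqueId.isEmpty() && uniqueId.equals(id)) return true;
        }
        return false;
    }

    /** Maps each manifest item id to its href. */
    static Map<String, String> manifest(Document opf) {
        Map<String, String> hrefById = new HashMap<>();
        NodeList items = opf.getElementsByTagNameNS("*", "item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            hrefById.put(item.getAttribute("id"), item.getAttribute("href"));
        }
        return hrefById;
    }

    /** Resolves each spine itemref to its manifest href, in reading order. */
    static List<String> spineHrefs(Document opf) {
        Map<String, String> hrefById = manifest(opf);
        List<String> hrefs = new ArrayList<>();
        NodeList refs = opf.getElementsByTagNameNS("*", "itemref");
        for (int i = 0; i < refs.getLength(); i++) {
            String idref = ((Element) refs.item(i)).getAttribute("idref");
            String href = hrefById.get(idref);
            if (href == null) throw new IllegalStateException("idref not in manifest: " + idref);
            hrefs.add(href);
        }
        return hrefs;
    }

    /** Returns the href of the NCX file named by spine@toc, or null if it is not in the manifest. */
    static String ncxHref(Document opf) {
        Element spine = (Element) opf.getElementsByTagNameNS("*", "spine").item(0);
        return manifest(opf).get(spine.getAttribute("toc"));
    }

    public static void main(String[] args) {
        Document opf = parse(SAMPLE_OPF);
        System.out.println(identifierMatches(opf)); // prints true
        System.out.println(spineHrefs(opf));        // prints [title.html, content.html]
        System.out.println(ncxHref(opf));           // prints toc.ncx
    }
}
```

Note that stylesheet.css appears in the manifest but not in the spine: the manifest lists every file, while the spine lists only the documents that are read in sequence.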
2. The .ncx file
The NCX defines the table of contents of a digital book. In a complex book the table of contents is usually hierarchical, with nested parts, chapters and sections. The file contains the TOC (table of contents), which provides navigation information for the sections of the book. The NCX head contains the following meta elements:
uid: the unique ID of the digital book. It corresponds to the dc:identifier in the OPF file.
depth: the depth of the hierarchy in the table of contents.
totalPageCount and maxPageNumber: used only for paper books; leave them at 0.
The content of docTitle/text is the title of the book and matches the dc:title in the OPF. The sample program later in this article modifies this title.
navMap defines the table of contents of the book and is the most important part of the .ncx file. navMap contains one or more navPoint elements, and each navPoint carries the following items:
playOrder: describes the reading order of the document. It follows the same order as the itemref elements in the OPF spine.
navLabel/text: gives the title of the entry, usually the title or number of the chapter.
content: its src attribute points to the physical resource that holds the content. This is a file declared in the OPF manifest.
A navPoint can also contain one or more nested navPoint elements. The NCX uses nested navigation points to represent the hierarchy of the document.
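The nesting of navPoint elements lends itself to a recursive walk. The sketch below, written for illustration only, parses a hand-made NCX document and prints one line per navPoint, indented by nesting level. The class name, the output format and the sample NCX (titles, ids, urn:uuid value) are assumptions, not taken from the sample e-book.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NcxOutline {

    /** Hypothetical NCX document: two top-level entries, the second with one nested section. */
    static final String SAMPLE_NCX =
          "<ncx xmlns='http://www.daisy.org/z3986/2005/ncx/' version='2005-1'>"
        + "<head>"
        + "  <meta name='dtb:uid' content='urn:uuid:hypothetical-0001'/>"
        + "  <meta name='dtb:depth' content='2'/>"
        + "  <meta name='dtb:totalPageCount' content='0'/>"
        + "  <meta name='dtb:maxPageNumber' content='0'/>"
        + "</head>"
        + "<docTitle><text>Sample Book</text></docTitle>"
        + "<navMap>"
        + "  <navPoint id='np1' playOrder='1'>"
        + "    <navLabel><text>Title Page</text></navLabel><content src='title.html'/>"
        + "  </navPoint>"
        + "  <navPoint id='np2' playOrder='2'>"
        + "    <navLabel><text>Chapter 1</text></navLabel><content src='content.html'/>"
        + "    <navPoint id='np3' playOrder='3'>"
        + "      <navLabel><text>Section 1.1</text></navLabel><content src='content.html#s1'/>"
        + "    </navPoint>"
        + "  </navPoint>"
        + "</navMap>"
        + "</ncx>";

    /** Returns one line per navPoint: "playOrder label -> src", indented two spaces per level. */
    static List<String> outline(String ncxXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName()
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(ncxXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("NCX document could not be parsed", e);
        }
        List<String> lines = new ArrayList<>();
        collect(child(doc.getDocumentElement(), "navMap"), 0, lines);
        return lines;
    }

    /** Visits the navPoint children of parent in document order, recursing into nested ones. */
    private static void collect(Element parent, int level, List<String> lines) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"navPoint".equals(n.getLocalName())) continue;
            Element navPoint = (Element) n;
            String label = child(child(navPoint, "navLabel"), "text").getTextContent().trim();
            String src = child(navPoint, "content").getAttribute("src");
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < level; i++) line.append("  ");
            line.append(navPoint.getAttribute("playOrder")).append(' ')
                .append(label).append(" -> ").append(src);
            lines.add(line.toString());
            collect(navPoint, level + 1, lines);
        }
    }

    /** Returns the first direct child element with the given local name. */
    private static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) return (Element) n;
        }
        throw new IllegalArgumentException("Missing <" + localName + "> element");
    }

    public static void main(String[] args) {
        for (String line : outline(SAMPLE_NCX)) System.out.println(line);
    }
}
```

In this sample the nested "Section 1.1" entry points into content.html with a fragment, so the NCX has three navPoint elements for only two content files.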
3. What is the difference between the .opf spine and the .ncx file?
The spine element describes the order of the documents; the .ncx file describes the table of contents.
The two are easy to confuse because both files describe the order and content of the document. The simplest way to explain the difference is an analogy with a printed book. The spine of the .opf file describes how the sections of the book are physically bound together: turning the last page of chapter one brings you to the first page of chapter two. The .ncx file describes the table of contents at the front of the book. The table of contents always lists the main chapters, but it may also list subsections that do not start on a page of their own.
As a rule of thumb, the NCX usually contains more navPoint elements than there are itemref elements in the OPF spine. In practice every item in the spine appears in the .ncx, but the .ncx may be more detailed.
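This rule of thumb can be turned into a simple consistency check. The sketch below is illustrative only: it assumes the spine hrefs and the navPoint src values have already been extracted into lists (the file names are made up), strips any #fragment from each src, and reports spine files that no navPoint refers to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpineVsNcx {

    /** Strips a "#fragment" suffix, so that content.html#s1 maps to content.html. */
    static String withoutFragment(String src) {
        int hash = src.indexOf('#');
        return hash < 0 ? src : src.substring(0, hash);
    }

    /** Returns the spine files that no navPoint refers to (empty if the NCX covers the spine). */
    static List<String> spineItemsMissingFromNcx(List<String> spineHrefs, List<String> navPointSrcs) {
        Set<String> ncxFiles = new HashSet<>();
        for (String src : navPointSrcs) {
            ncxFiles.add(withoutFragment(src));
        }
        List<String> missing = new ArrayList<>();
        for (String href : spineHrefs) {
            if (!ncxFiles.contains(href)) missing.add(href);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical book: two spine documents, four table-of-contents entries.
        List<String> spine = Arrays.asList("title.html", "content.html");
        List<String> navPoints = Arrays.asList(
                "title.html", "content.html", "content.html#s1", "content.html#s2");

        System.out.println(navPoints.size() + " navPoints for " + spine.size() + " spine items");
        System.out.println("missing from NCX: " + spineItemsMissingFromNcx(spine, navPoints));
    }
}
```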
III. Introduction to OCF-format e-books
E-books produced by Mi-goo use the .ocf format. It follows most of what the EPUB specifications define and extends them to address layout richness, format openness, and document security, exchange, extensibility and trimming. It is suited to mobile phones, handheld reading terminals and similar devices.
An OCF book is produced by taking content from the internal library, splitting it into chapters, compressing and encrypting it, and packaging it as MEB. With .meb as the e-book's extension, the book can then be published for clients to parse and read.
The directory structure of the OCF package:
ocf
│  mimetype
├─META-INF
│  │  book.ncx
│  │  book.opf
│  │  container.xml
│  │  cover.xml
│  │  right.xml
│  └─ext
│          cover180240.jpg
│          cover5168.png
│          cover600800.jpg
│          cover6080.jpg
│          cover75100.jpg
│          cover81108.png
│          cover90120.jpg
└─OEBPS
    ├─chapter01
    │  │  chapter01.html
    │  ├─css
    │  │      style.css
    │  └─images
    │      ├─128
    │      │      image0.jpg
    │      ├─176
    │      │      image0.jpg
    │      ├─240
    │      │      image0.jpg
    │      ├─320
    │      │      image0.jpg
    │      ├─360
    │      │      image0.jpg
    │      ├─480
    │      │      image0.jpg
    │      └─orig
    │              image0.jpg
    ├─chapter02
    │  │  chapter02.html
    │  ├─css
    │  │      style.css
    │  └─images
    │      ├─128
    │      │      image0.jpg
    │      │      image1.jpg
    │      ├─176
    │      │      image0.jpg
    │      │      image1.jpg
    │      ├─240
    │      │      image0.jpg
    │      │      image1.jpg
    │      ├─320
    │      │      image0.jpg
    │      ├─360
    │      │      image0.jpg
    │      ├─480
    │      │      image0.jpg
    │      └─orig
    │              image0.jpg
    │              image1.jpg
The file and directory names are described below:
File name/directory name | Description | Type | Requirement
mimetype | MIME type description file | ASCII file | Required
META-INF/ | Meta-information directory | Directory | Required
container.xml | Container description file | XML file | Required
book.opf | Book metadata description file (book title, author information and so on) | XML file | Required
book.ncx | Describes the table-of-contents structure of the book | XML file | Required
cover.xml | Cover content file | XML file | Required
right.xml | Rights description file | XML file | Optional
ext/ | MEB file extension directory | Directory | Optional
coverWH.jpg, coverWH.png | Cover image (W and H stand for the width and height, as in cover180240.jpg) | JPG/PNG file | Optional
OEBPS/ | MEB content directory | Directory | Required
chapterXX/ | Chapter directory | Directory | Required
chapterXX.html | Chapter content page | XHTML file | /
images/ | Image directory | Directory | Optional
orig | Directory for the original images | Directory | Optional
128 | Directory for images at a specific resolution. The current resolutions are 128, 176, 240, 320, 360, 480 and 600 (600 has no directory of its own, because an image that is 600 wide only needs to exist in orig) | Directory | Optional
image0.jpg | Image file. orig holds the original; if the chapter references an image, orig must contain it. A directory from 128 to 480 contains a file only if the width of the original is greater than the directory name. For example, for an original that is 300 wide, orig, 128, 176 and 240 each hold the corresponding (compressed) image file. | JPG/PNG | Optional
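The rule in the last two rows of the table can be expressed as a small function. The sketch below is one reading of that rule rather than Mi-goo's actual implementation: given the width of an original image, it lists the directories that would contain a copy. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ImageVariants {

    /** Resolution directories from the table (600 has no directory of its own). */
    static final int[] RESOLUTION_DIRS = {128, 176, 240, 320, 360, 480};

    /**
     * Lists the directories that hold a copy of an image with the given original width:
     * always orig, plus every resolution directory whose name is smaller than the width.
     */
    static List<String> directoriesFor(int originalWidth) {
        List<String> dirs = new ArrayList<>();
        dirs.add("orig");
        for (int width : RESOLUTION_DIRS) {
            if (originalWidth > width) {
                dirs.add(String.valueOf(width));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(directoriesFor(300)); // prints [orig, 128, 176, 240]
        System.out.println(directoriesFor(100)); // prints [orig]
    }
}
```

The first call reproduces the 300-wide example from the table. It is also consistent with the directory tree, where image1.jpg of chapter02 appears only in orig, 128, 176 and 240.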
IV. Example: parsing an EPUB e-book in Java
The article concludes with a sample program that parses an EPUB e-book. The main function of the program is to read and modify an .epub file.
Related materials and source code can be downloaded from: http://pan.baidu.com/s/1bnm8YXT
The download includes:
1. The Java project test_epub, which contains the jar packages and an EPUB e-book, mybook.epub
2. The EPUB-related jar packages (epublib-core-latest.jar and slf4j-*.jar)
3. The e-book mybook.epub
The Java code that parses an .epub e-book is shown below. It is a simple HelloWorld-style program with the corresponding jar packages added.
Program description:
1. Read the file epub/mybook.epub.
2. Modify the title in the metadata.
3. Write the new .epub file to the project directory, with the file name Mynewbook.epub.
After extracting Mynewbook.epub, you can see that the content of <docTitle> in toc.ncx and of the <dc:title> tag in content.opf has been modified.
package com.hk;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import nl.siegmann.epublib.domain.Book;
import nl.siegmann.epublib.epub.EpubReader;
import nl.siegmann.epublib.epub.EpubWriter;

/**
 * Sample program that reads and writes an EPUB file.
 *
 * @author hk
 * @version November 26, 2015
 */
public class TestEpub {

    public static void main(String[] args) {
        System.out.println("Hello World");
        // Get the e-book path
        String epubPath = getEbookPath();
        // Read the EPUB file
        Book book = readBook(epubPath);
        if (book == null) {
            // Reading failed; the stack trace has already been printed.
            return;
        }
        // Modify the e-book
        modifyBook(book);
        String outputFileName = "Mynewbook.epub";
        writeBook(book, outputFileName);
    }

    /**
     * Gets the e-book path.
     *
     * @return the path of epub/mybook.epub under the classpath root
     */
    private static String getEbookPath() {
        // The classpath root is a URL such as file:/E:/study/epub/test_epub/bin/,
        // where E:/study/epub/test_epub is the project path.
        URL classpathRoot = TestEpub.class.getResource("/");
        try {
            // Converting through a URI removes the file:/ prefix on every platform,
            // instead of cutting a fixed number of characters off the string.
            File epubFile = new File(new File(classpathRoot.toURI()), "epub/mybook.epub");
            String epubPath = epubFile.getPath();
            System.out.println(epubPath);
            return epubPath;
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Cannot resolve the classpath root", e);
        }
    }

    /**
     * Reads the EPUB file.
     *
     * @return the book, or null if it could not be read
     */
    private static Book readBook(String epubPath) {
        EpubReader epubReader = new EpubReader();
        Book book = null;
        // try-with-resources closes the stream even if reading fails
        try (InputStream input = new FileInputStream(epubPath)) {
            book = epubReader.readEpub(input);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return book;
    }

    /**
     * Modifies the e-book. This changes the content of <docTitle> in toc.ncx
     * and of the <dc:title> tag in content.opf.
     *
     * @param book the book to modify
     */
    private static void modifyBook(Book book) {
        // Set the title inside the EPUB file
        List<String> titlesList = new ArrayList<String>();
        titlesList.add("Test Book");
        book.getMetadata().setTitles(titlesList);
    }

    /**
     * Writes the e-book.
     *
     * @param book the book to write
     * @param fileName the output file name
     */
    private static void writeBook(Book book, String fileName) {
        // Write the EPUB
        EpubWriter epubWriter = new EpubWriter();
        try (OutputStream output = new FileOutputStream(fileName)) {
            epubWriter.write(book, output);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
This article draws on the technical blog of "work hard work smart":
http://www.cnblogs.com/linlf03/archive/2011/12/13/2286218.html
Blog post on the program implementation: http://www.cnblogs.com/xckk/p/4598196.html
Produced by Scholar Kun Kun
Analysis of common e-book formats such as EPUB and OCF, with a Java sample program