Analysis of common e-book formats (EPUB and OCF), with a Java sample program

I. Introduction to e-books

When reprinting, please credit the source: http://www.cnblogs.com/xckk/p/6020324.html

EPUB (Electronic Publication) is a free and open e-book standard.

Its content is reflowable: the text can be automatically re-laid out to fit the display.

Reflow is possible because the content of each file is expressed in XHTML, while a set of CSS files defines the formatting and layout. This separates the content from its presentation.

EPUB comprises three main specifications:

Open Publication Structure (OPS) 2.0, which defines the markup and layout of the content;

Open Packaging Format (OPF) 2.0, which defines the XML-based structure of an .epub file;

Open Container Format (OCF, formerly the OEBPS Container Format) 3.0, which collects all related files into a single ZIP archive.

The OPS specification aims to give content providers (publishers, authors, and so on) and publishing-tool providers a minimal common guideline, so that electronic content renders consistently across reading systems.

OPS uses XHTML to build the content of the book and CSS to define its formatting and layout.

The OPF specification defines the mechanism by which the components of an OPS publication are tied together, and provides additional structure and semantics. It is kept separate from the OPS specification in order to modularize the technology that describes content and the technology that describes packaging.

OCF defines a standard mechanism for packaging all the components of an electronic publication into a single file for distribution, delivery, and archiving.

Related Specification Documents:

OPS2.0 Specification: http://www.idpf.org/epub/20/spec/OPS_2.0.1_draft.htm

OPF2.0 Specification: http://www.idpf.org/epub/20/spec/OPF_2.0.1_draft.htm

OCF Specification: http://www.idpf.org/epub/301/spec/epub-ocf.html

II. EPUB basics

How to open EPUB files

EPUB files can be opened with the desktop (PC) version of Stanza, or read in Firefox or Chrome.

Firefox EPUBReader extension: EPUBReader is a Firefox extension for reading EPUB files. No additional software needs to be installed; EPUB files can be read directly in the Firefox browser. This is the recommended option.

File structure

An EPUB file is simply a ZIP archive (with the .epub extension) whose files are arranged in a predefined layout. Beyond that, EPUB is very simple: change the extension to .zip and unzip the archive to see its contents.

The ZIP archive of an EPUB e-book contains roughly the following:

1. The mimetype file, which must be the first file in the archive. Note that mimetype must be stored uncompressed.

2. The META-INF directory, which contains at least the container.xml file.

3. The OEBPS directory (another name may be used, but this one is recommended), which contains:

a) An images subdirectory (optional), which holds all image files.

b) content.opf. The file name may differ, but the extension must be .opf. This XML file lists the files in the package.

c) toc.ncx, the table-of-contents file. It is a "logical table of contents" that controls navigation.

d) A number of XHTML or HTML files, which hold the content of the book.

Directory and file structure of a simple EPUB archive:

mimetype
META-INF/
    container.xml
OEBPS/
    content.opf
    title.html
    content.html
    stylesheet.css
    toc.ncx
    images/
        cover.png
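The container rules above (mimetype first and stored uncompressed, followed by META-INF/container.xml) can be sketched with the standard java.util.zip classes. This is a minimal illustration, not a complete EPUB writer. The class and method names are made up for this example, the mimetype value application/epub+zip and the container.xml body are taken from the OCF specification rather than from this article, and no OEBPS content is written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any EPUB library.
class EpubContainerSketch {

    // Assumption: media type and container.xml body as defined by the OCF specification.
    static final String MIMETYPE = "application/epub+zip";
    static final String CONTAINER_XML =
        "<?xml version=\"1.0\"?>\n"
        + "<container version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n"
        + "  <rootfiles>\n"
        + "    <rootfile full-path=\"OEBPS/content.opf\" media-type=\"application/oebps-package+xml\"/>\n"
        + "  </rootfiles>\n"
        + "</container>\n";

    /** Builds an in-memory archive holding only mimetype and META-INF/container.xml. */
    static byte[] writeSkeleton() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            byte[] mime = MIMETYPE.getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mime);

            // A STORED (uncompressed) entry needs its size and CRC set up front.
            ZipEntry first = new ZipEntry("mimetype");
            first.setMethod(ZipEntry.STORED);
            first.setSize(mime.length);
            first.setCompressedSize(mime.length);
            first.setCrc(crc.getValue());
            zip.putNextEntry(first);
            zip.write(mime);
            zip.closeEntry();

            // Later entries may use the default DEFLATED method.
            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(CONTAINER_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns true if the first entry is named mimetype and is stored uncompressed. */
    static boolean mimetypeIsFirstAndStored(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry first = zip.getNextEntry();
            return first != null
                && "mimetype".equals(first.getName())
                && first.getMethod() == ZipEntry.STORED;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mimetypeIsFirstAndStored(writeSkeleton()));
    }
}
```

The same check can be pointed at the bytes of a real .epub file to see whether a packaging tool respected the ordering rule.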

In a complete EPUB e-book, the OEBPS directory typically includes the .opf and .ncx files.

1. The .opf file

It contains four elements: metadata, manifest, spine, and guide.

(1) metadata: the EPUB metadata, such as title, language, identifier, and cover. Of these, title and identifier are required (the OPF 2.0 specification also requires language). According to the EPUB specification, the identifier is defined by the creator of the digital book and must be unique. For book publishers, this field typically holds an ISBN or a Library of Congress number; a URL or a randomly generated unique ID (such as a UUID) can also be used. Note: the value of the unique-identifier attribute on the package element must match the id attribute of the dc:identifier element.

(2) manifest: lists all the files contained in the package (XHTML, CSS, PNG, NCX, and so on). EPUB encourages the use of CSS to style book content, so the manifest also lists the CSS files. Note: every file that goes into the digital book must be listed in the manifest. The manifest only lists the files; it says nothing about their order.

(3) spine: the linear reading order of all XHTML documents. The toc attribute of the spine element must contain the id of the .ncx item declared in the manifest. The spine can be thought of as the order of the "pages" in the book; a parser reads the spine from top to bottom in document order.

Each itemref element in the spine needs an idref attribute that matches an id in the manifest.
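The three cross-references described above (unique-identifier to dc:identifier, spine toc to a manifest id, and itemref idref to a manifest id) can be checked with the JDK's built-in DOM parser. The following is a sketch under stated assumptions: the class name, method names, and the sample OPF document (including its identifier value) are invented for illustration and do not come from the article's sample book.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the sample OPF below is invented for illustration.
class OpfSpineSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    static final String SAMPLE_OPF =
        "<?xml version=\"1.0\"?>\n"
        + "<package xmlns=\"http://www.idpf.org/2007/opf\" version=\"2.0\" unique-identifier=\"BookId\">\n"
        + " <metadata xmlns:dc=\"http://purl.org/dc/elements/1.1/\">\n"
        + "  <dc:title>Sample</dc:title>\n"
        + "  <dc:language>en</dc:language>\n"
        + "  <dc:identifier id=\"BookId\">urn:example:sample-book</dc:identifier>\n"
        + " </metadata>\n"
        + " <manifest>\n"
        + "  <item id=\"ncx\" href=\"toc.ncx\" media-type=\"application/x-dtbncx+xml\"/>\n"
        + "  <item id=\"title\" href=\"title.html\" media-type=\"application/xhtml+xml\"/>\n"
        + "  <item id=\"content\" href=\"content.html\" media-type=\"application/xhtml+xml\"/>\n"
        + " </manifest>\n"
        + " <spine toc=\"ncx\">\n"
        + "  <itemref idref=\"title\"/>\n"
        + "  <itemref idref=\"content\"/>\n"
        + " </spine>\n"
        + "</package>\n";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse OPF", e);
        }
    }

    /** Maps each manifest item id to its href. */
    static Map<String, String> manifest(Document opf) {
        Map<String, String> hrefById = new LinkedHashMap<>();
        NodeList items = opf.getElementsByTagNameNS("*", "item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            hrefById.put(item.getAttribute("id"), item.getAttribute("href"));
        }
        return hrefById;
    }

    /** Resolves every spine itemref to a manifest href, in document order. */
    static List<String> readingOrder(Document opf) {
        Map<String, String> hrefById = manifest(opf);
        List<String> order = new ArrayList<>();
        NodeList refs = opf.getElementsByTagNameNS("*", "itemref");
        for (int i = 0; i < refs.getLength(); i++) {
            String idref = ((Element) refs.item(i)).getAttribute("idref");
            String href = hrefById.get(idref);
            if (href == null) {
                throw new IllegalStateException("itemref " + idref + " has no manifest item");
            }
            order.add(href);
        }
        return order;
    }

    /** Checks that package/@unique-identifier names the id of a dc:identifier element. */
    static boolean uniqueIdentifierMatches(Document opf) {
        String wanted = opf.getDocumentElement().getAttribute("unique-identifier");
        NodeList ids = opf.getElementsByTagNameNS(DC_NS, "identifier");
        for (int i = 0; i < ids.getLength(); i++) {
            if (wanted.equals(((Element) ids.item(i)).getAttribute("id"))) {
                return true;
            }
        }
        return false;
    }

    /** Checks that spine/@toc names an id that is declared in the manifest. */
    static boolean spineTocIsInManifest(Document opf) {
        Element spine = (Element) opf.getElementsByTagNameNS("*", "spine").item(0);
        return manifest(opf).containsKey(spine.getAttribute("toc"));
    }

    public static void main(String[] args) {
        Document opf = parse(SAMPLE_OPF);
        System.out.println(readingOrder(opf)); // prints [title.html, content.html]
        System.out.println(uniqueIdentifierMatches(opf) && spineTocIsInManifest(opf)); // prints true
    }
}
```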

2. The .ncx file

The NCX file defines the table of contents (TOC) of the digital book. In a complex book, the table of contents is usually hierarchical, with nested parts, chapters, and sections. The NCX head contains the following meta elements:

uid: the unique ID of the digital book. It must match the dc:identifier in the OPF file.

depth: reflects the depth of the hierarchy in the table of contents.

totalPageCount and maxPageNumber: used only for paper books; leave them at 0.

The content of docTitle/text is the title of the book and matches dc:title in the OPF. The sample program at the end of this article changes this title.

navMap defines the table of contents of the book and is the most important part of the .ncx file. navMap contains one or more navPoint elements, and each navPoint contains the following:

playOrder: describes the reading order of the document, in the same order as the itemref elements in the OPF spine.

navLabel/text: the title of the section, usually the chapter title or chapter number.

content: its src attribute points to the physical resource that holds the content, which is a file declared in the OPF manifest.

A navPoint can also contain one or more child navPoint elements. NCX uses nested navigation points to represent the hierarchy of a document.
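To make the nesting concrete, here is a minimal sketch that walks navMap recursively and prints one indented line per navPoint. The class name, method names, and the sample NCX document are assumptions made for this illustration. The DOCTYPE declaration that real NCX files often carry is left out of the sample so that the parser does not try to fetch a DTD.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; the sample NCX below is invented for illustration.
class NcxOutlineSketch {

    static final String SAMPLE_NCX =
        "<?xml version=\"1.0\"?>\n"
        + "<ncx xmlns=\"http://www.daisy.org/z3986/2005/ncx/\" version=\"2005-1\">\n"
        + " <head>\n"
        + "  <meta name=\"dtb:uid\" content=\"urn:example:sample-book\"/>\n"
        + "  <meta name=\"dtb:depth\" content=\"2\"/>\n"
        + "  <meta name=\"dtb:totalPageCount\" content=\"0\"/>\n"
        + "  <meta name=\"dtb:maxPageNumber\" content=\"0\"/>\n"
        + " </head>\n"
        + " <docTitle><text>Sample</text></docTitle>\n"
        + " <navMap>\n"
        + "  <navPoint id=\"np1\" playOrder=\"1\">\n"
        + "   <navLabel><text>Chapter 1</text></navLabel>\n"
        + "   <content src=\"content.html\"/>\n"
        + "   <navPoint id=\"np2\" playOrder=\"2\">\n"
        + "    <navLabel><text>Section 1.1</text></navLabel>\n"
        + "    <content src=\"content.html#s1\"/>\n"
        + "   </navPoint>\n"
        + "  </navPoint>\n"
        + " </navMap>\n"
        + "</ncx>\n";

    /** Returns one line per navPoint, indented by two spaces per nesting level. */
    static List<String> outline(String ncxXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(ncxXml.getBytes(StandardCharsets.UTF_8)));
            Element navMap = (Element) doc.getElementsByTagNameNS("*", "navMap").item(0);
            List<String> lines = new ArrayList<>();
            walk(navMap, 0, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse NCX", e);
        }
    }

    // Visits the direct navPoint children of parent, then recurses into each one.
    private static void walk(Element parent, int depth, List<String> lines) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !"navPoint".equals(n.getLocalName())) {
                continue;
            }
            Element navPoint = (Element) n;
            String label = child(navPoint, "navLabel").getTextContent().trim();
            String src = child(navPoint, "content").getAttribute("src");
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            lines.add(line.append(label).append(" -> ").append(src).toString());
            walk(navPoint, depth + 1, lines);
        }
    }

    // Returns the first direct child element with the given local name.
    private static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        throw new IllegalStateException("Missing <" + localName + "> in navPoint");
    }

    public static void main(String[] args) {
        for (String line : outline(SAMPLE_NCX)) {
            System.out.println(line);
        }
    }
}
```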

3. What is the difference between the .opf spine and the .ncx file?

The spine element describes the document order, and the .ncx file describes the table of contents.

The two are easy to confuse, because both describe the order and content of the document. The simplest way to explain the difference is an analogy with a printed book. The spine element of the .opf file describes how the chapters of the book are physically connected: for example, turning past the last page of chapter one brings you to the first page of chapter two. The .ncx file describes the table of contents at the front of the book. It will certainly list the main chapters of the book, but it may also list subsections that do not start on a page of their own.

As a rule of thumb, the NCX usually contains more navPoint elements than the OPF spine contains itemref elements. In practice, all items in the spine appear in the .ncx, but the .ncx may be more detailed.
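The rule of thumb can be expressed as a small consistency check: every spine file should be the target of at least one navPoint, while the NCX may add finer-grained entries that point into the same file. The sketch below works on plain lists of paths, so it does not depend on how the OPF and NCX were parsed. The class name, method names, and sample file names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for comparing spine entries with NCX targets.
class SpineVsNcxSketch {

    /** Strips a "#fragment" suffix so content.html#s1 compares equal to content.html. */
    static String withoutFragment(String src) {
        int hash = src.indexOf('#');
        return hash < 0 ? src : src.substring(0, hash);
    }

    /** Returns the spine files that no navPoint points at (expected to be empty). */
    static Set<String> spineFilesMissingFromNcx(List<String> spineHrefs, List<String> navPointSrcs) {
        Set<String> missing = new LinkedHashSet<>(spineHrefs);
        for (String src : navPointSrcs) {
            missing.remove(withoutFragment(src));
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> spine = Arrays.asList("title.html", "content.html");
        List<String> ncx = Arrays.asList(
            "title.html", "content.html", "content.html#s1", "content.html#s2");
        System.out.println(spineFilesMissingFromNcx(spine, ncx)); // prints []
        System.out.println(ncx.size() > spine.size());            // prints true
    }
}
```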

III. Introduction to the OCF e-book format

E-books produced by Mi-goo use the .ocf format. It follows most of what EPUB specifies and extends it to address layout richness, format openness, and document security, interchange, extensibility, and trimming. It is suited to mobile phones, handheld reading terminals, and similar devices.

An OCF book is produced after the content has been stored in the internal library, split into chapters, compressed, encrypted, and packaged as MEB. With .meb as the e-book's extension, the book can then be published for clients to parse and read.

The directory structure of the OCF package:

ocf
│  mimetype
├─META-INF
│  │  book.ncx
│  │  book.opf
│  │  container.xml
│  │  cover.xml
│  │  right.xml
│  └─ext
│          cover180240.jpg
│          cover5168.png
│          cover600800.jpg
│          cover6080.jpg
│          cover75100.jpg
│          cover81108.png
│          cover90120.jpg
└─OEBPS
    ├─chapter01
    │  │  chapter01.html
    │  ├─css
    │  │      style.css
    │  └─images
    │      ├─128
    │      │      image0.jpg
    │      ├─176
    │      │      image0.jpg
    │      ├─240
    │      │      image0.jpg
    │      ├─320
    │      │      image0.jpg
    │      ├─360
    │      │      image0.jpg
    │      ├─480
    │      │      image0.jpg
    │      └─orig
    │              image0.jpg
    ├─chapter02
    │  │  chapter02.html
    │  ├─css
    │  │      style.css
    │  └─images
    │      ├─128
    │      │      image0.jpg
    │      │      image1.jpg
    │      ├─176
    │      │      image0.jpg
    │      │      image1.jpg
    │      ├─240
    │      │      image0.jpg
    │      │      image1.jpg
    │      ├─320
    │      │      image0.jpg
    │      ├─360
    │      │      image0.jpg
    │      ├─480
    │      │      image0.jpg
    │      └─orig
    │              image0.jpg
    │              image1.jpg

The files and directories are described below:

File or directory name | Description | Type | Requirement
--- | --- | --- | ---
mimetype | MIME type description file | ASCII file | Required
META-INF/ | Meta-information directory | Directory | Required
container.xml | Container description file | XML file | Required
book.opf | Book metadata description file (book title, author information, and so on) | XML file | Required
book.ncx | Describes the table-of-contents structure of the book | XML file | Required
cover.xml | Cover content file | XML file | Required
right.xml | Rights description file | XML file | Optional
ext/ | MEB file extension directory | Directory | Optional
coverWH.jpg, coverWH.png | Cover image | JPG/PNG file | Optional
OEBPS/ | MEB content directory | Directory | Required
chapterXX/ | Chapter directory | Directory | Required
chapterXX.html | Chapter content page | XHTML file | /
images/ | Image directory | Directory | Optional
orig | Directory for original images | Directory | Optional
128 | Directory for images at a specific resolution. The current resolutions are 128, 176, 240, 320, 360, 480, and 600 (600 has no directory of its own, because images at width 600 only need to exist in orig) | Directory | Optional
image0.jpg | Image file. orig stores the original; if the chapter references an image, it must exist in that directory. A directory from 128 to 480 contains the file only if the original width is greater than the directory name. For example, for an original 300 wide, orig, 128, 176, and 240 contain the corresponding compressed image files. | JPG/PNG | Optional
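The rule for the resolution directories can be written as a short function. This is a sketch of the rule as the table describes it, not code from the packaging tool: the class and method names are hypothetical, and the list of widths is the one given above, without 600, which has no directory of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class illustrating the image directory rule.
class OcfImageDirsSketch {

    // Resolution directories named in the table (600 is stored in orig only).
    static final int[] WIDTHS = {128, 176, 240, 320, 360, 480};

    /** Lists the directories that hold a copy of an image with the given original width. */
    static List<String> directoriesFor(int originalWidth) {
        List<String> dirs = new ArrayList<>();
        dirs.add("orig"); // the original is always kept
        for (int width : WIDTHS) {
            // A scaled copy exists only when the original is wider than the directory name.
            if (originalWidth > width) {
                dirs.add(String.valueOf(width));
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(directoriesFor(300)); // prints [orig, 128, 176, 240]
    }
}
```

This matches the chapter02 listing in the directory tree, where image1.jpg appears only in 128, 176, 240, and orig.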

IV. Java example: parsing an EPUB e-book

The article closes with a sample program that parses an EPUB e-book. The program reads an .epub file, modifies it, and writes the result to a new file.

The related materials and source code can be downloaded from: http://pan.baidu.com/s/1bnm8YXT

The download includes:

1. The Java project test_epub, which contains the jar packages and the e-book mybook.epub

2. The EPUB-related jar packages (Epublib-codr-lastest.jar and Slf4j-*.jar)

3. The e-book mybook.epub

The code for parsing an .epub e-book in Java is shown below. It is a simple HelloWorld-style program with the corresponding jar packages added.

Program description:

1. Read the epub/mybook.epub file.

2. Modify the title in the metadata.

3. Export the new .epub file to the project directory, under the file name mynewbook.epub.

After extracting mynewbook.epub, you can see that the content of <docTitle> in toc.ncx and of the <dc:title> tag in content.opf has been modified.

package com.hk;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

import nl.siegmann.epublib.domain.Book;
import nl.siegmann.epublib.epub.EpubReader;
import nl.siegmann.epublib.epub.EpubWriter;

/**
 * Sample program that reads and writes an EPUB file.
 *
 * @author hk
 * @version November 26, 2015
 */
public class TestEpub {

    public static void main(String[] args) {
        System.out.println("Hello World");
        // Get the e-book path.
        String epubPath = getEbookPath();
        // Read the EPUB file.
        Book book = readBook(epubPath);
        if (book == null) {
            // Reading failed; the stack trace has already been printed.
            return;
        }
        // Modify the e-book.
        modifyBook(book);
        String outputFileName = "mynewbook.epub";
        writeBook(book, outputFileName);
    }

    /**
     * Get the e-book path.
     *
     * @return the file system path of epub/mybook.epub
     */
    private static String getEbookPath() {
        // The classpath root is, for example, e:/study/epub/test_epub/bin/,
        // where e:/study/epub/test_epub is the project path.
        URL resource = TestEpub.class.getResource("/epub/mybook.epub");
        if (resource == null) {
            throw new IllegalStateException("epub/mybook.epub not found on the classpath");
        }
        try {
            // Convert the file: URL to a path instead of cutting off the "file:/" prefix by hand,
            // which only works on Windows.
            String epubPath = Paths.get(resource.toURI()).toString();
            System.out.println(epubPath);
            return epubPath;
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Read the EPUB file.
     *
     * @param epubPath path of the .epub file
     * @return the parsed book, or null if it could not be read
     */
    private static Book readBook(String epubPath) {
        EpubReader epubReader = new EpubReader();
        Book book = null;
        // try-with-resources closes the stream even if reading fails.
        try (InputStream inputStr = new FileInputStream(epubPath)) {
            book = epubReader.readEpub(inputStr);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return book;
    }

    /**
     * Modify the e-book. This changes the content of docTitle in toc.ncx
     * and of the dc:title tag in content.opf.
     *
     * @param book the book to modify
     */
    private static void modifyBook(Book book) {
        // Set the title inside the EPUB file.
        List<String> titlesList = new ArrayList<String>();
        titlesList.add("Test Book");
        book.getMetadata().setTitles(titlesList);
    }

    /**
     * Write the e-book to a file.
     *
     * @param book the book to write
     * @param fileName name of the output file
     */
    private static void writeBook(Book book, String fileName) {
        // Write the EPUB.
        EpubWriter epubWriter = new EpubWriter();
        try (OutputStream output = new FileOutputStream(fileName)) {
            epubWriter.write(book, output);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
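Instead of unzipping the output by hand, the new title can also be checked programmatically. The sketch below uses only JDK classes (no epublib): it scans the archive for the first entry whose name ends in .opf and returns its first dc:title. The class name, method names, and the tiny in-memory fixture are assumptions made for this illustration. For a real run, pass a FileInputStream for mynewbook.epub to readTitle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper class, independent of epublib.
class EpubTitleCheckSketch {

    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns the first dc:title of the first *.opf entry, or null if none is found. */
    static String readTitle(InputStream epub) {
        try (ZipInputStream zip = new ZipInputStream(epub)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (!entry.getName().endsWith(".opf")) {
                    continue;
                }
                // Copy the entry first so the XML parser cannot close the ZIP stream.
                ByteArrayOutputStream opf = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                for (int n = zip.read(buffer); n != -1; n = zip.read(buffer)) {
                    opf.write(buffer, 0, n);
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(opf.toByteArray()));
                NodeList titles = doc.getElementsByTagNameNS(DC_NS, "title");
                return titles.getLength() == 0 ? null : titles.item(0).getTextContent();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the OPF file", e);
        }
    }

    /** Fixture for the demo: a ZIP holding only OEBPS/content.opf (the title is not XML-escaped). */
    static byte[] sampleArchive(String title) {
        String opf =
            "<?xml version=\"1.0\"?>\n"
            + "<package xmlns=\"http://www.idpf.org/2007/opf\" version=\"2.0\">\n"
            + " <metadata xmlns:dc=\"" + DC_NS + "\">\n"
            + "  <dc:title>" + title + "</dc:title>\n"
            + " </metadata>\n"
            + "</package>\n";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("OEBPS/content.opf"));
            zip.write(opf.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] archive = sampleArchive("Test Book");
        System.out.println(readTitle(new ByteArrayInputStream(archive))); // prints Test Book
    }
}
```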

This article draws on the technical blog "Work Hard Work Smart":

http://www.cnblogs.com/linlf03/archive/2011/12/13/2286218.html

Blog post on the program implementation: http://www.cnblogs.com/xckk/p/4598196.html

Produced by Scholar Kun Kun
