EPUB e-book format conversion (translation of the calibre e-book conversion manual) - calibre


Address: http://calibre-ebook.com/user_manual/conversion.html#convert-microsoft-word-documents

 

 

The calibre conversion system is designed to be easy to use. Normally, you just add a book to calibre and click convert, and calibre tries hard to generate output that is as close as possible to the input. However, calibre accepts a very large number of input formats, and not all of them are equally suitable for conversion to e-books. For such input formats, or if you simply want greater control over the conversion system, calibre has many options to fine-tune the conversion process. Note, however, that calibre's conversion system is not a substitute for a full-blown e-book editor. To edit e-books, we recommend first converting them to EPUB with calibre and then using a dedicated EPUB editor, such as Sigil, to get the book into perfect shape. You can then use the edited EPUB as the input for conversion into other formats in calibre.

 

This document refers mainly to the conversion settings as found in the conversion dialog. All of these settings are also available through the command-line interface to conversion (documented under ebook-convert). In calibre, you can get help on any individual setting by holding your mouse over it; a tooltip describing the setting will appear.


 

Contents

 

Introduction

Look and feel

Page settings

Structure detection

Table of contents

How options are set/saved for conversion

Format-specific conversion tips

 

 

 

Introduction

The first thing to understand about the conversion system is that it is designed as a pipeline: input format, then input plug-in, then XHTML transforms, then output plug-in, then output format.

The input format is first converted to XHTML by the appropriate input plug-in. This XHTML is then transformed. In the last step, the processed XHTML is converted to the specified output format by the appropriate output plug-in. The results of conversion can vary greatly depending on the input format; some formats convert much better than others. The best source formats for conversion are, roughly in order of decreasing preference: LIT, MOBI, EPUB, HTML, PRC, RTF, PDB, TXT and PDF.

 

The actual transformations of the XHTML happen in between. There are various transforms, for example, to insert the book metadata as a page at the start of the book, to detect chapter headings and automatically create a table of contents, and to proportionally adjust font sizes. It is important to remember that all the transforms act on the XHTML output by the input plug-in, not on the input file itself. So, for example, if you ask calibre to convert an RTF file to EPUB, it first converts the file to XHTML internally, then applies the various transforms to that XHTML, and finally the output plug-in creates the EPUB file, automatically generating all the metadata, the table of contents, and so on.
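To make the ordering concrete, here is a minimal Python sketch of the pipeline idea, assuming a toy model in which every stage is a plain function on strings. The plug-in and transform names (`txt_input`, `add_metadata_page`) are hypothetical stand-ins invented for illustration, not calibre APIs; the point is only that every transform receives the XHTML produced by the input plug-in, never the original file.

```python
from typing import Callable, Sequence

# Toy model of the conversion pipeline: input plug-in -> transforms -> output plug-in.
# All names here are hypothetical; they are not calibre's real plug-in interfaces.


def convert(source: str,
            input_plugin: Callable[[str], str],
            transforms: Sequence[Callable[[str], str]],
            output_plugin: Callable[[str], str]) -> str:
    xhtml = input_plugin(source)      # input format -> intermediate XHTML
    for transform in transforms:      # each transform sees XHTML, never the source file
        xhtml = transform(xhtml)
    return output_plugin(xhtml)       # processed XHTML -> output format


def txt_input(text: str) -> str:
    """Hypothetical input plug-in: blank-line separated text becomes <p> elements."""
    paragraphs = "".join(f"<p>{p}</p>" for p in text.split("\n\n"))
    return f"<body>{paragraphs}</body>"


def add_metadata_page(xhtml: str) -> str:
    """Hypothetical transform: put a metadata 'page' at the start of the book."""
    return xhtml.replace("<body>", '<body><div class="jacket">Title: Demo</div>', 1)


# The identity lambda stands in for an output plug-in.
result = convert("Hello\n\nWorld", txt_input, [add_metadata_page], lambda x: x)
print(result)
# <body><div class="jacket">Title: Demo</div><p>Hello</p><p>World</p></body>
```

Swapping the input plug-in changes only the first step; the transforms and output plug-in stay the same, which is why the same options work regardless of the source format.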

 

You can see this process in action by using the debug option: just specify the path to a folder for the debug output. During conversion, calibre places the XHTML generated by the various stages of the conversion pipeline in different sub-folders. The four sub-folders are:

 

Stages of the conversion pipeline

Sub-folder | Description
input | This contains the HTML output by the input plug-in. Use this to debug the input plug-in.
parsed | The result of pre-processing the output of the input plug-in and converting it to XHTML. Use this to debug structure detection.
structure | The XHTML after structure detection, but before CSS flattening and font size conversion. Use this to debug font size conversion and CSS transforms.
processed | The XHTML just before the e-book is handed to the output plug-in. Use this to debug the output plug-in.

 

 

If you want to edit the input document a little before having calibre convert it, the best thing to do is to edit the files in the input sub-folder, zip them up, and use the ZIP file as the input format for subsequent conversions. To do this, use the Edit metadata dialog to add the ZIP file as a format for the book and then, in the top left corner of the conversion dialog, select ZIP as the input format.
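Zipping the edited folder can be scripted. The following is a minimal Python sketch using only the standard library; the function name `zip_input_folder` and the example paths are hypothetical, and it assumes you want paths stored relative to the `input` sub-folder so that its top-level files sit at the root of the archive.

```python
import os
import zipfile


def zip_input_folder(input_dir: str, zip_path: str) -> list[str]:
    """Zip an edited 'input' debug sub-folder for use as a ZIP input format.

    Hypothetical helper: paths are stored relative to input_dir with '/'
    separators, as the ZIP format expects. Keep zip_path outside input_dir,
    otherwise the archive being written would be walked as well.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(input_dir):
            for filename in sorted(filenames):
                full_path = os.path.join(dirpath, filename)
                arcname = os.path.relpath(full_path, input_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
                names.append(arcname)
    return names


# Example (hypothetical paths):
# zip_input_folder("debug/input", "edited_book.zip")
```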

This document deals mainly with the various transforms that operate on the intermediate XHTML and how to control them. At the end are some tips specific to each input/output format.

 

 

Look and feel

 

Contents

Font size scaling

Paragraph spacing

Extra CSS

Miscellaneous

 

This group of options controls various aspects of the look and feel of the converted e-book.

 

Font size scaling

One of the nicest features of the e-reading experience is the ability to easily adjust font sizes to suit individual needs and lighting conditions. calibre has sophisticated algorithms to ensure that all the books it outputs have a consistent font size, no matter what font sizes are specified in the input document.

 

 

 

The base font size of a document is the most common font size in that document, that is, the size of the bulk of its text. When you specify a base font size, calibre automatically rescales all font sizes in the document proportionately, so that the most common font size becomes the specified base font size and the other font sizes are rescaled accordingly. By choosing a larger base font size, you can make the fonts in the document larger, and vice versa. When you set the base font size, you should also set the font size key for best results.
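The definition above (the base size is the most common size, and every size is rescaled by the same ratio) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not calibre's implementation: text is modeled as hypothetical `(font_size, text)` runs, "most common" is weighted by character count, and the font size key is ignored entirely.

```python
from collections import Counter


def base_font_size(runs):
    """Most common font size, weighting each (size, text) run by its length."""
    weights = Counter()
    for size, text in runs:
        weights[size] += len(text)
    return weights.most_common(1)[0][0]


def rescale(runs, target_base):
    """Scale every run by the same ratio so the base size becomes target_base."""
    factor = target_base / base_font_size(runs)
    return [(size * factor, text) for size, text in runs]


# Hypothetical document: 8pt body text, a 12pt heading and a 6pt footnote.
runs = [(8, "Body text. " * 20), (12, "Chapter 1"), (6, "1. A footnote")]
print(base_font_size(runs))                     # 8
print([size for size, _ in rescale(runs, 12)])  # [12.0, 18.0, 9.0]
```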

 

Normally, calibre automatically chooses a base font size appropriate to the output profile you have selected (see Page settings). However, you can override it here if the default is not suitable for you.

 

 

 

The Font size key option lets you control how non-base font sizes are rescaled. The font rescaling algorithm works with a font size key, which is simply a comma-separated list of font sizes. The font size key tells calibre how many "steps" larger or smaller a given font size should be compared to the base font size. The idea is that a document should use only a limited number of font sizes: for example, one size for the body text, a few sizes for the different levels of headings, and a few sizes for superscripts/subscripts and footnotes. The font size key lets calibre sort the font sizes in the input document into separate "bins" corresponding to these different logical font sizes.

 

Let's illustrate with an example. Suppose the source document we are converting was produced by someone with excellent eyesight and has a base font size of 8pt. This means that most of the text in the document is set at 8pt, while the headings are a little larger (say 10 and 12pt) and the footnotes a little smaller, at 6pt. Now, if we use the following settings:

 

 

Base font size: 12pt

Font size key: 7, 8, 10, 12, 14, 16, 18, 20

 

The output document will have a base font size of 12pt, headings of 14 and 16pt, and footnotes of 8pt. Now suppose we want to make the largest heading stand out more and make the footnotes a little larger as well. To achieve this, the font size key should be changed to:

 

New font size key: 7, 9, 12, 14, 18, 20, 22

 

 

 

The largest heading will now become 18pt, while the footnotes will become 9pt. You can experiment with these settings to find what works best for you by using the font rescaling wizard, which you can open by clicking the small button next to the font size key setting.
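One simple way to picture the "bins" is proportional scaling followed by snapping to the nearest entry in the key. The sketch below does exactly that; it is an assumed, simplified model and not calibre's actual rescaling algorithm, so it will not necessarily reproduce calibre's output for every key. With the new key it happens to give the 18pt heading and 9pt footnote described above.

```python
def snap_to_key(size, font_key):
    """Return the entry of font_key closest to size (ties go to the smaller entry)."""
    return min(font_key, key=lambda entry: (abs(entry - size), entry))


def map_font_size(size, input_base, output_base, font_key):
    """Scale proportionally to the new base size, then snap to the font size key."""
    return snap_to_key(size * output_base / input_base, font_key)


NEW_KEY = [7, 9, 12, 14, 18, 20, 22]
# 6pt footnote, 8pt body text, 10pt and 12pt headings from the example above.
print([map_font_size(s, 8, 12, NEW_KEY) for s in (6, 8, 10, 12)])  # [9, 12, 14, 18]
```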

 

You can also disable all font size rescaling in the conversion, if you would like to preserve the font sizes in the input document.

 

A related setting is Line height, which controls the vertical height of lines. By default (a line height of 0), no manipulation of line heights is performed. If you specify a non-default value, the line height will be set in all locations that do not specify their own line height. However, this is a blunt instrument and should be used sparingly. If you want to adjust the line heights of some portion of the input, it is better to use Extra CSS.

 

 

Paragraph spacing

Normally, paragraphs in XHTML are rendered with a blank line between them and no leading text indent. calibre has a couple of options to control this. The Remove spacing between paragraphs option forcibly ensures that no paragraph has any inter-paragraph spacing. It also sets the text indent to 1.5em (this can be changed) to mark the start of each paragraph. The Insert blank line option does the opposite, guaranteeing that there is a blank line between every pair of paragraphs. Both options are very comprehensive, removing the spacing from, or inserting it into, all paragraphs (technically, <p> and <div> tags). This is so that you can just set the option and be sure that it will work as advertised, no matter how messy the input file is. The one exception is when the input file uses hard line breaks to implement inter-paragraph spacing.

 

If you want to remove the spacing between all paragraphs except a select few, do not use these options. Instead, add the following CSS code to Extra CSS:

 

 

 

p, div { margin: 0pt; border: 0pt; text-indent: 1.5em }
.spacious { margin-bottom: 1em; text-indent: 0pt; }

Then, in your source document, mark the paragraphs that need spacing with class="spacious". If your input document is not in HTML, use the debug option described in the Introduction to get the HTML (use the input sub-folder).

 

Extra CSS

This option allows you to specify arbitrary CSS that will be applied to all HTML files in the input. This CSS is applied with very high priority, so it should override most of the CSS present in the input document itself. You can use this setting to fine-tune the presentation/layout of your document. For example, if you want all paragraphs of class endnote to be right-aligned, just add:

.endnote { text-align: right }

 

Or if you want to change the indentation of all paragraphs:

p { text-indent: 5mm; }

 

Extra CSS is a very powerful option, but you do need an understanding of how CSS works to use it to its full potential.

 

 

 

Miscellaneous

There are several options in this section.

 

No text justification

Normally, if the output format supports it, calibre forces the output e-book to have justified text (that is, a smooth right margin). This option turns off that behavior, in which case whatever justification is specified in the input document is used instead.

 

Linearize tables

Some badly designed documents use tables to control the layout of text on the page. When converted, these documents often have text that runs off the page, along with other artifacts. This option extracts the content from the tables and presents it in a linear fashion. Note that this option linearizes all tables, so use it only if you are sure that the input document does not use tables for legitimate purposes, such as presenting tabular information.
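A minimal sketch of what linearizing means, written with Python's `xml.etree.ElementTree`: each table is replaced by one paragraph per cell, in document order. The function name is hypothetical, the sketch assumes well-formed, un-namespaced markup without nested tables, and calibre's own implementation may differ.

```python
import xml.etree.ElementTree as ET


def linearize_tables(xhtml: str) -> str:
    """Replace each <table> with one <p> per cell, keeping document order.

    Sketch only: assumes well-formed markup without namespaces or nested tables.
    """
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for table in list(root.iter("table")):
        parent = parents[table]
        position = list(parent).index(table)
        paragraphs = []
        for cell in (el for el in table.iter() if el.tag in ("td", "th")):
            p = ET.Element("p")
            p.text = "".join(cell.itertext()).strip()
            paragraphs.append(p)
        if paragraphs:
            paragraphs[-1].tail = table.tail  # keep any text that followed the table
        parent.remove(table)
        for offset, p in enumerate(paragraphs):
            parent.insert(position + offset, p)
    return ET.tostring(root, encoding="unicode")


print(linearize_tables(
    "<body><table><tr><td>Name</td><td>Value</td></tr></table><p>After</p></body>"))
# <body><p>Name</p><p>Value</p><p>After</p></body>
```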

 

Transliterate Unicode characters

 

 

Transliterates Unicode characters to an ASCII representation. Use this with care, because it replaces Unicode characters with ASCII ones. For instance, it will replace "Михаил Горбачёв" with "Mikhail Gorbachiov". Also note that in cases where there are multiple representations of a character (characters shared by Chinese and Japanese, for instance), the representation used by the largest number of people is used (Chinese, in this example). This option is mainly useful if you are going to view the e-book on a device that does not support Unicode.
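As an illustration of what transliteration does, here is a small Python sketch. It is a deliberately limited toy: it uses a tiny hypothetical Cyrillic table (just enough for the example above) plus Unicode NFKD decomposition to strip accents from Latin letters, and it replaces anything else with `?`. Real transliteration tools use much larger tables and handle cases such as the Chinese/Japanese ambiguity mentioned above.

```python
import unicodedata

# Hypothetical, deliberately tiny table: only the Cyrillic letters needed for the
# example in the text. Real transliteration tables cover whole scripts.
CYRILLIC = {
    "М": "M", "и": "i", "х": "kh", "а": "a", "л": "l",
    "Г": "G", "о": "o", "р": "r", "б": "b", "ч": "ch", "ё": "io", "в": "v",
}


def transliterate(text: str) -> str:
    out = []
    for ch in text:
        if ch in CYRILLIC:
            out.append(CYRILLIC[ch])
            continue
        # Decompose accented Latin letters (e.g. "é" -> "e" + combining accent)
        # and keep only the ASCII part; unknown characters become "?".
        ascii_part = unicodedata.normalize("NFKD", ch).encode("ascii", "ignore").decode("ascii")
        out.append(ascii_part or "?")
    return "".join(out)


print(transliterate("Михаил Горбачёв"))  # Mikhail Gorbachiov
print(transliterate("Crème brûlée"))     # Creme brulee
```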

 

Input character encoding

Older documents sometimes do not specify their character encoding. When converted, this can result in non-English characters, or special characters such as smart quotes, being corrupted. calibre tries to auto-detect the character encoding of the source document, but it does not always succeed. You can force it to assume a particular character encoding with this setting. cp1252 is a common encoding for documents produced by Windows software. You should also read "How do I convert my file containing non-English characters, or smart quotes?" (http://calibre-ebook.com/user_manual/faq.html#char-encoding-faq) for more on encoding issues.
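The effect of the encoding setting is easy to see in Python: the same bytes give different text depending on which encoding is assumed. The sketch below tries UTF-8 first and falls back to cp1252; that fallback strategy and the function name `decode_text` are assumptions for illustration, not calibre's detection logic.

```python
def decode_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Try UTF-8 first; if the bytes are not valid UTF-8, use the fallback encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


raw = b"\x93smart quotes\x94"       # curly quotes as written by Windows software (cp1252)
print(repr(raw.decode("latin-1")))  # wrong guess: invisible C1 control characters
print(decode_text(raw))             # “smart quotes”
```

Note that plain ASCII files decode identically under both encodings, so the choice only matters once non-ASCII bytes appear.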

 

 

 

Page settings

The Page settings options control screen layout, such as margins and screen sizes. There are options to set up page margins, which are used by the output plug-in if the selected output format supports page margins. In addition, you should choose an input profile and an output profile. Both sets of profiles basically deal with how to interpret measurements in the input/output documents, screen sizes, and default font rescaling keys.

 

If you know that the file you are converting was intended for a particular device/software platform, choose the corresponding input profile; otherwise, just choose the default input profile. If you know that the file you are producing is meant for a particular device type, choose the corresponding output profile. In particular, for MOBI output files you should choose the Kindle profile, for LIT the Microsoft Reader profile, and for EPUB the Sony Reader profile. In the case of EPUB, the Sony Reader profile results in EPUB files that work everywhere. However, it has some side effects, such as inserting artificial section breaks to keep internal components below the size threshold required by Sony devices. For iPhone/Android phones in particular, choose the Sony output profile. If you know that your EPUB files will not be read on a Sony or similar device, use the default output profile. If you are producing MOBI files that are not intended for the Kindle, choose the Mobipocket Books output profile.

 

The output profile also controls the screen size. In some output formats, this causes images, for example, to be automatically resized to fit the screen. So choose the profile of a device whose screen size is similar to that of your device.
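Fitting an image to the screen while keeping its proportions can be sketched as follows. The screen dimensions passed in are hypothetical stand-ins for whatever the output profile specifies, and the choice never to enlarge small images is an assumption of this sketch, not documented calibre behavior.

```python
def fit_to_screen(width, height, screen_width, screen_height):
    """Scale (width, height) to fit the screen, keeping the aspect ratio; never enlarge."""
    scale = min(screen_width / width, screen_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical 600x800 screen.
print(fit_to_screen(2000, 1000, 600, 800))  # (600, 300)
```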

 

 

 

Structure detection

When the input document does not explicitly mark its structure, calibre's structure detection does its best to find structural elements such as chapters, page breaks, headers and footers. As you can imagine, this process varies widely between books. Fortunately, calibre has very powerful options to control it. With this power comes complexity, but once you take the time to learn it, you will find it well worth the effort.

 

Chapters and page breaks

calibre has two sets of options, one for chapter detection and one for inserting page breaks. This can sometimes be a little confusing, because by default calibre inserts page breaks both before detected chapters and at the locations detected by the page-break options. The reason is that there are often places where page breaks should be inserted that are not chapter boundaries. In addition, detected chapters can optionally be inserted into the automatically generated table of contents.

 

calibre uses XPath, a powerful language, to let you specify chapter boundaries/page breaks. XPath can seem a little daunting at first; fortunately, the User Manual includes an XPath tutorial. Remember that structure detection operates on the intermediate XHTML produced by the conversion pipeline. Use the debug option described in the Introduction to work out the appropriate settings for your book. There is also an XPath wizard button to help generate simple XPath expressions.

 

By default, calibre uses the following expression for chapter detection:

 

 

//*[((name()='h1' or name()='h2') and re:test(., 'chapter|book|section|part\s+', 'i')) or @class = 'chapter']

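This expression is quite complex because it tries to handle several cases at once. It means that calibre assumes chapters start either with <h1> or <h2> tags that contain one of the words chapter, book, section or part (matched case-insensitively), or with any tag that has class="chapter".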
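To see what the default expression selects, the Python sketch below emulates its logic with the standard library (`re` plus `xml.etree.ElementTree`) rather than a real XPath engine: an element is a chapter start if it is an `h1`/`h2` whose text matches the same case-insensitive regular expression, or if its `class` attribute is exactly `chapter`. It assumes un-namespaced markup and is an approximation for experimenting, not how calibre evaluates the expression.

```python
import re
import xml.etree.ElementTree as ET

# Same regular expression as the default chapter XPath, with the 'i' (ignore case) flag.
CHAPTER_RE = re.compile(r"chapter|book|section|part\s+", re.IGNORECASE)


def detect_chapters(xhtml: str) -> list[str]:
    """Emulate the default chapter expression on un-namespaced markup."""
    root = ET.fromstring(xhtml)
    found = []
    for el in root.iter():
        text = "".join(el.itertext())  # string value of the element, like XPath's "."
        is_heading = el.tag in ("h1", "h2") and CHAPTER_RE.search(text) is not None
        if is_heading or el.get("class") == "chapter":
            found.append(text.strip())
    return found


doc = ('<body><h1>Chapter One</h1><p>a chapter mention</p>'
       '<h2>Notes</h2><div class="chapter">Prologue</div></body>')
print(detect_chapters(doc))  # ['Chapter One', 'Prologue']
```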

 

A related option is Chapter mark, which lets you control what calibre does when it detects a chapter. By default, it inserts a page break before the chapter. You can have it insert a ruled line instead of, or in addition to, the page break. You can also have it do nothing.

 

The default expression for detecting page breaks is:

 

//*[name()='h1' or name()='h2']

This means that, by default, calibre inserts a page break before every <h1> and <h2> tag.
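Inserting a page break before matched elements amounts to giving them a CSS page-break rule. The sketch below assumes the break is expressed as the CSS 2.1 property `page-break-before: always` in an inline `style` attribute; that is one plausible mechanism chosen for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def insert_page_breaks(xhtml: str, tags=("h1", "h2")) -> str:
    """Add a CSS page break before every element whose tag is in `tags`."""
    root = ET.fromstring(xhtml)
    for el in root.iter():
        if el.tag in tags:
            style = el.get("style", "")
            rule = "page-break-before: always"
            el.set("style", f"{style}; {rule}" if style else rule)
    return ET.tostring(root, encoding="unicode")


print(insert_page_breaks("<body><h1>A</h1><p>x</p><h2>B</h2></body>"))
```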

 

Note: the default expressions may change depending on the input format you are converting.

 

Remove header and footer

These options are useful primarily for converting PDF documents. Often, the conversion leaves page headers and footers behind in the text. These options use regular expressions to try to detect and remove headers and footers. Remember that they operate on the intermediate XHTML produced by the conversion pipeline. There is also a wizard to help you customize the regular expressions for your document.

 

The header and footer regular expressions are used together with the Remove header and Remove footer options. If a remove option is not enabled, its regular expression is not applied, and the matching text is left alone. The removal uses Python regular expressions, and all matched text is simply removed from the document. You can learn more about regular expressions and their syntax at http://docs.python.org/library/re.html.
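Because the removal is described as plain Python regular-expression deletion, you can test a header or footer pattern with `re` before using it. The pattern below is a hypothetical example for footers consisting of a paragraph that reads "Page <number>"; real PDF-derived markup will usually need a different expression.

```python
import re

# Hypothetical footer pattern: a paragraph consisting only of "Page <number>".
FOOTER_RE = re.compile(r"<p>\s*Page\s+\d+\s*</p>\s*", re.IGNORECASE)


def remove_matches(xhtml: str, pattern: re.Pattern) -> str:
    """Delete every match of the pattern from the markup."""
    return pattern.sub("", xhtml)


page = "<p>It was a dark</p><p>Page 12</p><p>and stormy night.</p>"
print(remove_matches(page, FOOTER_RE))
# <p>It was a dark</p><p>and stormy night.</p>
```

The sentence is still split across two paragraphs after the footer is gone; rejoining it is the job of the unwrapping options described below.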

 

 

 

Miscellaneous

There are several options in this section.

 

Insert metadata as page at start of book

One of the great things about calibre is that it lets you maintain very complete metadata for all of your books, such as ratings, tags, and comments. This option creates a single page containing all of this metadata and inserts it into the converted e-book, typically just after the cover. Think of it as a way to create your own customized book jacket.

 

Remove first image

Sometimes, the source document you are converting includes the cover as part of the book, rather than as a separate cover. If you also specify a cover in calibre, the converted book will have two covers. This option simply removes the first image from the source document, ensuring that the converted book has only one cover: the one specified in calibre.

 

Preprocess Input

This option activates various algorithms that try to detect and correct common cases of badly formatted input documents, such as hard line breaks and large blocks of text with no formatting. Turn this option on if your input document suffers from bad formatting. Be aware, however, that in some cases this option can lead to worse results, so use it with care.

 

Line-Unwrap factor

This option controls the algorithm calibre uses to remove hard line breaks. Its value is a fraction of the document's line lengths: for example, a value of 0.4 makes calibre use the length 40% of the way through the range of line lengths in the document as the threshold when deciding which hard line breaks to remove.
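The following Python sketch shows one way a factor like this can drive unwrapping. It assumes that the factor picks a threshold from the sorted list of distinct line lengths, that lines at least that long were wrapped and are joined to the next line, and that shorter lines end a paragraph. This is a simplified model for intuition, not calibre's exact algorithm; the function name is hypothetical.

```python
def unwrap_lines(lines, factor=0.5):
    """Join hard-wrapped lines into paragraphs (simplified sketch)."""
    stripped = [line.strip() for line in lines]
    lengths = sorted({len(s) for s in stripped if s})  # distinct line lengths
    if not lengths:
        return []
    # Threshold: the length found `factor` of the way through the sorted list.
    threshold = lengths[min(int(len(lengths) * factor), len(lengths) - 1)]
    paragraphs, current = [], []
    for s in stripped:
        if not s:  # blank line: always a paragraph boundary
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(s)
        if len(s) < threshold:  # short line: assume it ends the paragraph
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


text = ["The quick brown fox", "jumps over the lazy", "dog.",
        "A second paragraph", "ends here."]
print(unwrap_lines(text, 0.5))
# ['The quick brown fox jumps over the lazy dog.', 'A second paragraph ends here.']
```

Raising the factor raises the threshold, so fewer lines are joined: with 0.9, "A second paragraph" falls below the threshold and is left as a paragraph of its own.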

 

 

 

Table of contents

When the input document already has a table of contents in its metadata, calibre simply uses it. However, a number of older formats do not support a metadata-based table of contents, and some individual documents do not have one. In these cases, the options in this section can help you automatically generate a table of contents in the converted e-book, based on the actual content of the input document.

 

The first option is Force use of auto-generated Table of Contents. By checking this option, you can have calibre replace any table of contents found in the metadata of the input document with the automatically generated one.

 

By default, calibre first tries to add all detected chapters to the generated table of contents. You can learn how to customize chapter detection in the Structure detection section above. If you do not want detected chapters included in the generated table of contents, check the Do not add detected chapters option.

 

If fewer chapters than the Chapter threshold are detected, calibre then adds any hyperlinks it finds in the input document to the table of contents. This often works well for input documents that include a hyperlinked table of contents right at the start. The Number of links option controls this behavior: if it is set to zero, no links are added; if it is set to a number greater than zero, at most that many links are added.

 

calibre automatically filters duplicates out of the generated table of contents. However, if there are additional unwanted entries, you can filter them out with the TOC filter option. This is a regular expression that is matched against the titles of entries in the generated table of contents; whenever a match is found, the entry is removed. For example, to remove all entries titled "Next" or "Previous", use:

Next|Previous
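The filter is an ordinary regular expression, so its effect is easy to preview. The Python sketch below drops duplicate titles and any title the expression matches, using `re.search`; whether calibre searches or anchors the match, and how it decides what counts as a duplicate, are assumptions here (this sketch compares titles only). Anchor the pattern with `^` and `$` if you only want exact titles removed.

```python
import re


def filter_toc(titles, toc_filter=None):
    """Drop duplicate titles and titles matching the (optional) filter regex."""
    pattern = re.compile(toc_filter) if toc_filter else None
    seen = set()
    kept = []
    for title in titles:
        if title in seen:
            continue  # duplicate entry
        seen.add(title)
        if pattern is not None and pattern.search(title):
            continue  # matched by the TOC filter
        kept.append(title)
    return kept


print(filter_toc(["Chapter 1", "Next", "Chapter 2", "Previous", "Chapter 1"], r"Next|Previous"))
# ['Chapter 1', 'Chapter 2']
```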

 

Finally, the Level 1, 2 and 3 TOC options let you create a sophisticated multi-level table of contents. They are XPath expressions that match tags in the intermediate XHTML produced by the conversion pipeline. See the Introduction for how to get access to this XHTML, and read the XPath tutorial to learn how to construct XPath expressions. Next to each option is a button that launches a wizard to help create basic XPath expressions. The following simple example illustrates how to use these options.

 

Suppose you have an input document that results in XHTML that looks like this:

 

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Sample document</title>
</head>
<body>
<h1>Chapter 1</h1>
...
<h2>Section 1.1</h2>
...
<h2>Section 1.2</h2>
...
<h1>Chapter 2</h1>
...
<h2>Section 2.1</h2>
...
</body>
</html>

Then, we set the options as:

 

Level 1 TOC: //h:h1

Level 2 TOC: //h:h2

This will result in an automatically generated two-level table of contents that looks like this:

 

Chapter 1
    Section 1.1
    Section 1.2
Chapter 2
    Section 2.1

Warning: not all output formats support a multi-level table of contents. You should first try with EPUB output. If that works, then try your format of choice.
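The two-level example above can be mimicked with Python's `xml.etree.ElementTree`. The sketch below is a hypothetical stand-in for what the Level 1 and Level 2 TOC expressions select (`h1` and `h2` in the XHTML namespace, which is what the `h:` prefix refers to in those expressions); level-2 entries that appear before any level-1 entry are simply dropped, which is a simplification.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def build_toc(xhtml):
    """Collect h1 entries, each with the h2 entries that follow it as children."""
    root = ET.fromstring(xhtml)
    toc = []
    for el in root.iter():
        if el.tag == XHTML_NS + "h1":
            toc.append(("".join(el.itertext()).strip(), []))
        elif el.tag == XHTML_NS + "h2" and toc:
            toc[-1][1].append("".join(el.itertext()).strip())
    return toc


def render_toc(toc):
    """Render the two-level TOC as indented text."""
    lines = []
    for chapter, sections in toc:
        lines.append(chapter)
        lines.extend("    " + section for section in sections)
    return "\n".join(lines)


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Sample document</title></head>
<body>
<h1>Chapter 1</h1>
<h2>Section 1.1</h2>
<h2>Section 1.2</h2>
<h1>Chapter 2</h1>
<h2>Section 2.1</h2>
</body>
</html>"""

print(render_toc(build_toc(SAMPLE)))
```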

 

 

 

How options are set/saved for conversion

The conversion options can be set in two places. The first is Preferences->Conversion. These settings are the defaults for the conversion options: whenever you convert a new book, the settings set here are used by default.

 

The second is the conversion dialog, where you can change the settings for each individual book. When you convert a book, calibre remembers the settings you used for that book, so if you convert it again, the saved settings for that book take precedence over the defaults set in Preferences. You can restore the individual settings to the defaults with the Restore to defaults button in the book's conversion dialog.

 

When you bulk convert a set of books, settings are taken in the following order, with later sources taking precedence:

From the defaults set in Preferences->Conversion

From the saved conversion settings for each book being converted (if any). This can be turned off with the option in the top left corner of the bulk conversion dialog.

From the settings set in the bulk conversion dialog

Note that the final settings used for each book in a bulk conversion are saved and reused if the book is converted again. Since the settings in the bulk conversion dialog have the highest priority, they override any book-specific settings, so you should only bulk convert books together that need similar settings. The exceptions are metadata and input-format-specific settings. Since the bulk conversion dialog does not have settings for these two categories, they are taken from the book-specific settings (if any) or from the defaults.
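The precedence rules amount to layering dictionaries, with later layers winning. A minimal Python sketch, in which the function name and option names are hypothetical: settings absent from the bulk dialog layer (such as the input-encoding option here) fall through to the book-specific or default values, matching the exception described above.

```python
def effective_settings(defaults, saved_for_book, bulk_dialog, use_saved=True):
    """Layer conversion settings; later layers override earlier ones."""
    merged = dict(defaults)              # 1. Preferences->Conversion defaults
    if use_saved and saved_for_book:
        merged.update(saved_for_book)    # 2. saved per-book settings (can be turned off)
    merged.update(bulk_dialog)           # 3. the bulk conversion dialog wins
    return merged


defaults = {"base_font_size": 12, "linearize_tables": False, "input_encoding": None}
book = {"linearize_tables": True, "input_encoding": "cp1252"}
bulk = {"base_font_size": 14, "linearize_tables": False}
print(effective_settings(defaults, book, bulk))
```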

 

Note: you can see the actual settings used during any conversion by clicking the rotating icon in the lower right corner and then double-clicking the individual conversion job. This brings up a conversion log that contains the actual settings used, near the top.

 

 

 

Format-specific conversion tips

Here you will find tips specific to the conversion of particular formats. Options specific to a particular format, whether input or output, are available in the conversion dialog under their own section, for example TXT Input or EPUB Output.

 

Convert Microsoft Word documents

calibre does not directly convert .doc files from Microsoft Word. However, in Word you can save the document as HTML and then convert the resulting HTML file with calibre. When saving as HTML, be sure to use the "Save as Web Page, Filtered" option, as this produces clean HTML that converts well.

 

There is a Word macro package that automates the conversion of Word documents using calibre. It also makes generating the table of contents much simpler. It is called BookCreator and is available for free at MobileRead.

 

Convert TXT files

TXT files have no way to specify formatting such as bold and italics, or document structure such as paragraphs, headings and sections. Since TXT files cannot explicitly mark parts of the text, by default calibre only groups lines in the input document into paragraphs. The default is to assume that one or more blank lines mark a paragraph boundary:

 

This is the first.

This is the
second paragraph.

TXT input supports a number of options to differentiate how paragraphs are detected.

 

Treat each line as a paragraph

Assumes that every line is a paragraph:

 

This is the first.
This is the second.
This is the third.

Assume print formatting

Assumes that every paragraph starts with an indent (either a tab or 2+ spaces). A paragraph ends when the next line that starts with an indent is reached:

 

    This is the
first.
    This is the second.

    This is the
third.
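The three paragraph-detection modes described above can be sketched in Python as follows. The mode names `block`, `single` and `print` are labels chosen for this sketch, not necessarily calibre's option values, and the sketch simply ignores blank lines in the `single` and `print` modes.

```python
import re

INDENT = re.compile(r"\t| {2,}")  # a tab, or two or more spaces, at the start of a line


def split_paragraphs(text: str, mode: str = "block") -> list[str]:
    """Group the lines of a TXT document into paragraphs.

    block  - one or more blank lines separate paragraphs (the default)
    single - every non-blank line is a paragraph
    print  - every indented line starts a new paragraph
    """
    if mode not in ("block", "single", "print"):
        raise ValueError(f"unknown mode: {mode!r}")
    paragraphs: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        if not line.strip():
            if mode == "block":
                flush()  # a blank line ends the paragraph
            continue     # blank lines are ignored in the other modes
        if mode == "single" or (mode == "print" and INDENT.match(line)):
            flush()      # this line starts a new paragraph
        current.append(line.strip())
    flush()
    return paragraphs


sample = "    This is the\nfirst.\n    This is the second.\n\n    This is the\nthird."
print(split_paragraphs(sample, mode="print"))
```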

 

Process using markdown

calibre also supports running TXT input through a transformation preprocessor known as markdown. Markdown lets you add basic formatting to TXT documents, such as bold, italics, section headings, tables, lists and a table of contents. Marking chapter headings with a leading # and setting the chapter XPath detection expression to "//h:h1" is the easiest way to get a proper table of contents generated from a TXT document. You can learn more about the markdown syntax in its documentation.

 

 

 

Convert PDF files

PDF documents are one of the worst formats to convert from. They use a fixed page size and fixed text placement, which makes it very difficult to determine where one paragraph ends and another begins. calibre tries its best to unwrap paragraphs using a configurable line un-wrapping factor. This is a scale used to determine the length at which a line should be unwrapped. Valid values are decimals between 0 and 1. The default is 0.5, the median line length. Lower this value to include more text in the unwrapping; increase it to include less. You can adjust this value in the conversion settings under PDF Input.

 

Also, PDF documents often have headers and footers that become part of the extracted text. Use the Remove header and footer options to mitigate this problem. If the headers and footers are not removed from the text, they can split paragraphs and throw off the unwrapping.

 

Some limitations of PDF input: complex, multi-column, and image-based documents are not supported. Extraction of vector images and tables from within the document is also not supported.

 

 

 

Comic book collections

A comic book collection is a .cbc file. A .cbc file is a ZIP file that contains other CBZ/CBR files. In addition, the .cbc file must contain a simple text file called comics.txt, encoded in UTF-8. The comics.txt file must contain a list of the comic files inside the .cbc file, in the form filename:title, as shown below:

 

one.cbz:Chapter One
two.cbz:Chapter Two
three.cbz:Chapter Three

The .cbc file would then contain:

 

comics.txt
one.cbz
two.cbz
three.cbz

calibre will automatically convert this .cbc file into an e-book with a table of contents pointing to each entry in comics.txt.
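Building such an archive can be scripted with Python's `zipfile` module. In this sketch the function names are hypothetical and the CBZ contents are empty placeholders; a real collection would contain actual CBZ/CBR archives. It writes comics.txt in UTF-8 using the `filename:title` form shown above and then reads it back.

```python
import io
import zipfile


def build_cbc(entries):
    """entries: list of (filename, title, data) tuples; returns the .cbc bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as cbc:
        listing = "\n".join(f"{name}:{title}" for name, title, _ in entries)
        cbc.writestr("comics.txt", listing.encode("utf-8"))  # comics.txt must be UTF-8
        for name, _, data in entries:
            cbc.writestr(name, data)
    return buffer.getvalue()


def read_comics_txt(cbc_bytes):
    """Parse comics.txt back into (filename, title) pairs, splitting at the first colon."""
    with zipfile.ZipFile(io.BytesIO(cbc_bytes)) as cbc:
        text = cbc.read("comics.txt").decode("utf-8")
    pairs = []
    for line in text.splitlines():
        if line.strip():
            name, _, title = line.partition(":")
            pairs.append((name.strip(), title.strip()))
    return pairs


# Placeholder bytes stand in for real CBZ archives.
cbc_bytes = build_cbc([
    ("one.cbz", "Chapter One", b""),
    ("two.cbz", "Chapter Two", b""),
    ("three.cbz", "Chapter Three", b""),
])
print(read_comics_txt(cbc_bytes))
```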

 

EPUB advanced formatting demo

Various advanced formatting features of EPUB files are demonstrated in this demo file (http://calibre-ebook.com/downloads/demos/demo.epub). The file was created from hand-coded HTML using calibre and is meant to serve as a template for your own EPUB creation efforts.

 

The source HTML it was created from is available as demo.zip (http://calibre-ebook.com/downloads/demos/demo.zip). The settings used to create the EPUB from the ZIP file were:

 

ebook-convert demo.zip .epub -vv --authors "Kovid Goyal" --language en --level1-toc '//*[@class="title"]' --disable-font-rescaling --page-breaks-before / --no-default-epub-cover

Note: because this file explores the potential of EPUB, most of the advanced formatting will not work on readers less capable than calibre's built-in EPUB viewer.

 

 
