Terms
XSL-FO (Extensible Stylesheet Language Formatting Objects)
XSLT (XSL Transformations)
Preface
This article describes FOP and its related technologies, such as XSL-FO and the FO tools, and shows how to use these tools for document conversion. It closes by using wh2fo to convert a Word document to PDF.
FOP Introduction
FOP (Formatting Objects Processor) is the first print formatter driven by XSL formatting objects (XSL-FO) and the first output-independent formatter. It is a Java application that reads a formatting object tree, renders the resulting pages, and writes them to a specified output stream. Currently, the supported output formats include PDF, PCL, PS, SVG, XML (represented as a tree structure), printer, AWT, MIF, and TXT. The most important output format is PDF.
James Tauber is the original author of FOP. He developed the first version of the tool and generously opened its source code; it was later handed over to the Apache XML project. (He also chose an excellent name for the tool: besides being an acronym, Webster's defines "fop" as a person who is excessively concerned with his appearance.) James is now the chief XML designer at Bowstreet. The Apache XML project is developed in an open and cooperative manner to provide commercial-quality XML-based solutions. Because it implements the standards, it can also provide feedback to standards bodies such as the IETF and W3C.
XSL-FO Introduction
XSL Formatting Objects (XSL-FO) is the second part of the Extensible Stylesheet Language (XSL). XSL-FO is an XML application that describes how a page should look when it is presented to a reader. A style sheet written in XSLT (the XSL Transformation language) converts an XML document organized in semantic terms into an XSL-FO document organized in presentational terms. Although some hope that Web browsers will one day display XSL-FO documents directly, for now an additional step is required: the generated FO document must be converted to another format, such as Adobe PDF.
FO Tools
JFOR (Java XSL-FO to RTF converter) converts XML documents that conform to the XSL-FO specification into Rich Text Format (RTF). Its purpose is similar to converting an XSL-FO document (typically generated with XSLT) to PDF using FOP or a similar tool.
wh2fo and FOA were written by Fabio Giannetti. wh2fo is a Java application that processes the HTML generated by Word 2000 and converts it into an XML content file and XSL style sheet files. From these files, a standard XSLT processor can produce an FO file that contains only XSL-FO markup. You can also use a style sheet to convert the XML file to HTML; in this way the extra Word markup is discarded. An XSL-FO renderer such as FOP can then render the FO file into PDF.
How does it work?
XSL Formatting Objects and Their Properties
XSL-FO provides a more sophisticated visual layout model than HTML+CSS. Formatting that XSL-FO supports but HTML+CSS does not includes right-to-left and top-to-bottom text, footnotes, margin notes, page numbers in cross-references, and more. In particular, while CSS (Cascading Style Sheets) is mainly used on the Web, XSL-FO is designed for broader use. For example, you can write one XSL style sheet that uses formatting objects to lay out an entire printed book, and another style sheet that converts the same XML document for use on the Web.
Formatting Objects
There are 56 XSL formatting object elements, all defined in the http://www.w3.org/1999/XSL/Format namespace. Most of the 56 elements represent various kinds of rectangular areas. Most of the remaining elements are containers for areas and spaces.
The XSL formatting model is based on rectangular boxes called areas. An area can contain text, empty space, images, or other formatting objects. Like a box in CSS, an area has borders and padding on each side, but CSS margins are replaced in XSL by space-before and space-after. An XSL formatter reads the formatting objects to determine which areas to place where on the page. Most formatting objects produce a single area most of the time, but because of page breaks, word wrapping, hyphenation, and the other details of fitting a potentially large amount of text into a limited space, some formatting objects occasionally produce more than one area.
The formatting objects differ mainly in what they represent. For example, an fo:list-item-label formatting object is a box that contains a bullet, a number, or another indicator placed in front of a list item. An fo:list-item-body formatting object is a box that contains the text of the list item, without the label. An fo:list-item formatting object is a box that contains both the label (fo:list-item-label) and the body (fo:list-item-body).
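The nesting of these three list formatting objects can be sketched in a few lines of Python. This is only an illustration built with the standard library's xml.etree.ElementTree, not part of the FOP toolchain. The helper names fo and make_list_item and the sample text are made up for the example, and layout properties such as indents are left out.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return "{%s}%s" % (FO_NS, tag)


def make_list_item(label_text, body_text):
    """Build one fo:list-item holding a label box and a body box (hypothetical helper)."""
    item = ET.Element(fo("list-item"))
    # The label box: a bullet, a number, or another indicator.
    label = ET.SubElement(item, fo("list-item-label"))
    ET.SubElement(label, fo("block")).text = label_text
    # The body box: the text of the list item, without the label.
    body = ET.SubElement(item, fo("list-item-body"))
    ET.SubElement(body, fo("block")).text = body_text
    return item


list_block = ET.Element(fo("list-block"))
list_block.append(make_list_item("1.", "Hydrogen"))
list_block.append(make_list_item("2.", "Helium"))

print(ET.tostring(list_block, encoding="unicode"))
```

Each fo:list-item ends up with exactly two children, the label and the body, which mirrors the description above.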
During processing, the FO document is divided into pages. A Web browser window is usually treated as a single very long page. Print formats usually contain many separate pages, and each page contains a number of areas. There are four main types of areas:
1. Regions
2. Block areas
3. Line areas
4. Inline areas
These form a rough hierarchy. Regions can contain block areas. Block areas can contain other block areas, line areas, and content. Line areas can contain inline areas. Inline areas can contain other inline areas and content.
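The containment rules just listed can be written down as a small lookup table. The sketch below is a simplified model for illustration only; the names ALLOWED_CHILDREN, can_contain, and check_nesting are invented here and do not come from the XSL specification or from FOP.

```python
# Which kinds of area may appear directly inside which, as described above.
# "content" stands for text or other raw content.
ALLOWED_CHILDREN = {
    "region": {"block"},
    "block": {"block", "line", "content"},
    "line": {"inline"},
    "inline": {"inline", "content"},
}


def can_contain(parent, child):
    """Return True if an area of kind `parent` may directly hold `child`."""
    return child in ALLOWED_CHILDREN.get(parent, set())


def check_nesting(path):
    """Check a top-down chain of area kinds, e.g. region > block > line."""
    return all(can_contain(p, c) for p, c in zip(path, path[1:]))


print(check_nesting(["region", "block", "line", "inline", "content"]))  # True
print(check_nesting(["region", "line"]))  # False
```

The second call fails because a region holds block areas, not line areas directly.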
A region is the highest-level container defined in XSL-FO. You can think of the page of this article as containing three regions: the header, the main body of the page, and the footer. The formatting objects that produce regions are fo:region-body, fo:region-before, fo:region-after, fo:region-start, and fo:region-end.
A block area represents a block-level element, such as a paragraph or a list item. Although block areas can contain other block areas, there is always a line break before the start and after the end of each block area. A block area is not positioned precisely by coordinates; instead, it is placed sequentially in the area that contains it. When other block areas are added before it or inside it, its position shifts to make the required room. A block area can contain parsed character data, inline areas, line areas, and other block areas, which are arranged sequentially inside it. The formatting objects that produce block areas include fo:block, fo:table-and-caption, and fo:list-block.
A line area represents one line of text inside a block. For example, each line of this paragraph is a line area. Line areas can contain inline areas and inline spaces. No formatting object corresponds to a line area. Instead, the formatting engine calculates line areas when it decides how to wrap lines inside block areas.
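A rough way to see why line areas come from the formatter and not from the markup is to wrap the same text at two widths. The sketch below uses Python's textwrap module as a stand-in for a formatter's line breaker. Real XSL-FO formatters measure glyph widths and apply hyphenation, so the character-count wrapping here is an assumption made only for illustration, and line_areas is a made-up name.

```python
import textwrap

paragraph = ("Line areas are computed by the formatter, so the same block of text "
             "yields a different number of lines at different widths.")


def line_areas(text, width):
    """Split text into lines of at most `width` characters (simplified stand-in)."""
    return textwrap.wrap(text, width=width)


narrow = line_areas(paragraph, 30)
wide = line_areas(paragraph, 60)

for line in narrow:
    print(line)
print(len(narrow), "lines at width 30;", len(wide), "lines at width 60")
```

The source text is identical in both cases; only the available width changes the number of lines.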
An inline area is a part of a line, such as a single character, a footnote reference, or a mathematical equation. Inline areas can contain other inline areas and raw text. The formatting objects that produce inline areas include fo:character, fo:external-graphic, fo:inline, fo:instream-foreign-object, fo:leader, and fo:page-number.
Formatting Properties
Taken together, the formatting objects in an XSL-FO document specify the order in which content is placed on the page. The formatting properties specify the details of formatting, such as size, position, font, color, and much more. Formatting properties are written as attributes on the individual formatting object elements.
The details of many of these properties should be familiar from CSS. Where CSS and XSL-FO use the same name, it means the same thing. For example, the CSS font-family property and the XSL font-family property mean the same thing. Although the syntax for assigning values differs, the values themselves have the same meaning. To indicate that the fo:block element is formatted in some approximation of Times, you might use this CSS rule:
fo:block {font-family: 'New York', 'Times New Roman', serif}
The equivalent in XSL-FO is to include the font-family attribute in the fo:block start-tag:
<fo:block font-family="'New York', 'Times New Roman', serif">
On the surface these look different, but the property name (font-family) and the property value ('New York', 'Times New Roman', serif) are the same. The CSS font-family property is a comma-separated list of font names, ordered from first choice to last. The XSL-FO font-family property is also a comma-separated list of font names, ordered from first choice to last. Both CSS and XSL-FO quote font names that contain spaces, and both treat the keyword serif as meaning an arbitrary serif font.
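Because the names and values carry over unchanged, moving a shared property from CSS syntax to XSL-FO attribute syntax is a purely mechanical rewrite. The following Python sketch shows this. It assumes the input contains only properties that CSS and XSL-FO share by name and that no value contains a semicolon; the function names are invented for this example.

```python
from xml.sax.saxutils import quoteattr


def css_to_fo_attributes(declarations):
    """Turn 'name: value; name: value' CSS declarations into an attribute dict.

    Only meaningful for properties that CSS and XSL-FO share by name,
    such as font-family, font-size, and line-height.
    """
    attributes = {}
    for declaration in declarations.split(";"):
        if not declaration.strip():
            continue
        name, _, value = declaration.partition(":")
        attributes[name.strip()] = value.strip()
    return attributes


def fo_block_start_tag(attributes):
    """Render the attributes as an fo:block start-tag (illustration only)."""
    parts = ["%s=%s" % (name, quoteattr(value)) for name, value in attributes.items()]
    return "<fo:block " + " ".join(parts) + ">"


css = "font-family: 'New York', 'Times New Roman', serif; font-size: 20pt"
attrs = css_to_fo_attributes(css)
print(fo_block_start_tag(attrs))
```

The property values pass through untouched; only the surrounding syntax changes.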
Of course, XSL formatting objects support many properties that have no CSS equivalent, such as destination-placement-offset, block-progression-dimension, character, and hyphenation-keep. You need to learn these to take full advantage of XSL.
Transforming to Formatting Objects
XSL-FO is a complete XML vocabulary for laying out text on pages. An XSL-FO document is simply a well-formed XML document that uses this vocabulary. This means it has an XML declaration, a root element, child elements, and so on. It must follow all the rules of any well-formed XML document, or formatters will not accept it. By convention, a file that contains XSL formatting objects has the three-letter suffix .fob or the two-letter suffix .fo. However, it may also use the .xml suffix, because it is also a well-formed XML file.
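Since a formatter rejects anything that is not well-formed, it can save time to check a file before handing it over. The sketch below is one minimal way to do that in Python with the standard library. It only checks well-formedness and the root element name, so passing it does not mean the document is valid XSL-FO. The function name looks_like_fo is made up for this example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"


def looks_like_fo(text):
    """Return True if `text` is well-formed XML whose root element is fo:root."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed: a formatter would reject this document.
        return False
    return root.tag == "{%s}root" % FO_NS


good = '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"/>'
bad = '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">'  # never closed

print(looks_like_fo(good))  # True
print(looks_like_fo(bad))   # False
```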
Listing 1 is a simple document marked up with XSL formatting objects. The root of the document is fo:root. This element contains an fo:layout-master-set element and an fo:page-sequence element. The fo:layout-master-set element contains an fo:simple-page-master child element, which describes the kind of page on which content will be placed. This document has only one very simple page master, but more complex documents can have different master pages for first, right, and left pages, body pages, front matter, back matter, and so on, each with a potentially different set of margins, page numbering, and other features. The name by which a page master is referenced is given in its master-name attribute.
The content is placed on copies of the master page by an fo:page-sequence. The fo:page-sequence has a master-reference attribute that names the master page to use. Its fo:flow child element holds the content that is actually placed on the pages. Here the content is given as two fo:block child elements, each with a font-size of 20 points, a font-family of serif, and a line-height of 30 points.
Listing 1: A simple XSL-FO document
<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="only">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="only">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="20pt" font-family="serif"
                line-height="30pt">
        Hydrogen
      </fo:block>
      <fo:block font-size="20pt" font-family="serif"
                line-height="30pt">
        Helium
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Although you can write documents like Listing 1 by hand, doing so loses all the benefits of keeping content independent of format. Normally you write an XSLT style sheet that transforms an XML source document into XSL-FO. Listing 2 shows such an XSLT style sheet.
Listing 2: A style sheet that transforms to XSL-FO
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output indent="yes"/>
  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <fo:simple-page-master master-name="only">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="only">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="//ATOM"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <xsl:template match="ATOM">
    <fo:block font-size="20pt" font-family="serif"
              line-height="30pt">
      <xsl:value-of select="NAME"/>
    </fo:block>
  </xsl:template>
</xsl:stylesheet>
Using wh2fo to Convert a Word Document to PDF
To close this article, we show how to use wh2fo to convert a Word file into a PDF file. The Word document used in this example is the first two pages of the first draft of this article.
1. First, save the Word 2000/XP document as an HTML file. The file name I use is fop.htm.
2. Run the wh2fo batch command to convert it. The wh2fo version I use is 0_3_1 for Windows.
> wh2fo fop.htm
This command generates three files: fop.xml, fop.xsl, and fopatts.xsl.
3. The following steps use FOP from the Apache XML project for the XSLT transformation and the FO rendering. The FOP version I use is fop-0.20.4rc, downloaded from the Apache FOP home page, http://xml.apache.org/fop. It depends on two other jar packages: avalon-framework and batik.
4. Register fonts with FOP. FOP does not support TrueType fonts directly; in the current version they can be used once they are registered. I registered some frequently used Chinese fonts. The following shows how to register simhei:
* Generate an XML font metrics file from the TTF (TrueType) font file:
> java org.apache.fop.fonts.apps.TTFReader c:\winnt\fonts\simhei.ttf simhei.xml
For a TTC font file (a collection of several TrueType fonts):
> java org.apache.fop.fonts.apps.TTFReader c:\winnt\fonts\simsun.ttc -ttcname "simsun" simsun.xml
* Edit the conf/userconfig.xml file and add the following inside the <fonts></fonts> element:
<font metrics-file="simhei.xml" kerning="yes" embed-file="C:\winnt\fonts\simhei.ttf">
  <font-triplet name="simhei" style="normal" weight="normal"/>
</font>
5. XSLT transformation. Before the transformation, you must edit the XSL files to avoid errors and failed output. Some of these errors are caused by XSL-FO version incompatibilities or by tags that FOP does not currently support, and some by unsupported Chinese encodings.
* Edit the fopatts.xsl file: replace each Chinese font name with the name under which that font was registered. The name I use for the SimSun font is "simsun"; similarly, replace the Chinese name of the SimHei font with "simhei".
* Edit the fop.xsl file: find the line containing
<xsl:apply-templates select="document('fop.xml')/document/section[1 style='layout-grid:15.6pt']"></xsl:apply-templates>
and remove style='layout-grid:15.6pt', so that the expression reads ...section[1].... The transformation command is as follows:
> java org.apache.xalan.xslt.Process -in fop.xml -xsl fop.xsl -out fop.fo
6. The steps above produce the fop.fo file. Before converting it further, we need to edit the FO file to correct its page layout errors and unsupported syntax. You will find that the generated FO file is missing the fo:simple-page-master element that defines the page (this can happen when page layout information is lost). We use a regular A4 page, so we add the following under the fo:layout-master-set node:
* <fo:simple-page-master master-name="regulara4"
    page-height="29.7cm"
    page-width="21cm"
    margin-top="2.54cm"
    margin-bottom="2.54cm"
    margin-left="3.17cm"
    margin-right="3.17cm">
    <fo:region-body/>
    <fo:region-before/>
    <fo:region-after/>
  </fo:simple-page-master>
* Change the master-reference attribute of the fo:repeatable-page-master-reference child element under fo:page-sequence-master to regulara4, the page master name we just defined.
* Fix the remaining unsupported syntax: change the master-reference attribute of fo:page-sequence to "Section1-ps", the name of the fo:page-sequence-master.
7. Finally, run the following command and check the result:
> java org.apache.fop.apps.Fop -c conf/userconfig.xml fop.fo fop.pdf
Do not forget to copy the directory that holds the image files.
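The manual edits in step 6 come down to one rule: every master-reference must name a page master that actually exists in fo:layout-master-set. A short Python sketch can check this before FOP is run. It is not part of wh2fo or FOP. The function name and the SAMPLE document are assumptions for illustration, with SAMPLE modeled on the names used above (regulara4 and Section1-ps).

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
NS = {"fo": FO_NS}


def unresolved_master_references(fo_text):
    """List master-reference values that name no page master in the document."""
    root = ET.fromstring(fo_text)
    # Collect every master-name defined under fo:layout-master-set.
    defined = set()
    layout = root.find("fo:layout-master-set", NS)
    if layout is not None:
        for master in layout:
            name = master.get("master-name")
            if name:
                defined.add(name)
    # Any element carrying master-reference must point at one of them.
    missing = []
    for element in root.iter():
        ref = element.get("master-reference")
        if ref is not None and ref not in defined:
            missing.append(ref)
    return missing


# Hypothetical, trimmed-down version of a corrected fop.fo file.
SAMPLE = """<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="regulara4">
      <fo:region-body/>
    </fo:simple-page-master>
    <fo:page-sequence-master master-name="Section1-ps">
      <fo:repeatable-page-master-reference master-reference="regulara4"/>
    </fo:page-sequence-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="Section1-ps">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hydrogen</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>"""

BROKEN = SAMPLE.replace('master-name="regulara4"', 'master-name="other"')

print(unresolved_master_references(SAMPLE))  # []
print(unresolved_master_references(BROKEN))  # ['regulara4']
```

An empty list means every reference resolves; any names printed are the ones that still need the kind of correction described in step 6.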
My result is not very satisfactory, especially where several Chinese fonts are used. Some text is displayed as "#", because FOP does not support bold and italic for Chinese fonts. Still, this is a developing technology, and it may well improve in the future.
References
1. Apache XML home page http://xml.apache.org
2. W3C XSL home page http://www.w3.org/Style/XSL/
3. Fabio Giannetti home page http://www-uk.hpl.hp.com/people/fabgia/index.html
4. Chapter 18 of the XML Bible http://www.ibiblio.org/xml/books/bible2/chapters/ch18.html