1. Basic Introduction
Aspose.Words is a commercial .NET class library that lets applications perform a wide range of document-processing tasks. It supports DOC, DOCX, RTF, HTML, OpenDocument, PDF, XPS, EPUB and other formats. You can use Aspose.Words to generate, modify, convert, render and print documents without using Microsoft Word. Using Aspose.Words in a project offers the following benefits.
1.1 Rich feature set
Its feature set covers four main areas:
1) Format conversion. Aspose.Words provides high-quality file format conversion to and from DOC, OOXML, RTF, TXT and other formats.
2) Document object model. A rich API gives programmatic access to all document elements and formatting, so you can create, modify, extract, copy, split, join and replace document content.
3) Rendering. On the server side you can convert a whole document or individual pages to PDF, XPS or SWF, render pages to image formats, or draw them onto a .NET Graphics object, with results that closely match Microsoft Word.
4) Reporting. You can generate documents by filling a template with data from objects or from a data source.
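The conversion and reporting features listed above can be sketched in a few lines. This is a minimal sketch, not a definitive recipe: it assumes a .NET 6+ project (top-level statements) that references the Aspose.Words NuGet package, and the merge field name "CustomerName", the value "Jane Doe" and the output file names are made up for illustration.

```csharp
using Aspose.Words;

// Build a small template in memory so the sketch needs no input file.
var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.Write("Dear ");
builder.InsertField("MERGEFIELD CustomerName"); // hypothetical field name
builder.Writeln(",");

// Reporting: fill the merge field from a data source (here, two plain arrays).
doc.MailMerge.Execute(new[] { "CustomerName" }, new object[] { "Jane Doe" });

// Format conversion: the target format is inferred from the file extension,
// or it can be stated explicitly through the SaveFormat enumeration.
doc.Save("letter.docx");
doc.Save("letter.pdf");
doc.Save("letter.html", SaveFormat.Html);
```

The same Document object is saved three times; no intermediate file or Microsoft Word installation is involved.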
1.2 No Microsoft Word required
Aspose.Words works on machines that do not have Microsoft Office installed. All Aspose components are fully independent of Microsoft Office. Taken together, this makes Aspose.Words a strong choice in terms of security, stability, scalability, speed, price and automation features.
1.3 Platform independence
Aspose.Words runs on Windows, Linux and Mac OS. You can use it to build 32-bit or 64-bit .NET applications, including ASP.NET, WCF and WinForms applications, and it can also be used through COM interop from ASP, Perl, PHP and Python. You can also use Aspose.Words to build .NET applications on the Mono platform.
1.4 Performance and Scalability
Aspose.Words can run on both the server and the client. It is a standalone .NET assembly that can simply be copied and deployed with any .NET application. With Aspose.Words you can open documents, modify their formatting and content, populate them with data and save them, generating thousands of documents in a short period of time. Aspose.Words is thread-safe in the sense that different threads can process different documents at the same time.
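The threading model described above (one document per thread) could look like the following. This is a sketch under assumptions: a .NET 6+ project referencing the Aspose.Words NuGet package, with the loop bound and the "report-N.docx" file names chosen only for illustration.

```csharp
using System.Threading.Tasks;
using Aspose.Words;

// Each iteration may run on a different thread. Every iteration creates and
// uses its own Document and DocumentBuilder, so no document is shared.
Parallel.For(0, 4, i =>
{
    var doc = new Document();
    var builder = new DocumentBuilder(doc);
    builder.Writeln($"Report {i}");
    doc.Save($"report-{i}.docx"); // hypothetical output name
});
```

The design point is that the Document instance is created inside the loop body. Sharing one Document between threads is not what the text above promises.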
1.5 Minimal learning curve
Although Aspose.Words has more than 150 public classes and enumeration types, its learning curve is gentle because the API is designed around the following goals:
1) Draw on the design of well-known APIs, such as the Microsoft Word object model.
2) Follow the .NET Framework Design Guidelines.
3) Provide detailed, easy-to-use documentation for operations on document elements.
Developers who have previously automated Microsoft Word in their projects will find many familiar classes, methods and properties in Aspose.Words.
2. Document Object Model Overview
2.1 DOM Introduction
The Aspose.Words Document Object Model (hereinafter, the DOM) is an in-memory representation of a Word document. Through the DOM you can programmatically read, manipulate and modify the content and formatting of a Word document. Understanding the structure of the DOM and its types is the basis for flexible programming with Aspose.Words. Consider the following example Word document:
When Aspose.Words reads the above document into the DOM, it creates a tree of objects with the following structure:
Comparing this structure with the Word document shows roughly how the objects in the DOM relate to each other. With these basic concepts in hand, manipulating a Word document becomes straightforward. Document, Section, Paragraph, Table, Shape, Run and the other ovals in the diagram are Aspose.Words objects arranged in a tree hierarchy, and the annotations in the diagram show that the objects in the tree also have multiple properties.
The DOM in Aspose.Words has the following characteristics:
1. All node classes ultimately inherit from the Node class, which is the base type of the Aspose.Words DOM.
2. Some nodes can contain (nest) other nodes, for example Section and Paragraph. These classes inherit from the CompositeNode class, which in turn derives from Node.
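The tree described above can be made visible by walking it recursively. The following is a minimal sketch, assuming a .NET 6+ project that references the Aspose.Words NuGet package; PrintTree is a hypothetical helper written for this example, not part of the library.

```csharp
using System;
using Aspose.Words;

// A new Document already contains one Section, one Body and one Paragraph.
var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.Write("Hello DOM");

PrintTree(doc, 0);

// Hypothetical helper: prints each node type, indented by its depth.
static void PrintTree(Node node, int depth)
{
    Console.WriteLine(new string(' ', depth * 2) + node.NodeType);

    // Only CompositeNode instances can have children.
    if (node is CompositeNode composite)
    {
        foreach (Node child in composite.GetChildNodes(NodeType.Any, false))
            PrintTree(child, depth + 1);
    }
}
```

The `node is CompositeNode` check mirrors the second characteristic in the list: leaf nodes such as Run derive from Node only, while containers derive from CompositeNode.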
2.2 Node types
When Aspose.Words reads a Word document into memory, it represents the different kinds of document elements with objects of different types. Every run of text, paragraph, table and section is a node object, and even the document itself is a node. Aspose.Words defines a class for each document node type.
The following is a UML class diagram that shows the relationships between the node types in the DOM. The names of abstract classes are shown in italics. Note that the Aspose.Words DOM also includes non-node classes, such as Style, PageSetup and Font, which are not shown in this diagram.
The main classes and their roles are listed below.
Aspose.Words class | Category | Description |
Document | Document | The root node of the document tree; provides access to the entire document |
Section | Document | Corresponds to one section of a document |
Body | Document | The main text container of a section |
HeaderFooter | Document | A header or footer container of a section |
GlossaryDocument | Document | The root node of the glossary inside a Word document |
BuildingBlock | Document | A glossary document entry, such as a building block, AutoText or AutoCorrect entry |
Paragraph | Text | A paragraph of text; contains inline nodes |
Run | Text | A run of text with consistent formatting |
BookmarkStart | Text | The start marker of a bookmark |
BookmarkEnd | Text | The end marker of a bookmark |
FieldStart | Text | A special character that marks the start of a Word field |
FieldSeparator | Text | The separator of a Word field |
FieldEnd | Text | A special character that marks the end of a Word field |
FormField | Text | A form field |
SpecialChar | Text | A special character that is not covered by a more specific class |
Table | Tables | A table in a Word document |
Row | Tables | A row of a Table |
Cell | Tables | A cell of a table Row |
Shape | Shapes | An image, shape, text box or OLE object in a Word document |
GroupShape | Shapes | A group of Shape objects |
DrawingML | Shapes | A DrawingML shape, picture, chart or diagram in a document |
Footnote | Annotations | A footnote or endnote in the document; contains text |
Comment | Annotations | A comment in the document; contains text |
CommentRangeStart | Annotations | The start of the region of text associated with a comment |
CommentRangeEnd | Annotations | The end of the region of text associated with a comment |
SmartTag | Markup | A smart tag around one or more inline structures within a paragraph |
CustomXmlMarkup | Markup | Custom XML markup around certain structures in a document |
StructuredDocumentTag | Markup | A structured document tag (content control) in a document |
OfficeMath | Math | An Office Math object, such as a function, equation or matrix |
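Each class in the table corresponds to a value of the NodeType enumeration, which can be used to identify and filter nodes. A small sketch, assuming a .NET 6+ project that references the Aspose.Words NuGet package; the sample text is illustrative.

```csharp
using System;
using Aspose.Words;

var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.Writeln("First paragraph");
builder.Write("Second paragraph");

// Every node reports its own type through the NodeType property.
Console.WriteLine(doc.NodeType);
Console.WriteLine(doc.FirstSection.Body.NodeType);

// GetChildNodes(type, true) searches the whole subtree for nodes of one type.
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
Console.WriteLine($"Paragraph count: {paragraphs.Count}");

// The collection yields Node objects, so cast to the concrete class when needed.
foreach (Paragraph paragraph in paragraphs)
    Console.WriteLine(paragraph.Runs.Count);
```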
2.3 Composition pattern
The tree structure of an Aspose.Words document is very important. The following design diagrams make the containment relationships between the nodes clear.
2.3.1 Document and Section
The diagram for Document and Section nodes:
From the diagram we can see:
1. A Document has one or more Section nodes;
2. A Section has one Body and zero or more HeaderFooter nodes;
3. Body and HeaderFooter can contain multiple block-level nodes;
4. A Document can have one GlossaryDocument.
A Word document contains one or more sections. Each section defines its own page numbering, margins, orientation, and header and footer text. A section holds the main text as well as its headers and footers (first page, odd pages, even pages).
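The containment rules above map directly onto properties of the Document and Section classes. The following is a sketch under assumptions (a .NET 6+ project referencing the Aspose.Words NuGet package, illustrative text) that builds a two-section document and inspects each section.

```csharp
using System;
using Aspose.Words;

var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.Writeln("Body of section 1");

// A section break starts a new Section with its own page setup.
builder.InsertBreak(BreakType.SectionBreakNewPage);
builder.PageSetup.Orientation = Orientation.Landscape;
builder.Writeln("Body of section 2");

// Headers and footers belong to a section, not to the document as a whole.
builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);
builder.Write("Header of section 2");

foreach (Section section in doc.Sections)
{
    Console.WriteLine(
        $"orientation={section.PageSetup.Orientation}, " +
        $"headers/footers={section.HeadersFooters.Count}, " +
        $"body paragraphs={section.Body.Paragraphs.Count}");
}

// The glossary document is optional, so this property may be null.
Console.WriteLine(doc.GlossaryDocument == null ? "no glossary" : "has glossary");
```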
2.3.2 Block-level nodes
The diagram for block-level nodes is as follows:
From the diagram we can see:
1. Block-level nodes can appear in many places in the document, for example as children of a Body, a Footnote, a Comment or a Cell;
2. The most important block-level nodes are Table and Paragraph;
3. A Table has zero or more rows;
4. CustomXmlMarkup and StructuredDocumentTag can contain other block-level nodes.
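The block-level relationships above can also be exercised by creating nodes directly instead of going through DocumentBuilder. A minimal sketch, assuming a .NET 6+ project that references the Aspose.Words NuGet package; the text is illustrative.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Tables;

var doc = new Document();
Body body = doc.FirstSection.Body;

// A Paragraph is a block-level node and can be a direct child of Body.
var paragraph = new Paragraph(doc);
paragraph.AppendChild(new Run(doc, "A paragraph created by hand"));
body.AppendChild(paragraph);

// A Table is the other main block-level node. A new Table has no rows;
// EnsureMinimum adds one row with one cell containing one paragraph.
var table = new Table(doc);
body.AppendChild(table);
Console.WriteLine($"Rows before EnsureMinimum: {table.Rows.Count}");
table.EnsureMinimum();
Console.WriteLine($"Rows after EnsureMinimum: {table.Rows.Count}");

// List the block-level children of the body.
foreach (Node block in body.GetChildNodes(NodeType.Any, false))
    Console.WriteLine(block.NodeType);
```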
2.3.3 Inline-level nodes
The diagram for inline-level nodes shows the following relationships:
1. Paragraph is the node in which inline-level nodes occur most often;
2. A Paragraph can contain Run nodes with different formatting, as well as bookmarks and annotations;
3. A Paragraph can also contain shapes, images, drawing objects and smart tags.
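The inline-level children of a paragraph can be listed as follows. This is a sketch, assuming a .NET 6+ project that references the Aspose.Words NuGet package; the bookmark name "Mark" and the text are made up for the example.

```csharp
using System;
using Aspose.Words;

var doc = new Document();
var builder = new DocumentBuilder(doc);

// Runs with different formatting inside one paragraph.
builder.Write("Plain ");
builder.Font.Bold = true;
builder.Write("bold");
builder.Font.Bold = false;

// A bookmark is stored as a BookmarkStart/BookmarkEnd pair of inline nodes.
builder.StartBookmark("Mark"); // hypothetical bookmark name
builder.Write(" bookmarked");
builder.EndBookmark("Mark");

Paragraph paragraph = doc.FirstSection.Body.FirstParagraph;
foreach (Node inline in paragraph.GetChildNodes(NodeType.Any, false))
{
    // Only Run nodes carry font formatting.
    string detail = inline is Run run ? $"\"{run.Text}\" bold={run.Font.Bold}" : "";
    Console.WriteLine($"{inline.NodeType} {detail}");
}
```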
2.3.4 Table, Row and Cell
A table can contain many rows, rows can contain cells, and cells can include block-level nodes.
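The Table, Row and Cell nesting can be sketched like this, assuming a .NET 6+ project that references the Aspose.Words NuGet package. The 2 x 3 table size and the cell text are illustrative.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Tables;

var doc = new Document();
var builder = new DocumentBuilder(doc);

// Build a table with 2 rows and 3 cells per row.
Table table = builder.StartTable();
for (int r = 1; r <= 2; r++)
{
    for (int c = 1; c <= 3; c++)
    {
        builder.InsertCell();
        builder.Write($"R{r}C{c}");
    }
    builder.EndRow();
}
builder.EndTable();

// Table -> Row -> Cell -> block-level nodes (here, one Paragraph per cell).
foreach (Row row in table.Rows)
{
    foreach (Cell cell in row.Cells)
    {
        string text = cell.ToString(SaveFormat.Text).Trim();
        Console.WriteLine($"{text}: {cell.Paragraphs.Count} paragraph(s)");
    }
}
```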
2.4 Design patterns and navigation
Aspose.Words represents a document as a tree of nodes, so you can navigate from one node to another. Aspose.Words ships a demo project called DocumentExplorer that displays this tree, as shown below:
The parent of a node can be accessed through the ParentNode property of the Node class, so moving up the tree is easy. The document object model is composed of a large number of objects, and their relationships are as follows:
1. The Node class is the base class of all node classes;
2. The CompositeNode class is the base class of all nodes that can contain other nodes;
3. The Node class has no child-management interface; the methods for managing child nodes appear only in CompositeNode;
4. Keeping the child-management methods out of the Node class makes the API cleaner and avoids many extra type casts.
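The navigation members and the Node/CompositeNode split described above can be tried out as follows. A minimal sketch, assuming a .NET 6+ project that references the Aspose.Words NuGet package; the sample text is illustrative.

```csharp
using System;
using Aspose.Words;

var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.Writeln("First");
builder.Write("Second");

// Walk up from a leaf node to the root through ParentNode.
Run run = doc.FirstSection.Body.FirstParagraph.Runs[0];
for (Node node = run; node != null; node = node.ParentNode)
    Console.WriteLine($"{node.NodeType} (composite: {node.IsComposite})");

// Walk across the children of the body through FirstChild and NextSibling.
for (Node node = doc.FirstSection.Body.FirstChild; node != null; node = node.NextSibling)
    Console.WriteLine(node.NodeType);

// Child management (AppendChild, RemoveChild, ...) lives on CompositeNode,
// so it is available on the body but not on the Run.
CompositeNode body = run.GetAncestor(NodeType.Body);
body.AppendChild(new Paragraph(doc));
Console.WriteLine(body.GetChildNodes(NodeType.Paragraph, false).Count);
```

GetAncestor returns a CompositeNode because only composite nodes can be ancestors, which is one place where the split removes the need for a cast.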