Study: Preliminary analysis of Wordxml format

Source: Internet
Author: User

Starting with Office 2003, Word can save documents in an XML text format, so you can create a Word file from an external program without using Word's object model. You can also freely open and parse such a Word file, publish it to your own web page, or use it in any number of other applications.

A typical WordXML structure looks like this:

<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, World.</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>

To try it, create a file with Notepad, paste in the XML above, save it as Helloworld.xml, and open it in Word to see the result.
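The same file can also be produced by a program instead of Notepad. The following is a minimal sketch in Python using only the standard library; the helper names (w, build_hello_document, to_xml_string) are made up for this illustration and are not part of any Word API.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Word 2003 XML format, as shown in the sample above.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W_NS)  # serialize with the familiar "w:" prefix


def w(tag):
    """Return the namespace-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{tag}"


def build_hello_document(text):
    """Build wordDocument > body > p > r > t containing the given text."""
    root = ET.Element(w("wordDocument"))
    body = ET.SubElement(root, w("body"))
    para = ET.SubElement(body, w("p"))
    run = ET.SubElement(para, w("r"))
    ET.SubElement(run, w("t")).text = text
    return root


def to_xml_string(root):
    """Serialize the tree and put the XML declaration in front of it."""
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

Writing the result of to_xml_string(build_hello_document("Hello, World.")) to Helloworld.xml should give a file equivalent to the hand-written one, although ElementTree puts everything on one line.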

This is the simplest WordXML content. It consists of the following parts:

The XML declaration and the namespace declaration:
<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

Document content

<w:body>...</w:body>

Basic node types

As the content of the body shows, three types of nodes make up the actual text content:
<w:p> represents a paragraph

<w:r> represents a run: a stretch of text that shares one display style

<w:t> represents the actual text content
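Because text always sits in <w:t> nodes inside runs inside paragraphs, reading a document back is mostly a matter of walking those three levels. The sketch below assumes Python's standard xml.etree.ElementTree; the function name paragraphs_text and the SAMPLE string are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

SAMPLE = """<?xml version="1.0"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <w:body>
    <w:p>
      <w:r><w:t>Hello, </w:t></w:r>
      <w:r><w:t>World.</w:t></w:r>
    </w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs_text(xml_source):
    """Return one string per <w:p>, joining the text of all its <w:t> nodes."""
    root = ET.fromstring(xml_source)
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        result.append("".join(pieces))
    return result
```

Note that the formatting in <w:rPr> is simply ignored here; only the text of each paragraph is collected.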

What if we need to make some text bold?

<w:r>
<w:rPr>
<w:b w:val="on"/>
</w:rPr>
<w:t>2.0c</w:t>
</w:r>

<w:b w:val="on"/> indicates that the text of this run is bold.

From this we can see that <w:r> carries a specific text format. A slightly more complex format:

<w:r>
<w:rPr>
<w:b w:val="on"/>
<w:sz w:val="40"/><w:szCs w:val="40"/>
<w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial"/>
</w:rPr>
<w:t xml:space="preserve">2.0C</w:t>
</w:r>

The font is bold. The size value is 40, which divided by 2 gives a font size of 20 points (the value is in half-points). The font name is "Arial".

<w:t xml:space="preserve"> 2.0c</w:t>

As the name suggests, xml:space="preserve" keeps the spaces in the text.

Without this attribute, Word ignores the spaces before and after the text.
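The run rules described so far (bold, size in half-points, xml:space="preserve" for leading or trailing spaces) can be put together in a small helper. This is only a sketch under the assumptions stated in the comments; make_run and its parameters are names invented for this example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml: prefix
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name for a w: element or attribute."""
    return f"{{{W_NS}}}{tag}"


def make_run(text, bold=False, size_pt=None):
    """Build a <w:r> element; size_pt is the font size in points."""
    run = ET.Element(w("r"))
    if bold or size_pt is not None:
        rpr = ET.SubElement(run, w("rPr"))
        if bold:
            ET.SubElement(rpr, w("b"), {w("val"): "on"})
        if size_pt is not None:
            # w:sz is expressed in half-points, so 20 pt becomes "40".
            ET.SubElement(rpr, w("sz"), {w("val"): str(round(size_pt * 2))})
    t = ET.SubElement(run, w("t"))
    if text != text.strip():
        # Ask for leading/trailing spaces to be kept.
        t.set(f"{{{XML_NS}}}space", "preserve")
    t.text = text
    return run
```

For example, make_run(" 2.0C", bold=True, size_pt=20) yields a bold run with a size value of 40 and xml:space="preserve" on its <w:t>.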

What if we need to specify the alignment or line spacing of a paragraph?

This is done by setting the properties of <w:p>, similar to this:

<w:p>
<w:pPr>
<w:jc w:val="right"/>
<w:spacing w:line="600" w:lineRule="auto"/>
</w:pPr>

...

</w:p>

Alignment: <w:jc w:val="right"/> sets the alignment, here right-aligned.

Line spacing: <w:spacing w:line="600" w:lineRule="auto"/>. The value is the line-spacing multiple multiplied by 240, so double line spacing is 480. The 600 here means 2.5 times line spacing.
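The arithmetic is easy to get wrong by hand, so here it is as a pair of tiny helpers. This is just the multiply-by-240 rule from the text; the function names are made up for this illustration.

```python
LINE_UNITS_PER_MULTIPLE = 240  # single line spacing corresponds to 240


def line_value(multiple):
    """Convert a line-spacing multiple (e.g. 2.5) to the w:line value."""
    return round(multiple * LINE_UNITS_PER_MULTIPLE)


def line_multiple(value):
    """Convert a w:line value back to a line-spacing multiple."""
    return value / LINE_UNITS_PER_MULTIPLE
```

So line_value(2) gives 480 and line_value(2.5) gives 600, matching the values above.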

As you can see, assembling a file in the WordXML format is relatively simple.

Paragraph properties go inside <w:pPr></w:pPr>

Text formatting goes inside <w:rPr></w:rPr>

The Pr here stands for "properties": the block holds the format of an r (run) or a p (paragraph).

Is the WordXML file finished now? You could say so, but if you double-click the XML file you just created, on most machines it will not be opened by Word.

Why is that?

We also need to place a processing instruction in the appropriate place:

<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument ...>

It indicates the handler for the XML file, and corresponds to this key in the registry:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\11.0\Common\Filter\text/xml

However, after you add this instruction and double-click the file, Word reports that the XML is not formatted correctly, although it can still open it. That is because a lot of content has not been declared yet. For now, let's leave this instruction out.
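When the time comes to add the processing instruction, it has to go right after the XML declaration and before the root element. A minimal sketch of doing that on an XML string follows; add_word_pi is a hypothetical helper name, and the code assumes the declaration, if present, is the very first thing in the text.

```python
WORD_PI = '<?mso-application progid="Word.Document"?>'


def add_word_pi(xml_text):
    """Insert the mso-application instruction after the XML declaration."""
    if WORD_PI in xml_text:
        return xml_text  # already present, nothing to do
    if xml_text.startswith("<?xml"):
        end = xml_text.index("?>") + 2  # position just past the declaration
        return xml_text[:end] + "\n" + WORD_PI + xml_text[end:]
    # No declaration: put the instruction at the very top.
    return WORD_PI + "\n" + xml_text
```

Calling the function twice on the same text leaves it unchanged the second time, so it is safe to apply to files that may already carry the instruction.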

Page Setup

The following sets the page width, height, and margins. Each value is a measurement in inches multiplied by 1440:

<w:body>…
<w:sectPr>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
</w:sectPr>

</w:body>
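The values in the snippet above can be checked with the inches-times-1440 rule: 12240 by 15840 is 8.5 by 11 inches, and the left and right margins of 1800 are 1.25 inches. A small sketch of the conversion, with helper names invented for this example:

```python
UNITS_PER_INCH = 1440  # page measurements are inches multiplied by 1440


def inches_to_units(inches):
    """Convert inches to the value used in w:pgSz and w:pgMar."""
    return round(inches * UNITS_PER_INCH)


def page_size_attrs(width_in, height_in):
    """Return the w and h values of w:pgSz for a page given in inches."""
    return {"w": str(inches_to_units(width_in)),
            "h": str(inches_to_units(height_in))}
```

For a page of 8.5 by 11 inches, page_size_attrs(8.5, 11) returns the same 12240 and 15840 used above.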

The following sets the page header and footer:

<w:body>…
<w:sectPr wsp:rsidR="002C452C">
<w:hdr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Header"/>
</w:pPr>
<w:r>
<w:t>my header</w:t>
</w:r>
</w:p>
</w:hdr>
<w:ftr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Footer"/>
</w:pPr>
<w:r>
<w:t>my footer</w:t>
</w:r>
</w:p>
</w:ftr>

</w:sectPr>
</w:body>

These two snippets are straightforward and need no further explanation.

Document settings

</w:body>

<w:docPr>
<w:view w:val="print"/><w:zoom w:percent="100"/>
</w:docPr>

</w:wordDocument>

docPr stands for document properties.

Here the document view is "print" and the zoom is 100%.

Complete XML File Instance

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:aml="http://schemas.microsoft.com/aml/2001/core"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint"
xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2"
xmlns:sl="http://schemas.microsoft.com/schemaLibrary/2003/core"
w:macrosPresent="no" w:embeddedObjPresent="no" w:ocxPresent="no"
xml:space="preserve">

<w:body>
<w:p>
<w:pPr>
<w:jc w:val="left"/>
<w:spacing w:line="600" w:lineRule="auto"/>
</w:pPr>
<w:r>
<w:rPr>
<w:sz w:val="40"/><w:szCs w:val="40"/>
<w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial"/>
</w:rPr>
<w:t>niu don't like Red or blue! It seems that </w:t>
</w:r>
<w:r>
<w:rPr>
<w:sz w:val="40"/><w:szCs w:val="40"/>
<w:rFonts w:ascii="Arial" w:eastAsia="Arial" w:hAnsi="Arial"/>
</w:rPr>
<w:t>hello world!</w:t>
</w:r>
</w:p>

<w:sectPr wsp:rsidR="002C452C">
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1526.4" w:right="3254.4" w:bottom="2966.4" w:left="1670.4" w:header="720" w:footer="720" w:gutter="0"/>
<w:hdr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Header"/>
</w:pPr>
<w:r>
<w:t>Header</w:t>
</w:r>
</w:p>
</w:hdr>
<w:ftr w:type="odd">
<w:p>
<w:pPr>
<w:pStyle w:val="Footer"/>
</w:pPr>
<w:r>
<w:t>Footer</w:t>
</w:r>
</w:p>
</w:ftr>
</w:sectPr>
</w:body>

<w:docPr>
<w:view w:val="print"/><w:zoom w:percent="100"/>
</w:docPr>
</w:wordDocument>

This gives us a basic WordXML file. Of course, a real application-level Word document contains much more than this; for the details, refer to the MS Office SDK.
