Working with RTF documents in C# [reprint]

Source: Internet
Author: User

I am developing a text editor in C# named xwriter, which needs to support RTF documents. I had never worked with RTF before, so I did some quick research. After several days of study and practice I have learned a fair amount about handling RTF documents in C#, so I am writing this article to explain it. I hope it helps others who are learning the RTF document format.

RTF is a document format that Microsoft uses to describe formatted text. It was introduced in the last century and has been in use for many years, and many programs support it, including the Microsoft Office family and Windows WordPad. The Windows clipboard and OLE drag-and-drop also support RTF, so different programs can exchange formatted text with each other through it. For example, if I copy a piece of code from the C# code editor in VS.NET 2003 and paste it into MS Word, the result is syntax-highlighted text. RTF is therefore quite useful, and because it is a plain text format rather than a binary one, it is not hard to read and write.

The RTF document format is similar to HTML, XML, and other markup languages. The principles are not complex, but there is a lot of detail. MSDN has an article that describes the RTF format in detail, at ms-help://MS.MSDNQTR.2003FEB.2052/dnrtfspec/html/rtfspec.htm. If you open an RTF document in Notepad, you will see that it is plain text, and generally all ANSI characters. RTF documents are normally stored in ANSI encoding, so Chinese characters and other characters with a code greater than 127 must be stored as escape sequences. In an RTF document, a pair of curly braces "{}" defines a group, and groups can be nested. A backslash "\" introduces a command or an escape character. A group can also contain plain text data. All commands and escape characters must be inside a group. An RTF document has exactly one root group, much like the XML rule that allows only one root node.
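The grouping rules can be checked mechanically. The following is a minimal sketch, not part of any existing library: the class and method names are made up for illustration, and it assumes the whole document is already in a string and contains no raw binary data. It verifies that the braces balance and that everything sits inside a single root group.

```csharp
public static class RtfGroups
{
    // Returns true when the braces balance and all content sits inside one root group.
    public static bool HasSingleRootGroup(string rtf)
    {
        int depth = 0;   // current nesting level
        int roots = 0;   // number of top-level groups seen so far
        for (int i = 0; i < rtf.Length; i++)
        {
            char c = rtf[i];
            if (c == '\\')
            {
                if (depth == 0) return false;   // command or escape outside any group
                i++;                            // skip the character after the backslash (e.g. an escaped brace)
            }
            else if (c == '{')
            {
                if (depth == 0) roots++;
                depth++;
            }
            else if (c == '}')
            {
                depth--;
                if (depth < 0) return false;    // closing brace without an opening one
            }
            else if (depth == 0 && !char.IsWhiteSpace(c))
            {
                return false;                   // plain text outside the root group
            }
        }
        return depth == 0 && roots == 1;
    }
}
```

The character after each backslash is skipped so that escaped braces are not counted as group delimiters.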

Let us use Windows WordPad to create a new RTF document, type only the text "Hellow", set the text color to blue, and save it. Opening the saved file in Notepad shows what one of the simplest RTF documents looks like:

{
    \rtf1\ansi\ansicpg936\deff0\deflang1033\deflangfe2052
    {\fonttbl
        {\f0\fmodern\fprq6\fcharset134 \'cb\'ce\'cc\'e5;}
    }
    {\colortbl ;\red0\green0\blue255;}
    {\*\generator Msftedit 5.41.15.1507;}
    \viewkind4\uc1\pard\cf1\lang2052\f0\fs20 Hellow\cf0\par
}

The code above is indented for readability. In a real RTF document, whitespace can affect how the document displays, so in general do not add extra whitespace when generating RTF.

The first and last lines of the RTF code are the curly braces of the root group. Inside it come commands that start with "\". A command name consists entirely of English letters, and any digits that follow the name are the command's parameter. For example, in "\rtf1" the command name is "rtf" and the parameter value is 1, while "\ansi" has the command name "ansi" and no parameter.
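Splitting a command into its name and parameter can be sketched as follows. This is only an illustration: the names RtfControlWord and TryParse are hypothetical, the input is assumed to be a single command that has already been cut out of the document, and the leading minus sign is an allowance taken from the RTF specification that this article does not discuss.

```csharp
using System.Globalization;

public static class RtfControlWord
{
    // Splits a command such as "\rtf1" into its name ("rtf") and parameter (1).
    // hasParam is false for commands such as "\ansi" that carry no digits.
    public static bool TryParse(string token, out string name, out int param, out bool hasParam)
    {
        name = null;
        param = 0;
        hasParam = false;
        if (string.IsNullOrEmpty(token) || token[0] != '\\') return false;

        int i = 1;
        while (i < token.Length && IsLetter(token[i])) i++;
        if (i == 1) return false;                  // no letters after the backslash
        if (i == token.Length)                     // a name with no parameter
        {
            name = token.Substring(1);
            return true;
        }

        int paramStart = i;
        if (token[i] == '-') i++;                  // assumption: a minus sign may precede the digits
        int firstDigit = i;
        while (i < token.Length && token[i] >= '0' && token[i] <= '9') i++;
        if (i == firstDigit || i != token.Length) return false;   // no digits, or trailing characters
        if (!int.TryParse(token.Substring(paramStart), NumberStyles.AllowLeadingSign,
                          CultureInfo.InvariantCulture, out param))
            return false;                          // the digits do not fit in an int

        name = token.Substring(1, paramStart - 1);
        hasParam = true;
        return true;
    }

    private static bool IsLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

A real parser would find the end of each command while scanning the document. Here that step is left out to keep the name and parameter rule in focus.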

The command "\rtf" is required in every RTF document and is always the first command, so it can be treated as the file header signature. If the first command of a document is not "\rtf", the document can be considered invalid.
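A quick validity check based on this rule might look like the sketch below. The class and method names are hypothetical, and the check assumes that the content begins directly with the root group, with no leading whitespace.

```csharp
using System;

public static class RtfHeader
{
    // True when the content opens the root group and its first command is \rtf.
    public static bool StartsWithRtfCommand(string content)
    {
        const string signature = "{\\rtf";
        if (content == null || !content.StartsWith(signature, StringComparison.Ordinal))
            return false;
        if (content.Length == signature.Length) return true;
        // The command name must end here: a following letter would make it a different command.
        char next = content[signature.Length];
        bool isLetter = (next >= 'a' && next <= 'z') || (next >= 'A' && next <= 'Z');
        return !isLetter;
    }
}
```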

The command "\ansicpg" specifies the character encoding of the content of the RTF document; its parameter is the code page number. For example, "\ansicpg936" means code page 936, which in C# is the encoding returned by the library function System.Text.Encoding.GetEncoding(936), that is, GB2312. The RTF file itself must be saved as standard ANSI text; the encoding specified here is used to interpret the escape characters in the document. For example, the sample contains the consecutive escapes \'cb\'ce\'cc\'e5. When a program parses the document, it should build a byte array from this run of escapes, containing 0xcb, 0xce, 0xcc, 0xe5, and then call the GetString(byte[]) function of the code page 936 encoding object to recover the stored string, which is "宋体". This is more complicated than handling escapes in HTML. In HTML one escape defines one character, whereas in RTF one escape defines one byte and a Chinese character is encoded as two bytes, so collect the complete byte sequence before converting.
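The decoding procedure can be sketched as follows. The names RtfHexEscapes and Decode are hypothetical, and the sketch handles only \'hh escapes mixed with plain characters; it does not interpret any other commands. Consecutive escapes are gathered into one byte run and converted together, so that multi-byte characters are not split.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RtfHexEscapes
{
    // Decodes text made of \'hh escapes and plain characters using the given encoding.
    public static string Decode(string rtfText, Encoding encoding)
    {
        List<byte> pending = new List<byte>();   // bytes of the current run of escapes
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < rtfText.Length)
        {
            bool isEscape = i + 3 < rtfText.Length
                && rtfText[i] == '\\' && rtfText[i + 1] == '\''
                && Uri.IsHexDigit(rtfText[i + 2]) && Uri.IsHexDigit(rtfText[i + 3]);
            if (isEscape)
            {
                pending.Add(Convert.ToByte(rtfText.Substring(i + 2, 2), 16));
                i += 4;
            }
            else
            {
                Flush(pending, result, encoding);   // a run of escapes has ended
                result.Append(rtfText[i]);
                i++;
            }
        }
        Flush(pending, result, encoding);
        return result.ToString();
    }

    // Converts the collected bytes to text in one call.
    private static void Flush(List<byte> pending, StringBuilder result, Encoding encoding)
    {
        if (pending.Count == 0) return;
        result.Append(encoding.GetString(pending.ToArray()));
        pending.Clear();
    }
}
```

With code page 936, a call such as RtfHexEscapes.Decode(@"\'cb\'ce\'cc\'e5", Encoding.GetEncoding(936)) should give back the font name from the sample. On .NET Core and later, code page 936 is available only after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.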

The command "\fonttbl" defines a list of all the fonts used in the document. The text content refers to this list to get the font used for display, much as an HTML document refers to CSS styles. The fonttbl group contains several subgroups, each defining one font. The first command in a font definition group is "\f", whose parameter is the font number: "\f0" means font number 0 and "\f1" means font number 1. The group also holds other information about the font, the most important being the font name at the end. In the sample document the font name is "\'cb\'ce\'cc\'e5;", which decodes to "宋体;". Note the trailing semicolon. Note also that font numbers need not be consecutive. A font table such as "{\f0 ...}{\f1 ...}{\f99 ...}{\f212 ...}" is possible, so take this into account when parsing the font table.
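Because font numbers may have gaps, a dictionary keyed by font number is a safer container than an array or list indexed by position. The sketch below assumes each font subgroup has already been extracted as a string without its braces. RtfFontTable, TryParseEntry, and Build are hypothetical names, and the font name is returned as written in the file, so any \'hh escapes in it still need decoding.

```csharp
using System.Collections.Generic;
using System.Globalization;

public static class RtfFontTable
{
    // Parses the inside of one font group, e.g. @"\f0\fmodern\fcharset134 SimSun;".
    // Returns false when no \fN command is found.
    public static bool TryParseEntry(string entry, out int number, out string name)
    {
        number = -1;
        name = null;
        int i = 0;
        // Read the leading commands; stop at plain text or at a \'hh escape.
        while (i + 1 < entry.Length && entry[i] == '\\' && IsLetter(entry[i + 1]))
        {
            int wordStart = ++i;
            while (i < entry.Length && IsLetter(entry[i])) i++;
            string word = entry.Substring(wordStart, i - wordStart);
            int paramStart = i;
            if (i < entry.Length && entry[i] == '-') i++;
            while (i < entry.Length && entry[i] >= '0' && entry[i] <= '9') i++;
            int value;
            if (word == "f" && int.TryParse(entry.Substring(paramStart, i - paramStart),
                                            NumberStyles.None, CultureInfo.InvariantCulture, out value))
                number = value;
            if (i < entry.Length && entry[i] == ' ') i++;   // the space that ends a command
        }
        if (number < 0) return false;
        name = entry.Substring(i).TrimEnd(';');             // drop the trailing semicolon
        return true;
    }

    // Builds a lookup from font number to font name; the numbers may be discontinuous.
    public static Dictionary<int, string> Build(IEnumerable<string> entries)
    {
        Dictionary<int, string> fonts = new Dictionary<int, string>();
        foreach (string entry in entries)
        {
            int number;
            string name;
            if (TryParseEntry(entry, out number, out name))
                fonts[number] = name;
        }
        return fonts;
    }

    private static bool IsLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```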

The command "\colortbl" defines the document's color table. An RTF document refers to colors in a uniform way: the text color and background color settings in the content all refer to this table. The table defines only the RGB values of each color, without explicit numbers. The entries are separated by semicolons and are numbered by position from left to right. The table here begins with an empty entry (the lone ";"), which takes number 0 and stands for the default color, so the first color actually defined is number 1. Here that color is "\red0\green0\blue255", that is, pure blue.
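The positional numbering can be illustrated with a small sketch. It assumes the text after "\colortbl" has already been extracted, and RtfColorTable and Parse are hypothetical names. Each semicolon-terminated entry becomes one slot, and an entry with no color commands is stored as null to stand for the default color.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RtfColorTable
{
    // Returns one slot per entry; each color is { red, green, blue }, and null marks the default color.
    public static List<int[]> Parse(string body)
    {
        List<int[]> colors = new List<int[]>();
        string[] parts = body.Split(';');
        // Whatever follows the final ';' is not an entry, so the last piece is skipped.
        for (int i = 0; i < parts.Length - 1; i++)
        {
            MatchCollection matches = Regex.Matches(parts[i], @"\\(red|green|blue)([0-9]+)");
            if (matches.Count == 0)
            {
                colors.Add(null);
                continue;
            }
            int[] rgb = new int[3];
            foreach (Match m in matches)
            {
                int value = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
                switch (m.Groups[1].Value)
                {
                    case "red": rgb[0] = value; break;
                    case "green": rgb[1] = value; break;
                    case "blue": rgb[2] = value; break;
                }
            }
            colors.Add(rgb);
        }
        return colors;
    }
}
```

For the sample document, Parse(@";\red0\green0\blue255;") yields two slots, so the list index matches the number used by "\cf1".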

The command "\*\generator" names the program that created the document. The way it is written is unusual: it uses the prefix "\*\". As I understand it, this marks an extended command that other RTF processing programs may ignore if they do not recognize it.

The commands that follow describe the body text. "\pard" resets the current paragraph settings to the default format. "\f0" sets the current font to font 0 in the font table. "\fs20" sets the font size to 20, where the unit is half a point (MSDN puts it this way: "Font size in half-points (the default is 24)"), so this is 10-point text. "\cf1" makes the current text use color 1, which is blue. The plain text "Hellow" is the text content of the document.

Most English text can be written to an RTF document as is, but a few special characters must be escaped. The characters "\", "{", and "}" need the escape prefix "\", so what is actually written is "\\", "\{", and "\}", much like escape handling in C. A tab must be written as "\tab". For characters with a code above 255, such as Chinese characters, use the content encoding to convert the text to bytes, and then write each byte with the escape prefix "\'". For example, GB2312 encodes "宋体" as the byte sequence 0xcb, 0xce, 0xcc, 0xe5, so what is written to the RTF document is "\'cb\'ce\'cc\'e5".
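These escaping rules can be sketched as a standalone function. This is an illustration rather than the code of the writer presented in this article: RtfText and Escape are hypothetical names, every character outside printable ASCII goes through the supplied encoding, line breaks are not translated into paragraph commands, and characters stored as surrogate pairs are not handled.

```csharp
using System.Text;

public static class RtfText
{
    // Escapes plain text for use inside an RTF group.
    public static string Escape(string text, Encoding encoding)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\t')
            {
                sb.Append("\\tab ");             // the space ends the command
            }
            else if (c == '\\' || c == '{' || c == '}')
            {
                sb.Append('\\').Append(c);       // special characters get a backslash prefix
            }
            else if (c >= 32 && c < 127)
            {
                sb.Append(c);                    // printable ASCII is written as is
            }
            else
            {
                // Everything else is converted to bytes and written one \'hh escape per byte.
                foreach (byte b in encoding.GetBytes(c.ToString()))
                    sb.Append("\\'").Append(b.ToString("x2"));
            }
        }
        return sb.ToString();
    }
}
```

The encoding passed in should match the code page declared by "\ansicpg" in the document header.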

Images can be embedded in an RTF document with a "{\pict ...}" group, which contains the image's binary data as a hexadecimal string. MSDN says little about the RTF image format, and I am not clear about the format of some of the image data, so I have little to say about how to process RTF images.
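The hexadecimal part, at least, is simple to produce. The sketch below covers only the conversion of bytes to a hexadecimal string; it says nothing about the commands that must accompany the data inside the image group. RtfHex and ToHex are hypothetical names, and the optional line breaks are there only to keep the output readable.

```csharp
using System.Text;

public static class RtfHex
{
    // Converts binary data to lowercase hexadecimal text, two digits per byte.
    // When bytesPerLine is greater than zero, a line break is added after that many bytes.
    public static string ToHex(byte[] data, int bytesPerLine)
    {
        const string digits = "0123456789abcdef";
        StringBuilder sb = new StringBuilder(data.Length * 2);
        for (int i = 0; i < data.Length; i++)
        {
            if (i > 0 && bytesPerLine > 0 && i % bytesPerLine == 0)
                sb.Append('\n');
            sb.Append(digits[data[i] >> 4]);     // high four bits
            sb.Append(digits[data[i] & 0xf]);    // low four bits
        }
        return sb.ToString();
    }
}
```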

For detailed descriptions of the various commands, refer to the relevant article in MSDN, at "ms-help://MS.MSDNQTR.2003FEB.2052/dnrtfspec/html/rtfspec_16.htm#rtfspec_21".

Once you understand the RTF document format, you can start writing programs that work with RTF documents. This amounts to piecing together strings in RTF format. For example, my text editor can save the edited content as RTF, and to do that it has to generate an RTF document from the content of my own document.

The first step is to create an RTF document writer. Although generating an RTF document can be seen as string concatenation, in practice you should not literally concatenate strings by hand; instead, build a writer modeled on System.Xml.XmlWriter. I have written an RTF document writer named RTFWriter. It handles the basic rules of the RTF format internally to make sure the output is a correct RTF document, and it offers a convenient programming interface for other modules to call. The complete C# code of the writer is as follows:

/// <summary>RTF document writer.</summary>
/// <remarks>
/// Provides basic support for generating RTF documents.
/// Written by Yuan Yongfu, http://www.xdesigner.cn
/// </remarks>
public class RTFWriter : System.IDisposable {
    #region Test code ******************************************************

    [System.STAThread]
    static void Main() {
        TestWriteFile();
        TestClipboard();
    }

    /// <summary>Tests generating an RTF file.</summary>
    /// <remarks>After this function runs, the file c:\a.rtf can be opened in MS Word.</remarks>
    internal static void TestWriteFile() {
        RTFWriter w = new RTFWriter(@"c:\a.rtf");
        TestBuildRTF(w);
        w.Close();
        System.Windows.Forms.MessageBox.Show(@"Done. You can now open the file c:\a.rtf.");
    }

    /// <summary>Tests generating an RTF document and placing it on the system clipboard.</summary>
    /// <remarks>After this function runs, pasting in MS Word shows the generated document.</remarks>
    internal static void TestClipboard() {
        System.IO.StringWriter myStr = new System.IO.StringWriter();
        RTFWriter w = new RTFWriter(myStr);
        TestBuildRTF(w);
        w.Close();
        System.Windows.Forms.DataObject data = new System.Windows.Forms.DataObject();
        data.SetData(System.Windows.Forms.DataFormats.Rtf, myStr.ToString());
        System.Windows.Forms.Clipboard.SetDataObject(data, true);
        System.Windows.Forms.MessageBox.Show("Done. You can now paste the text into MS Word.");
    }

    /// <summary>Builds a test RTF document.</summary>
    /// <param name="w">RTF document writer</param>
    private static void TestBuildRTF(RTFWriter w) {
        w.Encoding = System.Text.Encoding.GetEncoding(936);
        // Output the file header
        w.WriteStartGroup();
        w.WriteKeyword("rtf1");
        w.WriteKeyword("ansi");
        w.WriteKeyword("ansicpg" + w.Encoding.CodePage);
        // Output the font table.
        // The two font names are placeholders; substitute any installed fonts.
        w.WriteStartGroup();
        w.WriteKeyword("fonttbl");
        w.WriteStartGroup();
        w.WriteKeyword("f0");
        w.WriteText("宋体;");
        w.WriteEndGroup();
        w.WriteStartGroup();
        w.WriteKeyword("f1");
        w.WriteText("黑体;");
        w.WriteEndGroup();
        w.WriteEndGroup();
        // Output the color table
        w.WriteStartGroup();
        w.WriteKeyword("colortbl");
        w.WriteText(";");
        w.WriteKeyword("red0");
        w.WriteKeyword("green0");
        w.WriteKeyword("blue255");
        w.WriteText(";");
        w.WriteEndGroup();
        // Output the body
        w.WriteKeyword("qc");      // center alignment
        w.WriteKeyword("f0");      // set the font
        w.WriteKeyword("fs30");    // font size
        w.WriteText("This is the first line of text ");
        w.WriteKeyword("cf1");     // set the color
        w.WriteText("colored text ");   // placeholder sample text, shown in color 1
        w.WriteKeyword("cf0");     // back to the default color
        w.WriteKeyword("f1");      // set the font
        w.WriteText("center-aligned abc12345");
        w.WriteKeyword("par");     // start a new paragraph
        w.WriteKeyword("pard");    // clear the center alignment
        w.WriteKeyword("f1");      // set the font
        w.WriteKeyword("fs20");    // font size
        w.WriteKeyword("cf1");
        w.WriteText("This is the second line of text, left-aligned abc12345");
        // End the output
        w.WriteEndGroup();
    }

    #endregion

    /// <summary>Initializes the object.</summary>
    /// <param name="w">Text writer</param>
    public RTFWriter(System.IO.TextWriter w) {
        myWriter = w;
    }

    /// <summary>Initializes the object.</summary>
    /// <param name="strFileName">File name</param>
    public RTFWriter(string strFileName) {
        myWriter = new System.IO.StreamWriter(strFileName, false, System.Text.Encoding.ASCII);
    }

    private System.Text.Encoding myEncoding = System.Text.Encoding.GetEncoding(936);
    /// <summary>Character encoding of the content</summary>
    public System.Text.Encoding Encoding {
        get { return myEncoding; }
        set { myEncoding = value; }
    }

    /// <summary>The underlying text writer</summary>
    private System.IO.TextWriter myWriter = null;

    private bool bolIndent = false;
    /// <summary>Whether to use indentation.</summary>
    /// <remarks>
    /// An RTF document cannot be indented freely. This option exists only to produce
    /// readable RTF for debugging. Set it to true during development so that the
    /// generated RTF can be inspected directly, and to false in the released program.
    /// </remarks>
    public bool Indent {
        get { return bolIndent; }
        set { bolIndent = value; }
    }

    private string strIndentString = "   ";
    /// <summary>Indent string</summary>
    public string IndentString {
        get { return strIndentString; }
        set { strIndentString = value; }
    }

    /// <summary>Current group nesting level</summary>
    private int intGroupLevel = 0;

    /// <summary>Closes the object.</summary>
    public void Close() {
        if (this.intGroupLevel > 0)
            throw new System.Exception("A group has not been closed");
        if (myWriter != null) {
            myWriter.Close();
            myWriter = null;
        }
    }

    /// <summary>Outputs a group that contains a single keyword.</summary>
    /// <param name="keyword">Keyword</param>
    public void WriteGroup(string keyword) {
        this.WriteStartGroup();
        this.WriteKeyword(keyword);
        this.WriteEndGroup();
    }

    /// <summary>Starts a group.</summary>
    public void WriteStartGroup() {
        if (bolIndent) InnerWriteNewLine();
        InnerWrite("{");
        intGroupLevel++;
    }

    /// <summary>Ends a group.</summary>
    public void WriteEndGroup() {
        intGroupLevel--;
        if (intGroupLevel < 0)
            throw new System.Exception("Group mismatch");
        if (bolIndent) InnerWriteNewLine();
        InnerWrite("}");
    }

    /// <summary>Outputs raw text without escaping.</summary>
    /// <param name="txt">Text value</param>
    public void WriteRaw(string txt) {
        if (txt != null && txt.Length > 0)
            InnerWrite(txt);
    }

    /// <summary>Outputs a keyword.</summary>
    /// <param name="keyword">Keyword value</param>
    public void WriteKeyword(string keyword) {
        WriteKeyword(keyword, false);
    }

    /// <summary>Outputs a keyword.</summary>
    /// <param name="keyword">Keyword value</param>
    /// <param name="ext">Whether it is an extended keyword</param>
    public void WriteKeyword(string keyword, bool ext) {
        if (keyword == null || keyword.Length == 0)
            throw new System.ArgumentNullException("keyword", "The value cannot be empty");
        if (keyword == "par" || keyword == "pard") {
            if (bolIndent) {
                InnerWriteNewLine();
            } else {
                // A line break before par or pard does not affect how the RTF document displays
                InnerWrite(System.Environment.NewLine);
            }
        }
        InnerWrite(ext ? "\\*\\" : "\\");
        InnerWrite(keyword);
    }

    /// <summary>Encoding of the content text</summary>
    private System.Text.Encoding Unicode = System.Text.Encoding.Unicode;

    /// <summary>Outputs plain text.</summary>
    /// <param name="text">Text value</param>
    public void WriteText(string text) {
        if (text == null || text.Length == 0) return;
        InnerWrite(' ');   // ends the preceding keyword
        for (int iCount = 0; iCount < text.Length; iCount++) {
            char c = text[iCount];
            if (c == '\t') {
                this.WriteKeyword("tab");
                InnerWrite(' ');
            } else if (c < 256) {
                if (c > 32 && c < 127) {
                    // Special characters must be escaped with a backslash
                    if (c == '\\' || c == '{' || c == '}')
                        InnerWrite('\\');
                    InnerWrite(c);
                } else {
                    // Spaces, control characters and codes 127 to 255 are written as one escaped byte
                    InnerWrite("\\'");
                    WriteByte((byte)c);
                }
            } else {
                // Other characters are converted to bytes with the content encoding
                byte[] bs = myEncoding.GetBytes(c.ToString());
                for (int iCount2 = 0; iCount2 < bs.Length; iCount2++) {
                    InnerWrite("\\'");
                    WriteByte(bs[iCount2]);
                }
            }
        }
    }

    /// <summary>Current position</summary>
    private int intPosition = 0;
    /// <summary>Position of the start of the current line</summary>
    private int intLineHead = 0;
    /// <summary>Hexadecimal digits</summary>
    private const string Hexs = "0123456789abcdef";

    /// <summary>Outputs a byte array as hexadecimal text.</summary>
    /// <param name="bs">Byte array</param>
    public void WriteBytes(byte[] bs) {
        if (bs == null || bs.Length == 0) return;
        WriteRaw(" ");
        for (int iCount = 0; iCount < bs.Length; iCount++) {
            if ((iCount % 32) == 0) {
                this.WriteRaw(System.Environment.NewLine);
                this.WriteIndent();
            } else if ((iCount % 8) == 0) {
                this.WriteRaw(" ");
            }
            WriteByte(bs[iCount]);
        }
    }

    /// <summary>Outputs one byte as two hexadecimal digits.</summary>
    public void WriteByte(byte b) {
        int h = (b & 0xf0) >> 4;
        int l = b & 0xf;
        myWriter.Write(Hexs[h]);
        myWriter.Write(Hexs[l]);
        intPosition += 2;
        //FixIndent();
    }

    #region Internal members ***********************************************

    private void InnerWrite(char c) {
        intPosition++;
        myWriter.Write(c);
    }

    private void InnerWrite(string txt) {
        intPosition += txt.Length;
        myWriter.Write(txt);
    }

    private void FixIndent() {
        if (this.bolIndent && intPosition - intLineHead > 100)
            InnerWriteNewLine();
    }

    private void InnerWriteNewLine() {
        if (this.bolIndent && intPosition > 0) {
            InnerWrite(System.Environment.NewLine);
            intLineHead = intPosition;
            WriteIndent();
        }
    }

    private void WriteIndent() {
        if (bolIndent) {
            for (int iCount = 0; iCount < intGroupLevel; iCount++)
                InnerWrite(this.strIndentString);
        }
    }

    #endregion

    /// <summary>Destroys the object.</summary>
    public void Dispose() {
        this.Close();
    }
}

 

You can create a C# project in VS.NET, delete the automatically generated Main() function, paste in the code above, and then compile and run the program. The test code uses System.Windows.Forms, so the project needs a reference to that assembly.

On top of RTFWriter you can build your own RTF applications, for example exporting database data to RTF documents or passing data to other programs in RTF format. The xwriter text editor that I am developing also uses RTFWriter to save edited documents as RTF. In fact, this article was written entirely in xwriter and then exported as HTML, without using MS Word, FrontPage, or any other document editor. The code in the article was copied and pasted directly from the VS.NET C# code editor.

This article gives only a brief introduction to working with RTF documents. For details, refer to the RTF documentation in MSDN; there are even more resources online. The RTF format is simple in principle but rich in detail. It is a very old technology, yet it is still widely used today and will probably remain in use for a long time. While we are busy learning the new technologies that keep emerging, it is also worth paying attention to old technologies that have stood the test of time.
