Course description
This course describes how to use C# for XML development. This section describes how to use the System.Xml namespace to read, save, and process XML documents.
Introduction to XML
Basic XML specification
XML stands for Extensible Markup Language. It is a text-based data storage format standardized by the W3C. It is derived from SGML, a technology with roots at IBM; HTML is also derived from SGML. SGML is very complex, whereas XML uses roughly 20% of SGML's syntax to deliver roughly 80% of its functionality.
From a software developer's perspective, the main characteristics of XML are:
- XML is an international standard. Most software vendors, development tools, and programming languages support the same basic XML specification, so XML documents can be used on any development platform. This is the biggest advantage of XML. Technologies that are not international standards, such as Java, depend on vendor-provided runtimes to work across platforms.
- XML is plain text. An XML document cannot contain raw binary data; binary content must first be encoded as text, for example with Base64. Storing a file also involves choosing a text encoding.
- An XML document has a hierarchical structure. Tags written in angle brackets delimit an XML element. An element can carry several attributes and can contain several child nodes.
- An XML document must define exactly one root element. It cannot omit the root element or define more than one.
- XML elements must be properly nested and cannot overlap. For example, "<a><b></a></b>" is not a well-formed XML document.
- XML was designed for exchanging data between systems. Its design favors convenient temporary storage and exchange of data rather than long-term storage. XML documents are therefore verbose: the files are large, which makes XML poorly suited to storing large volumes of data and inefficient for network transmission. Keep this in mind during software development.
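The root-element and nesting rules in the list above can be checked by letting a parser load the text. The following is a minimal sketch, not part of the course's demo program; the class name WellFormedCheck and the sample strings are made up for illustration.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Returns true when the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml); // throws XmlException on any syntax error
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Under these assumptions, IsWellFormed("<a><b></b></a>") returns true, while both "<a><b></a></b>" (overlapping elements) and "<a/><b/>" (two root elements) are rejected.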
XPath
XPath is a W3C standard for quickly searching for and locating nodes in an XML document. It is covered in detail in the next course.
XSLT
XSLT is a W3C standard, itself based on XML, for transforming XML documents. It is a very important, cross-platform application of XML that many software vendors support. The next course introduces XSLT in detail.
W3C
What is the W3C?
The W3C (World Wide Web Consortium) is an international organization in which most major software companies work together to develop important software industry standards. Its members include software giants such as Microsoft, IBM, and Sun. It develops and maintains important standards such as HTML, XHTML, XML, XPath, and XForms, and it worked with the IETF on HTTP. Most software vendors support W3C standards, so the standards it sets are truly cross-platform and used worldwide. It therefore has a huge influence on the global software industry, especially on web software. Its website is http://www.w3c.org/. On the website you can see the many standards it has defined. If you want to develop web applications of international quality, you should study some W3C standards.
Significance of international standards
This section describes why international standards matter.
An international standard is issued by an authoritative, non-profit international organization. The organization is neutral: it represents the entire industry rather than a specific company. It publishes specifications and standards for a widely used technology, and software vendors voluntarily follow them when they use that technology. This facilitates data exchange between systems, enables the integration of heterogeneous systems, and keeps data structures stable and compatible over the long term. Such organizations include ISO, ECMA, and the W3C.
Many of the technologies we use have become international standards, including SQL, JavaScript, C#, HTML, XML, XSLT, and HTTP.
International standards have several characteristics. First, they are stable and consistent. Once a standard is officially released, it remains largely stable: features are added with caution and are rarely removed. Standards organizations do not modify released standards lightly, and when they do, they weigh many factors to preserve compatibility between old and new versions, which protects the industry's investment in the old standard. These organizations also sometimes announce revision plans in advance when they release a standard.
Second, international standards are followed by the global industry. Although there is no enforcement mechanism, most software vendors comply or strive to comply with them. In addition, many major software vendors are members of the standards organizations. For example, W3C members include Microsoft, IBM, Sun, and other large companies. International standards therefore reflect the interests of most of the software industry and its most advanced practice.
For application software developers, making full use of international standards goes a long way toward protecting customers' investment in IT systems. Because international standards are stable and consistent, a customer's IT system that is built on them remains compatible when it is upgraded to a newer version of the standard. The system does not need to be rebuilt, which protects the customer's investment in the existing system.
As software developers, we should understand these international standards. First, they make it easier to integrate heterogeneous systems and to achieve better compatibility and maintainability. In addition, when developers switch development platforms, for example by migrating from Java to the .NET platform, the effort they previously invested in learning international standards is protected, and porting source code that complies with the same standards is also inexpensive.
XML support in the .NET Framework
The .NET Framework provides strong support for XML, and the framework itself stores much of its configuration information in XML format, for example in the web.config file.
In the .NET class library, the System.Xml namespace contains a large number of types for working with XML documents. These types make up two processing models for XML documents.
Stream processing model
In the stream processing model, the XML document is treated as a data stream and its content is processed piece by piece, in order. This model reads large XML documents quickly in a forward-only, read-only manner, with low memory usage and good performance. The System.Xml.XmlReader type implements the stream processing model.
The stream processing model has some disadvantages. First, it can only read XML documents; it cannot modify them. Second, searching is inconvenient because XPath cannot be used. Third, the programming interface is fairly low-level, which makes complex processing awkward. The stream processing model is a good fit when the program only needs to read data from an XML document.
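As an illustration of the stream model, the sketch below reads a document in a single forward pass and collects the text of every element with a given name. It is a hedged example only: the class StreamReadSketch, the method ReadNames, and the element name "name" are assumptions chosen for the demonstration, not part of the course's demo program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamReadSketch
{
    // Collects the text of every <name> element in one forward-only pass.
    public static List<string> ReadNames(string xml)
    {
        List<string> names = new List<string>();
        string current = null; // name of the element whose content is being read
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        current = reader.Name;
                        break;
                    case XmlNodeType.Text:
                        if (current == "name")
                        {
                            names.Add(reader.Value);
                        }
                        break;
                    case XmlNodeType.EndElement:
                        current = null;
                        break;
                }
            }
        }
        return names;
    }
}
```

The reader never holds more than the current node, which is why memory use stays low; the price is that the code has to track its own position (here, the current variable) instead of querying a tree.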
DOM processing model
In the DOM processing model, the entire XML document is first parsed according to the Document Object Model, and an object tree that represents the document is built in memory. For example, an XmlElement object represents an element in the XML document and an XmlAttribute object represents an attribute. The program then manipulates the XML document by working with these in-memory objects.
Processing XML documents with the DOM has considerable advantages. First, it is easy to work with. Any programming technique can be applied to the object tree: for example, you can recursively traverse part or all of the document, insert, modify, or delete elements in the tree, and set the attributes of elements.
In the DOM model, XPath can be used to quickly search for and locate nodes in the tree, which makes XML documents easier to process.
In C#, processing XML documents with the DOM is straightforward. First instantiate a System.Xml.XmlDocument object, then call its Load method to load the XML document and build the tree of XML node objects. You can then traverse the tree and add, modify, and delete nodes. In addition, any node can call SelectNodes or SelectSingleNode to quickly find other nodes through an XPath expression relative to itself.
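The steps just described can be sketched as follows. This is an illustrative example under stated assumptions: the class DomSketch, the method EditSample, and the sample document are invented for the demonstration, and LoadXml is used instead of Load so that the example needs no file on disk.

```csharp
using System;
using System.Xml;

public static class DomSketch
{
    // Loads a small sample document, then modifies, adds, and deletes nodes.
    public static XmlDocument EditSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<table>" +
            "<record id='1'><name>Ann</name></record>" +
            "<record id='2'><name>Bob</name></record>" +
            "</table>");

        // Modify: locate one node with an XPath expression and change its text.
        XmlNode name = doc.SelectSingleNode("/table/record[@id='2']/name");
        name.InnerText = "Robert";

        // Add: create a new element and attach it to the root element.
        XmlElement extra = doc.CreateElement("record");
        extra.SetAttribute("id", "3");
        doc.DocumentElement.AppendChild(extra);

        // Delete: remove the first record through its parent node.
        XmlNode first = doc.SelectSingleNode("/table/record[@id='1']");
        first.ParentNode.RemoveChild(first);

        return doc;
    }
}
```

After EditSample runs, a relative query such as doc.DocumentElement.SelectNodes("record") finds two records, the ones with id 2 and id 3.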
In the System.Xml namespace, most types exist to support the DOM processing model. Together they form the XML DOM, a typical application of the Document Object Model. The Document Object Model is a relatively advanced software design pattern. I will introduce it in detail in future courses.
The main DOM types in the System.Xml namespace are:
- XmlNode is the base type of all types in the DOM structure. It defines the properties and methods common to all XML nodes and is the foundation of the XML DOM. Its ChildNodes property lists the child XML nodes it contains.
- XmlAttribute represents an XML attribute. It is stored only in the Attributes collection of an XmlElement.
- XmlDocument represents the XML document itself and is the top-level object in the XML DOM. It controls the XML document as a whole and is the only entry point through which other code accesses the XML document object tree.
- XmlLinkedNode extends XmlNode with access to the sibling nodes immediately before and after a node.
- XmlElement represents an XML element and is the most commonly used type in the XML DOM. Its Attributes property gives access to its attributes, its ChildNodes property returns all its child nodes, and it provides methods to add and delete child nodes.
- XmlCharacterData is the base type for character data in an XML document. Character data is the plain text located between elements. Text inside an XmlAttribute value is not part of these text nodes.
- XmlCDataSection represents a CDATA section in the XML document. A CDATA section encloses plain text between "<![CDATA[" and "]]>". Because XML markup uses angle brackets, it has escape sequences similar to those of HTML. Special characters such as angle brackets must be escaped when they occur in ordinary text, and when a text segment contains many of them the XML document becomes hard to write and read by hand. A CDATA section improves readability in this case: inside it no character, including the special characters, needs to be escaped, so the document is easier to view and modify.
- XmlComment represents a comment. XML comments are written the same way as HTML comments, between "<!--" and "-->".
- XmlText represents a piece of plain text data.
- XmlWhitespace represents a piece of text in an XML document that consists purely of white space. White space means spaces, tabs, line feeds, and carriage returns; full-width spaces are not white space. XmlDocument handles white space when it parses an XML document. When the PreserveWhitespace property of the XmlDocument object is true, an XmlWhitespace object is created for each white-space-only text block in the document. When the property is false, such blocks are ignored and no XmlWhitespace objects are created, as if the blocks did not exist in the original document.
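Two of the node types listed above, XmlWhitespace and XmlCDataSection, are easiest to understand by loading a small document and inspecting the tree. The sketch below does that; the class NodeTypeSketch and its method names are assumptions made for this illustration.

```csharp
using System;
using System.Xml;

public static class NodeTypeSketch
{
    // Counts the direct children of the root element under the given whitespace setting.
    public static int CountRootChildren(string xml, bool preserveWhitespace)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace; // must be set before loading
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes.Count;
    }

    // Returns the first child node of the root element.
    public static XmlNode FirstRootChild(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.FirstChild;
    }
}
```

For the input "<a>\n  <b/>\n  <c/>\n</a>", CountRootChildren reports 2 children when PreserveWhitespace is false and 5 when it is true (three white-space nodes plus the two elements). For "<a><![CDATA[1 < 2 && x]]></a>", FirstRootChild returns a node of type XmlNodeType.CDATA whose Value is the unescaped text "1 < 2 && x".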
Other processing models
Besides the stream processing model and the DOM processing model, there are some alternative processing models that are rarely used. Here is a brief introduction.
DBDOM
DBDOM is a database-based XML document processing model and an open-source project. It uses a large number of stored procedures and database operations to save XML elements and attributes into database fields, using a relational database to simulate the tree structure of XML. I don't know much about this model beyond its general principle.
BinaryXML
DOM processing of XML documents requires a large amount of memory, and with large XML documents it can seriously hurt application performance. For this reason, a BinaryXML processing model has been proposed. In this model, the XML document is loaded into memory as binary data and then parsed, and a large number of pointers are set to key positions in the document. Through these pointers the program can quickly locate content in the XML document and modify it, and the model offers a DOM-like programming interface. This approach saves a great deal of memory: the memory consumed is only slightly larger than the XML file itself. However, I do not know how well it works in practice.
Significance of XML for Web Development
XML technology is of great significance for web development. Developing high-quality web systems calls for XML technology.
XML and HTML
Both XML and HTML derive from SGML. Both are markup languages that use angle brackets, so they are very similar. XML can be used to express HTML: XHTML, a W3C standard used by modern websites, is the combination of XML and HTML.
When developing web systems with ASP.NET, data is displayed not only through ASP.NET controls; a large amount of HTML also has to be assembled by program code. Simply concatenating strings to generate HTML pages does not lead to a sustainable development and maintenance process: the program code easily becomes disorganized and the generated HTML is hard to read. Borrowing from XML technology when generating HTML mitigates this problem, gives better control over the web development process, and improves software quality.
XML and WebService
WebService is based on XML. The principle of a web service is to serialize a programming object into an XML document and pass it to the client over HTTP. The client receives the XML document and reconstructs the programming object through deserialization. WebService is therefore built on XML serialization. Developing and debugging a web service of any complexity requires a solid grounding in XML.
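The serialize-then-deserialize round trip described above can be tried in isolation with System.Xml.Serialization.XmlSerializer. The sketch below covers only that serialization step, not the SOAP envelope or the HTTP transport of a real web service; the Customer type and the class SerializationSketch are hypothetical names introduced for the example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// A simple data object; XmlSerializer requires a public type
// with a public parameterless constructor.
public class Customer
{
    public string Name;
    public int Age;
}

public static class SerializationSketch
{
    // Serializes the object into an XML string.
    public static string ToXml(Customer customer)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, customer);
            return writer.ToString();
        }
    }

    // Rebuilds the object from the XML string.
    public static Customer FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));
        using (StringReader reader = new StringReader(xml))
        {
            return (Customer)serializer.Deserialize(reader);
        }
    }
}
```

Calling FromXml(ToXml(customer)) yields a new Customer whose Name and Age match the original, which is the same round trip a web service performs between server and client.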
Ajax, in its original form, also transmits data as XML at the bottom layer, so it can be regarded as a special kind of web service. A web service is a public interface of a web system, whereas Ajax is a private one.
XML/XSLT provides a new development mode
The combination of XML and XSLT provides a completely new development mode for web systems. In this mode, the page organizes the pure data to be displayed into an XML document, adds an XSL stylesheet processing instruction as a header, and sends the document to the client. The client's IE browser parses the document, downloads the XSLT document named in the processing instruction, executes the XSLT transformation, and displays the result. The web page therefore displays the data in the specified format and is at the same time a web service that other programs can call: the page's output is an XML document, only browsers such as IE act on the stylesheet processing instruction, and other programs ignore it. The page thus serves a dual purpose, which simplifies integrated development and maintenance.
XSLT is described in detail in the next course.
Use C# to output XML
Next we use C# for actual XML development. Because XML technology is particularly useful for web development, ASP.NET is used to demonstrate how to use C# for XML development. The demo program has already been written, and its code is described in detail below.
This program is an ASP.NET program. After obtaining the code, you need to set up a virtual directory in IIS. Because the program also accesses some files in its own directory, you also need to configure the corresponding permissions. demomdb.mdb in the program directory is the database file the program uses.
After the website is configured, enter its address in the browser to open the default page. The home page shows a short description of the program's contents.
First, let's look at the recordxml.aspx page and its HTML code. The HTML code of this page is very simple and consists of a single line, so all the content of the page is generated by C# code.
Switch to the C# code of this page. The code that outputs the page content is in the Page_Load method. The code is:
// XmlTextWriter is used here to output the XML document quickly; no XML document object tree is built.
this.Response.ContentEncoding = System.Text.Encoding.GetEncoding(936);
this.Response.ContentType = "text/xml";
// Connect to the database
using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection())
{
    conn.ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
        + this.Server.MapPath("demomdb.mdb");
    conn.Open();
    // Query the database
    using (System.Data.OleDb.OleDbCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "select * from MERs";
        System.Data.OleDb.OleDbDataReader reader = cmd.ExecuteReader();
        // Obtain all field names
        int fieldCount = reader.FieldCount;
        string[] fieldNames = new string[fieldCount];
        for (int iCount = 0; iCount < fieldCount; iCount++)
        {
            fieldNames[iCount] = reader.GetName(iCount);
        }
        // Create the XML document writer on top of the page output
        System.Xml.XmlTextWriter xmlWriter = new System.Xml.XmlTextWriter(this.Response.Output);
        xmlWriter.Indentation = 3;
        xmlWriter.IndentChar = ' ';
        xmlWriter.Formatting = System.Xml.Formatting.Indented;
        // Start to output the XML document
        xmlWriter.WriteStartDocument();
        // Output the XSLT stylesheet processing instruction
        string strXslRef = this.Request.QueryString["xsl"];
        if (strXslRef != null && strXslRef.Length > 0)
        {
            xmlWriter.WriteProcessingInstruction(
                "xml-stylesheet",
                "type='text/xsl' href='" + strXslRef + "'");
        }
        xmlWriter.WriteStartElement("table");
        while (reader.Read())
        {
            // Output a record
            xmlWriter.WriteStartElement("record");
            for (int iCount = 0; iCount < fieldCount; iCount++)
            {
                // Output a field value
                xmlWriter.WriteStartElement(fieldNames[iCount]);
                object v = reader.GetValue(iCount);
                if (v == null || DBNull.Value.Equals(v))
                {
                    xmlWriter.WriteAttributeString("null", "1");
                }
                else
                {
                    xmlWriter.WriteString(Convert.ToString(v));
                }
                xmlWriter.WriteEndElement();
            }
            xmlWriter.WriteEndElement();
        } // while (reader.Read())
        reader.Close();
        xmlWriter.WriteEndElement();
        xmlWriter.WriteEndDocument();
        xmlWriter.Close();
    } // using (System.Data.OleDb.OleDbCommand cmd = conn.CreateCommand())
} // using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection())
The Page_Load method executes as follows.
Set HTTP output type
First, set the HTTP output type. The output encoding is GB2312; GetEncoding(936) returns the GB2312 encoding (code page 936).
We also set ContentType to specify the output format of the document. As those familiar with HTTP will know, the ContentType property describes the type of the document being output. When the document is delivered to the client, the client browser reads the ContentType value, looks up the COM and file-type information registered in the Windows registry, determines the file type that the value denotes, and then displays the document in the corresponding way. For example, if ContentType is set to application/vnd.ms-excel, the browser queries the registry, finds the matching file type information under HKEY_CLASSES_ROOT\.xls, learns that the local file type is "Excel.Sheet.8", and then uses further registry information to invoke the Excel COM component to display the received HTTP document.
This description of the ContentType property shows that a deep understanding of web development sometimes requires some Windows programming knowledge, because the clients of a B/S (browser/server) system, that is, the various browsers and IE in particular, are complicated Windows programs.
Query a database
After setting the HTTP output type, we start to output the XML document content. First, we connect to the demomdb.mdb database in the program directory and execute an SQL query to obtain a data reader.
Use XmlTextWriter to output XML documents
Once the query has run, we can traverse the returned records and start to output the XML document. Here we use XmlTextWriter to output the document.
There are two ways to output XML documents. One is to write the output with XmlTextWriter. The other is to build the XML document object tree starting from an XmlDocument and then output the document with the Save method of XmlDocument. Each method has its own characteristics.
XmlTextWriter writes the XML document quickly in a forward-only manner. It cannot read back what has already been written or modify the generated document. This method is fast and uses little memory, but it is not very flexible.
With XmlDocument, the XML document structure is built first and then output. The content already built can be accessed and modified at any time. This method is slower and uses more memory, but it is flexible.
Here we use XmlTextWriter to output the XML document; another page outputs the XML document with XmlDocument.
First, create an XmlTextWriter object on top of the page output stream and enable indentation. Its Indentation, IndentChar, and Formatting properties control the indentation style. See MSDN for details. Indentation exists to make the XML document more readable: an indented document is easy for people to read and edit directly. For applications, however, it makes no difference whether the XML document is indented.
XmlTextWriter is a wrapper that writes XML to an underlying output target, so the target must be specified when the XmlTextWriter is constructed: a stream, a TextWriter, or a file name. In theory, XML documents can be generated by string concatenation. In actual development, however, piecing XML documents together from strings is unwise. We recommend XmlTextWriter.
In web development, HTML documents are sometimes generated by string concatenation. HTML has no strict syntax restrictions and IE can interpret sloppy HTML, so developers sometimes get away with assembling HTML from strings, although this leads to messy code and poor readability. XML documents, in contrast, are subject to strict syntax checks: a single syntax error makes the entire document fail to parse. We should therefore use XmlTextWriter, which checks basic XML syntax for us and ensures that the output is well-formed.
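The difference between string concatenation and XmlTextWriter shows up as soon as the data contains special characters. The following sketch compares the two; the class EscapingSketch, its method names, and the element name "note" are assumptions chosen for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EscapingSketch
{
    // Naive concatenation: special characters pass through unescaped.
    public static string Concatenate(string value)
    {
        return "<note>" + value + "</note>";
    }

    // XmlTextWriter escapes special characters in text content.
    public static string Write(string value)
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(buffer);
        writer.WriteStartElement("note");
        writer.WriteString(value);
        writer.WriteEndElement();
        writer.Close();
        return buffer.ToString();
    }
}
```

For the value "a < b & c", Concatenate produces text that an XML parser rejects, while Write escapes the angle bracket and the ampersand as &lt; and &amp;, so the result parses and yields the original text again.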
We call WriteStartDocument to start the XML document. XmlTextWriter provides many methods that come in pairs: after calling one, the matching one must be called later. For example, WriteStartDocument is paired with WriteEndDocument and WriteStartElement is paired with WriteEndElement, and paired methods must always be called together. Here we call WriteStartDocument to begin the XML document, so we must call WriteEndDocument to finish it. When XmlTextWriter is used to output a document, WriteStartDocument must be the first write method called.
Then, based on a page parameter named xsl, we output the xml-stylesheet processing instruction of the XML document. XSL is covered later, so this parameter can be ignored in this section.
We call the WriteStartElement method to output the root node of the XML document. The argument is the string "table", so the root node of the output document is named table.
Then we use the Read method of the data reader to traverse all the queried records. For each record, we call the WriteStartElement method of the XML writer with the string "record", which outputs an element named record under the root node of the XML document.
For each record, we also traverse all its field values and use WriteStartElement to output one XML element per field value, named after the field. If the field value is null, WriteAttributeString outputs an XML attribute named null; otherwise WriteString outputs the string representation of the field value.
Because WriteStartElement and WriteEndElement are paired, WriteEndElement must be called to close each XML element once its content has been written. After all the content is output, we call WriteEndDocument to end the entire XML document.
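The walkthrough above can be tried without IIS or the Access database. The sketch below runs the same writer logic against an in-memory DataTable and returns the XML as a string; the class WriterSketch, the SampleData helper, and its column names and values are hypothetical stand-ins for the demo database.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

public static class WriterSketch
{
    // Writes every row of the reader as <record> elements under a <table> root.
    public static string WriteTable(IDataReader reader)
    {
        StringWriter buffer = new StringWriter();
        XmlTextWriter xmlWriter = new XmlTextWriter(buffer);
        xmlWriter.WriteStartDocument();
        xmlWriter.WriteStartElement("table");
        while (reader.Read())
        {
            xmlWriter.WriteStartElement("record");
            for (int i = 0; i < reader.FieldCount; i++)
            {
                xmlWriter.WriteStartElement(reader.GetName(i));
                object v = reader.GetValue(i);
                if (v == null || DBNull.Value.Equals(v))
                {
                    xmlWriter.WriteAttributeString("null", "1");
                }
                else
                {
                    xmlWriter.WriteString(Convert.ToString(v));
                }
                xmlWriter.WriteEndElement(); // closes the field element
            }
            xmlWriter.WriteEndElement(); // closes <record>
        }
        xmlWriter.WriteEndElement(); // closes <table>
        xmlWriter.WriteEndDocument();
        xmlWriter.Close();
        return buffer.ToString();
    }

    // Hypothetical sample data standing in for the demo database.
    public static DataTable SampleData()
    {
        DataTable table = new DataTable("demo");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("City", typeof(string));
        table.Rows.Add("Ann", "Oslo");
        table.Rows.Add("Bob", DBNull.Value);
        return table;
    }
}
```

A call such as WriterSketch.WriteTable(WriterSketch.SampleData().CreateDataReader()) produces a document with two record elements, where the second record's City element carries the attribute null="1" instead of text.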
Use XmlDocument to output XML documents
The record.aspx page works much like recordxml.aspx. However, it uses XmlDocument to build the XML document object tree and then outputs the XML document. The process is described below.
Open the HTML code of record.aspx and you can see that it is very simple: there is only one line, and all the page output is produced in program code. Open its C# code. The code that outputs the page is in the Page_Load method. The code is:
// This code dynamically builds an XmlDocument object tree and then outputs the XML document.
System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
xmlDoc.AppendChild(xmlDoc.CreateElement("table"));
// Connect to the database
using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection())
{
    conn.ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
        + this.Server.MapPath("demomdb.mdb");
    conn.Open();
    // Query the database
    using (System.Data.OleDb.OleDbCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText = "select * from MERs";
        System.Data.OleDb.OleDbDataReader reader = cmd.ExecuteReader();
        // Obtain all field names
        int fieldCount = reader.FieldCount;
        string[] fieldNames = new string[fieldCount];
        for (int iCount = 0; iCount < fieldCount; iCount++)
        {
            fieldNames[iCount] = reader.GetName(iCount);
        }
        while (reader.Read())
        {
            // Output a record
            System.Xml.XmlElement recordElement = xmlDoc.CreateElement("record");
            xmlDoc.DocumentElement.AppendChild(recordElement);
            for (int iCount = 0; iCount < fieldCount; iCount++)
            {
                // Output a field value
                System.Xml.XmlElement fieldElement = xmlDoc.CreateElement(fieldNames[iCount]);
                recordElement.AppendChild(fieldElement);
                object v = reader.GetValue(iCount);
                if (v == null || DBNull.Value.Equals(v))
                {
                    fieldElement.SetAttribute("null", "1");
                }
                else
                {
                    fieldElement.AppendChild(xmlDoc.CreateTextNode(Convert.ToString(v)));
                }
            }
        } // while (reader.Read())
        reader.Close();
    } // using (System.Data.OleDb.OleDbCommand cmd = conn.CreateCommand())
} // using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection())
string strXslRef = this.Request.QueryString["xsl"];
if (strXslRef != null && strXslRef.Length > 0)
{
    // Execute the XSLT transformation using the stylesheet named by the page parameter
    strXslRef = this.Server.MapPath(strXslRef);
    // XslTransform is obsolete from .NET 2.0 on; XslCompiledTransform replaces it
    System.Xml.Xsl.XslTransform transform = new System.Xml.Xsl.XslTransform();
    transform.Load(strXslRef);
    transform.Transform(xmlDoc, null, this.Response.Output, null);
}
else
{
    // Directly output the generated XML document
    this.Response.Write(xmlDoc.DocumentElement.OuterXml);
}
This method executes as follows.
First, create an XmlDocument object. A new XmlDocument is an empty XML document with no content and no root element, so the first step is to add the root element with the AppendChild method of the document object. Here we use the CreateElement method of the document object to create an XmlElement object named table.
None of the XML document object types, including elements, attributes, text blocks, and comments, can be instantiated directly. Instances can only be created through the family of XmlDocument methods whose names start with Create. A newly created object is detached; it must be added to the XML document object tree to become part of the document. Typically the new object is attached to a chosen parent with the AppendChild method of the document object or of an element object, so that it joins the XML document structure.
This pattern resembles adding a new row to a DataTable. A DataRow cannot be instantiated directly either. We first call the NewRow method of the DataTable to create a new DataRow, and then call the Add method of the DataTable's Rows property to add the new row to the table.
After initializing the XML document object, we connect to the database, run the query to obtain a data reader, traverse the returned records, and build the XML document.
For each database record, we create a recordElement object and add it to the root node of the XML document. We then traverse each field value of the record, create a fieldElement object, and add it to recordElement. If the current field value is null, we call the SetAttribute method of fieldElement to set the attribute named null to 1; otherwise we add an XML text node to fieldElement.
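The same construction can be tried without a database. In the sketch below, an array of field names and an array of rows stand in for the data reader; the class DomBuildSketch, the Build method, and its parameters are assumptions made for this illustration.

```csharp
using System;
using System.Xml;

public static class DomBuildSketch
{
    // Builds <table><record>...</record></table> from in-memory rows.
    public static XmlDocument Build(string[] fieldNames, object[][] rows)
    {
        XmlDocument doc = new XmlDocument();
        // A new document is empty, so the root element is added first.
        doc.AppendChild(doc.CreateElement("table"));
        foreach (object[] row in rows)
        {
            // Nodes are created through the document, then attached to a parent.
            XmlElement recordElement = doc.CreateElement("record");
            doc.DocumentElement.AppendChild(recordElement);
            for (int i = 0; i < fieldNames.Length; i++)
            {
                XmlElement fieldElement = doc.CreateElement(fieldNames[i]);
                recordElement.AppendChild(fieldElement);
                object v = row[i];
                if (v == null || DBNull.Value.Equals(v))
                {
                    fieldElement.SetAttribute("null", "1");
                }
                else
                {
                    fieldElement.AppendChild(doc.CreateTextNode(Convert.ToString(v)));
                }
            }
        }
        return doc;
    }
}
```

Because the whole tree stays in memory, the caller can keep editing or querying the returned document (for example with SelectNodes) before deciding how to output it, which is the flexibility the XmlTextWriter approach lacks.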
After the XML document is built, its content is output to the page. If the name of an XSLT stylesheet is specified in the page parameters, the XSLT transformation is executed and its result is output. XSLT is described in detail in the next course.
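As a small preview of that step, the sketch below applies a stylesheet to an in-memory document. It is an illustration under stated assumptions rather than the demo program's code: it uses XslCompiledTransform, the current .NET class for XSLT, and the class TransformSketch and the stylesheet passed to it are made up for the example.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class TransformSketch
{
    // Applies the stylesheet text to the document and returns the result as a string.
    public static string Apply(XmlDocument doc, string stylesheet)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader xsl = XmlReader.Create(new StringReader(stylesheet)))
        {
            transform.Load(xsl);
        }
        StringWriter output = new StringWriter();
        transform.Transform(doc, null, output);
        return output.ToString();
    }
}
```

A stylesheet that matches /table, loops over record with xsl:for-each, and emits each Name with xsl:value-of would turn the demo's XML structure into a plain list of names.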
If no XSLT stylesheet is specified, the outer XML string of the root node of the XML document is output.
Every XML document object has an InnerXml property and an OuterXml property. Both return an unindented XML string that represents a fragment of the XML document, but they differ. InnerXml returns the XML string of all child nodes of the node. OuterXml returns the XML string of the node itself together with all its child nodes. For example, for the XML document "<a><b/>123</a>", the InnerXml of the root node is "<b/>123" and the OuterXml of the root node is "<a><b/>123</a>". Note that these strings are not indented. By contrast, the Save method of XmlDocument writes the document to the file with the specified name with indentation.
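The difference between the two properties, and between them and Save, can be observed directly. The sketch below is illustrative only; the class InnerOuterSketch and its method names are assumptions, and Save is given a StringWriter instead of a file name so the example needs no disk access.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InnerOuterSketch
{
    // Returns { InnerXml, OuterXml } of the root element.
    public static string[] InnerAndOuter(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return new string[]
        {
            doc.DocumentElement.InnerXml,
            doc.DocumentElement.OuterXml
        };
    }

    // Saves the document to a string; Save indents when PreserveWhitespace is false.
    public static string SaveToString(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        StringWriter buffer = new StringWriter();
        doc.Save(buffer);
        return buffer.ToString();
    }
}
```

For "<a><b/>123</a>", the OuterXml string starts with the <a> tag while the InnerXml string does not. For a document such as "<a><b><c/></b></a>", the text returned by SaveToString spans several lines, whereas OuterXml stays on one line.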
In the IE browser, you can see that IE displays only the plain text content of the XML document, not the indented tree view it shows for other XML documents. This is because the aspx code does not set ContentType to XML and the default HTML type is used instead. IE therefore receives the page and parses and displays it as HTML. Because XML names such as table and record are not HTML tags, IE ignores them and displays only the plain text content. However, the page source shows that the content of the document is still standard XML. The source is not indented here.
Summary
In this course, we briefly introduced the basic syntax of XML and described the stream processing model and the DOM processing model for XML documents. We also used C# to demonstrate how to output XML documents.
XML is not a simple technology, and many other technologies are derived from it. Contemporary software developers, especially web developers, should be familiar with XML and some of its derived technologies and should use them. Familiarity with XML helps developers keep their development skills at a high level over the long term and is an important foundation for learning other advanced technologies. XML is well worth studying.