What is XML schema?
Like a DTD, an XML Schema defines and describes the structure and content model of an XML document. It defines the relationships between the elements in a document and the data types of elements and attributes.
An XML Schema is itself an XML document that conforms to XML syntax, so any ordinary XML parser can parse it.
Why use schema?
We can already use a DTD to define the structure of an XML file, so why do we need a schema?
DTDs have several shortcomings:
1) DTD content models are based on regular expressions and have limited descriptive power;
2) DTDs do not support data types such as numbers or dates, which makes them inadequate for most applications;
3) DTD constraints are not expressive enough to place finer-grained semantic restrictions on XML instance documents;
4) DTDs are not well structured, so reuse is relatively costly;
5) A DTD is not written in XML, and there is no standard programming interface for building and accessing DTDs, so they cannot be maintained with standard programming methods.
XML Schema was designed to address these shortcomings of DTDs. Its advantages:
1) XML Schema is based on XML and has no special syntax of its own.
2) A schema can be parsed and processed like any other XML file.
3) XML Schema supports a range of data types (int, float, boolean, date, and so on).
4) XML Schema provides an extensible data model.
5) XML Schema fully supports namespaces.
6) XML Schema supports attribute groups.
A simple XML schema document
This schema defines one element, quantity, of type nonNegativeInteger. The xmlns declaration binds the XML Schema namespace, which was described earlier in section 3.
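The original figure is not reproduced here. A minimal sketch of a schema matching this description, assuming the conventional xsd prefix for the XML Schema namespace, might look like this:

```xml
<?xml version="1.0"?>
<!-- The xsd prefix is bound to the XML Schema namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- One global element whose content must be a non-negative integer -->
  <xsd:element name="quantity" type="xsd:nonNegativeInteger"/>
</xsd:schema>
```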
The following XML snippets are valid:
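The original examples are not available here. Assuming quantity is declared as xsd:nonNegativeInteger, instances like these should validate (the values are illustrative, and each line stands for a separate document):

```xml
<quantity>5</quantity>
<quantity>0</quantity>
```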
The following XML snippets are invalid:
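Again the original examples are not available. Under the same assumption that quantity is an xsd:nonNegativeInteger, instances like these should be rejected (illustrative values, one document per line):

```xml
<!-- Negative number -->
<quantity>-4</quantity>
<!-- Not a number at all -->
<quantity>some</quantity>
<!-- Not an integer -->
<quantity>3.5</quantity>
```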
Type in Schema
A schema contains three main declaration components: element, attribute, and notation.
These basic components can be combined into the following components:
A) Type definition components: simple types and complex types
B) Element group (model group) components
C) Attribute group components
Simple Type
XML schema defines some built-in data types that can be used to describe the content and attribute values of elements.
If an element contains only character data such as a number or a string, and has no child elements or attributes, its type is a simple type.
In the figure, the element quantity has a simple type. Its content must be a non-negative integer, and it cannot have attributes or child elements.
For example, <quantity>some</quantity> is not valid, because the content is not a non-negative integer.
All built-in simple types
Primitive types:
string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
Derived types (base type in parentheses):
normalizedString (string), token (normalizedString), language (token), NMTOKEN (token), Name (token), NCName (Name), ID (NCName), IDREF (NCName), IDREFS (list of IDREF), ENTITY (NCName), ENTITIES (list of ENTITY)
integer (decimal), nonPositiveInteger (integer), negativeInteger (nonPositiveInteger), long (integer), int (long), short (int), byte (short)
nonNegativeInteger (integer), unsignedLong (nonNegativeInteger), unsignedInt (unsignedLong), unsignedShort (unsignedInt), unsignedByte (unsignedShort), positiveInteger (nonNegativeInteger)
Create simple type
In the figure, we first create a simple type, quantitytype, derived from integer by restriction. The minInclusive and maxInclusive facets set its minimum value to 2 and its maximum value to 5. Finally, we declare the quantity element with type quantitytype.
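Since the figure is missing, here is a sketch of what such a definition could look like. The names quantitytype and quantity come from the text; the xsd prefix and the surrounding schema element are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named simple type: an integer restricted to the range 2..5 -->
  <xsd:simpleType name="quantitytype">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="2"/>
      <xsd:maxInclusive value="5"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The element uses the named type -->
  <xsd:element name="quantity" type="quantitytype"/>
</xsd:schema>
```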
Correct: <quantity>3</quantity>
Error: <quantity>10</quantity>, <quantity>AAA</quantity>
With restriction, we can limit a type so that it accepts only certain values or certain text. XML Schema distinguishes two kinds of facets:
Fundamental facets: equal, ordered, bounded, cardinality, numeric
Constraining facets: length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive, totalDigits, fractionDigits
Example 1 of a simple type
A value of this SKU type consists of three digits, followed by a hyphen, followed by two uppercase letters.
The pattern facet takes a regular expression as its value. For details of the regular expression syntax, consult other references.
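A sketch of how this could be written, assuming the type is named sku and is based on xsd:string (neither detail survives from the original figure). The element name oursku is taken from the examples.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Three digits, a hyphen, then two uppercase letters -->
  <xsd:simpleType name="sku">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="oursku" type="sku"/>
</xsd:schema>
```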
Correct: <oursku>123-AB</oursku>
Error: <oursku>ABC-AB</oursku>
Example 2 of a simple type
usstate is a type that describes the name of a U.S. state. The enumeration facet lists the allowed state names, so a value must be one of the names listed.
In the schema, <!-- and so on ... --> is a comment.
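The figure is missing, so the following is only a sketch. It assumes the type is based on xsd:string and shows an illustrative subset of state codes; the names usstate and statename come from the text.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Only the enumerated values are accepted -->
  <xsd:simpleType name="usstate">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="AK"/>
      <xsd:enumeration value="AL"/>
      <xsd:enumeration value="AR"/>
      <xsd:enumeration value="CA"/>
      <!-- and so on ... -->
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="statename" type="usstate"/>
</xsd:schema>
```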
Correct: <statename>AK</statename>
Error: <statename>Alaska</statename>
List type
The list element defines a list type. Here listofmyinttype is defined as a list of integers, so the value of the element listofmyint can be several integers separated by spaces.
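A minimal sketch of such a list type, assuming the items are plain xsd:integer values (the original figure may have used a user-defined integer type instead):

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A whitespace-separated list of integers -->
  <xsd:simpleType name="listofmyinttype">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <xsd:element name="listofmyint" type="listofmyinttype"/>
</xsd:schema>
```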
Correct: <listofmyint>1 5 15037 95977 95945</listofmyint>
Error: <listofmyint>1 3 ABC</listofmyint>
Union type
In the figure, union defines a union type whose member types are usstate and listofmyinttype. The value of an element with this union type can be an instance of the atomic type or an instance of the list type, but a single element instance cannot mix the two.
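The following sketch shows one way to write this. The type name zipunion is hypothetical, the member types are repeated in simplified form so the fragment is self-contained, and the state list is an illustrative subset.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Atomic member type: an enumerated state code -->
  <xsd:simpleType name="usstate">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="AK"/>
      <xsd:enumeration value="CA"/>
      <!-- and so on ... -->
    </xsd:restriction>
  </xsd:simpleType>

  <!-- List member type: whitespace-separated integers -->
  <xsd:simpleType name="listofmyinttype">
    <xsd:list itemType="xsd:integer"/>
  </xsd:simpleType>

  <!-- A value must match one member type as a whole -->
  <xsd:simpleType name="zipunion">
    <xsd:union memberTypes="usstate listofmyinttype"/>
  </xsd:simpleType>

  <xsd:element name="zips" type="zipunion"/>
</xsd:schema>
```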
Correct: <zips>CA</zips>, <zips>95630 95977 95945</zips>, <zips>AK</zips>
Error: <zips>CA 95630</zips>
Anonymous Type Definition
So far, we have always defined a named data type first and then declared an element with that type. If a new data type is used only once, we can define it directly inside the element declaration instead. In the figure, the type of the quantity element is an integer from 1 to 99.
Because the new type has no name of its own, this is called an anonymous type definition.
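As a sketch, an anonymous type for an integer from 1 to 99 could be written as follows. The choice of positiveInteger with maxExclusive is an assumption; minInclusive and maxInclusive on integer would work equally well.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="quantity">
    <!-- No name attribute: the type exists only for this element -->
    <xsd:simpleType>
      <xsd:restriction base="xsd:positiveInteger">
        <xsd:maxExclusive value="100"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:element>
</xsd:schema>
```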
Complex Type
Everything discussed so far is a simple type: the element holds only content, with no attributes or child elements. Elements that carry attributes or contain other elements have complex types.
In the figure, complexType marks a complex type (defined here as an anonymous type). simpleContent indicates that the element has no child elements. extension indicates that the element value is a decimal, and attribute declares its currency attribute, whose type is string.
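A sketch of the declaration described above, reconstructed from the text rather than copied from the missing figure:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="internationalprice">
    <xsd:complexType>
      <!-- Character content only, no child elements -->
      <xsd:simpleContent>
        <!-- The content is a decimal, extended with one attribute -->
        <xsd:extension base="xsd:decimal">
          <xsd:attribute name="currency" type="xsd:string"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```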
Correct: <internationalprice currency="EUR">423.46</internationalprice>
Mixed content
Again we use an anonymous type, this time to define the element salutation. Note the mixed="true" attribute on complexType, which indicates mixed content: the element contains character data as well as child elements. The name element is a child element of salutation.
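A sketch of such a mixed-content declaration, assuming the name child is a plain xsd:string:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="salutation">
    <!-- mixed="true" allows text between the child elements -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <!-- Required child element -->
        <xsd:element name="name" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```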
Correct: <salutation>Dear Mr. <name>Robert Smith</name>.</salutation>
Error: <salutation>Dear Mr.</salutation>
sequence indicates that child elements must appear in the same order as in the schema. The choice and all alternatives to sequence are discussed later.
Empty content
Sometimes an element has no content at all; its content model is empty. To define such a type, we define a type that allows only child elements and no character content, and then declare no child elements. The result is an element whose content model is empty.
In the figure, complexContent indicates that the element can contain only child elements. Two attributes, currency and value, are then declared, but no child elements.
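The long form described here could be sketched like this, assuming the complex content is written as a restriction of anyType:

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="internationalprice">
    <xsd:complexType>
      <!-- Element-only content, but no child elements are declared -->
      <xsd:complexContent>
        <xsd:restriction base="xsd:anyType">
          <xsd:attribute name="currency" type="xsd:string"/>
          <xsd:attribute name="value" type="xsd:decimal"/>
        </xsd:restriction>
      </xsd:complexContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```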
Correct: <internationalprice currency="EUR" value="423.46"/>
Error: <internationalprice currency="EUR" value="423.46">Here is a mistake!</internationalprice>
A more concise definition:
<xsd:element name="internationalprice">
  <xsd:complexType>
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value" type="xsd:decimal"/>
  </xsd:complexType>
</xsd:element>
A complex type definition that has neither simpleContent nor complexContent is interpreted as shorthand for complexContent that restricts anyType, so a schema processor accepts this concise syntax.
anyType
The anyType type does not constrain its content in any way. We can use anyType like any other type, as in the first declaration. An element declared this way is unrestricted, so its value can be 423.46, any other sequence of characters, or even a mixture of characters and elements. In fact, anyType is the default type, so the first declaration can be rewritten as the second one, which omits the type attribute.
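The two declarations are missing from this copy. As a sketch, with anything as a hypothetical element name, they would be two equivalent alternatives (not meant to appear together in one schema):

```xml
<!-- First form: the type is stated explicitly -->
<xsd:element name="anything" type="xsd:anyType"/>

<!-- Second form: no type attribute, so anyType is the default -->
<xsd:element name="anything"/>
```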
If you need unrestricted element content, for example prose that may need embedded markup to support internationalization, the default declaration (no constraint) or a slightly restricted form of it may be suitable.
Annotations
To help human readers and applications understand schema documents, XML Schema provides three elements for annotation:
annotation (the container), documentation (for human readers), appinfo (for tools and other applications)
In the figure, we place a basic schema description and copyright information in the documentation element, which is the recommended place for human-readable material. We recommend using the xml:lang attribute on every documentation element to indicate the language of the description.
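A sketch of a schema-level annotation follows. The description text, the copyright line, and the appinfo content are placeholders, not the original wording.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <!-- For human readers; xml:lang names the language used -->
    <xsd:documentation xml:lang="en">
      Purchase order schema (placeholder description).
      Copyright notice goes here.
    </xsd:documentation>
    <!-- For tools and applications -->
    <xsd:appinfo>placeholder for tool-specific information</xsd:appinfo>
  </xsd:annotation>

  <xsd:element name="quantity" type="xsd:nonNegativeInteger"/>
</xsd:schema>
```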
Construct a Content Model
In the figure, the purchaseordertype definition introduces two element group definitions. A purchase order can describe its addresses in one of two ways: either it contains a separate shipping address and billing address, or it contains a single address that serves as both.
For a choice group, only one of its children may appear in an instance. In the figure's example, the first child is an inner group element that references the named element group shipandbill, which consists of the element sequence shipto, billto. The second child is singleusaddress. Therefore, in an instance document, the purchaseorder element must contain either a shipto element followed by a billto element, or a single singleusaddress element.
The choice group is followed by the comment and items element declarations. The choice group and these element declarations are all children of a sequence group, so the comment and items elements must follow the address elements, in that order.
In a content model, named or unnamed element groups (represented by group, choice, sequence, and all) can carry minOccurs and maxOccurs attributes.
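The content model described above can be sketched as follows. To keep the fragment self-contained, the address, comment, and items elements are typed as plain xsd:string, which is a simplification; the original presumably used dedicated address and item types.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named element group: shipping address, then billing address -->
  <xsd:group name="shipandbill">
    <xsd:sequence>
      <xsd:element name="shipto" type="xsd:string"/>
      <xsd:element name="billto" type="xsd:string"/>
    </xsd:sequence>
  </xsd:group>

  <xsd:complexType name="purchaseordertype">
    <xsd:sequence>
      <!-- Exactly one branch; 1 and 1 are the defaults, shown explicitly -->
      <xsd:choice minOccurs="1" maxOccurs="1">
        <xsd:group ref="shipandbill"/>
        <xsd:element name="singleusaddress" type="xsd:string"/>
      </xsd:choice>
      <!-- These must follow the address element(s), in this order -->
      <xsd:element name="comment" type="xsd:string"/>
      <xsd:element name="items" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:element name="purchaseorder" type="purchaseordertype"/>
</xsd:schema>
```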
Attribute Group
We can create a named attribute group that holds all the attributes the item element needs, and reference this attribute group, itemdelivery, by name in the item element declaration.
Using attribute groups in this way improves the readability of a schema document and makes it easier to update, because an attribute group is defined and edited in one place and can be referenced from multiple definitions and declarations. Note that an attribute group can contain other attribute groups, and that attribute declarations and attribute group references must appear at the end of a complex type definition.
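A sketch of an attribute group and a reference to it. The attribute names partnum and weightkg and the child element productname are hypothetical; only the element name item comes from the text.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Defined once, reusable from any complex type -->
  <xsd:attributeGroup name="itemdelivery">
    <xsd:attribute name="partnum" type="xsd:string" use="required"/>
    <xsd:attribute name="weightkg" type="xsd:decimal"/>
  </xsd:attributeGroup>

  <xsd:element name="item">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="productname" type="xsd:string"/>
      </xsd:sequence>
      <!-- Attribute group references come after the content model -->
      <xsd:attributeGroup ref="itemdelivery"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```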
Null Value (nil)
The XML Schema nil mechanism uses an out-of-band nil signal. In other words, no actual nil value appears as element content; instead, an attribute indicates that the element content is nil. To illustrate this, we modify the declaration of the shipdate element so that nil can be signaled explicitly:
<xsd:element name="shipdate" type="xsd:date" nillable="true"/>
To indicate explicitly that shipdate has a nil value in the instance document, we set the xsi:nil attribute to true:
<shipdate xsi:nil="true"></shipdate>
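For the xsi prefix to resolve, it has to be bound to the XML Schema instance namespace somewhere in the instance document. A minimal standalone sketch, declaring the prefix on the element itself (in practice it is usually declared once on the root element):

```xml
<!-- xsi is bound to the XML Schema instance namespace -->
<shipdate xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:nil="true"/>
```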