XML Tutorials
A single XML document may draw both data and declarations from many different sources and files. In fact, some data may come directly from databases, CGI scripts, or other sources that are not files. The items that hold the pieces of an XML document, in whatever form they take, are called entities. Entity references load these entities into the main XML document. General entity references load data into the root element of the XML document, while parameter entity references load data into the document's DTD.
The main contents of this chapter are as follows:
* What is an entity?
* Internal general entities
* External general entities
* Internal parameter entities
* External parameter entities
* How to build a document from pieces
* Entities and DTDs in well-formed documents
9.1 What is an entity?
Logically, an XML document consists of a prolog followed by a root element that strictly contains all other elements. In practice, however, the actual data of an XML document can be spread across several files. For example, each player element might sit in a separate file even though the root element contains all 900 or so players in a baseball league. The storage units that hold particular parts of an XML document are called entities. An entity may consist of a file, a database record, or any other item that contains data. For example, all of the complete XML files in this book are entities.
The storage unit that contains the XML declaration, the document type declaration, and the root element is called the document entity. However, the root element and its descendants may also contain entity references that point to additional data to be inserted into the document. A validating XML processor combines all the referenced entities into a single logical document before it passes the document on to the end application or displays the file.
Non-validating processors may, but do not have to, insert external entities. They must insert internal entities.
The primary purpose of an entity is to hold content: well-formed XML, other forms of text, or binary data. The prolog and the document type declaration are part of the document entity of the document they belong to. An XSL style sheet qualifies as an entity only because it is itself a well-formed XML document. The entity that makes up the XSL style sheet is not one of the entities that compose the XML document to which the style sheet applies. A CSS style sheet is not an entity at all.
Most entities have a name by which they can be referenced. The only exception is the document entity, the main file that contains the XML document (although the document entity does not have to be a file; it may be a database record, the output of a CGI program, or something else). The document entity, whatever form it takes, is the storage unit that holds the XML declaration, the document type declaration (if any), and the root element. Therefore, every XML document has at least one entity.
There are two kinds of entities: internal and external. Entities that are defined entirely within the document entity are called internal entities. The document itself is one such entity, so every XML document has at least one internal entity.
In contrast, an external entity draws its data from another resource located by a URL. The main document contains only a reference to the URL where the actual data resides. In HTML, the document contained between the <HTML> and </HTML> tags is itself an internal entity, while an IMG element represents an external entity (the actual image data).
Entities also fall into two categories: parsed and unparsed. A parsed entity contains well-formed XML text. An unparsed entity contains binary data or non-XML text (such as an e-mail message). This chapter focuses on parsed entities, since most current XML processors support unparsed entities poorly, if at all.
Chapter 11, on embedding non-XML data, covers unparsed entities.
Entity references let data from multiple entities be merged into a single document. General entity references merge data into the content of the document. Parameter entity references merge declarations into the document's DTD. In fact, &lt;, &gt;, &apos;, &quot;, and &amp; are predefined entity references that refer to the text entities <, >, ', ", and & respectively. However, you can also define new entities in the document's DTD.
9.2 Internal General Entities
An internal general entity reference can be thought of as an abbreviation for frequently used text or for text that is hard to type. An <!ENTITY> tag in the DTD defines the abbreviation and the text that the abbreviation stands for. For example, instead of typing the same footer at the bottom of every page, you can define that text as the footer entity in the DTD and then simply type &footer; on each page. In addition, if you decide to change the footer block (perhaps because your e-mail address has changed), you make the change once in the DTD rather than on every page that shares the footer.
A general entity reference starts with an ampersand (&) and ends with a semicolon (;), with the entity name between the two symbols. For example, &lt; is a general entity reference for the less-than sign (<). The name of this entity is lt, and its replacement text is the one-character string "<". Entity names follow the rules for XML names: they consist of letters, digits, underscores, hyphens, and periods, and must begin with a letter or underscore. Spaces and other punctuation characters are prohibited. Like most other things in XML, entity references are case-sensitive.
Although the colon (:) is technically allowed in entity names, this character is reserved for use with namespaces, which are discussed in Chapter 18.
9.2.1 Defining an Internal General Entity Reference
An internal general entity reference is defined in the DTD with the <!ENTITY> tag, which has the following format:
<!ENTITY name "replacement text">
The name is the abbreviation for the replacement text. The replacement text must be enclosed in quotation marks because it may contain spaces and XML markup. You type the entity name in the document, and the reader sees the replacement text.
For example, my name is the rather long "Elliotte Rusty Harold" (blame my parents for that). Even after years of practice, I still make typing mistakes in it. I can define a general entity reference for my name so that every time I type &ERH;, the reader sees "Elliotte Rusty Harold". The definition is as follows:
<!ENTITY ERH "Elliotte Rusty Harold">
Listing 9-1 demonstrates the &ERH; general entity reference, and Figure 9-1 shows this document loaded into Internet Explorer. As you can see, the &ERH; entity reference in the source code is replaced with "Elliotte Rusty Harold" in the output.
Figure 9-1: Listing 9-1 after the internal general entity reference has been replaced by the actual entity
Listing 9-1: The ERH internal general entity reference
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ENTITY ERH "Elliotte Rusty Harold">
<!ELEMENT DOCUMENT (TITLE, SIGNATURE)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COPYRIGHT (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT LAST_MODIFIED (#PCDATA)>
<!ELEMENT SIGNATURE (COPYRIGHT, EMAIL, LAST_MODIFIED)>
]>
<DOCUMENT>
<TITLE>&ERH;</TITLE>
<SIGNATURE>
<COPYRIGHT>1999 &ERH;</COPYRIGHT>
<EMAIL>elharo@metalab.unc.edu</EMAIL>
<LAST_MODIFIED>March, 1999</LAST_MODIFIED>
</SIGNATURE>
</DOCUMENT>
Note that the general entity reference &ERH; appears inside both the COPYRIGHT and TITLE elements even though these are declared to accept only #PCDATA as children. This arrangement is legal because the replacement text of the &ERH; entity reference is parsed character data. The document is validated after all entity references have been replaced by their values.
The same thing happens when you use a style sheet. The styles are applied to the element tree as it exists after entity references have been replaced with entity values.
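The replacement described above can be observed with almost any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; that parser does not validate, so the element declarations of Listing 9-1 are left out here, and the helper name expanded_text is introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Listing 9-1, reduced to the entity declaration and the document body.
LISTING_9_1 = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
  <!ENTITY ERH "Elliotte Rusty Harold">
]>
<DOCUMENT>
  <TITLE>&ERH;</TITLE>
  <SIGNATURE>
    <COPYRIGHT>1999 &ERH;</COPYRIGHT>
    <EMAIL>elharo@metalab.unc.edu</EMAIL>
  </SIGNATURE>
</DOCUMENT>"""


def expanded_text(xml_source, tag):
    """Return the text of the first element named `tag`, after entity expansion."""
    root = ET.fromstring(xml_source)
    return root.find(f".//{tag}").text


print(expanded_text(LISTING_9_1, "TITLE"))      # Elliotte Rusty Harold
print(expanded_text(LISTING_9_1, "COPYRIGHT"))  # 1999 Elliotte Rusty Harold
```

The application only ever sees the replacement text; the reference &ERH; itself does not survive parsing.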
In the same way, you can declare general entity references for the copyright, the e-mail address, or the last modified date:
<!ENTITY COPY99 "Copyright 1999">
<!ENTITY EMAIL "elharo@metalab.unc.edu">
<!ENTITY LM "Last modified:">
The date is omitted from the &LM; entity because it is likely to change from document to document. Making the date an entity reference would bring no benefit.
Now you can rewrite the document part of Listing 9-1 in a more concise form:
<DOCUMENT>
<TITLE>&ERH;</TITLE>
<SIGNATURE>
<COPYRIGHT>&COPY99; &ERH;</COPYRIGHT>
<EMAIL>&EMAIL;</EMAIL>
<LAST_MODIFIED>&LM; March, 1999</LAST_MODIFIED>
</SIGNATURE>
</DOCUMENT>
One benefit of using entity references instead of the literal text is that the text becomes easier to change, which is particularly useful when a single DTD is shared by several documents. For example, if your e-mail address changes from elharo@metalab.unc.edu to eharold@solar.stanford.edu, you only need to modify one line of the DTD as follows, instead of searching for and replacing the e-mail address in multiple documents:
<!ENTITY EMAIL "eharold@solar.stanford.edu">
9.2.2 Using General Entity References in the DTD
You may wonder whether one general entity reference can be included inside another, as in the following declaration:
<!ENTITY COPY99 "Copyright 1999 &ERH;">
This example is in fact legal, because the ERH entity appears as part of the COPY99 entity, and the COPY99 entity itself eventually becomes part of the document's content. General entity references can also be used in other parts of the DTD that eventually become part of the document's content (for example, in a default attribute value), although there are certain restrictions. The first restriction: the declarations cannot contain a circular reference such as the following:
<!ENTITY ERH "&COPY99; Elliotte Rusty Harold">
<!ENTITY COPY99 "Copyright 1999 &ERH;">
The second restriction: a general entity reference cannot insert text that is only part of the DTD and never becomes part of the document's content. For example, the following attempted shortcut does not work:
<!ENTITY PCD "(#PCDATA)">
<!ELEMENT ANIMAL &PCD;>
<!ELEMENT FOOD &PCD;>
However, it is often useful to have entity references merge text into a document's DTD. For this purpose, XML provides parameter entity references, which are described later in this chapter.
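Both restrictions can be checked against a real parser. The sketch below assumes Python's standard xml.etree.ElementTree module (built on the expat parser), which reports the circular definition once the entity is actually referenced in the content. The helper name parses and the sample content "cat" are introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_source):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


# Legal: one general entity nested inside another.
NESTED = """<!DOCTYPE DOCUMENT [
  <!ENTITY ERH "Elliotte Rusty Harold">
  <!ENTITY COPY99 "Copyright 1999 &ERH;">
]>
<DOCUMENT>&COPY99;</DOCUMENT>"""

# First restriction: ERH and COPY99 refer to each other.
CIRCULAR = """<!DOCTYPE DOCUMENT [
  <!ENTITY ERH "&COPY99; Elliotte Rusty Harold">
  <!ENTITY COPY99 "Copyright 1999 &ERH;">
]>
<DOCUMENT>&ERH;</DOCUMENT>"""

# Second restriction: a general entity reference used inside a declaration.
IN_DTD = """<!DOCTYPE ANIMAL [
  <!ENTITY PCD "(#PCDATA)">
  <!ELEMENT ANIMAL &PCD;>
]>
<ANIMAL>cat</ANIMAL>"""

print(parses(NESTED))    # True
print(parses(CIRCULAR))  # False
print(parses(IN_DTD))    # False
```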
The only restriction on a general entity value is that it cannot directly contain the three characters %, &, and ", although you can include them by using character references. An & or % may also be included if it is the start of an entity reference rather than standing for itself. Having so few restrictions means that an entity can contain tags and span multiple lines. For example, the following SIGNATURE entity is valid:
<!ENTITY SIGNATURE
"<SIGNATURE>
<COPYRIGHT>1999 Elliotte Rusty Harold</COPYRIGHT>
<EMAIL>elharo@metalab.unc.edu</EMAIL>
<LAST_MODIFIED>March, 1999</LAST_MODIFIED>
</SIGNATURE>"
>
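To see that markup inside an entity value really becomes elements in the document, here is a small sketch using Python's standard xml.etree.ElementTree module. The DISCOUNT entity and the NOTE element are hypothetical additions, included only to show a % character entered through the character reference &#37;.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE DOCUMENT [
  <!ENTITY SIGNATURE
    "<SIGNATURE>
       <COPYRIGHT>1999 Elliotte Rusty Harold</COPYRIGHT>
       <EMAIL>elharo@metalab.unc.edu</EMAIL>
     </SIGNATURE>"
  >
  <!ENTITY DISCOUNT "10&#37; off">
]>
<DOCUMENT>&SIGNATURE;<NOTE>&DISCOUNT;</NOTE></DOCUMENT>"""

root = ET.fromstring(DOC)

# The tags in the replacement text are parsed into real child elements.
signature_children = [child.tag for child in root.find("SIGNATURE")]
print(signature_children)  # ['COPYRIGHT', 'EMAIL']

# The character reference contributes a literal percent sign.
note_text = root.find("NOTE").text
print(note_text)  # 10% off
```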
The next obvious question is whether an entity can have parameters. Can you use the SIGNATURE entity above but change the date in the LAST_MODIFIED element on each individual page? The answer is no; entities are only static replacement text. If you need to pass data to an entity, use a tag instead, together with the appropriate rendering instructions in the style sheet.
9.2.3 Predefined General Entity References
XML predefines five general entity references, as shown in Table 9-1. These five entity references appear in XML documents in place of special characters that would otherwise be interpreted as markup. For example, the entity reference &lt; stands for the less-than sign (<), which could otherwise be interpreted as the beginning of a tag.
For maximum compatibility, if you plan to use the predefined entity references, you should declare them in the DTD. You must be careful when declaring them, because the characters have to be escaped in the DTD without creating a recursive reference. To make this possible, the declarations use character references that contain the decimal character code of each character. Listing 9-2 shows the required declarations.
Table 9-1: Predefined entity references in XML
Entity reference    Character
&amp;               &
&lt;                <
&gt;                >
&quot;              "
&apos;              '
Listing 9-2: Declarations of the predefined general entity references
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY amp "&#38;#38;">
<!ENTITY apos "&#39;">
<!ENTITY quot "&#34;">
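As a quick illustration of the five predefined references, the following sketch parses a small document that carries the declarations of Listing 9-2 and uses every reference once. It assumes Python's standard xml.etree.ElementTree module; the P element and its text are made up for the example. The escape function from xml.sax.saxutils goes the other way and is shown only as a convenience.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The declarations of Listing 9-2, followed by a document that uses all five.
DOC = """<!DOCTYPE P [
  <!ENTITY lt   "&#38;#60;">
  <!ENTITY gt   "&#62;">
  <!ENTITY amp  "&#38;#38;">
  <!ENTITY apos "&#39;">
  <!ENTITY quot "&#34;">
]>
<P>&lt;TITLE&gt; &amp; &apos;single&apos; &quot;double&quot;</P>"""

# After parsing, the application sees the plain characters.
parsed_text = ET.fromstring(DOC).text
print(parsed_text)  # <TITLE> & 'single' "double"

# Going the other way: escape() replaces &, <, and > with entity references.
escaped = escape("1 < 2 & 3 > 2")
print(escaped)  # 1 &lt; 2 &amp; 3 &gt; 2
```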
9.3 External General Entities
Data that lies outside the main file containing the root element (the document entity) is called an external entity. External entity references let you embed external entities in a document and build an XML document from several separate files.
Documents that use only internal entities closely resemble the HTML model. The full text of the document is stored in a single file. Images, Java applets, sounds, and other non-HTML data can be linked in, but at least all the text is in the file. Of course, the HTML model has some problems. In particular, embedding dynamic information in a document is very difficult. You can embed dynamic information by using CGI, Java applets, database software, server-side includes, and a variety of other methods, but HTML by itself provides only a static document. Building a document from several files must be done outside of HTML. The simplest HTML solution to this problem is frames, but they make for a very poor user interface that usually confuses and annoys users.
Part of the problem is that one HTML document does not fit naturally inside another. Every HTML document has exactly one body. Server-side includes let you embed only fragments of HTML in a document, never an entire valid document. In addition, server-side includes depend on the server and are not truly part of HTML.
XML, however, is more flexible. The root element of one document is not necessarily the same as the root element of another. Even if two documents share the same root element, the DTD can declare that the element may contain itself. The XML standard does not prevent well-formed XML documents from being embedded in other well-formed XML documents when that is convenient.
XML goes a step further and defines a mechanism by which an XML document can be built from several smaller XML documents found on local or remote systems. The parser is responsible for merging all the different documents together in a fixed order. A document can contain another document, which may in turn contain other documents. As long as no document recursively includes itself (an error that the processor reports), the application sees only a single, complete document. In essence, this mechanism provides client-side includes.
In XML, you embed one document in another by using an external general entity reference. In the DTD, the external reference is declared with the following syntax:
<!ENTITY name SYSTEM "URI">
URI stands for Uniform Resource Identifier. URIs are similar to URLs but allow a more precise specification of the linked resource. In theory, a URI separates the resource from its storage location, so a Web browser can choose the nearest or least busy of several mirrors without being given an explicit link to that mirror. URIs are an area of active research and heated debate, so in practice, and in this book, URIs are simply URLs.
For example, you might want to place the same signature block on every page of your site. To be concrete, assume that the signature block is the XML code shown in Listing 9-3, and that this code can be retrieved from the URL http://metalab.unc.edu/xml/signature.xml.
Listing 9-3: An XML signature file
<?xml version="1.0"?>
<SIGNATURE>
<COPYRIGHT>1999 Elliotte Rusty Harold</COPYRIGHT>
<EMAIL>elharo@metalab.unc.edu</EMAIL>
</SIGNATURE>
Add the following declaration to the DTD to associate the entity reference &SIG; with this file:
<!ENTITY SIG SYSTEM "http://metalab.unc.edu/xml/signature.xml">
You can also use a relative URL. For example:
<!ENTITY SIG SYSTEM "xml/signature.xml">
If the referenced file is in the same directory as the file that refers to it, the file name alone is enough. For example:
<!ENTITY SIG SYSTEM "signature.xml">
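A relative system identifier is resolved against the location of the document that contains the declaration. The sketch below shows that resolution with Python's standard urllib.parse.urljoin. The two document URLs (index.xml at the site root and inside the xml directory) are hypothetical and chosen only so that each relative form points at the signature file.

```python
from urllib.parse import urljoin

# Hypothetical locations of the documents that declare the SIG entity.
DOC_AT_ROOT = "http://metalab.unc.edu/index.xml"
DOC_IN_XML_DIR = "http://metalab.unc.edu/xml/index.xml"

# A relative path is resolved against the directory of the referring document.
from_root = urljoin(DOC_AT_ROOT, "xml/signature.xml")

# A bare file name refers to a file in the same directory.
same_dir = urljoin(DOC_IN_XML_DIR, "signature.xml")

# An absolute URL is used as-is, whatever the base is.
absolute = urljoin(DOC_IN_XML_DIR, "http://metalab.unc.edu/xml/signature.xml")

print(from_root)  # http://metalab.unc.edu/xml/signature.xml
print(same_dir)   # http://metalab.unc.edu/xml/signature.xml
print(absolute)   # http://metalab.unc.edu/xml/signature.xml
```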
With any of these declarations, you can include the contents of the signature file at any point in the document simply by using &SIG;, as the simple document in Listing 9-4 illustrates. Figure 9-2 shows the document rendered in Internet Explorer 5.0.
Figure 9-2: A document that uses an external general entity reference
Listing 9-4: The SIG external general entity reference
<?xml version="1.0" standalone="no"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (TITLE, SIGNATURE)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT COPYRIGHT (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT SIGNATURE (COPYRIGHT, EMAIL)>
<!ENTITY SIG SYSTEM
"http://metalab.unc.edu/xml/signature.xml">
]>
<DOCUMENT>
<TITLE>Entity references</TITLE>
&SIG;
</DOCUMENT>
Apart from the added external entity reference, note that the standalone attribute in the XML declaration now has the value no, because this file is no longer complete. Parsing the file requires additional data from the external file signature.xml.
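How a processor merges an external entity into the document can be sketched with Python's standard xml.parsers.expat module, which hands each external entity reference to a callback. This is only an illustration under stated assumptions: the FAKE_WEB dictionary stands in for fetching the URL over the network, the function name element_names is made up, and the entity text omits the XML declaration of Listing 9-3 because a declaration at the start of an external entity must include an encoding.

```python
import xml.parsers.expat

# Stand-in for the network: maps system identifiers to entity text.
FAKE_WEB = {
    "http://metalab.unc.edu/xml/signature.xml": (
        "<SIGNATURE>"
        "<COPYRIGHT>1999 Elliotte Rusty Harold</COPYRIGHT>"
        "<EMAIL>elharo@metalab.unc.edu</EMAIL>"
        "</SIGNATURE>"
    ),
}

# Listing 9-4 without the element declarations (expat does not validate).
LISTING_9_4 = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE DOCUMENT [
  <!ENTITY SIG SYSTEM "http://metalab.unc.edu/xml/signature.xml">
]>
<DOCUMENT>
<TITLE>Entity references</TITLE>
&SIG;
</DOCUMENT>"""


def element_names(xml_source, resources):
    """Parse xml_source, taking external entities from `resources`,
    and return the element names in document order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        # Parse the entity text with a child parser that inherits our handlers.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(resources[system_id], True)
        return 1  # a nonzero result tells expat the entity was handled

    parser.StartElementHandler = on_start
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(xml_source, True)
    return names


print(element_names(LISTING_9_4, FAKE_WEB))
```

The application receives the SIGNATURE, COPYRIGHT, and EMAIL elements as if they had been typed directly into the main document.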
9.5 External Parameter Entities
The preceding examples used a single DTD to define all the elements in the document. The longer the document becomes, however, the more unwieldy this technique is. In addition, you often want to use part of a DTD in many different places.
For example, a DTD that describes a mailing address changes rarely, and the definition of an address is general enough to apply easily in many different contexts. Similarly, the predefined entity reference declarations in Listing 9-2 are useful in most XML documents, but you would rather not copy and paste them every time.
External parameter entities let you build large DTDs out of small ones. That is, one external DTD can link to another external DTD and thereby pull in the elements and entities that the other DTD declares. Although it is strictly forbidden to use loops--if DTD2 references DTD1, DTD1 cannot reference DTD2