Since RSS is technically a well-formed XML document, it can be processed with standard XML programming techniques. There are two main technologies: SAX (the Simple API for XML) and DOM (the Document Object Model).
A SAX parser traverses the XML document from start to finish and calls a specific function each time it encounters a different type of markup. For example, it calls one function to handle a start tag, another to handle an end tag, and a third to handle the data between the two. The parser's job is simply to walk through the document sequentially; the functions it calls are responsible for handling the markup it finds. Once a tag has been processed, the parser moves on to the next element in the document, and the process repeats.
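The callback sequence described above can be made visible with a short sketch. It assumes a tiny inline document (a made-up sample, not the freshmeat.net feed) and records each event in a hypothetical $events array in the order the parser reports it:

```php
<?php
// A tiny inline document (hypothetical sample, not the freshmeat.net feed).
$xml = '<channel><title>Example</title></channel>';

$events = [];  // events collected in the order the parser reports them

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$events) {
        $events[] = "start $name";   // called for every opening tag
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";     // called for every closing tag
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        $events[] = "data $data";    // called for text between tags
    }
);

// Parse the whole string in one call; true marks it as the final chunk.
xml_parse($parser, $xml, true);

echo implode("\n", $events), "\n";
```

The element names are reported in upper case because case folding is switched on by default in PHP's XML parser. The start events arrive outermost first and the end events in the reverse order.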
A DOM parser, on the other hand, works by reading the entire XML document into memory and converting it into a hierarchical tree structure. It also provides APIs for accessing the individual tree nodes (and the content attached to them). Recursive processing combined with these API functions lets developers distinguish between different types of nodes (elements, attributes, character data, comments, and so on) and take different actions depending on the node type and its depth in the document tree.
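As a rough illustration of the DOM approach, the sketch below uses PHP's DOMDocument class to load a small inline document and walk the tree recursively, branching on node type and tracking depth. The sample data and the walk() helper are made up for this example and are not part of the article's script:

```php
<?php
// Hypothetical inline sample; element names follow RSS 1.0, values are made up.
$xml = <<<'XML'
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="http://example.com/a">
    <title>First item</title>
    <link>http://example.com/a</link>
  </item>
</rdf:RDF>
XML;

// Recursively walk the tree, recording each node with its depth and type.
function walk(DOMNode $node, int $depth, array &$lines): void
{
    $indent = str_repeat('  ', $depth);
    if ($node->nodeType === XML_ELEMENT_NODE) {
        $lines[] = $indent . 'element ' . $node->nodeName;
        // Only elements are descended into in this sketch.
        foreach ($node->childNodes as $child) {
            walk($child, $depth + 1, $lines);
        }
    } elseif ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
        // Skip whitespace-only text nodes used for indentation.
        $lines[] = $indent . 'text "' . trim($node->nodeValue) . '"';
    }
}

$doc = new DOMDocument();
$doc->loadXML($xml);   // the whole document is now held in memory as a tree

$lines = [];
walk($doc->documentElement, 0, $lines);
echo implode("\n", $lines), "\n";
```

Unlike the SAX callbacks, the walk() function always knows where it is in the tree, at the cost of holding the whole document in memory.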
SAX and DOM parsers are available for almost every language, including your favorite: PHP. I'll use PHP's SAX parser to process the RDF examples in this article. Of course, a DOM parser would be just as easy to use.
Let's look at a simple example and keep it in mind. Here is the RDF file I'm going to use, taken directly from http://www.freshmeat.net/:
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
>
  <channel rdf:about="http://freshmeat.net/">
    <title>freshmeat.net</title>
    <link>http://freshmeat.net/</link>
    <description>freshmeat.net maintains the Web's largest index of Unix
    and cross-platform open source software. Thousands of applications are
    meticulously cataloged in the freshmeat.net database, and links to new
    code are added daily.</description>
    <dc:language>en-us</dc:language>
    <dc:subject>Technology</dc:subject>
    <dc:publisher>freshmeat.net</dc:publisher>
    <dc:creator>freshmeat.net contributors</dc:creator>
    <dc:rights>Copyright (c) 1997-2002 OSDN</dc:rights>
    <dc:date>2002-02-11T10:20+00:00</dc:date>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://freshmeat.net/releases/69583/"/>
        <rdf:li rdf:resource="http://freshmeat.net/releases/69581/"/>
The following is a PHP script that parses the document and displays the data:
<?php
// XML file
$file = "FM-RELEASES.RDF";

// Set up some variables for use by the parser
$currentTag = "";
$flag = "";

// Create parser
$xp = xml_parser_create();

// Set element handlers
xml_set_element_handler($xp, "elementBegin", "elementEnd");
xml_set_character_data_handler($xp, "characterData");
xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, true);

// Read XML file
if (!($fp = fopen($file, "r")))
{
    die("Could not read $file");
}

// Parse data
while ($xml = fread($fp, 4096))
{
    if (!xml_parse($xp, $xml, feof($fp)))
    {
        die("XML parser error: " .
            xml_error_string(xml_get_error_code($xp)));
    }
}

// Destroy parser
xml_parser_free($xp);

// Opening tag handler
function elementBegin($parser, $name, $attributes)
{
    global $currentTag, $flag;
    // Export the name of the current tag to the global scope
    $currentTag = $name;
    // If within an item block, set a flag
    if ($name == "ITEM")
    {
        $flag = 1;
    }
}

// Closing tag handler
function elementEnd($parser, $name)
{
    global $currentTag, $flag;
    $currentTag = "";
    // If exiting an item block, print a line and reset the flag
    if ($name == "ITEM")
    {
        echo "<hr>";
        $flag = 0;
    }
}

// Character data handler
function characterData($parser, $data)
{
    global $currentTag, $flag;
    // If within an item block, print item data
    if (($currentTag == "TITLE" || $currentTag == "LINK" ||
        $currentTag == "DESCRIPTION") && $flag == 1)
    {
        echo "$currentTag: $data <br>";
    }
}
?>
Didn't follow all of that? Don't worry, an explanation is coming up.
Capture the Flag
The first thing this script does is set up some global variables:
// XML file
$file = "FM-RELEASES.RDF";

// Set up some variables for use by the parser
$currentTag = "";
$flag = "";
The $currentTag variable holds the name of the element the parser is currently working on. You'll soon see why it is needed.
My ultimate goal is to display every item in the channel together with its link, so I also need to know when the parser leaves the <channel></channel> block and when it enters an <item></item> section of the document. Besides, I'm using the SAX parser, which works sequentially and offers no API for finding out the current depth or position in the document tree. So I had to invent a mechanism of my own, and that is why the $flag variable was introduced.
The $flag variable is used to determine whether the parser is inside the <channel> block or inside an <item> block.
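A single flag is enough for this script. Where more detail about position is needed, one common alternative is to keep a stack of open element names. The sketch below is that alternative, not the article's method; the inline sample and the $stack and $itemTitles names are hypothetical:

```php
<?php
// Hypothetical inline sample: one <title> inside <channel>, one inside <item>.
$xml = '<rdf><channel><title>Site</title></channel>'
     . '<item><title>Release</title></item></rdf>';

$stack = [];       // names of the elements currently open, outermost first
$itemTitles = [];  // text found in <title> elements directly under <item>

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$stack) {
        $stack[] = $name;      // entering an element: push its name
    },
    function ($parser, $name) use (&$stack) {
        array_pop($stack);     // leaving an element: pop it again
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$stack, &$itemTitles) {
        $depth = count($stack);
        // Keep the text only when the two innermost elements are ITEM > TITLE.
        if ($depth >= 2
            && $stack[$depth - 1] === 'TITLE'
            && $stack[$depth - 2] === 'ITEM') {
            $itemTitles[] = $data;
        }
    }
);
xml_parse($parser, $xml, true);

print_r($itemTitles);
```

The length of $stack gives the current depth and its contents give the path from the root, which is the information a bare SAX parser does not supply. The channel's own title is skipped because its parent on the stack is CHANNEL, not ITEM.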
The next step is to initialize the SAX parser and start parsing the RSS document.
// Create parser
$xp = xml_parser_create();

// Set element handlers
xml_set_element_handler($xp, "elementBegin", "elementEnd");
xml_set_character_data_handler($xp, "characterData");
xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, true);

// Read XML file
if (!($fp = fopen($file, "r")))
{
    die("Could not read $file");
}

// Parse data
while ($xml = fread($fp, 4096))
{
    if (!xml_parse($xp, $xml, feof($fp)))
    {
        die("XML parser error: " .
            xml_error_string(xml_get_error_code($xp)));
    }
}

// Destroy parser
xml_parser_free($xp);
The code is straightforward, and the comments explain it well enough. The xml_parser_create() function creates a parser instance and assigns it to the handle $xp. Callback functions are then registered to handle opening and closing tags, as well as the character data between them. Finally, xml_parse() is combined with repeated fread() calls to read the RDF file in 4096-byte chunks and parse it; the third argument, feof($fp), tells the parser when the last chunk has been passed in.
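When parsing fails, it can help to report where the problem occurred as well as what it was. The following sketch assumes a deliberately malformed inline document (made up for this example) and adds the line number from xml_get_current_line_number() to the error message:

```php
<?php
// Deliberately malformed input (hypothetical): <title> is never closed.
$xml = "<channel>\n<title>freshmeat.net\n</channel>";

$parser = xml_parser_create();
$message = null;

// xml_parse() returns 0 on failure, so the negation is true for bad input.
if (!xml_parse($parser, $xml, true)) {
    $message = sprintf(
        'XML parser error: %s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

echo $message, "\n";
```

The exact wording of the error string depends on the underlying XML library, so it is best treated as a message for humans and not matched against in code.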
Each time an opening tag is encountered in the document, the opening tag handler elementBegin() is invoked.
// Opening tag handler
function elementBegin($parser, $name, $attributes)
{
    global $currentTag, $flag;
    // Export the name of the current tag to the global scope
    $currentTag = $name;
    // If within an item block, set a flag
    if ($name == "ITEM")
    {
        $flag = 1;
    }
}
This function receives the name and attributes of the current tag as arguments. The tag name is assigned to the global variable $currentTag. If the opening tag is <item>, the $flag variable is set to 1.
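The script never uses the $attributes argument, but it is where values such as rdf:resource would show up. A minimal sketch, assuming a fragment modeled on the <items> list in the feed above and a hypothetical $resources array:

```php
<?php
// Hypothetical fragment modeled on the <items> list in the feed.
$xml = '<rdf:Seq xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
     . '<rdf:li rdf:resource="http://freshmeat.net/releases/69583/"/>'
     . '<rdf:li rdf:resource="http://freshmeat.net/releases/69581/"/>'
     . '</rdf:Seq>';

$resources = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$resources) {
        // $attributes is an associative array of attribute name => value.
        // With case folding on (the default), the names are upper-cased too.
        if ($name === 'RDF:LI' && isset($attributes['RDF:RESOURCE'])) {
            $resources[] = $attributes['RDF:RESOURCE'];
        }
    },
    function ($parser, $name) {
        // Nothing to do on closing tags in this sketch.
    }
);
xml_parse($parser, $xml, true);

print_r($resources);
```

Because xml_parser_create() is not namespace-aware, the prefix stays part of the name, which is why the handler compares against RDF:LI and RDF:RESOURCE.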
Similarly, whenever a closing tag is encountered, the closing tag handler elementEnd() is invoked.
// Closing tag handler
function elementEnd($parser, $name)
{
    global $currentTag, $flag;
    $currentTag = "";
    // If exiting an item block, print a line and reset the flag
    if ($name == "ITEM")
    {
        echo "<hr>";
        $flag = 0;
    }
}
The closing tag handler also takes the tag name as its argument. It empties $currentTag on every closing tag and, when the closing tag is </item>, also resets $flag to 0.
So how is the character data between the tags handled? This is the part we are really interested in. Say hello to the character data handler, characterData().
// Character data handler
function characterData($parser, $data)
{
    global $currentTag, $flag;
    // If within an item block, print item data
    if (($currentTag == "TITLE" || $currentTag == "LINK" ||
        $currentTag == "DESCRIPTION") && $flag == 1)
    {
        echo "$currentTag: $data <br>";
    }
}
Look at the parameters passed to this function and you'll find that it receives only the data between the opening and closing tags, with no indication of which tag the parser is currently working on. That is why the global variable $currentTag was introduced at the start.
If the value of $flag is 1, that is, if the parser is currently inside an <item></item> block, and the element being processed is <title>, <link>, or <description>, the data is printed to the output device (here, a Web browser), followed by a line break <br> after each element.
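One thing to keep in mind is that the parser may deliver the text of a single element in several pieces, for example around an entity such as &amp;, in which case printing directly from the character data handler repeats the label for each piece. A possible variation, sketched here with a made-up inline sample and hypothetical $buffer, $inItem, and $output variables, collects the text and emits it once in the closing tag handler, escaping it for HTML on the way out:

```php
<?php
// Hypothetical inline sample; the entity may split the title text into
// several character-data callbacks.
$xml = '<item><title>Tom &amp; Jerry 1.0</title>'
     . '<link>http://example.com/tj</link></item>';

$buffer = '';      // text collected for the element currently being read
$inItem = false;   // plays the role of $flag in the article's script
$output = [];      // finished lines, one per element

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attributes) use (&$buffer, &$inItem) {
        if ($name === 'ITEM') {
            $inItem = true;
        }
        $buffer = '';  // start collecting afresh for each element
    },
    function ($parser, $name) use (&$buffer, &$inItem, &$output) {
        if ($inItem && in_array($name, ['TITLE', 'LINK', 'DESCRIPTION'], true)) {
            // Emit once per element, with HTML special characters escaped.
            $output[] = $name . ': ' . htmlspecialchars(trim($buffer)) . '<br>';
        }
        if ($name === 'ITEM') {
            $inItem = false;
        }
        $buffer = '';
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$buffer) {
        $buffer .= $data;  // append every piece, however the text is split
    }
);
xml_parse($parser, $xml, true);

echo implode("\n", $output), "\n";
```

Escaping with htmlspecialchars() matters here because the parser hands back decoded text, so a literal & or < in a title would otherwise be written into the page unescaped.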
The entire RDF document is processed in this way, producing a block of output for each <item> element found. You can see the result of running the script below: