The most common ways to parse XML in Perl are XML::DOM and XML::Simple. XML::DOM is heavyweight, and its parsing result is a DOM tree, which is inconvenient to work with. For small, uncomplicated XML files, XML::DOM is overkill. That is where the lightweight XML::Simple comes in.
XML::Simple is as simple as its name suggests. Assume that the XML content is as follows:
<opt>
  <user login="grep" fullname="Gary R Epstein" />
  <user login="stty" fullname="Simon T Tyson">
    <session pid="12345" />
  </user>
  <text>This is a test.</text>
</opt>
Then just write:
use XML::Simple;
use Data::Dumper;
$xml = XMLin('sample.xml');
print Dumper($xml);
This parses the XML into a hash reference, which you can then walk with foreach. The dumped result looks like this:
$VAR1 = {
  'text' => 'This is a test.',
  'user' => [
    {
      'fullname' => 'Gary R Epstein',
      'login' => 'grep'
    },
    {
      'session' => {
        'pid' => '12345'
      },
      'fullname' => 'Simon T Tyson',
      'login' => 'stty'
    }
  ]
};
The following rules can be observed:
The tag name of an element is used as the hash key.
The content of an element that occurs once is used directly as the hash value; the contents of an element that is repeated are collected in an array reference, which becomes the hash value.
Attributes and child elements appear inside the element's content as hash key => value pairs.
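As a minimal sketch of how these rules translate into access code, the snippet below walks a hard-coded copy of the structure shown in the dump above. The literal $xml here is only a stand-in for what XMLin('sample.xml') would return, and the output format is an arbitrary choice for illustration.

```perl
use strict;
use warnings;

# Stand-in for the result of XMLin('sample.xml'), copied from the
# Dumper output above (assumption: default XML::Simple options).
my $xml = {
    text => 'This is a test.',
    user => [
        { login => 'grep', fullname => 'Gary R Epstein' },
        {
            login    => 'stty',
            fullname => 'Simon T Tyson',
            session  => { pid => '12345' },
        },
    ],
};

my @lines;
# Repeated <user> elements arrive as an array reference.
foreach my $user (@{ $xml->{user} }) {
    # Attributes are plain hash keys on each element.
    my $line = "$user->{login}: $user->{fullname}";
    # A child element is a nested hash under its tag name.
    $line .= " (pid $user->{session}{pid})" if exists $user->{session};
    push @lines, $line;
}

print "$_\n" for @lines;
# A single <text> element is a plain scalar.
print "text: $xml->{text}\n";
```

Note that user is dereferenced as an array while text is read as a scalar; the access code has to match the shape the parser happened to produce.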
One problem is that a single element and a repeated element are represented differently (a scalar or hash versus an array reference), as with text and user above. This makes foreach processing more troublesome, because the code has to check which form it received. The solution is to add the option ForceArray => 1, which forces even a single element to be placed in an array reference.
$xml = XMLin('sample.xml', ForceArray => 1);
print Dumper($xml);
Result (partial):
$VAR1 = {
  'text' => [
    'This is a test.'
  ],
  'user' => [
    ...
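To illustrate the scalar-versus-array distinction that ForceArray removes, here is a hedged sketch of what the calling code would otherwise have to do by hand. The helper as_list is hypothetical (it is not part of XML::Simple), and the $xml literal again stands in for a result parsed without ForceArray.

```perl
use strict;
use warnings;

# as_list: hypothetical helper, not provided by XML::Simple.
# Returns a flat list whether the value is missing, a single item,
# or an array reference of items.
sub as_list {
    my ($value) = @_;
    return ()      unless defined $value;
    return @$value if ref $value eq 'ARRAY';
    return ($value);
}

# Stand-in for a result parsed WITHOUT ForceArray.
my $xml = {
    text => 'This is a test.',
    user => [
        { login => 'grep', fullname => 'Gary R Epstein' },
        { login => 'stty', fullname => 'Simon T Tyson' },
    ],
};

my @texts   = as_list($xml->{text});     # single element: scalar
my @users   = as_list($xml->{user});     # repeated element: array ref
my @missing = as_list($xml->{nosuch});   # absent element: undef

printf "%d text, %d users, %d missing\n",
    scalar @texts, scalar @users, scalar @missing;
```

With ForceArray => 1 the parser does this normalization itself, so every element can be dereferenced as an array without such a check.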
Another problem is that if a repeated element has an attribute named id, name, or key, the elements are no longer placed in an array reference but in a hash reference keyed by that attribute's value. For example, take the following XML (still parsed with ForceArray => 1) and note how the result differs from the one above:
<opt>
  <user id="grep" fullname="Gary R Epstein" />
  <user id="stty" fullname="Simon T Tyson">
    <session pid="12345" />
  </user>
  <text>This is a test.</text>
</opt>
$VAR1 = {
  'text' => [
    'This is a test.'
  ],
  'user' => {
    'grep' => {
      'fullname' => 'Gary R Epstein'
    },
    'stty' => {
      'session' => [
        {
          'pid' => '12345'
        }
      ],
      'fullname' => 'Simon T Tyson'
    }
  }
};
The content of user is no longer an array reference but a hash reference, and the id values ('grep' and 'stty') have become its keys.
To disable this behavior, specify the option KeyAttr => ''. This option determines which attributes are used as hash keys during parsing. The default value is ['name', 'key', 'id'].
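The sketch below combines both options on the second sample document. It assumes XML::Simple is installed, passes the XML as an inline string rather than a file so the example is self-contained, and uses the empty-list form KeyAttr => [], which the module's documentation gives as the way to switch off folding.

```perl
use strict;
use warnings;
use XML::Simple;

# Inline copy of the sample with id attributes. XMLin treats a
# string containing markup as XML rather than as a file name.
my $source = <<'XML';
<opt>
  <user id="grep" fullname="Gary R Epstein" />
  <user id="stty" fullname="Simon T Tyson">
    <session pid="12345" />
  </user>
  <text>This is a test.</text>
</opt>
XML

# ForceArray => 1: every element becomes an array reference.
# KeyAttr => []:   no attribute is used as a hash key, so <user>
#                  stays an array and 'id' stays a normal attribute.
my $xml = XMLin($source, ForceArray => 1, KeyAttr => []);

foreach my $user (@{ $xml->{user} }) {
    print "$user->{id}: $user->{fullname}\n";
}
```

With these two options the shape of the result no longer depends on how many times an element occurs or on what its attributes are called, which keeps the foreach loops uniform.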
The XML::Simple documentation describes all of the options in detail, and it marks KeyAttr and ForceArray as important, which shows how commonly they are needed.