Readers may be familiar with the Apache Software Foundation and its various related projects. This article discusses the Xalan-Java XSLT processor and how to use its string tokenization function.
XML data comes in many formats, and the format of a given XML document does not necessarily match what the target system expects. XSLT templates are often used to convert one format to another. Unfortunately, XSLT itself provides only a limited set of functions for performing these transformations.
The Xalan project of the Apache Software Foundation includes Java and C++ versions of an XSLT processor. The processor parses XML documents and transforms them using XSLT templates. In addition to standard XSLT transformations, Xalan provides a library of extension functions. One of the functions in that library is a string tokenizer, which splits a string into a set of tokens.
Problem area
The tokenize function is useful for a specific class of XML transformations: any case where a string must be broken into substrings in a consistent way. It is called like any other XSLT function and takes two parameters. The first parameter is the string to be split. The second parameter is the set of delimiter characters used to separate the string into tokens.
The result of the tokenize function is a set of nodes, one for each token. These nodes can be processed by iterating over them, or each one can be read as a single value. You can therefore use the tokenizer either to break a string into a group of individual values or to extract a single token from a long string.
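The two-parameter behavior described above can be pictured in plain Java. The following is only an analogy built on `java.util.StringTokenizer`, not Xalan's own code; the class name `TokenizeSketch` and its `tokenize` method are hypothetical names chosen for illustration, and a Java `List` stands in for the node set that the XSLT function returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: a plain-Java analogy for a two-argument tokenize call.
final class TokenizeSketch {

    // Splits `text` at any character found in `delimiters`.
    // Runs of consecutive delimiters produce no empty tokens.
    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delimiters);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("red;green;blue", ";");

        // Iterate over every token, much like looping over a node set.
        for (String token : tokens) {
            System.out.println(token);
        }

        // Or pick out a single token by position
        // (0-based here; XPath positions are 1-based).
        System.out.println("second token: " + tokens.get(1));
    }
}
```

The sketch shows both ways of consuming the result: looping over all tokens and selecting one token by position.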
Example
To illustrate how tokenize is used, consider an example. The following XML document contains the string to be split:
<CustomerAddress>
<Address1> 9399 W Higgins Street </Address1>
<Address2> Rosemont, IL 60018 </Address2>
</CustomerAddress>
This example shows a customer address record that contains two address lines. This is quite common in source systems: the address is used only for sending mail, so the individual city, state, and zip code values are not particularly important. Unfortunately, many target systems want the address broken out into city, state, and zip code. A mechanism is therefore needed to divide the combined <Address2> element into separate city, state, and zip code elements.
Solution
To provide data to the target system in the appropriate format, I use Xalan's tokenize extension function. This function splits a string, such as an address line, into multiple tokens based on a set of delimiters. If no delimiter is specified, a space is used as the default delimiter. In our example, the delimiters are a space and a comma.
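To see why both the space and the comma are needed as delimiters, here is a rough Java sketch that splits the <Address2> text into city, state, and zip code. It is an illustration under stated assumptions, not the stylesheet used in this article: `AddressSplitSketch`, `splitCityStateZip`, and `DEFAULT_DELIMITERS` are hypothetical names, the default delimiter set is assumed to be whitespace, and treating the last two tokens as state and zip code is a simplification that only suits addresses shaped like the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: split "City, ST 12345" into city, state, and zip code.
final class AddressSplitSketch {

    // Assumption: with no delimiter given, tokens are separated by whitespace.
    static final String DEFAULT_DELIMITERS = " \t\n\r";

    // Returns { city, state, zip }. Pass null to use the assumed default.
    static String[] splitCityStateZip(String address2, String delimiters) {
        String delims = (delimiters == null) ? DEFAULT_DELIMITERS : delimiters;

        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(address2, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        if (tokens.size() < 3) {
            throw new IllegalArgumentException("expected city, state, and zip: " + address2);
        }

        // Simplification: the last two tokens are state and zip code;
        // everything before them is rejoined as a (possibly multi-word) city.
        int n = tokens.size();
        String city = String.join(" ", tokens.subList(0, n - 2));
        return new String[] { city, tokens.get(n - 2), tokens.get(n - 1) };
    }

    public static void main(String[] args) {
        String address2 = " Rosemont, IL 60018 ";

        // Space and comma as delimiters: the comma is consumed.
        String[] parts = splitCityStateZip(address2, ", ");
        System.out.println(String.join("|", parts));

        // Whitespace only: the comma stays attached to the city token.
        String[] defaults = splitCityStateZip(address2, null);
        System.out.println(String.join("|", defaults));
    }
}
```

With only whitespace as the delimiter, the comma remains part of the city token, which is why the example adds the comma to the delimiter set.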