Use Xalan-Java to separate strings

Source: Internet
Author: User
Tags xslt xslt processor

 

Readers may be familiar with the Apache Software Foundation and its various related projects. This article discusses the Xalan-Java XSLT processor and how to use its string tokenizing function.
XML data comes in many formats, and the format used in a given XML document does not necessarily match what the target system expects. XSLT templates are often used to convert one format to another. Unfortunately, XSLT itself provides only a limited set of functions for performing these transformations.
The Xalan project of the Apache Software Foundation includes Java and C++ versions of its XSLT processor. The processor parses XML documents and transforms them using XSLT templates. In addition to standard XSLT transformations, Xalan provides a library of extension functions. One of these is a string tokenizer, which splits a string into a set of tokens.

Problem area

The tokenize function is useful for a specific class of XML transformations: whenever a string can be broken into substrings that follow a consistent pattern. It is an XSLT extension function that takes two parameters. The first parameter specifies the string to be split. The second parameter specifies the delimiter characters used to separate the string into tokens.

The result of the tokenize function is a set of nodes, each representing one token. These nodes can be processed by iterating over them or accessed individually as single values. You can therefore use the tokenizer either to break a string into a group of individual values or to extract a single token from a long string.
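
The splitting behavior described above can be approximated in plain Java. The sketch below does not call Xalan. It uses the JDK's `java.util.StringTokenizer` to mimic a two-parameter tokenize, and the class name `TokenizeSketch`, its method names, and the exact default delimiter set are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper that imitates the splitting behavior in plain Java.
class TokenizeSketch {
    // Assumed default delimiters for the one-parameter form: whitespace.
    static final String DEFAULT_DELIMS = " \t\n\r";

    // Splits input on any character found in delims.
    // Consecutive delimiters never produce empty tokens.
    static List<String> tokenize(String input, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // One-parameter form: falls back to the assumed whitespace default.
    static List<String> tokenize(String input) {
        return tokenize(input, DEFAULT_DELIMS);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("one,two three", ", ");
        // Process every token by iterating over the result.
        for (String token : tokens) {
            System.out.println(token);
        }
        // Or pick out a single token by position.
        System.out.println(tokens.get(1)); // prints "two"
    }
}
```

The list returned here stands in for the set of token nodes: it can be walked with a loop, or a single entry can be read by index.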

Example

To illustrate how the tokenize function is used, consider the following XML document, which contains the string to be split:

<CustomerAddress>
<Address1> 9399 W Higgins Street </Address1>
<Address2> Rosemont, IL 60018 </Address2>
</CustomerAddress>
This example shows a customer address record that contains two address lines, which is a common arrangement. In such a system, address information is only used for mailing, so the individual city, state, and zip code values are not particularly important. Unfortunately, many other systems expect address information to be divided into city, state, and zip code. A mechanism is therefore required to divide the combined <Address2> element into separate city, state, and zip code elements.

Solution

To provide data to the target system in an appropriate format, I use the tokenize extension function of Xalan. This function splits a string, such as an address line, into multiple tokens based on a set of delimiters. If no delimiter is specified, whitespace is used as the default delimiter. In our example, the delimiters are a space and a comma.
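
As a rough illustration of the splitting logic only, the following plain-Java sketch applies the same two delimiters (space and comma) to the <Address2> value from the example. It does not call Xalan. The class `AddressSplitter` and its `split` method are hypothetical names, and treating the last two tokens as state and zip code is an assumption that only holds for addresses shaped like the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: splits a combined "city, state zip" line.
class AddressSplitter {
    // Returns {city, state, zip} for input such as "Rosemont, IL 60018".
    static String[] split(String address2) {
        List<String> tokens = new ArrayList<>();
        // Space and comma are both treated as delimiters.
        StringTokenizer st = new StringTokenizer(address2, ", ");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        if (tokens.size() < 3) {
            throw new IllegalArgumentException("Expected city, state and zip: " + address2);
        }
        int n = tokens.size();
        // Assumption: the last token is the zip code, the one before it the state.
        String zip = tokens.get(n - 1);
        String state = tokens.get(n - 2);
        // Everything before the state is rejoined as the city name.
        String city = String.join(" ", tokens.subList(0, n - 2));
        return new String[] { city, state, zip };
    }

    public static void main(String[] args) {
        String[] parts = split(" Rosemont, IL 60018 ");
        System.out.println("City:  " + parts[0]); // City:  Rosemont
        System.out.println("State: " + parts[1]); // State: IL
        System.out.println("Zip:   " + parts[2]); // Zip:   60018
    }
}
```

Because a space is one of the delimiters, a multi-word city is broken into several tokens. Rejoining everything before the state is one way to handle that; reading tokens at fixed positions would only work for single-word cities.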
