Stream Tokenizing (Splitting Strings)

This article from the Sun Web site looks at stream tokenizing.
In Tech Tips: June 1998, an example of string tokenization was presented, using the class java.util.StringTokenizer.

There's also another way to do tokenization, using java.io.StreamTokenizer. StreamTokenizer operates on input streams rather than strings, and each byte in the input stream is regarded as a character in the range '\u0000' through '\u00FF'.

StreamTokenizer is lower level than StringTokenizer, but offers more control over the tokenization process. The class uses an internal table to control how tokens are parsed, and this syntax table can be modified to change the parsing rules. Here's an example of how StreamTokenizer works:


import java.io.*;
import java.util.*;

public class StreamToken {
    public static void main(String args[])
    {
        if (args.length == 0) {
            System.err.println("Missing input filename");
            System.exit(1);
        }

        // The keys of this table are the unique words; the values are unused.
        Hashtable<String, Object> wordlist = new Hashtable<String, Object>();

        try {
            FileReader fr = new FileReader(args[0]);
            BufferedReader br = new BufferedReader(fr);

            StreamTokenizer st = new StreamTokenizer(br);
            //StreamTokenizer st =
            //    new StreamTokenizer(new StringReader(
            //    "This is a test"));

            // Clear the syntax table, then treat only letters as word characters.
            st.resetSyntax();
            st.wordChars('A', 'Z');
            st.wordChars('a', 'z');
            int type;
            Object dummy = new Object();
            while ((type = st.nextToken()) !=
                StreamTokenizer.TT_EOF) {
                if (type == StreamTokenizer.TT_WORD)
                    wordlist.put(st.sval, dummy);
            }
            br.close();
        }
        catch (IOException e) {
            System.err.println(e);
        }

        // "enum" is a reserved word since Java 5, so the variable is named "words".
        Enumeration<String> words = wordlist.keys();
        while (words.hasMoreElements())
            System.out.println(words.nextElement());
    }
}

In this example, a StreamTokenizer is created on top of a FileReader/BufferedReader pair that represents a text file. Note that a StreamTokenizer can also be made to read from a String by using StringReader, as illustrated in the commented-out code shown above (StringBufferInputStream also works, although this class has been deprecated).
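
As a minimal sketch of the StringReader variant, the following applies the same letters-only syntax table to an in-memory string instead of a file. The class name StringSourceDemo and the helper words are hypothetical names chosen for illustration; they are not part of the original program.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original article.
class StringSourceDemo {
    // Returns the word tokens found in the given text, in order.
    static List<String> words(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        // Same setup as the file-based example: only letters form words.
        st.resetSyntax();
        st.wordChars('A', 'Z');
        st.wordChars('a', 'z');

        List<String> result = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    result.add(st.sval);
                }
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("This is a test")); // prints [This, is, a, test]
    }
}
```

Because the syntax table has been reset, the spaces between the words come back as ordinary single-character tokens, which the loop simply skips.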

The method resetSyntax is used to clear the internal syntax table, so that StreamTokenizer forgets any rules that it knows about parsing tokens. Then wordChars is used to declare that only upper and lower case letters should be considered to form words. That is, the only tokens that StreamTokenizer recognizes as words are sequences of upper and lower case letters.
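
To see what resetting the syntax table changes, the sketch below tokenizes the same text twice, once with the default table and once with the letters-only table described above. The class SyntaxTableDemo, the method tokens, and the WORD/NUMBER/CHAR labels are illustrative assumptions, not anything defined by the original article.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original article.
class SyntaxTableDemo {
    // Describes every token in the text; lettersOnly selects the reset table.
    static List<String> tokens(String text, boolean lettersOnly) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        if (lettersOnly) {
            st.resetSyntax();          // every character becomes "ordinary"
            st.wordChars('A', 'Z');    // then letters are declared word characters
            st.wordChars('a', 'z');
        }
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD(" + st.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER(" + st.nval + ")");
                        break;
                    default:
                        // An ordinary character is returned as its own token.
                        out.add("CHAR(" + (char) st.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("route66 is fun", false));
        System.out.println(tokens("route66 is fun", true));
    }
}
```

With the default table, digits may continue a word and whitespace is skipped, so the first call reports three words, the first being route66. With the letters-only table, the digits and spaces are ordinary characters, so the second call reports route, is, and fun as words, with each digit and space as a separate single-character token.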

nextToken is called repeatedly to retrieve words, and each resulting word is found in the public instance variable "st.sval". The words are inserted into a Hashtable, and at the end of processing the contents of the table are displayed, using an Enumeration as illustrated in Tech Tips: June 23, 1998. So the effect of this program is to find all the unique words in a text file and display them.

StreamTokenizer also has special facilities for parsing numbers, quoted strings, and comments. It's a useful alternative to StringTokenizer, and is especially applicable if you are tokenizing input streams, or wish to exercise finer control over the tokenization process.
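
The following sketch illustrates those facilities using the default syntax table, in which digits are parsed as numbers, the double quote delimits strings, and '/' starts a comment that runs to the end of the line. The class FacilitiesDemo, the method describe, and the output labels are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original article.
class FacilitiesDemo {
    // Describes each token produced by the default syntax table.
    static List<String> describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD(" + st.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER(" + st.nval + ")"); // nval is a double
                        break;
                    case '"':
                        // For a quoted string, ttype is the quote character
                        // and sval holds the string body.
                        out.add("QUOTED(" + st.sval + ")");
                        break;
                    default:
                        out.add("CHAR(" + (char) st.ttype + ")");
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD(size), CHAR(=), NUMBER(42.0), QUOTED(hello world)]
        System.out.println(describe("size = 42 \"hello world\" // ignored"));
    }
}
```

Note that these facilities are lost after a call to resetSyntax unless they are re-enabled, for example with parseNumbers, quoteChar, or commentChar.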


