StreamTokenizer is another useful class; it parses an input stream into tokens. The class is not derived from InputStream or OutputStream, yet it is part of the I/O library (the java.io package).

The reason is that it operates on input sources: it tokenizes an underlying InputStream or, preferably, a Reader (the constructor that takes an InputStream is deprecated). Here’s what we mean by tokenizing: the sentence “Mary had a little lamb” contains five tokens, because each word is considered a token.
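To make the idea concrete, here is a minimal sketch that tokenizes that sentence from an in-memory string. The class name SentenceTokens and the helper method words are illustrative assumptions, not part of the program discussed in this section; a StringReader stands in for a file so the example is self-contained.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original program.
class SentenceTokens {

    // Returns the word tokens found in the given text, in order.
    static List<String> words(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    result.add(tokenizer.sval);
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail here; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = words("Mary had a little lamb");
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

Running main should print 5 tokens: [Mary, had, a, little, lamb], one token per word.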

Once the input is wrapped in a StreamTokenizer, we call the nextToken method in a loop to iterate through all the tokens; each call reads the next token from the stream. For each token, we can find its kind, value, and so on, with the help of several predefined fields. For example:

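·    The ttype field indicates the type of token read, which can be a word, number, end-of-line, or end-of-file. For an ordinary character, ttype holds the character itself.

·    The sval field holds the string value of a word or quoted-string token; it is null for other tokens.

·    The nval field holds the numeric value (a double) of a number token.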
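As a quick illustration of these three fields, the following sketch labels each token of a short string by inspecting ttype and then reading sval or nval as appropriate. The class name TokenFields, the method describe, and the "KIND:value" labels are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for inspecting ttype, sval, and nval.
class TokenFields {

    // Describes every token in the text as "KIND:value".
    static List<String> describe(String text) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        List<String> result = new ArrayList<>();
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tokenizer.ttype) {
                    case StreamTokenizer.TT_WORD:
                        result.add("WORD:" + tokenizer.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        result.add("NUMBER:" + tokenizer.nval);
                        break;
                    case '"':
                        // For a double-quoted string, ttype is the quote character
                        // and sval holds the text between the quotes.
                        result.add("QUOTED:" + tokenizer.sval);
                        break;
                    default:
                        // Anything else is reported by its character code
                        // (single-quoted strings are not handled in this sketch).
                        result.add("CHAR:" + (char) tokenizer.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("lamb 42 \"hello\" +"));
    }
}
```

With the default settings this should print [WORD:lamb, NUMBER:42.0, QUOTED:hello, CHAR:+]. Note that nval is a double, so 42 is reported as 42.0.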

We will learn to use these fields in the next program. Before starting the loop, we can set the syntax table to customize what is recognized and what is ignored; otherwise, we can simply use the default rules. The class recognizes identifiers, numbers, quoted strings, and C/C++-style comments.
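One way to see the effect of the syntax table is to change how a single character is classified. By default the underscore is an ordinary character, so an identifier such as max_value is split into three tokens; calling wordChars('_', '_') makes it part of a word. The sketch below assumes a hypothetical class SyntaxTableDemo with a helper countTokens, introduced only for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class showing one syntax-table customization.
class SyntaxTableDemo {

    // Counts all tokens in the text; optionally treats '_' as a word character.
    static int countTokens(String text, boolean underscoreInWords) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        if (underscoreInWords) {
            // Must be set before the loop so it applies to every token.
            tokenizer.wordChars('_', '_');
        }
        int count = 0;
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("default rules:   " + countTokens("max_value", false));
        System.out.println("'_' in words:    " + countTokens("max_value", true));
    }
}
```

The first count should be 3 (max, the character _, and value) and the second 1 (the single word max_value).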

Utility to count Words and Numbers
Program Code

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.StreamTokenizer;

public class WordAndNumberParser {

    public static void main(String args[]) throws IOException {
        if (args.length < 1) {
            System.out.println("Usage: java WordAndNumberParser <filename>");
            System.exit(0);
        }
        WordAndNumberParser app = new WordAndNumberParser();
        app.parseFile(args[0]);
    }

    private void parseFile(String fileName) {
        int wordCount = 0;
        int numberCount = 0;
        try (FileReader reader = new FileReader(fileName)) {
            StreamTokenizer tokenizer = new StreamTokenizer(reader);
            // Ignore both // and /* ... */ comments.
            tokenizer.slashSlashComments(true);
            tokenizer.slashStarComments(true);
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    wordCount++;
                } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    numberCount++;
                }
                // sval is null unless the token is a word or quoted string.
                if (tokenizer.sval != null
                        && tokenizer.sval.equals("DataInputStream")) {
                    System.out.println(tokenizer.toString());
                }
            }
        } catch (FileNotFoundException fe) {
            System.out.println("File not found: " + fileName);
        } catch (IOException ioe) {
            System.out.println("Error parsing file");
        }
        System.out.println("Number of words: " + wordCount);
        System.out.println("Number of numerals: " + numberCount);
    }
}


The main method, after checking for proper invocation, creates an application instance and calls its parseFile method with the file name given on the command line. The parseFile method opens the file using the character-oriented reader classes discussed earlier and wraps the reader in a StreamTokenizer instance:

·         FileReader reader = new FileReader(fileName);

·         StreamTokenizer tokenizer = new StreamTokenizer(reader);

Note that we use the try-with-resources syntax of Java SE 7 to open the file, so the reader is closed automatically. Before parsing the file, we set the following options:

·         tokenizer.slashSlashComments(true);

·         tokenizer.slashStarComments(true);

The tokenizer now ignores both styles of Java comments (that is, single-line and multiline), along with everything inside them. We now set up a while loop to iterate through all the tokens in the file:
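The effect of these two options can be checked on a small in-memory snippet. The sketch below is a hedged illustration: the class CommentSkipping, the method count, and the sample source text are assumptions for this example. It counts words and numbers with both comment options enabled.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class showing that comment contents are skipped.
class CommentSkipping {

    // Returns {wordCount, numberCount} for the text, ignoring Java-style comments.
    static int[] count(String source) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(source));
        tokenizer.slashSlashComments(true);
        tokenizer.slashStarComments(true);
        int words = 0;
        int numbers = 0;
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                    words++;
                } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                    numbers++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {words, numbers};
    }

    public static void main(String[] args) {
        String source = "int x = 10; // ignored 1 2 3\n"
                      + "/* also ignored:\n   4 5 six */ y\n";
        int[] counts = count(source);
        System.out.println("words=" + counts[0] + ", numbers=" + counts[1]);
    }
}
```

Only int, x, and y are counted as words and only 10 as a number, so the program should print words=3, numbers=1. The characters = and ; are returned as ordinary-character tokens and fall into neither counter.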

·     while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {

For each token, we check whether it is an alphanumeric word or a number by comparing its ttype field with the predefined constants:

if (tokenizer.ttype == StreamTokenizer.TT_WORD) {

} else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {



Accordingly, the program increments one of the two counters. Within the loop, we also check whether the current token equals the identifier DataInputStream. If so, we print the token together with the line number on which it is found, using the tokenizer's toString method:

if (tokenizer.sval != null

&& tokenizer.sval.equals("DataInputStream")) {



After the loop terminates, the program prints the word and number count to the console.


Typical output from a sample run of the program is shown here:

Token[DataInputStream], line 29

Number of words: 78
Number of numerals: 9


The output shows that the word DataInputStream occurs once, on line 29. The number of words in the entire file is 78, and the number of numerals is 9.
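The line number in that output is the tokenizer's current line, which can also be read directly with the lineno method. As a small sketch (the class TokenLocation, the method lineOf, and the two-line sample text are assumptions for this example), the following looks up the line of the first occurrence of a word:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class for locating a word by line number.
class TokenLocation {

    // Returns the line number of the first token equal to word, or -1 if absent.
    static int lineOf(String text, String word) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        try {
            while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
                // equals is called on word, so a null sval is handled safely.
                if (word.equals(tokenizer.sval)) {
                    return tokenizer.lineno();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "first line\nDataInputStream in\n";
        System.out.println("found on line " + lineOf(text, "DataInputStream"));
    }
}
```

Line numbers start at 1, so for this sample text the word is reported on line 2.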

We can modify the contents of the comments in the input file to confirm that the tokenizer indeed ignores the comments. Note that the actual output will vary depending on your input file.

  Modified On Dec-18-2017 02:13:05 AM
