Read a .txt file and return a list of words with their frequency in the file

Question

Samuel Fernandes · Answer

Do something like this. I'm assuming only comma or period could occur in the file. Else you'll have to remove other punctuation characters as well. I'm using a TreeMap so the words in the map will be stored their natural alphabetical order

  public static TreeMap<String, Integer> generateFrequencyList()
    throws IOException {
    TreeMap<String, Integer> wordsFrequencyMap = new TreeMap<String, Integer>();
    String file = "/tmp/lorem.txt";
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    while( (line = br.readLine()) != null){
         String [] tokens = line.split("\\s+");
      for (String token : tokens) {
        token = removePunctuation(token);
        if (!wordsFrequencyMap.containsKey(token.toLowerCase())) {
          wordsFrequencyMap.put(token.toLowerCase(), 1);
        } else {
          int count = wordsFrequencyMap.get(token.toLowerCase());
          wordsFrequencyMap.put(token.toLowerCase(), count + 1);
        }
      }
    }
    return wordsFrequencyMap;
  }
  private static String removePunctuation(String token) {
    token = token.replaceAll("[^a-zA-Z]", "");
    return token;
  }

main method for testing is shown below. For getting the percentages, you could get count of all the words by iterating through the map and adding all the values and then do a second pass for getting the percentages. By the way, if this is part of a larger work, you could also take a look at apache commons math library for calculating Frequency distributions. If you use their Frequency class, you can keep adding all the words to it and then get the descriptive statistics at the end.

  public static void main(String[] args) {
    try {
      int totalWords = 0;   
      TreeMap<String, Integer> freqMap = generateFrequencyList();
      for (String key : freqMap.keySet()) {
        totalWords += freqMap.get(key);
      }
      System.out.println("Word\tCount\tPercentage");
      for (String key : freqMap.keySet()) {
         System.out.println(key+"\t"+freqMap.get(key)+"\t"+((double)freqMap.get(key)*100.0/(double)totalWords));    
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

forum

Read a .txt file and return a list of words with their frequency in the file

Anonymous User

Can you answer this question?

1 Answers

Liked By