Read a .txt file and return a list of words with their frequency in the file

I have this so far, but it only prints the .txt file to the screen:


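import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile {
    public static void main(String[] args) throws IOException {
        String Wordlist;
        int Frequency;

        File file = new File("file1.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
        String line = null;

        while ((line = br.readLine()) != null) {
            String[] tokens = line.split("\\s+");
            // ... (the rest of the original loop body was cut off)
        }
        br.close();
    }
}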
Can anyone help me make it print a word list with each word's frequency?
Last updated: 12/30/2014 1:03:53 AM

1 Answer

Samuel Fernandes

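Do something like this. The removePunctuation helper below strips every character that isn't an ASCII letter, so commas, periods and other punctuation are removed (so are digits and apostrophes; adjust the regex if you need to keep them). Each word is also lowercased, so "The" and "the" are counted as the same word. I'm using a TreeMap so the words in the map are stored in their natural alphabetical order.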

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.TreeMap;

public class WordFrequency { // class name is just an example
  public static TreeMap<String, Integer> generateFrequencyList()
      throws IOException {
    TreeMap<String, Integer> wordsFrequencyMap = new TreeMap<String, Integer>();
    String file = "/tmp/lorem.txt";
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
      String line;
      while ((line = br.readLine()) != null) {
        String[] tokens = line.split("\\s+");
        for (String token : tokens) {
          String word = removePunctuation(token).toLowerCase();
          if (word.isEmpty()) {
            continue; // skip tokens with no letters (e.g. "--" or leading whitespace)
          }
          if (!wordsFrequencyMap.containsKey(word)) {
            wordsFrequencyMap.put(word, 1);
          } else {
            int count = wordsFrequencyMap.get(word);
            wordsFrequencyMap.put(word, count + 1);
          }
        }
      }
    }
    return wordsFrequencyMap;
  }

  // Removes every character that is not an ASCII letter
  private static String removePunctuation(String token) {
    return token.replaceAll("[^a-zA-Z]", "");
  }
}
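If you want to check the normalization and the alphabetical ordering without creating a file, here is a minimal sketch that runs the same steps on an in-memory string. The class and method names (WordCountDemo, normalize, countWords) are made up for illustration, and it swaps the containsKey/put pair for Map.merge (Java 8+), which performs the same "insert 1 or add 1" step in a single call.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class: counts words in a String instead of a file
public class WordCountDemo {

  // Same normalization as removePunctuation, plus lowercasing
  static String normalize(String token) {
    return token.replaceAll("[^a-zA-Z]", "").toLowerCase();
  }

  // Splits on whitespace, normalizes each token and counts it
  static TreeMap<String, Integer> countWords(String text) {
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (String token : text.split("\\s+")) {
      String word = normalize(token);
      if (word.isEmpty()) {
        continue; // skip tokens that contained no letters
      }
      // merge() stores 1 for a new word, otherwise adds 1 to the old count
      counts.merge(word, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    TreeMap<String, Integer> counts =
        countWords("The cat saw the dog. The dog, however, ran!");
    // TreeMap iterates its keys in alphabetical order
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.println(e.getKey() + " " + e.getValue());
    }
  }
}
```

Running main prints one word per line in alphabetical order: cat 1, dog 2, however 1, ran 1, saw 1, the 3. Note how "The", "the" and "dog.", "dog," collapse into single entries after normalization.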
A main method for testing is shown below. To get the percentages, first compute the total number of words by iterating through the map and adding up all the values, then make a second pass to compute each word's percentage. By the way, if this is part of a larger project, you could also take a look at the Apache Commons Math library for calculating frequency distributions. If you use its Frequency class, you can keep adding all the words to it and then get the descriptive statistics at the end.

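  // Add to the same class as generateFrequencyList()
  public static void main(String[] args) {
    try {
      int totalWords = 0;
      TreeMap<String, Integer> freqMap = generateFrequencyList();
      for (String key : freqMap.keySet()) { // first pass: total word count
        totalWords += freqMap.get(key);
      }
      for (String key : freqMap.keySet()) { // second pass: percentages
        int count = freqMap.get(key);
        System.out.printf("%s: %d (%.2f%%)%n", key, count, 100.0 * count / totalWords);
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }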
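The two-pass percentage calculation described above can also be pulled out into its own method so it works on any word-count map. This is a hedged sketch: WordPercentages and percentages are hypothetical names, and returning a LinkedHashMap is a design choice so the percentages come out in the same order as the input counts (alphabetical, if you pass in the TreeMap).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for turning word counts into percentages
public class WordPercentages {

  // Pass 1 sums all counts; pass 2 converts each count into a percentage
  static Map<String, Double> percentages(Map<String, Integer> counts) {
    int total = 0;
    for (int c : counts.values()) {
      total += c;
    }
    Map<String, Double> result = new LinkedHashMap<>(); // keeps input key order
    if (total == 0) {
      return result; // empty input: nothing to divide by
    }
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      result.put(e.getKey(), 100.0 * e.getValue() / total);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new TreeMap<>();
    counts.put("cat", 1);
    counts.put("dog", 3);
    counts.put("the", 4);
    for (Map.Entry<String, Double> e : percentages(counts).entrySet()) {
      System.out.printf("%s %.1f%%%n", e.getKey(), e.getValue());
    }
  }
}
```

With the sample counts there are 8 words in total, so "dog" comes out as 3 / 8 = 37.5%, "cat" as 12.5% and "the" as 50%.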