C# regular expression to get words between 4 to 10 characters
Takeshi Okada
Posted on    January-30-2014 11:14 PM

I am trying to get all the words in a string that are between 4 and 10 characters long. When I use the following regular expression, it just returns the whole string as one word. Can you please look at the following example and tell me how I should write this regular expression?

string result = "Overfishing, erosion and warmer waters are feeding jellyfish blooms in coastal regions worldwide. And they're causing damage";

string[] words = Regex.Split(result, @"[\W]{4,10}");

foreach (string line in words)
{
    Console.WriteLine(line);
}



Pravesh Singh

Posted on    January-30-2014 11:34 PM

Hi Takeshi,

Your code isn't working because the pattern will only match a sequence of 4 to 10 consecutive non-word characters, which doesn't appear in the string. So Regex.Split just returns an array containing the original string.
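To see this for yourself, here is a minimal sketch. The class and method names (SplitDemo, SplitOnNonWordRuns) are made up for illustration, and the second input string is an invented example that does contain a long enough run of non-word characters:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on runs of 4 to 10 non-word characters,
    // which is what the pattern from the question actually does.
    public static string[] SplitOnNonWordRuns(string input)
    {
        return Regex.Split(input, @"[\W]{4,10}");
    }

    public static void Main()
    {
        string result = "Overfishing, erosion and warmer waters are feeding jellyfish blooms in coastal regions worldwide. And they're causing damage";

        // The longest run of non-word characters here is two (", " or ". "),
        // so the pattern never matches and Split has nothing to cut on.
        Console.WriteLine(Regex.IsMatch(result, @"[\W]{4,10}"));
        Console.WriteLine(SplitOnNonWordRuns(result).Length);

        // Invented input with a 7-character separator (" -- -- "):
        // here the same pattern does split the string into two parts.
        foreach (string part in SplitOnNonWordRuns("a -- -- b"))
        {
            Console.WriteLine(part);
        }
    }
}
```

In other words, Regex.Split treats the pattern as a separator, not as a description of the pieces you want back.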

Try using this pattern:

\b\w{4,10}\b

For example:

// Requires: using System.Linq; and using System.Text.RegularExpressions;
string[] words = Regex.Matches(result, @"\b\w{4,10}\b")
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToArray();

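This will match any sequence of 4 to 10 consecutive word characters, surrounded by word boundaries. Note that \w covers letters, digits and the underscore but not the apostrophe, so "they're" is treated as two words, "they" and "re", and only "they" is long enough to match.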
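If you want to try it end to end, here is a small self-contained sketch. The class and method names (WordLengthFilter, ExtractWords) and the min/max parameters are assumptions added for illustration; only the pattern itself comes from the answer above:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordLengthFilter
{
    // Hypothetical helper: returns every whole word whose length is
    // between min and max (inclusive), using \b\w{min,max}\b.
    public static string[] ExtractWords(string input, int min, int max)
    {
        string pattern = @"\b\w{" + min + "," + max + @"}\b";
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string result = "Overfishing, erosion and warmer waters are feeding jellyfish blooms in coastal regions worldwide. And they're causing damage";

        // Prints one matching word per line.
        foreach (string word in ExtractWords(result, 4, 10))
        {
            Console.WriteLine(word);
        }
    }
}
```

With the sample sentence, "Overfishing" is left out because it has 11 characters, and the word boundaries prevent the regex from returning just part of it.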

