Searching with character classes

In the last example we defined a group of thresholds that we wanted to match:

 

#define threshold_low  2400

#define threshold_high 4800

#define threshold_mid  3600

 

We matched them using the dot character (.) as follows:

 

threshold_...

 

However, if we added a new threshold that doesn't belong to our set:

 

#define threshold_low       2400

#define threshold_high      4800

#define threshold_mid       3600

...

#define max_threshold       0

 

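we would find that the dot gives us little control over what is matched. Our expression threshold_... happens to skip the new line, because no underscore follows threshold in max_threshold. A slightly looser expression such as threshold... would match all four lines with the word threshold in them as follows (note that ... will match the spaces after max_threshold because the dot character . matches any character, including spaces):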

 

#define threshold_low       2400

#define threshold_high      4800

#define threshold_mid       3600

...

#define max_threshold...    0
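The dot's behaviour on the last line can be checked with a short script. This is a minimal sketch, assuming Python's re module as a stand-in for whatever search engine you are using; the search string threshold... (no underscore) is an assumed looser variant used only to show the dot consuming spaces.

```python
import re

# The new line from the example above.
line = "#define max_threshold       0"

# Assumed looser pattern: "threshold" followed by any three characters.
match = re.search(r"threshold...", line)

# Each dot matches one character of any kind, so here the three dots
# consume the first three spaces that follow "threshold".
matched_text = match.group(0) if match else None
print(repr(matched_text))
```

The printed value is the word threshold followed by three spaces, which is the part of the line marked with ... above.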

 

To reduce our match to only the first three lines we can introduce the concept of a character class. A character class is written as [...], where the ... is replaced by a set of characters; it matches exactly one character, which must be a member of that set.
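To see that a character class stands for a single character, here is a minimal sketch, again assuming Python's re module. The class [lh] is a hypothetical one chosen only for illustration: it accepts an l or an h and nothing else.

```python
import re

names = ["threshold_low", "threshold_high", "threshold_mid"]

# [lh] is an illustrative class: exactly one character, either l or h.
pattern = re.compile(r"threshold_[lh]")

matched = []
for name in names:
    m = pattern.search(name)
    if m:
        # The match ends after the single character taken from the class.
        matched.append(m.group(0))

print(matched)
```

Only threshold_low and threshold_high are found, and in each case the match stops after the one letter supplied by the class; threshold_mid is skipped because m is not in the set.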

 

As a first attempt we could use the search string:

 

threshold_[lowhigmd]

 

The class lists every distinct letter used in low, high and mid, so the expression matches threshold_ followed by any one of those letters while, at the same time, not matching max_threshold. However, listing the letters one by one is a little unwieldy, so we want a shortcut to simplify things. We can do this by using the character class [A-Za-z], which matches any single character in the range A-Z or a-z. Our new search string would become:

 

threshold_[A-Za-z]

 

and it would match only the first three lines, leaving max_threshold unmatched:

 

#define threshold_low       2400

#define threshold_high      4800

#define threshold_mid       3600

...

#define max_threshold       0
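The two character-class search strings can be compared on the sample lines with a short script. This is a minimal sketch, assuming Python's re module; the helper name matching_lines is hypothetical and not part of any tool discussed here.

```python
import re

lines = [
    "#define threshold_low       2400",
    "#define threshold_high      4800",
    "#define threshold_mid       3600",
    "#define max_threshold       0",
]

def matching_lines(pattern):
    """Return the sample lines in which the pattern is found."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# The explicit letter list and the range shortcut.
explicit = matching_lines(r"threshold_[lowhigmd]")
ranged = matching_lines(r"threshold_[A-Za-z]")

for line in ranged:
    print(line)
```

Both search strings select the same three lines. The max_threshold line is left out because no underscore, and therefore no letter from the class, follows threshold there.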

 

In addition to defining a character class we can also define the complement or inverse character class. We'll look at how we can use this in the next example.

 

  

 
