An introduction to regular expressions


If you’re new to the world of Linux administration and open source software, you’ve probably only just started scratching the surface of the power this new world offers. Eventually, however, you’ll start mining deeper depths. When that fateful moment arrives, chances are you’re going to need to use a regular expression or two.

For the uninitiated, that can be a bit daunting. Say, for instance, you run into this:

/^[a-z0-9_-]{3,16}$/

What does that bit of cryptic nonsense mean? Well, it’s actually not nonsense. The slashes at either end are delimiters that enclose the pattern in languages such as Perl and JavaScript. Inside them, ^ anchors the match to the beginning of the string and $ anchors it to the end, so the expression matches a string of between 3 and 16 characters made up only of lowercase letters, the digits 0-9, underscores, or hyphens.
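To see that description in action, here is a minimal sketch that feeds a few strings to grep -E. The pattern is written out from the description above, and the candidate strings are made up for illustration. LC_ALL=C is set as a precaution so that the a-z range covers only ASCII lowercase letters regardless of locale.

```shell
# Pattern reconstructed from the description: 3 to 16 characters, each a
# lowercase letter, a digit, an underscore, or a hyphen.
# grep -E takes the bare pattern, without the surrounding slashes.
pattern='^[a-z0-9_-]{3,16}$'

# Hypothetical candidate strings, one per line.
matches=$(printf '%s\n' 'jo' 'tim_tomas' 'Tom-James' 'a-name-that-is-too-long' |
  LC_ALL=C grep -E "$pattern")

echo "$matches"
```

Only tim_tomas should pass: jo is too short, Tom-James contains uppercase letters, and the last string is longer than 16 characters.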

Every regular expression has meaning and use. Although they might seem a bit complicated for new users, it’s important to understand how they work.

Let’s take a crash course in learning how to compose a regular expression. 

What are regular expressions?

Simply put, regular expressions, or regex, use letters and symbols to define patterns to find matching character sequences in a file or data stream. Regular expressions are a language unto themselves and can be simple or highly complex. Regular expressions can be used in commands, in bash scripts, and even within GUI applications.

What we’re going to do is create a file and then use regular expressions to search that file for strings of characters. The file we’ll create will contain a collection of names and email addresses (fake, of course). We’ll then craft regular expressions to search through that file.

Create the new file with the command:

nano email-list.txt

Paste the following into that file:

Jo St. Claire
Wil Jackson
Miguel Santos
Ashley Tate
Olivia Nightingale
Nathan Gage
Bethany Nitshimi
Jessie Blake
Ralph Moore
Tim Tomas
Tom James

Save and close the file. 

Now, let’s do some searching.

How to search for repeated characters

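For our example, we’re going to use the egrep tool, which is equivalent to “grep -E”. (Recent GNU grep releases print a warning that egrep is obsolescent and recommend grep -E instead; the patterns shown here work the same way with either.) The egrep command is quite powerful and easy to use. Say, for example, you don’t remember Jessie Blake’s email address, but you remember the name. You could issue the command: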

egrep "Jessie Blake" email-list.txt

That would print out the line containing “Jessie Blake,” which would include the email address. But what if you couldn’t remember Jessie’s name, but knew (for whatever reason) there was a single “ss” string in the name? You could use egrep and a regular expression to search for that string. 

To search for repeated characters, you employ the {x} construct (where x is the number of repetitions). Since we’re searching for the s character repeated twice, that would look like:

s{2}

So we employ this with egrep like so:

egrep 's{2}' email-list.txt

The output would display the same line as the earlier egrep command, only this time highlighting the two repeated “s” characters (Figure A).

Figure A


Repeated “s” characters found with egrep.

The regular expression in the above command is ‘s{2}’
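If your terminal doesn’t highlight matches, the -o option offers another way to see exactly what the pattern caught. The sketch below pipes two names from the list straight into grep -E so it runs without the file; the variable name is only illustrative.

```shell
# -o prints only the text that matched, one match per line,
# instead of the whole matching line.
matched=$(printf '%s\n' 'Jessie Blake' 'Miguel Santos' | grep -E -o 's{2}')

echo "$matched"
```

Only the doubled s in Jessie is reported; the single s at the end of Santos does not satisfy s{2}.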

Let’s take that same kind of regular expression and make it a bit more complicated. Say you’re looking for a name that includes “es” or “ess”: one e followed by either one or two instances of s. The {x,y} form sets a minimum and a maximum number of repetitions, so that regular expression would look like:

es{1,2}

So now our egrep command looks like:

egrep 'es{1,2}' email-list.txt
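As a rough illustration of how the range behaves, the following sketch runs the same pattern over three names from the list and prints only the matched text. The names are piped in directly so the example does not depend on the file.

```shell
# {1,2} applies only to the s immediately before it: one e, then one or two s.
matched=$(printf '%s\n' 'Jessie Blake' 'Tom James' 'Ralph Moore' |
  grep -E -o 'es{1,2}')

echo "$matched"
```

Jessie yields ess (the match takes both s characters when they are available), James yields es, and Ralph Moore yields nothing.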

An even more complicated twist on this is to search for all sequences of two or more vowels. This would reveal strings like ai, ue, ey, ia, ea, and ie.

To do this, we’ll employ the [ ] and the { }. The [ ] encloses our vowels to indicate that we’re searching for any one of the characters contained within, and the { } says how many times in a row that must happen. Since we’re searching for two or more vowels, we’ll use {2,}. Leaving out the second number leaves the upper limit open ended.

For this, our regular expression will be:

[aeiouy]{2,}

The egrep command using that string is:

egrep '[aeiouy]{2,}' email-list.txt

The results will highlight the discovered strings (Figure B).

Figure B


Our regular expression does include y as a vowel.
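Here is a small sketch of the same search using -o, so that only the vowel runs themselves are printed. It uses four names from the list, piped in directly.

```shell
# A bracket expression matches one character from the set;
# {2,} requires at least two such characters in a row.
vowel_runs=$(printf '%s\n' 'Jo St. Claire' 'Wil Jackson' 'Ashley Tate' 'Ralph Moore' |
  grep -E -o '[aeiouy]{2,}')

echo "$vowel_runs"
```

This should report ai, ey, and oo. Wil Jackson contributes nothing, and the capital A in Ashley is not counted because the set contains only lowercase letters.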

Or what if we want the last names and email addresses of anyone in the list whose first name is Tim or Tom? That’s possible as well, using a bracket expression like the one above, this time without the y and without a repetition count, so it matches exactly one vowel. With the help of egrep, we issue the command:

egrep 'T[aeiou]m' email-list.txt

The command will catch both Tim and Tom in Tim Tomas and Tom James (Figure C).

Figure C


Tim and Tom are found.
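The sketch below shows that the bracket expression stands for exactly one character. The first two lines come from the list; “Toom Blake” is a made-up line added only to show a non-match.

```shell
# T, then exactly one lowercase vowel, then m.
found=$(printf '%s\n' 'Tim Tomas' 'Tom James' 'Toom Blake' |
  grep -E -o 'T[aeiou]m')

echo "$found"
```

The first line produces two matches (Tim, and the Tom inside Tomas), the second produces Tom, and the hypothetical Toom produces none because two vowels sit between the T and the m.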

Notice, however, that the regex doesn’t catch “tim” and “tom” in the email addresses. Why? Because regular expressions are case sensitive by default. To overcome that, we can add another piece to the regular expression, like so:

egrep '[Tt][aeiou]m' email-list.txt

As you can see, what we’ve done here is indicate that we’re searching for a string that starts with either T or t, is followed by exactly one vowel, and ends with m. The output would then highlight the email addresses as well (Figure D).

Figure D


Tim and Tom and tim and tom found with regex.
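Another way to get a similar result is grep’s -i option, which ignores case for the whole pattern rather than for one position. The sketch below compares the two approaches on a single hypothetical line; the example.com address is invented for illustration and is not from the list.

```shell
# Hypothetical line: the address is made up for this sketch.
line='Tim Tomas tim.tomas@example.com'

# Explicit bracket expression for the first letter only.
with_brackets=$(printf '%s\n' "$line" | grep -E -o '[Tt][aeiou]m')

# Alternative: -i makes every letter in the pattern case-insensitive.
with_flag=$(printf '%s\n' "$line" | grep -E -i -o 'T[aeiou]m')

echo "$with_brackets"
echo "$with_flag"
```

On this input both commands report Tim, Tom, tim, and tom. They are not identical in general: with -i the pattern would also match a string such as TIM, which the bracket version would skip.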

You can also use regex to exclude certain characters in a search by placing the ^ symbol at the start of a bracket expression. Say, for instance, you want to search that list for any name containing a T, then a character that is not o, then an m. That regular expression would be:

egrep 'T[^o]m' email-list.txt

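The output would include the line for Tim Tomas, with only Tim highlighted (the Tom in Tomas does not match), and would leave out the line for Tom James. Note that [^o] matches any single character other than o, not just letters.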
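A short sketch of the negated bracket expression follows. The first two lines come from the list; “T-m” is a made-up string included to show that the middle character does not have to be a letter.

```shell
# [^o] matches any single character except a lowercase o.
not_o=$(printf '%s\n' 'Tim Tomas' 'Tom James' 'T-m' |
  grep -E -o 'T[^o]m')

echo "$not_o"
```

This should print Tim and T-m. Neither Tomas nor Tom James contributes a match.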

Interestingly enough, if you used the ^ character outside of brackets, as in “^T”, it would anchor the match to the start of the line, so only lines that begin with T would be listed. So where you use a character is just as important as what that character can do.
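To illustrate the anchor, this sketch searches four names from the list for lines that begin with T. Without -o, grep prints each matching line in full.

```shell
# Outside brackets, ^ matches the start of the line rather than a character.
starts_with_t=$(printf '%s\n' 'Tim Tomas' 'Ashley Tate' 'Tom James' 'Nathan Gage' |
  grep -E '^T')

echo "$starts_with_t"
```

Only the Tim Tomas and Tom James lines are printed. Ashley Tate contains a T, but not at the start of the line, so it is skipped.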

And that’s the beginning of your journey with regex. We’ll continue this journey later on, and build on what we’ve learned so far. However, you should be able to use what you’ve learned here and work it into your bash scripts and commands.
