Regular Expressions
First of all, this will be a bit painful, but as with vim, once you overcome the initial learning curve you start to see the potential regular expressions bring to the table.
To make matters even worse, there are multiple flavors of regexes.
An overview and comparison of the different flavors can be found on Wikipedia.
Don't see this as a reason not to learn some basic expressions though; a little experience goes a long way.
What are they?
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.
You can see regular expressions as find (and replace) on steroids.
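To make "find (and replace) on steroids" a bit more concrete, here is a minimal sketch using Python's re module. The sample text and both patterns are invented for illustration and are not taken from the questionnaires mentioned in this text.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Question 12: pick one. Question 7: pick two."

# Find: \d+ matches one or more digits, so this collects every number.
numbers = re.findall(r"\d+", text)

# Find and replace: capture the number and reuse it in the replacement.
# \1 refers back to whatever the first pair of parentheses matched.
renamed = re.sub(r"Question (\d+)", r"Q\1", text)

print(numbers)
print(renamed)
```

A plain text search could find the word "Question", but not "any number", and it could not carry the matched number over into the replacement.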
As a practical example, I used a lot of regular expressions to clean up the multiple choice LPI questionnaires.
This was done in vim, so I used the vim flavor of regex, but it's not too different from the main one you should know: grep.
From a practical system administrator's point of view you'll probably use regexes in this order:
- with grep
- with sed (when copy-pasting commands found online)
- with vim
- with a scripting language such as python3
How to learn them?
Some tips and pointers before we head into the actual syntax.
Vim
There is a setting in vim that is disabled by default but highly advised when learning vim regexes.
By adding set incsearch to your ~/.vimrc, or by typing :set incsearch on the ex command line, vim will highlight whatever matches the pattern you're searching for as you type it.
This can be a tremendous help when building complex patterns.
Grep
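By default grep only interprets basic regular expressions.
If you want, or more likely need, to use extended expressions you should use grep -E instead. The older egrep command does the same but is marked obsolescent in recent GNU grep releases.
For completeness's sake I should mention there is a third mode of grep, invoked with grep -P, that interprets the patterns as Perl regexes.
One of the advantages of Perl regexes is reverse matching: lookbehind and lookahead assertions that match text based on what comes before or after it.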
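The idea behind reverse matching can be sketched with a lookbehind assertion. The example below uses Python's re module rather than grep -P; the (?<=...) and (?<!...) syntax is shared with Perl-style regexes, but the log line is invented and real log formats will differ.

```python
import re

# Hypothetical log line, made up for this sketch.
line = "Failed password for invalid user bob from 10.0.0.5 port 22"

# (?<=...) is a lookbehind: it requires "invalid user " to sit right before
# the match, but does not include it in the result.
match = re.search(r"(?<=invalid user )\S+", line)
username = match.group(0) if match else None

# (?<!...) is the negative form: match "user X" only when "invalid " does
# NOT come right before it. On this line nothing qualifies.
known = re.search(r"(?<!invalid )user \S+", line)

print(username)
print(known)
```

With GNU grep, a comparable pattern would be passed as grep -oP '(?<=invalid user )\S+', assuming the installed build includes Perl regex support.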
The basics
Exercises
Below are some practical exercises and files to go with them. Use them to test out your grepping skills and as inspiration for personal challenges.
- configuration file
  - print only lines with actual configuration settings (ignore comments)
- css file
  - extract all the hex color codes
- html file
  - html extract pictures
    - just jpg
    - jpg and png at the same time
- log file
  - extract all IP addresses
    - plus only the unique ones
  - extract all wrong logins for known users
  - extract all unknown users (this is tricky and requires backwards searching using grep -P)
- mail dump file
  - extract all unique email addresses
  - extract all web links
    - only the base link (https://www.example.co.uk)
    - both http and https links
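As a starting point for the IP address exercise, the approach can be sketched in Python as follows. The log excerpt is hypothetical and the pattern is deliberately loose, so treat it as one possible approach rather than a reference solution.

```python
import re

# Hypothetical log excerpt; the exercise's real log file will look different.
log = """\
Jan 01 10:00:01 sshd: Failed password for root from 192.168.1.10 port 4242
Jan 01 10:00:05 sshd: Accepted password for alice from 10.0.0.7 port 5151
Jan 01 10:00:09 sshd: Failed password for root from 192.168.1.10 port 4243
"""

# Four groups of 1-3 digits separated by dots. This is deliberately loose:
# it would also accept impossible addresses such as 999.999.999.999.
# (?:...) is a non-capturing group, so findall returns the whole match.
ip_pattern = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

all_ips = ip_pattern.findall(log)

# A set drops the duplicates; sorting makes the output stable.
unique_ips = sorted(set(all_ips))

print(all_ips)
print(unique_ips)
```

The same idea on the command line would look roughly like grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' followed by the log file name, piped into sort -u.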
There are some very good regex exercises online as well. This is a good starting point.