Regular Expressions

First of all, this will be a bit painful, but as with vim, once you overcome the initial learning curve you start to see the potential regular expressions bring to the table. To make matters worse, there are multiple flavors of regex. An overview and comparison of the different flavors can be found on Wikipedia. Don't see this as a reason not to learn some basic expressions though; a little experience goes a long way.

What are they?

A regular expression (shortened as regex or regexp;[1] also referred to as rational expression[2][3]) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.

Wikipedia

You can think of regular expressions as find (and replace) on steroids. As a practical example, I used a lot of regular expressions to clean up the multiple choice LPI questionnaires. This was done in vim, so I used the vim regex flavor, but it's not very different from the main one you should know: grep.

From a practical system administrator's point of view you'll probably use regexes in this order:

  1. with grep
  2. with sed (when copy-pasting commands found online)
  3. with vim
  4. with a scripting language such as python3
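
To make "find (and replace)" concrete, here is a minimal sketch that runs the same pattern through grep and then sed. The sample text and the pattern are made up for this illustration and do not come from the course files.

```shell
# Made-up sample input, not taken from the course files.
sample='error: disk full
warning: disk almost full
error: network down'

# Find: print only the lines that start with "error".
printf '%s\n' "$sample" | grep '^error'

# Find and replace: turn a leading "error" into "ERROR", print every line.
printf '%s\n' "$sample" | sed 's/^error/ERROR/'
```

The anchor ^ ties the pattern to the start of the line, which is why the warning line is neither printed by grep nor changed by sed.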

How to learn them?

Some tips and pointers before we head into the actual syntax.

Vim

There is a setting in vim that is disabled by default but highly advisable when learning vim regexes. With set incsearch in your ~/.vimrc, or entered on the ex command line, vim will highlight whatever matches the pattern while you are still typing it. This can be a tremendous help when building complex patterns.

Grep

By default grep only interprets basic regular expressions. If you want, or more likely need, to use extended expressions you should use grep -E (or the older egrep, which recent GNU grep releases flag as obsolescent). For completeness's sake I should mention there is a third variant of grep, invoked with grep -P, that interprets patterns as perl regexes. One of the advantages of perl regexes is reverse matching (lookbehind assertions), where a match depends on the text that precedes it.
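
The difference between the three modes is easiest to see side by side. The sketch below uses made-up input; the last command assumes a grep built with PCRE support (true for most GNU grep packages) and shows a lookbehind assertion as one example of a perl-only feature.

```shell
# Made-up sample input; the third line contains a literal pipe character.
sample='photo.jpg
logo.png
notes jpg|png'

# Basic (default): | is an ordinary character, so only the third line matches.
printf '%s\n' "$sample" | grep 'jpg|png'

# Extended: | means "or", so all three lines match.
printf '%s\n' "$sample" | grep -E 'jpg|png'

# Perl: the lookbehind (?<=user=) demands "user=" right before the match
# without making it part of the text that -o prints.
# The fallback message only appears if this grep was built without -P.
printf '%s\n' 'login failed user=alice' | grep -oP '(?<=user=)[a-z]+' \
  || echo 'this grep does not support -P'
```

In basic mode the same alternation can be written as jpg\|png with GNU grep, but that backslash form is a GNU extension, so grep -E is the more portable choice.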

The basics

Exercises

Below are some practical exercises and files to go with them. Use them to test out your grepping skills and as inspiration for personal challenges.

  • configuration file
    • print only lines with actual configuration settings (ignore comments)
  • css file
    • extract all the hex color codes
  • html file
    • extract pictures
      • just jpg
      • jpg and png at the same time
  • log file
    • extract all IP addresses
      • plus only the unique ones
    • extract all wrong logins for known users
    • extract all unknown users (this is tricky and requires backwards searching, i.e. lookbehind, using grep -P)
  • mail dump file
    • extract all unique email addresses
    • extract all web links
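
If you are unsure how to get started, here is one possible approach to two of the exercises, run on made-up stand-in data rather than the actual course files. The patterns are deliberately loose, so the real files may well need something stricter.

```shell
# Made-up stand-in for a configuration file.
config='# main settings
port = 22

  # indented comment
user = admin'

# -v inverts the match: drop lines that are empty or whose first
# non-blank character is #.
printf '%s\n' "$config" | grep -Ev '^[[:space:]]*(#|$)'

# Made-up stand-in for a log file.
log='10.0.0.1 login ok
192.168.1.20 login failed
10.0.0.1 logout'

# -o prints only the matched text; sort -u keeps one copy of each address.
# The pattern also accepts impossible values such as 999.1.1.1.
printf '%s\n' "$log" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
```

The combination of grep -o and sort -u is a pattern worth remembering: it covers every "extract all unique ..." exercise in the list once the regex itself is right.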

There are some very good regex exercises online as well. This is a good starting point.