Table of Contents

Using Regular Expressions (lynda.com)

http://www.regexpal.com/

file:///Users/stedwar1/Desktop/Lynda.com/Ex_Files_UsingRegEx/Exercise%20Files/regexpal/index.html

Modes

Characters

Literal Characters

Regular expressions are eager: they want to return a match as soon as possible, so the earliest (leftmost) match is preferred.
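A minimal sketch of eagerness, assuming Python's `re` module (the notes themselves use regexpal, so the engine is an assumption here). The sample sentence is made up for illustration.

```python
import re

text = "The cat sat on the mat."  # hypothetical sample text

# re.search scans left to right and returns the first position that matches.
first = re.search(r"at", text)
print(first.group(), first.start())

# With a "global" search (finditer), the later matches are found as well.
all_starts = [m.start() for m in re.finditer(r"at", text)]
print(all_starts)
```

The single search stops at the "at" inside "cat"; it never considers "sat" or "mat" unless asked for all matches.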

Metacharacters

The Wildcard Metacharacter

Escaping Metacharacters

Other Special Characters

Character Sets

Defining a Character Set

Character Ranges

Negative Character Sets

Metacharacter inside character sets

Shorthand Character Sets

| Shorthand | Meaning | Equivalent |
| --- | --- | --- |
| `\d` | Digit | `[0-9]` |
| `\w` | Word character | `[a-zA-Z0-9_]` |
| `\s` | Whitespace | `[ \t\r\n]` |
| `\D` | Not digit | `[^0-9]` |
| `\W` | Not word character | `[^a-zA-Z0-9_]` |
| `\S` | Not whitespace | `[^ \t\r\n]` |
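The shorthand sets can be tried out with a short sketch, assuming Python's `re` module. The `re.ASCII` flag is passed so that `\d`, `\w`, and `\s` stay limited to ASCII characters (without it, Python also matches Unicode digits and letters). The sample string is hypothetical.

```python
import re

sample = "Room 42, floor_3\tEast"  # hypothetical sample; \t is a tab

digits = re.findall(r"\d", sample, re.ASCII)      # single digits
words = re.findall(r"\w+", sample, re.ASCII)      # runs of word characters
spaces = re.findall(r"\s", sample, re.ASCII)      # single whitespace characters
non_word = re.findall(r"\W+", sample, re.ASCII)   # runs of everything \w rejects

print(digits, words, spaces, non_word)
```

Note that `floor_3` comes back as one word, because the underscore counts as a word character.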

POSIX Bracket Expressions

| Class | Meaning | Equivalent |
| --- | --- | --- |
| `[:alpha:]` | Alphabetic characters | `A-Za-z` |
| `[:digit:]` | Numeric characters | `0-9` |
| `[:alnum:]` | Alphanumeric characters | `A-Za-z0-9` |
| `[:lower:]` | Lowercase alphabetic characters | `a-z` |
| `[:upper:]` | Uppercase alphabetic characters | `A-Z` |
| `[:punct:]` | Punctuation characters | |
| `[:space:]` | Whitespace characters | `\s` |
| `[:blank:]` | Blank characters (space, tab) | |
| `[:print:]` | Printable characters, including spaces | |
| `[:graph:]` | Printable characters, no spaces | |
| `[:cntrl:]` | Control characters (non-printable) | |
| `[:xdigit:]` | Hexadecimal characters | `A-Fa-f0-9` |
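
`ps aux | grep --regexp="s[[:digit:]]"` works: the class name must sit inside a character set.
`ps aux | grep --regexp="s[:digit:]"` is read as an ordinary character set and returns "s" followed by any one of `:`, `d`, `i`, `g`, `t`.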

Repetition Expressions

Repetition Metacharacters

| Metacharacter | Meaning |
| --- | --- |
| `*` | Preceding item zero or more times |
| `+` | Preceding item one or more times |
| `?` | Preceding item zero or one time |
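The three repetition metacharacters can be compared side by side. This is a small sketch assuming Python's `re` module; the word list is invented for the example.

```python
import re

candidates = ["color", "colour", "colouur", "colr"]  # hypothetical inputs

# u? : the "u" may appear zero or one time
zero_or_one = [w for w in candidates if re.fullmatch(r"colou?r", w)]
# u* : the "u" may appear zero or more times
zero_or_more = [w for w in candidates if re.fullmatch(r"colou*r", w)]
# u+ : the "u" must appear one or more times
one_or_more = [w for w in candidates if re.fullmatch(r"colou+r", w)]

print(zero_or_one, zero_or_more, one_or_more)
```

`fullmatch` is used so that each pattern has to cover the entire word, which makes the differences between the quantifiers visible.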

Quantified repetition

| Metacharacter | Meaning |
| --- | --- |
| `{` | Start quantified repetition of preceding item |
| `}` | End quantified repetition of preceding item |
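A brief sketch of quantified repetition, assuming Python's `re` module and the common `{n}`, `{min,max}`, and `{min,}` forms. The input strings are hypothetical.

```python
import re

# {4} : exactly four digits
exact = re.findall(r"\d{4}", "Born 1809, died 1865")

# {2,3} : between two and three digits (greedy, so three are taken when available)
ranged = re.findall(r"\d{2,3}", "1 22 333 4444")

# {2,} : two or more of the preceding item
open_ended = re.search(r"a{2,}", "a aa aaaaaa").group()

print(exact, ranged, open_ended)
```

In the `{2,3}` case, "4444" yields "444" and leaves a single "4" behind, which is too short to match on its own.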

Greedy expressions

Lazy Expressions

| Metacharacter | Meaning |
| --- | --- |
| `?` | Make preceding quantifier lazy |
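The difference between a greedy and a lazy quantifier is easiest to see on the same input. A minimal sketch, assuming Python's `re` module and a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample

# Greedy: .+ takes as much as possible, then gives back only what it must.
greedy = re.search(r"<.+>", html).group()

# Lazy: .+? takes as little as possible, expanding only when needed.
lazy = re.search(r"<.+?>", html).group()

print(repr(greedy))
print(repr(lazy))
```

The greedy version runs from the first `<` to the last `>` in the string; the lazy version stops at the first `>` it can reach.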

Efficiency When Using Repetition

Grouping and Alternation Expressions

Grouping Metacharacters

| Metacharacter | Meaning |
| --- | --- |
| `(` | Start grouped expression |
| `)` | End grouped expression |

* Group portions of the expression

Alternation metacharacter

Metacharacter Meaning
`|` Match previous or next expression

a.k.a. pipe, OR

Writing Logical and Efficient Alternations

So far we have learned that:

`/(peanut|peanutbutter)/` matches “peanut” in “peanutbutter”. It is eager to return a result and the leftmost item gets priority.

`/peanut(butter)?/` matches “peanutbutter”: although “butter” is optional, the `?` quantifier is greedy, so the engine prefers to match it when it can.

`/(\w+|FY\d{4}_report\.xls)/` is an alternation: one or more word characters, or “FY”, four digits, and “_report.xls”. Against `FY2003_report.xls` it matches “FY2003_report” and then “xls”, not the whole file name, because the engine is eager to return a result: the first alternative succeeds, so the second is never tried.

`/abc|def|ghi|jkl/` against “abcdefghijklmnopqrstuvwxyz” matches “abc” with global off. `/xyz|abc|def|ghi|jkl/` against the same string also matches “abc”: at the first position “xyz” fails, and the engine tries the second alternative there before it ever advances toward the end of the string.

`/(three|see|thee|tree)/` against “I think those are thin trees.”: the engine moves forward one character at a time and, at each position, tries the four options in order, backing up and starting over whenever an option fails partway through.
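The alternation examples above can be checked with a short sketch. It assumes Python's `re` module, whose alternation is also leftmost and eager; the variable names are only for illustration.

```python
import re

# Leftmost alternative wins, even when a longer one would also match.
eager = re.search(r"(peanut|peanutbutter)", "peanutbutter").group()

# A greedy optional group takes "butter" when it is available.
greedy = re.search(r"peanut(butter)?", "peanutbutter").group()

# \w+ succeeds first, so the specific file-name alternative is never tried.
generic_first = re.findall(r"\w+|FY\d{4}_report\.xls", "FY2003_report.xls")

# Putting the specific alternative first changes the result.
specific_first = re.findall(r"FY\d{4}_report\.xls|\w+", "FY2003_report.xls")

# Position in the string beats order in the pattern: "abc" starts earliest.
alphabet = "abcdefghijklmnopqrstuvwxyz"
earliest = re.search(r"xyz|abc|def|ghi|jkl", alphabet).group()

print(eager, greedy, generic_first, specific_first, earliest)
```

The practical rule that falls out of this: order alternatives from most specific to most general when they can overlap.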

Repeating and nesting alternations

`/(\d\d|[A-Z][A-Z]){3}/` matches “112233”, “AABBCC”, “AA66ZZ”, and “11AA44”

`/(apple (juice|sauce)|milk(shake)?|sweet (peas|corn|potatoes))/`

`/(apple juice|apple sauce|milkshake|milk|sweet peas|sweet corn|sweet potatoes)/` is the same, just not nested (with “milkshake” listed before “milk” so the longer alternative is tried first).

`/[\w ]+/` matches all of the words (any run of word characters and spaces).
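A sketch of repeated and nested alternations, assuming Python's `re` module. The nested pattern here is a reduced version of the one above (two branches only), and the test strings are hypothetical.

```python
import re

# Three repetitions of "two digits OR two uppercase letters".
pair_triplet = re.compile(r"(\d\d|[A-Z][A-Z]){3}")
codes = ["112233", "AABBCC", "AA66ZZ", "11AA44", "1A2B3C"]
accepted = [c for c in codes if pair_triplet.fullmatch(c)]

# A nested alternation and its flattened equivalent.
nested = re.compile(r"apple (juice|sauce)|sweet (peas|corn|potatoes)")
flat = re.compile(r"apple juice|apple sauce|sweet peas|sweet corn|sweet potatoes")
phrases = ["apple sauce", "sweet corn", "sweet potatoes", "apple pie"]
nested_hits = [p for p in phrases if nested.fullmatch(p)]
flat_hits = [p for p in phrases if flat.fullmatch(p)]

print(accepted)
print(nested_hits == flat_hits)
```

"1A2B3C" is rejected because each repetition must be a complete pair of digits or a complete pair of letters, never a mix.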

Additional notes from Steve

`(?>` starts a non-backtracking (atomic) group: http://stackoverflow.com/questions/15413594/what-does-mean-in-a-pcre-regex

Anchored Expressions

Start and end anchors

| Metacharacter | Meaning |
| --- | --- |
| `^` | Start of string/line. Note that `^` has a dual meaning: it also negates a character set when it is the first character in the set, as in `[^abcd]`. |
| `$` | End of string/line |
| `\A` | Start of string, never start of line |
| `\Z` | End of string, never end of line |

* Reference a position, not an actual character
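The anchors can be illustrated with a two-line string. This sketch assumes Python's `re` module, where `re.MULTILINE` switches `^` and `$` from string anchors to line anchors. It sticks to `^`, `$`, and `\A`, because the exact behaviour of `\Z` around a trailing newline varies between regex flavours.

```python
import re

text = "first line\nsecond line"  # hypothetical two-line input

# Default mode: ^ and $ anchor to the start and end of the whole string.
start_default = re.findall(r"^\w+", text)
end_default = re.findall(r"\w+$", text)

# Multiline mode: ^ and $ also anchor at every line break.
start_multi = re.findall(r"^\w+", text, re.MULTILINE)
end_multi = re.findall(r"\w+$", text, re.MULTILINE)

# \A always means start of string, even in multiline mode.
start_of_string = re.findall(r"\A\w+", text, re.MULTILINE)

print(start_default, start_multi, start_of_string)
print(end_default, end_multi)
```

None of the anchors consume a character; the matches contain only the `\w+` part.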

Line Breaks and Multiline Mode

Word Boundaries

| Metacharacter | Meaning |
| --- | --- |
| `\b` | Word boundary (start/end of word) |
| `\B` | Not a word boundary |

* Reference a position, not an actual character
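A small sketch of word boundaries, assuming Python's `re` module and an invented sample string. It records the start offsets of each match so the two anchors can be compared.

```python
import re

text = "cat concatenate cat's bobcat"  # hypothetical sample

# \bcat\b : "cat" with a word boundary on both sides.
whole_word = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \Bcat\B : "cat" with word characters on both sides (inside a longer word).
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(whole_word, inside_word)
```

The apostrophe in "cat's" is not a word character, so a boundary exists right after that "cat". In "bobcat" the match fails for both patterns: there is no boundary before the "c", and there is a boundary after the final "t".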

Capturing Groups and Backreferences

Backreferences

| Metacharacter | Meaning |
| --- | --- |
| `\1` through `\9` | Backreference for positions 1 to 9 |
| `\10` through `\99` | Backreference for positions 10 to 99 |

Paris in the
the spring
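The "Paris in the the spring" text is a classic test for finding a doubled word with a backreference. A minimal sketch, assuming Python's `re` module; the pattern is one common way to write this check, not necessarily the one used in the course.

```python
import re

text = "Paris in the\nthe spring"

# (\w+) captures a word; \1 requires the same text again after whitespace.
doubled = re.compile(r"\b(\w+)\s+\1\b")

match = doubled.search(text)
repeated_word = match.group(1)

# In a replacement string, \1 refers to the captured word, so the pair collapses to one.
fixed = doubled.sub(r"\1", text)

print(repeated_word)
print(fixed)
```

Because `\s+` also matches the line break, the duplicate is found even though the two copies sit on different lines.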

Backreferences to optional expressions

Finding and replacing using backreferences

U.S. Presidents example

Non-capturing group expressions

This is the third use of the `?` mark. The first was to make an item optional; the second was to make a quantifier lazy (non-greedy).

| Metacharacter | Meaning |
| --- | --- |
| `?:` | Specify a non-capturing group |
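The effect of `?:` shows up in the list of captured groups. A short sketch, assuming Python's `re` module and a made-up sentence:

```python
import re

sentence = "We like apples"  # hypothetical sample

# Both sets of parentheses capture, so there are two groups.
capturing = re.search(r"(I like|We like) (apples)", sentence)

# (?:...) still groups the alternation but does not capture it.
non_capturing = re.search(r"(?:I like|We like) (apples)", sentence)

print(capturing.groups())
print(non_capturing.groups())
```

With the non-capturing version, "apples" becomes group 1, which keeps backreference numbering stable when a group is needed only for alternation or repetition.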

Lookaround Assertions

Positive lookahead assertions