Learn Regex the Easy Way, Part 5: The Dot, Escaping, and Special Characters
Quick Recap
In Part 4, we covered quantifiers: * (zero or more), + (one or more), ? (optional), and {n,m} (specific counts). We also learned that quantifiers apply to the element directly before them. Now let’s talk about the characters regex treats as special and how to handle them.
The 14 Metacharacters
Regex has 14 characters that have special meaning. These are called metacharacters:
. * + ? ^ $ { } [ ] ( ) | \
When the engine sees any of these, it doesn’t treat them as literal characters. It interprets them as instructions. We’ve already used several: ^ and $ as anchors, *, +, ?, and {} as quantifiers, and [] for character classes.
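To see that a metacharacter really is an instruction, here is a minimal sketch using Python’s built-in `re` module (chosen only for illustration; the series itself is tool-agnostic). The unescaped pattern `1+1` does not mean “one plus one”: the `+` quantifies the first `1`.

```python
import re

# Unescaped: "+" is a quantifier, so "1+1" means "one or more 1s, then another 1".
plus_as_quantifier = re.search("1+1", "1+1=2")
print(plus_as_quantifier)  # None: the text never has two 1s in a row

# The same pattern happily matches a run of 1s instead.
print(re.search("1+1", "111").group())  # 111

# Escaped: "\+" is a plain, literal plus sign.
print(re.search(r"1\+1", "1+1=2").group())  # 1+1
```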
One note: this lesson isn’t a tour of what every metacharacter does. Apart from the dot, it focuses on how to match these characters literally. A future lesson will cover what the metacharacters are used for in more detail.
Escaping with Backslash
What if you actually want to match a literal dot, or a literal asterisk? You put a backslash \ before it. This is called escaping. The backslash tells the engine “treat the next character as a plain, literal character.”
| You Type | It Matches |
|---|---|
| `\.` | A literal period/dot |
| `\*` | A literal asterisk |
| `\+` | A literal plus sign |
| `\?` | A literal question mark |
| `\^` | A literal caret |
| `\$` | A literal dollar sign |
| `\{` and `\}` | Literal curly braces |
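The same rule covers the rest of the 14 metacharacters. As a quick check, this Python sketch (Python’s `re` is an assumption here; other engines may differ in edge cases) builds a backslash plus each metacharacter and confirms that every escaped pattern matches only that one literal character. Python also provides `re.escape`, which adds the backslashes for you.

```python
import re

# All 14 metacharacters; "\\" in a normal Python string is a single backslash.
METACHARACTERS = ".*+?^$" + "{}[]()|\\"

for ch in METACHARACTERS:
    pattern = "\\" + ch  # e.g. "\." or "\*"
    # fullmatch succeeds only if the whole string is that one literal character
    assert re.fullmatch(pattern, ch), pattern

# Unescaped, the dot is a wildcard; escaped, it only matches a dot.
print(bool(re.fullmatch("a.c", "abc")))    # True
print(bool(re.fullmatch(r"a\.c", "abc")))  # False

# re.escape backslash-escapes the special characters in a string.
print(re.escape("$5.00"))  # \$5\.00
```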
Here is a regex that uses an escaped dot:
[[:digit:]] # A digit
\. # A literal dot
[[:digit:]] # Another digit
Compact: [[:digit:]]\.[[:digit:]]
MATCH THESE
- 3.14 (matches "3.1")
- 127.0.0.1 (matches "7.0" and "0.1")
DO NOT MATCH THESE
- 3x14
- 127x0x0x1
- filetxt
- file.txt (the dot is surrounded by letters, not digits)
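Here is the same pattern run through Python’s `re.findall`, as a sketch with one assumption: Python’s `re` does not support POSIX classes like `[[:digit:]]`, so `[0-9]` stands in for it. Matches never overlap, which is why `127.0.0.1` yields `7.0` and `0.1` but not the middle `0.0`.

```python
import re

# [0-9] is the Python stand-in for [[:digit:]]; \. is the escaped literal dot.
DIGIT_DOT_DIGIT = re.compile(r"[0-9]\.[0-9]")

for text in ["3.14", "127.0.0.1", "file.txt", "3x14", "127x0x0x1"]:
    print(text, "->", DIGIT_DOT_DIGIT.findall(text))
# 3.14 -> ['3.1']
# 127.0.0.1 -> ['7.0', '0.1']
# file.txt -> []
# 3x14 -> []
# 127x0x0x1 -> []
```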
The Dot (.) as a Wildcard
Without a backslash, the dot is a wildcard. It matches any single character except a newline (by default).
c # Literal "c"
. # ANY single character (wildcard)
t # Literal "t"
Compact: c.t
MATCH THESE
- cat
- cot
- cut
- c9t
- c!t
DO NOT MATCH THESE
- ct (no character between c and t)
- cart (two characters between c and t)
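A short Python sketch of `c.t`, including the newline exception. Python’s `re` follows the usual default here: the dot skips `\n` unless you pass the `re.DOTALL` flag.

```python
import re

WILDCARD = re.compile("c.t")

for word in ["cat", "cot", "cut", "c9t", "c!t", "ct", "cart"]:
    # fullmatch: the whole word must be "c", exactly one character, then "t"
    print(word, bool(WILDCARD.fullmatch(word)))

# By default the dot does not match a newline...
print(bool(re.fullmatch("c.t", "c\nt")))             # False
# ...unless the DOTALL flag changes that.
print(bool(re.fullmatch("c.t", "c\nt", re.DOTALL)))  # True
```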
The dot is tempting because it matches almost anything, but that’s also its weakness. If you know you’re looking for a letter, use [[:alpha:]]. If you know it’s a digit, use [[:digit:]]. Be specific: reaching for the dot when you know what you want is careless regex writing, and it often matches things you didn’t intend, such as spaces or punctuation.
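To watch the dot over-match, this sketch searches ordinary prose with `c.t` and with a letters-only version. Since Python’s `re` lacks `[[:alpha:]]`, the ASCII range `[A-Za-z]` stands in for it (an assumption that ignores accented letters).

```python
import re

text = "tic tac toe, a cat and a cot"

loose = re.findall("c.t", text)          # dot: any character, even a space
strict = re.findall("c[A-Za-z]t", text)  # stand-in for c[[:alpha:]]t

print(loose)   # ['c t', 'c t', 'cat', 'cot']
print(strict)  # ['cat', 'cot']
```

The two `'c t'` hits come from “tic tac” and “tac toe”, exactly the kind of unintended match the specific class rules out.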
Practical Example: Matching Prices
\$ # Literal dollar sign (escaped)
[[:digit:]]+ # One or more digits (dollars)
\. # Literal dot (escaped)
[[:digit:]]{2} # Exactly 2 digits (cents)
Compact: \$[[:digit:]]+\.[[:digit:]]{2}
MATCH THESE
- $19.99
- $5.00
- $1234.56
DO NOT MATCH THESE
- 19.99 (no dollar sign)
- $19 (no cents)
- $19.9 (only one cent digit)
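The price pattern carries over to Python once `[[:digit:]]` is swapped for `[0-9]` (again, Python’s `re` has no POSIX classes). This sketch uses `fullmatch` so each string must be a complete price; `is_price` is a hypothetical helper name, not part of any library.

```python
import re

# \$ literal dollar, [0-9]+ dollars, \. literal dot, [0-9]{2} exactly two cent digits
PRICE = re.compile(r"\$[0-9]+\.[0-9]{2}")

def is_price(s: str) -> bool:
    """Hypothetical helper: True if the whole string looks like $D.CC."""
    return PRICE.fullmatch(s) is not None

for s in ["$19.99", "$5.00", "$1234.56", "19.99", "$19", "$19.9"]:
    print(s, is_price(s))

# findall picks prices out of running text.
print(PRICE.findall("Lunch was $12.50, coffee $3.75."))  # ['$12.50', '$3.75']
```

Using `search` instead of `fullmatch` would also accept a string like `$19.999`, because `$19.99` appears inside it.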
What to Practice
- Write a regex that matches an IP address format: four groups of digits separated by literal dots (don’t worry about number ranges yet).
- Write a regex that matches a question mark at the end of a line. (You’ll need to escape it AND use an anchor.)
- Write a regex using the dot wildcard that matches any three-character string starting with “a” and ending with “z”.
- Rewrite the regex from exercise 3 to be more specific: match “a” followed by exactly one lowercase letter, followed by “z”.
Definitions
- Anchor - A regex element that matches a position, not a character.
- Backslash - The `\` character, used to escape metacharacters and turn them into literal characters.
- Character Class - A set of characters in square brackets that matches any ONE character from the set.
- Escaping - Placing a backslash before a metacharacter to match it literally. `\.` matches an actual dot.
- Exact Count ({n}) - A quantifier matching exactly n occurrences.
- Greedy - Quantifier behavior where the engine matches as much as possible.
- Line Anchor (^, $) - `^` matches the start of a line, `$` matches the end.
- Literal Character - A character in a regex that matches itself, with no special meaning.
- Metacharacter (full list) - The 14 special characters in regex: `. * + ? ^ $ { } [ ] ( ) | \`
- Negation (caret inside brackets) - `[^abc]` matches any character NOT in the set.
- One or More (+) - A quantifier requiring at least one occurrence.
- Optional (?) - A quantifier matching zero or one occurrence.
- POSIX Character Class - A named character class like `[[:digit:]]`.
- Position (in regex context) - A point between characters, matched by anchors.
- Quantifier - Specifies how many times an element should be matched.
- Range Count ({n,m}) - A quantifier matching between n and m occurrences.
- Range (in character classes) - A dash between two characters that specifies a contiguous set, as in `[a-z]`.
- Shorthand Character Class - `\d`, `\w`, `\s` and their uppercase negations (`\D`, `\W`, `\S`).
- Wildcard (dot) - The unescaped dot `.` matches any single character except a newline (by default).
- Word Boundary (\b) - Matches the boundary between word and non-word characters.
- Zero or More (*) - A quantifier matching zero or more occurrences.
Series Navigation
- Part 1: Make Regular Expressions the Easy Way
- Part 2: Anchors and Boundaries
- Part 3: Character Classes
- Part 4: Quantifiers
- Part 5: The Dot, Escaping, and Special Characters (this post)