Learn Regex the Easy Way, Part 2: Anchors and Boundaries
Quick Recap #
In Part 1, we talked about what regular expressions are, why they look intimidating (they really do look like a cat sat on your keyboard), and why verbose mode with comments is the way to learn them. We also set up regex101.com in PCRE2 mode. Now let’s write some actual regex.
Anchors Match Positions, Not Characters #
Here’s the first counterintuitive thing about regex: not everything matches a character. Some symbols match a position in the text. These are called anchors.
Think of it this way. When you type the letter a in a regex, the engine finds the letter “a” in your text and consumes it. The match moves forward one character. Anchors don’t do that. They check “am I at a certain position?” and if yes, the match continues, but the engine doesn’t move forward. Nothing is consumed. No character is matched. Just a position.
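To see the difference between consuming a character and checking a position, here is a small sketch in Python. Python's `re` module is an assumption here (Part 1 set things up in regex101's PCRE2 mode), but for this basic behavior the two should agree. The variable names are just for illustration.

```python
import re

text = "cat"

# A literal character consumes input: the match covers index 0 up to 1.
literal = re.search(r"c", text)
literal_span = literal.span()

# An anchor consumes nothing: the match starts and ends at index 0,
# and the matched text is the empty string.
anchor = re.search(r"^", text)
anchor_span = anchor.span()
anchor_text = anchor.group()

print(literal_span)        # (0, 1)
print(anchor_span)         # (0, 0)
print(repr(anchor_text))   # ''
```

The anchor "matches," but the span is zero characters wide. That is what "matching a position" looks like in practice.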
The Line Anchors: ^ and $ #
These two are the anchors you’ll use constantly:
- ^ matches the position at the beginning of a line
- $ matches the position at the end of a line
When you use ^ and $ together, you’re saying “match the entire line, start to finish. Nothing extra before, nothing extra after.”
Here’s a verbose regex that matches the exact words “cat”, “dog”, or “bird” as complete lines:
^ # Start of line
(cat|dog|bird) # Match one of these words
$ # End of line
The compact version: ^(cat|dog|bird)$
MATCH THESE
- cat
- dog
- bird
DO NOT MATCH THESE
- catfish
- hotdog
- birds
- the cat sat
Without the anchors, the regex cat would match the “cat” inside “catfish” and “the cat sat.” The anchors prevent that by requiring the match to span the entire line.
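If you'd rather check this from a script than from regex101, the following is a rough Python sketch of the same test. It assumes Python's `re` module instead of PCRE2, and it uses `re.MULTILINE` so that ^ and $ apply to every line of the sample text rather than only to the text as a whole. The sample text is the two lists above joined together.

```python
import re

# VERBOSE keeps the comments in the pattern; MULTILINE makes ^ and $
# match at the start and end of each line, not just the whole text.
pattern = re.compile(
    r"""
    ^                 # Start of line
    (cat|dog|bird)    # Match one of these words
    $                 # End of line
    """,
    re.VERBOSE | re.MULTILINE,
)

text = "cat\ndog\nbird\ncatfish\nhotdog\nbirds\nthe cat sat"

# Lines matched by the anchored pattern.
matched = [m.group() for m in pattern.finditer(text)]

# For contrast: lines where the unanchored pattern "cat" finds something.
unanchored = [line for line in text.splitlines() if re.search(r"cat", line)]

print(matched)      # ['cat', 'dog', 'bird']
print(unanchored)   # ['cat', 'catfish', 'the cat sat']
```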
Word Boundaries: \b #
Sometimes you don’t want to match an entire line. You just want to match a whole word. That’s what \b does. It matches the position between a word character and a non-word character, or between a word character and the start or end of the text. Think of it like the edge of a word.
A “word character” is any letter, digit, or underscore (the same characters matched by [[:word:]] or \w). Everything else is a “non-word character.” The boundary \b sits right at that edge.
\b # Word boundary (edge of word)
cat # The literal text "cat"
\b # Word boundary (other edge)
Compact: \bcat\b
MATCH THESE
- the cat sat on the mat (matches “cat”)
- I saw a cat today (matches “cat”)
- cat (matches “cat”)
DO NOT MATCH THESE
- concatenate (no match)
- category (no match)
- catfish (no match)
- scat (no match)
There’s also \B (capital B), which matches a position that is NOT a word boundary. You’ll rarely use it, but it exists.
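Here is a quick Python sketch that runs \bcat\b against the sample strings above, and then tries \B for comparison. Python's `re` is assumed in place of PCRE2; both treat \b and \B the same way for plain ASCII text like this.

```python
import re

samples = [
    "the cat sat on the mat",
    "I saw a cat today",
    "cat",
    "concatenate",
    "category",
    "catfish",
    "scat",
]

# \b on both sides: "cat" must stand alone as a whole word.
whole_word = [s for s in samples if re.search(r"\bcat\b", s)]

# \B on both sides: "cat" must have a word character on each side,
# so it only matches when buried in the middle of a longer word.
buried = [s for s in samples if re.search(r"\Bcat\B", s)]

print(whole_word)   # ['the cat sat on the mat', 'I saw a cat today', 'cat']
print(buried)       # ['concatenate']
```

Notice that “category” and “scat” match neither pattern: each has a word edge on one side of “cat” and a letter on the other.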
Practical Example: Matching .txt Filenames #
Let’s match filenames that end with .txt but not .txt.bak:
^ # Start of line
[[:alnum:]_.-]+ # One or more valid filename characters
\.txt # Literal ".txt" (dot is escaped)
$ # End of line (nothing after .txt)
Compact: ^[[:alnum:]_.-]+\.txt$
MATCH THESE
- report.txt
- my-file.txt
- data_2025.txt
DO NOT MATCH THESE
- report.txt.bak
- image.png
- readme.md
The $ anchor at the end is doing the heavy lifting here. It ensures nothing comes after .txt, which is what excludes .txt.bak files. Open regex101.com, set it to PCRE2, and try this yourself.
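The same check can be sketched in Python, with one caveat: Python's `re` module does not support POSIX classes like [[:alnum:]], so this version swaps in [A-Za-z0-9] as an ASCII-only stand-in. That substitution is an assumption made for this example, not part of the PCRE2 pattern above. The second pattern drops the $ to show what it was protecting against.

```python
import re

# Assumption: [A-Za-z0-9] replaces [[:alnum:]], which Python's re
# module does not understand.
txt_file = re.compile(
    r"""
    ^                  # Start of line
    [A-Za-z0-9_.-]+    # One or more valid filename characters
    \.txt              # Literal ".txt" (dot is escaped)
    $                  # End of line (nothing after .txt)
    """,
    re.VERBOSE,
)

# Same pattern without the trailing $ anchor, for comparison.
no_dollar = re.compile(r"^[A-Za-z0-9_.-]+\.txt")

names = [
    "report.txt",
    "my-file.txt",
    "data_2025.txt",
    "report.txt.bak",
    "image.png",
    "readme.md",
]

accepted = [n for n in names if txt_file.search(n)]
leaky = [n for n in names if no_dollar.search(n)]

print(accepted)   # ['report.txt', 'my-file.txt', 'data_2025.txt']
print(leaky)      # the same three, plus 'report.txt.bak'
```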
What to Practice #
- Write a regex that matches lines containing only a single digit (0 through 9). Use anchors.
- Write a regex that finds the word “error” as a whole word (not inside “errors” or “terrorism”). Use word boundaries.
- Write a regex that matches filenames ending in .log but not .log.gz.
- Write a regex that matches lines that are completely empty (hint: ^$).
Definitions #
- Anchor: A regex element that matches a position in the text, not a character. Anchors don’t consume any characters; they just check “am I at this position?”
- Line Anchor (^, $): The caret ^ matches the start of a line; the dollar sign $ matches the end of a line. Together, they require the pattern to span the entire line.
- Metacharacter: A character that has special meaning in regex instead of representing itself literally. Examples include ., *, ^, $, and \.
- Position (in regex context): A point between characters (or before the first/after the last character) in the text. Anchors and boundaries match positions, not characters.
- Word Boundary (\b): Matches the position between a word character (letter, digit, or underscore) and a non-word character, or between a word character and the start or end of the text. Think of it as the edge of a word.
Series Navigation #
- Part 1: Make Regular Expressions the Easy Way
- Part 2: Anchors and Boundaries (this post)