Learn Regex the Easy Way, Part 2: Anchors and Boundaries

Quick Recap #

In Part 1, we talked about what regular expressions are, why they look intimidating (they really do look like a cat sat on your keyboard), and why verbose mode with comments is the way to learn them. We also set up regex101.com in PCRE2 mode. Now let’s write some actual regex.

Anchors Match Positions, Not Characters #

Here’s the first counterintuitive thing about regex: not everything matches a character. Some symbols match a position in the text. These are called anchors.

Think of it this way. When you type the letter a in a regex, the engine finds the letter “a” in your text and consumes it. The match moves forward one character. Anchors don’t do that. They check “am I at a certain position?” and if yes, the match continues, but the engine doesn’t move forward. Nothing is consumed. No character is matched. Just a position.
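To see the difference between consuming a character and checking a position, here is a minimal sketch using Python’s built-in re module (an assumption for illustration; the article itself works in regex101.com with PCRE2, and the variable names are made up). It compares the span of a literal match with the span of an anchor match.

```python
import re

text = "cat"

# A literal consumes a character: the match covers index 0 up to index 1.
literal = re.search(r"c", text)
print(literal.span())        # (0, 1)

# An anchor consumes nothing: the match is empty and sits at position 0.
anchor = re.search(r"^", text)
print(anchor.span())         # (0, 0)
print(repr(anchor.group()))  # ''
```

The anchor still produces a match, but the match is zero characters wide. That is what “just a position” means in practice.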

The Line Anchors: ^ and $ #

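These two are the anchors you’ll use constantly:

  • ^ matches the position at the start of a line
  • $ matches the position at the end of a line

(When you test several lines at once, this per-line behavior relies on the multiline flag, m. Without it, ^ and $ anchor to the start and end of the whole text.)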

When you use ^ and $ together, you’re saying “match the entire line, start to finish. Nothing extra before, nothing extra after.”

Here’s a verbose regex that matches the exact words “cat”, “dog”, or “bird” as complete lines:

^                # Start of line
(cat|dog|bird)   # Match one of these words
$                # End of line

The compact version: ^(cat|dog|bird)$

MATCH THESE

  • cat
  • dog
  • bird

DO NOT MATCH THESE

  • catfish
  • hotdog
  • birds
  • the cat sat

Without the anchors, the regex cat would match the “cat” inside “catfish” and “the cat sat.” The anchors prevent that by requiring the match to span the entire line.
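If you want to try this outside regex101.com, the following is a rough Python equivalent (Python’s re module is an assumption here, not the PCRE2 engine used in this series). It runs the verbose pattern over all seven test lines at once and compares it with an unanchored version.

```python
import re

# re.VERBOSE allows whitespace and comments inside the pattern.
# re.MULTILINE makes ^ and $ match at every line, not only at the
# start and end of the whole text.
pattern = re.compile(
    r"""
    ^                # Start of line
    (cat|dog|bird)   # Match one of these words
    $                # End of line
    """,
    re.VERBOSE | re.MULTILINE,
)

text = "\n".join(
    ["cat", "dog", "bird", "catfish", "hotdog", "birds", "the cat sat"]
)

anchored = pattern.findall(text)
print(anchored)         # ['cat', 'dog', 'bird']

# Without anchors, the same alternation also matches inside longer lines.
unanchored = re.findall(r"cat|dog|bird", text)
print(len(unanchored))  # 7
```

The unanchored pattern finds a hit on every one of the seven lines, while the anchored one only accepts the three lines that are exactly one of the words.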

Word Boundaries: \b #

Sometimes you don’t want to match an entire line. You just want to match a whole word. That’s what \b does. It matches the position between a word character and a non-word character, or between a word character and the start or end of the text. Think of it like the edge of a word.

A “word character” is any letter, digit, or underscore (the same characters matched by [[:word:]] or \w). Everything else is a “non-word character.” The boundary \b sits right at that edge.

\b     # Word boundary (edge of word)
cat    # The literal text "cat"
\b     # Word boundary (other edge)

Compact: \bcat\b

MATCH THESE

  • the cat sat on the mat (matches “cat”)
  • I saw a cat today (matches “cat”)
  • cat (matches “cat”)

DO NOT MATCH THESE

  • concatenate (no match)
  • category (no match)
  • catfish (no match)
  • scat (no match)
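Here is a small Python sketch of the same test (again assuming Python’s re module rather than PCRE2; the names word_cat, samples, and spans are made up for illustration). It records where \bcat\b matches in each sample, or None when there is no match.

```python
import re

word_cat = re.compile(r"\bcat\b")

samples = [
    "the cat sat on the mat",
    "I saw a cat today",
    "cat",
    "concatenate",
    "category",
    "catfish",
    "scat",
]

# Map each sample to the (start, end) span of the match, or None.
spans = {}
for s in samples:
    m = word_cat.search(s)
    spans[s] = m.span() if m else None
    print(f"{s!r}: {spans[s]}")
```

The third sample is worth a look: “cat” on its own matches because the start and end of the text also count as word edges.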

There’s also \B (capital B), which matches a position that is NOT a word boundary. You’ll rarely use it, but it exists.
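For the curious, here is a quick sketch of \B in Python’s re module (an assumption for illustration, as is the sample text). The pattern \Bcat\B only matches “cat” when it is buried inside a longer word, with word characters on both sides.

```python
import re

inside = re.compile(r"\Bcat\B")

text = "cat concatenate scat category"

# Only the "cat" in the middle of "concatenate" has a word character
# on both sides, so it is the only match.
inner_spans = [m.span() for m in inside.finditer(text)]
print(inner_spans)  # [(7, 10)]
```

The standalone “cat”, the end of “scat”, and the start of “category” all touch a word edge, so \B rejects them.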

Practical Example: Matching .txt Filenames #

Let’s match filenames that end with .txt but not .txt.bak:

^                  # Start of line
[[:alnum:]_.-]+    # One or more valid filename characters
\.txt              # Literal ".txt" (dot is escaped)
$                  # End of line (nothing after .txt)

Compact: ^[[:alnum:]_.-]+\.txt$

MATCH THESE

  • report.txt
  • my-file.txt
  • data_2025.txt

DO NOT MATCH THESE

  • report.txt.bak
  • image.png
  • readme.md

The $ anchor at the end is doing the heavy lifting here. It ensures nothing comes after .txt, which is what excludes .txt.bak files. Open regex101.com, set it to PCRE2, and try this yourself.
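If you would rather check this in code, here is a hedged Python version. One caveat: Python’s re module does not support POSIX classes such as [[:alnum:]], so this sketch spells the class out as A-Za-z0-9, which covers ASCII letters and digits only. The variable names are made up for illustration.

```python
import re

# [[:alnum:]] is not available in Python's re module, so the class is
# written out as A-Za-z0-9 (ASCII only; an assumption for this sketch).
txt_file = re.compile(r"^[A-Za-z0-9_.-]+\.txt$")

names = [
    "report.txt",
    "my-file.txt",
    "data_2025.txt",
    "report.txt.bak",
    "image.png",
    "readme.md",
]

txt_names = [n for n in names if txt_file.search(n)]
print(txt_names)  # ['report.txt', 'my-file.txt', 'data_2025.txt']
```

Try removing the $ from the pattern: report.txt.bak starts matching, because nothing forces the match to stop at the end of the name.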

What to Practice #

  1. Write a regex that matches lines containing only a single digit (0 through 9). Use anchors.
  2. Write a regex that finds the word “error” as a whole word (not inside “errors” or “terrorism”). Use word boundaries.
  3. Write a regex that matches filenames ending in .log but not .log.gz.
  4. Write a regex that matches lines that are completely empty (hint: ^$).
