Learn Regex the Easy Way, Part 4: Quantifiers

Quick Recap #

In Part 3, we covered character classes: square brackets that match one character from a set, POSIX classes like [[:digit:]], and negation with [^...]. Now let’s talk about how many characters to match.

The Four Core Quantifiers #

A quantifier tells the regex engine “how many of the previous thing do I want?” There are four you’ll use constantly:

Quantifier   Meaning                   Example
*            Zero or more              ab*c matches ac, abc, abbc
+            One or more               ab+c matches abc, abbc, but NOT ac
?            Zero or one (optional)    colou?r matches color and colour
{n}          Exactly n                 a{3} matches aaa

A Critical Detail: What Gets Quantified #

A quantifier applies only to the single element directly before it: one character, one character class, or one group. Forgetting this is probably the most common beginner mistake.

abc+        # Matches: ab followed by one or more c's
            # (abc, abcc, abccc, etc.)
            # Does NOT mean "one or more abc"

If you want “one or more abc,” you need parentheses: (abc)+. We’ll cover grouping properly in Part 6.
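To see the difference in practice, here is a minimal sketch in Python. Python is an assumption here (this series has not named a language), and the names `last_char_only`, `whole_group`, and `fully_matches` are made up for illustration.

```python
import re

# Without parentheses, + applies only to the final "c".
last_char_only = re.compile(r"abc+")
# With parentheses, + applies to the whole group "abc".
whole_group = re.compile(r"(abc)+")


def fully_matches(pattern, text):
    """Return True when the pattern matches the entire string."""
    return pattern.fullmatch(text) is not None


for text in ["abc", "abccc", "abcabc"]:
    print(text, fully_matches(last_char_only, text), fully_matches(whole_group, text))
# abc True True
# abccc True False
# abcabc False True
```

Both patterns accept abc, but only abc+ accepts abccc and only (abc)+ accepts abcabc.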

Zero or More (*) #

The asterisk * is the most permissive quantifier. “Zero or more” means the thing might not be there at all, and that’s still a match.

a           # Literal "a"
b*          # Zero or more "b" characters
c           # Literal "c"

Compact: ab*c

MATCH THESE

  • ac (zero b's)
  • abc (one b)
  • abbc (two b's)
  • abbbc (three b's)

DO NOT MATCH THESE

  • a (no c)
  • bc (no a)
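The lists above can be checked with a short Python sketch (Python is an assumption, not something this series prescribes). The parentheses around b* are added only so the script can show what the star captured. They do not change which strings match.

```python
import re

# Same pattern as ab*c, with a capture group around b* for inspection.
pattern = re.compile(r"a(b*)c")

for text in ["ac", "abc", "abbc", "abbbc", "a", "bc"]:
    found = pattern.fullmatch(text)
    if found:
        # group(1) holds the b's; for "ac" it is the empty string.
        print(f"{text!r}: match, b* captured {found.group(1)!r}")
    else:
        print(f"{text!r}: no match")
```

For ac the captured text is the empty string, which is exactly what "zero or more" allows.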

One or More (+) #

The plus sign + requires at least one match. Unlike *, it rejects the zero-occurrence case: ab+c matches abc and abbc, but not ac. This is probably the quantifier you’ll use most often.
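A minimal Python sketch of the difference between * and +, assuming Python's re module. The sample sentence is made up, and [0-9] stands in for [[:digit:]] because Python's re module does not support POSIX bracket classes.

```python
import re

# "ac" has zero b's: accepted by *, rejected by +.
print(bool(re.fullmatch(r"ab*c", "ac")))  # True
print(bool(re.fullmatch(r"ab+c", "ac")))  # False

# A typical use of +: pull out runs of one or more digits.
# The sample sentence is made up for illustration.
digit_runs = re.findall(r"[0-9]+", "order 66 shipped in 2024")
print(digit_runs)  # ['66', '2024']
```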

Optional (?) #

The question mark ? makes the preceding element optional: it can appear zero or one time.

colou?r     # The "u" is optional

MATCH THESE

  • color
  • colour

DO NOT MATCH THESE

  • colouur
  • colr
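As a quick check of the lists above, here is a small Python sketch (Python is an assumption, and the test sentence is invented). It searches one string that contains all four spellings.

```python
import re

spelling = re.compile(r"colou?r")

# Made-up sentence containing the four spellings from the lists above.
sample = "color colour colouur colr"
found = spelling.findall(sample)
print(found)  # ['color', 'colour']
```

Only the two valid spellings are found: colouur has one u too many, and colr is missing the required o.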

Exact and Range Counts: {n}, {n,}, {n,m} #

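For precise control, use braces: {n} matches exactly n repetitions, {n,} matches n or more, and {n,m} matches between n and m, inclusive.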

^                   # Start of line
[[:alpha:]]         # A letter
{3,7}               # Between 3 and 7 letters
$                   # End of line

Compact: ^[[:alpha:]]{3,7}$

MATCH THESE

  • abc (3 chars)
  • abcde (5 chars)
  • abcdefg (7 chars)

DO NOT MATCH THESE

  • ab (2 chars, too few)
  • abcdefgh (8 chars, too many)
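The brace forms can be tried out with a short Python sketch. Two assumptions: Python is the language, and [A-Za-z] stands in for [[:alpha:]], because Python's re module does not support POSIX bracket classes. The name `three_to_seven` is illustrative.

```python
import re

# [A-Za-z] stands in for [[:alpha:]], which Python's re module does not support.
three_to_seven = re.compile(r"^[A-Za-z]{3,7}$")

for word in ["ab", "abc", "abcde", "abcdefg", "abcdefgh"]:
    print(word, bool(three_to_seven.search(word)))
# ab False
# abc True
# abcde True
# abcdefg True
# abcdefgh False

# The other brace forms, checked against whole strings with fullmatch.
print(bool(re.fullmatch(r"a{3}", "aaa")))     # True: exactly three
print(bool(re.fullmatch(r"a{3}", "aaaa")))    # False: four is too many
print(bool(re.fullmatch(r"a{3,}", "aaaaa")))  # True: three or more
```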

Practical Example: US ZIP Codes #

A US ZIP code is either 5 digits, or 5 digits followed by a dash and 4 more digits. Here we briefly introduce grouping with (...)? to make the dash-plus-four part optional:

^                   # Start of line
[[:digit:]]{5}      # Exactly 5 digits
(                   # Start optional group
  -                 # Literal dash
  [[:digit:]]{4}    # Exactly 4 digits
)?                  # End optional group (zero or one)
$                   # End of line

Compact: ^[[:digit:]]{5}(-[[:digit:]]{4})?$

MATCH THESE

  • 77001
  • 90210
  • 77001-1234

DO NOT MATCH THESE

  • 7700 (too few digits)
  • 770011 (too many digits)
  • 77001- (dash without 4 digits)
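The commented layout above maps closely onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. The sketch below assumes Python and uses [0-9] in place of [[:digit:]], since Python's re module does not support POSIX bracket classes. The name `zip_code` is illustrative.

```python
import re

zip_code = re.compile(
    r"""
    ^               # Start of string
    [0-9]{5}        # Exactly 5 digits
    (               # Start optional group
      -             # Literal dash
      [0-9]{4}      # Exactly 4 digits
    )?              # End optional group (zero or one)
    $               # End of string
    """,
    re.VERBOSE,
)

for candidate in ["77001", "90210", "77001-1234", "7700", "770011", "77001-"]:
    print(candidate, bool(zip_code.search(candidate)))
# 77001 True
# 90210 True
# 77001-1234 True
# 7700 False
# 770011 False
# 77001- False
```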

What to Practice #

  1. Write a regex for a string of exactly 8 alphanumeric characters (like a simple password format).
  2. Write a regex that matches one or more digits followed by an optional period and more digits (like an integer or decimal number).
  3. Write a regex for “ha”, “haha”, “hahaha” (hint: you’ll need grouping from Part 6, but try it).
  4. What does ab?c+ match? List five strings it would match and three it wouldn’t. Think carefully.
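To check your answers, a small helper like the one below may be useful. It is a sketch in Python (an assumption, as is the helper name `check`), and it is demonstrated on a{3} from the table earlier rather than on any of the exercises. Python's re module does not support POSIX classes such as [[:digit:]], so write [0-9] or [A-Za-z] in patterns you pass to it.

```python
import re


def check(pattern, should_match, should_not_match):
    """Return a list describing every test string that behaved unexpectedly."""
    compiled = re.compile(pattern)
    failures = []
    for text in should_match:
        if compiled.fullmatch(text) is None:
            failures.append(f"expected match: {text!r}")
    for text in should_not_match:
        if compiled.fullmatch(text) is not None:
            failures.append(f"expected no match: {text!r}")
    return failures


# An empty list means every string behaved as expected.
print(check(r"a{3}", ["aaa"], ["aa", "aaaa"]))  # []
```

The helper uses fullmatch, so the pattern must account for the whole string and no ^ or $ anchors are needed.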
