Learn Regex the Easy Way, Part 4: Quantifiers

Quick Recap #

In Part 3, we covered character classes: square brackets that match one character from a set, POSIX classes like [[:digit:]], and negation with [^...]. Now let’s talk about how many characters to match.

The Four Core Quantifiers #

A quantifier tells the regex engine “how many of the previous thing do I want?” There are four you’ll use constantly:

Quantifier   Meaning                  Example
*            Zero or more             ab*c matches ac, abc, abbc
+            One or more              ab+c matches abc, abbc, but NOT ac
?            Zero or one (optional)   colou?r matches color and colour
{n}          Exactly n                a{3} matches aaa

A Critical Detail: What Gets Quantified #

A quantifier applies only to the single element directly before it: one character, one character class, or one group. Forgetting this is probably the most common beginner mistake.

abc+        # Matches: ab followed by one or more c's
            # (abc, abcc, abccc, etc.)
            # Does NOT mean "one or more abc"

If you want “one or more abc,” you need parentheses: (abc)+. We’ll cover grouping properly in Part 6.
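To see the difference concretely, here is a minimal sketch in Python using the standard re module. It uses re.fullmatch so the entire string has to fit the pattern; the variable names and test strings are only illustrative.

```python
import re

# abc+ quantifies only the "c"; (abc)+ quantifies the whole group.
single_char = re.compile(r"abc+")
whole_group = re.compile(r"(abc)+")

for text in ["abc", "abccc", "abcabc"]:
    print(
        text,
        bool(single_char.fullmatch(text)),
        bool(whole_group.fullmatch(text)),
    )
```

Both patterns accept abc, but only abc+ accepts abccc and only (abc)+ accepts abcabc.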

Zero or More (*) #

The asterisk * is the most permissive quantifier. “Zero or more” means the thing might not be there at all, and that’s still a match.

a           # Literal "a"
b*          # Zero or more "b" characters
c           # Literal "c"

Compact: ab*c

MATCH THESE

  • ac (zero b's)
  • abc (one b)
  • abbc (two b's)
  • abbbc (three b's)

DO NOT MATCH THESE

  • a (no c)
  • bc (no a)
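If you would like to check these lists yourself, the following is a small Python sketch using the standard re module. The list names are made up for this example, and re.fullmatch is used so that each string must match from start to end.

```python
import re

pattern = re.compile(r"ab*c")

should_match = ["ac", "abc", "abbc", "abbbc"]
should_not_match = ["a", "bc"]

# Print the verdict for every string in both lists.
for text in should_match + should_not_match:
    result = "match" if pattern.fullmatch(text) else "no match"
    print(f"{text!r}: {result}")
```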

One or More (+) #

The plus sign + requires at least one match. This is probably the quantifier you’ll use most often.
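As a rough illustration, the Python sketch below (standard re module) compares ab+c with ab*c on the strings from the table above, then shows a typical use of +: pulling runs of digits out of text. The sample sentence is invented, and [0-9] is used because Python's re module does not support POSIX classes such as [[:digit:]].

```python
import re

plus = re.compile(r"ab+c")
star = re.compile(r"ab*c")

# "ac" is the only string where the two quantifiers disagree.
for text in ["ac", "abc", "abbc"]:
    print(text, bool(plus.fullmatch(text)), bool(star.fullmatch(text)))

# A typical use: find every run of one or more digits.
numbers = re.findall(r"[0-9]+", "Order 66 shipped 3 items in 2024")
print(numbers)  # ['66', '3', '2024']
```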

Optional (?) #

The question mark ? makes the preceding element optional: it can appear zero or one time.

colou?r     # The "u" is optional

MATCH THESE

  • color
  • colour

DO NOT MATCH THESE

  • colouur
  • colr
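The same checks can be sketched in Python with the standard re module. The sample sentence is invented for illustration; re.findall returns every non-overlapping match, so both spellings are picked up by the one pattern.

```python
import re

pattern = re.compile(r"colou?r")

# Both spellings are found by the same pattern.
found = pattern.findall("The color red and the colour blue")
print(found)  # ['color', 'colour']

# Two u's or a missing o do not fit the pattern.
for text in ["colouur", "colr"]:
    print(text, bool(pattern.fullmatch(text)))
```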

Exact and Range Counts: {n}, {n,}, {n,m} #

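For precise control, use braces: {n} matches exactly n repetitions, {n,} matches n or more, and {n,m} matches at least n and at most m. For example, to match a line of 3 to 7 letters: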

^                   # Start of line
[[:alpha:]]         # A letter
{3,7}               # Between 3 and 7 letters
$                   # End of line

Compact: ^[[:alpha:]]{3,7}$

MATCH THESE

  • abc (3 chars)
  • abcde (5 chars)
  • abcdefg (7 chars)

DO NOT MATCH THESE

  • ab (2 chars, too few)
  • abcdefgh (8 chars, too many)
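Here is a minimal Python sketch of the same pattern. One assumption to note: Python's standard re module does not support POSIX classes such as [[:alpha:]], so this example substitutes the ASCII-only class [A-Za-z]. It also shows the {n} and {n,} forms for comparison.

```python
import re

# [A-Za-z] stands in for [[:alpha:]], which Python's re does not support.
three_to_seven = re.compile(r"^[A-Za-z]{3,7}$")

for text in ["ab", "abc", "abcde", "abcdefg", "abcdefgh"]:
    print(text, bool(three_to_seven.search(text)))

# The other two brace forms, for comparison:
exactly_three = re.compile(r"^a{3}$")   # exactly 3
three_or_more = re.compile(r"^a{3,}$")  # 3 or more, no upper bound
print(bool(exactly_three.search("aaaa")))  # False
print(bool(three_or_more.search("aaaa")))  # True
```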

Practical Example: US ZIP Codes #

A US ZIP code is either 5 digits, or 5 digits followed by a dash and 4 more digits (the ZIP+4 format). Here we briefly introduce grouping with (...)? to make the dash-plus-four part optional:

^                   # Start of line
[[:digit:]]{5}      # Exactly 5 digits
(                   # Start optional group
  -                 # Literal dash
  [[:digit:]]{4}    # Exactly 4 digits
)?                  # End optional group (zero or one)
$                   # End of line

Compact: ^[[:digit:]]{5}(-[[:digit:]]{4})?$

MATCH THESE

  • 77001
  • 90210
  • 77001-1234

DO NOT MATCH THESE

  • 7700 (too few digits)
  • 770011 (too many digits)
  • 77001- (dash without 4 digits)
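The annotated layout above maps closely onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and treats # as the start of a comment. The sketch below is one way to try the ZIP pattern under two assumptions: [0-9] replaces [[:digit:]] because Python's re module does not support POSIX classes, and the names ZIP_PATTERN and is_zip are invented for this example.

```python
import re

# [0-9] stands in for [[:digit:]], which Python's re does not support.
ZIP_PATTERN = re.compile(
    r"""
    ^               # Start of line
    [0-9]{5}        # Exactly 5 digits
    (               # Start optional group
      -             # Literal dash
      [0-9]{4}      # Exactly 4 digits
    )?              # End optional group (zero or one)
    $               # End of line
    """,
    re.VERBOSE,
)


def is_zip(text: str) -> bool:
    """Return True if text looks like a 5-digit or ZIP+4 code."""
    return ZIP_PATTERN.search(text) is not None


for candidate in ["77001", "90210", "77001-1234", "7700", "770011", "77001-"]:
    print(candidate, is_zip(candidate))
```

One Python-specific detail: $ also matches just before a trailing newline, so a string like "77001\n" would pass. If that matters, call ZIP_PATTERN.fullmatch(text) instead of search.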

What to Practice #

  1. Write a regex for a string of exactly 8 alphanumeric characters (like a simple password format).
  2. Write a regex that matches one or more digits followed by an optional period and more digits (like an integer or decimal number).
  3. Write a regex for “ha”, “haha”, “hahaha” (hint: you’ll need grouping from Part 6, but try it).
  4. What does ab?c+ match? List five strings it would match and three it wouldn’t. Think carefully.
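To check your answers without peeking at a solution, a small test harness can help. The sketch below is one possible approach in Python (standard re module); the function name check and its arguments are invented for this example, and it uses re.fullmatch so each test string must match in full. Remember that Python's re module does not support POSIX classes, so write [0-9] or [A-Za-z0-9] rather than [[:digit:]] or [[:alnum:]] when testing there.

```python
import re


def check(pattern, should_match, should_not_match):
    """Return a list of strings where the pattern behaved unexpectedly."""
    compiled = re.compile(pattern)
    problems = []
    for text in should_match:
        if not compiled.fullmatch(text):
            problems.append(f"expected match: {text!r}")
    for text in should_not_match:
        if compiled.fullmatch(text):
            problems.append(f"unexpected match: {text!r}")
    return problems


# Example using a pattern from earlier in this article; swap in your own.
print(check(r"ab*c", ["ac", "abbc"], ["a", "bc"]))  # []
```

An empty list means every string behaved as you predicted; anything else tells you which prediction to rethink.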

