Learn Regex the Easy Way, Part 4: Quantifiers
Quick Recap #
In Part 3, we covered character classes: square brackets that match one character from a set, POSIX classes like [[:digit:]], and negation with [^...]. Now let’s talk about how many characters to match.
The Four Core Quantifiers #
A quantifier tells the regex engine “how many of the previous thing do I want?” There are four you’ll use constantly:
| Quantifier | Meaning | Example |
|---|---|---|
| * | Zero or more | ab*c matches ac, abc, abbc |
| + | One or more | ab+c matches abc, abbc, but NOT ac |
| ? | Zero or one (optional) | colou?r matches color and colour |
| {n} | Exactly n | a{3} matches aaa |
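The table's examples can be checked directly in code. The snippet below is a minimal sketch that assumes Python's built-in `re` module (the series itself is not tied to a language). `re.fullmatch` succeeds only when the whole string matches, which is how the table's examples are meant to be read. The `matches` helper is a hypothetical name used for illustration.

```python
import re

# Each pattern from the table, paired with strings to try against it.
examples = {
    "ab*c": ["ac", "abc", "abbc"],
    "ab+c": ["abc", "abbc", "ac"],
    "colou?r": ["color", "colour"],
    "a{3}": ["aaa", "aa"],
}


def matches(pattern: str, text: str) -> bool:
    """Return True when the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for pattern, candidates in examples.items():
    for text in candidates:
        print(f"{pattern!r} vs {text!r}: {matches(pattern, text)}")
```

Every pair prints `True` except `ab+c` against `ac` and `a{3}` against `aa`.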
A Critical Detail: What Gets Quantified #
A quantifier applies only to the single element directly before it. Forgetting this is probably the most common beginner mistake.
abc+ # Matches: ab followed by one or more c's
# (abcc, abccc, etc.)
# Does NOT mean "one or more abc"
If you want “one or more abc,” you need parentheses: (abc)+. We’ll cover grouping properly in Part 6.
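To see the difference between the two patterns, here is a short sketch assuming Python's `re` module. The variable names are illustrative only.

```python
import re

# Without parentheses, + applies only to the "c".
single_char = re.compile(r"abc+")
# With parentheses, + applies to the whole group "abc".
whole_group = re.compile(r"(abc)+")

for text in ["abccc", "abcabc"]:
    print(text,
          bool(single_char.fullmatch(text)),
          bool(whole_group.fullmatch(text)))

# abccc True False
# abcabc False True
```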
Zero or More (*) #
The asterisk * is the most permissive quantifier. “Zero or more” means the thing might not be there at all, and that’s still a match.
a # Literal "a"
b* # Zero or more "b" characters
c # Literal "c"
Compact: ab*c
MATCH THESE
- ac (zero b's)
- abc (one b)
- abbc (two b's)
- abbbc (three b's)
DO NOT MATCH THESE
- a (no c)
- bc (no a)
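The commented, one-piece-per-line layout used in this series can be run as written in some engines. As a sketch, assuming Python's `re` module: the `re.VERBOSE` flag ignores whitespace inside the pattern and treats `#` as the start of a comment, so the annotated form and the compact form should behave the same.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats "#" as a comment marker.
commented = re.compile(r"""
    a    # Literal "a"
    b*   # Zero or more "b" characters
    c    # Literal "c"
""", re.VERBOSE)
compact = re.compile(r"ab*c")

should_match = ["ac", "abc", "abbc", "abbbc"]
should_not_match = ["a", "bc"]

for text in should_match + should_not_match:
    # Both forms should agree on every string.
    print(text, bool(commented.fullmatch(text)), bool(compact.fullmatch(text)))
```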
One or More (+) #
The plus sign + requires at least one match. This is probably the quantifier you’ll use most often.
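A brief sketch of `+` in practice, assuming Python's `re` module. Python's `re` does not support POSIX classes such as `[[:digit:]]`, so `[0-9]` stands in for it here. The sample sentence is made up for illustration.

```python
import re

# "+" demands at least one "b", so "ac" is rejected where "ab*c" would accept it.
one_or_more = re.compile(r"ab+c")
for text in ["abc", "abbc", "ac"]:
    print(text, bool(one_or_more.fullmatch(text)))

# A common use: pull every run of digits out of a string.
# [0-9] stands in for [[:digit:]], which Python's re does not support.
digit_runs = re.findall(r"[0-9]+", "Order 66 shipped 3 items on day 120")
print(digit_runs)  # ['66', '3', '120']
```

Had the second pattern used `*` instead of `+`, it would also match the empty string at every position between non-digits, which is rarely what you want.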
Optional (?) #
The question mark ? makes the preceding element optional: it can appear zero or one time.
colou?r # The "u" is optional
MATCH THESE
- color
- colour
DO NOT MATCH THESE
- colouur
- colr
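The optional `u` is handy when searching text that mixes spellings. The following is a small sketch assuming Python's `re` module; the sample sentence is invented for the example. Because the pattern has no anchors, `findall` locates it anywhere in the text.

```python
import re

spelling = re.compile(r"colou?r")

text = "The color red and the colour blue, but not colouur or colr."
print(spelling.findall(text))  # ['color', 'colour']
```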
Exact and Range Counts: {n}, {n,}, {n,m} #
For precise control:
- {3} matches exactly 3
- {3,} matches 3 or more
- {3,7} matches between 3 and 7
^ # Start of line
[[:alpha:]] # A letter
{3,7} # Between 3 and 7 letters
$ # End of line
Compact: ^[[:alpha:]]{3,7}$
MATCH THESE
- abc (3 chars)
- abcde (5 chars)
- abcdefg (7 chars)
DO NOT MATCH THESE
- ab (2 chars, too few)
- abcdefgh (8 chars, too many)
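The three count forms can be compared side by side. This is a sketch assuming Python's `re` module, which does not support `[[:alpha:]]`; `[A-Za-z]` is used as an ASCII-only stand-in, so it is an approximation of the POSIX class and not an exact equivalent.

```python
import re

# [A-Za-z] stands in for [[:alpha:]], which Python's re does not support.
exactly_3 = re.compile(r"^[A-Za-z]{3}$")
at_least_3 = re.compile(r"^[A-Za-z]{3,}$")
from_3_to_7 = re.compile(r"^[A-Za-z]{3,7}$")

for word in ["ab", "abc", "abcde", "abcdefg", "abcdefgh"]:
    print(
        f"{word:<9}",
        bool(exactly_3.search(word)),    # {3}
        bool(at_least_3.search(word)),   # {3,}
        bool(from_3_to_7.search(word)),  # {3,7}
    )
```

Both bounds in `{3,7}` are inclusive: `abc` and `abcdefg` match, while `ab` and `abcdefgh` do not.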
Practical Example: US ZIP Codes #
ZIP codes are either 5 digits or 5 digits, a dash, and 4 more digits. Here we briefly introduce grouping with (...)? to make the dash-plus-four part optional:
^ # Start of line
[[:digit:]]{5} # Exactly 5 digits
( # Start optional group
- # Literal dash
[[:digit:]]{4} # Exactly 4 digits
)? # End optional group (zero or one)
$ # End of line
Compact: ^[[:digit:]]{5}(-[[:digit:]]{4})?$
MATCH THESE
- 77001
- 90210
- 77001-1234
DO NOT MATCH THESE
- 7700 (too few digits)
- 770011 (too many digits)
- 77001- (dash without 4 digits)
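Here is the ZIP pattern as a runnable sketch, assuming Python's `re` module. Two things differ from the pattern in the text and are assumptions of this example: `[0-9]` replaces `[[:digit:]]` (Python's `re` does not support POSIX classes), and `is_zip` is a hypothetical helper name.

```python
import re

# Verbose layout mirroring the ZIP code breakdown.
# [0-9] stands in for [[:digit:]], which Python's re does not support.
ZIP_PATTERN = re.compile(r"""
    ^            # Start of line
    [0-9]{5}     # Exactly 5 digits
    (            # Start optional group
      -          # Literal dash
      [0-9]{4}   # Exactly 4 digits
    )?           # End optional group (zero or one)
    $            # End of line
""", re.VERBOSE)


def is_zip(text: str) -> bool:
    """Hypothetical helper: True when text has the shape of a US ZIP code."""
    return ZIP_PATTERN.search(text) is not None


for candidate in ["77001", "90210", "77001-1234", "7700", "770011", "77001-"]:
    print(candidate, is_zip(candidate))
```

One Python-specific caveat: `$` also matches just before a trailing newline, so `"77001\n"` would pass. If that matters, `ZIP_PATTERN.fullmatch(text)` is stricter.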
What to Practice #
- Write a regex for a string of exactly 8 alphanumeric characters (like a simple password format).
- Write a regex that matches one or more digits followed by an optional period and more digits (like an integer or decimal number).
- Write a regex for “ha”, “haha”, “hahaha” (hint: you’ll need grouping from Part 6, but try it).
- What does ab?c+ match? List five strings it would match and three it wouldn’t. Think carefully.
Definitions #
- Anchor - A regex element that matches a position in the text, not a character.
- Character Class - A set of characters in square brackets that matches any ONE character from the set.
- Exact Count ({n}) - A quantifier that matches the preceding element exactly n times.
- Greedy - A quantifier behavior where the engine matches as much as possible (more detail in Part 8).
- Line Anchor (^, $) - ^ matches start of line, $ matches end of line.
- Metacharacter - A character with special meaning in regex.
- Negation (caret inside brackets) - [^abc] matches any character NOT in the set.
- One or More (+) - A quantifier requiring at least one occurrence of the preceding element.
- Optional (?) - A quantifier that makes the preceding element optional (zero or one occurrence).
- POSIX Character Class - A named character class like [[:digit:]].
- Position (in regex context) - A point between characters. Anchors match positions.
- Quantifier - A regex element that specifies how many times the preceding element should be matched.
- Range Count ({n,m}) - A quantifier that matches the preceding element between n and m times.
- Range (in character classes) - A dash specifying a contiguous set: [a-z].
- Shorthand Character Class - Abbreviated character classes like \d, \w, \s.
- Word Boundary (\b) - Matches the boundary between word and non-word characters.
- Zero or More (*) - A quantifier allowing zero or more occurrences of the preceding element.
Series Navigation #
- Part 1: Make Regular Expressions the Easy Way
- Part 2: Anchors and Boundaries
- Part 3: Character Classes
- Part 4: Quantifiers (this post)