DevToolsForYou
Text Processing

How to Write a Regex Pattern Step by Step

A practical guide to building regular expressions from scratch — breaking down the problem, choosing the right tokens, testing incrementally, and avoiding common traps.

2 min readUpdated Apr 11, 2026

The process: describe before you code

Before writing a single character, write out in plain English what the string must look like. 'An email address: one or more word characters or dots or plus signs, then an @ sign, then a domain with at least one dot.' That sentence maps almost directly to regex tokens. Skipping this step leads to regex written by trial and error that breaks on edge cases.

Start with a literal match, then generalise

Write the simplest pattern that matches one example, then expand it to match more cases. Do not try to write the final pattern in one go.

text
# Match a specific date: 2024-01-15
2024-01-15

# Generalise: any 4-digit year
\d{4}-01-15

# Generalise: any month (01–12)
\d{4}-(0[1-9]|1[0-2])-15

# Generalise: any day (01–31)
\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])

# Anchor so it matches the whole string, not just part of it
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

Key building blocks

text
# Character classes
\d       one digit (0–9)
\w       word character (a-z A-Z 0–9 _)
\s       whitespace (space, tab, newline)
[aeiou]  one of: a, e, i, o, u
[^aeiou] anything except a, e, i, o, u
[a-z]    a through z

# Quantifiers
*        0 or more (greedy)
+        1 or more (greedy)
?        0 or 1
{3}      exactly 3
{2,5}    2 to 5
*?       0 or more (lazy — match as few as possible)

# Anchors
^        start of string
$        end of string
\b       word boundary

# Groups
(abc)    capturing group
(?:abc)  non-capturing group
(?=abc)  lookahead — what follows must match abc

Common ready-to-use patterns

text
# Email (pragmatic — not RFC-complete)
[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}

# URL
https?:\/\/[^\s/$.?#].[^\s]*

# Date YYYY-MM-DD
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# Time HH:MM (24-hour)
^([01]\d|2[0-3]):[0-5]\d$

# IPv4 address
^(\d{1,3}\.){3}\d{1,3}$

# UUID v4
^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$

# Hex colour
^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

# Slug (lowercase, hyphens)
^[a-z0-9]+(?:-[a-z0-9]+)*$

Testing strategy

Always test against: a valid example that should match, a valid example with edge cases (minimum length, special chars), an invalid example that is close (off by one character), and an empty string. Use a regex tester with a live match view so you can see exactly which part of the string matched.

Using regex in JavaScript

javascript
const pattern = /^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$/;

// Test — returns true/false
pattern.test('user@example.com');  // true
pattern.test('not-an-email');       // false

// Match — returns array or null
const m = 'Order #12345 placed'.match(/Order #(\d+)/);
console.log(m?.[1]);  // '12345'

// Replace
'hello world'.replace(/\b\w/g, c => c.toUpperCase());
// 'Hello World'

// Split
'one,two,,three'.split(/,+/);  // ['one', 'two', 'three']

// Named groups (ES2018)
const { year, month } = '2024-01'.match(/(?<year>\d{4})-(?<month>\d{2})/).groups;
Frequently asked questions

Should I use a regex or write a parser?

Regex is good for pattern matching on flat strings. It cannot handle recursive or nested structures — use a proper parser for HTML, JSON, or code. A common rule: if your regex needs more than 2–3 levels of grouping to explain, consider whether a small parsing function would be clearer.

What is catastrophic backtracking?

Some regex patterns on certain inputs cause the engine to try exponentially many combinations before giving up. This can freeze a Node.js server. It usually involves nested quantifiers like (a+)+ on a string that almost but doesn't quite match. Avoid nested quantifiers on variable-length content and test your patterns against long non-matching strings.

Related cheatsheetsAll cheatsheets →
Related guidesAll guides →