DevToolsForYou
Text Processing

How to Use Regular Expressions

A practical guide to regular expressions — learn the core syntax, common patterns, and how to write regex for real-world tasks like email validation, URL matching, and log parsing.

2 min readUpdated Apr 11, 2026

Core syntax building blocks

Regular expressions describe patterns in strings. Everything else builds on a small set of primitives: literal characters, the dot wildcard, character classes, anchors, and quantifiers.

text
.        Any character (except newline by default)
\d       Digit [0-9]
\w       Word character [a-zA-Z0-9_]
\s       Whitespace (space, tab, newline)
\D \W \S Negated versions of the above

^        Start of string (or line in multiline mode)
$        End of string (or line in multiline mode)
\b       Word boundary

[aeiou]  Character class — matches any one of a, e, i, o, u
[^aeiou] Negated class — matches any character NOT in the set
[a-z]    Range — any lowercase letter

Quantifiers

Quantifiers control how many times a preceding element must match. By default they are greedy — they match as much as possible. Append ? to make them lazy (match as little as possible).

text
*        0 or more (greedy)
+        1 or more (greedy)
?        0 or 1 (optional)
{n}      Exactly n times
{n,}     n or more
{n,m}    Between n and m times (inclusive)

*?  +?  {n,m}?   Lazy versions

Groups and capturing

Parentheses create a capturing group that you can back-reference or extract in your code. Use (?:...) for a non-capturing group when you only need grouping for quantifiers or alternation, not extraction.

javascript
// JavaScript example
const date = "2024-11-14";
const m = date.match(/(\d{4})-(\d{2})-(\d{2})/);
if (m) {
  console.log(m[1]); // "2024" — year
  console.log(m[2]); // "11"   — month
  console.log(m[3]); // "14"   — day
}

// Named capturing groups
const { groups } = date.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
console.log(groups.year); // "2024"

Common real-world patterns

These patterns cover the most common regex tasks. Adapt them to your exact requirements rather than treating them as definitive validators.

text
# Email (simplified — full spec is complex)
^[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}$

# IPv4 address
^(\d{1,3}\.){3}\d{1,3}$

# Hex color code
^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$

# ISO 8601 date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$

# URL (http or https)
https?:\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

# Semantic version (1.2.3 or 1.2.3-beta)
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[\w.]+)?$

# Whitespace-only line
^\s*$

# HTML tag (simple, not full parser)
<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>

Flags

Flags change how the engine interprets the pattern. The most important are case-insensitive (i), global (g, find all matches), and multiline (m, make ^ and $ match line boundaries).

javascript
// JavaScript
const str = "Hello World";
str.match(/hello/i);    // case-insensitive → ["Hello"]
str.match(/\w+/g);     // global → ["Hello", "World"]

// Multiline — ^ matches start of each line
"line1\nline2".match(/^\w+/gm); // ["line1", "line2"]
Frequently asked questions

How do I match a literal dot or other special character?

Escape it with a backslash: \. matches a literal dot, \* matches a literal asterisk, \( matches a literal open parenthesis, and so on. The special characters that need escaping are: . * + ? ^ $ { } [ ] | ( ) \.

What does greedy vs. lazy mean?

A greedy quantifier (like .+) matches as much as possible, then backtracks. A lazy quantifier (.+?) matches as little as possible. For example, against '<b>bold</b>', .+ matches the entire string while .+? matches just '<b>bold</b>' (or less, depending on context).

Is regex the right tool for parsing HTML or JSON?

Generally no. HTML and JSON are nested, recursive structures that regex cannot fully handle. Use a proper HTML parser (like the browser's DOMParser or Cheerio in Node) or JSON.parse() for those formats.

Related cheatsheetsAll cheatsheets →
Related guidesAll guides →