Word Count vs Byte Count — DevToolsForYou

Word count, character count, and byte count all measure text size, but they diverge significantly for multilingual content, emoji, and systems with byte-level limits. Using the wrong metric can cause silent data truncation, broken API calls, or rejected database writes.

Updated Apr 11, 2026

Word Count

Word count measures how many words a text contains. Word count tools usually report sentences, paragraphs, and estimated reading time alongside it. It is the human-readable measure of content length and the standard metric for writing, publishing, and content planning.
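
Here is a minimal Python sketch of how a word counter might compute these figures. It assumes that whitespace separates words and that sentences end in ".", "!" or "?". It also assumes that blank lines separate paragraphs and that readers manage about 200 words per minute. The text_stats helper and the WORDS_PER_MINUTE value are illustrative assumptions, not how any particular tool works.

```python
import math
import re

# Assumed average reading speed; real tools pick different values.
WORDS_PER_MINUTE = 200


def text_stats(text: str) -> dict:
    """Naive word, sentence, paragraph, and reading-time counts (hypothetical helper)."""
    words = text.split()  # any run of whitespace separates words
    # A sentence is a run of non-terminator text followed by ., ! or ?
    sentences = re.findall(r"[^.!?]+[.!?]", text)
    # Paragraphs are separated by one or more blank lines
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    minutes = math.ceil(len(words) / WORDS_PER_MINUTE) if words else 0
    return {
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "reading_minutes": minutes,
    }


sample = "Byte limits matter. Word counts do not!\n\nStorage uses bytes."
print(text_stats(sample))  # {'words': 10, 'sentences': 3, 'paragraphs': 2, 'reading_minutes': 1}
```

The sentence rule here is deliberately simple. Abbreviations such as "e.g." or decimal numbers would inflate the sentence count.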

Use cases

Checking article or blog post length before publishing
Meeting minimum or maximum word count requirements for academic work
Estimating reading time for content previews
Checking pull request description or commit message length

Strengths

Intuitively meaningful — reflects how long content feels to read
Independent of encoding: the same text has the same word count whether it is stored as UTF-8, UTF-16, or Latin-1
Standard metric for content, SEO, and editorial workflows

Limitations

Irrelevant for storage and transmission — systems use bytes, not words
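Definition of 'word' varies (hyphenated words, contractions, URLs), and whitespace-based counting breaks down for languages written without spaces between words, such as Chinese, Japanese, and Thai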
Does not predict API payload size or database column usage
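
Two plausible counting rules, sketched in Python, show how much the definition matters. Neither rule is claimed to be what any particular tool uses. One splits on whitespace, and the other matches runs of letters, digits, and underscores with the regular expression \w+. They disagree on contractions, hyphenated words, and URLs. Both count a Japanese sentence written without spaces as one word.

```python
import re


def count_by_whitespace(text: str) -> int:
    # Rule 1: a word is any run of non-whitespace characters
    return len(text.split())


def count_by_word_chars(text: str) -> int:
    # Rule 2: a word is any run of \w characters (Unicode letters, digits, underscore)
    return len(re.findall(r"\w+", text))


samples = {
    "contraction": "don't",
    "hyphenated": "state-of-the-art",
    "URL": "https://example.com/a-b",
    "Japanese, no spaces": "東京は日本の首都です",  # "Tokyo is the capital of Japan"
}

for label, text in samples.items():
    print(f"{label}: whitespace={count_by_whitespace(text)}, word-chars={count_by_word_chars(text)}")
```

Counting words properly in scripts without spaces needs language-aware word segmentation, which neither rule attempts.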

Byte Count

Byte count is the actual storage size of a string in a specific encoding, almost always UTF-8 on the modern web. ASCII characters use 1 byte each, but characters outside the ASCII (Basic Latin) range use 2–4 bytes: accented Latin letters typically take 2, common CJK characters 3, and most emoji 4. As a result, emoji and CJK text consume significantly more storage than their character count suggests.
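
A quick way to see the gap is to encode each character and measure the result, sketched here in Python. Any language with a UTF-8 encoder behaves the same way. The characters are arbitrary examples, one from each UTF-8 width class.

```python
import unicodedata

# One character from each UTF-8 width class
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {len(ch.encode('utf-8'))} byte(s)")

# A short mixed string: 9 code points, but 17 bytes in UTF-8
text = "caf\u00e9 \u4e2d\u6587 \U0001F600"  # "café 中文 😀"
print(len(text), len(text.encode("utf-8")))  # 9 17
```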

Use cases

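Checking whether a value fits a database limit that is enforced in bytes, such as MySQL's index key length or maximum row size (in MySQL and PostgreSQL, VARCHAR(255) itself limits characters, not bytes)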
Verifying that an HTTP header or cookie does not exceed size restrictions
Ensuring a message queue payload stays under the per-message byte cap
Sizing buffers and network packets for binary protocol implementations

Strengths

The accurate measure for storage, transmission, and protocol limits
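Deterministic: the same sequence of code points always produces the same byte count in a given encoding, although visually identical text stored in different Unicode normalization forms can differ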
Reveals the extra storage taken by emoji and other multibyte characters, which a character count hides
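
Byte counts are deterministic only for a fixed sequence of code points, and the same visible text can arrive as different sequences. The short Python sketch below uses the standard unicodedata.normalize function to show this. In NFC form, é is a single code point (2 UTF-8 bytes). In NFD form, it is e followed by U+0301 COMBINING ACUTE ACCENT (1 + 2 bytes).

```python
import unicodedata

word = "caf\u00e9"  # "café" with a precomposed é (U+00E9)

nfc = unicodedata.normalize("NFC", word)  # é stays one code point
nfd = unicodedata.normalize("NFD", word)  # é becomes e + U+0301

print(len(nfc), len(nfc.encode("utf-8")))  # 4 5
print(len(nfd), len(nfd.encode("utf-8")))  # 5 6
print(nfc == nfd)  # False, although both render as "café"
```

Normalizing input to one form (NFC is a common choice) before measuring or storing it keeps byte counts consistent for visually identical strings.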

Limitations

Not intuitive for humans — byte count does not reflect perceived length
Varies by encoding — UTF-8, UTF-16, and Latin-1 produce different counts
Requires knowing the target encoding to be meaningful
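
The following Python sketch shows how much the encoding matters by measuring the same strings in UTF-8, UTF-16, and Latin-1 with the standard codecs. It uses utf-16-le because Python's plain utf-16 codec prepends a 2-byte byte order mark. byte_count is a hypothetical helper that returns None when the encoding cannot represent the text.

```python
from typing import Optional


def byte_count(text: str, encoding: str) -> Optional[int]:
    """Encoded size in bytes, or None if the encoding cannot represent the text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


# "utf-16-le" avoids the 2-byte BOM that Python's plain "utf-16" codec adds
for text in ["naive", "na\u00efve", "na\u00efve \u20ac5"]:  # naive, naïve, naïve €5
    sizes = {enc: byte_count(text, enc) for enc in ("utf-8", "utf-16-le", "latin-1")}
    print(ascii(text), sizes)
```

Latin-1 is the most compact of the three for accented Latin text. However, it has no euro sign, so byte_count returns None for the last string.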

When to use which

Use word count when you are measuring content for humans: writing, SEO, editorial guidelines, and reading time estimates. Use byte count when you are interacting with any system that enforces limits in bytes, such as database index and row limits, APIs, HTTP headers, message queues, and network protocols. When in doubt, check the byte count: a string of 100 single-code-point emoji such as 😀 is 100 characters but 400 bytes in UTF-8.
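
The 100-emoji arithmetic can be checked in a few lines of Python against a hypothetical 255-byte cap. BYTE_LIMIT and measure are illustrative names, and 😀 (U+1F600) stands in for any 4-byte emoji.

```python
from typing import Tuple

# Hypothetical byte cap, e.g. for an HTTP header or a queue message
BYTE_LIMIT = 255


def measure(text: str, byte_limit: int = BYTE_LIMIT) -> Tuple[int, int, bool]:
    """Return (code points, UTF-8 bytes, whether the bytes fit within byte_limit)."""
    size = len(text.encode("utf-8"))
    return len(text), size, size <= byte_limit


print(measure("a" * 100))           # (100, 100, True)
print(measure("\U0001F600" * 100))  # (100, 400, False): same length, four times the bytes
```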

Frequently asked questions

Why does an emoji count as more than one byte?

UTF-8 is a variable-width encoding. ASCII characters (code points 0–127) use 1 byte. Code points 128–2047 use 2 bytes, and 2048–65535 use 3 bytes. Most emoji sit in the supplementary planes above 65535 (U+FFFF) and use 4 bytes each, although a few older emoji in the Basic Multilingual Plane, such as ❤ (U+2764), use 3. Emoji sequences use even more bytes because they combine multiple code points. A thumbs-up with a skin tone modifier is two 4-byte code points (8 bytes), and a flag is a pair of regional indicator symbols (also 8 bytes).
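
The width rules in this answer can be written as a small Python function and cross-checked against Python's built-in UTF-8 encoder. The function names are illustrative. The examples show a single emoji at 4 bytes and a skin-tone or flag sequence at 8.

```python
def utf8_width(cp: int) -> int:
    """Bytes UTF-8 uses for one code point."""
    if cp < 0x80:      # 0-127: ASCII
        return 1
    if cp < 0x800:     # 128-2047
        return 2
    if cp < 0x10000:   # 2048-65535: rest of the Basic Multilingual Plane
        return 3
    return 4           # 65536 and above: supplementary planes, incl. most emoji


def utf8_length(text: str) -> int:
    # Sum the width of every code point in the string
    return sum(utf8_width(ord(ch)) for ch in text)


examples = {
    "grinning face": "\U0001F600",                         # 1 code point
    "thumbs up, medium skin tone": "\U0001F44D\U0001F3FD",  # base emoji + modifier
    "flag of Japan": "\U0001F1EF\U0001F1F5",                # regional indicators J + P
}

for name, s in examples.items():
    assert utf8_length(s) == len(s.encode("utf-8"))  # matches the real encoder
    print(f"{name}: {len(s)} code point(s), {utf8_length(s)} bytes")
```
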
What is the byte limit of a VARCHAR(255) column?

In MySQL and MariaDB, the n in VARCHAR(n) counts characters, not bytes. With utf8mb4 (the character set that supports full Unicode, including emoji), each character can take up to 4 bytes. A VARCHAR(255) column therefore holds at most 255 characters, which can occupy up to 1020 bytes plus a 2-byte length prefix. In PostgreSQL, VARCHAR(255) is likewise a character limit rather than a byte limit, so it allows 255 characters regardless of encoding.
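
Because VARCHAR(n) limits characters in MySQL, a pre-insert check compares the character count with n. The byte count only shows how much of the 1020-byte worst case a value uses. The Python sketch below uses a hypothetical check_varchar helper. It assumes MySQL counts one character per code point under utf8mb4, so a flag built from two regional indicators counts as 2.

```python
from typing import NamedTuple


class VarcharCheck(NamedTuple):
    chars: int       # code points: what VARCHAR(n) limits under utf8mb4
    utf8_bytes: int  # data size in bytes, excluding the length prefix
    fits: bool


def check_varchar(value: str, max_chars: int = 255) -> VarcharCheck:
    """Hypothetical pre-insert check for a utf8mb4 VARCHAR(max_chars) column."""
    chars = len(value)  # Python's len() counts code points
    return VarcharCheck(chars, len(value.encode("utf-8")), chars <= max_chars)


print(check_varchar("a" * 255))            # VarcharCheck(chars=255, utf8_bytes=255, fits=True)
print(check_varchar("\U0001F600" * 255))   # VarcharCheck(chars=255, utf8_bytes=1020, fits=True)
print(check_varchar("\U0001F1EF\U0001F1F5" * 128))  # 128 flags = 256 code points: fits=False
```

The last case is the surprising one. The value looks like 128 characters on screen but counts as 256 characters in the column.
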
Does JavaScript's string length give character or byte count?

Neither. JavaScript's .length counts UTF-16 code units, not Unicode code points or bytes. Characters in the Basic Multilingual Plane (up to U+FFFF) count as 1. Characters above U+FFFF, including most emoji, count as 2 because they are stored as a surrogate pair. To get the UTF-8 byte count in JavaScript, use new TextEncoder().encode(str).length. To count code points, use [...str].length.
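
The Python sketch below reproduces the same three measurements outside the browser for comparison. Python's len() counts code points, so UTF-16 code units are derived from the UTF-16 byte size. In JavaScript, str.length, [...str].length, and new TextEncoder().encode(str).length report the corresponding values. js_style_counts is a hypothetical helper name.

```python
def js_style_counts(text: str) -> dict:
    """Hypothetical helper mirroring common JavaScript string measurements."""
    return {
        # str.length: UTF-16 code units (2 bytes each; utf-16-le adds no BOM)
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # [...str].length: Unicode code points
        "code_points": len(text),
        # new TextEncoder().encode(str).length: UTF-8 bytes
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(js_style_counts("hi"))                    # plain ASCII: all three counts agree
print(js_style_counts("\u2764"))                # U+2764 HEAVY BLACK HEART, a BMP emoji: 1 unit, 3 bytes
print(js_style_counts("\U0001F600"))            # surrogate pair: 2 units, 1 code point, 4 bytes
print(js_style_counts("\U0001F44D\U0001F3FD"))  # thumbs up + skin tone: 4 units, 8 bytes
```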

More comparisons

Base64 vs URL Encoding

JSON vs YAML

MD5 vs SHA-256