HTML Validation

HTML Guides for unicode

Learn how to identify and fix common HTML validation errors flagged by the W3C Validator, so your pages are standards-compliant and render consistently across browsers.


Character reference expands to a control character (U+0002).

What Are Control Characters?

Control characters occupy code points U+0000 through U+001F and U+007F through U+009F in Unicode. They were originally designed for controlling hardware devices (e.g., U+0002 is "Start of Text," U+0007 is "Bell," U+001B is "Escape"). With the exception of whitespace such as tab (U+0009), line feed (U+000A), and form feed (U+000C), these characters have no visual representation and carry no semantic meaning in a web document.

The HTML specification explicitly forbids character references that resolve to most control characters. Even though &#2; is a syntactically valid character reference, the character it points to is not permitted as content. The W3C validator raises this error for references such as &#1;, &#2;, &#7;, &#8;, and others that fall within the control character ranges.

Why This Is a Problem

Standards compliance: The WHATWG HTML Living Standard treats a numeric character reference that resolves to a noncharacter, or to a control character other than whitespace, as a parse error.
Unpredictable rendering: Browsers handle illegal control characters inconsistently. Some may silently discard them, others may render a replacement character (�), and others may exhibit unexpected behavior.
Accessibility: Screen readers and other assistive technologies may choke on or misinterpret control characters, degrading the experience for users who rely on these tools.
Data integrity: Control characters in your markup often indicate a copy-paste error, a corrupted data source, or a templating bug that inserts raw binary data into HTML output.

How to Fix It

Identify the offending reference: look for character references such as &#1;, &#2;, &#7;, &#8;, or similar that point to control character code points.
Determine intent: figure out what character or content was actually intended. Often, a control character reference is the result of a bug in a data pipeline or template engine.
Remove or replace: either delete the reference entirely or replace it with the correct printable character or HTML entity.
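The identification step can be automated. The following is a minimal sketch, not a validator: it assumes the markup is available as a Python string, and the names `find_control_references` and `is_forbidden_control` are illustrative. It treats tab, line feed, and form feed as permitted whitespace, following our reading of the WHATWG parsing rules.

```python
import re

# Matches decimal (&#2;) and hexadecimal (&#x02;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

# Assumption: tab, line feed, and form feed are whitespace that may be referenced.
ALLOWED_WHITESPACE = {0x09, 0x0A, 0x0C}


def is_forbidden_control(code_point: int) -> bool:
    """Return True for C0 controls, DEL, and C1 controls, except permitted whitespace."""
    if code_point in ALLOWED_WHITESPACE:
        return False
    return code_point <= 0x1F or 0x7F <= code_point <= 0x9F


def find_control_references(html: str) -> list[tuple[int, str, int]]:
    """Return (offset, reference text, code point) for each offending reference."""
    hits = []
    for match in NUMERIC_REF.finditer(html):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_forbidden_control(code_point):
            hits.append((match.start(), match.group(0), code_point))
    return hits
```

For example, `find_control_references("<p>Some text &#2; more text</p>")` returns `[(13, "&#2;", 2)]`, while a reference such as `&#8226;` (bullet) is not reported.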

Examples

Incorrect: Control character reference

This markup contains &#2;, which expands to the control character U+0002 (Start of Text) and triggers the validation error:

<p>Some text &#2; more text</p>

Incorrect: Hexadecimal form of a control character

The same problem occurs with the hexadecimal syntax:

<p>Data: &#x02;</p>

Correct: Remove the control character reference

If the control character was unintentional, simply remove it:

<p>Some text more text</p>

Correct: Use a valid character reference instead

If you intended to display a special character, use the correct printable code point or named entity. For example, to display a bullet (•), copyright sign (©), or ampersand (&):

<p>Item &#8226; Details</p>
<p>Tom &amp; Jerry</p>

Correct: Full document without control characters

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Example Page</title>
  </head>
  <body>
    <p>This paragraph uses only valid character references: &amp; &lt; &gt; &copy;</p>
  </body>
</html>

Common Control Character Code Points to Avoid

Reference	Code Point	Name
`&#0;`	U+0000	Null
`&#1;`	U+0001	Start of Heading
`&#2;`	U+0002	Start of Text
`&#7;`	U+0007	Bell
`&#8;`	U+0008	Backspace
`&#11;`	U+000B	Vertical Tab
`&#27;`	U+001B	Escape
`&#127;`	U+007F	Delete

If your content is generated dynamically (from a database, API, or user input), sanitize the data before inserting it into HTML to strip out control characters. Most server-side languages and templating engines provide utilities for this purpose.
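As one possible shape for such a sanitizing step, here is a minimal sketch in Python. The function name `sanitize_for_html` is an assumption for illustration. The choice to keep tab, line feed, form feed, and carriage return is also ours, so adjust the character class to your own policy. Escaping uses the standard library's `html.escape`.

```python
import html
import re

# C0 controls other than tab, line feed, form feed, and carriage return,
# plus DEL (U+007F) and the C1 controls (U+0080 through U+009F).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0E-\x1F\x7F-\x9F]")


def sanitize_for_html(value: str) -> str:
    """Drop control characters, then escape HTML-significant characters."""
    cleaned = CONTROL_CHARS.sub("", value)
    return html.escape(cleaned)
```

With this sketch, `sanitize_for_html("Tom \x02& Jerry")` returns `"Tom &amp; Jerry"`. Stripping before escaping keeps raw control characters out of the output, so nothing downstream has to encode them as references.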

Document uses the Unicode Private Use Area(s), which should not be used in publicly exchanged documents.

Private Use Area (PUA) characters are reserved ranges in Unicode whose interpretation is not specified by any encoding standard. Their meaning is determined entirely by private agreement between cooperating parties—such as a font vendor and its users. This means that a PUA character that renders as a custom icon in one font may appear as a blank square, a question mark, or a completely different glyph when that specific font is unavailable.

This warning commonly appears when using icon fonts like older versions of Font Awesome, Material Icons, or custom symbol fonts. These fonts map their icons to PUA code points. While this approach works visually when the font loads correctly, it creates several problems:

Accessibility: Screen readers cannot interpret PUA characters meaningfully. A visually impaired user may hear nothing, hear "private use area character," or hear an unrelated description depending on their assistive technology.
Portability: If the associated font fails to load (due to network issues, content security policies, or user preferences), the characters become meaningless boxes or blank spaces.
Interoperability: Copy-pasting text containing PUA characters into another application, email client, or document will likely produce garbled or missing content since the receiving system won't know how to interpret those code points.
Standards compliance: The W3C and Unicode Consortium both recommend against using PUA characters in publicly exchanged documents for exactly these reasons.

Sometimes PUA characters sneak into your HTML unintentionally—through copy-pasting from word processors, PDFs, or design tools that use custom encodings. Other times, they are inserted deliberately via CSS content properties or HTML entities by icon font libraries.

To fix this, identify where the PUA characters appear and replace them with standard alternatives. Use inline SVG for icons, standard Unicode symbols where appropriate (e.g., ✓ U+2713 instead of a PUA checkmark), or CSS background images. If you must use an icon font, hide the PUA character from assistive technology using aria-hidden="true" and provide an accessible label separately.
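Where a PUA character has a standard Unicode equivalent, the replacement can be scripted. The sketch below is illustrative only: the mapping is hypothetical, because the meaning of a PUA code point depends entirely on the font in use. It assumes U+E001 is that font's checkmark glyph, and `replace_known_pua` is a made-up helper name.

```python
# Hypothetical mapping from one icon font's PUA code points to standard Unicode.
# The real mapping must come from the documentation of the font you use.
PUA_REPLACEMENTS = {
    "\ue001": "\u2713",  # assumed: the font's checkmark glyph -> CHECK MARK
}


def replace_known_pua(text: str, replacements: dict[str, str] = PUA_REPLACEMENTS) -> str:
    """Swap PUA characters that have a known standard equivalent."""
    return text.translate(str.maketrans(replacements))
```

PUA characters that are not in the mapping are left untouched, so they can still be found and reviewed by hand.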

Examples

Problematic: PUA character used directly in HTML

<p>Status: &#xE001;</p>

Without the specific icon font loaded, PUA characters like U+E001 render as missing glyphs or blank spaces.

Fixed: Using inline SVG with accessible label

<p>
  Status:
  <svg aria-hidden="true" width="16" height="16" viewBox="0 0 16 16">
    <path d="M6 10.8L2.5 7.3 1.1 8.7 6 13.6 14.9 4.7 13.5 3.3z"/>
  </svg>
  Complete
</p>

Problematic: Icon font via CSS content property

<style>
  .icon-check::before {
    font-family: "MyIcons";
    content: "\e001"; /* PUA character */
  }
</style>
<span class="icon-check"></span>

Fixed: Icon font with accessibility safeguards

If you must continue using an icon font, hide the PUA character from assistive technology and provide an accessible alternative:

<style>
  .icon-check::before {
    font-family: "MyIcons";
    content: "\e001";
  }
</style>
<span class="icon-check" aria-hidden="true"></span>
<span class="sr-only">Checkmark</span>

Note that this approach still triggers the validator warning if the PUA character is detectable in the markup. The most robust fix is to avoid PUA characters entirely.

Fixed: Using a standard Unicode character

<p>Status: ✓ Complete</p>

The character ✓ (U+2713, CHECK MARK) is a standard Unicode character that is universally understood and renders consistently across platforms.

Problematic: PUA character from copy-paste

Click here to download

Invisible or unexpected PUA characters sometimes hide in text pasted from external sources. Inspect your source code carefully—many code editors can highlight non-ASCII characters or reveal their code points.
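If your editor does not reveal code points, a short script can do the inspection. This is a minimal sketch, assuming the source is available as a Python string. `find_private_use` is an illustrative name. It relies on the standard library's `unicodedata.category`, which reports the general category `Co` for private-use code points.

```python
import unicodedata


def find_private_use(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for every private-use character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            # General category "Co" covers the BMP Private Use Area
            # and the two supplementary private-use planes.
            if unicodedata.category(char) == "Co":
                hits.append((line_no, col, f"U+{ord(char):04X}"))
    return hits
```

Lines and columns are 1-based, so the results can be matched against what an editor displays.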

Fixed: Cleaned-up text

Click here to download

If you've audited your document and determined that the PUA characters are intentional and rendering correctly in your target environments, you may choose to accept this warning. However, for publicly accessible web pages, replacing PUA characters with standard alternatives is always the safer and more accessible choice.
