HTML Validation

HTML Guides for character reference

Learn how to identify and fix common HTML validation errors flagged by the W3C Validator, so your pages are standards-compliant and render correctly across every browser.

A numeric character reference expanded to the C1 controls range.

The C1 controls range spans Unicode code points U+0080 through U+009F (decimal 128–159). These are control characters inherited from older encoding standards, and the HTML specification explicitly forbids numeric character references that resolve to them. When the W3C validator encounters a reference like &#151; (decimal 151, which is U+0097), it flags it because these code points are not valid content characters.

Why This Happens

This issue almost always stems from a confusion between the Windows-1252 (or CP-1252) character encoding and Unicode. Windows-1252 is a legacy encoding that repurposes the byte range 0x80–0x9F to store useful characters like curly quotes, em dashes, and the euro sign. In Unicode, however, those same code point positions are reserved as C1 control characters and carry no printable meaning.

When text originally encoded in Windows-1252 is converted to HTML numeric references byte by byte, without re-mapping each byte to its Unicode code point, you end up with references like &#148; instead of the correct &#8221; for a right double quotation mark. The byte value was meaningful in Windows-1252, but the Unicode code point with the same number is a control character.
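The mismatch can be seen in a few lines of Python. This is only an illustrative sketch using the standard library's cp1252 codec; the variable names are made up for the example.

```python
import unicodedata

# Byte 0x94 (decimal 148) means different things in the two systems.
raw = bytes([0x94])

# Decoded as Windows-1252, it is the right double quotation mark.
as_cp1252 = raw.decode("cp1252")
print(f"U+{ord(as_cp1252):04X}")        # U+201D

# Read naively as a Unicode code point, 0x94 is a C1 control character.
naive = chr(raw[0])
print(unicodedata.category(naive))      # Cc (control)

# The invalid reference versus the correct one.
print(f"&#{raw[0]};")                   # &#148;
print(f"&#{ord(as_cp1252)};")           # &#8221;
```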

Why It Matters

Standards compliance: The HTML specification treats these references as parse errors. Parsers that follow the HTML standard recover by remapping most of them to their Windows-1252 equivalents, but the markup is still invalid and only works because of that error recovery.
Portability: The Windows-1252 remapping is a compatibility behavior of HTML parsers, and it is not guaranteed across all user agents, platforms, or contexts. XML and XHTML parsers, for example, do not apply it, so the reference yields the raw control character instead of the intended one.
Accessibility: Screen readers and other assistive technologies may not interpret C1 control characters at all, resulting in missing or garbled content for users who rely on them.
Data integrity: If your HTML is processed by tools, APIs, or parsers that follow the spec strictly, these invalid references can cause failures or data loss.

How to Fix It

Use named character references where available; they are the most readable option (e.g., &mdash;, &rsquo;, &euro;).
Use correct Unicode code points in numeric references if a named reference isn’t available (e.g., &#8212; or &#x2014; for an em dash).
Use the literal UTF-8 character directly in your source file. If your document is saved as UTF-8 (which it should be), you can simply type —, ’, or € directly.
Audit legacy content that may have been migrated from older systems or databases using Windows-1252 encoding. A search for numeric references in the decimal range 128–159 (hexadecimal 80–9F) will find all instances.
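For a large amount of migrated content, the audit and fix can be scripted. The following is a minimal sketch, not a complete migration tool: the function name fix_c1_references is hypothetical, and it assumes the offending references really came from Windows-1252 text. Bytes that Windows-1252 leaves unassigned (such as 0x81) are left untouched for manual review.

```python
import re

# Matches decimal (&#151;) and hexadecimal (&#x97;) numeric references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def fix_c1_references(html_text):
    """Rewrite numeric references in 128-159 to the intended Unicode code point."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not 0x80 <= code <= 0x9F:
            return match.group(0)  # outside the C1 range: leave as-is
        try:
            # Interpret the number as a Windows-1252 byte.
            char = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # unassigned in Windows-1252: needs manual review
        return f"&#{ord(char)};"

    return NUMERIC_REF.sub(replace, html_text)


print(fix_c1_references("<p>Wait &#151; what?</p>"))  # <p>Wait &#8212; what?</p>
```

The sketch emits decimal numeric references; swapping in named references or literal characters would be an equally valid choice.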

Examples

Invalid: C1 control range references

These references resolve to C1 control code points, not the intended characters:

<p>Price: &#128;50</p>
<p>She said, &#147;Hello.&#148;</p>
<p>2020&#150;2024</p>
<p>Wait &#151; what?</p>
<p>It&#146;s a beautiful day.</p>

Fixed: Correct Unicode references

Replace each invalid reference with the proper Unicode code point or named reference:

<p>Price: &euro;50</p>
<p>She said, &ldquo;Hello.&rdquo;</p>
<p>2020&ndash;2024</p>
<p>Wait &mdash; what?</p>
<p>It&rsquo;s a beautiful day.</p>

Fixed: Using numeric Unicode code points

If you prefer numeric references, use the correct Unicode values:

<p>Price: &#x20AC;50</p>
<p>She said, &#8220;Hello.&#8221;</p>
<p>2020&#8211;2024</p>
<p>Wait &#8212; what?</p>
<p>It&#8217;s a beautiful day.</p>

Fixed: Using literal UTF-8 characters

The simplest approach — just use the characters directly in a UTF-8 encoded document:

<p>Price: €50</p>
<p>She said, “Hello.”</p>
<p>2020–2024</p>
<p>Wait — what?</p>
<p>It’s a beautiful day.</p>

Character reference expands to a control character (U+0002).

What Are Control Characters?

Control characters occupy code points U+0000 through U+001F and U+007F through U+009F in Unicode. They were originally designed for controlling hardware devices (e.g., U+0002 is “Start of Text,” U+0007 is “Bell,” U+001B is “Escape”). These characters have no visual representation and carry no semantic meaning in a web document.

The HTML specification explicitly forbids character references that resolve to most control characters. Even though &#2; is a syntactically valid character reference, the character it points to is not a permissible content character. The W3C validator raises this error for references such as &#2; or its hexadecimal form &#x02;, and for any other reference that falls within the control character ranges.

Why This Is a Problem

Standards compliance: The WHATWG HTML Living Standard defines a specific set of “noncharacter” and “control character” code points that must not be referenced. Using them produces a parse error.
Unpredictable rendering: Browsers handle illegal control characters inconsistently. Some may silently discard them, others may render a replacement character (�), and others may exhibit unexpected behavior.
Accessibility: Screen readers and other assistive technologies may choke on or misinterpret control characters, degrading the experience for users who rely on these tools.
Data integrity: Control characters in your markup often indicate a copy-paste error, a corrupted data source, or a templating bug that inserts raw binary data into HTML output.

How to Fix It

Identify the offending reference: look for character references such as &#2; or &#x02; that point to control character code points.
Determine intent: figure out what character or content was actually intended. Often, a control character reference is the result of a bug in a data pipeline or template engine.
Remove or replace: either delete the reference entirely or replace it with the correct printable character or HTML entity.

Examples

Incorrect: Control character reference

This markup contains &#2;, which expands to the control character U+0002 (Start of Text) and triggers the validation error:

<p>Some text &#2; more text</p>

Incorrect: Hexadecimal form of a control character

The same problem occurs with the hexadecimal syntax:

<p>Data: &#x02;</p>

Correct: Remove the control character reference

If the control character was unintentional, simply remove it:

<p>Some text more text</p>

Correct: Use a valid character reference instead

If you intended to display a special character, use the correct printable code point or named entity. For example, to display a bullet (•), copyright sign (©), or ampersand (&):

<p>Item &#8226; Details</p>
<p>Copyright &#169; 2024</p>
<p>Tom &amp; Jerry</p>

Correct: Full document without control characters

<!DOCTYPE html>
<html lang="en">
<head>
  <title>Example Page</title>
</head>
<body>
  <p>This paragraph uses only valid character references: &amp; &lt; &gt; &#169;</p>
</body>
</html>

Common Control Character Code Points to Avoid

Reference	Code Point	Name
&#0;	U+0000	Null
&#1;	U+0001	Start of Heading
&#2;	U+0002	Start of Text
&#7;	U+0007	Bell
&#8;	U+0008	Backspace
&#11;	U+000B	Vertical Tab
&#12;	U+000C	Form Feed
&#127;	U+007F	Delete

If your content is generated dynamically (from a database, API, or user input), sanitize the data before inserting it into HTML to strip out control characters. Most server-side languages and templating engines provide utilities for this purpose.
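As one possible approach, here is a small Python sketch that strips control characters from a value and then escapes it for HTML. The function name to_safe_html is hypothetical, and keeping tab, line feed, and carriage return is an assumption of this example rather than a rule from the guide; adjust the character class to suit your data.

```python
import html
import re

# U+0000-U+001F and U+007F-U+009F, except tab (\t), line feed (\n),
# and carriage return (\r), which this sketch chooses to keep.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def to_safe_html(value):
    """Remove control characters, then escape &, <, > and quotes."""
    cleaned = CONTROL_CHARS.sub("", value)
    return html.escape(cleaned)


print(to_safe_html("Tom & Jerry\x02"))  # Tom &amp; Jerry
```

Stripping happens before escaping so that the output never contains either a raw control character or a bare ampersand.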

Character reference was not terminated by a semicolon.

Character references are how HTML represents special characters that would otherwise be interpreted as markup or that aren’t easily typed on a keyboard. They come in three forms:

Named references like &amp;, &lt;, &copy;
Decimal numeric references like &#60;, &#169;
Hexadecimal numeric references like &#x3C;, &#xA9;

All three forms share the same structure: they begin with & and must end with ;. When you omit the trailing semicolon, the HTML parser enters error recovery mode. Depending on the context, it may still resolve the reference (browsers are lenient), but this behavior is not guaranteed and varies across situations. For example, &copy without a semicolon might still render as ©, but &notit could be misinterpreted as the &not; (¬) reference followed by it, producing unexpected output like “¬it” instead of the literal text “&notit”.
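Python's standard html.unescape function is documented as following the HTML5 rules for both valid and invalid character references, so it can serve as a rough stand-in for how a parser treats text content. This is only an approximation for illustration; it does not model attribute values or a full browser parser.

```python
import html

# A missing semicolon is tolerated for some legacy names such as "copy".
print(html.unescape("&copy 2024"))  # © 2024

# "&not" is recognised as a prefix, so the rest of the word is left dangling.
print(html.unescape("&notit"))      # ¬it

# Escaping the ampersand keeps the text literal.
print(html.unescape("&amp;notit"))  # &notit
```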

Why this matters

Unpredictable rendering: Without the semicolon, browsers use heuristic error recovery that can produce different results depending on surrounding text. What looks fine today might break with different adjacent characters.
Standards compliance: The WHATWG HTML specification requires the semicolon terminator. Omitting it is a parse error.
Maintainability: Other developers (or future you) may not realize the ampersand was intended as a character reference, making the code harder to read and maintain.
Data integrity: In URLs within href attributes, a missing semicolon on a character reference can corrupt query parameters and produce broken links.

How to fix it

Add the missing semicolon to the end of every character reference.
If you meant a literal ampersand, use &amp; instead of a bare &. This is especially common in URLs with query strings.
Search your document for patterns like &something without a trailing ; to catch all instances.
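The search can be automated with a regular expression. The sketch below is an assumption-laden helper (the name find_suspect_ampersands is hypothetical): it reports every & that is not followed by a syntactically complete, semicolon-terminated reference. It checks syntax only and does not verify that a named reference actually exists.

```python
import re

# An ampersand NOT followed by name;  #digits;  or  #xhex;
AMP_NOT_REFERENCE = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def find_suspect_ampersands(html_text):
    """Return (line_number, snippet) pairs for each suspicious ampersand."""
    hits = []
    for line_no, line in enumerate(html_text.splitlines(), start=1):
        for match in AMP_NOT_REFERENCE.finditer(line):
            start = match.start()
            hits.append((line_no, line[start:start + 10]))
    return hits


sample = '<p>5 &lt 10</p>\n<p>&copy; 2024</p>\n<a href="?a=1&b=2">x</a>'
for line_no, snippet in find_suspect_ampersands(sample):
    print(line_no, snippet)
```

Each hit still needs a human decision: add the missing semicolon if a reference was intended, or write &amp; if a literal ampersand was meant.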

Examples

❌ Missing semicolon on named references

<p>5 &lt 10 and 10 &gt 5</p>
<p>&copy 2024 All rights reserved</p>

✅ Properly terminated named references

<p>5 &lt; 10 and 10 &gt; 5</p>
<p>&copy; 2024 All rights reserved</p>

❌ Missing semicolon on numeric references

<p>The letter A: &#65</p>
<p>Hex example: &#x41</p>

✅ Properly terminated numeric references

<p>The letter A: &#65;</p>
<p>Hex example: &#x41;</p>

❌ Bare ampersand in a URL (common mistake)

<a href="https://example.com/search?name=alice&age=30">Search</a>

Here the validator sees &age and tries to interpret it as a character reference without a semicolon.

✅ Escaped ampersand in a URL

<a href="https://example.com/search?name=alice&amp;age=30">Search</a>

❌ Ambiguous reference causing wrong output

<p>The entity &notit; doesn't exist, but &not without a semicolon resolves to ¬</p>

✅ Use &amp; when you want a literal ampersand

<p>The text &amp;notit is displayed literally when properly escaped.</p>

A quick rule of thumb: every & in your HTML should either be the start of a complete, semicolon-terminated character reference, or it should itself be written as &amp;.


Category: HTML Validation
Engine: W3C Validator
Total guides: 3
