HTML Character Encodings

Ensuring proper text rendering in web pages

About Character Encodings

Character encoding determines how bytes are mapped to characters in your HTML documents. Using the correct encoding is essential for displaying text properly across different languages and scripts.


Character Encoding Basics

What is Character Encoding?

Character encoding is a system that pairs each character in a character set with a number, and each number with a sequence of bytes, so that text can be stored and transmitted.

  • Defines how bytes map to characters
  • Essential for multilingual content
  • Affects text rendering and processing
  • Must be declared early in the HTML document
<!-- Modern HTML5 charset declaration -->
<meta charset="UTF-8">

<!-- Legacy HTML4 declaration -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<!-- XHTML declaration -->
<?xml version="1.0" encoding="UTF-8"?>
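
To make the byte-to-character mapping concrete, here is a minimal Python sketch. Python is used only for illustration and the sample character is arbitrary. It encodes the same character under two encodings and decodes each result back.

```python
# Illustration only: the same character becomes different bytes
# depending on which encoding is used to store it.
text = "é"

utf8_bytes = text.encode("utf-8")      # two bytes: C3 A9
latin1_bytes = text.encode("latin-1")  # one byte: E9

print(utf8_bytes.hex(), latin1_bytes.hex())

# Decoding with the matching encoding recovers the original character.
roundtrip_utf8 = utf8_bytes.decode("utf-8")
roundtrip_latin1 = latin1_bytes.decode("latin-1")
print(roundtrip_utf8 == text, roundtrip_latin1 == text)
```

The declaration in the HTML document tells the browser which of these mappings to apply when it reads the file's bytes.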

Why UTF-8 is Recommended

UTF-8 has become the dominant character encoding for the web because:

  • Supports all Unicode characters
  • Backward compatible with ASCII
  • Efficient for English and Western languages
  • Variable-width (1-4 bytes per character)
  • Supported by all modern browsers and devices
  • Required by the HTML Standard for new documents
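
The variable-width and ASCII-compatibility points above can be checked with a short Python sketch. The sample characters are arbitrary choices, one for each byte width.

```python
# Count how many bytes UTF-8 needs for characters from different ranges.
samples = ["A", "é", "€", "😀"]
widths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(widths)

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8,
# which is what "backward compatible with ASCII" means in practice.
ascii_compatible = "Hello".encode("ascii") == "Hello".encode("utf-8")
print(ascii_compatible)
```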

Did you know? As of 2023, UTF-8 is used by about 98% of all websites whose character encoding is known.

Common Character Encodings

| Encoding | Declaration | Description | Usage | Languages |
|---|---|---|---|---|
| UTF-8 | <meta charset="UTF-8"> | Unicode Transformation Format (8-bit), supports all Unicode characters | 98% of all websites | All modern languages |
| ISO-8859-1 | <meta charset="ISO-8859-1"> | Latin-1 Western European, limited to 256 characters | Legacy systems | Western European |
| Windows-1252 | <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"> | Western European extension of ISO-8859-1 | Older Windows systems | Western European |
| Shift_JIS | <meta charset="Shift_JIS"> | Japanese character encoding | Japanese websites | Japanese |
| EUC-JP | <meta charset="EUC-JP"> | Extended Unix Code for Japanese | Japanese Unix systems | Japanese |
| GB2312 | <meta charset="GB2312"> | Simplified Chinese encoding | Chinese websites | Simplified Chinese |
| Big5 | <meta charset="Big5"> | Traditional Chinese encoding | Taiwan and Hong Kong | Traditional Chinese |
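
As a rough illustration of how the encodings in the table differ, the Python sketch below decodes one byte under two Western European encodings and measures one Japanese string under three encodings. The codec names are Python's own spellings, not the HTML labels.

```python
# Byte 0x80 is where Windows-1252 "extends" ISO-8859-1:
# it is the euro sign in one and an invisible control code in the other.
byte = b"\x80"
in_cp1252 = byte.decode("cp1252")
in_latin1 = byte.decode("latin-1")
print(repr(in_cp1252), repr(in_latin1))

# The same Japanese text takes a different number of bytes per encoding.
japanese = "日本語"
sizes = {name: len(japanese.encode(name))
         for name in ("utf-8", "shift_jis", "euc_jp")}
print(sizes)
```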

How to Declare Character Encoding

Proper Declaration Methods

The character encoding should be declared as early as possible in your HTML document:

  1. Must appear within the first 1024 bytes
  2. Should be in the <head> section
  3. Preferably the first element after <head>
  4. Only one encoding declaration per document
<!DOCTYPE html>
<html>
<head>
  <!-- Best practice: HTML5 charset declaration -->
  <meta charset="UTF-8">
  <title>Page Title</title>
  
  <!-- Other meta tags and links -->
</head>
<body>
  <!-- Content -->
</body>
</html>
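
The 1024-byte rule can be checked with a small script. The following is a minimal sketch, not the browser's actual prescan algorithm: `find_meta_charset` is a hypothetical helper, and its regular expression only approximates how a browser recognizes a declaration.

```python
import re

# Hypothetical helper: a rough approximation of the browser's prescan.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE
)

def find_meta_charset(document: bytes, limit: int = 1024):
    """Return the charset declared within the first `limit` bytes, or None."""
    match = META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii") if match else None

# Declaration right after <head>: found.
early = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>t</title>'

# Declaration pushed past the limit by a long comment: missed.
late = (b'<!DOCTYPE html><html><head><!--' + b' ' * 2000
        + b'--><meta charset="UTF-8">')

print(find_meta_charset(early))
print(find_meta_charset(late))
```

Running a check like this over your templates can catch declarations that have drifted too far down the <head>.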

HTTP Headers and Encoding

The HTTP Content-Type header can also specify the character encoding. When present, it takes precedence over in-document declarations:

  • Content-Type: text/html; charset=UTF-8
  • Must match the document's actual encoding
  • Useful for non-HTML files (CSS, JS, etc.)
  • Can be set in server configuration
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Date: Sun, 01 Jan 2023 12:00:00 GMT
Server: Apache

<!DOCTYPE html>
<html>
...
</html>
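
The precedence described above can be sketched in Python. `charset_from_header` and `effective_encoding` are hypothetical helper names, and the sketch deliberately ignores other inputs a browser considers, such as a byte order mark.

```python
from email.message import Message

def charset_from_header(content_type: str):
    """Extract the charset parameter from a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None when absent

def effective_encoding(header_value, meta_charset):
    """Prefer the HTTP header's charset; fall back to the <meta> charset."""
    header_charset = charset_from_header(header_value) if header_value else None
    if header_charset:
        return header_charset
    return meta_charset.lower() if meta_charset else None

# The header wins even when the document declares something else.
print(effective_encoding("text/html; charset=UTF-8", "ISO-8859-1"))
# Without a charset in the header, the in-document declaration is used.
print(effective_encoding("text/html", "ISO-8859-1"))
```

This is why a mismatched server configuration can override a correct <meta charset> tag.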

Troubleshooting Encoding Issues

Common Encoding Problems and Solutions

1. Gibberish or Question Marks

Symptoms: Text appears as random characters or question marks (���)

Solution: Ensure the declared encoding matches the actual file encoding. Save files as UTF-8 and declare <meta charset="UTF-8">.
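
A minimal Python sketch of how this mismatch produces the replacement character: the sample word is an arbitrary choice, saved as Latin-1 but read as UTF-8.

```python
# A file saved in a legacy encoding...
legacy_bytes = "café".encode("latin-1")   # the é is the single byte E9

# ...but declared (and therefore decoded) as UTF-8.
# E9 is not valid UTF-8 here, so it is shown as U+FFFD.
shown = legacy_bytes.decode("utf-8", errors="replace")
print(shown)

# Decoding with the encoding the file was actually saved in works.
correct = legacy_bytes.decode("latin-1")
print(correct)
```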

2. Mixed Encoding in Same Page

Symptoms: Some text renders correctly while other parts don't

Solution: Ensure all external resources (CSS, JS) are also UTF-8 encoded. Check database connections if content is dynamic.

3. Double Encoding

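Symptoms: Characters appear with extra symbols (for example, Ã© instead of é)

Solution: The text has been encoded more than once: UTF-8 bytes were read as a single-byte encoding such as Windows-1252 and then saved as UTF-8 again. Ensure your server isn't applying additional encoding transformations.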
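
The effect can be reproduced, and in simple cases reversed, with a few lines of Python. This is a sketch assuming the wrong intermediate encoding was Windows-1252 (`cp1252` in Python); real data may have passed through a different one.

```python
original = "é"

# Step 1: correct UTF-8 bytes are misread as Windows-1252.
misread = original.encode("utf-8").decode("cp1252")
print(misread)

# Step 2: the misread text is saved as UTF-8 again (double encoding).
double_encoded = misread.encode("utf-8")
print(double_encoded.hex())

# Repair: undo the steps in reverse order.
repaired = double_encoded.decode("utf-8").encode("cp1252").decode("utf-8")
print(repaired)
```

The repair only works if no bytes were lost along the way, so fixing the source of the double encoding is preferable to patching the output.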

4. BOM (Byte Order Mark) Issues

Symptoms: Strange characters at start of file (ï»¿) or layout issues

Solution: Save files without BOM or ensure your server handles BOM correctly.
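
The following Python sketch shows where the stray characters come from and one way to remove them. `strip_bom` is a hypothetical helper name, and the sample content is made up for illustration.

```python
import codecs

# A UTF-8 file saved with a byte order mark starts with EF BB BF.
data = codecs.BOM_UTF8 + "<!DOCTYPE html>".encode("utf-8")

# Read as Windows-1252, those three bytes become visible characters.
visible = data[:3].decode("cp1252")
print(visible)

# Python's "utf-8-sig" codec drops the BOM while decoding.
clean_text = data.decode("utf-8-sig")
print(clean_text)

def strip_bom(raw: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw
```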

Encoding Best Practices

Do's:

  • Always use UTF-8 for new projects
  • Declare encoding early in the document
  • Ensure your editor saves files in UTF-8
  • Set encoding in HTTP headers when possible
  • Test with multilingual content
  • Check database connection encodings

Don'ts:

  • Don't rely on default encodings
  • Avoid legacy encodings unless necessary
  • Don't mix encodings in the same document
  • Avoid BOM in UTF-8 for web content
  • Don't forget to check external resources
  • Avoid unnecessary server-side encoding conversions
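
Several of these checks can be automated. The sketch below is one possible approach, not a complete linter: `check_utf8` is a hypothetical helper that takes a file's raw bytes and reports two of the problems listed above.

```python
import codecs

def check_utf8(raw: bytes) -> list:
    """Return a list of encoding problems found in the given bytes."""
    problems = []
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 BOM")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"invalid UTF-8 at byte offset {exc.start}")
    return problems

print(check_utf8("héllo".encode("utf-8")))    # valid UTF-8, no BOM
print(check_utf8("café".encode("latin-1")))   # saved in a legacy encoding
print(check_utf8(codecs.BOM_UTF8 + b"<p>hi</p>"))
```

To check a real file, pass it the result of reading the file in binary mode, for example `open(path, "rb").read()`.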

Try Our Encoding Tester

Experiment with different character encodings using the sample page below:

  • Test multilingual text rendering
  • Try different charset declarations
  • See how encoding affects special characters
  • Experiment with encoding-related issues
<!DOCTYPE html>
<!-- Try changing the charset to see effects -->
<html>
<head>
  <meta charset="ISO-8859-1">
  <title>Encoding Test</title>
</head>
<body>
  <p>English: Hello</p>
  <p>French: Bonjour</p>
  <p>Chinese: 你好</p>
  <p>Russian: Привет</p>
  <p>Special: © € π</p>
</body>
</html>
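
If the page above is saved as UTF-8 but declares ISO-8859-1, the non-ASCII lines are misread. The Python sketch below approximates that outcome using the `latin-1` codec. Browsers actually treat the ISO-8859-1 label as Windows-1252, so the exact garbled characters on screen may differ slightly.

```python
# Text from the sample page, saved to disk as UTF-8.
page_text = "你好 Привет © € π"
saved_bytes = page_text.encode("utf-8")

# The declared charset tells the reader to treat every byte as one character.
garbled = saved_bytes.decode("latin-1")
print(garbled != page_text)
print(len(page_text), len(garbled))

# ASCII-only lines survive, because both encodings agree on ASCII bytes.
ascii_part = "Hello Bonjour".encode("utf-8").decode("latin-1")
print(ascii_part)
```

Changing the declaration to UTF-8 makes the declared and actual encodings match, and all five lines render correctly.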