HTML Character sets

Many languages are in use throughout the world, and displaying them properly for users anywhere requires the right character encoding.

Computer systems identify each character by its code position in a character set. The ASCII character set defines only 128 positions, which is not sufficient for a global information system such as the Web, so HTML uses a far more complete character set called the Universal Character Set (UCS), which is defined in ISO/IEC 10646 and corresponds to Unicode.
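The idea of a code position can be illustrated with a short Python sketch. The helper name code_point is chosen here for illustration and is not part of any standard; the sample characters are written as escapes so the listing itself stays ASCII-only.

```python
def code_point(ch: str) -> str:
    """Return the code position of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# Sample characters: Latin A, e with acute accent, Cyrillic ya, a Japanese kanji.
for ch in ["A", "\u00e9", "\u044f", "\u65e5"]:
    # ASCII defines only code positions 0 to 127.
    where = "inside ASCII" if ord(ch) < 128 else "outside ASCII"
    print(code_point(ch), where)
```

Only the first character falls inside ASCII; the others need the larger Universal Character Set.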

For web documents, however, the commonly used character encodings are ISO-8859-1 (also referred to as "Latin-1"; usable for most Western European languages), ISO-8859-5 (which supports Cyrillic), Shift_JIS (a Japanese encoding), EUC-JP (another Japanese encoding), and UTF-8 (an encoding of ISO 10646 that uses one to four bytes per character, depending on the character).
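As a rough illustration of how these encodings differ in size, the following Python sketch counts the bytes each one produces. The helper byte_length is a hypothetical name introduced for this example; the codec names are the ones Python's standard library uses.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


# UTF-8 uses between one and four bytes, depending on the character.
for ch in ["A", "\u00e9", "\u65e5", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {byte_length(ch, 'utf-8')} byte(s) in UTF-8")

# The same text in a regional encoding and in UTF-8.
samples = [
    ("caf\u00e9", "iso-8859-1"),           # Western European text
    ("\u043c\u0438\u0440", "iso-8859-5"),  # Cyrillic text
    ("\u65e5\u672c", "shift_jis"),         # Japanese text
]
for text, legacy in samples:
    print(legacy, byte_length(text, legacy), "utf-8", byte_length(text, "utf-8"))
```

The regional encodings are more compact for their own script, but each one only works for the languages it was designed for.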

Each of the ISO-8859 character sets covers only a limited group of languages, whereas Unicode is designed to cover the characters and symbols of all the world's writing systems. For that reason, UTF-8 (Unicode Transformation Format, 8-bit) is the best choice for web coders.
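The coverage difference can be checked with a small Python sketch. The function can_encode and the sample string mixed are names made up for this example; the string combines Western European, Cyrillic, and Japanese text in one value.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# One string that mixes three scripts.
mixed = "caf\u00e9 \u043c\u0438\u0440 \u65e5\u672c"

for encoding in ["iso-8859-1", "iso-8859-5", "utf-8"]:
    print(encoding, can_encode(mixed, encoding))

# UTF-8 also returns the original text unchanged after decoding.
assert mixed.encode("utf-8").decode("utf-8") == mixed
```

Neither ISO-8859 encoding can hold the whole string, while UTF-8 stores and restores it without loss.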

To declare UTF-8 as the encoding of a page, we generally use the <meta> tag in the head section.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
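To show how a program reading the page might use this declaration, here is a minimal Python sketch built on the standard html.parser module. The names CharsetFinder and declared_charset are hypothetical, and the sketch only handles the http-equiv form shown above; real browsers follow a more detailed detection procedure.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Records the charset from the first <meta http-equiv="Content-Type"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in (attr.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset":
                self.charset = value.strip().lower()


def declared_charset(raw: bytes) -> Optional[str]:
    """Return the charset declared in the page bytes, or None if absent."""
    finder = CharsetFinder()
    # The tag itself is plain ASCII, so a lossy ASCII decode is enough to find it.
    finder.feed(raw.decode("ascii", errors="replace"))
    return finder.charset


page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=utf-8" /></head>'
    "<body>caf\u00e9</body></html>"
).encode("utf-8")

charset = declared_charset(page)
print(charset)
print(page.decode(charset))
```

The declared name is then used to decode the rest of the page, which is why the declaration has to match the encoding the file was actually saved in.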