First, HTML entities
1. What is an HTML entity?
In HTML, some characters are reserved. The less-than (<) and greater-than (>) signs cannot be written literally in text, because the browser may mistake them for tag delimiters.
If you want reserved characters to display correctly, you must use character entities (HTML entities) in the HTML source code.
2. Character entity syntax
&entity_name; or &#entity_number;
Tips:
The advantage of using entity names instead of numbers is that names are easy to remember.
However, browsers may not support every entity name, whereas support for entity numbers is good.
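As a minimal sketch of the two entity forms, the following Python snippet uses the standard library `html` module (an assumption of this example; the original notes do not use Python). It escapes reserved characters and then checks that the named, decimal, and hexadecimal forms of the less-than sign decode to the same character. The sample string `raw` is made up for illustration.

```python
import html

# Escape reserved characters so they display as text instead of being parsed as markup.
raw = 'if a < b && b > c: print("ok")'
escaped = html.escape(raw)  # quote=True by default, so double quotes are escaped too
print(escaped)

# The entity name and the entity numbers refer to the same character.
by_name = html.unescape("&lt;")
by_decimal = html.unescape("&#60;")
by_hex = html.unescape("&#x3C;")
print(by_name, by_decimal, by_hex)
```

Either form can be used in the page source; the name is easier to remember, and the number is the safer fallback when support for a name is in doubt.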
3. Non-breaking space
The most commonly used character entity in HTML is the non-breaking space (&nbsp;).
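To make the non-breaking space concrete, here is a small Python sketch (using the standard `html` module, which is an assumption of this example rather than something from the original notes). It shows that `&nbsp;` decodes to code point 160, which is a different character from the ordinary space at code point 32.

```python
import html

nbsp = html.unescape("&nbsp;")       # entity name
nbsp_num = html.unescape("&#160;")   # entity number for the same character
regular_space = " "

print(ord(nbsp), ord(regular_space))
print(nbsp == nbsp_num, nbsp == regular_space)

# A label whose two parts should stay on the same line when rendered.
label = html.unescape("10&nbsp;km")
print(repr(label))
```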
4. Useful character entities in HTML
Full list: http://www.w3school.com.cn/html/html_entities.asp
Second, HTML character sets
For an HTML page to display correctly, the browser must know which character set to use.
1. The character set used in the early days of the World Wide Web was ASCII. ASCII supports the digits 0-9, uppercase and lowercase English letters, and some special characters.
Because many national characters are not part of ASCII, browsers following HTML 4 defaulted to ISO-8859-1; HTML5 pages are expected to use UTF-8.
If the page uses a character set other than the default, it should be declared in the <meta> tag.
2. ISO character sets
The ISO character sets are the standard character sets defined by the International Organization for Standardization (ISO) for different alphabets/languages.
3. Unicode standard
Because the character sets listed above have limited capacity and are incompatible with one another in multilingual environments, the Unicode Consortium developed the Unicode standard.
The Unicode standard aims to cover almost all characters, punctuation, and symbols in the world. Regardless of platform, program, or language, Unicode makes it possible to process, store, and exchange text data.
Unicode can be implemented by different character encodings. The most common ones are UTF-8 and UTF-16.
A character in UTF-8 can be 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard. UTF-8 is backwards compatible with ASCII.
UTF-8 is a common encoding for Web pages and e-mail.
Note: All HTML 4 processors support UTF-8, and all XHTML and XML processors support both UTF-8 and UTF-16.
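The "1 to 4 bytes" claim and the ASCII compatibility of UTF-8 can be checked with a short Python sketch. The sample characters below are chosen only for illustration (a Latin letter, an accented letter, a CJK character, and an emoji) and are not from the original notes.

```python
# One sample character for each UTF-8 length, written as escapes to avoid encoding surprises.
samples = {
    "A": "A",                  # U+0041, plain ASCII
    "e-acute": "\u00e9",       # U+00E9
    "zhong": "\u4e2d",         # U+4E2D
    "grin": "\U0001F600",      # U+1F600
}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
print(utf8_lengths)

# Backwards compatibility: pure ASCII text yields identical bytes in both encodings.
ascii_text = "HTML"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```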
Third, HTML ASCII
HTML and XHTML transmit data over the network with standard 7-bit ASCII code.
A 7-bit ASCII code can provide 128 different character values.
Fourth, HTML ISO-8859-1
HTML 4.01 supports the ISO 8859-1 character set.
The lower part of ISO 8859-1 (codes 0 to 127) is the original 7-bit ASCII.
The characters in the higher part of ISO 8859-1 (codes 160 to 255) all have entity names.
Most of these symbols can be used without an entity reference, but the entity name or entity number provides a way to write symbols that are not easy to type on a keyboard.
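The two halves of ISO 8859-1 can be inspected with a brief Python sketch. It relies on the standard library table `html.entities.codepoint2name`, which maps code points to HTML 4.01 entity names; using Python here is an assumption of this example, not part of the original notes.

```python
from html.entities import codepoint2name

# Lower part: bytes 0-127 decode to the same text in ASCII and ISO 8859-1.
lower = bytes(range(128))
lower_matches_ascii = lower.decode("iso-8859-1") == lower.decode("ascii")
print(lower_matches_ascii)

# Higher part: collect the entity name for each code from 160 to 255.
named = {cp: codepoint2name[cp] for cp in range(160, 256) if cp in codepoint2name}
print(len(named))
print(named[160], named[169], named[233])

# A single ISO 8859-1 byte maps directly to the code point with the same number.
e_acute = b"\xe9".decode("iso-8859-1")
print(ord(e_acute))
```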
Fifth, HTML 4.01 symbol entities
These include mathematical symbols, Greek letters, various arrows, technical symbols, and shapes.
Sixth, HTML URL encoding
The URL-encoded form represents characters in hexadecimal format.
The hexadecimal format is used to represent non-standard letters and characters in browsers and plug-ins.
URL encoding converts characters into a format that can be transmitted over the Internet.
URL: Uniform Resource Locator
A web browser requests pages from a web server by URL.
URL encoding
URLs can only be sent over the Internet using the ASCII character set.
Because URLs often contain characters outside the ASCII set, they must be converted to a valid ASCII format.
URL encoding replaces non-ASCII (and otherwise unsafe) characters with a "%" followed by two hexadecimal digits.
URLs cannot contain spaces. URL encoding normally replaces a space with a plus sign (+) or with %20.
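The rules above can be sketched with Python's standard `urllib.parse` module (an assumption of this example; the original notes name no particular tool). The sample string `query` is made up for illustration. Note that `quote` encodes as UTF-8 by default, so a single non-ASCII character may become more than one "%XX" group.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

query = "caf\u00e9 menu"  # contains one non-ASCII character and one space

percent_form = quote(query)    # space becomes %20
plus_form = quote_plus(query)  # space becomes +, as in HTML form submissions
print(percent_form)
print(plus_form)

# Decoding reverses the conversion; use the function that matches the encoder.
round_trip_percent = unquote(percent_form)
round_trip_plus = unquote_plus(plus_form)
print(round_trip_percent == query, round_trip_plus == query)
```

The "+" convention applies to query strings and form data, while "%20" is valid anywhere in a URL, so `quote` is the safer choice for path segments.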
Resources:
http://www.oschina.net/translate/what-every-web-developer-must-know-about-url-encoding#Thereservedcharactersarenotwhatyouthinktheyare
http://www.w3schools.com/html/html_entities.asp
http://www.w3school.com.cn/tags/html_ref_language_codes.asp
http://www.w3school.com.cn/html/html_entities.asp
http://en.wikipedia.org/wiki/Percent-encoding
http://blog.csdn.net/wusuopubupt/article/details/8817826
http://blog.163.com/chenzhenhua_007/blog/static/12849264920108119449881/
http://www.qianxingzhem.com/post-1989.html
http://unicode-table.com/en/#cherokee
Summary: I now have a preliminary understanding of the basic background of HTML and its standards, but more in-depth study is still needed.
Topics covered: character sets in HTML, ASCII, ISO-8859-1, the relationship between symbols and entities, and HTML URL encoding.