Today I needed to read directories and files on Windows. Fortunately, I had already done the groundwork for this (see "Under Linux and Windows Traversal directory method and how to achieve a consistent operation"), which encapsulates the Windows and Linux read
Basics of UTF-8 Character Set
Brief character set history
Among all character sets, the best known is probably the 7-bit ASCII character set. ASCII is short for American Standard Code for Information Interchange. It is designed for American
The principle is very simple: in GB2312/GBK a Chinese character takes two bytes, each with its own value range, while in UTF-8 a Chinese character takes three bytes, each also with its own value range. English characters have values below 128 and take
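The byte counts described above can be checked with a minimal Python sketch. The helper name `byte_lengths` is hypothetical and only for illustration; the three-byte figure holds for common Chinese characters in the Basic Multilingual Plane, which is an assumption about the text being processed.

```python
# Illustrative only: compare how many bytes GBK and UTF-8 use per character.
def byte_lengths(ch: str) -> dict:
    """Return the encoded length of one character in GBK and in UTF-8."""
    return {name: len(ch.encode(name)) for name in ("gbk", "utf-8")}

for ch in ("A", "中"):
    # Show the lengths and the raw bytes in hexadecimal for each encoding.
    print(repr(ch), byte_lengths(ch),
          ch.encode("gbk").hex(), ch.encode("utf-8").hex())
```

An ASCII letter such as "A" is a single byte below 0x80 in both encodings, which is why byte ranges alone are enough to tell the cases apart.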
Network Working Group D. Crockford
Request for Comments: 4627 JSON.org
Category: Informational July 2006
The application/json Media Type for JavaScript Object Notation (JSON)
Status of This Memo
This memo provides information for the Internet community. It
Unicode in JavaScript
by Jinya
"Reprint please indicate the source, Http://blog.csdn.net/EI__Nino"
Noun Explanation:
BMP: (Basic Multilingual Plane) It is also referred to as "0th plane", Plane 0
UCS: Universal Character Set (Universal Character set,
In the previous blog post, we described how to output bmpstring content in the console interface of Windows, but the methods there are not available under Linux. If you put the sample code there to execute under Linux, the output is garbled. The
In the previous blog post, we described how to output bmpstring content in the console interface of Windows, but the method there does not apply to Linux. Suppose you put the demo sample code there and run it under Linux. The result of the output is
Characters are the general name for all kinds of letters and symbols, including the characters of national scripts, punctuation marks, graphic symbols, numbers and so on. A character set is a collection of characters. There are many character sets, and each
The relationship between Unicode and UTF can be remembered simply: Unicode is a coding standard (and the organization behind it), and in Java the term refers to UTF-16; UTF is the Unicode Transformation Format, so as to be
The file_get_contents function in PHP5 obtains the file content, which is actually read in binary format. Therefore, when you use file_get_contents to obtain a UTF-8 file with BOM, it does not remove the BOM of the UTF-8, and some unexpected results
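The same situation can be sketched in Python rather than PHP: a binary read returns the BOM bytes as-is, so the caller has to drop them. This is a minimal illustration, not the PHP fix from the original article; the helper name `read_text_without_bom` and the temporary demo file are assumptions.

```python
import codecs
import os
import tempfile

def read_text_without_bom(path: str) -> str:
    """Read raw bytes (as a binary read would) and drop a leading UTF-8 BOM."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):      # the three bytes EF BB BF
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")

# Demo: write a temporary file that starts with a BOM, then read it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF8 + "hello".encode("utf-8"))
demo_path = tmp.name
text = read_text_without_bom(demo_path)
os.remove(demo_path)
print(repr(text))
```

In Python, opening the file with `encoding="utf-8-sig"` achieves the same result without manual slicing.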
Java: outputting a UTF-8 encoded file with a BOM. When a CSV file is output from an HTTP response with the encoding set to UTF-8, the BOM is not included by default. But Excel on Windows uses the BOM to recognize UTF-8 encoding, so the BOM needs to be written to the
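As a rough illustration of the idea (in Python, not the Java code of the original article), the CSV payload can be encoded with a leading BOM before it is sent. The helper name `csv_bytes_with_bom` and the sample rows are hypothetical.

```python
import csv
import io

def csv_bytes_with_bom(rows) -> bytes:
    """Serialize rows as CSV and encode as UTF-8 with a leading BOM."""
    buf = io.StringIO(newline="")          # let the csv module control line endings
    csv.writer(buf).writerows(rows)
    # The "utf-8-sig" codec prepends the BOM bytes EF BB BF when encoding.
    return buf.getvalue().encode("utf-8-sig")

payload = csv_bytes_with_bom([["name", "city"], ["张三", "北京"]])
print(payload[:3])
```

The resulting bytes would then be written to the response body unchanged; how that is done depends on the web framework in use.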
For Windows Notepad:
ANSI: GB2312; Java should use GBK to decode
Unicode: signed UTF-16LE; Java should use UTF-16 decoding
Unicode big endian: signed UTF-16BE; Java should use UTF-16 decoding
UTF-8: signed UTF-8; Java can only manually remove the signature
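The Notepad formats above can be told apart by their signature bytes. The following Python sketch assumes that unsigned input is "ANSI" text in GBK, which is only true on a Simplified Chinese Windows system; the function name `decode_notepad_bytes` is hypothetical.

```python
import codecs

# Signatures (BOMs) of the three signed formats, paired with the codec
# used to decode the bytes that follow the signature.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_notepad_bytes(data: bytes, ansi_codec: str = "gbk") -> str:
    """Decode file bytes by signature, falling back to the ANSI code page."""
    for bom, codec in _SIGNATURES:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    # No signature found: assume ANSI (GBK here, by assumption).
    return data.decode(ansi_codec)

print(decode_notepad_bytes(codecs.BOM_UTF16_LE + "中文".encode("utf-16-le")))
```

Unsigned UTF-8 files would be misread by the fallback branch, so this sketch only covers the four cases listed.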
Document directory
ASCII
GB2312
GBK
UTF Encoding
Setlocale
Miserable programmer
ASCII
Languages represented: English and Western European languages.
ASCII is represented by 7 bits, which can represent 128 characters. Its Extension uses 8
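The 7-bit limit can be illustrated with a short Python sketch: every ASCII code point is below 128 and fits in seven binary digits. The helper name `is_ascii` is hypothetical.

```python
def is_ascii(text: str) -> bool:
    """True if every character has a code point in the 7-bit range 0..127."""
    return all(ord(ch) < 128 for ch in text)

print(is_ascii("hello"), is_ascii("héllo"))
# Show the code point of "A" as a 7-digit binary number.
print(format(ord("A"), "07b"))
```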
The following table defines the available code page identifiers.
Note: ANSI code pages can be different on different computers, or can be changed for a single computer, leading to data corruption. For the most consistent results, applications
Some problems are knowledge problems. For example, in our programs we often use char szstr[32], strcpy(), sprintf(). When different encodings are involved, some situations may go beyond what we imagine. This is because we ignore some common sense. For
Character Set
ASCII character set: the American Standard Code for Information Interchange, which corresponds to the international standard ISO/IEC 646. ASCII is stored in seven bits (7-bit, 0-127) and is a single-byte encoding system. In hexadecimal the range is 0-7F. For example,
Character encoding problems have been giving me a real headache lately. Many encodings are "familiar yet unknown" to me: GB2312, ANSI, UTF-8, Unicode .... So I calmed down and studied them properly.
References:
Http://www.ruanyifeng.com/blog/2007/10
Unicode in JavaScript
By Jinya
[For more information, see http://blog.csdn.net/ei1_nino]
Glossary:
BMP: (Basic Multilingual Plane) It is also referred to as "Zero plane", Plane 0
UCS: Universal Character
C++ Study Notes (summary of some new features 1)
Although I have been using C++ for more than a year, I have always settled for just being able to use it. I have not thoroughly studied C++