Encoding
A string is a data type, but strings are special in that they raise an encoding problem. Because a computer can only process numbers, text must be converted to numbers before it can be processed. The earliest computers were designed with 8 bits to a byte, so the largest integer a single byte can represent is 255 (binary 11111111 = decimal 255); to represent a larger integer, you must use more bytes. For example, two bytes can represent the larg
The first thing to understand is that in Python (2.x), a string object and a Unicode object are two different types. A string object is a sequence of characters, and a Unicode object is a sequence of Unicode code units. The characters in a string can be encoded in many ways, such as single-byte ASCII, double-byte GB2312, or UTF-8. Obviously to interpret
Brief introduction
If you write programs for users in non-English-speaking regions such as China, Japan, Eastern Europe, and the Middle East, you must be familiar with the Unicode character set. This is especially true if you write such programs with Visual C++/MFC: if you want your application to reach a wider audience, you must consider Unicode compatibility, wh
Python represents strings internally as Unicode, so converting between encodings usually goes through Unicode as the intermediate form: first decode the source-encoded string into Unicode (decode), then encode from Unicode into the target encoding (encode). The default encoding of the string
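The decode-then-encode route can be sketched in Python 3 as follows. This is a minimal illustration, assuming the source bytes are GB2312 and the target is UTF-8; both encodings and the sample text are chosen here only for the example.

```python
# Bytes as they might arrive from a GB2312-encoded source (assumed input).
gb_bytes = "\u6c49\u5b57".encode("gb2312")

# Step 1: decode the source encoding into a Unicode string.
text = gb_bytes.decode("gb2312")

# Step 2: encode the Unicode string into the target encoding.
utf8_bytes = text.encode("utf-8")

# Two bytes per character in GB2312, three per character in UTF-8.
print(len(gb_bytes), len(utf8_bytes))  # prints: 4 6
```

The intermediate value `text` is the Unicode form; no direct GB2312-to-UTF-8 step is involved.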
Today, using Unicode for strings is common sense, but it is still a headache for some programming languages with a long history. Without a third-party library, C++ does not really support Unicode effectively, not even UTF-8. (Note: this article discusses the encoding of strings in memory, not in files or network traffic.) When the STL's string template was born,
Unicode is commonly used in its UCS-2 form, which encodes each character in two bytes. For example, the Chinese character "经" is encoded as 0x7ECF, which is 32463 in decimal. Because UCS-2 uses two bytes per character and 2^16 equals 65536, UCS-2 can encode at most 65,536 characters. Code points 0 to 127 are the same characters as in ASCII; for example, the Unicode encoding of the letter "a" is 0x006
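The numbers in this paragraph can be checked with a short Python 3 sketch. It relies on the fact that, for characters in the two-byte range, big-endian UTF-16 produces the same bytes as UCS-2; the variable names are illustrative only.

```python
# The character whose code point is 0x7ECF.
ch = "\u7ecf"

code_point = ord(ch)
print(hex(code_point), code_point)  # prints: 0x7ecf 32463

# For two-byte-range characters, UTF-16 (big-endian) matches UCS-2.
ucs2_bytes = ch.encode("utf-16-be")
print(ucs2_bytes.hex())  # prints: 7ecf

# Two bytes give 2**16 distinct values.
capacity = 2 ** 16
print(capacity)  # prints: 65536

# Code points 0..127 encode to the same byte values in ASCII.
ascii_compatible = all(chr(i).encode("ascii")[0] == i for i in range(128))
```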
Unicode environment settings
When installing Visual Studio, you must select the Unicode option when choosing VC++ so that the relevant library files are copied to system32.
Unicode compilation settings: under C/C++, Preprocessor definitions, remove _MBCS and add _UNICODE, UNICODE. Set the entry point to wWinMainCRTStartup in proje
I searched online for the Unicode range of Chinese characters, and most sources give "U+4E00..U+9FA5". In fact this range is incomplete; it does not even include the basic fullwidth (Chinese) punctuation. Based on Unicode 5.0, the latest version at the time of writing, the ranges are sorted out as follows:
Note: In most application scenarios, sets (1) through (5) alone are enough as the basis for identifying CJK text.
1) Stand
UTF encoding
UTF-8 encodes UCS in 8-bit units. The encoding from UCS-2 to UTF-8 is as follows:

UCS-2 code (hexadecimal)    UTF-8 byte stream (binary)
0000-007F                   0xxxxxxx
0080-07FF                   110xxxxx 10xxxxxx
0800-FFFF                   1110xxxx 10xxxxxx 10xxxxxx

For example, the Unicode encoding of the character "汉" (Han) is 6C49. 6C49 lies in the range 0800-FFFF, so the 3-byte template must be used: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 6C49 is: 0110 110001 001001,
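The three templates can be applied by hand with a few bit operations. The following Python 3 sketch is one possible way to do it; the function name `ucs2_to_utf8` is hypothetical, and the sketch assumes a single code point in the range 0000-FFFF (it does not treat surrogate values specially).

```python
def ucs2_to_utf8(code_point: int) -> bytes:
    """Encode one code point in 0x0000-0xFFFF with the 1-, 2-, or 3-byte template."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("code point outside the two-byte range")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code_point >> 12),
        0x80 | ((code_point >> 6) & 0x3F),
        0x80 | (code_point & 0x3F),
    ])


encoded = ucs2_to_utf8(0x6C49)
print(encoded.hex())  # prints: e6b189
```

The result agrees with Python's built-in encoder, `"\u6c49".encode("utf-8")`.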
Unicode programming in VC++
Author: Han yaoxu
1. What is Unicode?
Start with ASCII. ASCII is an encoding standard for representing English characters. Each ASCII character occupies 1 byte, so ASCII encoding can represent at most 256 characters (00H-FFH). In fact, English does not need that many characters; generally only the first 128 (00H-7FH, the
Python's encoding problems have probably plagued everyone who writes Python code. Python 2 and Python 3 use different default encodings, so it is worth understanding the difference; otherwise you end up trying one answer after another from the internet, which wastes a lot of time. First of all, the Python 2.x str
s = "I'm not garbled"
s is a string that itself stores bytes. So what encoding are these bytes in? If this code is entered in the interpreter, then the encoding of s is the interpreter's encoding
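To see why the encoding of the stored bytes matters, here is a small Python 3 sketch. Python 3 keeps text (str) and bytes as separate types, unlike the Python 2 str described here, so the encoding step is explicit; the sample text and the two encodings are assumptions made for the illustration.

```python
# A short non-ASCII text, chosen only for illustration.
s = "\u4e2d\u6587"

# The same text becomes different bytes under different encodings.
as_utf8 = s.encode("utf-8")
as_gbk = s.encode("gbk")
print(as_utf8.hex(), as_gbk.hex())

# Decoding with the matching encoding restores the text.
restored = as_gbk.decode("gbk")

# Decoding with the wrong encoding produces garbled output
# (U+FFFD replacement characters when errors="replace" is used).
garbled = as_gbk.decode("utf-8", errors="replace")
print(restored == s, garbled == s)  # prints: True False
```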
Revisiting Unicode and UTF-8 encoding
It was only today, to be exact, that I realized UTF-8 encoding and Unicode encoding are not the same thing, which is a little embarrassing. The two are related; here are the differences:
The length of a UTF-8 character is not fixed; it may be 1, 2, or 3 bytes (for characters in the UCS-2 range)
A Unicode character is always 2 bytes (UCS-2)
UTF-8 can convert to and from
Introduction
In front-end development, Chinese text is often converted to Unicode escape format, that is, \uXXXX sequences such as \u4f60, so that it displays correctly in different environments. For example, the Unicode encoding of "你好啊" is "\u4f60\u597d\u554a". Converting Chinese to Unicode escapes in JS is very simple.
function Convert2unicode(str) { return str.replace(/[\u00
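The same conversion can be sketched in Python 3. Because the JavaScript function is cut off, the rule used here (escape every character above the ASCII range) is an assumption, and the name `to_unicode_escapes` is hypothetical. The sketch also assumes characters in the Basic Multilingual Plane, so each escape has exactly four hex digits.

```python
def to_unicode_escapes(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX escape (BMP only)."""
    return "".join(
        ch if ord(ch) < 128 else "\\u{:04x}".format(ord(ch))
        for ch in text
    )


escaped = to_unicode_escapes("\u4f60\u597d\u554a")
print(escaped)  # prints: \u4f60\u597d\u554a
```

ASCII input passes through unchanged, so mixed text such as `"ok \u4f60"` keeps its readable part.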
A. Copy a Unicode file to an ANSI file
Witoansi.vbs file:
' Utility to rewrite a Unicode text file as an ANSI text file
' For use with Windows Scripting Host, CScript.exe or WScript.exe
' Copyright (c) 1999, Microsoft Corporation
'
Option Explicit
' FileSystemObject.CreateTextFile and FileSystemObject.OpenTextFile
Const OpenAsASCII = 0
Const OpenAsUnicode = -1
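For comparison, the same idea (read a Unicode text file, write it back in an ANSI code page) can be sketched in Python 3. This is not a translation of the script above, which is cut off. It assumes "Unicode" means UTF-16 and "ANSI" means Windows code page 1252; the function name `unicode_file_to_ansi` and the file names are hypothetical.

```python
import os
import tempfile


def unicode_file_to_ansi(src_path, dst_path, ansi_encoding="cp1252"):
    """Rewrite a UTF-16 text file in a single-byte ANSI code page."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    # Characters missing from the code page are written as "?".
    with open(dst_path, "w", encoding=ansi_encoding,
              errors="replace", newline="") as dst:
        dst.write(text)


# Small demonstration with temporary files.
with tempfile.TemporaryDirectory() as folder:
    src = os.path.join(folder, "unicode.txt")
    dst = os.path.join(folder, "ansi.txt")
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("caf\u00e9\r\n")
    unicode_file_to_ansi(src, dst)
    with open(dst, "rb") as f:
        ansi_bytes = f.read()

print(ansi_bytes)
```

On a system with a different ANSI code page, pass that code page as `ansi_encoding`.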
Manually converting Unicode to UTF-8
A colleague asked whether zero bytes can still appear after Unicode is converted to UTF-8. I was not sure of the answer.
So I looked up how Unicode is converted to UTF-8. Going through the process corrected many of my earlier misunderstandings.
I am writing it down here for my own reference.
I originally thought Unicode is
While working on Sxna today, I ran into an encoding conversion problem. After an hour of struggling, I finally put together the following programs for converting Chinese characters to various encodings: UTF-8, Unicode, and ANSI. I do not dare keep them to myself, so I dedicate them to all programming enthusiasts!
--------------------------------------------------------------------------------
UTF encoding that matches Google's
Chinese characters are passed through encode
// Note:
Regular expression matching Chinese characters: [\u4e00-\u9fa5]
In Qt, the following regular expression can be used for validation:
// CJK Unified Ideographs
QRegExp rx("[\u4e00-\u9fa5]+");
QValidator *validator = new QRegExpValidator(rx, this);
ui->lineEditCustom->setValidator(validator);
This range covers the CJK Unified Ideographs (roughly the basic Chinese characters), which is usually enough.
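The same range check can be sketched in Python 3 with the standard `re` module. The helper name `is_all_cjk` is hypothetical, and the sketch only covers the basic range U+4E00 to U+9FA5, so fullwidth punctuation and extension blocks are rejected.

```python
import re

# One or more characters from the basic CJK Unified Ideographs range.
CJK_BASIC = re.compile(r"[\u4e00-\u9fa5]+")


def is_all_cjk(text: str) -> bool:
    """Return True if text is non-empty and every character is in U+4E00..U+9FA5."""
    return CJK_BASIC.fullmatch(text) is not None


print(is_all_cjk("\u6c49\u5b57"))  # prints: True
print(is_all_cjk("abc"))           # prints: False
print(is_all_cjk("\u3002"))        # fullwidth period, prints: False
```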
To view all
Reprint: http://www.jb51.net/article/56096.htm
This article mainly introduces code for converting between Java String values and Unicode escapes; readers who need it can refer to the following.
After the Java environment is installed, the JDK's bin directory contains native2ascii.exe, which provides a similar function, but the same functionality can also be implemented in Java code.
Converting a string to Unicode in Java