Source: Unicode to GBK, GBK to Unicode, to solve the problem of the ROM occupied by the FATFS Chinese code table. I used to work with an STM32 with 512KB of ROM, but recently only had 128KB. I wanted to use FATFS with support for displaying long filenames, and found that after adding cc936.c the ROM was no longer enough, so I decided to store this bidirectional code table in external memory, flash or SD.
[Unicode] character encoding table information, unicode character encoding
UTF-8 is somewhat similar to Huffman coding in that it encodes Unicode with a variable number of bytes:
characters 0x00-0x7F are expressed in a single byte;
characters 0x80-0x7FF are expressed in two bytes;
characters 0x800-0xFFFF are represented in three bytes;
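The byte-length rules above can be sketched directly. A minimal Python illustration (my own, not from the original article) that hand-assembles the UTF-8 bit patterns and checks them against Python's built-in encoder:

```python
def utf8_encode(cp):
    """Hand-encode a Unicode code point (<= 0xFFFF) into UTF-8 bytes."""
    if cp <= 0x7F:                       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code points above 0xFFFF are outside this sketch")

assert utf8_encode(ord("A")) == "A".encode("utf-8")      # ASCII: 1 byte
assert utf8_encode(0x4E2D) == "\u4e2d".encode("utf-8")   # '中': 3 bytes
```

The leading bits of the first byte tell a decoder how many continuation bytes follow, which is why UTF-8 stays compatible with plain ASCII.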
1.1. Problem: You need to handle data that doesn't fit in the ASCII character set.
1.2. Solution: Unicode strings can be encoded into plain strings in a variety of ways, according to whichever encoding you choose:
I had been using an STM32 with 512KB of ROM, but recently only had 128KB. I wanted to use FATFS with support for displaying long filenames, and found that the ROM was not enough after cc936.c was added, so it was decided to store this two-way code table in external memory, flash or an SD card, read-only;
Encoding conversion function in cc936.c after modification:
WCHAR ff_convert (   /* Converted code, 0 means conversion error */
    WCHAR src,       /* Character code to be converted */
    ...
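The original post's modified ff_convert is C code for the STM32. As an illustration only, here is a Python sketch of the underlying idea: store one direction of the bidirectional table as 16-bit (key, value) pairs sorted by key, and binary-search it. The record layout and helper names are my assumptions, not the article's actual implementation, which reads the table from flash or SD instead of a bytes object.

```python
import struct

def make_table(pairs):
    """Pack (key, value) 16-bit pairs, sorted by key, little-endian."""
    return b"".join(struct.pack("<HH", k, v) for k, v in sorted(pairs))

def ff_convert(table, src):
    """Binary-search the packed table; return 0 on failure, like cc936.c."""
    lo, hi = 0, len(table) // 4 - 1      # 4 bytes per record
    while lo <= hi:
        mid = (lo + hi) // 2
        key, val = struct.unpack_from("<HH", table, mid * 4)
        if key == src:
            return val
        if key < src:
            lo = mid + 1
        else:
            hi = mid - 1
    return 0

# One direction of the pair: GBK 0xD6D0 -> Unicode 0x4E2D ('中')
gbk_to_uni = make_table([(0xD6D0, 0x4E2D)])
assert ff_convert(gbk_to_uni, 0xD6D0) == 0x4E2D
assert ff_convert(gbk_to_uni, 0xBEEF) == 0
```

Keeping two sorted tables, one per direction, trades a little storage for O(log n) lookups, which matters when every probe is a flash or SD read.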
This is where the Python system stores all installed codecs by default, and here we can find all the codecs that ship with the Python release. In practice, every new codec installs itself here. Note, however, that the Python system does not strictly require all codecs to be installed there; the user can place a new codec anywhere they like, as long as the corresponding codec is registered with Python. To register a new codec, you must use the aliases.py file under the encodings directory. This file defines a single hash table, aliases. Each of its keys corresponds to the name of a codec in use, that is, the second parameter value of the unicode() built-in function; the value corresponding to each key is a
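Registration can also be done programmatically with `codecs.register`, the documented Python API behind this mechanism. A small sketch (the alias name `myascii` is made up for illustration):

```python
import codecs

def search(name):
    # Resolve a made-up alias to an existing codec, mimicking what the
    # aliases table in the encodings package does for built-in names.
    if name == "myascii":
        return codecs.lookup("ascii")
    return None

codecs.register(search)

assert "abc".encode("myascii") == b"abc"
assert b"abc".decode("myascii") == "abc"
```

Python calls every registered search function in turn until one returns a CodecInfo, so a codec placed anywhere on the path can be made visible this way.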
Unicode range and represented languages. Unicode is a universal character set that contains 65,535 characters. The computer stores Unicode as an encoding when it handles special characters (all characters beyond the ASCII table). Of course, unifying characters under Unicode took a lot of effort, and there are some incompatibilities ...
Python __str__(self) and __unicode__(self)
Official documentation: https://docs.python.org/2.7/reference/datamodel.html?highlight=__mro__
object.__str__(self)
Called by the str() built-in function and by the print statement to compute the "informal" string representation of an object. This differs from __repr__() in that it does not have to be a valid Python expression ...
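A Python 3 analogue of the distinction (Python 2 additionally has `__unicode__`, which Python 3 dropped); the class is my own example, not from the documentation:

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):          # "formal": unambiguous, for debugging
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):           # "informal": what str() and print use
        return "(%s, %s)" % (self.x, self.y)

p = Point(1, 2)
assert str(p) == "(1, 2)"
assert repr(p) == "Point(1, 2)"
```

If `__str__` is not defined, `str()` falls back to `__repr__`, so defining `__repr__` alone is often enough.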
characters. The first 128 are the same as ASCII, and the rest are other characters defined by windows-1252.
A windows-1252 encoded string looks like this: [97] [98] [99] [150] = "abc–"
This windows-1252 string is still a byte string, but notice that the last byte value is greater than 126. If Python tries to decode the byte stream with the default ASCII codec, it will raise an error. Let's see what happens when Python ...
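That failure is easy to reproduce; a small sketch, assuming the byte values [97, 98, 99, 150] from the example above:

```python
data = bytes([97, 98, 99, 150])       # "abc" plus an en dash in windows-1252

try:
    data.decode("ascii")              # byte 150 > 127: not valid ASCII
except UnicodeDecodeError as e:
    print("ascii decode failed:", e)

text = data.decode("windows-1252")    # the right codec succeeds
assert text == "abc\u2013"            # U+2013 EN DASH
```

The bytes themselves carry no encoding label; only the codec you name decides whether decoding succeeds.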
11111111 converted to decimal is 255. Encoding Chinese with that is nowhere near enough, since Chinese has tens of thousands of characters, so later more bytes were added on top of the original one:
ASCII: 11111111, supports at most 255 characters, occupies 1 byte.
Unicode (to support Chinese): 11111111 11111111, over a million code points, occupies 2-4 bytes.
In order to be compatible with 8-bit ASCII, Unicode kept the original ASCII 8-bit values, and on that basis the u...
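The byte counts claimed above can be checked directly in Python (this check is mine, not the original author's; UTF-16 shown for the 2-byte case):

```python
assert len("A".encode("utf-8")) == 1          # ASCII range: still 1 byte
assert len("中".encode("utf-8")) == 3         # common CJK: 3 bytes in UTF-8
assert len("中".encode("utf-16-le")) == 2     # 2 bytes in UTF-16 (BMP character)
```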
... or are difficult to use correctly, so developers often get it wrong.
Some languages have somewhat better Unicode support. These languages appeared only after Unicode became widely prevalent, but the way Unicode is manipulated in them is seriously flawed. Although these languages were born later, they still contain all the shortcomings of the first group. In my experience, one ...
A good article on str and unicode
Sorting out Python encoding-related content
Note: the following discussion is for the Python 2.x version; Py3k remains to be tried
Begin
When handling Chinese in Python, reading files or messages, HTTP parameters, and so on,
a run turns up garbled text (in string processing, reading and writing files, or print).
Then most people's practice is to invoke encode/decode ...
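Rather than calling encode/decode at random, the reliable pattern is: decode bytes into text at the input boundary, work with text, and encode back on output. A minimal Python 3 sketch (the GBK input is simulated here, not read from a real file):

```python
raw = "中文".encode("gbk")        # pretend these bytes came from a GBK file
text = raw.decode("gbk")          # input boundary: bytes -> text, naming the codec
assert text == "中文"

out = text.encode("utf-8")        # output boundary: text -> bytes
assert out.decode("utf-8") == text
```

Garbled output almost always means one of these two boundary steps used the wrong codec or was skipped.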
Abstract: When writing Python scripts, if we use Python to process web page data or work with Chinese characters, this error message often occurs: SyntaxError: Non-ASCII character '\xe6' in file ./filename.py on line 3, but no encoding declared. This article focuses on issues related to Unicode, Chinese, and special character encoding in ...
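The standard fix for that SyntaxError in Python 2 is a coding declaration on the first or second line of the script, as specified by PEP 263. A minimal example:

```python
# -*- coding: utf-8 -*-
# With this declaration Python 2 accepts non-ASCII bytes in the source file.
# Python 3 assumes UTF-8 source by default, so there the line is optional.
s = "中文"
print(s)
```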
When processing Chinese in Python, reading files or messages, if you find garbled text (in string processing, reading and writing files, or print), most people's practice is to call encode/decode for debugging, without explicitly considering why the text is garbled. Today we discuss how to deal with the encoding problem.
Note: the following discussion is for the Python 2.x version; it has not been tested under Py3k.
Errors that occur most frequently during debugging
Error 1
Traceback (most recent call last): File "
Er
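The two errors met most often while debugging are decoding bytes with the wrong codec, and encoding text into a codec that lacks the characters. Both are reproduced here in Python 3 for illustration:

```python
gbk_bytes = "中文".encode("gbk")

try:
    gbk_bytes.decode("utf-8")             # wrong codec for these bytes
except UnicodeDecodeError as e:
    decode_err = e

try:
    "中文".encode("ascii")                # ASCII has no CJK characters
except UnicodeEncodeError as e:
    encode_err = e

print(type(decode_err).__name__, decode_err)
print(type(encode_err).__name__, encode_err)
```

Reading the error name tells you which boundary failed: DecodeError means bad input bytes or codec, EncodeError means the output codec cannot represent your text.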
Strings also have an encoding problem. Because a computer can only handle numbers, if you want to process text, you must first convert the text to numbers. The earliest computers were designed with 8 bits as one byte, so the largest integer a single byte can represent is 255 (binary 11111111 = decimal 255), and 0-255 was used to represent uppercase and lowercase letters, digits, and some symbols. This code table is called ASCII.
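The numbers in that paragraph are easy to verify:

```python
assert 0b11111111 == 255      # largest value of one 8-bit byte
assert ord("A") == 65         # ASCII maps characters to small integers
assert chr(97) == "a"         # and back again
```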
Unicode and UTF-8 in Python
The history of character sets covered in this article briefly explains the relationship between Unicode and UTF-8. To summarize: UTF-8, UTF-16, and UTF-32 belong to the same family and implement the same function, but UTF-8 is the most widely used. However, Unicode and UTF-8 are ...
Unicode strings can be encoded into normal strings in a number of ways, according to the encoding you choose:

# Convert the Unicode string into a plain Python string: "encoding" (encode)
unicodestring = u"Hello World"
utf8string = unicodestring.encode("utf-8")
asciistring = unicodestring.encode("ascii")
isostring = unicodestring.encode("iso-8859-1")
The first thing to be clear about is that in Python, string objects and unicode objects are two different types. A string object is a sequence of bytes/characters, while a unicode object is a sequence of Unicode code units. Characters in a string can be encoded in a variety of ways, such as single-byte ASCII, double-byte GB2312, and ...
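In Python 3 the same split survives as bytes vs. str; a small sketch using GB2312, which the snippet mentions:

```python
b = "你好".encode("gb2312")   # byte sequence in a concrete encoding
u = b.decode("gb2312")        # sequence of Unicode code points

assert isinstance(b, bytes) and isinstance(u, str)
assert len(b) == 4            # GB2312: 2 bytes per Chinese character
assert len(u) == 2            # but only 2 code points
```

The length difference is the whole point: indexing and slicing behave predictably only on the decoded text, not on the raw bytes.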