Python Learning Notes 1 (Variables, Character Encoding)


I am learning Python by following Alex's video course from Oldboy Education, and I am writing this blog to record my learning process and the points of knowledge taught.

This post begins with the ritual "Hello World" program. Introductory material such as an overview of Python, its history, the differences between Python 2 and 3, installation, and its advantages and disadvantages will not be recorded here.

This is my first time writing a blog, so both the content and the presentation are bound to have shortcomings. This blog is mainly a record of my own learning process; if you want to study on your own, I recommend the blogs of Alex (the "Golden Horn King") and Eric (the "Silver Horn King").

Golden Horn King (Alex): http://www.cnblogs.com/alex3714

Silver Horn King (Eric): http://www.cnblogs.com/wupeiqi

1. Hello World Program

Create a hello.py file in PyCharm and write the following code:

print("Hello, world!")

Output:

Hello, world!

With that, my Python program came to life and a new adventure began.

2. Variables

Rules for variable definitions

    • Variable names may contain only letters, digits, and underscores
    • The first character of a variable name cannot be a digit
    • The following keywords cannot be used as variable names:
      ['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield']
      (This is the Python 2 list. In Python 3, print and exec are functions rather than keywords, and False, None, True, nonlocal, async, and await are keywords.)
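The naming rules above can be checked programmatically. The helper below, `is_valid_variable_name`, is my own hypothetical name; it relies on the standard `str.isidentifier()` method and the `keyword` module. Keep in mind that under Python 3, `isidentifier()` also accepts non-ASCII letters, and `keyword.kwlist` reflects the running interpreter, so `print` counts as a valid name there.

```python
import keyword

def is_valid_variable_name(name: str) -> bool:
    """Return True if `name` can be used as a variable name (hypothetical helper)."""
    # isidentifier() enforces: only letters, digits, underscores; no leading digit.
    # iskeyword() rejects the reserved words of the running Python version.
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["school", "_count", "name2", "2name", "my-var", "class", "print"]:
    print(f"{candidate!r:10} -> {is_valid_variable_name(candidate)}")

# The full keyword list of the running interpreter:
print(keyword.kwlist)
```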

Assigning values to variables

Assigning a value to a variable in Python does not require declaring its type.

The variable name goes on the left side of the equals sign (=), and the value stored in the variable goes on the right, for example:

school = "ZGCM"

This creates a variable named school whose value is "ZGCM". The name school refers to the memory address where "ZGCM" is stored, and Python follows it to find the value. So after the assignment, whenever you use school you are actually using "ZGCM" (setting aside questions of scope).
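A minimal Python 3 sketch of the "no declared type" point: the name simply refers to whatever object was assigned, and `type()` reports that object's type at run time. The reassignment to an integer is purely illustrative and not from the original notes.

```python
school = "ZGCM"                        # no type declaration needed
print(school, type(school).__name__)   # ZGCM str

# Illustrative only: the same name may later be bound to a value of another type.
school = 2024
print(school, type(school).__name__)   # 2024 int
```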

Assigning values to multiple variables

a = b = c = 1
x, y, z = 1, 1.1, 'zifuchuan'

The first line creates an integer object with the value 1 and binds all three variables to that same object, so they share one memory location.

The second line assigns the integer object 1 to x, the floating-point object 1.1 to y, and the string object 'zifuchuan' to z.
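The `is` operator tests whether two names refer to the same object, which makes the claims above easy to check. The last three lines add a caution of my own (not from the original notes): with a mutable object such as a list, chained assignment gives every name the same list, so a change made through one name is visible through the others.

```python
a = b = c = 1
x, y, z = 1, 1.1, 'zifuchuan'

print(a is b is c)                                           # True: one object, three names
print(type(x).__name__, type(y).__name__, type(z).__name__)  # int float str

# Caution with mutable objects: p and q are the same list.
p = q = []
p.append('item')
print(q)                                                     # ['item']
```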

Other types of assignment

# (names chosen so the built-ins str, list, tuple and dict are not shadowed)
my_str = "This is a string 1"                          # string assignment
my_list = ['This', 'is', 'list', 2]                    # list assignment
my_tuple = ('This', 'is', 'tuple', 3)                  # tuple assignment
my_dict = {1: 'This', 2: 'is', 3: 'dictionary', 4: 4}  # dictionary assignment
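A short sketch showing which type each literal above creates and how its elements are read back. The `my_` prefixes are my own choice, made so that the built-in names `str`, `list`, `tuple`, and `dict` remain usable.

```python
my_str = "This is a string 1"
my_list = ['This', 'is', 'list', 2]
my_tuple = ('This', 'is', 'tuple', 3)
my_dict = {1: 'This', 2: 'is', 3: 'dictionary', 4: 4}

for value in (my_str, my_list, my_tuple, my_dict):
    print(type(value).__name__, value)

# Lists and tuples are indexed by position; dictionaries are looked up by key.
print(my_list[2], my_tuple[2], my_dict[3])   # list tuple dictionary
```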

Reassigning Variables

name1 = "Zhangyan"
name2 = name1
print(name1, name2)
name1 = "zy"
print(name1, name2)

The output of the code above is:

Zhangyan Zhangyan
zy Zhangyan

1. Assigning "Zhangyan" to name1 makes name1 store the memory address of "Zhangyan".

2. Assigning name1 to name2 actually hands the memory address of "Zhangyan" to name2 (as if name1 showed name2 the way to "Zhangyan").

3. So when the third line prints, both name1 and name2 are "Zhangyan".

4. When "zy" is assigned to name1, name1 now stores the memory address of "zy" and no longer points to "Zhangyan". name2 is not changed and still points to the memory address of "Zhangyan".

5. So the fifth line prints zy for name1 and Zhangyan for name2.
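The five steps above can be checked with the `is` operator (or by comparing `id()` values), which tells whether two names refer to the same object. A minimal Python 3 sketch:

```python
name1 = "Zhangyan"
name2 = name1
print(name1 is name2, id(name1) == id(name2))   # True True: both names share one object

name1 = "zy"                 # rebinding name1 does not touch name2
print(name1 is name2)        # False
print(name1, name2)          # zy Zhangyan
```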

3. Character encoding

When the Python interpreter loads the code in a .py file, it decodes the content using a character encoding (ASCII by default in Python 2).

ASCII (American Standard Code for Information Interchange) is a character encoding system based on the Latin alphabet, used mainly to display modern English (8-bit extended versions add characters for other Western European languages). A character is stored in at most 8 bits (one byte), and 2**8 = 256, so an 8-bit code can represent at most 256 symbols (codes 0 to 255); standard ASCII itself uses only 7 bits and defines 128 characters (codes 0 to 127).
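A minimal Python 3 sketch of ASCII in practice: `ord()` and `chr()` convert between characters and their codes, every printable ASCII character has a code below 128, and encoding a Chinese character with the `ascii` codec fails.

```python
import string

print(ord("A"), chr(97))                        # 65 a
print(max(ord(ch) for ch in string.printable))  # 126: printable ASCII stays below 128
print("Hello".encode("ascii"))                  # b'Hello': one byte per character

try:
    "中".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot encode it:", err.reason)
```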

About Chinese

To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese.

GB2312 (1980) contains a total of 7,445 characters: 6,763 Chinese characters and 682 other symbols. In the Chinese character area, the high (first) byte ranges from B0 to F7 and the low (second) byte from A1 to FE, giving 72 × 94 = 6,768 code positions, of which 5 (D7FA to D7FE) are unused.
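The GB2312 arithmetic above can be checked in Python 3, which ships a `gb2312` codec. The character 啊, my own choice of example, is the first Chinese character in GB2312 and sits at the very start of the range (high byte B0, low byte A1).

```python
high_bytes = range(0xB0, 0xF7 + 1)   # 72 possible high (first) bytes
low_bytes = range(0xA1, 0xFE + 1)    # 94 possible low (second) bytes
slots = len(high_bytes) * len(low_bytes)
print(len(high_bytes), len(low_bytes), slots)   # 72 94 6768
print(slots - 6763)                             # 5 unused positions

print("啊".encode("gb2312").hex())              # b0a1
```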

GB2312 supports too few Chinese characters. The Chinese character extension specification GBK 1.0, released in 1995, contains 21,886 symbols, divided into a Chinese character area and a graphic symbol area; the Chinese character area holds 21,003 characters. GB18030, released in 2000, is the official national standard that replaces GBK 1.0. It contains 27,484 Chinese characters as well as characters of major minority languages such as Tibetan, Mongolian, and Uyghur. PC platforms are now required to support GB18030, while embedded products are not, so mobile phones and MP3 players generally support only GB2312.
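A small sketch of how each standard covers more characters than the one before, using Python's `gb2312`, `gbk`, and `gb18030` codecs. The sample characters and the `can_encode` helper are my own hypothetical choices: 镕 is a commonly cited character missing from GB2312 but present in GBK, and the emoji 😀 is outside GBK but encodable in GB18030 (as a 4-byte sequence).

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded with `codec` (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for ch in ["中", "镕", "😀"]:
    support = {codec: can_encode(ch, codec) for codec in ("gb2312", "gbk", "gb18030")}
    print(ch, support)
```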

ASCII, GB2312, GBK, and GB18030 are backward compatible: the same character always has the same encoding in all of them, and each later standard supports more characters. Within these encodings, English and Chinese text can be handled in a unified way; a Chinese character is recognized by the highest bit of its high (first) byte being 1 rather than 0. Programmers call GB2312, GBK, and GB18030 double-byte character sets (DBCS), although strictly speaking GB18030 also uses four-byte sequences.
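The compatibility and high-bit rule described above can be sketched in Python 3. First, the same text encodes to identical bytes under all three codecs. Then a small splitter, `split_dbcs` (a hypothetical helper, valid only for GB2312/GBK byte strings since GB18030 also has 4-byte sequences), uses the highest bit of each lead byte to tell one-byte ASCII characters from two-byte Chinese characters.

```python
from typing import List

text = "Hello 中文"
encodings = {codec: text.encode(codec) for codec in ("gb2312", "gbk", "gb18030")}
print(len(set(encodings.values())) == 1)   # True: identical bytes in all three

def split_dbcs(data: bytes) -> List[bytes]:
    """Split GB2312/GBK bytes into per-character chunks (hypothetical helper)."""
    chunks = []
    i = 0
    while i < len(data):
        if (data[i] & 0x80) == 0:      # highest bit 0: ASCII, one byte
            chunks.append(data[i:i + 1])
            i += 1
        else:                          # highest bit 1: lead byte of a two-byte character
            chunks.append(data[i:i + 2])
            i += 2
    return chunks

chunks = split_dbcs(encodings["gbk"])
print([chunk.decode("gbk") for chunk in chunks])
```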

The default internal code of Chinese Windows is still GBK, which can be upgraded to GB18030 with the GB18030 upgrade package. However, the characters GB18030 adds over GBK are rarely needed by ordinary users, so we usually still use GBK to refer to the internal code of Chinese Windows.

Clearly, ASCII cannot represent all the characters and symbols in the world, so a new encoding that could represent all of them was needed: Unicode.

Unicode (also called the Universal Code or Unified Code) is a character encoding standard used on computers. It was created to overcome the limitations of traditional character encoding schemes by assigning a unified, unique binary code to every character in every language. In its original design, every character and symbol is represented by at least 16 bits (2 bytes), i.e. 2**16 = 65,536 possible codes.
Note: this is a minimum of 2 bytes; some characters need more (Unicode code points now go up to U+10FFFF).
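A quick Python 3 check of these numbers: `ord()` gives a character's Unicode code point, code points below 2**16 fit in 2 bytes of UTF-16, and characters beyond that range (such as the emoji below, my own example) need 4 bytes. `sys.maxunicode` shows the highest code point Python supports.

```python
import sys

print(2 ** 16)                           # 65536
print(hex(ord("中")))                    # 0x4e2d: fits in 16 bits
print(len("中".encode("utf-16-be")))     # 2 bytes
print(hex(ord("😀")))                    # 0x1f600: beyond 16 bits
print(len("😀".encode("utf-16-be")))     # 4 bytes (a surrogate pair)
print(hex(sys.maxunicode))               # 0x10ffff
```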

UTF-8 is a compressed, optimized encoding of Unicode. Instead of using at least 2 bytes for every character, it sorts characters and symbols into classes: ASCII characters are stored in 1 byte, European characters in 2 bytes, East Asian characters in 3 bytes...
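To make the byte classes concrete, here is a minimal hand-written UTF-8 encoder for a single character, compared against Python's built-in `str.encode("utf-8")`. The function name `utf8_encode_char` is hypothetical and the code is a sketch for illustration: it follows the standard UTF-8 bit layout but does not reject surrogate code points (U+D800 to U+DFFF), which the built-in codec refuses.

```python
def utf8_encode_char(ch: str) -> bytes:
    """Encode one character to UTF-8 by hand (illustrative sketch)."""
    cp = ord(ch)
    if cp < 0x80:        # 1 byte:  0xxxxxxx (ASCII is unchanged)
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx (e.g. accented European letters)
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (e.g. most CJK characters)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (e.g. emoji)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in ["A", "é", "中", "😀"]:
    mine = utf8_encode_char(ch)
    print(ch, len(mine), mine.hex(), mine == ch.encode("utf-8"))
```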

We have learned how to use Python to output "Hello, world!", but if you output the Chinese characters "你好，世界" ("Hello, world") you may run into Chinese encoding problems.

In Python 2 the default source encoding is ASCII, so unless the encoding is changed, Chinese characters cannot be printed correctly, and reading Chinese raises an error.

The workaround is to add # -*- coding: utf-8 -*- or # coding=utf-8 at the beginning of the file (there must be no space between coding and the = or :).
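Python treats a comment on the first or second line as an encoding declaration only if it matches a specific pattern; PEP 263 and the Python language reference give it as a regular expression. The sketch below applies that pattern to a few candidate comments. It illustrates the matching rule only and is not the interpreter's actual implementation; `declared_encoding` is a hypothetical helper name.

```python
import re
from typing import Optional

# Pattern as given in the Python language reference ("Encoding declarations").
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(comment_line: str) -> Optional[str]:
    """Return the encoding name a comment declares, or None (hypothetical helper)."""
    match = CODING_RE.match(comment_line)
    return match.group(1) if match else None

for line in ["# -*- coding: utf-8 -*-", "# coding=utf-8",
             "# vim: set fileencoding=utf-8 :", "#coding =utf-8"]:
    print(repr(line), "->", declared_encoding(line))
```

The last candidate shows why a space between coding and = breaks the declaration.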

#!/usr/bin/python
# -*- coding: utf-8 -*-

print("你好，世界")

Note: Python 3.x source files use UTF-8 encoding by default, so Chinese is parsed correctly without declaring UTF-8.
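A minimal Python 3 sketch tying these points together. `compile()` accepts source code as bytes and honours an encoding declaration, so the same GBK-encoded bytes compile when a `# -*- coding: gbk -*-` line is present but are rejected without it, because Python 3 then assumes UTF-8. Using GBK here is my own example of a file saved in a non-UTF-8 encoding; the rejection is the same kind of "invalid continuation byte" failure an editor saving in the wrong encoding can trigger.

```python
source = 's = "你好，世界"\n'

# With a declaration, Python decodes the bytes as GBK.
with_decl = ("# -*- coding: gbk -*-\n" + source).encode("gbk")
namespace = {}
exec(compile(with_decl, "<with-declaration>", "exec"), namespace)
print(namespace["s"])                      # 你好，世界

# Without a declaration, Python 3 assumes UTF-8 and the GBK bytes fail to decode.
without_decl = source.encode("gbk")
try:
    compile(without_decl, "<no-declaration>", "exec")
    compiled_ok = True
except (SyntaxError, UnicodeDecodeError) as err:
    compiled_ok = False
    print(type(err).__name__, err)
```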

Note: If you use an editor, you also need to set the storage encoding of the .py file to UTF-8; otherwise you will receive an error message similar to the following:

in position 0: invalid continuation byte

PyCharm setup steps:

However, if you have already declared the encoding with # -*- coding: utf-8 -*- in your file, you cannot change it here:

