Python: str and List

Last Update:2014-12-06 Source: Internet

Author: User

Tags iterable



Python Basics--str and List

This article focuses on two of the most frequently used built-in types in Python: str and list. Both are sequence types. Python 2 has seven sequence types in total: str (string), unicode (Unicode string), list, tuple, bytearray (byte array), buffer, and xrange. The operations they have in common are as follows:

Operation--------------Result
x in s-----------------True if x is in s
x not in s-------------True if x is not in s
s + t------------------concatenation of s and t (t is appended to s)
s * n, n * s-----------n copies of s concatenated
s[i]-------------------the item at index i; indexing starts at 0
s[i:j]-----------------the slice of s from index i up to, but not including, j, with a step of 1
s[i:j:k]---------------the slice of s from i to j with a step of k
len(s)-----------------the length of s
min(s)-----------------the smallest item of s
max(s)-----------------the largest item of s
s.index(x)-------------the index of the first occurrence of x in s
s.count(x)-------------the number of occurrences of x in s
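
The operations in the table can be tried on a str and a list side by side. This is only a minimal sketch: the sample values `s` and `nums` are made up for illustration, and the code is written so that it also runs on Python 3.

```python
# Common sequence operations applied to a str and a list.
# The sample values are illustrative, not from the article.
s = "hello"
nums = [3, 1, 4, 1, 5]

has_e = "e" in s                 # membership test
no_nine = 9 not in nums          # negated membership test
joined = s + " world"            # concatenation
repeated = [0] * 3               # repetition
first = s[0]                     # indexing starts at 0
middle = nums[1:4]               # items at index 1, 2, 3 (index 4 is excluded)
every_other = s[::2]             # slice with a step of 2
length = len(nums)
smallest, largest = min(nums), max(nums)
first_one = nums.index(1)        # index of the first 1
ones = nums.count(1)             # how many times 1 occurs

print(joined)
print(middle)
print(every_other)
```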

The operations above apply to the sequence types in general; a few of them come with caveats, which are covered in the Python 2.7.8 official documentation. The following sections introduce str and list in turn.

str (string)

1. Character encoding

1.1 Basic concepts

Before talking about character encoding in Python, it is well worth getting clear on ASCII, Unicode, and encodings such as UTF-8; otherwise the rest of this section will be confusing. There is a lot of material on this on the web. Here are some of the articles that explain it well:

Python character encoding in detail
Programmers fun reading: Talking about Unicode encoding
Talk about Unicode encoding, briefly explain UCS, UTF, BMP, BOM and other names
Unicode and ASCII are character sets; UTF is an encoding method

If you have read the articles above, you should have a fair idea of what ASCII is, what Unicode is, and what an encoding is. If not, here is a brief explanation. ASCII is a character set that covers only English characters, and a computer uses a single byte for each character. Unicode is a character set that can represent all the characters in the world; how many bytes a character takes in the computer, and which bytes they are, depends on the encoding used. The most common encodings include UTF-8 and GB2312.
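
The point that the byte count depends on the encoding can be checked directly. The sketch below assumes Python 3, where str holds Unicode text and bytes holds the encoded form (roughly the roles of unicode and str in Python 2); the variable names are illustrative.

```python
# Python 3 sketch: the same text takes a different number of bytes
# depending on the encoding chosen.
text = "下雪"

utf8_bytes = text.encode("utf-8")      # 3 bytes per character for this text
gbk_bytes = text.encode("gbk")         # 2 bytes per character for this text
ascii_bytes = "snow".encode("ascii")   # 1 byte per character

# Print only the lengths so the output does not depend on console encoding.
print(len(text), len(utf8_bytes), len(gbk_bytes), len(ascii_bytes))
```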

Computers store files and send data over the network in the form of bytes. The strings and files we work with in a program, however, cannot be handled as raw bytes that we cannot read, so two problems arise:

How to encode a string into bytes, using some encoding, for saving or transfer
How to decode the bytes a program receives into a string that we can read and process

With these two problems in mind, let's look at how Python 2.x deals with them.

1.2 str and unicode in Python

The str type is really the byte representation of a string. The bytes are encoded using the current system's default encoding, which can be checked with locale.getdefaultlocale().
The unicode type is a string in the true sense. It holds the Unicode code points of its characters; a unicode literal is written with a u prefix, and its code points are shown as \uXXXX escapes.

The following session makes the difference clearer:

>>> s = "下雪"
>>> s
'\xcf\xc2\xd1\xa9'          # hexadecimal byte representation
>>> u = unicode(s, "cp936") # cp936 is the default encoding on my machine
>>> u
u'\u4e0b\u96ea'

1.3 The decode() and encode() methods

The decode() method decodes a str (bytes) into unicode

The argument to decode() must be the encoding that the str was actually encoded with; otherwise an error occurs.
If no encoding is given, decode() uses the default returned by sys.getdefaultencoding(), which is usually ASCII. In that case, a str containing only English letters or punctuation decodes without error, but a str containing Chinese raises a decoding error, because ASCII cannot decode Chinese.
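
The two rules above can also be reproduced on Python 3, where bytes.decode() plays the role of Python 2's str.decode(). This is a sketch under that assumption, not the article's own code; the byte string is the cp936 encoding of "下雪" shown in the earlier session.

```python
# Python 3 sketch: decoding works only with the codec the bytes were encoded in.
raw = b"\xcf\xc2\xd1\xa9"            # "下雪" encoded as cp936

decoded = raw.decode("cp936")        # correct codec

failed = []
for codec in ("utf-8", "ascii"):     # wrong codecs for these bytes
    try:
        raw.decode(codec)
    except UnicodeDecodeError:
        failed.append(codec)

# ascii() keeps the output printable on any console.
print(ascii(decoded))
print(failed)
```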

The Python 2 session below shows these cases:

>>> s = "snow"
>>> print s.decode()            # decode as ASCII
snow
>>> print s.decode("cp936")     # decode as cp936
snow
>>> s = "下雪"
>>> print s.decode("cp936")
下雪
>>> print s.decode("utf-8")     # wrong codec, raises an error
Traceback (most recent call last):
  File "<pyshell#48>", line 1, in <module>
    print s.decode("utf-8")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcf in position 0: invalid continuation byte
>>> print s.decode()            # ASCII cannot decode Chinese, raises an error
Traceback (most recent call last):
  File "<pyshell#49>", line 1, in <module>

The encode() method encodes unicode into a str (bytes) using a given encoding

This method encodes a unicode string into a str (bytes). The argument names the encoding, such as "cp936", "utf-8", or "gbk".
If no argument is given, the system default encoding is used, which is usually ASCII. Encoding Chinese text as ASCII raises an error.

The sample code is as follows:

>>> s = "下雪"
>>> u = s.decode("cp936")              # decode
>>> print u.encode("cp936")            # encode
下雪
>>> print u.encode("utf-8")            # encode as UTF-8
涓嬮洩
>>> print u.encode("gb2312")           # encode as GB2312
下雪
>>> print u.encode("gbk")              # encode as GBK
下雪
>>> print u.encode()                   # encode as ASCII, raises an error
Traceback (most recent call last):
  File "<pyshell#59>", line 1, in <module>

Hopefully the code above helps you understand the encode() method. You may wonder, though, why the UTF-8 encoded string prints as garbled text. This comes down to the console's encoding: output is displayed correctly only when the console encoding matches the encoding of the string.
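
The garbling can be simulated without a cp936 console. The sketch below assumes Python 3 and imitates the console by decoding the UTF-8 bytes as cp936; the names `garbled` and `restored` are illustrative.

```python
# Python 3 sketch: UTF-8 bytes interpreted as cp936 produce different characters.
text = "下雪"
utf8_bytes = text.encode("utf-8")

# What a console that assumes cp936 would make of the UTF-8 bytes.
garbled = utf8_bytes.decode("cp936")

# No information is lost: re-encoding as cp936 recovers the UTF-8 bytes.
restored = garbled.encode("cp936").decode("utf-8")

print(ascii(garbled))
print(restored == text)
```

The bytes themselves are fine; only the interpretation is wrong, which is why matching the console encoding fixes the display.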

1.4 The str encode() method and the unicode decode() method

I won't go into detail on these two methods because I don't think they are of much use. Just note the following equivalences:

str.encode("encodeway") <==> str.decode().encode("encodeway")
unicode.decode("decodeway") <==> unicode.encode().decode("decodeway"), which goes from unicode back to unicode and serves no purpose.

Since encoding in Python is not the focus of this article, only a brief description is given here. There is plenty of material on this topic online; here are a few articles that I think are well written:

The pain of Unicode
Python encoding handling: the difference between str and unicode
Python character encoding and decoding

2. Formatting str

In Python there are usually two ways to control the format of a string. The first is the older % operator style, similar to the format control used in many other languages such as C; the second is the format method. Neither is covered in detail here; you can refer to the two blog posts below. The first covers the % operator and the second covers the format method.

Python supplement 05 string formatting (% operator)
Python format string (translated)
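
As a quick illustration of the two styles, the sketch below formats the same values both ways. The variable names and the message are made up for the example.

```python
# Two ways to format the same values (illustrative data).
name, count, ratio = "list", 3, 0.5

# Older style: the % operator, similar to C's printf.
old_style = "%s has %d items (%.1f%% full)" % (name, count, ratio * 100)

# Newer style: the str.format() method with positional and named fields.
new_style = "{0} has {1} items ({pct:.1f}% full)".format(name, count, pct=ratio * 100)

print(old_style)
print(new_style)
```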

3. Common str methods

s.center(width[, fillchar])
Returns a string of length width with s centered in it, padded on both sides with fillchar (spaces by default). Example:

>>> s = "love"
>>> s.center(8, "_")
'__love__'

s.find(sub[, start[, end]])
Returns the lowest index in s at which the substring sub is found. start and end are optional and limit the search to that range of positions. Returns -1 if sub is not found. Use find() only when you need the position of sub; to simply check whether sub occurs in s, use sub in s instead. Example:

>>> s = "http://www.baidu.com"
>>> s.find("w", 1, -1)
7

s.index(sub[, start[, end]])
Similar to find(), except that it raises ValueError if sub is not found.
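
The difference between the two methods shows up only when the substring is missing. A minimal sketch, using the same sample URL and an arbitrary missing substring:

```python
s = "http://www.baidu.com"

pos = s.find("w")            # position of the first "w"
missing = s.find("xyz")      # not found: returns -1, no exception

try:
    s.index("xyz")           # not found: raises ValueError
    raised = False
except ValueError:
    raised = True

contains = "baidu" in s      # enough when only membership matters

print(pos, missing, raised, contains)
```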

s.count(sub[, start[, end]])
Returns the number of occurrences of the substring sub in s.

s.isspace()
Returns True if s is non-empty and consists only of whitespace characters. Returns False if s is an empty string or contains any non-whitespace character. Note that for a string made up of whitespace, bool(s) is True, while for an empty string, bool(s) is False.
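
The contrast between isspace() and bool() can be laid out as pairs. The sample strings below are illustrative.

```python
blank = "   "     # whitespace only
empty = ""        # empty string
mixed = " a "     # contains a non-whitespace character

# Each pair is (isspace(), bool()) for one string.
results = [
    (blank.isspace(), bool(blank)),
    (empty.isspace(), bool(empty)),
    (mixed.isspace(), bool(mixed)),
]
print(results)
```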

s.join(iterable)
Returns a string formed by concatenating the elements of iterable, with s inserted between each pair of adjacent elements. iterable is any type that can be iterated over, such as a list or tuple, and its elements must be strings that can be concatenated with s. Example:

>>> s = "love"
>>> s.join("you")
'yloveoloveu'

s.ljust(width[, fillchar])
Similar to s.center(), except that s is left-justified in the result, so the padding is added on the right.
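
Placing center(), ljust(), and rjust() next to each other makes the padding side obvious. A small sketch using the same sample string as the center() example:

```python
s = "love"

centered = s.center(8, "_")   # padding on both sides
left = s.ljust(8, "_")        # s on the left, padding on the right
right = s.rjust(8, "_")       # s on the right, padding on the left

print(centered, left, right)
```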

s.strip([chars])
Working inward from both ends of s, removes every character that appears in chars, stopping at the first character that is not in chars. If chars is omitted, whitespace is removed. Example:

>>> s = "   hello world   "
>>> s.strip()
'hello world'

s.partition(sep)
The return value is a tuple of three elements: the part of s before the separator sep, sep itself, and the rest of the string after it. If sep is not found in s, the result is (s, '', '').
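
Both cases can be seen in a short sketch. The sample strings are made up for illustration.

```python
# Found: the string is split around the first occurrence of the separator.
found = "key=value=x".partition("=")

# Not found: the whole string comes back, followed by two empty strings.
not_found = "key".partition("=")

print(found)
print(not_found)
```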

s.replace(old, new[, count])
Returns a copy of s with occurrences of old replaced by new. The optional parameter count limits the number of replacements.
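
A short sketch of replace() with and without count, using an illustrative string:

```python
s = "a-b-c-d"

all_replaced = s.replace("-", "+")       # every occurrence
first_two = s.replace("-", "+", 2)       # only the first two occurrences

print(all_replaced, first_two, s)        # s itself is unchanged
```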

s.split([sep[, maxsplit]])
Splits s using sep as the delimiter and returns the pieces as a list. If sep is not given, s is split on whitespace. maxsplit is the maximum number of splits.
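
The three ways of calling split() behave slightly differently. The sketch below uses made-up sample strings.

```python
# No separator: split on runs of whitespace, ignoring leading/trailing blanks.
words = " a b  c ".split()

# Explicit separator: consecutive separators produce empty strings.
fields = "a,b,,c".split(",")

# maxsplit=1: split at most once, leaving the rest intact.
head_rest = "a,b,c".split(",", 1)

print(words, fields, head_rest)
```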

s.splitlines([keepends])
Splits s at line boundaries ("\n", "\r", or "\r\n") and returns the lines as a list. If keepends is true, the line-break characters are kept in the result.

For example, 'ab c\n\nde fg\rkl\r\n'.splitlines() returns ['ab c', '', 'de fg', 'kl'], and the same call with splitlines(True) returns ['ab c\n', '\n', 'de fg\r', 'kl\r\n'].

Points to note:

All of the string methods above return a new string; the value of the original string is never changed.
Some methods also have a "left version" and a "right version", named by adding 'l' or 'r' to the original method name, which do a similar job from that side.
For a more detailed description of each method, refer to the official documentation.
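
The first two notes can be checked together. This sketch uses an illustrative string and a few of the left and right variants (lstrip, rstrip, rfind, rsplit, rpartition).

```python
s = "  hello  "

# Methods return new strings; s itself never changes.
left_stripped = s.lstrip()     # strips the left side only
right_stripped = s.rstrip()    # strips the right side only
upper = s.strip().upper()

# Right-hand variants search or split starting from the end.
last_l = "hello".rfind("l")
split_once = "a,b,c".rsplit(",", 1)
parts = "a.b.c".rpartition(".")

print(repr(s), repr(left_stripped), repr(right_stripped), upper)
print(last_l, split_once, parts)
```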

list

The list type is used very frequently in Python. In addition to the common sequence operations listed at the beginning, lists support the following operations.

Operation-------------------------Result
s[i] = x--------------------------replaces item i of s with x
s[i:j] = t------------------------replaces the slice s[i:j] with the contents of t; t must be iterable
del s[i:j]------------------------deletes the slice s[i:j]; equivalent to s[i:j] = []
s[i:j:k] = t----------------------replaces the elements of s[i:j:k] with those of t; t must have the same length as the slice
del s[i:j:k]----------------------removes the elements of s[i:j:k] from the list
s.append(x)-----------------------equivalent to s[len(s):len(s)] = [x]; adds x as a single element at the end of the list
s.extend(x)-----------------------equivalent to s[len(s):len(s)] = x; adds each element of x to the end of the list
s.count(x)------------------------returns the number of elements in s equal to x
s.index(x[, i[, j]])--------------returns the smallest index k such that s[k] == x and i <= k < j
s.insert(i, x)--------------------inserts x at position i
s.pop([i])------------------------removes the item at position i and returns it; i defaults to -1
s.remove(x)-----------------------removes the first x in s
s.reverse()-----------------------reverses the elements of s in place
s.sort([cmp[, key[, reverse]]])---sorts the elements of s in place
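
Running most of the operations in sequence on one list shows how each changes it. This is a sketch with made-up values; the comments give the list after each step.

```python
s = [5, 2, 9]

s[0] = 7                 # [7, 2, 9]
s[1:2] = [3, 4]          # [7, 3, 4, 9]
s.append(1)              # [7, 3, 4, 9, 1]
s.extend([8, 3])         # [7, 3, 4, 9, 1, 8, 3]
s.insert(1, 6)           # [7, 6, 3, 4, 9, 1, 8, 3]
last = s.pop()           # returns 3, leaving [7, 6, 3, 4, 9, 1, 8]
s.remove(3)              # [7, 6, 4, 9, 1, 8]
del s[0:2]               # [4, 9, 1, 8]
s.reverse()              # [8, 1, 9, 4]
s.sort()                 # [1, 4, 8, 9]
s[::2] = [0, 0]          # [0, 4, 0, 9]; both sides have length 2

print(last, s)
```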

Points to note:

All of the operations above modify the original list in place. If you want to keep the original list, copy it first and then work on the copy.
A slice operation returns a new list of elements and does not change the original list. Also keep the slice range in mind: the element at the end index is not included.
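
Both notes can be seen in a few lines. The sketch below uses a full slice to make a shallow copy, which is one common way to copy a list; the names are illustrative.

```python
original = [3, 1, 2]

working = original[:]     # full slice: a new list with the same elements
working.sort()            # sorts the copy in place, not the original

tail = original[1:3]      # items at index 1 and 2; index 3 is excluded

print(original, working, tail)
```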

OK, that covers the basics of strings and lists in Python for now; more to come. In addition, below is an exercise from LeetCode that mainly uses string and list operations. Interested readers can give it a try.

Exercise: Reverse Words in a String
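
One possible approach, offered only as a sketch and not as the official solution, combines split(), reversed(), and join(). The function name `reverse_words` is hypothetical, and the sketch assumes that words are separated by whitespace and that extra spaces should be dropped from the result.

```python
def reverse_words(s):
    """Return the words of s in reverse order, separated by single spaces."""
    words = s.split()                  # split on whitespace, dropping extra blanks
    return " ".join(reversed(words))   # rejoin in reverse order


result = reverse_words("  the sky   is blue ")
print(result)
```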
