Python learning notes (4): strings in Python...

Source: Internet
Author: User
Tags string methods vars

A string is an ordered collection of characters used to store and represent text-based information.
Common string constants and expressions
t1 = ''                      empty string
t2 = "diege's"               double quotation marks
t3 = """..."""               triple quotation marks
t4 = r'\temp\diege'          raw string: suppresses escapes, so \temp\diege is printed literally, without a tab
t5 = u'diege'                Unicode string
t1 + t2                      concatenation
t1 * 3                       repetition
t2[i]                        index
t2[i:j]                      slice
len(t2)                      length
"a %s parrot" % type         string formatting
t2.find('ie')                string method call: search
t2.rstrip()                  string method call: remove trailing whitespace
t2.replace('ie', 'efk')      string method call: replace
t2.split(',')                string method call: split
t2.isdigit()                 string method call: content test
t2.lower()                   string method call: convert to lowercase
for x in t2:                 iteration
'ie' in t2                   membership
I. String constants
1. Single-quoted and double-quoted strings are the same
Python automatically concatenates adjacent string constants in any expression, although you can also put the + operator between them to make the concatenation explicit.
>>> t2 = "Test " 'for ' "diege"
>>> t2
'Test for diege'
>>> t2 = "Test " + 'for ' + "diege"
>>> t2
'Test for diege'
Do not put commas between the strings: that creates a tuple instead of a string. Python prints strings in single quotes unless the string itself contains a single quote.
Quotation marks can also be embedded by escaping them with a backslash.
>>> t2 = "Test " + 'for ' + "diege's"
>>> t2
"Test for diege's"
>>> 'diege\'s'
"diege's"
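As a rough illustration of the points above, the following sketch compares adjacent-literal concatenation, the explicit + form, and the tuple that commas produce. It is written with print() calls so that it runs on Python 3, whereas the session above uses Python 2; the variable names are made up for the example.

```python
# Adjacent string literals are merged into one string.
joined = "Test " 'for ' "diege"
# The explicit + operator gives the same result.
explicit = "Test " + 'for ' + "diege"
# Commas between the literals build a tuple instead of a string.
as_tuple = "Test ", 'for ', "diege"

print(joined == explicit)
print(type(as_tuple).__name__, len(as_tuple))

# repr() shows which quote style Python picks when echoing a string.
print(repr("diege's"))
print(repr('say "hi"'))
```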
2. Escape sequences represent special bytes
\newline      ignored (line continuation)
\\            backslash (keeps one \)
\'            single quotation mark (keeps ')
\"            double quotation mark (keeps ")
\n            line feed (newline)
\f            form feed
\t            horizontal tab
\v            vertical tab
\b            backspace
\a            bell
\r            carriage return
\N{id}        Unicode database ID
\uhhhh        Unicode 16-bit hexadecimal value
\Uhhhhhhhh    Unicode 32-bit hexadecimal value
\xhh          hexadecimal value
\ooo          octal value
\0            null (does not end the string)
\other        not an escape (the backslash is kept)
3. Raw strings suppress escapes
myfile = open('C:\new\text.data', 'w')
This call tries to open a file named C:(line feed)ew(tab)ext.data rather than the intended one, because \n and \t are read as escape sequences.
Solution: use a raw string. If the letter r (upper or lower case) appears immediately before the opening quotation mark, escape processing is turned off.
myfile = open(r'C:\new\text.data', 'w')
Another way is to escape each backslash:
myfile = open('C:\\new\\text.data', 'w')
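The difference between the three spellings can be checked without opening any file. The sketch below only compares the strings themselves; the path is the hypothetical one from the example above and the variable names are illustrative.

```python
plain = 'C:\new\text.data'       # \n and \t are turned into newline and tab
raw = r'C:\new\text.data'        # raw string: backslashes are kept literally
doubled = 'C:\\new\\text.data'   # each \\ produces a single backslash

print(len(plain))        # 14 characters: two escapes collapsed to one char each
print(len(raw))          # 16 characters
print(raw == doubled)    # both spellings name the same path
print('\n' in plain, '\t' in plain)
```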
4. Triple quotes write multi-line string blocks
A block string is a convenient syntax for writing multi-line text data.
This form starts with three quotation marks (either single or double), is followed by any number of lines of text, and ends with the same three quotation marks. Single and double quotation marks embedded in the text may be escaped but do not have to be. Triple-quoted strings are also commonly used as a quick hack during development to disable some code: to switch a few lines off temporarily and run the script again, put triple quotation marks before and after them.
x = 10
"""
import os
print os.getcwd()
"""
y = 19
5. Unicode strings encode larger character sets
A Unicode string is sometimes called a "wide" character string, because each character may occupy more than one byte in memory.
Unicode strings are typically used in applications that support internationalization (i18n).
To write a Unicode string, put the letter u (upper or lower case) before the opening quotation mark.
>>> t9 = u'diege'  # this syntax creates a unicode string object
>>> t9
u'diege'
>>> type(t9)
<type 'unicode'>
Python allows expressions to mix Unicode strings and ordinary strings freely; the mixed result is converted to Unicode.
Like ordinary strings, Unicode strings can be concatenated, indexed, sliced, and matched with the re module, and they cannot be changed in place.
For the most part Python treats ordinary strings and Unicode strings in the same way.
To convert between ordinary and Unicode strings, use the built-in str and unicode functions.
>>> str(u'diege')
'diege'
>>> unicode('diege')
u'diege'
Unicode is designed to handle multi-byte characters, so the special "\u" and "\U" escapes can be used to encode character values larger than 8 bits.
u'ab\x20cd'
The sys module includes calls for getting and setting the default Unicode encoding scheme (usually ASCII by default).
You can also combine the raw and Unicode string formats.
II. Strings in practice
1. Basic operations
String length: the built-in function len()
>>> len('test')
4
String concatenation: +
>>> 'test' + 'diege'
'testdiege'
Python does not allow a + expression to mix strings and numbers.
String repetition: *
>>> 'test' * 3
'testtesttest'
Repetition is handy for printing separator lines:
>>> print '-' * 80
Iteration: use the for statement to step through a string
>>> myname = 'diege'
>>> for s in myname: print s
...
d
i
e
g
e
A for loop assigns a variable to each element of a sequence in turn and executes one or more statements for each element.
Membership test: use the in operator.
>>> 'g' in myname
True
>>> 'k' in myname
False
2. Indexing and slicing
A string is an ordered collection of characters, so individual characters are fetched by position: put the numeric offset of the required element in square brackets after the string.
Python offsets start at 0. Negative offsets are also supported.
Index
>>> t = 'diege'
>>> t[0], t[-2]
('d', 'g')
Slice
t[start:end] includes the start position but not the end position
>>> t[1:3], t[1:], t[:3], t[-1], t[0:-1]
('ie', 'iege', 'die', 'e', 'dieg')
>>> t[:]
'diege'
Summary:
* Indexing (S[i]) fetches the element at a specific offset.
-- The first element is at offset 0.
-- S[0] fetches the first element.
-- A negative offset counts backward from the end (the right).
-- S[-2] fetches the second-to-last element (like S[len(S)-2]).
* Slicing (S[i:j]) extracts a contiguous section of a sequence.
-- The right boundary is not included.
-- Slice boundaries default to 0 and the sequence length when omitted.
-- S[1:3] fetches the elements from offset 1 up to but not including offset 3.
-- S[1:] fetches the elements from offset 1 to the end.
-- S[:3] fetches the elements from offset 0 up to but not including offset 3.
-- S[:-1] fetches the elements from offset 0 up to but not including the last element.
-- S[:] fetches the elements from offset 0 to the end, which makes a top-level copy of S.
A copy is an object with the same value but a different piece of memory. Copying is not very useful for immutable objects such as strings, but it is useful for objects that can be changed in place, such as lists.
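The summary can be tried out with a short script. This is a minimal sketch using Python 3 print() calls; the names items and copy are invented for the example, and the list part shows why the S[:] copy matters for objects that can be changed in place.

```python
t = 'diege'

# Indexing: offsets start at 0, negative offsets count from the right.
print(t[0], t[-2])                  # d g
print(t[-2] == t[len(t) - 2])       # True

# Slicing: the right boundary is excluded, omitted bounds default to the ends.
print(t[1:3], t[1:], t[:3], t[:-1])  # ie iege die dieg

# A full slice of a list is a new top-level copy, so changing the copy
# leaves the original list alone.
items = ['d', 'i', 'e']
copy = items[:]
copy[0] = 'P'
print(items, copy)
```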
3. Extended slicing: the third limit (step)
Full form: X[I:J:K] means "extract the items of X from offset I up to offset J-1, taking every Kth item". The third limit, K, defaults to 1.
Example
>>> S = 'abcdefghijk'
>>> S[1:10]
'bcdefghij'
>>> S[1:10:2]
'bdfhj'
You can also use a negative number as the step.
Slice expression
>>> "hello"[::-1]
'olleh'
With a negative step, the meanings of the two boundaries are effectively reversed.
Slicing is also commonly used to drop the script name from the command-line arguments in sys.argv:
import sys
print sys.argv
# python echo.py -a -b -c
['echo.py', '-a', '-b', '-c']
Contents of echo.py
import sys
print sys.argv[1:]
# python echo.py -a -b -c
['-a', '-b', '-c']
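A few more step values, as a sketch in Python 3 syntax. The list argv below is a hand-written stand-in for what sys.argv would hold for the echo.py run above, so the script does not depend on real command-line arguments.

```python
s = 'abcdefghijk'

print(s[1:10:2])   # bdfhj: offsets 1, 3, 5, 7, 9
print(s[::2])      # acegik: every second character
print(s[::-1])     # kjihgfedcba: a negative step walks right to left
print(s[5:1:-1])   # fedc: with a negative step the bounds swap roles

# Stand-in for sys.argv (assumed values, not read from the command line).
argv = ['echo.py', '-a', '-b', '-c']
print(argv[1:])    # everything except the script name
```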
4. String conversion tools
One of Python's design mottos is to refuse the temptation to guess.
In Python you cannot add a number and a string together, even if the string looks like a number.
>>> '55' + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
Because + can mean either addition or concatenation, this expression is ambiguous, so Python treats it as an error.

What should you do, then, when a script gets a number as a string from a file or a user interface?
The solution is to use the conversion tools to turn the string into a number, or the number into a string, before operating on it.
Converting a number to a string
>>> str(55)
'55'
>>> `55`
'55'
>>> t = repr(55)
>>> type(t)
<type 'str'>
Converting a string to a number
>>> int('66')
66
>>> D = int('66')
>>> type(D)
<type 'int'>
These operations create new objects.
>>> S = '55'
>>> x = 1
>>> int(S) + x
56
Similar built-in functions convert a floating-point number to a string, or a string to a floating-point number.
>>> str(3.1415), float("1.5")
('3.1415', 1.5)
>>> text = '1.234E-10'
>>> float(text)
1.234e-10
The built-in eval function runs a string containing Python expression code, so it can convert a string to any type of object.
The int and float functions only convert to numbers.
** Character code conversion **
Likewise, a single character can be converted to its ASCII code by passing it to the built-in ord function, which returns the integer value of the byte that represents the character in memory.
The built-in chr function does the reverse, converting an integer code to the corresponding character.
>>> ord('t')
116
>>> chr(116)
't'
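The conversions can be collected into one small script. This is a sketch for Python 3, where the error message differs from the Python 2 traceback shown above, so the message is printed rather than assumed.

```python
s = '55'

# Mixing str and int with + raises TypeError instead of guessing.
try:
    s + 1
except TypeError as exc:
    print('TypeError:', exc)

# Convert explicitly to choose between the two meanings of +.
print(int(s) + 1)            # 56: numeric addition
print(s + str(1))            # 551: string concatenation

print(float('1.234E-10'))    # string to floating point
print(str(3.1415))           # floating point to string

# ord and chr convert between a character and its integer code.
print(ord('t'), chr(116))    # 116 t
```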
5. Changing strings
A string is an immutable sequence: you cannot change it in place (for example, by assigning to an index).
To change a string, use tools such as concatenation and slicing to build a new string, and if necessary assign the result back to the original variable name.
>>> S = 'diege'
>>> S = 'My name is ' + S
>>> S
'My name is diege'
This does not change the original object; it creates a new string object and binds the original variable name to it.
>>> t = 'diege'
>>> S = t[:3] + 'bad' + t[:-1]
>>> S
'diebaddieg'
Every operation that "modifies" a string produces a new string object.
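To make the immutability concrete, here is a minimal sketch (Python 3 syntax; the name original is invented for the example) showing that index assignment fails and that rebinding the name leaves the old object untouched.

```python
s = 'diege'

# Assigning to an index is rejected because str is immutable.
try:
    s[0] = 'P'
except TypeError as exc:
    print('TypeError:', exc)

original = s              # second name for the same string object
s = 'P' + s[1:]           # build a new string and rebind the name s
print(s)                  # Piege
print(original)           # diege: the old object was not changed
```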

III. String formatting
To format a string:
1) Place the format string on the left of the % operator. It contains one or more embedded conversion targets, each starting with % (for example, %d).
2) Place an object (or several objects in parentheses) on the right of the % operator. Python inserts these objects into the string on the left at the positions of the conversion targets.
>>> name = 'diege'
>>> "My name is: %s" % name
'My name is: diege'
>>> name = 'diege'
>>> age = 18
>>> "My name is: %s my age is %d" % (name, age)
'My name is: diege my age is 18'
>>> "%d %s %d you" % (1, 'diege', 4)
'1 diege 4 you'
>>> "%s -- %s -- %s" % (42, 3.1415, [1, 2, 4])
'42 -- 3.1415 -- [1, 2, 4]'
In the last example we insert three values, an integer, a floating-point number, and a list, yet all the targets on the left are %s, which means "convert to a string". Because every object can be converted to a string (the same conversion used when printing), every object type works with the %s code. For this reason, unless you need special formatting, %s is usually the only code you have to remember.
Formatting always returns a new string as its result rather than changing the string on the left; because strings are immutable, it has to work this way. If necessary, assign the result to a variable name to keep it.
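A brief sketch of that last point, in Python 3 syntax with illustrative names template and result: the format string can be kept in a variable and reused, and it is left unchanged by the % operation.

```python
template = 'My name is: %s my age is %d'

# The % operator returns a new string; template itself is not modified.
result = template % ('diege', 18)
print(result)
print(template)

# %s accepts any object, because every object has a string form.
print('%s -- %s -- %s' % (42, 3.1415, [1, 2, 4]))
```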
1. More advanced string formatting
Python string formatting supports all the usual printf format codes from C (but it returns the result instead of displaying it, as printf does). Some of the codes in the table offer alternative ways to format the same type.
Code    Meaning
%s      string (or any object)
%r      same as s, but uses repr instead of str
%c      character
%d      decimal (integer)
%i      integer
%u      unsigned (integer)
%o      octal integer
%x      hexadecimal integer
%X      same as x, but prints uppercase
%e      floating-point exponent
%E      same as e, but prints uppercase
%f      floating-point decimal
%g      floating-point e or f
%G      floating-point E or f
%%      literal %
Conversion targets on the left of the expression support several conversion options, with a fairly strict syntax of their own. The general structure of a conversion target is:
%[(name)][flags][width][.precision]code

Between the % and the code character you can give a dictionary key in parentheses; flags such as - (left-justify) and 0 (zero fill); a total field width; and the number of digits after the decimal point. Fields are right-justified by default.

>>> x = 1234
>>> res = "test:...%d...%-6d...%06d" % (x, x, x)
>>> res
'test:...1234...1234  ...001234'
>>> res = "test:...%d...%6d...%-06d" % (x, x, x)
>>> res
'test:...1234...  1234...1234  '
%6d is right-justified in a width of 6 and padded with spaces
%-06d is left-justified in a width of 6; the 0 fill is ignored because the padding falls on the right
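The flags, width, and precision fields are easier to see with brackets around each result. The sketch below uses Python 3 print() calls; the sample values are arbitrary.

```python
x = 1234

print('[%6d]' % x)          # [  1234]  right-justified, space padded
print('[%-6d]' % x)         # [1234  ]  left-justified
print('[%06d]' % x)         # [001234]  zero padded
print('[%-06d]' % x)        # [1234  ]  the - flag overrides 0

print('[%.2f]' % 3.14159)   # [3.14]      two digits after the point
print('[%8.3f]' % 3.14159)  # [   3.142]  width 8, precision 3

# A few of the other codes from the table.
print('%x %X %o %e' % (255, 255, 8, 1234.5))
```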

2. Dictionary-based string formatting
String formatting also allows conversion targets on the left to refer to keys in a dictionary on the right and fetch the corresponding values.
>>> "%(n)d %(x)s" % {"n": 1, "x": 'diege'}
'1 diege'
Here (n) and (x) refer to keys in the dictionary on the right and fetch their values. This technique is often used in programs that generate text such as HTML or XML.
>>> reply = """
... Greetings.
... Hello %(name)s!
... Your age is %(age)s
... """
>>> values = {'name': 'diege', 'age': 18}
>>> print reply % values

Greetings.
Hello diege!
Your age is 18
This trick is often combined with the built-in function vars, which returns a dictionary containing all the variables that exist at the place where it is called.
>>> name = 'diege'
>>> age = '18'
>>> vars()
{'S': 'diebaddieg', 'res': 'test:...1234...  1234...1234  ', 'D': 66, '__builtins__': <module '__builtin__' (built-in)>, 'text': '1.234E-10', 'age': '18', 'myname': 'diege', '__package__': None, 's': 'e', 'values': {'age': 18, 'name': 'diege'}, 't': 'diege', 'x': 1234, 'reply': '\nGreetings.\nHello %(name)s!\nYour age is %(age)s\n', '__name__': '__main__', '__doc__': None, 'name': 'diege'}
>>> "My name is %(name)s age is %(age)s" % vars()
'My name is diege age is 18'
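The same idea works inside a function, where vars() with no argument returns the local variables. This is a sketch in Python 3 syntax; the function greeting and the HTML-like template are invented for illustration.

```python
def greeting(name, age):
    # vars() here is the function's local namespace: {'name': ..., 'age': ...},
    # so the %(name)s and %(age)s targets pick up the arguments.
    return 'My name is %(name)s age is %(age)s' % vars()

print(greeting('diege', 18))

# An explicit dictionary works the same way, for example with markup text.
template = '<p>Hello %(name)s, you are %(age)d</p>'
print(template % {'name': 'diege', 'age': 18})
```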

IV. String methods
In addition to expression operators, strings provide a set of methods that implement more sophisticated text-processing tasks. Methods are simply functions associated with a particular object. Technically, they are attributes attached to objects that happen to refer to callable functions. In Python, different object types have different methods; string methods work only on string objects. Functions are packages of code, and a method call combines two operations at once: an attribute fetch and a function call.
Attribute fetch
An expression of the form object.attribute means "fetch the value of attribute in object".
Call expression
An expression of the form function(arguments) means "invoke the code of function, passing it zero or more comma-separated argument objects, and return the function's result value".

Putting the two together lets us call a method of an object. The method call expression object.method(arguments) is evaluated from left to right: Python first fetches the method of the object and then calls it, passing in the arguments. If the method computes a result, it comes back as the result of the entire method call expression.

Most objects have callable methods, and all of them are accessed with the same method call syntax. To call an object's method, you must go through an existing object.
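The two steps of a method call can be separated to see each one on its own. A minimal sketch in Python 3 syntax; the name method is chosen for the example.

```python
s = 'diege'

# Step 1: attribute fetch. This returns a bound method object without calling it.
method = s.upper
print(callable(method))    # True

# Step 2: call the fetched method.
print(method())            # DIEGE

# The usual spelling performs both steps in one expression.
print(s.upper())           # DIEGE

# The same function can also be fetched from the type and given the object.
print(str.upper(s))        # DIEGE
```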

1. String method examples: changing strings
The available string methods can be listed with dir:
>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser',
'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
Use the help() function to see how a method is used; pass the method itself, without calling it:
>>> help(S.isupper)
1) Replace
Replace the two characters at offsets 3 and 4 by slicing and concatenating
>>> S = 'namediege'
>>> S = S[:3] + 'xx' + S[5:]
>>> S
'namxxiege'
To replace a substring without working out offsets, use the replace method.
>>> S = 'aabbccdd'
>>> S = S.replace('bb', 'gg')
>>> S
'aaggccdd'
The first argument of replace is the original substring (of any length), and the second is the string (of any length) that replaces it.
2) Search
The find method returns the offset where the substring first appears (searching from the start by default), or -1 if the substring is not found.
3) Exploding a string with list()
>>> S = 'diege'
>>> list(S)
['d', 'i', 'e', 'g', 'e']
The string is split into a list of its characters.
4) Combining with join()
>>> S = 'diege'
>>> list(S)
['d', 'i', 'e', 'g', 'e']
>>> T = list(S)
>>> T
['d', 'i', 'e', 'g', 'e']
>>> T[0] = 'P'
>>> T[3] = 'G'
>>> T
['P', 'i', 'e', 'G', 'e']
>>> S = ''.join(T)  # join the list of characters back into a string, using an empty string as the separator
>>> S
'PieGe'
>>> Y = '|'.join(T)  # join the list of strings into one string, using | as the separator
>>> Y
'P|i|e|G|e'

>>> 'X'.join(['eggs', 'toast', 'moa'])
'eggsXtoastXmoa'
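The list-then-join idiom and the replace and find methods can be combined into one short script. This is a sketch in Python 3 syntax; the names chars and text are made up, and the optional third argument of replace (a maximum count) is a standard feature not covered in the notes above.

```python
s = 'diege'
chars = list(s)            # explode into a mutable list of characters
chars[0] = 'P'
chars[3] = 'G'
s = ''.join(chars)         # join back into a new string
print(s)                   # PieGe
print('|'.join(chars))     # P|i|e|G|e

text = 'aabbccbb'
print(text.replace('bb', 'gg'))       # aaggccgg: every occurrence is replaced
print(text.replace('bb', 'gg', 1))    # aaggccbb: at most one replacement
print(text.find('cc'), text.find('zz'))   # 4 -1
```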

2. String method examples: parsing text
1) Parsing text with slices
>>> line = "aaa bbb ccc"
>>> cols1 = line[0:3]
>>> cols2 = line[8:]
>>> cols1
'aaa'
>>> cols2
'ccc'
Here the columns of data appear at fixed offsets, so they can be sliced out of the original string. This technique counts as parsing as long as the components you need sit at known, fixed offsets.
2) Extracting components with split
When the data is not at fixed offsets, use the split method to extract the components. It works even when the data appears at arbitrary positions in the string.
>>> line = 'aaa bbb ccc'
>>> cols = line.split()
>>> cols
['aaa', 'bbb', 'ccc']
The split method chops a string into a list of substrings around a delimiter. The default delimiter is whitespace: the string is split into groups separated by one or more spaces, tabs, or newlines, and we get back a list of the resulting substrings.
>>> names = 'diege,kelly,lily'
>>> names.split()
['diege,kelly,lily']
>>> names.split(',')
['diege', 'kelly', 'lily']
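The difference between the default whitespace split and an explicit delimiter shows up most clearly with irregular input. A sketch in Python 3 syntax, with sample strings invented for the example:

```python
line = 'aaa   bbb\tccc\n'
# With no argument, any run of spaces, tabs, or newlines is one delimiter.
print(line.split())                   # ['aaa', 'bbb', 'ccc']

names = 'diege, kelly,lily'
# An explicit delimiter splits only on that string, so spaces are kept.
print(names.split(','))               # ['diege', ' kelly', 'lily']
# strip() each field to remove the leftover whitespace.
print([n.strip() for n in names.split(',')])

# An explicit delimiter also keeps empty fields.
print('a,,b'.split(','))              # ['a', '', 'b']
```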
3. Other common string methods in practice
Other string methods have more specialized roles:
strip whitespace from the end of a line, convert case, test content, and check for a substring at the end.
>>> line = 'the python is running!\n'
>>> line.rstrip()
'the python is running!'
>>> line.upper()
'THE PYTHON IS RUNNING!\n'
>>> line.isalpha()
False
>>> line.endswith('ing!\n')
True
>>> line.find('in') != -1
True
Note that string methods do not support patterns; for pattern-based text processing you must use the re module from the Python standard library. In exchange, string methods sometimes run faster than the equivalent re tools.
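To show where the boundary lies, the sketch below puts a fixed-substring test next to two pattern-based operations from the re module. It uses Python 3 syntax, and the patterns are only examples chosen for this illustration.

```python
import re

line = 'the python is running!\n'

# Fixed substrings: string methods and the in operator are enough.
print(line.find('in') != -1)
print('in' in line)

# Patterns need re: here, a word that starts with r and ends with ing.
match = re.search(r'r[a-z]+ing', line)
print(match.group())       # running

# Splitting on a pattern: a comma or semicolon followed by optional spaces.
print(re.split(r'[,;]\s*', 'diege, kelly;lily'))
```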
4. The original string module
The original string module contains functions that are roughly equivalent to the current set of string methods. Today you should use only the string methods, not the original string module.
V. General type categories
1. Types in the same category share an operation set
A string is an immutable sequence: it cannot be changed in place, and it is a positionally ordered collection. All sequence types in Python support the sequence operations: concatenation, indexing, and iteration. In this sense Python has three major type (and operation) categories:
* Numbers
Support addition, multiplication, and so on.
* Sequences
Support indexing, slicing, concatenation, and so on.
* Mappings
Support indexing by key, and so on.
For example, for any sequence objects X and Y:
X + Y creates a new sequence object containing the contents of both operands.
X * N creates a new sequence object containing N copies of the contents of X.
In other words, these operators work the same way on any sequence object, including strings, lists, tuples, and some user-defined object types. The type of the object tells Python which task to perform.
2. Mutable types can be changed in place
Immutability is a constraint to keep in mind. If an object is immutable, you cannot change its value in place; instead, you must run code that creates a new object containing the new value. Immutability provides a degree of integrity: it guarantees that the object will not be changed by another part of the program.
Mutable types can be changed in place, so the original data can be modified as needed.

Summary of methods and expressions:
Methods are type-specific rather than generic.
Expressions are generic and apply to multiple types. For example, slicing works on strings, lists, and tuples, all of which are sequences.
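Both points can be illustrated together: the same expressions applied to three sequence types, and the difference between changing a list in place and rebinding a string. A sketch in Python 3 syntax with invented variable names:

```python
# The same + , * and slice expressions work on str, list, and tuple.
for seq in ('die', ['d', 'i', 'e'], ('d', 'i', 'e')):
    print(type(seq).__name__, seq + seq[:1], seq * 2, seq[1:])

# Mutable: an in-place change is visible through every name for the object.
nums = [1, 2, 3]
alias = nums
nums[0] = 99
print(alias)              # [99, 2, 3]

# Immutable: "changing" a string builds a new object; the old one is intact.
word = 'abc'
other = word
word = 'X' + word[1:]
print(other, word)        # abc Xbc
```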

 
