Self-taught Python basics (VII): string handling tips

Source: Internet
Author: User

When reprinting, please credit the source: http://blog.csdn.net/cc_xz/article/details/78693772. Many thanks.

In this article, you will learn:
1. How to split a string.
2. How to check the beginning or end of a string.
3. How to assemble multiple strings into a single string.
4. How to delete or replace characters in a string.

When you output a string, you can align it using:
str.ljust(): aligns to the left.
str.rjust(): aligns to the right.
str.center(): centers the string.

These methods accept two parameters: the first is the total width of the resulting string, and the second is the fill character. The fill character defaults to a space, but you can set it to, for example, an equals sign or a colon.
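As a minimal sketch of the three alignment methods, the snippet below pads a short word to a total width of 10. The sample word, width, and fill characters are illustrative and do not come from the original article.

```python
# Illustrative values only: pad the word "Python" to a total width of 10.
word = "Python"

left = word.ljust(10, "=")    # fill on the right, so the text sits on the left
right = word.rjust(10, ":")   # fill on the left, so the text sits on the right
middle = word.center(10)      # the default fill character is a space

print(left)          # Python====
print(right)         # ::::Python
print(repr(middle))  # '  Python  '
```

If the requested width is not larger than the string, all three methods return the string unchanged.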


Split a string by separator:
To split a string on a single separator:

txt = "123,234\n345,456 567,678"
print(txt.split(","))  # By default, split() splits on whitespace such as \n. You can also pass a string argument to specify the separator.

The output results are:

['123', '234\n345', '456 567', '678']

This approach works only when a single separator is used.
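For comparison, here is a small sketch of two other ways to call split() on the same sample string: with no argument, and with the optional maxsplit argument. The variable names are illustrative.

```python
txt = "123,234\n345,456 567,678"

# With no argument, split() breaks on any run of whitespace (spaces, \n, \t).
by_whitespace = txt.split()

# The optional second argument, maxsplit, limits how many splits are made.
first_only = txt.split(",", 1)

print(by_whitespace)  # ['123,234', '345,456', '567,678']
print(first_only)     # ['123', '234\n345,456 567,678']
```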

To split a string using a regular expression:

import re  # re is the regular expression module, which contains many functions for working with regular expressions.

str01 = "123,432,325.432.123|231|3123/32443, 123,2.432,1|324"
"""
The re.split() function splits a string (str type) based on a regular expression.
In the following code, the first argument is the regular expression and the second is the string to be split.
Note that re.split() only works on strings, not on lists, tuples, and so on.
The type returned by re.split() is a list.
"""
list01 = re.split(r"[,.|/ ]+", str01)

print(list01)

The output results are:

['123', '432', '325', '432', '123', '231', '3123', '32443', '123', '2', '432', '1', '324']

Regular expressions will be described in more detail in a separate article later.

To determine the beginning or end of a string:
First, generate some random strings:

from random import sample, randint  # sample picks random elements from a list; randint generates a random integer.

# Define the list of letters from which the random strings are built.
list01 = ["A", "B", "C", "D", "E", "F", "G", "H", "Y", "J", "K", "I", "M", "N", "O", "P", "Q"]
# The list comprehension below works as follows:
# First, the for loop runs 20 times, creating a list with indexes 0-19.
# Then the sample() function randomly picks letters from list01.
# Finally, the randint() function randomly decides how many letters to pick each time.
list02 = [sample(list01, randint(3, len(list01))) for i in range(20)]
list03 = []  # At this point list02 is a 2-D list, so it is converted to a 1-D list of strings stored in list03.

for x in range(len(list02)):  # Loop over the indexes of list02.
    # join() converts an iterable to a str. Because it is a string method, it must be called on a string object.
    str01 = "".join(list02[x])  # The string that join() is called on is used as the separator between the joined elements.
    list03.append(str01)

print(list03)

The output results are:

['FJKPB', 'INPKDEYOQJHCAM', ... (part of the result omitted) ..., 'ECAK', 'YDKOJI']


To check the beginning or end of each string:

from random import sample, randint

list01 = ["A", "B", "C", "D", "E", "F", "G", "H", "Y", "J", "K", "I", "M", "N", "O", "P", "Q"]
list02 = [sample(list01, randint(3, len(list01))) for i in range(20)]
list03 = []

for x in range(len(list02)):
    str01 = "".join(list02[x])
    list03.append(str01)

for x in list03:
    if x.endswith(("A", "B", "C")):  # endswith() accepts a tuple; each item in the tuple is tried in turn as a possible ending.
        print("A string ending with A or B or C is:", x)
    elif x.startswith(("A", "B", "C")):  # But endswith() and startswith() accept only a string or a tuple, not a list and so on.
        print("A string starting with A or B or C is:", x)

The output results are:

A string starting with A or B or C is: BQCDPKGAEMOJFH
A string ending with A or B or C is: IACB
A string starting with A or B or C is: CQHFMN
A string ending with A or B or C is: CMEQINOFGPDHYB
A string starting with A or B or C is: CIENM
A string starting with A or B or C is: CFBYOKJQ
A string starting with A or B or C is: AOEHGKFMN
A string starting with A or B or C is: AGNOHPIYCDQJK
A string ending with A or B or C is: GIMKC
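Because endswith() and startswith() reject lists, a list of candidates has to be converted first. The following is a minimal sketch of that conversion; the names suffixes and word are illustrative.

```python
suffixes = ["A", "B", "C"]   # hypothetical list of endings to test
word = "GIMKC"

try:
    word.endswith(suffixes)          # passing a list raises TypeError
    list_accepted = True
except TypeError:
    list_accepted = False

# Converting the list to a tuple makes the call valid.
ends_with_any = word.endswith(tuple(suffixes))

print(list_accepted)   # False
print(ends_with_any)   # True
```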


To adjust string text formatting:
If you want to reformat the contents of some text (such as logs or update notes), you can use the re.sub() function. re.sub() performs string substitution: the capturing groups of the regular expression capture parts of each match, and the replacement can rearrange those captured parts.
For example, open an update file and change the display format of the dates:

import re

with open("Release.txt", "r", encoding="utf-8") as f:
    txt = f.read()
# txt = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", txt)
"""
The sub() function accepts 3 arguments. The first is a regular expression that selects the target text to be modified.
The second is a replacement template (it may refer to captured groups) that sets the new format of the text.
The third is the target string to be modified.
The difference between the first sub() call and the second is that the second names each group in the regular expression, and the replacement selects the groups by name.
"""
txt = re.sub(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", r"\g<month>/\g<day>/\g<year>/", txt)
print(txt)

The output results are:

Date: 08/07/2012/
Date: 12/20/2011/
Date: 2011-2-16
Date: 04/13/2010/
Date: 11/20/2009/
Date: 10/27/2009/
Date: 2009-6-29

From the result we can see that the month and day groups in the regular expression are defined as exactly two digits. Dates that fit this pattern are converted, while dates whose month or day has a different number of digits (such as 2011-2-16) do not match and are left unchanged.
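One possible way to also convert dates with one-digit months or days is to widen the pattern to \d{1,2} and pass a function as the replacement. This is a sketch under assumptions, not the original author's method: the sample text stands in for the file contents, the helper name to_us_format is hypothetical, and the output omits the trailing slash used above.

```python
import re

# Sample text standing in for the contents of the release file (assumed).
txt = "Date: 2012-08-07\nDate: 2011-2-16\nDate: 2009-6-29"

def to_us_format(match):
    # Convert each named group to int, then zero-pad month and day to two digits.
    year = int(match.group("year"))
    month = int(match.group("month"))
    day = int(match.group("day"))
    return f"{month:02d}/{day:02d}/{year}"

# \d{1,2} accepts both one-digit and two-digit months and days.
pattern = r"(?P<year>\d{4})-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
result = re.sub(pattern, to_us_format, txt)
print(result)
```

When the second argument of re.sub() is a function, it is called once per match with the match object, and its return value is used as the replacement text.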

To assemble multiple strings into a single string:
When working with parameters, you may want to assemble a series of separate strings (possibly arguments) into one. The simplest way to do this is:

list01 = ["<3389>", "<TCP/IP>", "<23ms>", "<1024k>", "<202.106.0.20>", "<49>"]  # Define some parameters and put them in a list.
str01 = ""  # Define an empty string to collect the items during iteration.

for x in list01:  # Each x produced by iterating over the list is already a str.
    str01 += x  # This works if all values in the list are str; other types, such as int, must be converted explicitly.
    print(str01)

The output results are:

<3389>
<3389><TCP/IP>
<3389><TCP/IP><23ms>
<3389><TCP/IP><23ms><1024k>
<3389><TCP/IP><23ms><1024k><202.106.0.20>
<3389><TCP/IP><23ms><1024k><202.106.0.20><49>


To assemble a string using the join() function:

list01 = ["<3389>", "<TCP/IP>", "<23ms>", "<1024k>", "<202.106.0.20>", "<49>"]  # Define some parameters and put them in a list.
# join() converts an iterable to a str. Because it is a string method, it must be called on a string object.
# The string that join() is called on is used as the separator between the joined elements.
str01 = "|".join(list01)
print(str01)

The output results are:

<3389>|<TCP/IP>|<23ms>|<1024k>|<202.106.0.20>|<49>
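join() requires every element to be a str, so values of other types need converting first. A minimal sketch, using a hypothetical mixed-type list named params:

```python
params = [3389, "TCP/IP", 23.5, "1024k"]   # hypothetical mixed-type values

# join() raises TypeError on non-str items, so convert each one with str() first.
joined = "|".join(str(p) for p in params)

print(joined)   # 3389|TCP/IP|23.5|1024k
```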


String addition versus join(), time taken:

import time
from random import sample

"""
First, use a list comprehension to create a two-dimensional list of length 1 million, where each element is a sub-list of 5 characters.
A two-dimensional list cannot be passed directly to join() or concatenated in a for loop, so it is first converted into an ordinary list.
The resulting list has length 1 million, and each element is a string of 5 characters.
"""
list01 = [sample("qwertyuiop", 5) for x in range(1, 1000001)]
list02 = []
for x in range(len(list01)):
    str00 = "".join(list01[x])
    list02.append(str00)

"""
First, measure the time needed by join():
Get the current time (the start time) from time.perf_counter() and store it in a variable.
(The original used time.clock(), which was removed in Python 3.8.)
Then process the list with join() and store the final result in the str01 variable (str type).
Next, record the current time as the end time and store it in a variable.
Finally, subtract the start time from the end time to get the time taken.
"""
startTime01 = time.perf_counter()
str01 = "".join(list02)
endTime01 = time.perf_counter()
joinTime01 = endTime01 - startTime01
print(joinTime01, "joinTime01")

"""
Then measure string addition in a for loop:
As above, first record the start time, then define an empty string to hold the result.
Then iterate over the list with a for loop and use += to append each string.
Finally, record the end time and compute the time taken.
"""
startTime02 = time.perf_counter()
str02 = ""
for i in list02:
    str02 += i  # The += operation actually calls the string's __add__() method.
endTime02 = time.perf_counter()
joinTime02 = endTime02 - startTime02
print(joinTime02, "joinTime02")

The output results are:

0.017965127833818366 joinTime01
1.984568595269575 joinTime02

In practice, the most time-consuming step is creating the million-element list itself.
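The same comparison can also be sketched with the standard timeit module, which repeats each measurement several times. The list here is much smaller than the one above and the function names are illustrative. Timings depend on the interpreter and the machine, so no expected numbers are shown.

```python
import timeit

parts = ["abcde"] * 10000   # small illustrative list, far shorter than the one above

def with_join():
    return "".join(parts)

def with_plus():
    result = ""
    for p in parts:
        result += p
    return result

# Both approaches must build the same string; only the time taken differs.
print(with_join() == with_plus())
print(timeit.timeit(with_join, number=20), "join")
print(timeit.timeit(with_plus, number=20), "+=")
```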


To delete characters in a string:
The methods that can be used to delete specific characters in a string are:
1. The string methods strip(), lstrip(), and rstrip() remove characters at the ends of the string.
2. Slicing plus concatenation removes a character at a single fixed position.
3. The string method replace() or the regular expression function re.sub() can delete characters anywhere.
4. The string method translate() can delete many different characters at the same time.
Use strip(), lstrip(), and rstrip() to delete characters at the ends:

str01 = "  This is a string, preceded by 2 white space characters, followed by 3 blank strings   "

str02 = str01.strip()  # By default, removes all whitespace characters at both ends.
str03 = str02.strip("blank string")  # You can also specify the characters to remove from both ends.
str04 = str02.strip("blank")  # But the specified characters are not removed if they are not at the very beginning or end.
str05 = str02.strip("This is a string")  # The argument is treated as a set of characters, so their order does not matter; strip() still removes them from both ends.
print(str02, "\n", str03, "\n", str04, "\n", str05)

str01 = "---test+++++"
str02 = str01.strip("+-")  # Characters at either end are removed as long as they appear in the argument.
print(str02)

The output results are:

This is a string, preceded by 2 white space characters, followed by 3 blank strings
 This is a string, preceded by 2 white space characters, followed by 3
 This is a string, preceded by 2 white space characters, followed by 3 blank strings
 , preceded by 2 white space characters, followed by 3 blank
test

lstrip() and rstrip() work the same way as strip(), except that strip() removes matching characters from both ends of the string, while lstrip() and rstrip() remove them only from the left end and the right end respectively.
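A short sketch of the one-sided variants, reusing the "---test+++++" sample string; the variable names are illustrative.

```python
line = "---test+++++"

left_only = line.lstrip("+-")    # strips only the leading characters
right_only = line.rstrip("+-")   # strips only the trailing characters

print(left_only)    # test+++++
print(right_only)   # ---test
```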


Delete characters using slicing and concatenation:

str01 = "abc:123"
str02 = str01[:3] + str01[4:]  # Use slices to skip over the part that you want to delete.
print(str02)

The output results are:

abc123
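The slicing approach can be wrapped in a small helper. This is only a sketch, and the function name remove_at is hypothetical.

```python
def remove_at(text, index):
    # Hypothetical helper: keep everything before and after the given position.
    return text[:index] + text[index + 1:]

print(remove_at("abc:123", 3))   # abc123
```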


Replace characters with the replace() method or the re.sub() function:

from re import sub  # Import only the sub() function from the re module.

str01 = "abc\tcbd\nxyz"
str02 = str01.replace("\t", "-")  # The first argument is the target substring and the second is its replacement.
# But a single replace() call can only replace one target. The \n in the example above cannot be handled in the same call.
print(str02)

# Replace characters in a string with the regular expression function sub().
# First define the matching rule (the regular expression), then define what to replace the matches with.
# Finally, pass the string to be processed to sub() as the third argument.
str03 = sub("[\t\n]", "-", str01)
print(str03)

The output results are:

abc-cbd
xyz
abc-cbd-xyz
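When only a few targets are involved, replace() calls can be chained instead of using a regular expression. The sketch below also shows the optional count argument; the sample values are illustrative.

```python
text = "abc\tcbd\nxyz"

# Each replace() call handles one target, so calls are chained for several targets.
chained = text.replace("\t", "-").replace("\n", "-")

# The optional third argument limits how many occurrences are replaced.
first_a_only = "banana".replace("a", "A", 1)

print(chained)        # abc-cbd-xyz
print(first_a_only)   # bAnana
```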


Use the string translate() method to remove multiple characters:
The difference between maketrans() and translate() in Python 2.x and 3.x is briefly described below:
In Python 3.x, strings are divided into byte strings (bytes) and text strings (str), and because both are immutable, there is also a mutable byte string type (bytearray). In 2.x, many functions in the string module duplicated methods of str and unicode, so using the string module for these operations is discouraged in 3.x.

"""
The maketrans() function creates a character mapping table, which is used to convert source characters to target characters.
The maketrans() function receives two arguments:
The first argument is a string containing the characters to be converted, that is, the source characters.
The second argument is also a string, containing the characters to convert to, that is, the target characters.
The two strings must be of equal length so that the characters correspond one to one.
maketrans() is also available on other types such as bytes.
"""
intab = "ABCD"
outtab = "1234"
strswitchtab = str.maketrans(intab, outtab)

# translate() converts the string it is called on according to the mapping table returned by maketrans().
textstr = "This A is B one C section D B Test"
textstr = textstr.translate(strswitchtab)
print(textstr)

"""
In Python 3.0+, if you want to delete characters, define them in the third argument of maketrans().
If you do not want to replace any other characters, pass empty strings as the first two arguments and put the characters to delete in the third.
Note that translate() does not treat the third argument as a "word": it works letter by letter.
Every character you list is compared with each character in the string, and matching characters are deleted.
"""
textstr = "This A is B one C section D Test B DACBB"
strdeltab = str.maketrans("", "", "DACBB")
textstr = textstr.translate(strdeltab)  # The spaces around the deleted letters remain.
print(textstr)

The output results are:

This 1 is 2 one 3 section 4 2 Test
This  is  one  section  Test
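Mapping and deleting can also be combined in one table. The sketch below uses made-up sample data and shows both the three-argument form and the single-dict form of str.maketrans().

```python
# Map A->1 and B->2 while deleting every "-" in a single pass (illustrative data).
table = str.maketrans("AB", "12", "-")
converted = "A-B-C".translate(table)

# maketrans() also accepts a single dict; a value of None deletes that character.
dict_table = str.maketrans({"A": "1", "B": "2", "-": None})
same_result = "A-B-C".translate(dict_table)

print(converted)     # 12C
print(same_result)   # 12C
```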


String types:
One of the most important new features of Python 3 is the very clear distinction between text and binary data. Text is always Unicode and is represented by the str type. Binary data is represented by the bytes type. Python 3 never mixes str and bytes implicitly: for example, you cannot concatenate a string with a bytes object, nor can you search for a string inside a bytes object.
However, strings can be encoded into bytes, and bytes can be decoded into strings.

text = "一段中文字符"  # "A paragraph of Chinese characters"
text = text.encode("UTF-8")  # Encode the str string into a bytes object.
print(text)
text = text.decode("UTF-8")  # Then decode the bytes object back into a str string.
print(text)

The output results are:

b'\xe4\xb8\x80\xe6\xae\xb5\xe4\xb8\xad\xe6\x96\x87\xe5\xad\x97\xe7\xac\xa6'
一段中文字符

The encoding (UTF-8 above) is a vital part of this conversion. A bytes object is just a sequence of bits, and it is the encoding that gives those bits meaning, so the same bytes mean different things under different encodings.
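The following sketch illustrates both points: the same text produces different bytes under different encodings, and str and bytes cannot be mixed directly. The sample text and the choice of GBK as the second encoding are illustrative.

```python
text = "中文"   # two Chinese characters, chosen only for illustration

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# The same text occupies a different number of bytes in each encoding.
print(len(utf8_bytes), len(gbk_bytes))   # 6 4

# Decoding with the codec that was used for encoding restores the original text.
print(utf8_bytes.decode("utf-8") == text)   # True

# str and bytes cannot be concatenated directly.
try:
    text + utf8_bytes
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)   # False
```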


The meaning of the letter prefix before a string:

"""
Adding u before a string indicates that the string is Unicode.
English characters are parsed correctly under most encodings, so u is usually omitted for them, but for a Chinese string the encoding had to be stated explicitly,
otherwise the text would be garbled when transcoded.
(This matters in Python 2. In Python 3 every str is already Unicode, so the u prefix is optional.)
Note that UTF-8 is one way of implementing Unicode.
"""
str01 = u"This is a string"
print(str01)
"""
Adding r before a string makes it a raw string, in which escapes are not processed.
A string may contain escape sequences such as "\n" and "\t", but if the string starts with r,
then all characters in the string are treated as ordinary characters.
"""
str02 = r"This is a \n string"
print(str02)
"""
Adding b before a string makes it a bytes (binary) literal.
"""
byte = b"testcode"
print(byte)
print(type(byte))

The output results are:

This is a string
This is a \n string
b'testcode'
<class 'bytes'>
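A quick way to see what the prefixes change is to compare lengths and types. This is a minimal sketch with illustrative values, assuming Python 3.

```python
normal = "a\nb"    # \n is a single newline character
raw = r"a\nb"      # in a raw string, the backslash and the n are two characters

print(len(normal))        # 3
print(len(raw))           # 4
print(raw == "a\\nb")     # True

# In Python 3 the u prefix makes no difference, while b produces a bytes object.
print(u"text" == "text")  # True
print(type(b"text"))      # <class 'bytes'>
```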