import urllib.request
import re
# <source src="http://ocs.maiziedu.com/55ca5753cdf0403eb6b700d81dc5a896.mp4" type='video/mp4'/>
# res = urllib.request.urlopen('http://www.maiziedu.com/course/qrsqd/6-164/')
# html = res.read().decode('utf-8')
# decode turns bytes into str; encode turns str into bytes.
# To convert between encodings, first decode (for example from UTF-8), then encode (for example to GBK).
# Python represents strings internally as Unicode, so encoding conversion normally uses Unicode as the intermediate form:
# first decode the string from its original encoding into Unicode, then encode from Unicode into the target encoding.
# decode converts a string in some other encoding to Unicode; for example, str1.decode('gb2312') converts a gb2312-encoded string to Unicode.
# encode converts Unicode into a string in some other encoding; for example, str2.encode('gb2312') converts a Unicode string to gb2312.
# s.decode("utf-8", "ignore") skips bytes that cannot be decoded and keeps only the valid ones.
# s.decode("utf-8", "replace") substitutes a replacement character for bytes that cannot be decoded, which makes encoding problems visible at a glance.
# str.encode() and bytes(s, encoding) convert a string to its raw bytes form, creating a bytes object from a str in the process.
# bytes.decode() and str(b, encoding) convert raw bytes to string form, creating a str from a bytes object in the process.
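The decode-then-encode round trip described in the comments above can be sketched as follows. This is a minimal Python 3 illustration; the sample text and the `broken` byte string are made up for the demonstration.

```python
# Transcode UTF-8 bytes to GBK by going through str (Unicode).
text_in = "\u4e2d\u6587 abc"               # two Chinese characters plus ASCII
utf8_bytes = text_in.encode("utf-8")
text = utf8_bytes.decode("utf-8")          # bytes -> str
gbk_bytes = text.encode("gbk")             # str -> bytes in the target encoding

# bytes(s, encoding) and str(b, encoding) are alternative spellings.
same_bytes = bytes(text, "gbk")
same_text = str(gbk_bytes, "gbk")

# b"\xff" is never valid UTF-8, so a strict decode would raise UnicodeDecodeError.
broken = b"ab\xffcd"
ignored = broken.decode("utf-8", "ignore")    # invalid byte dropped
replaced = broken.decode("utf-8", "replace")  # invalid byte becomes U+FFFD

print(len(utf8_bytes), len(gbk_bytes))
print(ascii(ignored), ascii(replaced))
```

The "replace" handler is usually the more useful one while debugging, because the U+FFFD marker shows where the bad bytes were.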
1. Common built-in functions (usable directly, without import)
help(obj) shows online help; obj can be of any type
callable(obj) checks whether obj can be called like a function
repr(obj) returns a string representation of obj; passing that string to eval can often rebuild a copy of the object
eval(str) evaluates str as a Python expression and returns the result
dir(obj) lists the names visible in obj's namespace
hasattr(obj, name) checks whether name exists in obj's namespace
getattr(obj, name) gets the value bound to name in obj's namespace
setattr(obj, name, value) binds name in obj's namespace to the object value
delattr(obj, name) removes name from obj's namespace
vars(obj) returns obj's namespace as a dictionary
locals() returns the local namespace as a dictionary
globals() returns the global namespace as a dictionary
type(obj) returns the type of obj
isinstance(obj, cls) checks whether obj is an instance of cls
issubclass(subcls, supcls) checks whether subcls is a subclass of supcls
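A short sketch of the namespace functions listed above, written for Python 3. The `Point` class is hypothetical and exists only for this illustration.

```python
class Point(object):
    """A hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

had_x = hasattr(p, "x")          # the name x exists in p's namespace
setattr(p, "z", 3)               # bind a new name
z_value = getattr(p, "z")
delattr(p, "z")                  # and remove it again
has_z = hasattr(p, "z")

namespace = vars(p)              # the instance namespace as a dictionary
rebuilt = eval(repr([1, 2, 3]))  # repr output that eval can turn back into a copy

print(had_x, z_value, has_z)
print(namespace)
print(isinstance(p, Point), issubclass(bool, int), callable(len), callable(p))
```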
Type conversion functions
chr(i) turns an ASCII code into a character
ord(c) turns a character (ASCII or Unicode) into its integer code
oct(x) turns the integer x into its octal string representation
hex(x) turns the integer x into its hexadecimal string representation
str(obj) returns a string description of obj
list(seq) converts a sequence into a list
tuple(seq) converts a sequence into a tuple
dict(), dict(list) build a dictionary
int(x) converts x to an integer
long(x) converts x to a long integer (Python 2 only; Python 3 has a single int type)
float(x) converts x to a floating-point number
complex(x) converts x to a complex number
max(...) finds the maximum value
min(...) finds the minimum value
bytes('qq', encoding='utf8') returns the bytes form of the string, b'qq'
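The conversions above can be tried directly. The sketch below assumes Python 3, where oct() uses the 0o prefix and long() no longer exists.

```python
letter = chr(65)                       # code point -> character
code = ord("A")                        # character -> code point
octal = oct(8)                         # Python 3 spelling: '0o10'
hexa = hex(255)

as_list = list("abc")
as_tuple = tuple([1, 2])
as_dict = dict([("a", 1), ("b", 2)])   # list of pairs -> dictionary

number = int("42")
real = float("3.5")
z = complex("1+2j")
raw = bytes("qq", encoding="utf8")

print(letter, code, octal, hexa)
print(as_list, as_tuple, as_dict)
print(number, real, z, raw, max(3, 1, 2), min(3, 1, 2))
```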
Built-in functions for executing programs
compile: if a piece of code is run often, compiling it once and then running the compiled object is faster.
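A minimal sketch of compiling once and running many times, assuming Python 3. The source strings and the "<expr>" and "<stmt>" file names are placeholders chosen for the example.

```python
# Compile an expression once, then evaluate it repeatedly with different inputs.
expr = compile("x * 2 + 1", "<expr>", "eval")
results = [eval(expr, {"x": x}) for x in range(3)]

# Statements use "exec" mode; names they assign end up in the supplied namespace.
stmt = compile("total = sum(values)", "<stmt>", "exec")
namespace = {"values": [1, 2, 3]}
exec(stmt, namespace)

print(results)
print(namespace["total"])
```

The saving comes from parsing the source only once; for a tiny expression like this it is small, so measure before relying on it.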
2. Operating system-related calls
System information module: import sys
sys.argv is a list that contains all the command-line arguments.
sys.stdout, sys.stdin, and sys.stderr are the file objects for standard output, standard input, and standard error.
sys.stdin.readline() reads a line from standard input; sys.stdout.write("a") prints a to the screen
sys.exit(exit_code) exits the program
sys.modules is a dictionary of all the modules that have been loaded
sys.platform identifies the operating system the program is running on
sys.path is a list of all the paths searched for modules and packages.
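A small sketch of how these pieces usually fit together in a script, assuming Python 3. The `main` function and the "demo.py" argument list are hypothetical stand-ins; a real script would pass sys.argv.

```python
import sys


def main(argv):
    """Echo the command-line arguments; argv[0] is the script name."""
    args = argv[1:]
    if not args:
        sys.stderr.write("no arguments given\n")
        return 1
    sys.stdout.write("arguments: %s\n" % " ".join(args))
    return 0


exit_code = main(["demo.py", "a", "b"])   # stand-in for main(sys.argv)
print(sys.platform)
print("sys" in sys.modules, isinstance(sys.path, list))
```

In a standalone script the last step would typically be sys.exit(exit_code), so the shell sees the status.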
Operating system calls and operations: import os
os.environ is a dictionary of environment variables; os.environ["HOME"] gets the value of the environment variable HOME
os.chdir(dir) changes the current directory; in os.chdir('d:\\outlook') note that the backslash must be escaped on Windows
os.getcwd() gets the current directory
os.getegid() gets the effective group ID; os.getgid() gets the group ID
os.getuid() gets the user ID; os.geteuid() gets the effective user ID
os.setegid(), os.seteuid(), os.setuid() set the corresponding IDs
os.getgroups() gets the list of group IDs the user belongs to
os.getlogin() gets the user's login name
os.getenv gets an environment variable
os.putenv sets an environment variable
os.umask sets the umask
os.system(cmd) runs the command cmd through the system shell
Examples of operations:
os.mkdir('/tmp/xx')
os.system("echo 'hello' > /tmp/xx/a.txt")
os.listdir('/tmp/xx')
os.rename('/tmp/xx/a.txt', '/tmp/xx/b.txt')
os.remove('/tmp/xx/b.txt')
os.rmdir('/tmp/xx')
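The same create, list, rename, and remove sequence can be sketched portably, assuming Python 3. The sketch swaps the fixed /tmp/xx path for a directory from tempfile.mkdtemp and writes the file with open() instead of the shell echo, so it also runs on Windows.

```python
import os
import tempfile

base = tempfile.mkdtemp()                 # private scratch directory
work = os.path.join(base, "xx")
os.mkdir(work)

a_txt = os.path.join(work, "a.txt")
b_txt = os.path.join(work, "b.txt")
with open(a_txt, "w") as f:               # replaces os.system("echo ...")
    f.write("hello\n")

before = os.listdir(work)
os.rename(a_txt, b_txt)
after = os.listdir(work)

os.remove(b_txt)                          # clean up in reverse order
os.rmdir(work)
os.rmdir(base)

print(before, after)
```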
Write a simple shell in Python
#!/usr/bin/python
import os, sys
cmd = sys.stdin.readline()
while cmd:
    os.system(cmd)
    cmd = sys.stdin.readline()
Writing platform-independent programs with os.path
os.path.abspath("1.txt") == os.path.join(os.getcwd(), "1.txt")
os.path.split(os.getcwd()) splits a path into its directory part and its final file-name part.
os.path.join(os.getcwd(), os.pardir, 'a', 'a.doc') joins the components into a full path name.
os.pardir is the string that denotes the parent directory on the current platform: '..'
os.path.getctime("/root/1.txt") returns the ctime timestamp of 1.txt (creation time on Windows; time of the last metadata change on Unix)
os.path.exists(os.getcwd()) checks whether a path exists
os.path.expanduser('~/dir') expands ~ to the user's home directory
os.path.expandvars('$PATH') expands environment variables, here PATH
os.path.isfile(os.getcwd()) checks whether the path is a file; returns True if so, False otherwise
os.path.isdir('c:\\Python26\\temp') checks whether the path is a directory; returns True if so, False otherwise
os.path.islink('/home/huaying/111.sql') checks whether the path is a symbolic link; not available on Windows
os.path.ismount(os.getcwd()) checks whether the path is a file system mount point; not available on Windows
os.path.samefile(os.getcwd(), '/home/huaying') checks whether two file names refer to the same file
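A brief sketch of the os.path helpers working together, assuming Python 3. os.path.normpath and os.path.dirname are not mentioned in the notes; they are used here only to collapse and check the '..' component.

```python
import os

cwd = os.getcwd()
joined = os.path.join(cwd, "1.txt")
absolute = os.path.abspath("1.txt")       # relative name resolved against cwd

head, tail = os.path.split(joined)        # directory part, file-name part
parent = os.path.normpath(os.path.join(cwd, os.pardir))
home = os.path.expanduser("~")

print(absolute == joined)
print(tail)
print(parent == os.path.dirname(cwd))
print(os.path.exists(cwd), os.path.isdir(cwd), os.path.isfile(cwd))
```

Building paths with os.path.join and os.pardir rather than hard-coded "/" or "\\" is what keeps the program platform independent.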
os.path.walk('/home/huaying', test_fun, "a.c")
Traverses /home/huaying and all of its subdirectories, calling test_fun for each directory. (os.path.walk exists only in Python 2; Python 3 removed it in favor of os.walk.)
Example: search a directory and all of its subdirectories for files or directories named a.c.
def test_fun(filename, dirname, names):  # filename is the "a.c" passed to walk; dirname is the directory being visited
    if filename in names:  # names is a list of everything in the dirname directory
        print(os.path.join(dirname, filename))
os.path.walk('/home/huaying', test_fun, "a.c")
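Because os.path.walk is a Python 2 function, here is a sketch of the same search with os.walk for Python 3. The helper name `find_name` and the temporary directory tree are made up for the illustration.

```python
import os
import tempfile


def find_name(top, target):
    """Return paths under top whose final component equals target."""
    matches = []
    for dirname, dirnames, filenames in os.walk(top):
        # dirnames and filenames together list everything in dirname
        if target in dirnames or target in filenames:
            matches.append(os.path.join(dirname, target))
    return matches


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "src", "lib"))
    for rel in ("a.c", os.path.join("src", "lib", "a.c"), os.path.join("src", "b.c")):
        open(os.path.join(top, rel), "w").close()
    found = sorted(os.path.relpath(p, top) for p in find_name(top, "a.c"))

print(found)
```

os.walk yields one (dirname, dirnames, filenames) triple per directory, so no callback function is needed.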
File operations
Open a file
f = open("filename", "r")    modes: r read-only, w write, r+ read-write, rb read binary, wb write binary, a append
Read and write files
f.write("a"), f.write(str) write a string; f.writelines() is the counterpart of f.readlines() described below
f.read() reads everything; f.read(size) reads size characters from the file
f.readline() reads one line and returns an empty string at the end of the file. f.readlines() reads everything and returns a list in which each element is one line, including its trailing "\n"
f.tell() returns the current read position in the file
f.seek(off, where) sets the file's read-write position. off is the offset: a positive number moves toward the end of the file, a negative number toward the beginning.
where is 0 to count from the beginning, 1 to count from the current position, and 2 to count from the end.
f.flush() flushes the buffer
Close the file
f.close()
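The calls above can be exercised on a scratch file. This is a sketch assuming Python 3; it uses tempfile.mkstemp for the file name and opens the file in binary mode for the end-relative seek, since Python 3 text files only accept seeks from the start.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "w", newline="\n") as f:
    f.write("first\n")
    f.writelines(["second\n", "third\n"])

with open(path, "r") as f:
    first = f.readline()        # one line, newline included
    position = f.tell()         # position just after the first line
    rest = f.readlines()        # remaining lines as a list
    eof = f.readline()          # empty string at end of file
    f.seek(0)                   # back to the beginning
    head = f.read(5)
    f.seek(position)            # return to a position saved with tell()
    again = f.readline()

with open(path, "rb") as f:
    f.seek(-6, 2)               # 6 bytes before the end (where=2)
    last = f.read()

os.remove(path)
print(repr(first), rest, repr(eof), head, repr(again), last)
```

The with statement closes the file automatically, which replaces the explicit f.close() call.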
Regular expressions: import re
Simple regexp
p = re.compile("abc")
if p.match("abc"): print("match")
In the example above, a pattern is compiled first; if it matches a string, a match object is returned.
Apart from some special characters (metacharacters), most characters match themselves.
These special characters are: . ^ $ * + ? { [ ] \ | ( )
Character set (denoted by [])
List the characters, for example [abc] matches a or b or c. Most metacharacters simply match themselves inside []. Example:
a = ".^$*+?{\\|()"    # most metacharacters match themselves inside [], but ^ [ ] \ behave differently
p = re.compile("[" + a + "]")
for i in a:
    if p.match(i):
        print("[%s] is match" % i)
    else:
        print("[%s] is not match" % i)
To include [ or ] itself inside [], so that "[" or "]" is matched, write \[ and \].
^ at the beginning of [] means negation. [^abc] matches every character except a, b, c. A ^ that is not at the beginning matches itself.
- denotes a range. [a-zA-Z] matches any English letter. [0-9] matches any digit.
Handy uses of \ inside []
\d    [0-9]
\D    [^0-9]
\s    [ \t\n\r\f\v]
\S    [^ \t\n\r\f\v]
\w    [a-zA-Z0-9_]
\W    [^a-zA-Z0-9_]
\t matches a tab; the other escapes follow the same notation as in strings
\x20 matches the character with hexadecimal ASCII code 0x20
With \, any character can be written inside []. Note: a lone "." outside [] matches any character except \n, like [^\n].
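A few quick checks of the character-set rules above, assuming Python 3. The sample strings are invented for the demonstration.

```python
import re

negated = re.findall(r"[^abc]", "abcde")          # everything except a, b, c
letters = re.findall(r"[a-zA-Z]+", "foo 42 Bar")  # ranges
digits = re.findall(r"\d+", "foo 42 Bar 7")       # \d is shorthand for [0-9]
brackets = re.findall(r"[\[\]]", "a[1]")          # escaped [ and ] inside a set
caret = re.findall(r"[a^]", "a^b")                # ^ not at the start is literal
spaces = re.findall(r"\x20", "a b c")             # hex escape for the space
dots = re.findall(r".", "a\nb")                   # . skips the newline

print(negated, letters, digits)
print(brackets, caret, spaces, dots)
```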
Repetition in regexps
{m,n} means at least m and at most n occurrences (both inclusive). For example, ab{1,3}c matches abc, abbc, and abbbc, but not ac or abbbbc.
m is the lower bound and n is the upper bound. If m is omitted the lower bound is 0; if n is omitted the upper bound is infinite.
* means {0,}, + means {1,}, ? means {0,1}
Greedy and minimal matching: Python matches greedily by default; to get a minimal match, add a ? after *, +, ?, or {m,n}.
The match object's end() returns the position just after the last matched character.
re.compile("a*").match('aaaa').end()     returns 4: greedy match
re.compile("a*?").match('aaaa').end()    returns 0: minimal match
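The repetition rules can be checked in a few lines, assuming Python 3. The candidate strings and the "<a><b>" sample are made up for the illustration.

```python
import re

bounded = re.compile(r"ab{1,3}c")
candidates = ["ac", "abc", "abbc", "abbbc", "abbbbc"]
accepted = [s for s in candidates if bounded.match(s)]

greedy_end = re.compile("a*").match("aaaa").end()     # takes as much as possible
minimal_end = re.compile("a*?").match("aaaa").end()   # takes as little as possible

greedy_tag = re.match(r"<.*>", "<a><b>").group()
minimal_tag = re.match(r"<.*?>", "<a><b>").group()

print(accepted)
print(greedy_end, minimal_end)
print(greedy_tag, minimal_tag)
```

The tag example shows where the difference matters in practice: the greedy form runs to the last ">" in the string, the minimal form stops at the first.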
Using raw strings
Ordinary string syntax uses \\ to represent the character \. Heavy use of it hurts readability.
Workaround: prefix the string with r to mark it as a raw string.
a = r"\a"; print(a) prints \a
a = r"\"a"; print(a) prints \"a
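A small comparison of the escaped and raw spellings, assuming Python 3. The decimal-number pattern and the sample sentence are invented for the example.

```python
import re

escaped = "\\d+\\.\\d+"      # every backslash doubled
raw = r"\d+\.\d+"            # same pattern as a raw string

same = escaped == raw
newline_len = len("\n")      # one character: a newline
raw_len = len(r"\n")         # two characters: backslash and n

numbers = re.findall(raw, "pi is 3.14, e is 2.72")

print(same, newline_len, raw_len)
print(numbers)
```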
Using the re module
First use re.compile to get a RegexObject that represents a regexp
Then call the pattern's match or search method to get a MatchObject
Then use the MatchObject to get the match position, the matched string, and other information
Commonly used RegexObject functions:
>>> re.compile("a").match("abab")    # the beginning of abab matches re.compile("a"), so a MatchObject is returned
<_sre.SRE_Match object at 0x81d43c8>
>>> print(re.compile("a").match("bbab"))
None    # note: match starts at the beginning of the string
>>> re.compile("a").search("abab")    # searches abab for the first part that matches re_obj
<_sre.SRE_Match object at 0x81d43c8>
>>> print(re.compile("a").search("bbab"))
<_sre.SRE_Match object at 0x8184e18>    # unlike match(), search() need not match from the beginning
re_obj.findall(str) returns all the parts of str that match re_obj.
The result is a list whose elements are the matched strings.
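The difference between match, search, and findall can be sketched as follows, assuming Python 3 (where the match object's printed form differs from the Python 2 output in the session above).

```python
import re

pattern = re.compile("a")

at_start = pattern.match("abab")        # matches at position 0
no_match = pattern.match("bbab")        # match only tries position 0
found = pattern.search("bbab")          # search scans forward
everything = pattern.findall("abab")    # all non-overlapping matches

print(at_start.span())
print(no_match)
print(found.span())
print(everything)
```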
Common functions of MatchObject
m.start() returns the start position; m.end() returns the end position (the character at that position is not included).
m.span() returns the tuple (m.start(), m.end())
m.pos, m.endpos, m.re, m.string (these are attributes, not methods)
m.re.search(m.string, m.pos, m.endpos) yields a match equivalent to m itself
re_obj.finditer() returns an iterator for traversing all the MatchObjects found.
for m in re.compile("[ab]").finditer("tatbxaxb"):
    print(m.span())
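A sketch of the MatchObject members on the same "tatbxaxb" string, assuming Python 3.

```python
import re

pattern = re.compile("[ab]")
m = pattern.search("tatbxaxb")

start, end = m.start(), m.end()
text = m.group()                    # the matched substring

# pos and endpos default to the whole string; re and string point back
# to the pattern and the subject that produced the match.
again = m.re.search(m.string, m.pos, m.endpos)

spans = [found.span() for found in pattern.finditer("tatbxaxb")]

print(start, end, text, m.pos, m.endpos)
print(again.span() == m.span())
print(spans)
```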
Advanced regexp
| denotes the union of several regexps. If A and B are two regexps, A|B matches A or matches B.
^ matches only at the beginning of a line; ^ has this special meaning only at the beginning of the pattern.
$ matches only at the end of a line
\A matches only at the beginning of the string, whereas ^ (with the MULTILINE flag) matches at the beginning of each line
\Z matches only at the end of the string, whereas $ (with the MULTILINE flag) matches at the end of each line
\b matches only at a word boundary. Example: \binfo\b matches "info" but not "information"
\B matches at a non-word boundary
Examples:
>>> print(re.compile(r"\binfo\b").match("info"))    # raw string: \b is a word boundary
<_sre.SRE_Match object at 0x817aa98>
>>> print(re.compile("\binfo\b").match("info"))    # no raw string: \b is the backspace character
None
>>> print(re.compile("\binfo\b").match("\binfo\b"))
<_sre.SRE_Match object at 0x8174948>
Grouping example: re.compile("(a(b)c)d").match("abcd").groups() returns ('abc', 'b')
#!/usr/local/bin/python
import re
x = """
name:Charles
address:BUPT

name:Ann
address:BUPT
"""
#p = re.compile(r"^name:(.*)\n^address:(.*)\n", re.M)
p = re.compile(r"^name:(?P<name>.*)\n^address:(?P<address>.*)\n", re.M)
for m in p.finditer(x):
    print(m.span())
    print("here is your friends list")
    print("%s, %s" % m.groups())
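Named groups can also be read back by name, which is the point of the (?P<...>) syntax. The sketch below assumes Python 3; the group names `name` and `address` are an assumption, since the group names did not survive in the listing above.

```python
import re

x = """
name:Charles
address:BUPT

name:Ann
address:BUPT
"""

# Group names "name" and "address" are assumed for this illustration.
p = re.compile(r"^name:(?P<name>.*)\n^address:(?P<address>.*)\n", re.M)

friends = [(m.group("name"), m.group("address")) for m in p.finditer(x)]
first = p.search(x).groupdict()     # all named groups as a dictionary

nested = re.compile("(a(b)c)d").match("abcd").groups()

print(friends)
print(first)
print(nested)
```

Groups are numbered by the position of their opening parenthesis, which is why the outer group 'abc' comes before the inner 'b'.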
Compile flags
When using re.compile to get a RegexObject, flags can be passed to adjust its detailed behavior.
DOTALL, S       lets . match any character, including the line break \n
IGNORECASE, I   ignores case
LOCALE, L       makes \w \W \b \B follow the current locale
MULTILINE, M    multi-line mode; only affects ^ and $ (see the example above)
VERBOSE, X      verbose mode
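A sketch of what the flags change, assuming Python 3. The year-month pattern is a made-up example of verbose mode, in which whitespace and # comments inside the pattern are ignored.

```python
import re

dot_default = re.match(r"a.c", "a\nc")          # . does not match \n
dot_all = re.match(r"a.c", "a\nc", re.S)        # with DOTALL it does
ignore_case = re.match(r"abc", "ABC", re.I)
multi = re.findall(r"^b$", "a\nb\nc", re.M)     # ^ and $ apply per line

date = re.compile(r"""
    (\d{4})    # year
    -
    (\d{2})    # month
""", re.X)
parts = date.match("2024-05").groups()

# Flags combine with the | operator.
combined = re.findall(r"^x", "X\nx", re.I | re.M)

print(dot_default, bool(dot_all), bool(ignore_case))
print(multi, parts, combined)
```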