Python linecache and glob modules
Today I learned about two interesting modules: linecache and glob.
linecache Module
Python has a handy module, linecache, which lets you fetch any line from any file and uses a cache to speed this up. It suits the common case of reading multiple lines from a single file.
# As the name linecache suggests, this module is built around a cache.
# linecache reads the file into the cache, so later accesses do not need to read the file from the hard disk again.
# It is therefore often used for files that are read frequently.
See also: open()
linecache provides the following functions:
linecache.getlines(filename, module_globals=None)
Returns the whole content of the file named filename as a list, with one element per line of the file; line number linenum is stored at index linenum - 1 of the list.
linecache.getline(filename, lineno, module_globals=None)
Returns line lineno from the file named filename. This function never throws an exception: on error it returns '' (an empty string). The trailing newline is included in the line that is returned.
If the file is not found and filename is a relative name, the function also looks for it in the directories listed in sys.path.
linecache.clearcache()
Clears the cache. Use this when you no longer need the lines previously obtained with getline().
linecache.checkcache(filename=None)
Checks the validity of the cache. Use this function if files in the cache may have changed on the hard disk and you need the updated version. If filename is omitted, all entries in the cache are checked.
linecache.updatecache(filename, module_globals=None)
Updates the cache entry for the file named filename. If the file has been modified, you can use this function to refresh the list returned by linecache.getlines(filename).
The following is an example:
    import linecache
    import pprint

    # Create a test file
    filename = 'linecacheTest.txt'
    with open(filename, 'w') as myfile:
        for i in range(1, 5):
            myfile.write('This is the ' + str(i) + 'th line\n')

    # Get all lines
    pprint.pprint(linecache.getlines(filename))
    # Get a single line
    pprint.pprint(linecache.getline(filename, 3))
    # Get lines 3 and 4
    pprint.pprint(linecache.getlines(filename)[2:4])
    # Release the cache
    linecache.clearcache()
The result is:
    ['This is the 1th line\n',
     'This is the 2th line\n',
     'This is the 3th line\n',
     'This is the 4th line\n']
    'This is the 3th line\n'
    ['This is the 3th line\n', 'This is the 4th line\n']
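The example above only reads lines that exist. As a minimal sketch of the "never throws an exception" behaviour described for getline, the following asks for a line past the end of a file and for a line from a file that does not exist. The file names sample.txt and no_such_file.txt and the temporary directory are hypothetical, chosen only for illustration.

```python
import linecache
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical three-line file used only for this illustration.
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w") as f:
        f.write("first\nsecond\nthird\n")

    # A line that exists: returned with its trailing newline.
    existing = linecache.getline(path, 2)
    # A line number past the end of the file: no exception is raised.
    out_of_range = linecache.getline(path, 99)
    # A file that does not exist: again no exception is raised.
    missing_file = linecache.getline(os.path.join(tmp_dir, "no_such_file.txt"), 1)

print(repr(existing))
print(repr(out_of_range))
print(repr(missing_file))

# Drop the cached lines now that they are no longer needed.
linecache.clearcache()
```

Because errors are reported as an empty string, a caller that must tell "blank result" apart from "file missing" would need its own check, for example with os.path.exists().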
Note: after linecache.getlines(filename) has read a file, the content stays in the cache. If the file then changes on disk, calling linecache.getlines(filename) again still returns the previously cached content, not the latest content of the file. There are two ways to get the latest content:
1. Call linecache.checkcache(filename) to refresh the cache, then call linecache.getlines(filename) again;
2. Call linecache.updatecache(filename) directly to obtain the latest content of the file.
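The stale-cache behaviour and both remedies can be sketched as follows. This is an illustration under assumptions: the file name a.txt in a temporary directory is hypothetical, and relying on updatecache() to return the freshly read list reflects how CPython implements it rather than a documented guarantee. The rewritten file deliberately has a different size so that checkcache() can detect the change.

```python
import linecache
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("old line\n")
    first_read = linecache.getlines(path)   # reads the file and fills the cache

    # Change the file on disk; the new content has a different size.
    with open(path, "w") as f:
        f.write("new line 1\nnew line 2\n")

    # Still served from the cache, so the change is not visible yet.
    stale_read = linecache.getlines(path)

    # Method 1: validate the cache entry, then read again.
    linecache.checkcache(path)
    after_check = linecache.getlines(path)

    # Change the file once more to show method 2.
    with open(path, "w") as f:
        f.write("newest\n")

    # Method 2: force a re-read of this file.
    after_update = linecache.updatecache(path)

print(stale_read)
print(after_check)
print(after_update)

linecache.clearcache()
```

Method 1 only re-reads entries whose size or modification time has changed, while method 2 re-reads the named file unconditionally.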
In addition:
1) Once you have finished reading a file and no longer need its cached content, call linecache.clearcache() at the end to clear the cache and release the memory.
2) This module caches file content in memory, so it consumes memory. How large a file you can open, and how fast, depends on how much memory you have.
glob Module
glob is short for globbing, that is, wildcard pattern matching. This module finds file path names that match a given pattern.
Common wildcards include the following:
*        matches everything
?        matches any single character
[seq]    matches any character in seq
[!seq]   matches any character not in seq
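The four wildcards can be tried without touching the file system. The sketch below uses the standard fnmatch module, which is a swapped-in stand-in here: it applies the same wildcard rules to plain strings. The list of names and the helper match() are hypothetical and exist only for this illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["a1.txt", "ab.txt", "abc.txt", "b2.py", "notes.md"]


def match(pattern):
    """Return the names that match pattern, compared case-sensitively."""
    return [n for n in names if fnmatchcase(n, pattern)]


star = match("*.txt")           # * matches any run of characters
question = match("a?.txt")      # ? matches exactly one character
in_seq = match("[ab][0-9].*")   # [seq] matches one character listed in seq
not_in_seq = match("[!a]*")     # [!seq] matches one character not in seq

print(star)
print(question)
print(in_seq)
print(not_in_seq)
```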
The glob module provides the following functions:
glob.glob(pathname)
Returns a list of all matching file paths. Its pathname parameter defines the matching rule; it can be an absolute or a relative path and may contain wildcards.
glob.iglob(pathname)
Returns an iterator that yields the matching file paths one by one. The difference from glob.glob() is that glob.glob() collects all matching paths at once, while glob.iglob() produces only one matching path at a time, so it is generally used to process the paths in a loop.
glob.escape(pathname)
Escapes all wildcard characters. If a file name itself contains wildcard characters and you do not want to escape them one by one by hand, use this function so that they are matched literally.
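As a minimal sketch of why glob.escape() matters, consider a file whose name really contains square brackets. The file names data1.txt and data[1].txt and the temporary directory are hypothetical. Used as a pattern, data[1].txt means "data, then one character from the set [1], then .txt", so it matches data1.txt rather than the file with brackets in its name.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Two hypothetical files; the second has wildcard characters in its name.
    for name in ("data1.txt", "data[1].txt"):
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("x\n")

    # Escape the directory too, in case its path contains wildcard characters.
    safe_dir = glob.escape(tmp_dir)
    literal_name = "data[1].txt"
    escaped_name = glob.escape(literal_name)

    # Unescaped: [1] is read as a character set.
    as_pattern = [os.path.basename(p)
                  for p in glob.glob(os.path.join(safe_dir, literal_name))]
    # Escaped: the brackets are matched literally.
    as_literal = [os.path.basename(p)
                  for p in glob.glob(os.path.join(safe_dir, escaped_name))]

print(escaped_name)
print(as_pattern)
print(as_literal)
```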
The following is an example:
    >>> import glob
    >>> print(glob.glob('a*.*'))
    ['autohomehtml.html', 'autohomeparser.py', 'autotemp.txt', 'autotempfile1.txt']
    >>> print(glob.glob('*.py'))
    ['autohomeparser.py', 'beautifulsouptest.py', 'collectionstest.py', 'itertoolstest.py', 'linecachetest.py', 'linecachetest_forBlog.py', 'lxmltest.py', 'myre.py', 'pyqttest.py', 'requeststest.py', 'tablibtest.py', 'timeittest.py', 'urllibtest.py']
    >>> print(glob.glob('*[0-9]*.*'))
    ['autotempfile1.txt']
    >>> print(glob.glob('*.txt'))
    ['autotemp.txt', 'autotempfile1.txt', 'linecachetest.txt', 'linecachetext.txt', 'mypage.txt', 'Install scrapy.txt']
    >>> for i in glob.iglob('*.py'):
            print(i)

    autohomeParser.py
    beautifulSoupTest.py
    collectionsTest.py
    itertoolsTest.py
    linecacheTest.py
    ...
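The difference between glob.glob() and glob.iglob() described earlier can also be checked in a self-contained way. This is a sketch under assumptions: the files a.py, b.py and c.txt in a temporary directory are hypothetical, and the results are sorted because the order in which matches come back is not guaranteed.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical files used only for this illustration.
    for name in ("a.py", "b.py", "c.txt"):
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write("")

    pattern = os.path.join(glob.escape(tmp_dir), "*.py")

    all_at_once = glob.glob(pattern)   # a complete list, built immediately
    lazy = glob.iglob(pattern)         # an iterator, consumed one path at a time

    glob_returns_list = isinstance(all_at_once, list)
    iglob_returns_list = isinstance(lazy, list)

    # Process the paths one by one, as in a typical loop over iglob().
    one_by_one = []
    for p in lazy:
        one_by_one.append(os.path.basename(p))
    one_by_one.sort()

    eager_names = sorted(os.path.basename(p) for p in all_at_once)

print(glob_returns_list, iglob_returns_list)
print(one_by_one)
```

Both calls find the same paths; iglob() is mainly useful when there are many matches and you do not want to hold them all in a list at once.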