Tips on using Python for efficient file I/O operations

Source: Internet
Author: User
How to read and write text files?

Actual case

The encoding of a text file is known (for example UTF-8, GBK, or BIG5). How do we read and write such a file in Python 2.x and in Python 3.x?

Solution

The approach differs between Python 2 and Python 3.

The semantics of the string types have changed:

Python 2    Python 3
str         bytes
unicode     str
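
As a rough illustration of the table above in Python 3 terms, the sketch below shows that str holds text while bytes holds encoded data, and that encode() and decode() convert between the two. The sample text '你好' is an assumption chosen for illustration.

```python
# Python 3: str is text (Unicode), bytes is raw encoded data.
text = '你好'                      # str, 2 characters (sample text, assumed)
gbk_data = text.encode('gbk')      # bytes: GBK uses 2 bytes per character here
utf8_data = text.encode('utf-8')   # bytes: UTF-8 uses 3 bytes per character here

print(type(text).__name__, len(text))
print(type(gbk_data).__name__, len(gbk_data))
print(type(utf8_data).__name__, len(utf8_data))

# Decoding with the matching codec restores the original text.
round_trip_gbk = gbk_data.decode('gbk')
round_trip_utf8 = utf8_data.decode('utf-8')
print(round_trip_gbk == text, round_trip_utf8 == text)
```

The same text therefore has different byte lengths under different encodings, which is why the encoding must be known before a file can be read correctly.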


In Python 2.x, encode the unicode string before writing it to a file, and decode the byte string after reading it back.

>>> f = open('py2.txt', 'w')
>>> s = u'你好'
>>> f.write(s.encode('gbk'))
>>> f.close()
>>> f = open('py2.txt', 'r')
>>> t = f.read()
>>> print t.decode('gbk')
你好

In Python 3.x, include 't' in the mode argument of open() to select text mode, and use the encoding argument to specify the encoding.

>>> f = open('py3.txt', 'wt', encoding='utf-8')
>>> f.write('你好')
2
>>> f.close()
>>> f = open('py3.txt', 'rt', encoding='utf-8')
>>> s = f.read()
>>> s
'你好'
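
A common follow-up task is converting a file from one known encoding to another. The following is a minimal Python 3 sketch under stated assumptions: the file names and the sample text are hypothetical, and the files are created in a throwaway directory.

```python
import os
import tempfile

# Hypothetical file names inside a throwaway directory.
workdir = tempfile.mkdtemp()
gbk_path = os.path.join(workdir, 'sample_gbk.txt')
utf8_path = os.path.join(workdir, 'sample_utf8.txt')

# Write text with a known encoding.
with open(gbk_path, 'wt', encoding='gbk') as f:
    f.write('你好\n')

# Read it back with the same encoding, then save it again as UTF-8.
with open(gbk_path, 'rt', encoding='gbk') as src:
    content = src.read()
with open(utf8_path, 'wt', encoding='utf-8') as dst:
    dst.write(content)

# Reading these GBK bytes as UTF-8 raises an error.
try:
    with open(gbk_path, 'rt', encoding='utf-8') as f:
        f.read()
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True

print(repr(content), wrong_encoding_failed)
```

Using with blocks closes each file automatically, so no explicit close() call is needed.
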
How to set file buffering?

Actual case

Writing file content to a disk device requires a system call, and this kind of I/O operation is slow. To reduce the number of I/O operations, files are usually buffered: the system call is made only once enough data has accumulated. File buffering behavior falls into three kinds: full buffering, line buffering, and no buffering.

How do we set the buffering of file objects in Python?

Solution

Full buffering: set the buffering argument of open() to an integer n greater than 1; n is the buffer size in bytes.

>>> f = open('demo2.txt', 'w', buffering=2048)
>>> f.write('+' * 1024)
>>> f.write('+' * 1023)
# The data is written to the file once more than 2048 bytes have accumulated
>>> f.write('-' * 2)
>>> f.close()
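
The effect of full buffering can be observed by checking the file size on disk after each write. The sketch below is a Python 3 variant under stated assumptions: it uses binary mode, because in Python 3 text mode adds a second layer of buffering on top of the one sized by buffering, and it uses a hypothetical scratch file. The exact moment of flushing is an implementation detail of CPython.

```python
import os
import tempfile

# Hypothetical scratch file; binary mode so that `buffering` sizes the buffer directly.
fd, path = tempfile.mkstemp()
os.close(fd)

sizes = []
f = open(path, 'wb', buffering=2048)
f.write(b'+' * 1024)
sizes.append(os.path.getsize(path))   # data is still in the buffer
f.write(b'+' * 1023)
sizes.append(os.path.getsize(path))   # 2047 bytes buffered, nothing on disk yet
f.write(b'-' * 2)
sizes.append(os.path.getsize(path))   # the buffer could not hold the data, so it was flushed
f.close()
sizes.append(os.path.getsize(path))   # everything is on disk after close()

print(sizes)
os.remove(path)
```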

Line buffering: set the buffering argument of open() to 1.

>>> f = open('demo3.txt', 'w', buffering=1)
>>> f.write('abc')
>>> f.write('20140901')
# The data is written to the file as soon as a '\n' is written
>>> f.write('\n')
>>> f.close()
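
Line buffering can be checked the same way, by comparing the file size before and after the newline is written. This is a minimal Python 3 sketch using a hypothetical scratch file; in Python 3, buffering=1 is meaningful only in text mode.

```python
import os
import tempfile

# Hypothetical scratch file.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'w', buffering=1)       # line buffering (text mode)
f.write('abc')
f.write('20140901')
size_before_newline = os.path.getsize(path)   # nothing flushed yet
f.write('\n')                                  # the newline triggers a flush
size_after_newline = os.path.getsize(path)
f.close()

print(size_before_newline, size_after_newline)
os.remove(path)
```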

No buffering: set the buffering argument of open() to 0. In Python 3 this is allowed only in binary mode; the session below is Python 2.

>>> f = open('demo4.txt', 'w', buffering=0)
>>> f.write('a')
>>> f.write('b')
>>> f.close()
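
For Python 3, a minimal sketch of unbuffered writing looks like this. It assumes a hypothetical scratch file and shows both that text mode rejects buffering=0 and that each unbuffered binary write reaches the file immediately.

```python
import os
import tempfile

# Hypothetical scratch file.
fd, path = tempfile.mkstemp()
os.close(fd)

# Text mode cannot be unbuffered in Python 3.
try:
    open(path, 'w', buffering=0)
    text_mode_rejected = False
except ValueError:
    text_mode_rejected = True

# Binary mode with buffering=0 returns a raw, unbuffered file object.
f = open(path, 'wb', buffering=0)
f.write(b'a')
size_after_first = os.path.getsize(path)    # already on disk
f.write(b'b')
size_after_second = os.path.getsize(path)
f.close()

print(text_mode_rejected, size_after_first, size_after_second)
os.remove(path)
```
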
How to map files to memory?

Actual case

When accessing certain binary files (for example, framebuffer device files), we want to map them into memory so that they can be accessed randomly.

On some embedded devices, registers are mapped into the memory address space. We can map a region of /dev/mem to access these registers.

If multiple processes map the same file, the mapping can also be used for inter-process communication.

Solution

Use the mmap() function of the mmap module in the standard library. It takes the file descriptor of an open file as an argument.

First create the following file:

[root@pythontab.com ~]# dd if=/dev/zero of=demo.bin bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.00380084 s, 276 MB/s
# View the file content in hexadecimal format
[root@pythontab.com ~]# od -x demo.bin
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000

>>> import mmap
>>> import os
>>> f = open('demo.bin', 'r+b')
# Obtain the file descriptor
>>> f.fileno()
3
>>> m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
>>> type(m)
<type 'mmap.mmap'>
# Content can be read by index or by slice
>>> m[0]
'\x00'
>>> m[0:11]
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
# Content can be modified
>>> m[0] = '\x88'

View the file:

[root@pythontab.com ~]# od -x demo.bin
0000000 0088 0000 0000 0000 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000

Modify a slice:

>>> m[4:8] = '\xff' * 4

View the file:

[root@pythontab.com ~]# od -x demo.bin
0000000 0088 0000 ffff ffff 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000
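
The sessions above are Python 2. In Python 3 the same operations work on bytes: indexing returns an int, and slice assignment takes a bytes object of the same length. The following is a minimal sketch under stated assumptions, using a hypothetical scratch file in place of demo.bin.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file filled with 1 MiB of zero bytes (like demo.bin).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'\x00' * (1024 * 1024))

with open(path, 'r+b') as f:
    # A length of 0 maps the whole file.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
        first_byte = m[0]          # an int in Python 3, not a 1-character string
        m[0] = 0x88                # a single index takes an int
        m[4:8] = b'\xff' * 4       # a slice takes bytes of the same length
        m.flush()                  # push the changes to the file

with open(path, 'rb') as f:
    head = f.read(8)

print(first_byte, head)
os.remove(path)
```

Reading the file back through a normal file object plays the role of the od check in the shell transcript.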
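
To map only part of the file, pass a length and an offset (here 8 pages, starting at the fourth page boundary):

>>> m = mmap.mmap(f.fileno(), mmap.PAGESIZE * 8, access=mmap.ACCESS_WRITE, offset=mmap.PAGESIZE * 4)
>>> m[:0x1000] = '\xaa' * 0x1000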

View the file:

[root@pythontab.com ~]# od -x demo.bin
0000000 0088 0000 ffff ffff 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 0000 0000
*
0040000 aaaa aaaa aaaa aaaa aaaa aaaa aaaa aaaa
*
0050000 0000 0000 0000 0000 0000 0000 0000 0000
*
4000000
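
A partial mapping can be sketched in Python 3 as follows. The scratch file is hypothetical, and the sketch uses mmap.ALLOCATIONGRANULARITY for the offset because the offset must be a multiple of that value; on many Linux systems it equals mmap.PAGESIZE, but that is not guaranteed on every platform.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file filled with 1 MiB of zero bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(b'\x00' * (1024 * 1024))

# The offset must be a multiple of mmap.ALLOCATIONGRANULARITY.
offset = mmap.ALLOCATIONGRANULARITY * 4
length = mmap.PAGESIZE * 8

with open(path, 'r+b') as f:
    with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_WRITE, offset=offset) as m:
        # Index 0 of the mapping corresponds to file position `offset`.
        m[:0x1000] = b'\xaa' * 0x1000
        m.flush()

with open(path, 'rb') as f:
    f.seek(offset - 1)
    around_start = f.read(2)   # last untouched byte, then first modified byte
    f.seek(offset + 0x1000 - 1)
    around_end = f.read(2)     # last modified byte, then first untouched byte

print(around_start, around_end)
os.remove(path)
```
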
How to access the file status?

Actual case

In some projects, we need to obtain the file status, for example:

File type (regular file, directory, symbolic link, device file, ...)

File access permissions

Time of last access, last modification, and last inode status change

Size of a regular file

...

Solution

The current directory contains the following files:

[root@pythontab.com 2017]# ll
total 4
drwxr-xr-x 2 root root 4096 Sep 16 11:35 dirs
-rw-r--r-- 1 root root    0 Sep 16 11:35 files
lrwxrwxrwx 1 root root   37 Sep 16 11:36 lockfile -> /tmp/qtsingleapp-aegisG-46d2-lockfile

System calls

Use the three system calls stat, fstat, and lstat, exposed by the os module in the standard library, to obtain the file status.

>>> import os
>>> s = os.stat('files')
>>> s
posix.stat_result(st_mode=33188, st_ino=267646, st_dev=51713L, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1486197100, st_mtime=1486197100, st_ctime=1486197100)
>>> s.st_mode
33188
>>> import stat
# The stat module has many S_IS* functions for determining the file type
>>> stat.S_ISDIR(s.st_mode)
False
# Regular file
>>> stat.S_ISREG(s.st_mode)
True
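
The difference between the three calls can be sketched in Python 3 as follows: stat follows symbolic links, lstat describes the link itself, and fstat works on an open file descriptor. The directory layout is hypothetical and created in a throwaway directory; symbolic link creation is guarded because it may be unavailable on some platforms.

```python
import os
import stat
import tempfile

# Hypothetical layout in a throwaway directory.
workdir = tempfile.mkdtemp()
file_path = os.path.join(workdir, 'files')
link_path = os.path.join(workdir, 'lockfile')
with open(file_path, 'w'):
    pass

by_path = os.stat(file_path)              # follows symbolic links
with open(file_path) as f:
    by_fd = os.fstat(f.fileno())          # same information via an open descriptor

is_regular = stat.S_ISREG(by_path.st_mode)
is_dir = stat.S_ISDIR(os.stat(workdir).st_mode)
same_inode = by_path.st_ino == by_fd.st_ino

# lstat() describes the link itself; stat() describes its target.
try:
    os.symlink(file_path, link_path)
    link_is_symlink = stat.S_ISLNK(os.lstat(link_path).st_mode)
    target_is_symlink = stat.S_ISLNK(os.stat(link_path).st_mode)
except (OSError, NotImplementedError):
    link_is_symlink = target_is_symlink = None   # symlinks not available here

print(is_regular, is_dir, same_inode, link_is_symlink, target_is_symlink)
```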

Check the file access permissions. A result greater than 0 means the permission bit is set.

>>> s.st_mode & stat.S_IRUSR
256
>>> s.st_mode & stat.S_IXGRP
0
>>> s.st_mode & stat.S_IXOTH
0
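
The permission test can be wrapped up in a short Python 3 sketch. It assumes a POSIX system and a hypothetical scratch file whose mode is set explicitly to 0o640 so that the result is predictable; stat.S_IMODE and stat.filemode are standard library helpers for showing the permission bits.

```python
import os
import stat
import tempfile

# Hypothetical scratch file with a known mode (POSIX assumed).
fd, path = tempfile.mkstemp()
os.close(fd)
os.chmod(path, 0o640)                      # rw-r-----

mode = os.stat(path).st_mode
owner_can_read = bool(mode & stat.S_IRUSR)
group_can_exec = bool(mode & stat.S_IXGRP)
others_can_read = bool(mode & stat.S_IROTH)

permission_bits = stat.S_IMODE(mode)       # only the permission part of st_mode
mode_text = stat.filemode(mode)            # ls-style string

print(oct(permission_bits), mode_text)
print(owner_can_read, group_can_exec, others_can_read)
os.remove(path)
```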

Get the file times

# Access time
>>> s.st_atime
1486197100.3384446
# Modification time
>>> s.st_mtime
1486197100.3384446
# Status change time
>>> s.st_ctime
1486197100.3384446

Convert the obtained timestamp

>>> import time
>>> time.localtime(s.st_atime)
time.struct_time(tm_year=2016, tm_mon=9, tm_mday=16, tm_hour=11, tm_min=35, tm_sec=47, tm_wday=4, tm_yday=260, tm_isdst=0)
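
For display, the struct_time is usually formatted as a string. The sketch below is a Python 3 illustration under stated assumptions: it creates a hypothetical scratch file and uses os.utime to set a known timestamp so that the UTC result is reproducible, while the local-time result depends on the machine's time zone.

```python
import os
import tempfile
import time

# Hypothetical scratch file.
fd, path = tempfile.mkstemp()
os.close(fd)

# Set a known access and modification time so the output is reproducible.
os.utime(path, (1486197100, 1486197100))

mtime = os.stat(path).st_mtime
utc_text = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(mtime))
local_text = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))

print(utc_text)      # 2017-02-04 08:31:40
print(local_text)    # depends on the local time zone
os.remove(path)
```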

Get the size of a regular file

>>> s.st_size
0

Shortcut functions

Some functions in os.path in the standard library are more concise to use.

File type determination

>>> os.path.isdir('dirs')
True
>>> os.path.islink('lockfile')
True
>>> os.path.isfile('files')
True

File time

>>> os.path.getatime('files')
1486197100.3384445
>>> os.path.getmtime('files')
1486197100.3384445
>>> os.path.getctime('files')
1486197100.3384445

Get file size

>>> os.path.getsize('files')
0
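
The shortcut functions can be combined into a small Python 3 sketch. The names dirs and files mirror the listing above but are created here in a throwaway directory, so the layout is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical layout in a throwaway directory.
workdir = tempfile.mkdtemp()
dir_path = os.path.join(workdir, 'dirs')
file_path = os.path.join(workdir, 'files')
os.mkdir(dir_path)
with open(file_path, 'wb') as f:
    f.write(b'x' * 10)

summary = {
    'isdir': os.path.isdir(dir_path),
    'isfile': os.path.isfile(file_path),
    'islink': os.path.islink(file_path),
    'size': os.path.getsize(file_path),
    'missing_exists': os.path.exists(os.path.join(workdir, 'nope')),
}

# getmtime() reports the same value as os.stat().st_mtime.
same_mtime = os.path.getmtime(file_path) == os.stat(file_path).st_mtime

print(summary, same_mtime)
```

Unlike os.stat, the os.path type checks return False for a missing path instead of raising an exception.
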
How to use temporary files?

Actual case

In a project, we collect data from sensors. After each 1 GB of data is collected, we analyze it and keep only the analysis results. Holding such a large amount of temporary data in memory would consume a lot of memory, so we can store it in temporary files (external storage) instead.

Temporary files do not need to be named, and they are deleted automatically after being closed.

Solution

Use TemporaryFile and NamedTemporaryFile from the tempfile module in the standard library.

>>> from tempfile import TemporaryFile, NamedTemporaryFile
# The file can only be accessed through the object f
>>> f = TemporaryFile()
>>> f.write('abcdef' * 100000)
# Access the temporary data
>>> f.seek(0)
>>> f.read(100)
'abcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcdefabcd'
>>> ntf = NamedTemporaryFile()
# A NamedTemporaryFile is deleted when it is closed; to keep the file, create it with NamedTemporaryFile(delete=False)
# Return the path of the temporary file in the file system
>>> ntf.name
'/tmp/tmppnvna6'
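
In Python 3, TemporaryFile opens in binary mode ('w+b') by default, so the data written must be bytes. The following minimal sketch shows both classes used as context managers; the payload b'result' is a placeholder chosen for illustration.

```python
import os
from tempfile import NamedTemporaryFile, TemporaryFile

# TemporaryFile defaults to binary mode ('w+b'), so write bytes.
with TemporaryFile() as f:
    f.write(b'abcdef' * 100000)
    f.seek(0)                      # rewind before reading back
    head = f.read(100)

# NamedTemporaryFile has a visible path; delete=False keeps it after close.
with NamedTemporaryFile(delete=False) as ntf:
    ntf.write(b'result')           # placeholder payload
    kept_path = ntf.name

still_exists = os.path.exists(kept_path)
os.remove(kept_path)               # clean up manually

# With the default delete=True the file disappears when it is closed.
with NamedTemporaryFile() as ntf2:
    auto_path = ntf2.name
auto_removed = not os.path.exists(auto_path)

print(len(head), still_exists, auto_removed)
```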

The above covers techniques for efficient file I/O operations in Python.
