Python Learning Journey-day05 (file operations)

Source: Internet
Author: User

Objective:

The previous 5 blogs mainly covered the basic knowledge of Python and sorted out the difficult points. This blog goes systematically through the knowledge points related to file operations, to help everyone quickly get a grasp of them.

The ability to read and write files on disk is provided by the operating system, and modern operating systems do not allow ordinary programs to manipulate the disk directly. Reading or writing a file therefore means asking the operating system to open a file object (often referred to as a file descriptor), and then reading data from this file object (reading the file) or writing data to it (writing the file) through the interface that the operating system provides.

1. Open File

To manipulate a file, we first need to open a file object in read mode, using Python's built-in open() function and passing in the path of the file, the mode, and the encoding of the file, as follows:

    file_obj = open('Bigdata.txt', encoding='utf-8', mode='r')

By default, the operation mode of the file is read-only, so if we only want to read the file, there is usually no need to specify the mode here. In this way, we have successfully opened a file.

However, if the file does not exist, the open() function throws an IOError (in Python 3 this is a FileNotFoundError, a subclass of OSError, and IOError is an alias of OSError). Reading can also raise an IOError, and once that happens, a file_obj.close() call placed after the read will not execute. To ensure that the file is closed correctly whether or not an error occurs, we can use try ... finally:

    file_obj = None
    try:
        file_obj = open('cisco.txt', 'r', encoding='utf-8')
        print(file_obj.read())
    finally:
        # file_obj is still None if open() itself failed
        if file_obj:
            file_obj.close()

But writing this every time is cumbersome, so we can use the with statement to call the close() method for us automatically:

    with open('cisco.txt', 'r', encoding='utf-8') as file_obj:
        print(file_obj.read())
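As a minimal sketch of the point above, the snippet below checks that a file opened with the with statement is closed once the block exits, even when an exception is raised inside it, and that opening a missing file raises FileNotFoundError. The temporary directory, the file names inside it, and the simulated ValueError are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Create a scratch file so the example does not depend on existing files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'cisco.txt')
with open(path, 'w', encoding='utf-8') as file_obj:
    file_obj.write('hello')

try:
    with open(path, 'r', encoding='utf-8') as file_obj:
        content = file_obj.read()
        raise ValueError('simulated failure while processing')
except ValueError:
    pass

# The with statement closed the file even though the block raised.
print(content, file_obj.closed)

# Opening a missing file raises FileNotFoundError, a subclass of OSError.
missing_error = None
try:
    open(os.path.join(tmp_dir, 'missing.txt'), 'r', encoding='utf-8')
except FileNotFoundError as exc:
    missing_error = exc
print(type(missing_error).__name__)
```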

2. Next we look at the three methods of reading a file: the read() method, the readline() method, and the readlines() method. All three are commonly used and must be mastered. They are sorted out as follows:

1. read() method: it reads the entire contents of the file at once and returns them as a single string. If the file is 10 GB, memory will be exhausted, so to be safe you can call the read(size) method repeatedly, which reads at most size bytes per call (size characters when the file is opened in text mode).
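The repeated read(size) call described above can be sketched as a small loop. The helper name read_in_chunks, the scratch file, and the tiny chunk size are hypothetical choices for illustration; a real program would pick a much larger size.

```python
import os
import tempfile

def read_in_chunks(path, size=4):
    """Yield the file content in pieces of at most `size` characters."""
    with open(path, 'r', encoding='utf-8') as file_obj:
        while True:
            chunk = file_obj.read(size)
            if not chunk:  # read() returns '' at end of file
                break
            yield chunk

# Scratch file used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'bigdata.txt')
with open(path, 'w', encoding='utf-8') as file_obj:
    file_obj.write('abcdefghij')

chunks = list(read_in_chunks(path, size=4))
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Only one chunk is held in memory at a time, which is the reason to prefer this over a single read() for very large files.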

A file user.txt, shown here as it appears in PyCharm, has the following content:

spark|cisco
hadoop|123456
python|123123
hive|cisco123
kafka|131420

Calling the read() method on this file returns the entire content as one string.

2. readline() method: this method reads a single line into memory at a time, which saves memory compared to the read() method. Note that this method still returns a string: the first call returns the first line of the file as a string, the next call returns the second line as a string, and so on.

3. readlines() method: this method reads all content at once and returns it as a list, split by line. Notice that unlike the previous two methods, this method returns a list, and each element of the list is a string holding one line of the file.
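To make the difference between the three methods concrete, here is a small sketch that runs each of them on the same file and looks at what comes back. The scratch file and its three lines are assumptions that mirror the user.txt sample.

```python
import os
import tempfile

# Scratch copy of a small user file, used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'user.txt')
with open(path, 'w', encoding='utf-8') as file_obj:
    file_obj.write('spark|cisco\nhadoop|123456\npython|123123\n')

with open(path, encoding='utf-8') as file_obj:
    whole = file_obj.read()        # one string holding everything

with open(path, encoding='utf-8') as file_obj:
    first = file_obj.readline()    # first line, as a string
    second = file_obj.readline()   # the next call gives the second line

with open(path, encoding='utf-8') as file_obj:
    lines = file_obj.readlines()   # list of strings, one per line

print(type(whole).__name__, type(first).__name__, type(lines).__name__)
print(repr(first), repr(second), len(lines))
```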

Note that the strings returned by all of the above methods contain an invisible character, the line break \n, at the end of each line. We can print the lines and see it.

When each line is printed as-is, a blank line appears after it, which does not meet our output requirements: print() adds its own line break after the \n already in the string. Therefore, when using the above methods, keep in mind that each line ends with an invisible \n by default, and use the strip() method to remove the '\n' from each line of user.txt before printing it.
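A short sketch of the clean-up step: the list below stands in for what readlines() would return for the first lines of user.txt (an assumption, so that the example runs without the file), and strip('\n') removes the trailing line break from each element.

```python
# Stand-in for the result of file_obj.readlines() on user.txt.
lines = ['spark|cisco\n', 'hadoop|123456\n', 'python|123123\n']

# repr() makes the otherwise invisible '\n' visible.
print(repr(lines[0]))

# print() appends its own line break, so strip the one stored in the string.
cleaned = [line.strip('\n') for line in lines]
for line in cleaned:
    print(line)
```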

We use the above 3 methods often. Below we work through an exercise that combines them.

Requirement: add the suffix _somebody to each user name in user.txt.

The content in user.txt is:

spark|cisco|23
hadoop|123456|34
python|123123|333
hive|cisco123|12
kafka|131420|90

The resulting output has the following form:

spark_somebody|22|male
hadoop_somebody|34|female
hive_somebody|111|female

    import os

    # Read user.txt and write the modified lines to a new file user1.txt.
    with open('user.txt', encoding='utf-8') as file_obj1, open('user1.txt', 'w', encoding='utf-8') as file_obj2:
        for line in file_obj1.readlines():
            user_list = line.split('|')
            user_list[0] = user_list[0] + "_somebody"
            # The last field still ends with '\n', so each line keeps its line break.
            line = '|'.join(user_list)
            file_obj2.write(line)

    # Replace the original file with the new one.
    os.remove('user.txt')
    os.rename('user1.txt', 'user.txt')

The author ran into a lot of pitfalls while doing this exercise, which are now sorted out as follows:

"001" Here we are changing the source file, so writing to a file is involved. We therefore need to open a new file user1.txt and set its operation mode to write ('w'). In addition, when the mode is passed positionally, it must come before encoding='utf-8'. The order cannot be reversed, because a positional argument after a keyword argument is a syntax error.

"002" Two files are involved, so to open both at the same time, one for reading and one for writing, the with statement takes the following form:

    with open('user.txt', encoding='utf-8') as file_obj1, open('user1.txt', 'w', encoding='utf-8') as file_obj2:

"003" With this approach, we write a line to the new file each time a line is modified. At first, line.strip('\n') was applied before splitting each user string with split('|'). But when we wrote to the new file after modifying the user name, we found that all the user information strings ended up on one line. Through analysis we found that we had not added the newline character '\n' back when writing the string, which led to this result. To avoid this error, modify the code as follows: line = '|'.join(user_list) + '\n'

"004" Also note the use of the os module here. The os module is used to interact with the operating system, and because deleting and renaming files requires interacting with the operating system, we import the os module. Two functions are used here: remove(file) deletes a file, and rename(old file, new file) renames a file. The os module will get a dedicated blog later.

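Putting the pitfalls above together, one possible variant of the exercise looks like the sketch below. It is not the author's exact program: the function name add_suffix and the '.tmp' intermediate file name are hypothetical, the file is iterated line by line instead of calling readlines(), and os.replace() is swapped in for the os.remove() plus os.rename() pair.

```python
import os
import tempfile

def add_suffix(path, suffix='_somebody'):
    """Rewrite `path` so the first |-separated field of each line gets `suffix`."""
    tmp_path = path + '.tmp'  # hypothetical name for the intermediate file
    with open(path, encoding='utf-8') as src, open(tmp_path, 'w', encoding='utf-8') as dst:
        for line in src:
            fields = line.strip('\n').split('|')
            fields[0] += suffix
            dst.write('|'.join(fields) + '\n')  # put the newline back explicitly
    os.replace(tmp_path, path)  # overwrite the original with the new file

# Scratch copy of the user file, used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'user.txt')
with open(path, 'w', encoding='utf-8') as file_obj:
    file_obj.write('spark|cisco|23\nhadoop|123456|34\n')

add_suffix(path)
with open(path, encoding='utf-8') as file_obj:
    result = file_obj.read()
print(result)
```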
Having finished the methods for reading a file, we continue with how to write content to a file: the write() and writelines() methods.

4. write() method: writes a string to a file. For example, when writing to test.txt in 'w' mode, if test.txt does not exist it is created and then written to; if it exists, its contents are emptied first and then the new content is written. The writelines() method works on a list: it takes a list of strings as its argument and writes them to the file. Neither method adds newline characters automatically, so you need to add them explicitly. We use the following practical examples to illustrate the usage of these 2 methods.

    # Write the contents of the list msg into the file test.txt:
    with open('test.txt', mode='w', encoding='utf-8') as file_obj1:
        msg = ['Write Date', 'To Test.txt', 'Finish']
        for item in msg:
            file_obj1.write(item)
    # Content of the file after the program runs:
    # Write DateTo Test.txtFinish

Because no line break is written here, the content in the final file is on one line.

Now we write the line breaks explicitly and observe the change in test.txt. Note that since test.txt already exists, writing to it again overwrites the original contents. The code is as follows:

    with open('test.txt', mode='w', encoding='utf-8') as file_obj1:
        msg = ['Write spark\n', 'To test.txt\n', 'finish\n']
        for item in msg:
            file_obj1.write(item)

Here a newline character is specified at the end of each string, so the contents of the file are:

Write spark
To test.txt
finish

This is the point that needs extra attention when writing a file: by default no line break is added.

The writelines() method is demonstrated below.

Because the writelines() method takes a list as its parameter, it can write all the contents of the list to the file in one call, so we do not have to use a for loop to iterate through the list.

Take a look at the following code:

    with open('demo.txt', mode='w', encoding='utf-8') as file_obj1:
        msg = ['write hadoop\n', 'to test.txt', 'finish']
        file_obj1.writelines(msg)

The final content of the file is:

write hadoop
to test.txtfinish

You can see that the contents of the list are all written to the file.
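Since writelines() adds no separators of its own, one common way to get one item per line is to append '\n' to each item while passing it in. The sketch below does this with a generator expression; the scratch directory is an assumption made to keep the example self-contained.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
msg = ['write hadoop', 'to test.txt', 'finish']

with open(path, mode='w', encoding='utf-8') as file_obj1:
    # writelines() accepts any iterable of strings; add '\n' to each item ourselves.
    file_obj1.writelines(item + '\n' for item in msg)

with open(path, encoding='utf-8') as file_obj1:
    written = file_obj1.read()
print(repr(written))
```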

5. Next we go over the file open modes in Python. This is something the author often forgets, so here they are sorted out in one place to clear up everyone's doubts.

"001" r: read-only mode (default); an error is raised if the file does not exist.

"002" w: write-only mode; the file is created if it does not exist, and the contents of the original file are overwritten if it exists.

"003" a: append mode; not readable, write only. If the file exists, content is added at the end; if the file does not exist, it is created. Let's take a look at the following example:

The contents of the original file demo.txt are:
Welcome to Beijing, I like big data this job
Good morning
Hello
How is it?

Open the file in a mode and then try to read it, and an error is raised:

    with open('demo2.txt', mode='a', encoding='utf-8') as file_obj1:
        print(file_obj1.read())
    # Result: io.UnsupportedOperation: not readable

This proves that a mode is not readable.

Let's go on to verify the append behavior of a mode: what happens when the file exists? The code and results are as follows.

The file demo3.txt exists with the following contents:

Welcome to Beijing, I like big data this job
Good morning
Hello
How is it?

    with open("demo3.txt", mode="a", encoding="utf-8") as file_obj1:
        msg = ["write hadoop\n", "to test.txt", "finish"]
        file_obj1.writelines(msg)

After the program finishes running, we find that the contents of the list written by the writelines() method were added at the end of the original file demo3.txt, after the line "How is it?". This confirms the behavior of a (append) mode.
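Both properties of a mode described above (writes go to the end, reads are refused) can be checked in a few lines. This is a sketch on a scratch copy of the file with shortened contents, not the author's original demo3.txt.

```python
import io
import os
import tempfile

# Scratch file standing in for demo3.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo3.txt')
with open(path, 'w', encoding='utf-8') as file_obj1:
    file_obj1.write('Good morning\nHello\n')

with open(path, mode='a', encoding='utf-8') as file_obj1:
    file_obj1.writelines(['write hadoop\n', 'finish\n'])
    try:
        file_obj1.read()
        read_failed = False
    except io.UnsupportedOperation:
        read_failed = True  # 'a' is write-only

with open(path, encoding='utf-8') as file_obj1:
    after_append = file_obj1.read()
print(after_append, read_failed)
```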

"004" r+: Readable, writable, can be appended. If the file does not exist, it will not be created, and if the file exists, it will begin writing from the top of the original file and overwrite the previous content. Examples of the procedures are:

The file demo2.txt exists, and its contents are as follows: Good morning.


Run the program:

    with open('demo2.txt', mode='r+', encoding='utf-8') as file_obj1:
        file_obj1.write('Beijing')

The results after the run are as follows:
Beijing good Hello how is you?

Conclusion: when the file exists, opening it with r+ and writing starts at the top of the original file and overwrites the previous content. Next we verify what happens when the file does not exist.
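The overwrite-from-the-top behavior of r+ is easy to see with plain ASCII text. In the sketch below, the scratch file and its content 'Good morning' are assumptions; writing the 7 characters of 'Beijing' replaces the first 7 characters and leaves the rest of the line in place.

```python
import os
import tempfile

# Scratch file standing in for demo2.txt.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo2.txt')
with open(path, 'w', encoding='utf-8') as file_obj1:
    file_obj1.write('Good morning\n')

# r+ keeps the existing content and starts writing at position 0.
with open(path, mode='r+', encoding='utf-8') as file_obj1:
    file_obj1.write('Beijing')

with open(path, encoding='utf-8') as file_obj1:
    after_write = file_obj1.read()
print(repr(after_write))  # 'Beijingrning\n'
```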

Delete the demo2.txt file used just now, and then run the following program to see what happens.

Running the program:

    with open('demo2.txt', mode='r+', encoding='utf-8') as file_obj1:
        li = ['Beijing Good', 'Hello', 'How is it ?']
        file_obj1.writelines(li)

raises an error: FileNotFoundError: [Errno 2] No such file or directory: 'demo2.txt'

This proves that r+ does not create a file that does not exist.

"005" w+: write first, then read back. Opening a file in this mode empties all the contents of the original file; new content is then written, and what has been written can be read back. If the file does not exist, it is created. Continuing the validation:

The file demo3.txt already exists, with the following contents: Beijing welcome you, I like big data this job good morning how is you?write hadoopto test.txtfinishhadoop|123456|5  Hadoop|123456|9
The program is as follows:

    with open('demo3.txt', 'w+', encoding='utf-8') as file_obj:
        li = ["cisco123", "SDWD", "Spark is good"]
        file_obj.writelines(li)

When the program finishes running, the content of the file is:

cisco123SDWDSpark is good

So with w+, the first thing that happens is that the contents of the source file are emptied, and then the new content is written.
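One detail worth checking with w+ is the read-back step: after writing, the file pointer sits at the end of the file, so it has to be moved back to the start before read() returns anything. The sketch below assumes a scratch file with some old content in place of demo3.txt.

```python
import os
import tempfile

# Scratch file standing in for demo3.txt, with some old content.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo3.txt')
with open(path, 'w', encoding='utf-8') as file_obj:
    file_obj.write('old content that will be emptied\n')

with open(path, 'w+', encoding='utf-8') as file_obj:
    li = ["cisco123", "SDWD", "Spark is good"]
    file_obj.writelines(li)
    at_end = file_obj.read()   # pointer is at the end, so nothing is returned
    file_obj.seek(0)           # move back to the start of the file
    reread = file_obj.read()   # now the new content can be read back
print(repr(at_end), repr(reread))
```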
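
"006" a+: readable and writable. Written content is always added at the bottom of the file, and if the file does not exist, it is created. To read the existing content from the top of the file, first move the pointer back with seek(0), because the file is opened with the pointer at the end. Apart from being readable, a+ behaves the same as a, so no program validation is done here.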

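For readers who want to see a+ in action anyway, here is a minimal sketch. The scratch file and its contents are assumptions; the point is that a read right after opening returns nothing, a read after seek(0) returns the existing content, and a write still lands at the end of the file.

```python
import os
import tempfile

# Scratch file used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
with open(path, 'w', encoding='utf-8') as file_obj:
    file_obj.write('Good morning\n')

with open(path, 'a+', encoding='utf-8') as file_obj:
    first_read = file_obj.read()   # pointer starts at the end of the file
    file_obj.seek(0)
    existing = file_obj.read()     # now the existing content is returned
    file_obj.seek(0)
    file_obj.write('Hello\n')      # append mode still writes at the end

with open(path, encoding='utf-8') as file_obj:
    final = file_obj.read()
print(repr(first_read), repr(existing), repr(final))
```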
6. Let's go on to two important methods in file operations: the seek() and tell() methods.

The seek() method moves the file read pointer to the specified location; the tell() method returns the current location of the file read pointer.

The seek() method syntax is as follows:

    fileObject.seek(offset[, whence])

Parameters:
offset: the offset, that is, the number of bytes to move.
whence: optional, default value is 0. It defines where the offset starts from: 0 means from the beginning of the file, 1 means from the current position, and 2 means from the end of the file.

So we can summarize the following three forms of the seek() method:

(1) f.seek(p, 0) moves to byte p of the file (absolute position)
(2) f.seek(p, 1) moves p bytes relative to the current position
(3) f.seek(p, 2) moves p bytes relative to the end of the file
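The three forms can be tried out on a small file opened in binary mode. Binary mode is used in this sketch because, for files opened in text mode, Python only allows seeks relative to the beginning of the file (the one exception is seek(0, 2), which jumps to the very end). The scratch file and its ten-byte content are assumptions.

```python
import os
import tempfile

# Scratch file holding ten bytes, used only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'seek_demo.bin')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(3, 0)           # absolute: byte 3 from the beginning
    pos_absolute = f.tell()
    f.seek(2, 1)           # relative: 2 bytes forward from the current position
    pos_relative = f.tell()
    f.seek(-4, 2)          # from the end: 4 bytes before the end of the file
    pos_from_end = f.tell()
    tail = f.read()        # read from that position to the end

print(pos_absolute, pos_relative, pos_from_end, tail)  # 3 5 6 b'6789'
```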

=============================================================
That is all on file operations for this blog for now! Later, we will explain more in combination with specific requirements. The main thing to master is the six modes of file operations.










