1. Storing data with Python statements
When writing to a file, we mainly use the with open() statement:
with open(name, mode, encoding=encoding) as file:
    file.write(data)  # note that the statements inside with open() are indented
name: a string containing the file name, for example 'xiaozhu.txt'; mode: the mode in which the file is opened, such as 'r' (read), 'w' (write) or 'a' (append); encoding: the encoding used for the data we write, usually utf-8 or GBK; file: the name we use for the file object in our code.
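A minimal sketch of the with open() pattern. It uses a hypothetical file name, demo.txt, inside a temporary directory so nothing on your desktop is touched. It writes two lines, shows that the file is closed automatically when the with block ends, and then reads the content back.

```python
import os
import tempfile

# Hypothetical file name, placed in a fresh temporary directory for the demo
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# mode "w" = write; encoding is passed as a keyword argument
with open(path, "w", encoding="utf-8") as file:
    file.write("first line\n")
    file.write("second line\n")

# Leaving the with block closes the file automatically
print(file.closed)  # True

# mode "r" = read the content back
with open(path, "r", encoding="utf-8") as file:
    content = file.read()

print(content)
```

Pass encoding as a keyword argument. The third positional parameter of open() is buffering, not encoding.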
Let's see what this looks like with the Xiaozhu short-term rental listings we crawled earlier:
from lxml import etree
import requests
import time

with open('/Users/mac/Desktop/xzzf.txt', 'w', encoding='utf-8') as f:
    for a in range(1, 6):  # pages 1 to 5
        url = 'http://cd.xiaozhu.com/search-duanzufang-p{}-0/'.format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)  # pause between page requests
        for div in file:
            title = div.xpath("./div[2]/div/a/span/text()")[0]
            price = div.xpath("./div[2]/span[1]/i/text()")[0]
            scrible = div.xpath("./div[2]/div/em/text()")[0].strip()
            pic = div.xpath("./a/img/@lazy_src")[0]
            f.write("{} {} {} {}\n".format(title, price, scrible, pic))
If the file xzzf.txt does not exist yet, it is created automatically.
/Users/mac/Desktop/xzzf.txt
Because the file name is prefixed with the desktop path, the file is saved on the desktop. If you leave out the path, it is saved in your current working directory.
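If you are not sure where a file without a path ended up, you can ask Python. In this minimal sketch, os.getcwd() returns the current working directory and os.path.abspath() shows the full path that a relative file name resolves to. The script first switches into a temporary directory so nothing is written to your real folders.

```python
import os
import tempfile

# For the demo, make a fresh temporary directory the current working directory
os.chdir(tempfile.mkdtemp())

# A bare file name (no directory part) is a relative path
relative_name = "xzzf.txt"
with open(relative_name, "w", encoding="utf-8") as f:
    f.write("test\n")

# Relative paths are resolved against the current working directory
print(os.getcwd())
print(os.path.abspath(relative_name))  # where the file actually ended up
```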
'w': write mode. If the file does not exist, it is created automatically. If it does exist, its old contents are erased.
encoding='utf-8': writes the file in UTF-8 encoding. Specifying utf-8 is generally all you need.
f.write("{} {} {} {}\n".format(title, price, scrible, pic))  # writes the values of title, price, scrible and pic to the file, separated by spaces
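Because 'w' empties an existing file, the crawler opens the file once, before the page loop. Opening it with 'w' inside the loop would keep only the last page. This sketch uses a hypothetical file and hypothetical listing values to compare 'w' with 'a' (append). The line is built with the same format string as above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# Hypothetical values standing in for one crawled listing
title, price, scrible, pic = "Loft", "188", "cozy", "http://example.com/a.jpg"
line = "{} {} {} {}\n".format(title, price, scrible, pic)

# "w" empties the file every time it is opened, so two opens leave one line
for _ in range(2):
    with open(path, "w", encoding="utf-8") as f:
        f.write(line)
with open(path, encoding="utf-8") as f:
    after_w = f.readlines()

# "a" keeps the existing content and adds to the end
with open(path, "a", encoding="utf-8") as f:
    f.write(line)
with open(path, encoding="utf-8") as f:
    after_a = f.readlines()

print(len(after_w), len(after_a))  # 1 2
```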
Take a look at how the data is stored:
If you did not specify a file path, how do you find the file that was written locally? There are two ways:
1. On Windows 10, open Cortana and search for the file name.
2. Use "Everything", a tool I recommend because it finds files quickly and conveniently.
The program is very small and easy to find with a Baidu search. Once you have used it, you will come back to thank me ~
So when you write the code, I recommend that you put the path where you want to store the file in front of the file name. What, you don't know how to write a path? OK, say I want to put the file on the desktop. How do I find the path?
Find a file there, for example a document on the desktop, right-click it and choose "Properties". The information after "Location" is the path of the folder that contains the file.
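You can also build the path in code instead of copying it by hand. This sketch uses the standard library's pathlib. It assumes the desktop is a folder named "Desktop" in your home directory, which may not hold on every machine. If you type a Windows path as a string, write it as a raw string such as r"C:\Users\you\Desktop\xzzf.txt" (the user name is hypothetical). In a normal string literal, a backslash starts an escape sequence.

```python
from pathlib import Path

# Assumption: the desktop is a folder named "Desktop" in the user's home directory.
# This holds on a default macOS or Windows setup but may differ (for example when
# Windows redirects the desktop into OneDrive).
desktop = Path.home() / "Desktop"
target = desktop / "xzzf.txt"

print(target)  # e.g. /Users/mac/Desktop/xzzf.txt for a macOS user named "mac"

# open() accepts Path objects directly:
# with open(target, "w", encoding="utf-8") as f:
#     f.write("...")
```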
2. Save the file in CSV format
from lxml import etree
import requests
import time

with open('/Users/mac/Desktop/xiaozhu.csv', 'w', encoding='utf-8') as f:
    for a in range(1, 6):  # pages 1 to 5
        url = 'http://cd.xiaozhu.com/search-duanzufang-p{}-0/'.format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)  # pause between page requests
        for div in file:
            title = div.xpath("./div[2]/div/a/span/text()")[0]
            price = div.xpath("./div[2]/span[1]/i/text()")[0]
            scrible = div.xpath("./div[2]/div/em/text()")[0].strip()
            pic = div.xpath("./a/img/@lazy_src")[0]
            f.write("{},{},{},{}\n".format(title, price, scrible, pic))  # fields separated by commas
Also note that CSV separates fields with commas, so the spaces used earlier are replaced with commas.
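A field that itself contains a comma, such as a listing title, breaks this hand-built format because it turns into two columns. The standard csv module avoids the problem by quoting such fields. The row in this sketch is hypothetical. When you write to a real file with csv.writer, the csv documentation recommends opening it with newline="".

```python
import csv
import io

# Hypothetical listing whose title itself contains a comma
row = ["Sunny room, near metro", "188", "cozy", "http://example.com/a.jpg"]

# Joining with commas by hand: the comma inside the title splits it in two
naive = "{},{},{},{}\n".format(*row)

# csv.writer wraps fields that contain the delimiter in double quotes
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
quoted = buffer.getvalue()

print(len(next(csv.reader(io.StringIO(naive)))))  # 5 columns instead of 4
print(quoted)  # "Sunny room, near metro",188,cozy,http://example.com/a.jpg
```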
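How do you open the CSV file?
In general, you can open it directly with Notepad. If you open it directly in Excel, the text is likely to appear garbled, like this:
What should you do if Excel shows the CSV as garbled text?
Open the file in Notepad.
Choose "Save As" and select "ANSI" as the encoding (on a Chinese-language Windows system, "ANSI" is the GBK encoding, which Excel reads correctly).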
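You can also fix this when writing the file. Python's "utf-8-sig" codec puts a UTF-8 byte order mark (BOM) at the start of the file, and Excel generally uses that mark to recognize the file as UTF-8, so the CSV opens without re-saving. This is a sketch with a hypothetical file name and sample row:

```python
import csv
import os
import tempfile

# Hypothetical output path in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "xiaozhu.csv")

# Non-ASCII text is what becomes garbled when the encoding is guessed wrong
rows = [["title", "price"], ["温馨小屋", "188"]]

# "utf-8-sig" writes the UTF-8 byte order mark (BOM) before the data;
# newline="" is what the csv module documentation recommends for csv files
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "rb") as f:
    head = f.read(3)
print(head)  # b'\xef\xbb\xbf'

# Reading with "utf-8-sig" strips the BOM again
with open(path, encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))
print(read_back == rows)  # True
```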
Next, let's write the Douban Books Top 250 data we crawled earlier to a file:
from lxml import etree
import requests
import time

with open('/Users/mac/Desktop/top250.csv', 'w', encoding='utf-8') as f:
    for a in range(10):  # 10 pages, 25 books per page
        url = 'https://book.douban.com/top250?start={}'.format(a * 25)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="content"]/div/div[1]/div/table')
        time.sleep(3)  # pause between page requests
        for div in file:
            title = div.xpath("./tr/td[2]/div[1]/a/@title")[0]
            href = div.xpath("./tr/td[2]/div[1]/a/@href")[0]
            score = div.xpath("./tr/td[2]/div[2]/span[2]/text()")[0]
            num = div.xpath("./tr/td[2]/div[2]/span[3]/text()")[0].strip("(").strip().strip(")").strip()
            scrible = div.xpath("./tr/td[2]/p[2]/span/text()")
            if len(scrible) > 0:
                f.write("{},{},{},{},{}\n".format(title, href, score, num, scrible[0]))
            else:
                # no quote for this book: call .format() on the string, not on f.write()
                f.write("{},{},{},{}\n".format(title, href, score, num))
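Two steps in this code are less obvious: cleaning the rating count and handling a missing quote. This sketch runs them on a hypothetical input. It assumes the raw rating text is the count wrapped in parentheses and whitespace (for example "(12345人评价)", "12345 people rated"). That format was not checked against the live page. format_row is a hypothetical helper name.

```python
# Assumed raw text of the rating-count span (hypothetical sample)
raw_num = "(\n            12345人评价\n        )"

# The chain used in the crawler: drop "(", trim, drop ")", trim again
num = raw_num.strip("(").strip().strip(")").strip()

# strip() removes any mix of the given characters from both ends,
# so a single call gives the same result for this input
num_short = raw_num.strip("() \n")

print(num)  # 12345人评价


def format_row(title, href, score, num, scrible):
    """Build one comma-separated line; scrible is the list returned by xpath()."""
    fields = [title, href, score, num]
    if len(scrible) > 0:  # the one-line quote may be missing for some books
        fields.append(scrible[0])
    return ",".join(fields) + "\n"
```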
The saved data ends up looking like this:
All right, that's it for this lesson!
Getting started with Python crawlers | 6: Storing crawled data locally