Getting started with Python crawlers | 6: Storing crawled data locally

Source: Internet
Author: User
Tags: xpath, scrible





1. Storing data with a Python statement

When writing a file, we mainly use the with open() statement:

with open(name, mode, encoding=encoding) as file:
    file.write(...)  # note that the statement following with open() must be indented

name: a string containing the file name, for example 'xiaozhu.txt';
mode: determines how the file is opened — read, write, append, and so on;
encoding: the encoding we want to write the data with, usually utf-8 or GBK;
file: the name the opened file goes by in our code.
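As a minimal sketch of the pattern above (the file name and text here are placeholders, not real crawled data):

```python
# Write two lines of text to a file; 'demo.txt' is a placeholder name.
with open('demo.txt', 'w', encoding='utf-8') as file:
    file.write('first line\n')   # the indented block runs while the file is open
    file.write('second line\n')  # the file is closed automatically afterwards

# Read the file back to confirm what was written.
with open('demo.txt', 'r', encoding='utf-8') as file:
    print(file.read())
```

Because the with block closes the file for us, there is no need to call file.close() ourselves.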

Let's take the Xiaozhu listings we crawled earlier and see what storing them actually looks like:

from lxml import etree
import requests
import time

with open('/Users/mac/Desktop/xzzf.txt', 'w', encoding='utf-8') as f:
    for a in range(1, 6):
        url = 'http://cd.xiaozhu.com/search-duanzufang-p{}-0/'.format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)

        for div in file:
            title = div.xpath("./div[2]/div/a/span/text()")[0]
            price = div.xpath("./div[2]/span[1]/i/text()")[0]
            scrible = div.xpath("./div[2]/div/em/text()")[0].strip()
            pic = div.xpath("./a/img/@lazy_src")[0]

            f.write("{} {} {} {}\n".format(title, price, scrible, pic))

If the file xzzf.txt does not exist, it will be created automatically and then written to.

/Users/mac/Desktop/xzzf.txt

Adding this path in front of the file name stores the file on the desktop; if you do not add a path, the file ends up in your current working directory.
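If you are not sure which directory "current working directory" refers to on your machine, a quick check (a sketch — the desktop location shown is just an illustration and differs per machine):

```python
import os

# The directory a relative filename like 'xzzf.txt' would be written into:
print(os.getcwd())

# Building an absolute path explicitly avoids the ambiguity.
desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop', 'xzzf.txt')
print(desktop_path)
```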

'w': write-only mode; if the file does not exist, it is created automatically;

encoding='utf-8': specifies that the file is written with utf-8 encoding; utf-8 is generally the right choice;

f.write("{} {} {} {}\n".format(title, price, scrible, pic))  # write the values of title, price, scrible and pic to the file
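The format() call just substitutes each value into its placeholder. With made-up sample values (not real crawled data), one written record would look like this:

```python
# Hypothetical sample values standing in for one crawled listing.
title = 'Cozy loft near Chunxi Road'
price = '328'
scrible = 'Entire place for 2 guests'
pic = 'http://example.com/pic.jpg'

# One space-separated record per listing, ending with a newline.
line = "{} {} {} {}\n".format(title, price, scrible, pic)
print(line)
```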

Take a look at how the data is stored:

If you did not specify a file path, how can you find the file that was written locally? There are two ways:

1. Open Cortana in Windows 10 and search for your file name.

2. Use the recommended tool "Everything", which makes finding files much quicker and more convenient.

The program is tiny and easy to find with a quick search, and once you have used this little gem you will come back to thank me ~

That said, when you write the code, it is best to honestly put the path you want in front of the file name. What, you don't even know how to write a path? OK — say I want to put the file on the desktop; how do I find that path?

Find a file in the target location, such as one on the desktop, right-click it and choose "Properties"; the text after "Location" is the path to that folder.


2. Save the file in CSV format

from lxml import etree
import requests
import time

with open('/Users/mac/Desktop/xiaozhu.csv', 'w', encoding='utf-8') as f:
    for a in range(1, 6):
        url = 'http://cd.xiaozhu.com/search-duanzufang-p{}-0/'.format(a)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="page_list"]/ul/li')
        time.sleep(3)

        for div in file:
            title = div.xpath("./div[2]/div/a/span/text()")[0]
            price = div.xpath("./div[2]/span[1]/i/text()")[0]
            scrible = div.xpath("./div[2]/div/em/text()")[0].strip()
            pic = div.xpath("./a/img/@lazy_src")[0]

            f.write("{},{},{},{}\n".format(title, price, scrible, pic))

Also note that CSV separates fields with commas, so the spaces in the earlier format string are replaced with commas here.

How can the CSV file be opened?

In general, Notepad can open it directly; if you open it directly with Excel, you are very likely to get garbled text, like the following:

What to do when Excel opens the CSV as garbled text?

    1. Open a file in Notepad

    2. Save As – select "ANSI" as the encoding
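Alternatively, you can skip the Notepad round-trip by writing the file with the 'utf-8-sig' encoding, which prepends the byte-order mark that Excel uses to detect UTF-8 (a sketch; the file name and rows are placeholders):

```python
# 'utf-8-sig' adds a BOM so Excel recognizes the file as UTF-8
# and displays non-ASCII text correctly.
with open('xiaozhu_excel.csv', 'w', encoding='utf-8-sig') as f:
    f.write('title,price\n')
    f.write('成都民宿,328\n')
```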

Then take a look at writing the Douban Top 250 books we crawled earlier to a file:

from lxml import etree
import requests
import time

with open('/Users/mac/Desktop/top250.csv', 'w', encoding='utf-8') as f:
    for a in range(10):
        url = 'https://book.douban.com/top250?start={}'.format(a * 25)
        data = requests.get(url).text
        s = etree.HTML(data)
        file = s.xpath('//*[@id="content"]/div/div[1]/div/table')
        time.sleep(3)

        for div in file:
            title = div.xpath("./tr/td[2]/div[1]/a/@title")[0]
            href = div.xpath("./tr/td[2]/div[1]/a/@href")[0]
            score = div.xpath("./tr/td[2]/div[2]/span[2]/text()")[0]
            num = div.xpath("./tr/td[2]/div[2]/span[3]/text()")[0].strip('(').strip().strip(')').strip()
            scrible = div.xpath("./tr/td[2]/p[2]/span/text()")

            if len(scrible) > 0:
                f.write("{},{},{},{},{}\n".format(title, href, score, num, scrible[0]))
            else:
                f.write("{},{},{},{}\n".format(title, href, score, num))

The saved data ends up looking like this:

All right, that's it for this lesson!



