Example: scraping a car website with Python, parsing the HTML, and saving the data to Excel

Source: Internet
Author: User

1. The address of a car website.

2. Inspecting the site in Firefox shows that it does not load its data as JSON; it is simply a plain HTML page.

3. Use the PyQuery class from the pyquery library to parse the HTML.
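One detail matters when a parsed cell contains a line-break tag: an element's .text holds only the text before its first child element, and whatever follows the child is stored as that child's tail. The sketch below illustrates this with the standard library's xml.etree.ElementTree rather than PyQuery (lxml, which PyQuery builds on, uses the same text/tail model). The cell content is made up for illustration, and the separator is '/' because the strict XML parser used here rejects a bare '&'.

```python
import xml.etree.ElementTree as ET

# Hypothetical table cell: two phone numbers separated by a <br/> tag.
raw_cell = '<td>010-1111<br/>010-2222</td>'
cell = ET.fromstring(raw_cell)

# .text stops at the first child element, so the second number is missing.
text_only = cell.text

# The remaining text is stored as the "tail" of the <br/> child.
tail_of_br = cell[0].tail

# itertext() walks text and tails in document order and recovers everything.
full_text = '&'.join(s.strip() for s in cell.itertext())

# Replacing the tag before parsing leaves a single text node instead.
# '/' is used here because a bare '&' is not valid in strict XML.
flattened = ET.fromstring(raw_cell.replace('<br/>', '/')).text
```

Either approach works: join the pieces after parsing, or replace the tag in the raw page before parsing.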

The code is as follows:
import urllib2
from pyquery import PyQuery as pq

def get_dealer_info(self):
    """Get dealer information."""
    css_select = 'html body div.box div.news_wrapper div.main div.news_list div.service_main div table tr'
    # CSS path copied with Firefox's "Copy CSS Path"; it selects the table rows that hold the data.
    page = urllib2.urlopen(self.entry_url).read()
    # Read the page.
    page = page.replace('<br />', '&')
    page = page.replace('<br/>', '&')
    # The phone numbers on the page are separated by <br/> line breaks, which causes problems when scraping.
    # Problem: when the text inside a tag contains <br/>, .text returns only the text before the <br/>
    # and the rest is lost, because .text stops at the first child element.
    d = pq(page)
    # Parse the page with PyQuery; pq is PyQuery here (from pyquery import PyQuery as pq).
    dealer_list = []
    # List of dealers that is handed to the storage method.
    for dealer_div in d(css_select):
        # Each match is a tr; the actual data is in the td tags inside it.
        p = dealer_div.findall('td')
        # p is the list of all td elements in this tr.
        dealer = {}
        # One dictionary holds the information of one dealership and is appended to the list.
        if len(p) == 1:
            # The if/elif chain filters rows by column count, because some rows do not have the
            # layout the final data needs and must be skipped; adapt this block to your needs.
            print '@'
        elif len(p) == 6:
            # Full row: remember the province for the rows that follow without a province cell.
            strp = p[0].text.strip()
            dealer[Constant.CITY] = p[1].text.strip()
            strc = p[2].text.strip()

            dealer[Constant.PROVINCE] = p[0].text.strip()
            dealer[Constant.CITY] = p[1].text.strip()
            dealer[Constant.NAME] = p[2].text.strip()
            dealer[Constant.ADDRESSTYPE] = p[3].text.strip()
            dealer[Constant.ADDRESS] = p[4].text.strip()
            dealer[Constant.TELPHONE] = p[5].text.strip()
            dealer_list.append(dealer)
        elif len(p) == 5:
            # Row without a province cell: skip the header row, otherwise reuse the last province.
            if p[0].text.strip() != u'province':
                dealer[Constant.PROVINCE] = strp
                dealer[Constant.CITY] = p[0].text.strip()
                dealer[Constant.NAME] = p[1].text.strip()
                dealer[Constant.ADDRESSTYPE] = p[2].text.strip()
                dealer[Constant.ADDRESS] = p[3].text.strip()
                dealer[Constant.TELPHONE] = p[4].text.strip()
                dealer_list.append(dealer)
        elif len(p) == 3:
            print '@@'
    print '@@@'
    self.saver.add(dealer_list)
    self.saver.commit()
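The row handling above can be isolated from the scraping. Rows with six cells carry a province; rows with five cells omit it and reuse the province of the last six-cell row; every other row length is ignored. The following is a minimal Python 3 sketch of that idea on plain lists of strings. The function name, the key names, and the sample rows are hypothetical, and the header-row check from the original is left out for brevity.

```python
KEYS = ['province', 'city', 'name', 'address_type', 'address', 'telephone']


def rows_to_dealers(rows):
    """Turn lists of cell strings into dealer dicts, carrying the province forward."""
    dealers = []
    last_province = None
    for cells in rows:
        cells = [c.strip() for c in cells]
        if len(cells) == 6:
            # Full row: remember the province for the shorter rows that follow.
            last_province = cells[0]
            dealers.append(dict(zip(KEYS, cells)))
        elif len(cells) == 5 and last_province is not None:
            # Province cell is missing: prepend the remembered one.
            dealers.append(dict(zip(KEYS, [last_province] + cells)))
        # Rows of any other length (titles, notes) are ignored.
    return dealers


# Made-up sample rows, for illustration only.
rows = [
    ['Dealer list'],
    ['Hebei', 'Shijiazhuang', 'Dealer A', '4S', '1 Example Rd', '0311-000'],
    ['Tangshan', 'Dealer B', '4S', '2 Example Rd', '0315-000'],
]
dealers = rows_to_dealers(rows)
```

Requiring a remembered province before accepting a five-cell row also avoids the undefined-variable error that would occur if a short row appeared before any full row.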

4. The final code runs successfully: the dealer data is extracted and saved to Excel.
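The saver object used by the function is not shown in the article. As a rough stand-in, the Python 3 sketch below buffers the dealer dictionaries in add() and writes them out in commit(). It is an assumption, not the author's implementation: the class name and field names are hypothetical, and it writes CSV with the standard library's csv module (a format Excel can open) rather than a native Excel workbook.

```python
import csv
import io

FIELDS = ['province', 'city', 'name', 'address_type', 'address', 'telephone']


class CsvSaver:
    """Hypothetical stand-in for the article's saver: buffer rows, then write them."""

    def __init__(self, stream):
        self.stream = stream
        self.pending = []

    def add(self, dealer_list):
        # Collect rows; nothing is written until commit() is called.
        self.pending.extend(dealer_list)

    def commit(self):
        # Write a header line followed by all buffered rows, then clear the buffer.
        writer = csv.DictWriter(self.stream, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(self.pending)
        self.pending = []


# Usage with an in-memory stream and one made-up record.
buffer = io.StringIO()
saver = CsvSaver(buffer)
saver.add([{
    'province': 'Hebei', 'city': 'Shijiazhuang', 'name': 'Dealer A',
    'address_type': '4S', 'address': '1 Example Rd', 'telephone': '0311-000',
}])
saver.commit()
output = buffer.getvalue()
```

To write a real file, pass a handle opened with open(path, 'w', newline='', encoding='utf-8-sig') instead of the in-memory stream; the byte order mark helps Excel detect UTF-8.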
