1. The address of a car website.
2. After inspecting the site in Firefox, I found that it does not serve JSON data; it is simply a plain HTML page.
3. Use PyQuery from the pyquery library to parse the HTML.
Page Style:
The code is as follows:
import urllib2

from pyquery import PyQuery as pq


# Method of the crawler class (the class definition itself is not shown).
def get_dealer_info(self):
    """Get dealer information."""
    # CSS path copied with Firefox's "copy CSS path" feature; it selects the table rows that hold the data.
    css_select = 'html body div.box div.news_wrapper div.main div.news_list div.service_main div table tr'
    # Read the page.
    page = urllib2.urlopen(self.entry_url).read()
    # The phone numbers on the page are separated by <br/> line breaks, which causes trouble when scraping.
    # Problem: reading the text of a tag that contains <br/> returns only the part before the <br/>;
    # the rest is lost, because an element's text ends at its first child tag.
    # Replace the line breaks with '&' before parsing (both spellings of the tag are covered).
    page = page.replace('<br />', '&')
    page = page.replace('<br/>', '&')
    # Parse the page with PyQuery; pq is PyQuery, see the import above.
    d = pq(page)
    # List that is handed to the storage method.
    dealer_list = []
    # Province remembered from the last 6-cell row, reused for rows that omit it.
    strp = ''
    for dealer_div in d(css_select):
        # Each match is a tr; the actual data is in the td tags inside it.
        p = dealer_div.findall('td')
        # p is the list of all td cells in this tr.
        # The dictionary stores the information of one dealer and is appended to the list.
        dealer = {}
        # Rows are handled by cell count, because some rows do not match the format
        # of the final data and must be skipped. Adapt this part to your own needs.
        if len(p) == 1:
            print '@'  # debug marker: row skipped
        elif len(p) == 6:
            strp = p[0].text.strip()
            dealer[Constant.PROVINCE] = p[0].text.strip()
            dealer[Constant.CITY] = p[1].text.strip()
            dealer[Constant.NAME] = p[2].text.strip()
            dealer[Constant.ADDRESSTYPE] = p[3].text.strip()
            dealer[Constant.ADDRESS] = p[4].text.strip()
            dealer[Constant.TELPHONE] = p[5].text.strip()
            dealer_list.append(dealer)
        elif len(p) == 5:
            # Skip the header row; every other 5-cell row belongs to the previous province.
            if p[0].text.strip() != u'province':
                dealer[Constant.PROVINCE] = strp
                dealer[Constant.CITY] = p[0].text.strip()
                dealer[Constant.NAME] = p[1].text.strip()
                dealer[Constant.ADDRESSTYPE] = p[2].text.strip()
                dealer[Constant.ADDRESS] = p[3].text.strip()
                dealer[Constant.TELPHONE] = p[4].text.strip()
                dealer_list.append(dealer)
        elif len(p) == 3:
            print '@@'  # debug marker: row skipped
    print '@@@'  # debug marker: all rows processed
    self.saver.add(dealer_list)
    self.saver.commit()
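The `<br/>` problem described in the comments can be reproduced in isolation. The following is a minimal sketch, not the code from this article: it is written for Python 3 and uses the standard library's `xml.etree.ElementTree` instead of PyQuery, because it exposes the same `text`/`tail` element model that lxml (which PyQuery is built on) uses. The phone numbers are made up, and because strict XML does not allow a bare `&`, the sketch inserts the escaped form `&amp;`.

```python
import xml.etree.ElementTree as ET

# Hypothetical table cell: two phone numbers separated by <br/>.
raw = '<td>010-1111<br/>010-2222</td>'
cell = ET.fromstring(raw)

# .text ends at the first child element, so the second number is missing.
print(cell.text)

# The text after <br/> is stored as the "tail" of the <br/> element.
print(cell[0].tail)

# Workaround in the spirit of the article: replace <br/> before parsing.
# XML requires the escaped form '&amp;', which the parser decodes to '&'.
fixed = ET.fromstring(raw.replace('<br/>', '&amp;'))
print(fixed.text)

# Alternative that leaves the markup untouched: collect every text fragment.
joined = '&'.join(cell.itertext())
print(joined)
```

Either approach keeps both numbers in one string; replacing the tag before parsing is the simpler choice when, as here, the page is already held in memory as a string.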
4. The final code runs successfully; the corresponding data is retrieved and saved to Excel.
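The `saver` object used above is not shown in this article. As a rough illustration of the `add()` / `commit()` pattern, here is a hypothetical stand-in, written for Python 3, that writes CSV (which Excel can open) rather than a native Excel file. The class name `CsvSaver`, the column names, and the sample row are all assumptions made for this sketch.

```python
import csv
import io


class CsvSaver(object):
    """Hypothetical stand-in for self.saver: buffers rows, writes them on commit()."""

    def __init__(self, stream, fieldnames):
        self.stream = stream          # any writable text stream
        self.fieldnames = fieldnames  # column order of the output
        self.rows = []

    def add(self, dealer_list):
        # Buffer the dictionaries until commit() is called.
        self.rows.extend(dealer_list)

    def commit(self):
        # Write a header line followed by one line per buffered dealer.
        writer = csv.DictWriter(self.stream, fieldnames=self.fieldnames)
        writer.writeheader()
        writer.writerows(self.rows)
        self.rows = []


# Usage with an in-memory buffer and made-up data.
buffer = io.StringIO()
saver = CsvSaver(buffer, ['province', 'city', 'name'])
saver.add([{'province': 'Hebei', 'city': 'Baoding', 'name': 'Example Dealer'}])
saver.commit()
output = buffer.getvalue()
print(output)
```

When writing to a real file instead of a buffer, open it with `newline=''` so that the `csv` module controls the line endings.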