Record acquisition data through

Source: Internet
Author: User

First look at the results

Approximate process:

Use jQuery to get the bulk of the content rendered in a page

Most of the data can be collected from a single page in one pass: use jQuery to read the rendered data and print it in the console panel. I concatenated each record directly into an SQL statement for output.

Open Chrome and go to http://www.autozi.com/carBrandLetter/.html. Press F12 and switch to the Console panel, then paste the following code:

    $("tr.event_scroll").each(function (i) {
        var _this = $(this);
        // Main brands such as Audi and BMW
        var mainBrandName = _this.find('th>h4').text();
        var seriesList = _this.find('.car-series-list li');
        $.each(seriesList, function (i, el) {
            // Sub-brands under each brand, e.g. imported Audi and FAW Audi under Audi
            var subBrandName = $(el).find('h4').text();
            // Car series, e.g. Audi A6, A4
            var models = $(el).find('a.carmodel');
            $.each(models, function (j, element) {
                var model = $(element).text();
                var carSeriesId = getCarSeriesId($(element).attr('s_href'));
                // Build the SQL statement used to insert the row into the database
                getSql(subBrandName, model, carSeriesId);
            });
        });
    });

    // Get the ID parameter from the address, e.g.
    // http://www.autozi.com:80/carmodels.do?carSeriesId=1306030951244661 yields 1306030951244661
    function getCarSeriesId(str) {
        return str.slice(str.indexOf('=') + 1);
    }

    // Build the SQL statement used to insert the row into the database, e.g.
    // INSERT INTO tableName (brandName, name, carSeriesId) VALUES ("FAW Audi", "A6", "425");
    function getSql(subBrandName, model, carSeriesId) {
        var str = 'INSERT INTO tableName (brandName, name, carSeriesId) VALUES ("' +
            subBrandName + '", "' + model + '", "' + carSeriesId + '");';
        console.log(str);
    }

Press Enter. The generated SQL statements are printed in the console.

That gave me all the car brands, sub-brands and car series.
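The two helper steps above (pulling carSeriesId out of a link and building an INSERT statement) can also be sketched in Python. This is only an illustration, not the code used in this article: the function names are hypothetical, and the `?` placeholders assume a driver that uses question-mark parameters (such as sqlite3), which avoids quoting problems when a brand name contains a double quote.

```python
from urllib.parse import urlparse, parse_qs


def extract_car_series_id(url: str) -> str:
    """Return the carSeriesId query parameter of a series link."""
    query = parse_qs(urlparse(url).query)
    return query["carSeriesId"][0]


def build_insert(sub_brand_name: str, model: str, car_series_id: str):
    """Return a parameterized INSERT statement together with its values.

    The table and column names follow the example in the article; the
    "?" placeholder style is an assumption about the database driver.
    """
    sql = "INSERT INTO tableName (brandName, name, carSeriesId) VALUES (?, ?, ?);"
    return sql, (sub_brand_name, model, car_series_id)


url = "http://www.autozi.com:80/carmodels.do?carSeriesId=1306030951244661"
series_id = extract_car_series_id(url)
statement, values = build_insert("FAW Audi", "A6", series_id)
print(statement, values)
```

Parsing the query string by parameter name, rather than slicing after the first `=`, keeps working if the site ever adds another parameter in front of carSeriesId.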

But the specific models, with their model year and engine displacement, were still missing. The Audi A6L, for example, comes as a 2011 2.0L and as a 2005 4.2L.

The website shows these in a pop-up window.

For example, clicking A6L sends an AJAX request to: http://www.autozi.com/carmodels.do?carSeriesId=425&_=1462335011762

Clicking the second page sends a new AJAX request to: http://www.autozi.com/carmodelsAjax.do?currentPageNo=2&carSeriesId=425&_=1462335011762

The Audi A6L has four pages in total, and carSeriesId=425 was obtained in the previous step. To get the A6L for all years and displacements, four requests are needed, to the address:

http://www.autozi.com/carmodelsAjax.do?currentPageNo=[#page]&carSeriesId=425

[#page] runs from 1 to 4; change the paging parameter on each request. When the requested page does not exist, as with http://www.autozi.com/carmodelsAjax.do?currentPageNo=5&carSeriesId=425, an empty page is returned.
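As a small sketch of the paging scheme just described, the four A6L addresses can be generated like this. The helper name build_page_url is hypothetical; the endpoint and parameter names are the ones from the request addresses above, and the `_` cache-busting parameter is left out on the assumption that it is optional.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.autozi.com/carmodelsAjax.do"


def build_page_url(car_series_id: int, page_no: int) -> str:
    """Build the paging address for one car series and one page number."""
    params = {"currentPageNo": page_no, "carSeriesId": car_series_id}
    return BASE_URL + "?" + urlencode(params)


# Pages 1 to 4 of the Audi A6L (carSeriesId=425)
urls = [build_page_url(425, page) for page in range(1, 5)]
for page_url in urls:
    print(page_url)
```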

That reminded me of Python's BeautifulSoup library for scraping web content. It came in handy here.

Get content from a page using Python

getSoup opens the page and returns the parsed HTML. The currentPageNo parameter in the request address starts at 1. The script checks whether the returned HTML is empty; if it has content, the page number is incremented and the same address is requested again.

If not, it moves on to the address of the next car series.

Pause for 10 seconds between car series, because I found that if requests are too frequent the server returns an empty page.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    from time import sleep


    def getList(soup):
        # Append the title of every model link on the page to cars.txt
        with open("cars.txt", "a+", encoding="utf-8") as fo:
            for link in soup.find_all("a", class_="link"):
                # print(link.get('title'))
                fo.write(link.get('title') + '\n')


    def getSoup(modelId, pageNumber):
        # Fill the series ID and page number into the address template
        tpl_url = "http://www.autozi.com/carmodelsAjax.do?carSeriesId=[#id]&currentPageNo=[#page]"
        real_url = tpl_url.replace('[#id]', str(modelId))
        real_url = real_url.replace('[#page]', str(pageNumber))
        from_url = urlopen(real_url).read().decode('utf-8')
        # The html5lib parser must be installed separately
        soup = BeautifulSoup(from_url, "html5lib")
        return soup


    modelIds = [741, 1121, 357, 1055]
    for modelId in modelIds:
        flag = True
        i = 1
        while flag:
            soup = getSoup(modelId, i)
            # An empty page contains no <li> elements
            carlist = soup.find_all('li', limit=1)
            if len(carlist):
                getList(soup)
                i = i + 1
            else:
                flag = False
        # Pause between car series so the server is not hit too often
        sleep(10)
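The control flow of the script (keep requesting pages until one comes back empty, then wait before the next car series) can be separated from the network code. The sketch below is one possible arrangement, not the script used above: crawl, iter_pages, fetch and has_items are hypothetical names, max_pages is an added safety limit, and the demo data at the bottom is invented so the block runs without touching the site.

```python
import time


def iter_pages(fetch_page, has_items, first_page=1, max_pages=50):
    """Yield (page_number, html) until a page has no items.

    max_pages is a safety limit (an assumption, not in the original script)
    in case the server never returns an empty page.
    """
    page = first_page
    while page < first_page + max_pages:
        html = fetch_page(page)
        if not has_items(html):
            break
        yield page, html
        page += 1


def crawl(series_ids, fetch, has_items, delay_seconds=10.0, sleep=time.sleep):
    """Collect all non-empty pages per series, pausing between series."""
    results = {}
    for index, series_id in enumerate(series_ids):
        if index > 0:
            sleep(delay_seconds)  # be polite: wait before the next series
        results[series_id] = [
            html
            for _, html in iter_pages(lambda p: fetch(series_id, p), has_items)
        ]
    return results


# Demo with invented pages instead of real HTTP requests
fake_pages = {(741, 1): "<li>a</li>", (741, 2): "<li>b</li>", (357, 1): "<li>c</li>"}
pauses = []
demo = crawl(
    [741, 357],
    fetch=lambda series_id, page: fake_pages.get((series_id, page), ""),
    has_items=lambda html: "<li" in html,
    sleep=pauses.append,  # record the pause instead of actually waiting
)
print(demo, pauses)
```

Passing fetch and sleep in as arguments makes the loop easy to try out offline; for real use, fetch would wrap a function such as getSoup and the default time.sleep would apply.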
