Python in Practice 1_3: Crawling Rental Information

Source: Internet
Author: User
Information to crawl

URL: http://bj.xiaozhu.com/
Information to collect:
Crawl 300 listings from the site. For each listing, record the title, address, daily rent, link to the first listing picture, link to the landlord's picture, landlord's sex, and landlord's name.
Code

from bs4 import BeautifulSoup
import requests


# Judge the landlord's sex from the CSS class list of the sex icon
def get_sex(sex_icon):
    if sex_icon == ['member_ico']:
        return "Male"
    if sex_icon == ['member_ico1']:
        return "Female"
    return "not identified"


# Get the detail-page link of each listing on a search page
def get_page_url(url):
    web_url = requests.get(url)
    web_url_soup = BeautifulSoup(web_url.text, 'lxml')
    page_urls = web_url_soup.select('#page_list > ul > li > a')
    for page_url in page_urls:
        each_url = page_url.get('href')
        get_detail_info(each_url)


# Extract the fields of one listing from its detail page and print them
def get_detail_info(url):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    titles = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > h4 > em')
    addresses = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > p > span.pr5')
    prices = soup.select('#pricePart > div.day_l > span')
    pics1 = soup.select('#curBigImage')
    owner_pics = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > a > img')
    owner_names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
    sexes = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > div')
    for title, address, price, pic1, owner_name, owner_pic, sex in zip(
            titles, addresses, prices, pics1, owner_names, owner_pics, sexes):
        data = {
            'title': title.get_text(),
            'address': address.get_text(),
            'price': price.get_text(),
            'pic': pic1.get('src'),
            'owner_pic': owner_pic.get('src'),
            'name': owner_name.get('title'),
            'sex': get_sex(sex.get('class'))
        }
        print(data)


# range(1) yields only page 0; widen the range (for example range(1, 10),
# as in the summary) to crawl more search pages
urls = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1)]
for url in urls:
    get_page_url(url)
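One behavior of the loop above is worth spelling out: zip stops at the shortest of its inputs, so if any one selector matches nothing on a detail page, that listing is skipped without an error. The following is a minimal sketch with made-up sample values (the variable names and data are illustrative only, not taken from the site); itertools.zip_longest is shown as one possible alternative, not as the author's method.

```python
from itertools import zip_longest

# Illustrative sample data: one title was found, but the price selector matched nothing.
sample_titles = ["Listing A"]
sample_prices = []

# zip() stops at the shortest input, so no row is produced at all.
rows_zip = list(zip(sample_titles, sample_prices))
print(rows_zip)

# zip_longest() pads the missing field instead of dropping the row.
rows_padded = list(zip_longest(sample_titles, sample_prices, fillvalue=None))
print(rows_padded)
```

If fewer listings are printed than expected, an empty selector result is one place to look.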
Results
{'name': 'Want', 'address': 'On the Wangjing West Garden, Chaoyang District, Beijing\n', 'price': '395', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,44,1477,329,329,ea609ac8.jpg', 'title': 'wangjing CLS Line 14 line exquisite luxurious freshman', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,39,2965,1800,1200,f17d1a3e.jpg'}
{'name': 'Warm yang Yang sunny', 'address': 'Rainbow Road, Chaoyang District, Beijing\n', 'price': '798', 'owner_pic': 'http://image.xiaozhustatic1.com/21/2,0,86,206,375,375,d46c51ef.jpg', 'title': 'Close to 798, Wangjing, Jiuxianqiao, boutique junior residence.', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/3,0,34,2819,1800,1200,e051c333.jpg'}
{'name': 'Little tomatoes', 'address': 'Taipingqiao 40th, Liu Li Qiao, Fengtai District, Beijing\n', 'price': '368', 'owner_pic': 'http://image.xiaozhustatic1.com/21/6,0,72,1777,260,260,887558a2.jpg', 'title': 'near Beijing West Station 3 minutes from Electric Power Hospital', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,28,4451,1800,1200,e3bb1749.jpg'}
{'name': 'Want', 'address': 'Guang Shun bei da Jie Li ze xi yuan, Chaoyang District, Peking City\n', 'price': '395', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,44,1477,329,329,ea609ac8.jpg', 'title': 'Wangjing shopping district, adjacent to the subway 5 minutes, sex theme Big Two habitat', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,66,803,1800,1200,38a4c686.jpg'}
{'name': 'The best time to meet you', 'address': 'Chaoyang District, Beijing\n', 'price': '218', 'owner_pic': 'http://image.xiaozhustatic1.com/21/4,0,84,10730,260,260,6d756363.jpg', 'title': 'Hui Xin West Street South Mouth Sunshine Big master bedroom', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/5,0,47,2122,1800,1200,8830e613.jpg'}
{'name': 'Warm yang Yang sunny', 'address': 'The Rainbow Road, Chaoyang District, Beijing\n', 'price': '268', 'owner_pic': 'http://image.xiaozhustatic1.com/21/2,0,86,206,375,375,d46c51ef.jpg', 'title': 'Independent bathroom 798, Wangjing, Jiuxianqiao more preferential listings.', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/2,0,71,458,1800,1200,a9c5ea82.jpg'}
{'name': 'Sun-yan', 'address': 'North of Apple community, Chaoyang District, Beijing\n', 'price': '398', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,59,2841,363,363,8b6cf3d7.jpg', 'title': 'International trade double well line Line 10 Apple serviced Apartment', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,25,3184,1800,1200,4b993d38.jpg'}
{'name': 'Sister Orchid', 'address': 'Shi Li he Zoo Andong, Chaoyang District, Peking City\n', 'price': '279', 'owner_pic': 'http://image.xiaozhustatic1.com/21/4,0,2,9806,329,329,4656b7f6.jpg', 'title': 'Panjiayuan ten-Li River Metro Tenth # 14th near the international trade', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,83,2043,1800,1200,cc659348.jpg'}
{'name': 'Alicejy', 'address': 'Nan Zhong Yuan, Wangjing, Chaoyang District, Beijing\n', 'price': '195', 'owner_pic': 'http://image.xiaozhustatic1.com/21/6,0,3,3065,160,160,ba886bf8.jpg', 'title': 'Wangjing pro-water Mini-house, exclusive luxury to the East Bay', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,62,3156,1800,1200,7d5aa8bc.jpg'}
{'name': 'Small Shosho', 'address': 'South Percussion Lane, Dongcheng District, Beijing\n', 'price': '158', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,55,1517,260,260,ea96ce11.jpg', 'title': 'South Gongs and Drums Lane 0 distance, Drum Tower, Houhai, Imperial Palace, Guijie Street', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/1,0,53,2309,825,550,5a3a9a34.jpg'}
{'name': 'Leopard nan', 'address': 'The River Bay, Sorghum Bridge Oblique Street, Xicheng District, Beijing\n', 'price': '398', 'owner_pic': 'http://image.xiaozhustatic1.com/21/1,0,8,5386,333,333,764cfdb4.jpg', 'title': 'Xizhimen River Bay Korean Pastoral Warm 2 home', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/1,0,3,5482,825,550,f874984d.jpg'}
{'name': 'Happy sister', 'address': 'Red Barracks South Road, Chaoyang District, Beijing\n', 'price': '128', 'owner_pic': 'http://image.xiaozhustatic1.com/21/1,0,93,4699,375,375,f8bc8f9b.jpg', 'title': 'North Five ring No. No. 5.13 subway upscale warm apartment', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/1,0,42,4845,825,550,57c0343f.jpg'}
{'name': 'Rolling_meng', 'address': 'Song Hua Teng yuan, Chaoyang District, Beijing\n', 'price': '318', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,96,1555,375,375,ea000c2a.jpg', 'title': 'Line 10 line pine/double well/Guomao CBD daylighting Comfort freshman', 'sex': 'Male', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,40,333,1800,1200,c8e516a7.jpg'}
{'name': "Kiss Girl's Home", 'address': 'Ten Li Bao, Chaoyang District, Beijing, run maple Ka shang', 'price': '338', 'owner_pic': 'http://image.xiaozhustatic1.com/21/1,0,33,4216,366,366,c589a192.jpg', 'title': 'Line Line 6 metro apartment', 'sex': 'Female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,60,2088,1800,1200,3b057da1.jpg'}
 .....
Summary
Common syntax
list = [variable for variable in range(start, end)]
urls = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1, 10)]
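As a small sketch of the list-comprehension pattern above: range excludes its end value, so range(1, 10) produces pages 1 through 9. The helper name build_page_urls and the constant URL_TEMPLATE are hypothetical and introduced only for illustration; the URL pattern itself is the one used in this article.

```python
# URL pattern from the article; the constant and helper names are illustrative only.
URL_TEMPLATE = "http://bj.xiaozhu.com/search-duanzufang-p{}-0/"


def build_page_urls(first_page, last_page):
    """Return the search-page URLs for first_page through last_page, inclusive."""
    # range() excludes its end value, so add 1 to include last_page.
    return [URL_TEMPLATE.format(number) for number in range(first_page, last_page + 1)]


page_urls = build_page_urls(1, 9)  # the same pages as range(1, 10)
print(len(page_urls))
print(page_urls[0])
print(page_urls[-1])
```

No request is made here; the sketch only builds the list of URLs that a function such as get_page_url would then visit.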
The rest is a matter of thinking through how the functions connect to one another. Once those relationships are clear, the programming itself is simple. For the address variable, it would be best to strip the trailing \n, but for now I have not worked out how to do that.
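One possible way to drop the trailing \n from the address is Python's built-in str.strip(), which removes leading and trailing whitespace, newlines included. This is a minimal sketch; the helper name clean_address is hypothetical, and the sample string is one of the addresses from the results above.

```python
def clean_address(raw_address):
    """Remove leading and trailing whitespace, including a trailing newline."""
    return raw_address.strip()


sample_address = "Rainbow Road, Chaoyang District, Beijing\n"
cleaned_address = clean_address(sample_address)
print(repr(cleaned_address))  # 'Rainbow Road, Chaoyang District, Beijing'
```

In the scraper, this would amount to calling .strip() on the string returned by address.get_text() before storing it in the dictionary.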
