Information to crawl
URL: http://bj.xiaozhu.com/
Crawl information:
Crawl 300 listings from the site. For each listing, collect the title, address, daily rent, link to the first listing picture, link to the landlord's picture, landlord's sex, and landlord's name.
Code
from bs4 import BeautifulSoup
import requests


# Work out the landlord's sex from the CSS class of the icon
def get_sex(sex_icon):
    if sex_icon == ['member_ico']:
        return "male"
    elif sex_icon == ['member_ico1']:
        return "female"
    else:
        return "not identified"


# Collect the link to each listing on a search results page
def get_page_url(url):
    web_url = requests.get(url)
    web_url_soup = BeautifulSoup(web_url.text, 'lxml')
    page_urls = web_url_soup.select('#page_list > ul > li > a')
    for page_url in page_urls:
        each_url = page_url.get('href')
        get_detail_info(each_url)


# Extract the fields from a single listing page
def get_detail_info(url):
    web_data = requests.get(url)
    soup = BeautifulSoup(web_data.text, 'lxml')
    titles = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > h4 > em')
    addresses = soup.select('body > div.wrap.clearfix.con_bg > div.con_l > div.pho_info > p > span.pr5')
    prices = soup.select('#pricePart > div.day_l > span')
    pics1 = soup.select('#curBigImage')
    owner_pics = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > a > img')
    owner_names = soup.select('#floatRightBox > div.js_box.clearfix > div.w_240 > h6 > a')
    sexes = soup.select('#floatRightBox > div.js_box.clearfix > div.member_pic > div')
    for title, address, price, pic1, owner_name, owner_pic, sex in zip(
            titles, addresses, prices, pics1, owner_names, owner_pics, sexes):
        data = {
            'title': title.get_text(),
            'address': address.get_text(),
            'price': price.get_text(),
            'pic': pic1.get('src'),
            'owner_pic': owner_pic.get('src'),
            'name': owner_name.get('title'),
            'sex': get_sex(sex.get('class')),
        }
        print(data)


# Build the search page URLs, then crawl every listing on each page
urls = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1, 10)]
for url in urls:
    get_page_url(url)
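One thing to watch in get_detail_info: zip() stops at the shortest input, so if any one selector matches nothing on a page, the loop body never runs and the listing is skipped without an error. The sketch below shows that behaviour with plain lists standing in for the results of soup.select(); the helper names combine and lengths_match are hypothetical and are not part of the crawler above.

```python
# Hypothetical stand-ins for the lists that soup.select() returns.
titles = ["Listing A"]
prices = ["395"]
sexes = []  # pretend the selector for the sex icon matched nothing


def combine(titles, prices, sexes):
    """Pair up the fields the same way the crawler's for-loop does."""
    return [
        {"title": t, "price": p, "sex": s}
        for t, p, s in zip(titles, prices, sexes)
    ]


def lengths_match(*lists):
    """Return True when every list has the same number of items."""
    return len({len(items) for items in lists}) == 1


rows = combine(titles, prices, sexes)
print(rows)                                  # empty: zip stopped at the empty list
print(lengths_match(titles, prices, sexes))  # False: one selector missed
```

A length check like this, run before the loop, is one possible way to notice a page whose layout does not match the selectors.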
Results
{'name': 'Want', 'address': 'On the Wangjing West Garden, Chaoyang District, Beijing\n', 'price': '395', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,44,1477,329,329,ea609ac8.jpg', 'title': 'wangjing CLS Line 14 line exquisite luxurious freshman', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,39,2965,1800,1200,f17d1a3e.jpg'}
{'name': 'Warm yang Yang sunny', 'address': 'Rainbow Road, Chaoyang District, Beijing\n', 'price': '798', 'owner_pic': 'http://image.xiaozhustatic1.com/21/2,0,86,206,375,375,d46c51ef.jpg', 'title': 'Close to 798, Wangjing, Jiuxianqiao, boutique junior residence.', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/3,0,34,2819,1800,1200,e051c333.jpg'}
{'name': 'Little tomatoes', 'address': 'Taipingqiao 40th, Liu Li Qiao, Fengtai District, Beijing\n', 'price': '368', 'owner_pic': 'http://image.xiaozhustatic1.com/21/6,0,72,1777,260,260,887558a2.jpg', 'title': 'near Beijing West Station 3 minutes from Electric Power Hospital', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,28,4451,1800,1200,e3bb1749.jpg'}
{'name': 'Want', 'address': 'Guang Shun bei da Jie Li ze xi yuan, Chaoyang District, Peking City\n', 'price': '395', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,44,1477,329,329,ea609ac8.jpg', 'title': 'Wangjing shopping district, adjacent to the subway 5 minutes, sex theme Big Two habitat', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,66,803,1800,1200,38a4c686.jpg'}
{'name': 'The best time to meet you', 'address': 'Chaoyang District, Beijing\n', 'price': '218', 'owner_pic': 'http://image.xiaozhustatic1.com/21/4,0,84,10730,260,260,6d756363.jpg', 'title': 'Hui Xin West Street South Mouth Sunshine Big master bedroom', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/5,0,47,2122,1800,1200,8830e613.jpg'}
{'name': 'Warm yang Yang sunny', 'address': 'The Rainbow Road, Chaoyang District, Beijing\n', 'price': '268', 'owner_pic': 'http://image.xiaozhustatic1.com/21/2,0,86,206,375,375,d46c51ef.jpg', 'title': 'Independent bathroom 798, Wangjing, Jiuxianqiao more preferential listings.', 'sex': 'not identified', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/2,0,71,458,1800,1200,a9c5ea82.jpg'}
{'name': 'Sun-yan', 'address': 'North of Apple community, Chaoyang District, Beijing\n', 'price': '398', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,59,2841,363,363,8b6cf3d7.jpg', 'title': 'International trade double well line Line 10 Apple serviced Apartment', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,25,3184,1800,1200,4b993d38.jpg'}
{'name': 'Sister Orchid', 'address': 'Shi Li he Zoo Andong, Chaoyang District, Peking City\n', 'price': '279', 'owner_pic': 'http://image.xiaozhustatic1.com/21/4,0,2,9806,329,329,4656b7f6.jpg', 'title': 'Panjiayuan ten-Li River Metro Tenth # 14th near the international trade', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,83,2043,1800,1200,cc659348.jpg'}
{'name': 'Alicejy', 'address': 'Nan Zhong Yuan, Wangjing, Chaoyang District, Beijing\n', 'price': '195', 'owner_pic': 'http://image.xiaozhustatic1.com/21/6,0,3,3065,160,160,ba886bf8.jpg', 'title': 'Wangjing pro-water Mini-house, exclusive luxury to the East Bay', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,62,3156,1800,1200,7d5aa8bc.jpg'}
{'name': 'Small Shosho', 'address': 'South Percussion Lane, Dongcheng District, Beijing\n', 'price': '158', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,55,1517,260,260,ea96ce11.jpg', 'title': 'South Gongs and Drums Lane 0 distance, Drum Tower, Houhai, Imperial Palace, Guijie Street', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/1,0,53,2309,825,550,5a3a9a34.jpg'}
{'name': 'Leopard nan', 'address': 'The River Bay, Sorghum Bridge Oblique Street, Xicheng District, Beijing\n', 'price': '398', 'owner_pic': 'http://image.xiaozhustatic1.com/21/1,0,8,5386,333,333,764cfdb4.jpg', 'title': 'Xizhimen River Bay Korean Pastoral Warm 2 home', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/1,0,3,5482,825,550,f874984d.jpg'}
{'name': 'Happy sister', 'address': 'Red Barracks South Road, Chaoyang District, Beijing\n', 'price': '128', 'owner_pic': 'http://image.xiaozhustatic1.com/21/1,0,93,4699,375,375,f8bc8f9b.jpg', 'title': 'North Five ring No. No. 5.13 subway upscale warm apartment', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/1,0,42,4845,825,550,57c0343f.jpg'}
{'name': 'Rolling_meng', 'address': 'Song Hua Teng yuan, Chaoyang District, Beijing\n', 'price': '318', 'owner_pic': 'http://image.xiaozhustatic1.com/21/5,0,96,1555,375,375,ea000c2a.jpg', 'title': 'Line 10 line pine/double well/Guomao CBD daylighting Comfort freshman', 'sex': 'male', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,40,333,1800,1200,c8e516a7.jpg'}
{'name': "Kiss Girl's Home", 'address': 'Ten Li Bao, Chaoyang District, Beijing, run maple Ka shang', 'price': '338', 'owner_pic': 'http://image.xiaozhustatic1.com/21/1,0,33,4216,366,366,c589a192.jpg', 'title': 'Line Line 6 metro apartment', 'sex': 'female', 'pic': 'http://image.xiaozhustatic1.com/00,800,533/6,0,60,2088,1800,1200,3b057da1.jpg'}
.....
Summary
Common syntax
A list comprehension builds a list in one line (the end number of range is not included):
list = [expression for variable in range(start number, end number)]
urls = ["http://bj.xiaozhu.com/search-duanzufang-p{}-0/".format(number) for number in range(1, 10)]
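To make the comprehension above concrete, here is a small sketch that builds the same search page URLs twice, once with a list comprehension and once with an ordinary loop. The function names build_page_urls and build_page_urls_loop are hypothetical; only the URL pattern comes from the code above.

```python
TEMPLATE = "http://bj.xiaozhu.com/search-duanzufang-p{}-0/"


def build_page_urls(start, end):
    """List comprehension: one URL per page number in range(start, end)."""
    return [TEMPLATE.format(number) for number in range(start, end)]


def build_page_urls_loop(start, end):
    """The same result written as an explicit loop, for comparison."""
    urls = []
    for number in range(start, end):
        urls.append(TEMPLATE.format(number))
    return urls


urls = build_page_urls(1, 10)
print(len(urls))   # 9, because range(1, 10) stops before 10
print(urls[0])     # http://bj.xiaozhu.com/search-duanzufang-p1-0/
print(urls[-1])    # http://bj.xiaozhu.com/search-duanzufang-p9-0/
```

Both functions return identical lists; the comprehension is simply the shorter way to write the loop.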
The other lesson is about how to think through the problem: work out how the functions connect to each other. Once those relationships are clear, the programming itself is simple.
As for the address variable, it would be best to remove the trailing \n, but for now I have not worked out how to do it.
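One possible way to drop the trailing \n is Python's built-in str.strip(), which removes whitespace (including newlines) from both ends of a string. The sketch below uses a sample address string in place of a live page, and the helper name clean_text is hypothetical.

```python
def clean_text(text):
    """Remove leading and trailing whitespace, including newlines."""
    return text.strip()


# Sample value shaped like the 'address' field in the results above.
raw_address = "Chaoyang District, Beijing\n"
address = clean_text(raw_address)

print(repr(raw_address))  # 'Chaoyang District, Beijing\n'
print(repr(address))      # 'Chaoyang District, Beijing'
```

In the crawler this would amount to writing address.get_text().strip() when building the data dictionary.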