Python Example: Scraping the Kugou Music Top 100


Scraping the Kugou TOP 100

http://www.kugou.com/yy/rank/home/1-8888.html

Rank

Song and singer

Duration


Source code:

import time

import requests
from bs4 import BeautifulSoup


class Kugou(object):
    def __init__(self):
        # Send a browser-like User-Agent with every request
        self.header = {
            "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0'
        }

    def getInfo(self, url):
        html = requests.get(url, headers=self.header)
        soup = BeautifulSoup(html.text, 'html.parser')
        # print(soup.prettify())
        ranks = soup.select('.pc_temp_num')
        titles = soup.select('.pc_temp_songlist > ul > li > a')  # walk down the nested tags
        times = soup.select('.pc_temp_time')
        for rank, title, songTime in zip(ranks, titles, times):
            data = {
                # printing rank itself would include the HTML tags
                'rank': rank.get_text().strip(),
                # the link text has the form "singer - title"
                'title': title.get_text().split('-')[1].strip(),
                'singer': title.get_text().split('-')[0].strip(),
                'songTime': songTime.get_text().strip()
            }
            s = str(data)
            print('rank:%2s\t' % data['rank'], 'title:%2s\t' % data['title'], 'singer:%2s\t' % data['singer'], 'songTime:%2s\t' % data['songTime'])
            # append one record per line to the output file
            with open('hhh.txt', 'a', encoding='utf8') as f:
                f.write(s + '\n')


if __name__ == '__main__':
    # page numbers in the URL start at 1
    urls = [
        'http://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range(1, 30)
    ]
    kugou = Kugou()
    for url in urls:
        kugou.getInfo(url)
        time.sleep(1)  # pause between requests
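The script above stores each record as the `str()` of a dict, which is awkward to read back later. A minimal sketch of an alternative, assuming one JSON object per line is acceptable: `append_record` and `load_records` are hypothetical helper names, not part of the original script.

```python
import json


def append_record(path, record):
    """Append one song record to `path` as a single JSON line."""
    with open(path, "a", encoding="utf8") as f:
        # ensure_ascii=False keeps Chinese titles readable in the file
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read back every record written by append_record."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Inside `getInfo`, `append_record('hhh.txt', data)` could stand in for the `open(...)`/`write` pair if this format is preferred.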

 

Walkthrough of selected code

--------------------------------------------------------------------
urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(str(i)) for i in range(1, 5)]
for i in urls:
    print(i)

Output:
http://www.kugou.com/yy/rank/home/1-8888.html
http://www.kugou.com/yy/rank/home/2-8888.html
http://www.kugou.com/yy/rank/home/3-8888.html
http://www.kugou.com/yy/rank/home/4-8888.html
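
The page range here is hard-coded. A rough sketch of deriving it from the number of songs wanted follows. `build_rank_urls` and `per_page` are hypothetical names, and the default of 22 songs per page is an assumption about the page layout that should be checked against the live site.

```python
import math

BASE_URL = "http://www.kugou.com/yy/rank/home/{}-8888.html"


def build_rank_urls(top_n, per_page=22):
    """Return the rank-page URLs needed to cover the first `top_n` songs.

    per_page=22 is an assumed page size, not a documented value.
    """
    if top_n <= 0 or per_page <= 0:
        raise ValueError("top_n and per_page must be positive")
    pages = math.ceil(top_n / per_page)
    # page numbers in the URL start at 1
    return [BASE_URL.format(page) for page in range(1, pages + 1)]


urls = build_rank_urls(100)
```

Under that assumption, covering 100 songs takes one page more than `range(1, 5)` produces.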
--------------------------------------------------------------------
for rank, title, songTime in zip(ranks, titles, times):
    data = {
        # printing rank itself would include the HTML tags
        'rank': rank.get_text().strip(),
        # the link text has the form "singer - title"
        'title': title.get_text().split('-')[1].strip(),
        'singer': title.get_text().split('-')[0].strip(),
        'songTime': songTime.get_text()
    }
    print(data['rank'])
    print(data['title'])
    print(data['singer'])
    print(data['songTime'])

Output:
    1
    飛馳於你
    許嵩
    4:04
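
Splitting on a bare '-' truncates any title that itself contains a hyphen, and indexing `[1]` raises IndexError when the link text has no hyphen at all. A small sketch of a more forgiving split, assuming the separator is always ' - ' as in the sample output; `split_singer_title` is a hypothetical helper name.

```python
def split_singer_title(text, sep=" - "):
    """Split 'singer - title' into (singer, title) at the first separator.

    If the separator is missing, the whole text is treated as the title.
    """
    singer, found, title = text.partition(sep)
    if not found:
        return "", text.strip()
    return singer.strip(), title.strip()


singer, title = split_singer_title("許嵩 - 飛馳於你")
```

Because `str.partition` splits only at the first occurrence, any later hyphens stay inside the title.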
--------------------------------------------------------------------  
for rank, title, songTime in zip(ranks, titles, times):
    data = {
        # printing rank itself would include the HTML tags
        'rank': rank,
        'title': title,
        'songTime': songTime
    }
    print(data['rank'])
    print(data['title'])
    print(data['songTime'])

Output:
<span class="pc_temp_num">
<strong>1</strong>
</span>
<a class="pc_temp_songname" data-active="playDwn" data-index="0" hidefocus="true" href="http://www.kugou.com/song/pjn5xaa.html" title="許嵩 - 飛馳於你">許嵩 - 飛馳於你</a>
<span class="pc_temp_time">4:04 </span>
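
To see the difference between a tag and its text without hitting the network, the fragment printed above can be parsed directly. This is only a sketch over that static HTML: it assumes the live page still uses the same class names, and reading the `title` attribute is an alternative to `get_text()` that the original script does not use.

```python
from bs4 import BeautifulSoup

# Static copy of the fragment printed above (some attributes omitted)
html = '''
<span class="pc_temp_num">
<strong>1</strong>
</span>
<a class="pc_temp_songname" href="http://www.kugou.com/song/pjn5xaa.html" title="許嵩 - 飛馳於你">許嵩 - 飛馳於你</a>
<span class="pc_temp_time">4:04 </span>
'''

soup = BeautifulSoup(html, "html.parser")
rank_tag = soup.select_one(".pc_temp_num")
link_tag = soup.select_one(".pc_temp_songname")
time_tag = soup.select_one(".pc_temp_time")

rank = rank_tag.get_text().strip()      # text only, surrounding whitespace removed
label = link_tag["title"]               # attribute value: "singer - title"
song_url = link_tag.get("href")         # link to the song page
duration = time_tag.get_text().strip()  # trailing space removed
```

Printing `rank_tag` itself would show the full `<span>` markup, which is why the script calls `get_text()` before storing each value.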
