I previously wrote a blog post about a Python downloader for Baidu's new-song and hot-song charts, which crawls the songs on both charts and downloads them. That version was single-threaded, though: under average network conditions, scanning the first 100 songs took about 40 seconds. It also used a PyQt interface, and the UI would block if you operated the window while a download was in progress.
Over the past two days I had time to rework it, and made improvements in several areas:
1. Fixed the UI blocking problem, so other UI operations can be carried out while a download is in progress;
2. The crawler now uses one main thread plus 8 sub-threads to crawl quickly. Under the same network conditions, the time to scan 100 songs drops to about 8 or 9 seconds (local download speed around 300K);
3. The web pages are now parsed with BeautifulSoup instead of the HTMLParser used previously.
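The one-main-thread, 8-sub-thread arrangement from item 2 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: `crawl_all` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever function requests and parses a single song page.

```python
import queue
import threading

NUM_WORKERS = 8  # the post uses 8 sub-threads


def crawl_all(links, fetch, num_workers=NUM_WORKERS):
    """Fetch every link using a fixed number of worker threads.

    `fetch` is a hypothetical callable that takes one link and returns
    whatever was scraped for it.
    """
    tasks = queue.Queue()
    for index, link in enumerate(links):
        tasks.put((index, link))

    # One slot per link, so results keep the original order.
    results = [None] * len(links)

    def worker():
        while True:
            try:
                index, link = tasks.get_nowait()
            except queue.Empty:
                return  # no work left, let the thread finish
            results[index] = fetch(link)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main thread waits for all sub-threads
    return results
```

Because the work is dominated by waiting on the network, the threads spend most of their time blocked on I/O, which is why running 8 of them shortens the total scan time even in Python.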
To run this program you need to install PyQt and BeautifulSoup. Before running it, configure your Baidu account and password in the settings.py file:
Username = "Your Baidu account" # set your Baidu account here
password = "Your Baidu password" # set your Baidu password here
Once the account and password are configured, simply double-click spiderman.py to run the program.
How it runs
1. Execution starts in spiderman.py, which is the entry point of the main program.
2. The main program hands control to the dispatcher (the scheduler), which first logs in to Baidu.
3. If the login succeeds, the scheduler starts 8 sub-threads. These threads crawl the song links on the Baidu new-song chart or hot-song chart, parse each link to obtain the real address, and write the address, song name, and artist to a text file.
4. When the sub-threads have finished, the main program reads the text file generated in the previous step and loads the UI form.
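The hand-off in steps 2 to 4 (log in, let the sub-threads write to a text file, then read the file back) might look something like the sketch below. Everything here is an assumption made for illustration: `run_scheduler`, `load_songs`, `login`, and `resolve` are hypothetical names, and the tab-separated line format is a guess, since the post does not describe the file layout.

```python
import threading


def run_scheduler(login, resolve, links, out_path, num_workers=8):
    """Log in, crawl with worker threads, and write one song per line.

    `login` and `resolve` are hypothetical callables: `login()` returns
    True on success, and `resolve(link)` returns (url, title, artist).
    """
    if not login():
        return False  # step 2: stop if the login fails

    write_lock = threading.Lock()

    with open(out_path, "w", encoding="utf-8") as out:

        def worker(chunk):
            for link in chunk:
                url, title, artist = resolve(link)
                with write_lock:  # keep each record on its own line
                    out.write(f"{url}\t{title}\t{artist}\n")

        # Step 3: each thread takes every num_workers-th link.
        threads = [
            threading.Thread(target=worker, args=(links[i::num_workers],))
            for i in range(num_workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return True


def load_songs(path):
    """Step 4: read the records back before building the UI."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f]
```

The lock matters because all threads share one file handle; without it, two records could be interleaved on the same line. The order of lines in the file depends on thread timing, so the reader should not assume chart order unless a rank is stored with each record.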
The whole process is as follows:
When it runs normally, the result looks like this:
The GitHub repository is as follows: