Implementing video downloads in Python: example code

Source: Internet
Author: User
Short video has boomed over the last two years, and every video site has built up its own distinctive short-video content. Imagine a program that automatically downloads the latest videos published by popular users on the major video sites: it would make watching more convenient, and videos without copyright restrictions could be reposted to a personal social media account to build a following.

Parker is such a project (project address: https://github.com/LiuRoy/parker). It uses the celery framework to crawl users' video lists periodically and downloads newly published videos asynchronously through you-get, which also makes distributed deployment easy. Because the sites change their page layouts and interfaces frequently, StatsD monitoring was added so that errors can be spotted promptly and the program kept highly available.

Code architecture

Parker currently supports downloads only from Bilibili and Miaopai. As the architecture diagram shows, each type of site needs two asynchronous interfaces: one that parses the playback addresses of published videos from a user's video home page, and one that downloads a video from its playback address. Supporting a new site type therefore requires no changes to existing code, only a new parsing interface and a new download interface. The follow-up processing after a download completes is not implemented; you can implement it to suit your own needs.

At run time, celery beat periodically sends the configured list of popular users to the parsing interface of the corresponding site, which runs asynchronously and filters out the playback addresses of the newest videos. Those addresses are passed to the corresponding download interface for asynchronous download, and once a download completes the follow-up operation is invoked asynchronously as well. You therefore need to start one celery beat process to send the scheduled tasks, plus several celery workers to run the parsing and download tasks. Large videos take a long time to download, so size the number of workers to the length of the task list.
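The filtering step in the parsing interface might look roughly like the sketch below. It rests on two assumptions that the article does not confirm: that the user's page lists videos newest first, and that the already_seen set stands in for whatever lookup Parker does against its database tables. The function name select_new_videos is hypothetical.

```python
from typing import Iterable, List, Set


def select_new_videos(play_urls: Iterable[str],
                      already_seen: Set[str],
                      per_page: int) -> List[str]:
    """Return the unseen URLs among the first `per_page` entries.

    Assumes `play_urls` is ordered newest first, so limiting the window
    keeps a first run from queuing a user's whole back catalogue.
    """
    newest = list(play_urls)[:per_page]
    return [url for url in newest if url not in already_seen]
```

Only the URLs returned here would be handed on to the download interface.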

Running the program

The program has been verified to work on Ubuntu and Mac. It has not been verified on Windows, because celery would not start properly on my local Windows machine.

Installing dependencies

The project uses Python 3.5. After entering the project directory, run:

pip install -r requirements.txt

Create the database tables

Create the two tables in the database in advance (SQL: https://github.com/LiuRoy/parker/blob/master/spider/models/tables.sql).

Parameter configuration

The config directory contains logging.yaml, params.yaml, and sites.yaml, which hold the log configuration, the run parameters, and the popular user configuration respectively.

Log configuration

In debug mode the log is written directly to standard output; in release mode it is written to a file, so the output log file needs to be configured.
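The mode-dependent behaviour can be illustrated with the standard logging module. This is a sketch under assumptions: Parker configures logging through logging.yaml, whereas this example builds the handler in code, and the build_handler name and the default file name parker.log are made up for illustration.

```python
import logging
import sys


def build_handler(mode: str, log_file: str = "parker.log") -> logging.Handler:
    """Pick a log handler by run mode (hypothetical helper)."""
    if mode == "debug":
        # Debug: write straight to standard output.
        return logging.StreamHandler(sys.stdout)
    if mode == "release":
        # Release: write to a file; delay=True opens it on the first record.
        return logging.FileHandler(log_file, delay=True)
    raise ValueError(f"unknown mode: {mode!r}")


logger = logging.getLogger("parker_demo")
logger.setLevel(logging.INFO)
logger.addHandler(build_handler("debug"))
logger.info("running in debug mode")
```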

Run configuration

    • mode: debug or release. In debug mode the log goes to standard output and no monitoring data is sent; in release mode the log goes to the specified file and monitoring data is sent.

    • broker_url: the broker_url passed to celery; it can point to Redis or RabbitMQ.

    • mysql_url: the database address; the two tables must be created in advance.

    • download_path: the directory where videos are downloaded.

    • statsd_address: the address of the monitoring (StatsD) server.

    • video_number_per_page: the number of playback addresses taken from a user's video home page on each check. Most users publish only a few videos at a time, so a small value is enough, and it also keeps the first run from downloading a large number of old videos.

    • download_timeout: the video download timeout.
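To show how download_path and download_timeout could be applied, here is a minimal sketch that shells out to the you-get command line tool. It is not taken from Parker: the article does not say whether you-get is called as a CLI or as a library, the timeout unit is assumed to be seconds, and the build_command and download_video names are hypothetical. The runner parameter exists only so the function can be exercised without a real download.

```python
import subprocess
from typing import Callable, List


def build_command(play_url: str, download_path: str) -> List[str]:
    """Build the you-get invocation; -o sets the output directory."""
    return ["you-get", "-o", download_path, play_url]


def download_video(play_url: str,
                   download_path: str,
                   download_timeout: float,
                   runner: Callable = subprocess.run) -> bool:
    """Return True if the download finished successfully within the timeout.

    `download_timeout` is assumed to be in seconds.
    """
    try:
        result = runner(build_command(play_url, download_path),
                        timeout=download_timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False
    return result.returncode == 0
```

Returning a plain success flag makes it straightforward to count successes and failures for monitoring.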

Popular user configuration

Parker generates the celery beat schedule from this configuration.

    • name: follows the pattern <site type>-<task id>; Parker uses it as the name of the scheduled task.

    • url: the user's video home page.

    • task: the celery asynchronous parsing task that handles this entry.

    • minute: the interval, in minutes, at which the user's video list is checked.
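As a rough idea of how such entries could be turned into a celery beat schedule, the sketch below builds the dictionary shape that celery's beat_schedule setting expects (task, schedule, args). The build_beat_schedule function, the sample entry, and the task path spider.tasks.parse_demo are hypothetical; Parker's real generation code may differ.

```python
from datetime import timedelta
from typing import Any, Dict, List


def build_beat_schedule(sites: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each configured user entry to one celery beat schedule entry."""
    schedule: Dict[str, Dict[str, Any]] = {}
    for site in sites:
        schedule[site["name"]] = {
            "task": site["task"],                            # parsing task to run
            "schedule": timedelta(minutes=site["minute"]),   # check interval
            "args": (site["url"],),                          # user's video home page
        }
    return schedule


# Hypothetical entry in the shape described above.
sample_sites = [
    {
        "name": "demo-1",
        "url": "https://example.com/user/1",
        "task": "spider.tasks.parse_demo",
        "minute": 30,
    },
]
beat_schedule = build_beat_schedule(sample_sites)
```

The resulting dictionary could then be assigned to the celery app's beat_schedule configuration.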

Starting the tasks

Enter the project directory and run the following command to start the celery worker:

celery -A spider worker

Run the following command to start the celery beat scheduler:

celery -A spider beat

Monitoring

I strongly recommend a Docker image that sets up a complete monitoring environment in about a minute. After that, simply add instrumentation points for successful and failed executions, and you can easily monitor whether the program is working properly.
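An instrumentation point of this kind boils down to sending a counter over UDP in the StatsD text format (name:value|c). The sketch below uses only the standard library; it is not Parker's code, and the metric names and the address 127.0.0.1:8125 (the conventional StatsD port) are assumptions for illustration.

```python
import socket
from typing import Tuple


def format_counter(name: str, value: int = 1) -> bytes:
    """Encode a StatsD counter increment, e.g. b'parker.download.success:1|c'."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, address: Tuple[str, int], value: int = 1) -> None:
    """Fire-and-forget one counter packet over UDP to the StatsD server."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value), address)


def record_result(succeeded: bool, address: Tuple[str, int]) -> None:
    """Count one successful or failed execution (hypothetical metric names)."""
    metric = "parker.download.success" if succeeded else "parker.download.error"
    send_counter(metric, address)
```

A call such as record_result(True, ("127.0.0.1", 8125)) after each task would give the monitoring dashboard a success and error rate to plot.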
