Download Files and Images
Scrapy provides reusable item pipelines for downloading files attached to an item. These pipelines share some common methods and structure and are called media pipelines. Generally, you will use either the Files Pipeline or the Images Pipeline.
Why use Scrapy's built-in file download method:
- Avoid downloading data that has been downloaded recently.
- You can easily specify the file storage path.
- You can convert downloaded images to a common format (JPG) and mode (RGB).
- You can easily produce thumbnails.
- The image width and height can be easily detected to ensure that they meet the minimum limit.
- Asynchronous download with high efficiency.
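Several of these features are switched on from settings.py. The fragment below is a minimal sketch using Scrapy's setting names for expiration, thumbnails, and minimum image size; the values are illustrative assumptions, not recommendations.

```python
# settings.py (fragment); all values here are illustrative assumptions

# Skip files and images that were downloaded within the last N days.
# Scrapy's default for both settings is 90 days.
FILES_EXPIRES = 30
IMAGES_EXPIRES = 30

# Generate thumbnails for each image: name -> (width, height).
IMAGES_THUMBS = {
    "small": (50, 50),
    "big": (270, 270),
}

# Drop images whose width or height is below these limits (in pixels).
IMAGES_MIN_WIDTH = 110
IMAGES_MIN_HEIGHT = 110
```

The thumbnail and minimum-size settings only affect the Images Pipeline; FILES_EXPIRES applies to the Files Pipeline.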
Download files with the Files Pipeline:
When using the Files Pipeline to download files, follow these steps:
- Define an Item with two attributes, file_urls and files. file_urls stores the URLs of the files to download and must be a list.
- After the files are downloaded, information about each download is stored in the item's files attribute, such as the download path, the source URL, and the file checksum.
- In the configuration file settings.py, set FILES_STORE. This setting specifies the directory where downloaded files are saved.
- Enable the pipeline: in ITEM_PIPELINES, set scrapy.pipelines.files.FilesPipeline: 1.
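The steps above can be sketched roughly as follows. So that the snippet runs without Scrapy installed, it uses a dataclass item (accepted by Scrapy 2.2 and later) instead of a scrapy.Item subclass; with scrapy.Item, the two attributes would be declared as scrapy.Field(). The class name DocumentItem, the storage path, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field


# --- items.py ---
# DocumentItem is a hypothetical name; only the two field names matter.
@dataclass
class DocumentItem:
    # URLs of the files to download; must be a list.
    file_urls: list = field(default_factory=list)
    # Filled in by the Files Pipeline after download; each entry is a dict
    # with keys such as "url", "path", and "checksum".
    files: list = field(default_factory=list)


# --- settings.py ---
# Directory where downloaded files are saved (hypothetical path).
FILES_STORE = "downloads/files"

# Enable the Files Pipeline; the number sets its order among pipelines.
ITEM_PIPELINES = {
    "scrapy.pipelines.files.FilesPipeline": 1,
}


# --- in a spider callback, you would yield an item like this one ---
item = DocumentItem(file_urls=["https://example.com/report.pdf"])
```

Once the spider yields the item, the pipeline downloads every URL in file_urls and populates files before the item moves on to later pipelines.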
Download images with the Images Pipeline:
When using the Images Pipeline to download images, follow these steps:
- Define an Item with two attributes, image_urls and images. image_urls stores the URLs of the images to download and must be a list.
- After the images are downloaded, information about each download is stored in the item's images attribute, such as the download path, the source URL, and the file checksum.
- In the configuration file settings.py, set IMAGES_STORE. This setting specifies the directory where downloaded images are saved.
- Enable the pipeline: in ITEM_PIPELINES, set scrapy.pipelines.images.ImagesPipeline: 1.
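The image setup mirrors the file setup, with the field and setting names changed. The sketch below again uses a dataclass item so it runs on its own; ProductItem, the storage path, and the example URL are hypothetical. Note that the Images Pipeline relies on the Pillow library for image processing, so Pillow needs to be installed in a real project.

```python
from dataclasses import dataclass, field


# --- items.py ---
# ProductItem is a hypothetical name; only the two field names matter.
@dataclass
class ProductItem:
    # URLs of the images to download; must be a list.
    image_urls: list = field(default_factory=list)
    # Filled in by the Images Pipeline after download.
    images: list = field(default_factory=list)


# --- settings.py ---
# Directory where downloaded images are saved (hypothetical path).
IMAGES_STORE = "downloads/images"

# Enable the Images Pipeline.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}


# --- in a spider callback, you would yield an item like this one ---
item = ProductItem(image_urls=["https://example.com/cover.jpg"])
```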