Objective
Emmm... I've been playing with web crawlers lately, but it hasn't gone very well... Maybe the task I picked was a bit too hard; there are so many pitfalls...
Today I was learning to crawl the images on the pixiv leaderboard. The eventual goal is to crawl all the works of the artists I follow, but it is better to start with something simpler. Crawling pixiv images ran me into a lot of problems: the first is the anti-crawler mechanism, the second is handling the image formats, and there are some other minor problems as well.
Theme
Because of limitations in my crawler code, some of the downloaded images came out broken. Having good images and broken ones mixed together was very uncomfortable to look at, so I wrote a script to bulk-delete the broken images.
The code is as follows:
    import os

    for name in range(1, 150):
        path = 'c:\\users\\adimin\\desktop\\pixiv_img\\{}.png'.format(name)
        with open(path, 'rb') as fp:
            data = fp.read()
        # The broken images are all exactly 58 bytes (shown as 0 KB on disk)
        if len(data) == 58:
            # Remove the file only after the with block has closed it;
            # Windows refuses to delete a file that is still open
            os.remove(path)
The code doesn't take much into account; it is mainly meant to solve the problem directly, so you can see it is very narrowly targeted. The basic idea holds in general, though, and the key point is the use of the os.remove function.
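As a slightly more general sketch of the same idea, the size check can be done with os.path.getsize, without reading each file, and applied to every .png in a folder rather than to a fixed range of numbered names. The function name remove_files_of_size and its parameters are made up here for illustration; the 58-byte value is the broken-image size observed above.

```python
import os

def remove_files_of_size(folder, bad_size=58, suffix=".png"):
    """Delete files in folder whose size is exactly bad_size bytes.

    Returns the names of the removed files. The function name and
    parameters are hypothetical, chosen only for this sketch.
    """
    removed = []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        # Skip subfolders and files with a different extension
        if not (os.path.isfile(path) and entry.lower().endswith(suffix)):
            continue
        # getsize reads the size from the file system, not the file content
        if os.path.getsize(path) == bad_size:
            os.remove(path)
            removed.append(entry)
    return removed
```

Calling remove_files_of_size('c:\\users\\adimin\\desktop\\pixiv_img') would then do the same job as the script above, and it also skips numbers that were never downloaded instead of failing on a missing file.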
But for deleting all the duplicate images, an approach like this definitely won't work; an image recognition algorithm would be needed... I can't do that for now, so I'll learn slowly!
Anyway... it's a very simple little program, but it's quite practical for solving this problem of mine.
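One narrow case can be handled without image recognition: files that are byte-for-byte identical. The sketch below is not a recognition algorithm and will not catch images that merely look alike (resized, re-encoded, or cropped copies); it only groups files by the SHA-256 hash of their contents. The function name find_exact_duplicates is made up for illustration.

```python
import hashlib
import os

def find_exact_duplicates(folder):
    """Return names of files whose content matches an earlier file.

    Only byte-identical files are detected. The function name is
    hypothetical, chosen only for this sketch.
    """
    seen = {}        # content hash -> first file name with that content
    duplicates = []
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as fp:
            digest = hashlib.sha256(fp.read()).hexdigest()
        if digest in seen:
            duplicates.append(entry)
        else:
            seen[digest] = entry
    return duplicates
```

The returned names could then be passed to os.remove one by one, after checking the list by eye.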
Python Learning--Bulk Delete error images