Use Python PIL and cPickle to read and save image databases



Computer vision and machine learning tasks often involve images. In C++ the mature OpenCV library is the usual choice, and Python has the image processing library PIL (Python Imaging Library). PIL is not as versatile as OpenCV (it has no face detection algorithms, for example), but in Python we can use PIL for basic image reading and saving, and rely on Python's many powerful algorithm libraries (such as sklearn and Theano) for the algorithmic side.


This article uses the Olivetti Faces image database as an example to show how to use the PIL and cPickle modules to read the database and save it as a pkl file.

I covered the use of the cPickle module in an earlier article, "DeepLearning tutorial (2): storing the parameters of a machine learning algorithm during training", so it is not repeated here.


1. Introduction to the Olivetti Faces face database

Olivetti Faces is a relatively small face database available from New York University. It contains 400 images of 40 people, that is, 10 faces per person. Each image uses 8-bit gray levels, so every pixel's gray value lies between 0 and 255. In the version used here, all 400 faces are tiled into a single image of 1140 × 942 pixels (height × width) arranged as a 20 × 20 grid, so each face is (1140/20) × (942/20) pixels, which rounds down to 57 × 47 = 2679 pixels.
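The tiling arithmetic can be made concrete with a small helper. This is a sketch that assumes what the text states: a 20 × 20 grid of 57 × 47 faces, numbered row by row (index = row * 20 + column). The name `face_box` is hypothetical and not part of PIL or NumPy.

```python
# Hypothetical helper: pixel bounds of face `index` in the 20x20 mosaic,
# assuming 57x47 faces numbered row by row (index = row * 20 + column).
FACE_H, FACE_W, GRID = 57, 47, 20

def face_box(index):
    """Return (top, bottom, left, right) so that img[top:bottom, left:right] is the face."""
    row, column = divmod(index, GRID)
    top, left = row * FACE_H, column * FACE_W
    return top, top + FACE_H, left, left + FACE_W

print(FACE_H * FACE_W)   # 2679 pixels per face
print(face_box(0))       # (0, 57, 0, 47)
print(face_box(21))      # (57, 114, 47, 94)
print(face_box(399))     # (1083, 1140, 893, 940)
```

The last face ends at row 1140 and column 940, which is why the two rightmost pixel columns of the 942-pixel-wide image are never used.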



2. Reading and saving Olivetti Faces with PIL and cPickle

First open the image with PIL's Image module and, for convenient numerical computation, convert it to a numpy.array. Then flatten each face into a one-dimensional 1 × 2679 vector. Since there are 400 faces, this yields a 400 × 2679 numpy.array, which is then saved as a pkl file with the cPickle module. Note that this data has no labels. Because each person's 10 faces are consecutive, we can assign the categories 0 ~ 39 by hand, 10 samples per category, and build a 400 × 1 label vector holding the category of each image.
The code is as follows:
```python
# Read the face library olivettifaces and store it as a pkl file
import numpy
from PIL import Image
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3: cPickle's functionality lives in pickle

# Read the original image and convert it to a numpy.ndarray.
# convert('L') guarantees 8-bit gray levels (a GIF may otherwise open in palette mode).
# Dividing by 256 scales the gray values 0~255 into the range [0, 1).
img = Image.open('/home/wepon/olivettifaces.gif').convert('L')
img_ndarray = numpy.asarray(img, dtype='float64') / 256

# The image is 1140x942 (height x width) and holds 20*20 faces, so each face is
# (1140/20) x (942/20), i.e. 57*47 = 2679 pixels.
# Store all 400 samples in a 400*2679 array: each row is one face image, and
# rows 0~9, 10~19, 20~29, ... belong to the same person.
olivettifaces = numpy.empty((400, 57 * 47))
for row in range(20):
    for column in range(20):
        olivettifaces[row * 20 + column] = img_ndarray[
            row * 57:(row + 1) * 57, column * 47:(column + 1) * 47].flatten()

# olivettifaces_label holds the category of each sample: a 400-dimensional vector
# with values 0~39, i.e. 40 categories for 40 different people.
olivettifaces_label = numpy.empty(400)
for label in range(40):
    olivettifaces_label[label * 10:label * 10 + 10] = label
olivettifaces_label = olivettifaces_label.astype(int)

# Save olivettifaces and olivettifaces_label to olivettifaces.pkl
with open('/home/wepon/olivettifaces.pkl', 'wb') as write_file:
    pickle.dump(olivettifaces, write_file, -1)
    pickle.dump(olivettifaces_label, write_file, -1)
```
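The double loop above can also be written with NumPy reshaping alone. The sketch below is an alternative, not the original author's code: it assumes a 2-D grayscale array of at least 1140 × 940 pixels and uses a random array as a stand-in for the real image so it runs without the GIF file. `tile_faces` is a hypothetical name.

```python
import numpy

def tile_faces(img_ndarray, grid=20, face_h=57, face_w=47):
    """Cut a grid x grid mosaic into rows of flattened faces (hypothetical helper)."""
    # Drop the unused border (942 columns -> 940), then split into blocks whose
    # axes are (grid row, pixel row, grid column, pixel column).
    crop = img_ndarray[:grid * face_h, :grid * face_w]
    blocks = crop.reshape(grid, face_h, grid, face_w)
    # Reorder to (grid row, grid column, pixel row, pixel column), flatten each face.
    return blocks.transpose(0, 2, 1, 3).reshape(grid * grid, face_h * face_w)

# Stand-in for the real image: random gray levels in [0, 1), shape 1140 x 942.
rng = numpy.random.default_rng(0)
img_ndarray = rng.random((1140, 942))

faces = tile_faces(img_ndarray)
labels = numpy.repeat(numpy.arange(40), 10)  # 0 (x10), 1 (x10), ..., 39 (x10)

# Face 21 (grid row 1, column 1) matches the slice the loop version would take.
same = numpy.array_equal(faces[21], img_ndarray[57:114, 47:94].flatten())
print(faces.shape, labels.shape, same)  # (400, 2679) (400,) True
```

`numpy.repeat(numpy.arange(40), 10)` builds the same label vector as the label loop, already as integers; the reshape version trades the explicit loop for one view-and-copy of the whole image.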

This produces an olivettifaces.pkl file in the /home/wepon/ directory. The file stores a 400 × 2679 array and a 400 × 1 vector, holding the samples and their categories respectively.
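Because the two arrays are written with two separate dump calls, the file holds two pickled objects back to back, and they must be read with two load calls in the same order. The sketch below demonstrates this with an in-memory buffer (`io.BytesIO`) in place of the real file and small stand-in arrays; it uses Python 3's `pickle` module, which took over cPickle's role.

```python
import io
import pickle

import numpy

samples = numpy.zeros((400, 2679))                  # stand-in for olivettifaces
sample_labels = numpy.repeat(numpy.arange(40), 10)  # stand-in for olivettifaces_label

buffer = io.BytesIO()                    # plays the role of the .pkl file
pickle.dump(samples, buffer, -1)         # first object
pickle.dump(sample_labels, buffer, -1)   # second object, appended after the first

buffer.seek(0)                           # "reopen" the file for reading
loaded_samples = pickle.load(buffer)     # objects come back in write order
loaded_labels = pickle.load(buffer)
print(loaded_samples.shape, loaded_labels.shape)  # (400, 2679) (400,)
```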

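Reading a single image from olivettifaces.pkl:
To view a single image, first reshape the 2679-dimensional vector that represents it back into 57 × 47, for example faces[1].reshape(57, 47), and then call pylab to display it.
```python
import pylab
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# The first object in the file is the 400*2679 sample array
with open('/home/wepon/olivettifaces.pkl', 'rb') as read_file:
    faces = pickle.load(read_file)

# Reshape the 2679-dimensional row vector back into a 57x47 image and show it
img1 = faces[1].reshape(57, 47)
pylab.imshow(img1)
pylab.gray()
pylab.show()
```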
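The reshape step is simply the inverse of the flatten used when the file was built: a 2679-element row, read in NumPy's default row-major order, folds back into 57 rows of 47 pixels. A quick check with a synthetic face (random values, not real data):

```python
import numpy

rng = numpy.random.default_rng(1)
face = rng.random((57, 47))         # synthetic 57x47 face
vector = face.flatten()             # what one row of olivettifaces holds
restored = vector.reshape(57, 47)   # what faces[1].reshape(57, 47) does
print(vector.shape, numpy.array_equal(face, restored))  # (2679,) True
```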



How is olivettifaces.pkl used in machine learning algorithms?
In machine learning we usually split the samples into training, validation and test samples, each with its corresponding labels. The code for the split is as follows:
```python
# Read olivettifaces.pkl and divide it into a training set (40*8 samples),
# a validation set (40*1 samples) and a test set (40*1 samples)
import numpy
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

with open('/home/wepon/olivettifaces.pkl', 'rb') as read_file:
    faces = pickle.load(read_file)  # 400*2679 samples
    label = pickle.load(read_file)  # 400 labels

train_data = numpy.empty((320, 2679))
train_label = numpy.empty(320, dtype=int)
valid_data = numpy.empty((40, 2679))
valid_label = numpy.empty(40, dtype=int)
test_data = numpy.empty((40, 2679))
test_label = numpy.empty(40, dtype=int)

# For each of the 40 people: samples 0~7 go to training, 8 to validation, 9 to test
for i in range(40):
    train_data[i * 8:i * 8 + 8] = faces[i * 10:i * 10 + 8]
    train_label[i * 8:i * 8 + 8] = label[i * 10:i * 10 + 8]
    valid_data[i] = faces[i * 10 + 8]
    valid_label[i] = label[i * 10 + 8]
    test_data[i] = faces[i * 10 + 9]
    test_label[i] = label[i * 10 + 9]
```
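Because each person's 10 samples occupy consecutive rows, the same 8/1/1 split can also be expressed by reshaping to (40, 10, 2679) and slicing along the second axis. This is an alternative sketch, not the original code; `split_per_person` is a hypothetical name, and the demo uses small synthetic arrays (3 features per sample instead of 2679).

```python
import numpy

def split_per_person(faces, label, n_people=40, per_person=10):
    """Samples 0~7 of each person -> train, sample 8 -> validation, sample 9 -> test."""
    dim = faces.shape[1]
    f = faces.reshape(n_people, per_person, dim)  # (person, sample, feature)
    l = label.reshape(n_people, per_person)       # (person, sample)
    train = (f[:, :8].reshape(-1, dim), l[:, :8].reshape(-1))
    valid = (f[:, 8], l[:, 8])
    test = (f[:, 9], l[:, 9])
    return train, valid, test

# Synthetic demo: 400 samples with 3 features each, labels 0~39
faces = numpy.arange(400 * 3, dtype=float).reshape(400, 3)
label = numpy.repeat(numpy.arange(40), 10)
(train_data, train_label), (valid_data, valid_label), (test_data, test_label) = \
    split_per_person(faces, label)
print(train_data.shape, valid_data.shape, test_data.shape)  # (320, 3) (40, 3) (40, 3)
```

The row order matches the loop version: training row `i * 8 + k` is sample `k` of person `i`.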


