Caffe's main data interfaces are raw images (ImageData), HDF5, and LMDB/LevelDB. Since Caffe's LMDB interface only supports a single label per sample, multi-label tasks often have to fall back on HDF5.
However, for HDF5 data Caffe reads the entire H5 file into memory up front. For small datasets this is fine, and even saves IO during training since everything is read into memory once. For large datasets, though, the whole H5 file may not fit in memory, and you have to split it into several smaller H5 files. This is inelegant, and training then has to keep cycling through the H5 files in turn. One workaround is to put the image data into an LMDB and the label data into an H5 file, and read label and data from two separate data layers in the prototxt. But that solution is not pretty either: the code still has to handle both HDF5 and LMDB storage.
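To make the splitting step concrete, here is a small sketch of how one might plan the shards and generate the `.txt` list file that Caffe's HDF5Data layer reads (one H5 path per line). The function name, the shard-naming scheme, and the shard size are my own illustrative choices, not from the original post; the actual H5 writing (e.g. with h5py) is left out.

```python
import math


def plan_h5_shards(n_samples, max_per_shard, prefix="labels"):
    """Split n_samples into shards of at most max_per_shard samples each.

    Returns (shard_specs, list_file_text): shard_specs is a list of
    (filename, start, end) slices to write, and list_file_text is the
    contents of the .txt list file that Caffe's HDF5Data layer expects.
    """
    n_shards = int(math.ceil(n_samples / float(max_per_shard)))
    specs = []
    for i in range(n_shards):
        start = i * max_per_shard
        end = min((i + 1) * max_per_shard, n_samples)
        specs.append(("{}_{:03d}.h5".format(prefix, i), start, end))
    list_file_text = "\n".join(name for name, _, _ in specs) + "\n"
    return specs, list_file_text
```

Each shard then gets written with its slice of the label array, and the prototxt's `hdf5_data_param { source: "..." }` points at the list file.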
A more direct approach I recently saw on the web combines Python's lmdb library with Caffe's Python interface caffe.io.array_to_datum to store image data and labels in two separate LMDB files. As for reading them back in the prototxt: in Caffe's DataLayer with lmdb as backend, the first top blob holds the data field of each stored Datum, and the second top holds the Datum's label. Since the code below never sets the label field of any Datum, we simply write one DataLayer for each LMDB (one for data, one for labels), and each DataLayer exposes only its first top, which carries that LMDB's content. The top blob's name can be chosen freely.
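A minimal sketch of the prototxt side, assuming the two LMDBs were written to paths train_img_lmdb and train_label_lmdb (placeholder names, not from the original post). Each DataLayer declares a single top, so Caffe never tries to output the unset Datum label:

```protobuf
layer {
  name: "data"
  type: "Data"
  top: "data"              # images from the image LMDB
  data_param {
    source: "train_img_lmdb"
    backend: LMDB
    batch_size: 64
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"             # multi-label vectors from the label LMDB
  data_param {
    source: "train_label_lmdb"
    backend: LMDB
    batch_size: 64
  }
}
```

The two layers must use the same batch_size, and the two LMDBs must have been written with identical keys so that image i and label i stay aligned.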
The code is as follows:
import os
import shutil

import numpy as np
import lmdb
import caffe
import skimage.io
import skimage.transform


def write_lmdb(image_name_list, label_array, lmdb_img_name, lmdb_label_name,
               resize_image=False):
    # Start from a clean state: remove any existing databases.
    for lmdb_name in [lmdb_img_name, lmdb_label_name]:
        db_path = os.path.abspath(lmdb_name)
        if os.path.exists(db_path):
            shutil.rmtree(db_path)

    counter_img = 0
    counter_label = 0
    fail_cnt = 0
    batch_sz = 256  # processing batch size (the value was lost in the garbled source)
    print("Processing {:d} images and labels...".format(len(image_name_list)))

    db_imgs = lmdb.open(lmdb_img_name, map_size=int(1e12))
    db_labels = lmdb.open(lmdb_label_name, map_size=int(1e12))

    for i in range(int(np.ceil(len(image_name_list) / float(batch_sz)))):
        image_name_batch = image_name_list[batch_sz * i:batch_sz * (i + 1)]
        label_batch = label_array[batch_sz * i:batch_sz * (i + 1), :]

        imgs = []
        for idx, image_name in enumerate(image_name_batch):
            try:
                img = skimage.io.imread(image_name)
            except IOError:
                fail_cnt += 1
                continue
            if resize_image:
                img = skimage.transform.resize(img, (96, 96))
            imgs.append(img)

        with db_imgs.begin(write=True) as txn_img:
            for img in imgs:
                # array_to_datum expects a 3D (C, H, W) array; expand_dims adds
                # the channel axis for single-channel images.
                datum = caffe.io.array_to_datum(np.expand_dims(img, axis=0))
                txn_img.put("{:0>10d}".format(counter_img).encode("ascii"),
                            datum.SerializeToString())
                counter_img += 1
        print("Processed {:d} images".format(counter_img))

        with db_labels.begin(write=True) as txn_label:
            for idx in range(label_batch.shape[0]):
                # Reshape each label vector to (1, 1, L) so it is a valid 3D datum.
                datum = caffe.io.array_to_datum(label_batch[np.newaxis, np.newaxis, idx])
                txn_label.put("{:0>10d}".format(counter_label).encode("ascii"),
                              datum.SerializeToString())
                counter_label += 1
        print("Processed {:d} labels".format(counter_label))

    print("{:d} images failed reading".format(fail_cnt))
    db_imgs.close()
    db_labels.close()
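One detail worth noting in the code above is the "{:0>10d}" key format. LMDB orders records by byte-wise comparison of their keys, so zero-padding the counter to a fixed width keeps the on-disk order identical to the numeric insertion order, and using the same counter scheme for both databases keeps image i and label i under the same key. A small self-contained check (the helper name is my own):

```python
def lmdb_key(index):
    # Zero-pad to 10 digits so byte-wise (lexicographic) ordering of LMDB
    # keys matches numeric order, then encode since LMDB keys are bytes.
    return "{:0>10d}".format(index).encode("ascii")

# Lexicographic order of the padded keys equals numeric order.
keys = [lmdb_key(i) for i in (0, 9, 10, 123, 4567)]
assert keys == sorted(keys)
```

Without the padding, key "10" would sort before key "9" and the two databases could still stay aligned, but cursor iteration order would no longer match insertion order.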