Caffe has a lot of quirks. The data has to be in LMDB or LevelDB format, so what do you do? If the data is images, the convert_imageset.cpp tool that ships with Caffe does the job, but if it is not images, you have to write the program yourself. I am not a computer professional, but I can follow source code, so I worked hard at it and searched Baidu, without much success. Then I turned to Google and got a taste of the saying "for domestic matters ask Baidu, for foreign matters ask Google"; the ancients truly did not deceive me. In Caffe's Google Group I found this URL: http://deepdish.io/2015/04/28/creating-lmdb-in-python/
The code is as follows:
    import numpy as np
    import lmdb
    import caffe

    N = 1000

    # Let's pretend this is interesting data
    X = np.zeros((N, 3, 32, 32), dtype=np.uint8)
    y = np.zeros(N, dtype=np.int64)

    # We need to prepare the database for the size. We'll set it 10 times
    # greater than what we theoretically need. There is little drawback to
    # setting this too big. If you still run into problems after raising
    # this, you might want to try saving fewer entries in a single
    # transaction.
    map_size = X.nbytes * 10

    env = lmdb.open('mylmdb', map_size=map_size)

    with env.begin(write=True) as txn:
        # txn is a Transaction object
        for i in range(N):
            datum = caffe.proto.caffe_pb2.Datum()
            datum.channels = X.shape[1]
            datum.height = X.shape[2]
            datum.width = X.shape[3]
            datum.data = X[i].tobytes()  # or .tostring() if numpy < 1.9
            datum.label = int(y[i])
            str_id = '{:08}'.format(i)

            # The encode is only essential in Python 3
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
This is the code that uses Python to convert data to LMDB. But when I used it to build my database and then ran Caffe on it, I got a std::bad_alloc error. After a hard struggle and a lot of reading, I found the problems:
1. Caffe's data format is four-dimensional by default: (n_samples, n_channels, height, width). So I had to put my data into this format.
2. The last line, txn.put(str_id.encode('ascii'), datum.SerializeToString()), must be included. At first I assumed I did not need to write it under Python 2, and the result was always wrong. Later I found that this line must be written!
3. If the error "mdb_put: MDB_MAP_FULL: Environment mapsize limit reached" occurs, it is because lmdb's default map_size is too small. I changed the default value of map_size inside lmdb/cffi.py to 1099511627776 (that is, 1 TB); I do not know whether that is the right way to change it. Then I changed the line map_size = X.nbytes in the Python program above to map_size = X.nbytes * 10, and it succeeded!
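The three points above can be sketched in a few lines of NumPy, without touching lmdb or Caffe. This is only a minimal sketch under my own assumptions: the helper names to_caffe_4d, make_key and estimate_map_size are hypothetical, the sample data is made up, and treating each feature of a one-dimensional vector as a channel with height = width = 1 is just one possible layout, not something Caffe requires.

```python
import numpy as np


def to_caffe_4d(features):
    """Reshape (n_samples, n_features) into (n_samples, n_features, 1, 1).

    Assumption: each feature becomes one channel, with height = width = 1.
    """
    features = np.asarray(features)
    if features.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_features)")
    n_samples, n_features = features.shape
    return features.reshape(n_samples, n_features, 1, 1)


def make_key(index):
    """Build a zero-padded ASCII key as bytes.

    Zero padding keeps the byte-wise order of the keys equal to the
    numeric order of the indices.
    """
    return '{:08}'.format(index).encode('ascii')


def estimate_map_size(array, safety_factor=10):
    """Size in bytes to pass as map_size: raw data size times a safety factor."""
    return int(array.nbytes) * safety_factor


# Made-up data: 5 samples with 8 features each.
features = np.arange(40, dtype=np.uint8).reshape(5, 8)

X = to_caffe_4d(features)
keys = [make_key(i) for i in range(X.shape[0])]
map_size = estimate_map_size(X)

print(X.shape, keys[0], map_size)
```

With X in this shape, the loop from the listing above can be used unchanged: X.shape[1], X.shape[2] and X.shape[3] fill datum.channels, datum.height and datum.width, make_key(i) gives the key for txn.put, and map_size goes to lmdb.open('mylmdb', map_size=map_size).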
While looking for information, I also found programs that write LevelDB from Python, here: https://github.com/BVLC/caffe/issues/745 and http://stackoverflow.com/questions/32707393/whats-caffes-input-format
A program for writing HDF5 in Python is here: http://stackoverflow.com/questions/31774953/test-labels-for-regression-caffe-float-not-allowed/31808324#31808324
References:
1. http://stackoverflow.com/questions/30983213/how-to-use-1-dim-vector-as-input-for-caffe/30991590#30991590
2. Questions about lmdb's map_size: https://github.com/BVLC/caffe/issues/1298 and http://stackoverflow.com/questions/31820976/lmdb-increase-map-size
Using Caffe: how to convert one-dimensional data or other non-image data into LMDB