Reading MNIST with Python
This is really just a matter of reading a binary file in Python.
The MNIST file structure is as follows, taking the training image file as the example:
TRAINING SET IMAGE FILE (train-images-idx3-ubyte):
[offset] [type]          [value]           [description]
0000     32 bit integer  0x00000803(2051)  magic number
0004     32 bit integer  60000             number of images
0008     32 bit integer  28                number of rows
0012     32 bit integer  28                number of columns
0016     unsigned byte   ??                pixel
0017     unsigned byte   ??                pixel
........
xxxx     unsigned byte   ??                pixel
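Because the header is a fixed 16 bytes and every pixel takes one byte, the byte offset of any image follows directly from this layout. A minimal sketch of that arithmetic; the helper names `image_offset` and `expected_file_size` are made up for illustration, and the defaults assume 28 x 28 images as in MNIST:

```python
HEADER_SIZE = 16  # four 32-bit integers: magic, image count, rows, columns


def image_offset(i, rows=28, cols=28):
    """Byte offset of the first pixel of image i (0-based)."""
    return HEADER_SIZE + i * rows * cols


def expected_file_size(num_images=60000, rows=28, cols=28):
    """Total size in bytes of an uncompressed image file with this layout."""
    return HEADER_SIZE + num_images * rows * cols


print(image_offset(0))       # 16
print(image_offset(1))       # 800
print(expected_file_size())  # 47040016
```

Comparing `expected_file_size()` with the size of the file on disk is a quick way to check that the file was fully downloaded and decompressed.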
That is, we first need to read four 32-bit integers.
I tried a lot of approaches, and the one I find most convenient, at least for me, is
struct.unpack_from()
    filename = 'train-images.idx3-ubyte'
    with open(filename, 'rb') as binfile:
        buf = binfile.read()
This reads the whole file into memory in binary mode first.
    index = 0
    magic, numImages, numRows, numColumns = struct.unpack_from('>IIII', buf, index)
    index += struct.calcsize('>IIII')
Then use struct.unpack_from.
'>IIII' means: read 4 unsigned 32-bit integers in big-endian byte order.
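To see what the format string does without needing the data file, here is a small sketch that packs a fake header with `struct.pack` and reads it back. The header bytes are synthetic, chosen to match the layout described earlier:

```python
import struct

# Build a synthetic 16-byte header: magic 2051, 60000 images, 28 rows, 28 columns.
header = struct.pack('>IIII', 2051, 60000, 28, 28)

print(struct.calcsize('>IIII'))  # 16: four 4-byte unsigned integers
print(header[:4].hex())          # 00000803: most significant byte first

big = struct.unpack_from('>IIII', header, 0)
little = struct.unpack_from('<IIII', header, 0)

print(big)        # (2051, 60000, 28, 28)
print(little[0])  # 50855936: the same bytes misread as little-endian
```

If the magic number comes out as something other than 2051, the wrong byte order is a likely cause.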
Then read one image to test whether the read was successful:
    im = struct.unpack_from('>784B', buf, index)
    index += struct.calcsize('>784B')
    im = np.array(im)
    im = im.reshape(28, 28)
    fig = plt.figure()
    plotwindow = fig.add_subplot(111)
    plt.imshow(im, cmap='gray')
    plt.show()
'>784B' means reading 784 unsigned bytes (28 x 28 pixels) with the big-endian method.
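The case of the format letter matters here: in the struct module, `B` is an unsigned byte (0 to 255) while `b` is a signed byte (-128 to 127). A short sketch with made-up pixel values shows the difference, and also that the byte-order prefix has no effect on single-byte items:

```python
import struct

pixels = bytes([0, 127, 128, 255])  # synthetic pixel values, not real MNIST data

unsigned = struct.unpack_from('>4B', pixels, 0)
signed = struct.unpack_from('>4b', pixels, 0)

print(unsigned)  # (0, 127, 128, 255)
print(signed)    # (0, 127, -128, -1)

# Byte order only affects multi-byte items, so '<4B' gives the same result.
print(struct.unpack_from('<4B', pixels, 0) == unsigned)  # True
```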
The complete code is as follows:
    import struct

    import numpy as np
    import matplotlib.pyplot as plt

    # Read the whole file in binary mode
    filename = 'train-images.idx3-ubyte'
    with open(filename, 'rb') as binfile:
        buf = binfile.read()

    # Header: four big-endian unsigned 32-bit integers
    index = 0
    magic, numImages, numRows, numColumns = struct.unpack_from('>IIII', buf, index)
    index += struct.calcsize('>IIII')

    # First image: 784 unsigned bytes, reshaped to 28 x 28
    im = struct.unpack_from('>784B', buf, index)
    index += struct.calcsize('>784B')
    im = np.array(im)
    im = im.reshape(28, 28)

    # Display the image in grayscale
    fig = plt.figure()
    plotwindow = fig.add_subplot(111)
    plt.imshow(im, cmap='gray')
    plt.show()
This was only to test whether the read works, so I read just one image.
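Reading the remaining images is the same step in a loop. A minimal sketch, assuming the file layout described earlier; the function name `read_idx_images` is hypothetical, and the small file written at the end is synthetic test data, not MNIST:

```python
import os
import struct
import tempfile


def read_idx_images(path):
    """Return (rows, cols, images); each image is a tuple of rows*cols pixel values."""
    with open(path, 'rb') as f:
        buf = f.read()

    magic, num_images, rows, cols = struct.unpack_from('>IIII', buf, 0)
    if magic != 2051:
        raise ValueError('unexpected magic number: %d' % magic)

    index = struct.calcsize('>IIII')
    fmt = '>%dB' % (rows * cols)
    size = struct.calcsize(fmt)

    images = []
    for _ in range(num_images):
        images.append(struct.unpack_from(fmt, buf, index))
        index += size
    return rows, cols, images


# Synthetic check: two 2x2 "images" written in the same layout.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('>IIII', 2051, 2, 2, 2))
    tmp.write(bytes([0, 1, 2, 3, 252, 253, 254, 255]))

rows, cols, images = read_idx_images(tmp.name)
os.remove(tmp.name)

print(rows, cols, len(images))  # 2 2 2
print(images[1])                # (252, 253, 254, 255)
```

For the real training file, each tuple can be turned into an array with np.array(im).reshape(rows, cols), as in the code above.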
Judging by the displayed image, it looks like it was read correctly, ha ...