H5py is a Python module for working with HDF5 files. The following article is a translation of h5py's Quick Start Guide from the official documentation: http://docs.h5py.org/en/latest/quick.html. This translation is intended only as a personal aid to learning h5py; if anything is translated incorrectly, please contact the author or provide a correction. Thank you very much!
Installation
Use Anaconda or Miniconda:
conda install h5py
With Enthought Canopy, you can install it through the GUI package manager, or with
enpkg h5py
To install with pip or from setup.py, see the installation instructions.
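For reference, installing from PyPI with pip is usually just the following (assuming a prebuilt wheel is available for your platform, or that a compiler and the HDF5 headers are installed):
pip install h5py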
Core Concepts
An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers holding datasets and other groups. The most fundamental thing to remember when using H5py is:
Groups work like dictionaries, and datasets work like NumPy arrays.
Suppose someone has sent you an HDF5 file, mytestfile.hdf5 (for how to create this file, see Appendix: Creating a File). The first thing you need to do is open the file for reading:
>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')
This File object is your starting point. So what is stored in this file? Remember, h5py.File acts like a Python dictionary, so we can check its keys:
>>> list(f.keys())
['mydataset']
Based on our observation, there is one dataset in the file, mydataset. Let's examine it as a Dataset object:
>>> dset = f['mydataset']
The object we get back is not an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:
>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')
They also support array-style slicing. This is how you read from and write to a dataset in the file:
>>> import numpy as np
>>> dset[...] = np.arange(100)
>>> dset[0]
0
>>> dset[10]
10
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
For more information, go to file objects and datasets.
Appendix: Creating a File
At this point, you may wonder how mytestfile.hdf5 was created. We created it by initializing a File object with the mode set to 'w'. Other modes are 'a' (read/write, creating the file if necessary) and 'r+' (read/write, file must exist). For a full list of file modes and their meanings, see File Objects.
>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")
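For example, a minimal sketch of opening the same file later in the other modes ('r+' requires the file to already exist):
>>> f = h5py.File("mytestfile.hdf5", "a")   # read/write, created if it does not exist
>>> f = h5py.File("mytestfile.hdf5", "r+")  # read/write, must already exist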
The File object has a couple of methods worth looking at. One is create_dataset, which, as the name suggests, creates a dataset with a given shape and data type:
>>> dset = f.create_dataset("mydataset", (100,), dtype='i')
The File object is a context manager, so the following code also works:
>>> import h5py
>>> import numpy as np
>>> with h5py.File("mytestfile.hdf5", "w") as f:
...     dset = f.create_dataset("mydataset", (100,), dtype='i')
Groups and hierarchical structure
"HDF" is the abbreviation for "Hierarchical Data Format". Each object in the HDF5 file has a name (name), which is stored in a POSIX-style hierarchical structure separated by/delimiter
>>> dset.name
u'/mydataset'
In this system the "folder" (folders) is named groups. The file object we create is itself a group, in this case the root group (root group), with the name/:
>>> f.name
u'/'
Creating a subgroup is done with the aptly named create_group. But first we need to open the file in read/write mode:
>>> f = h5py.File('mydataset.hdf5', 'r+')
>>> grp = f.create_group("subgroup")
All Group objects, like File objects, also have the create_* methods:
>>> dset2 = grp.create_dataset("another_dataset", (50,), dtype='f')
>>> dset2.name
u'/subgroup/another_dataset'
By the way, you don't have to create all the intermediate groups manually. Specifying a full path works just as well:
>>> dset3 = f.create_dataset('subgroup2/dataset_three', (10,), dtype='i')
>>> dset3.name
u'/subgroup2/dataset_three'
Groups support most of the Python dictionary-style interface. You retrieve objects in the file using item-retrieval syntax:
>>> dataset_three = f['subgroup2/dataset_three']
Iterating over a group will produce the names of its members:
>>> for name in f:
...     print name
mydataset
subgroup
subgroup2
Membership testing also works, using names:
>>> "mydataset" in fTrue>>> "somethingelse" in fFalse
You can even use full path names:
>>> "subgroup/another_dataset" in fTrue
Groups also have the familiar keys(), values(), items() and iter() methods, as well as get().
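For instance, a quick illustrative sketch of the dictionary-style methods, continuing with the file above (the exact object representations printed may differ between h5py versions):
>>> list(f.keys())
[u'mydataset', u'subgroup', u'subgroup2']
>>> f.get('subgroup')
<HDF5 group "/subgroup" (1 members)>
>>> f.get('missing') is None
True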
Since iterating over a group only yields its directly attached members, iterating over an entire file is done with the Group methods visit() and visititems(), which take a callable:
>>> def printname(name):
...     print name
>>> f.visit(printname)
mydataset
subgroup
subgroup/another_dataset
subgroup2
subgroup2/dataset_three
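visititems() works the same way but passes the object itself along with its name. A small sketch (printall is just an illustrative name, and the printed representations may vary by h5py version):
>>> def printall(name, obj):
...     print name, obj
>>> f.visititems(printall)
mydataset <HDF5 dataset "mydataset": shape (100,), type "<i4">
subgroup <HDF5 group "/subgroup" (1 members)>
subgroup/another_dataset <HDF5 dataset "another_dataset": shape (50,), type "<f4">
subgroup2 <HDF5 group "/subgroup2" (1 members)>
subgroup2/dataset_three <HDF5 dataset "dataset_three": shape (10,), type "<i4">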
For more information, go to groups.
Attributes
One of the best features of HDF5 is that you can store metadata right next to the data it describes. All groups and datasets support attached named bits of data, called attributes.
Attributes are accessed through the attrs proxy object, which again implements the dictionary interface:
>>> dset.attrs['temperature'] = 99.5
>>> dset.attrs['temperature']
99.5
>>> 'temperature' in dset.attrs
True
For more information, go to attributes.