h5py Quick Start Guide


h5py is a Python package for working with HDF5 files. This article is a translation of the Quick Start Guide from the official h5py documentation: http://docs.h5py.org/en/latest/quick.html. The translation is intended only for personal study of h5py. If you find a mistranslation, please contact the author or suggest a correction. Thank you very much!

Installation

Use Anaconda or Miniconda:

conda install h5py

With Enthought Canopy, use the GUI package manager, or run the following on the command line:

enpkg h5py

To install with pip or from source via setup.py, see the Installation guide.

Core Concepts

An HDF5 file is a container for two kinds of objects: datasets, which are array-like collections of data, and groups, which are folder-like containers that hold datasets and other groups. The most fundamental thing to remember when using h5py is:

Groups work like dictionaries, and datasets work like NumPy arrays.

Suppose someone has sent you an HDF5 file, mytestfile.hdf5. (To see how this file is created, read Appendix: Creating a File.) The first thing you need to do is open the file for reading:

>>> import h5py
>>> f = h5py.File('mytestfile.hdf5', 'r')

The File object is your starting point. So what is stored in this file? Remember, h5py.File acts like a Python dictionary, so we can check its keys:

>>> list(f.keys())
['mydataset']

Based on this output, the file contains one dataset, mydataset. Let us examine it as a Dataset object:

>>> dset = f['mydataset']

The object we get is not an array, but an HDF5 dataset. Like NumPy arrays, datasets have both a shape and a data type:

>>> dset.shape
(100,)
>>> dset.dtype
dtype('int32')

They also support array-style slicing. This is how you read and write data in a dataset. Note that writing requires the file to be open in a writable mode such as r+ rather than r, and that NumPy must be imported:

>>> import numpy as np
>>> dset[...] = np.arange(100)
>>> dset[0]
0
>>> dset[10]
10
>>> dset[0:100:10]
array([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
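
The session above can be reproduced end to end with the sketch below. It assumes h5py and NumPy are installed; the temporary directory and the variable names first and every_tenth are illustrative and not part of the original guide. The file is reopened in r+ mode because assigning to a dataset fails on a file opened with r.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mytestfile.hdf5")

    # Create the file first so there is something to open (see the appendix).
    with h5py.File(path, "w") as f:
        f.create_dataset("mydataset", (100,), dtype="i")

    # Reopen in read/write mode and work with the dataset.
    with h5py.File(path, "r+") as f:
        dset = f["mydataset"]
        shape, dtype = dset.shape, dset.dtype
        dset[...] = np.arange(100)     # write the whole dataset at once
        first = int(dset[0])           # read a single element
        every_tenth = dset[0:100:10]   # slicing returns a NumPy array
```

Slicing reads only the requested elements from the file, so every_tenth is an ordinary in-memory NumPy array that stays usable after the file is closed.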

For more information, see File Objects and Datasets.

Appendix: Creating a File

At this point, you may wonder how mytestfile.hdf5 was created. We can create a file by setting the mode to w when the File object is initialized. Other modes include a (read/write access, creating the file if it does not exist) and r+ (read/write access to an existing file). For a full list of file access modes and their meanings, see File Objects.

>>> import h5py
>>> import numpy as np
>>> f = h5py.File("mytestfile.hdf5", "w")
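
To make the difference between the modes concrete, here is a small sketch. It assumes h5py is installed; the file name modes_demo.hdf5 and the flag variables are hypothetical and exist only for this illustration.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes_demo.hdf5")  # hypothetical file name

    # r+ needs an existing file, so opening a missing one raises an error.
    try:
        h5py.File(path, "r+")
        rplus_failed = False
    except OSError:
        rplus_failed = True

    # a creates the file when it is missing ...
    with h5py.File(path, "a") as f:
        f.create_dataset("mydataset", (100,), dtype="i")

    # ... and keeps existing content when the file is already there.
    with h5py.File(path, "a") as f:
        kept = "mydataset" in f

    # w always starts from an empty file, discarding earlier content.
    with h5py.File(path, "w") as f:
        emptied = len(f) == 0
```

In practice this means w is the mode to avoid when a file holds data you want to keep.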

The File object has a couple of interesting methods. One of them is create_dataset, which, as the name suggests, creates a dataset with a given shape and data type:

>>> dset = f.create_dataset("mydataset", (100,), dtype='i')

The File object is a context manager, so the following code works too:

>>> import h5py
>>> import numpy as np
>>> with h5py.File("mytestfile.hdf5", "w") as f:
...     dset = f.create_dataset("mydataset", (100,), dtype='i')
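
The practical benefit of the with form is that the file is closed automatically when the block ends. The sketch below illustrates this, assuming h5py is installed; it relies on a File object evaluating to True while open and False once closed, and the variable names are illustrative.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mytestfile.hdf5")

    with h5py.File(path, "w") as f:
        dset = f.create_dataset("mydataset", (100,), dtype="i")
        open_inside = bool(f)    # the file is open inside the block

    closed_after = not bool(f)   # leaving the block closed the file
```

Because the file is closed at the end of the block, dset should not be used afterwards; reopen the file to access the dataset again.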

Groups and hierarchical structure

"HDF" stands for "Hierarchical Data Format". Every object in an HDF5 file has a name, and the names are arranged in a POSIX-style hierarchy with / as the separator:

>>> dset.name
'/mydataset'

The "folders" in this system are called groups. The File object we created is itself a group, in this case the root group, named /:

>>> f.name
'/'

Creating a subgroup is done with the aptly named create_group. First, however, we need to open the file in read/write mode:

>>> f = h5py.File('mytestfile.hdf5', 'r+')
>>> grp = f.create_group("subgroup")

All Group objects, like the File object, also have create_* methods:

>>> dset2 = grp.create_dataset("another_dataset", (50,), dtype='f')
>>> dset2.name
'/subgroup/another_dataset'

By the way, you don't need to create all the intermediate groups manually. Specifying a full path works just as well:

>>> dset3 = f.create_dataset('subgroup2/dataset_three', (10,), dtype='i')
>>> dset3.name
'/subgroup2/dataset_three'
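
The group-creation steps above can be collected into one runnable sketch. It assumes h5py is installed; the temporary directory and the names and intermediate_is_group variables are illustrative additions.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mytestfile.hdf5")

    # Start from a file that already contains mydataset.
    with h5py.File(path, "w") as f:
        f.create_dataset("mydataset", (100,), dtype="i")

    with h5py.File(path, "r+") as f:
        # Explicit subgroup, then a dataset created inside it.
        grp = f.create_group("subgroup")
        dset2 = grp.create_dataset("another_dataset", (50,), dtype="f")

        # A full path creates the intermediate group subgroup2 implicitly.
        dset3 = f.create_dataset("subgroup2/dataset_three", (10,), dtype="i")

        names = [f.name, grp.name, dset2.name, dset3.name]
        intermediate_is_group = isinstance(f["subgroup2"], h5py.Group)
```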

Groups support most of the Python dictionary-style interface. You retrieve objects in the file using item-retrieval syntax:

>>> dataset_three = f['subgroup2/dataset_three']

Iterating over a group yields the names of its members:

>>> for name in f:
...     print(name)
mydataset
subgroup
subgroup2

Membership testing also uses names:

>>> "mydataset" in f
True
>>> "somethingelse" in f
False

You can even use full path names:

>>> "subgroup/another_dataset" in f
True

Groups also provide the familiar keys(), values(), items() and iter() methods, as well as get().
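
As a rough illustration of the dictionary-style interface, the sketch below uses keys(), items(), get() and the in operator on a file with the same layout as in this guide. It assumes h5py is installed; the variable names are illustrative, and the key list is sorted only to make the result independent of iteration order.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mytestfile.hdf5")

    with h5py.File(path, "w") as f:
        f.create_dataset("mydataset", (100,), dtype="i")
        f.create_dataset("subgroup/another_dataset", (50,), dtype="f")
        f.create_dataset("subgroup2/dataset_three", (10,), dtype="i")

        top_level = sorted(f.keys())               # immediate members only
        has_nested = "subgroup/another_dataset" in f
        missing = f.get("somethingelse")           # None instead of KeyError
        # items() pairs each name with its Group or Dataset object.
        kinds = {name: type(obj).__name__ for name, obj in f.items()}
```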

Because iterating over a group only yields its immediate members, iterating over an entire file is done with the Group methods visit() and visititems(), which take a callable:

>>> def printname(name):
...     print(name)
>>> f.visit(printname)
mydataset
subgroup
subgroup/another_dataset
subgroup2
subgroup2/dataset_three
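
While visit() passes only the name to the callable, visititems() passes both the name and the object. The sketch below uses that to tell groups and datasets apart. It assumes h5py is installed; the record function and the found dictionary are hypothetical helpers for this illustration.

```python
import os
import tempfile

import h5py

found = {}

def record(name, obj):
    # Called once per object with its path relative to the starting group.
    found[name] = "dataset" if isinstance(obj, h5py.Dataset) else "group"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mytestfile.hdf5")

    with h5py.File(path, "w") as f:
        f.create_dataset("mydataset", (100,), dtype="i")
        f.create_dataset("subgroup/another_dataset", (50,), dtype="f")
        f.create_dataset("subgroup2/dataset_three", (10,), dtype="i")

        # Walk the whole file, not just the immediate members of the root.
        f.visititems(record)
```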

For more information, see Groups.

Attributes

One of the best features of HDF5 is that you can store metadata right next to the data it describes. All groups and datasets support attached named bits of data called attributes.

Attributes are accessed through the attrs proxy object, which again implements the dictionary interface:

>>> dset.attrs['temperature'] = 99.5
>>> dset.attrs['temperature']
99.5
>>> 'temperature' in dset.attrs
True
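
Since attributes are stored in the file alongside the data, they survive closing and reopening it. The sketch below shows this for a dataset attribute and for an attribute on the root group. It assumes h5py is installed; the run_number attribute and the variable names are hypothetical.

```python
import os
import tempfile

import h5py

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mytestfile.hdf5")

    with h5py.File(path, "w") as f:
        dset = f.create_dataset("mydataset", (100,), dtype="i")
        dset.attrs["temperature"] = 99.5   # attribute on a dataset
        f.attrs["run_number"] = 7          # attribute on the root group

    # Reopen read-only and read the metadata back.
    with h5py.File(path, "r") as f:
        temperature = float(f["mydataset"].attrs["temperature"])
        run_number = int(f.attrs["run_number"])
        attr_names = list(f["mydataset"].attrs.keys())
```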

For more information, see Attributes.
