Parse HDF files using Python
Some time ago, a business requirement meant I had to parse a file in HDF format. Before that, I did not know what HDF files were. Baidu Baike explains it as follows:
HDF is a self-describing, multi-object file format for storing and distributing scientific data. It was created by NCSA (the National Center for Supercomputing Applications) as a new data format that can efficiently store and distribute scientific data to meet the research needs of many fields. HDF meets many of the requirements for storing and distributing scientific data.
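"Self-describing" and "multi-object" mean that one file can hold several named datasets, and the file itself records which objects it contains. The following is a minimal sketch using pandas, assuming the PyTables package (`tables`) is installed, since `pd.HDFStore` depends on it. The file name `demo.h5`, the keys `day1`/`day2`, and the sample values are all hypothetical.

```python
import os
import tempfile
from contextlib import closing

import pandas as pd

# Hypothetical file path; pd.HDFStore needs the PyTables package ("tables").
path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Store two independent tables in one file, each under its own key.
with closing(pd.HDFStore(path, mode="w")) as store:
    store["day1"] = pd.DataFrame({"rtt": [120, 80], "cwnd": [10, 12]})
    store["day2"] = pd.DataFrame({"rtt": [95], "cwnd": [11]})

# Reopen read-only: the file itself records which objects it holds.
with closing(pd.HDFStore(path, mode="r")) as store:
    keys = store.keys()
    first = store["day1"]

print(sorted(keys))
```

The reader does not need to know the keys in advance, because `store.keys()` lists them from the file.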
Parsing the file in Python relies on a few packages. pandas and xlwt are third-party packages, while math and contextlib are part of the standard library:
import math
from contextlib import closing
import pandas as pd
import xlwt
math handles mathematical operations. pandas reads the HDF file and reshapes the data; its HDF support (pd.HDFStore) requires the PyTables package (tables) to be installed. xlwt writes the results to an Excel (.xls) file.
The code for reading HDF files using Python is as follows:
# HDF_FILR_URL (path to the HDF file) and date (the key to read) are defined elsewhere
with closing(pd.HDFStore(HDF_FILR_URL)) as store:
    df = store[date]
    # the index should be end -> region -> group
    df.reset_index(inplace=True)
    df.set_index(["end", "region", "group"], inplace=True)
    df.sort_index(inplace=True)
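The reading step can be tried end to end with a throwaway file. This is a sketch, not the post's actual data: it assumes PyTables is installed, uses hypothetical stand-ins for the post's `HDF_FILR_URL` and `date`, and assumes the stored frame is indexed in a different order (region -> group -> end), so that `reset_index` followed by `set_index` visibly rebuilds it as end -> region -> group.

```python
import os
import tempfile
from contextlib import closing

import pandas as pd

# Hypothetical stand-ins for HDF_FILR_URL and date; requires PyTables ("tables").
HDF_PATH = os.path.join(tempfile.mkdtemp(), "tcpinfo.h5")
DATE_KEY = "day1"

# Assumed layout: the stored frame is indexed region -> group -> end.
raw = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "group": ["g1", "g1", "g1", "g1"],
    "end": ["t1", "t1", "t2", "t2"],
    "rtt": [120000, 80000, 95000, 60000],
}).set_index(["region", "group", "end"])
raw.to_hdf(HDF_PATH, key=DATE_KEY, mode="w")

with closing(pd.HDFStore(HDF_PATH, mode="r")) as store:
    df = store[DATE_KEY]

# Rebuild the index in the order end -> region -> group, then sort it.
df.reset_index(inplace=True)
df.set_index(["end", "region", "group"], inplace=True)
df.sort_index(inplace=True)
print(df)
```

Sorting the index matters because pandas selects from a MultiIndex most reliably (and without performance warnings) when it is lexsorted.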
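After loading the data, use the indexing functions pandas provides to extract what you need. Here the rows for one value of the end level (dt) are selected, and each metric is unstacked into a table: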
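# dt is a value of the "end" index level, defined elsewhere
slice_df = df.loc[dt]  # remaining index: region -> group
rtt = slice_df.rtt.unstack(level=0) / 1000  # regions become columns; rtt scaled by 1/1000
cwnd = slice_df.cwnd.unstack(level=0)
total = slice_df.total.unstack(level=0)
rows = rtt.index.tolist()  # group labels
columns = rtt.columns.tolist()  # region labels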
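To see what `df.loc[dt]` and `unstack(level=0)` do to the shape of the data, here is a small self-contained sketch. The index values (`t1`, `east`, `g1`, ...) and the metric numbers are hypothetical; only the index layout end -> region -> group and the column names `rtt`, `cwnd`, `total` come from the post.

```python
import pandas as pd

# Hypothetical sample indexed end -> region -> group, like df in the post.
index = pd.MultiIndex.from_tuples(
    [
        ("t1", "east", "g1"),
        ("t1", "east", "g2"),
        ("t1", "west", "g1"),
        ("t1", "west", "g2"),
        ("t2", "east", "g1"),
    ],
    names=["end", "region", "group"],
)
df = pd.DataFrame(
    {
        "rtt": [120000, 90000, 80000, 70000, 110000],
        "cwnd": [10, 12, 14, 16, 11],
        "total": [100, 200, 300, 400, 150],
    },
    index=index,
).sort_index()

dt = "t1"                # hypothetical value of the "end" level
slice_df = df.loc[dt]    # "end" level is dropped; index is now region -> group

# unstack(level=0) moves "region" into the columns, so rows are groups.
rtt = slice_df.rtt.unstack(level=0) / 1000
cwnd = slice_df.cwnd.unstack(level=0)
rows = rtt.index.tolist()
columns = rtt.columns.tolist()
print(rtt)
```

Each metric becomes a group-by-region table, which is why `rows` and `columns` can be taken from `rtt` and reused for the other metrics sliced from the same frame.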
Finally, write the data to an Excel file. The code is as follows:
def writexcel(listname, name, time):
    # Write the data into an Excel (.xls) file; EXCEL_FILR_URL and AVG_RTT are defined elsewhere
    saveurl = EXCEL_FILR_URL + '%s_%s_%s.xls' % (AVG_RTT, time, name)
    excel_file = xlwt.Workbook()
    table = excel_file.add_sheet('tcpinfo')
    index_row = 0
    for item in listname:
        # one spreadsheet row per key
        for item_key, item_value in item.items():
            table.write(index_row, 0, str(item_key))
            table.write(index_row, 1, str(item_value[1][0]))
            table.write(index_row, 2, str(item_value[1][1]))
            # Python 3 strings are already Unicode, so the Python 2 .decode('utf-8') is dropped
            table.write(index_row, 3, str(item_value[0]))
            index_row += 1
    excel_file.save(saveurl)
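The post computes `rows` and `columns` from the unstacked `rtt` table but does not show them being written out. As a sketch of one way to do that with xlwt (not the author's `writexcel`), the following writes a group-by-region table with a header row. The sample values, the sheet name `rtt`, and the output path are hypothetical, and it assumes the xlwt package is installed.

```python
import os
import tempfile

import pandas as pd
import xlwt

# Hypothetical unstacked table: rows are groups, columns are regions.
rtt = pd.DataFrame(
    {"east": [120.0, 90.0], "west": [80.0, 70.0]},
    index=pd.Index(["g1", "g2"], name="group"),
)
rows = rtt.index.tolist()
columns = rtt.columns.tolist()

workbook = xlwt.Workbook()
sheet = workbook.add_sheet("rtt")

# Header row: top-left cell holds the index name, then one column per region.
sheet.write(0, 0, rtt.index.name)
for col, region in enumerate(columns, start=1):
    sheet.write(0, col, region)

# One spreadsheet row per group; values are converted to plain Python floats.
for row, group in enumerate(rows, start=1):
    sheet.write(row, 0, group)
    for col, region in enumerate(columns, start=1):
        sheet.write(row, col, float(rtt.loc[group, region]))

# Hypothetical output path; xlwt only writes the legacy .xls format.
save_path = os.path.join(tempfile.mkdtemp(), "rtt.xls")
workbook.save(save_path)
```

Writing numbers as floats rather than `str(...)` keeps them numeric in Excel, so they can be sorted or charted directly; the `str` conversions in `writexcel` store every cell as text.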