"Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of numerical data. It is an open-source library and file format for storing large amounts of numerical data, originally developed at NCSA.\n",
"\n",
"In python you can use the **h5py** package to edit the HDF5 file. For installation issue, please consult [here](http://docs.h5py.org/en/2.3/build.html). If you are new to python, you can easily install the [Anaconda](http://continuum.io/downloads) and it will contains this package and many more commonly used packages.\n",
"\n",
"\n",
"The HDF5 file is just like a file system that stores data. It has only two kinds of objects, the **group** and the **dataset**. The group is just like the folders in a file system, while the dataset is used to store different types of data, like the NumPy array. \n",
"\n",
"The data set are saved in the HDF5 file in a way that is similar to the regular file system: `/Folder/SubFolder/DataName`.\n",
"\n",
"## 2. HDF5 in Python\n",
"\n",
"Let us assume that we have already installed h5py on your computer. We will see how to work with the h5py module. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import h5py"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could create a HDF5 file object by using the `h5py.File()` function. We could specify the mode as \"r\"(read) or \"w\"(write). By default, it is \"a\"(read and write)."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile = h5py.File(\"ex1.hdf5\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.1 Creating groups\n",
"\n",
"Now, we only create an empty HDF5 file `myfile`. We need to add some elements in it. For example, we could use the `myfile.create_group()` function to create a new group(or \"folder\"). "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile.create_group(\"grp1\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"<HDF5 group \"/grp1\" (0 members)>"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also create a group by setting it equals to a variable."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"group2=myfile.create_group(\"grp2\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a group object, you could use `keys()` function to get the object(s) name in it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"myfile.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"[u'grp1', u'grp2']"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Moreover, we could create a subgroup by using the same function for `group2`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1=group2.create_group(\"subgroup1\")\n",
"group2.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"[u'subgroup1']"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2.2 Creating data\n",
"\n",
"Now, it is time to make some data in the group. We could create just like a dictionary in python. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data1\"]=np.arange(0,10)\n",
"s1[\"data1\"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"<HDF5 dataset \"data1\": shape (10,), type \"<i4\">"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data created can be viewed with the `.value`. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data1\"].value"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the data object can be used in calculation directly. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"np.sum(s1[\"data1\"])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"45"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1[\"data1\"][2]==2"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"True"
]
}
],
"prompt_number": 10
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, we could use the `create_dataset()` fucntion to create a new data set. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1.create_dataset(\"data2\",(3,5),np.int)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"<HDF5 dataset \"data2\": shape (3, 5), type \"<i4\">"
"You could use the `del` key word to delete a group."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"s1.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 14,
"text": [
"[u'data1', u'data2', u'data3']"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"del s1[\"data3\"]\n",
"s1.keys()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"[u'data1', u'data2']"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Save as CSV file\n",
"\n",
"If you want to save the data set in the HDF5 file as the csv file, you could use the **csv** package in python. For example, we create a 5 by 5 matrix under `s1`. And then, we could use the `csv.writer()` and `.writerows()` to edit the csv file. "