liyi-1989 · July 31, 2014 20:00
diff --git a/hdf5 b/hdf5
@@ -0,0 +1,443 @@
+{
+ "metadata": {
+  "name": "",
+  "signature": "sha256:65aa7ff8ea053a7d34e0d7496f676d92274ad6a7602765834e8e441d7a1d2f7b"
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "# Working with HDF5 files in Python\n",
+      "\n",
+      "## 1. Introduction\n",
+      "\n",
+      "Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) and an accompanying open-source library designed to store and organize large amounts of numerical data. It was originally developed at NCSA.\n",
+      "\n",
+      "In Python, you can use the **h5py** package to read and write HDF5 files. For installation instructions, see the [h5py build guide](http://docs.h5py.org/en/2.3/build.html). If you are new to Python, an easy route is to install [Anaconda](http://continuum.io/downloads), which contains h5py along with many other commonly used packages.\n",
+      "\n",
+      "\n",
+      "An HDF5 file is organized like a file system. It has two main kinds of objects, the **group** and the **dataset**. A group works like a folder, while a dataset stores array data such as a NumPy array.\n",
+      "\n",
+      "Datasets are addressed inside the HDF5 file with paths that resemble those of a regular file system: `/Folder/SubFolder/DataName`.\n",
+      "\n",
+      "## 2. HDF5 in Python\n",
+      "\n",
+      "We assume that h5py is already installed on your computer. The cells below show how to work with the h5py module."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "import numpy as np\n",
+      "import h5py"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 1
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
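+      "We create an HDF5 file object with `h5py.File()`. The mode can be given as \"r\" (read-only) or \"w\" (write, truncating any existing file). In h5py 2.x, used here, the default is \"a\" (read and write, creating the file if it does not exist); since h5py 3.0 the default is \"r\", so it is safest to pass the mode explicitly."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "myfile = h5py.File(\"ex1.hdf5\", \"a\")"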
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 2
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### 2.1 Creating groups\n",
+      "\n",
+      "So far we have only created an empty HDF5 file, `myfile`, so we need to add something to it. For example, `myfile.create_group()` creates a new group (or \"folder\")."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "myfile.create_group(\"grp1\")"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 3,
+       "text": [
+        "<HDF5 group \"/grp1\" (0 members)>"
+       ]
+      }
+     ],
+     "prompt_number": 3
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "`create_group()` returns the new group, so you can also assign it to a variable."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "group2=myfile.create_group(\"grp2\")"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 4
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "For a group object, the `keys()` method returns the names of the objects it contains."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "myfile.keys()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 5,
+       "text": [
+        "[u'grp1', u'grp2']"
+       ]
+      }
+     ],
+     "prompt_number": 5
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We can also create a subgroup by calling the same method on `group2`."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1=group2.create_group(\"subgroup1\")\n",
+      "group2.keys()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 6,
+       "text": [
+        "[u'subgroup1']"
+       ]
+      }
+     ],
+     "prompt_number": 6
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "### 2.2 Creating data\n",
+      "\n",
+      "Now it is time to put some data in the group. A dataset can be created by assignment, just like adding an entry to a Python dictionary."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1[\"data1\"]=np.arange(0,10)\n",
+      "s1[\"data1\"]"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 7,
+       "text": [
+        "<HDF5 dataset \"data1\": shape (10,), type \"<i4\">"
+       ]
+      }
+     ],
+     "prompt_number": 7
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
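+      "The stored data can be read back into a NumPy array by indexing the dataset with an empty tuple, `[()]`. (h5py 2.x also offered a `.value` attribute for this, but it was removed in h5py 3.0.)"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1[\"data1\"][()]"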
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 8,
+       "text": [
+        "array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
+       ]
+      }
+     ],
+     "prompt_number": 8
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Note that the dataset object can be used directly in calculations and supports NumPy-style indexing."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "np.sum(s1[\"data1\"])"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 9,
+       "text": [
+        "45"
+       ]
+      }
+     ],
+     "prompt_number": 9
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1[\"data1\"][2]==2"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 10,
+       "text": [
+        "True"
+       ]
+      }
+     ],
+     "prompt_number": 10
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "We can also use the `create_dataset()` method to create a new dataset, either empty with a given shape and type or initialized from existing data."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1.create_dataset(\"data2\", (3, 5), dtype=\"i4\")"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 11,
+       "text": [
+        "<HDF5 dataset \"data2\": shape (3, 5), type \"<i4\">"
+       ]
+      }
+     ],
+     "prompt_number": 11
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1[\"data2\"][()]"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 12,
+       "text": [
+        "array([[0, 0, 0, 0, 0],\n",
+        "       [0, 0, 0, 0, 0],\n",
+        "       [0, 0, 0, 0, 0]])"
+       ]
+      }
+     ],
+     "prompt_number": 12
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1.create_dataset(\"data3\",data=np.arange(15))\n",
+      "s1[\"data3\"][()]"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 13,
+       "text": [
+        "array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])"
+       ]
+      }
+     ],
+     "prompt_number": 13
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
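+      "### 2.3 Deleting groups and datasets\n",
+      "\n",
+      "Use the `del` keyword to delete a group or a dataset, just as you would remove a key from a dictionary. Here we delete the dataset `data3`."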
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "s1.keys()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 14,
+       "text": [
+        "[u'data1', u'data2', u'data3']"
+       ]
+      }
+     ],
+     "prompt_number": 14
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "del s1[\"data3\"]\n",
+      "s1.keys()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 15,
+       "text": [
+        "[u'data1', u'data2']"
+       ]
+      }
+     ],
+     "prompt_number": 15
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## 3. Save as CSV file\n",
+      "\n",
+      "To save a dataset from the HDF5 file as a CSV file, you can use Python's **csv** module. For example, we create a 5 by 5 matrix under `s1` and then write it out row by row with `csv.writer()` and `.writerows()`."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "import csv\n",
+      "\n",
+      "s1[\"data4\"]=np.random.rand(5,5)\n",
+      "\n",
+      "# Python 2: the csv module expects a file opened in binary mode\n",
+      "with open('csv_test.csv', 'wb') as csvfile:\n",
+      "    writer = csv.writer(csvfile)\n",
+      "    writer.writerows(s1[\"data4\"])"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 16
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "myfile.close()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [],
+     "prompt_number": 17
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## 4. Reference\n",
+      "\n",
+      "- [**h5py.org**](http://docs.h5py.org/en/2.3/index.html)\n",
+      "\n",
+      "- [CSV package in Python](https://docs.python.org/2/library/csv.html)"
+     ]
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
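
Note for readers on current versions: the notebook in this diff was recorded with Python 2 and h5py 2.3. The following is a minimal sketch of the same steps, assuming Python 3 and h5py 3.x. The temporary directory and the variable names (`workdir`, `h5_path`, `csv_path`, `sub`) are illustrative and not part of the original notebook.

```python
import csv
import os
import tempfile

import h5py
import numpy as np

# Work in a throwaway directory so the sketch does not overwrite real files.
workdir = tempfile.mkdtemp()
h5_path = os.path.join(workdir, "ex1.hdf5")
csv_path = os.path.join(workdir, "csv_test.csv")

# "w" creates (or truncates) the file; the with-block closes it automatically.
with h5py.File(h5_path, "w") as f:
    f.create_group("grp1")
    grp2 = f.create_group("grp2")
    sub = grp2.create_group("subgroup1")

    sub["data1"] = np.arange(10)                      # dictionary-style assignment
    sub.create_dataset("data2", (3, 5), dtype="i4")   # shape and type only
    sub.create_dataset("data3", data=np.arange(15))   # initialized from an array
    del sub["data3"]                                  # remove the dataset again
    sub["data4"] = np.random.rand(5, 5)

# Reopen read-only and address the datasets by their full paths.
with h5py.File(h5_path, "r") as f:
    names = sorted(f["grp2/subgroup1"].keys())
    data1 = f["/grp2/subgroup1/data1"][()]            # [()] reads everything into NumPy
    data4 = f["/grp2/subgroup1/data4"][()]

# Python 3: open the CSV in text mode with newline="" before handing it to csv.writer.
with open(csv_path, "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(data4.tolist())

print(names)
print(data1.sum())
```

Using `with` blocks for both files means neither `myfile.close()` nor `csvfile.close()` has to be called by hand, and passing the mode explicitly avoids relying on a default that differs between h5py 2.x and 3.x.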