Skip to content

Instantly share code, notes, and snippets.

@andrewxiechina
Created December 4, 2017 05:22
Show Gist options
  • Select an option

  • Save andrewxiechina/2481c0e8d7d5e3a9b7bcc943f948ed56 to your computer and use it in GitHub Desktop.

Select an option

Save andrewxiechina/2481c0e8d7d5e3a9b7bcc943f948ed56 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas Exercise 01 - Getting and Knowing your Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time we are going to pull data directly from the internet.\n",
"Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
"\n",
"### Step 1. Import the necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3. Assign it to a variable called chipo."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"chipo = pd.read_csv(\"https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv\", sep='\\t')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4. See the first 10 entries"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>order_id</th>\n",
" <th>quantity</th>\n",
" <th>item_name</th>\n",
" <th>choice_description</th>\n",
" <th>item_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Chips and Fresh Tomato Salsa</td>\n",
" <td>NaN</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Izze</td>\n",
" <td>[Clementine]</td>\n",
" <td>$3.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Nantucket Nectar</td>\n",
" <td>[Apple]</td>\n",
" <td>$3.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Chips and Tomatillo-Green Chili Salsa</td>\n",
" <td>NaN</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>Chicken Bowl</td>\n",
" <td>[Tomatillo-Red Chili Salsa (Hot), [Black Beans...</td>\n",
" <td>$16.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>Chicken Bowl</td>\n",
" <td>[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...</td>\n",
" <td>$10.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>Side of Chips</td>\n",
" <td>NaN</td>\n",
" <td>$1.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Steak Burrito</td>\n",
" <td>[Tomatillo Red Chili Salsa, [Fajita Vegetables...</td>\n",
" <td>$11.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>Steak Soft Tacos</td>\n",
" <td>[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>Steak Burrito</td>\n",
" <td>[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" order_id quantity item_name \\\n",
"0 1 1 Chips and Fresh Tomato Salsa \n",
"1 1 1 Izze \n",
"2 1 1 Nantucket Nectar \n",
"3 1 1 Chips and Tomatillo-Green Chili Salsa \n",
"4 2 2 Chicken Bowl \n",
"5 3 1 Chicken Bowl \n",
"6 3 1 Side of Chips \n",
"7 4 1 Steak Burrito \n",
"8 4 1 Steak Soft Tacos \n",
"9 5 1 Steak Burrito \n",
"\n",
" choice_description item_price \n",
"0 NaN $2.39 \n",
"1 [Clementine] $3.39 \n",
"2 [Apple] $3.39 \n",
"3 NaN $2.39 \n",
"4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 \n",
"5 [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... $10.98 \n",
"6 NaN $1.69 \n",
"7 [Tomatillo Red Chili Salsa, [Fajita Vegetables... $11.75 \n",
"8 [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... $9.25 \n",
"9 [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... $9.25 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5. What is the number of observations in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4622"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.shape[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6. What is the number of columns in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.shape[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7. Print the name of all the columns."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['order_id', 'quantity', 'item_name', 'choice_description',\n",
" 'item_price'],\n",
" dtype='object')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 8. How is the dataset indexed?"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RangeIndex(start=0, stop=4622, step=1)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 9. Which was the most ordered item?"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>order_id</th>\n",
" <th>quantity</th>\n",
" </tr>\n",
" <tr>\n",
" <th>item_name</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chicken Bowl</th>\n",
" <td>713926</td>\n",
" <td>761</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chicken Burrito</th>\n",
" <td>497303</td>\n",
" <td>591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chips and Guacamole</th>\n",
" <td>449959</td>\n",
" <td>506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Steak Burrito</th>\n",
" <td>328437</td>\n",
" <td>386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Canned Soft Drink</th>\n",
" <td>304753</td>\n",
" <td>351</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" order_id quantity\n",
"item_name \n",
"Chicken Bowl 713926 761\n",
"Chicken Burrito 497303 591\n",
"Chips and Guacamole 449959 506\n",
"Steak Burrito 328437 386\n",
"Canned Soft Drink 304753 351"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.groupby(\"item_name\").sum().sort_values([\"quantity\"], ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 10. How many items were ordered?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>order_id</th>\n",
" <th>quantity</th>\n",
" </tr>\n",
" <tr>\n",
" <th>item_name</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chicken Bowl</th>\n",
" <td>713926</td>\n",
" <td>761</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chicken Burrito</th>\n",
" <td>497303</td>\n",
" <td>591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chips and Guacamole</th>\n",
" <td>449959</td>\n",
" <td>506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Steak Burrito</th>\n",
" <td>328437</td>\n",
" <td>386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Canned Soft Drink</th>\n",
" <td>304753</td>\n",
" <td>351</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" order_id quantity\n",
"item_name \n",
"Chicken Bowl 713926 761\n",
"Chicken Burrito 497303 591\n",
"Chips and Guacamole 449959 506\n",
"Steak Burrito 328437 386\n",
"Canned Soft Drink 304753 351"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.groupby(\"item_name\").sum().sort_values([\"quantity\"], ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 11. What was the most ordered item in the choice_description column?"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>order_id</th>\n",
" <th>quantity</th>\n",
" </tr>\n",
" <tr>\n",
" <th>choice_description</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>[Diet Coke]</th>\n",
" <td>123455</td>\n",
" <td>159</td>\n",
" </tr>\n",
" <tr>\n",
" <th>[Coke]</th>\n",
" <td>122752</td>\n",
" <td>143</td>\n",
" </tr>\n",
" <tr>\n",
" <th>[Sprite]</th>\n",
" <td>80426</td>\n",
" <td>89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Lettuce]]</th>\n",
" <td>43088</td>\n",
" <td>49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream]]</th>\n",
" <td>36041</td>\n",
" <td>42</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" order_id quantity\n",
"choice_description \n",
"[Diet Coke] 123455 159\n",
"[Coke] 122752 143\n",
"[Sprite] 80426 89\n",
"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese... 43088 49\n",
"[Fresh Tomato Salsa, [Rice, Black Beans, Cheese... 36041 42"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.groupby(\"choice_description\").sum().sort_values([\"quantity\"], ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 12. How many items were orderd in total?"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>order_id</th>\n",
" <th>quantity</th>\n",
" </tr>\n",
" <tr>\n",
" <th>item_name</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Chicken Bowl</th>\n",
" <td>713926</td>\n",
" <td>761</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chicken Burrito</th>\n",
" <td>497303</td>\n",
" <td>591</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Chips and Guacamole</th>\n",
" <td>449959</td>\n",
" <td>506</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Steak Burrito</th>\n",
" <td>328437</td>\n",
" <td>386</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Canned Soft Drink</th>\n",
" <td>304753</td>\n",
" <td>351</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" order_id quantity\n",
"item_name \n",
"Chicken Bowl 713926 761\n",
"Chicken Burrito 497303 591\n",
"Chips and Guacamole 449959 506\n",
"Steak Burrito 328437 386\n",
"Canned Soft Drink 304753 351"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.groupby(\"item_name\").sum().sort_values([\"quantity\"], ascending=False).head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 13. Turn the item price into a float"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"chipo[\"price\"] = chipo[\"item_price\"].apply(lambda x: float(x[1:-1]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 14. How much was the revenue for the period in the dataset?"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"39237.020000000055"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo[\"revenue\"] = chipo[\"quantity\"] * chipo[\"price\"]\n",
"chipo[\"revenue\"].sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 15. How many orders were made in the period?"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1834"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo[\"order_id\"].value_counts().count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 16. What is the average amount per order?"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"21.394231188658654"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.groupby(\"order_id\").sum()[\"revenue\"].mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 17. How many different items are sold?"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo[\"item_name\"].value_counts().count()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment