Skip to content

Instantly share code, notes, and snippets.

@andrewxiechina
Created December 4, 2017 05:51
Show Gist options
  • Select an option

  • Save andrewxiechina/634160dd6cfadefb5a1dbf86f821c141 to your computer and use it in GitHub Desktop.

Select an option

Save andrewxiechina/634160dd6cfadefb5a1dbf86f821c141 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas Exercise 04 - Filtering and Sorting Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This time we are going to pull data directly from the internet.\n",
"Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.\n",
"\n",
"### Step 1. Import the necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3. Assign it to a variable called chipo."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>order_id</th>\n",
" <th>quantity</th>\n",
" <th>item_name</th>\n",
" <th>choice_description</th>\n",
" <th>item_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Chips and Fresh Tomato Salsa</td>\n",
" <td>NaN</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Izze</td>\n",
" <td>[Clementine]</td>\n",
" <td>$3.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Nantucket Nectar</td>\n",
" <td>[Apple]</td>\n",
" <td>$3.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>Chips and Tomatillo-Green Chili Salsa</td>\n",
" <td>NaN</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>Chicken Bowl</td>\n",
" <td>[Tomatillo-Red Chili Salsa (Hot), [Black Beans...</td>\n",
" <td>$16.98</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" order_id quantity item_name \\\n",
"0 1 1 Chips and Fresh Tomato Salsa \n",
"1 1 1 Izze \n",
"2 1 1 Nantucket Nectar \n",
"3 1 1 Chips and Tomatillo-Green Chili Salsa \n",
"4 2 2 Chicken Bowl \n",
"\n",
" choice_description item_price \n",
"0 NaN $2.39 \n",
"1 [Clementine] $3.39 \n",
"2 [Apple] $3.39 \n",
"3 NaN $2.39 \n",
"4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans... $16.98 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo = pd.read_csv(\"https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv\", sep='\\t')\n",
"chipo.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4. How many products cost more than $10.00?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"31"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo[\"price\"] = chipo[\"item_price\"].apply(lambda x: float(x[1:-1]))\n",
"chipo[chipo[\"price\"] > 10.00].item_name.value_counts().count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5. What is the price of each item? \n",
"###### print a data frame with only two columns item_name and item_price"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>item_name</th>\n",
" <th>item_price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>281</th>\n",
" <td>Steak Salad Bowl</td>\n",
" <td>$23.78</td>\n",
" </tr>\n",
" <tr>\n",
" <th>445</th>\n",
" <td>Bowl</td>\n",
" <td>$22.20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Chicken Bowl</td>\n",
" <td>$16.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1229</th>\n",
" <td>Barbacoa Salad Bowl</td>\n",
" <td>$11.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1132</th>\n",
" <td>Carnitas Salad Bowl</td>\n",
" <td>$11.89</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Steak Burrito</td>\n",
" <td>$11.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>Barbacoa Bowl</td>\n",
" <td>$11.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>168</th>\n",
" <td>Barbacoa Crispy Tacos</td>\n",
" <td>$11.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>186</th>\n",
" <td>Veggie Salad Bowl</td>\n",
" <td>$11.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>738</th>\n",
" <td>Veggie Soft Tacos</td>\n",
" <td>$11.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Veggie Burrito</td>\n",
" <td>$11.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>62</th>\n",
" <td>Veggie Bowl</td>\n",
" <td>$11.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>250</th>\n",
" <td>Chicken Salad</td>\n",
" <td>$10.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>92</th>\n",
" <td>Steak Crispy Tacos</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>237</th>\n",
" <td>Carnitas Soft Tacos</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>554</th>\n",
" <td>Carnitas Crispy Tacos</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Steak Soft Tacos</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>Barbacoa Soft Tacos</td>\n",
" <td>$9.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>Steak Bowl</td>\n",
" <td>$8.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>664</th>\n",
" <td>Steak Salad</td>\n",
" <td>$8.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3750</th>\n",
" <td>Carnitas Salad</td>\n",
" <td>$8.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>Carnitas Bowl</td>\n",
" <td>$8.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>Carnitas Burrito</td>\n",
" <td>$8.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Barbacoa Burrito</td>\n",
" <td>$8.99</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>Chicken Salad Bowl</td>\n",
" <td>$8.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Chicken Crispy Tacos</td>\n",
" <td>$8.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Chicken Soft Tacos</td>\n",
" <td>$8.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Chicken Burrito</td>\n",
" <td>$8.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1694</th>\n",
" <td>Veggie Salad</td>\n",
" <td>$8.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1653</th>\n",
" <td>Veggie Crispy Tacos</td>\n",
" <td>$8.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1414</th>\n",
" <td>Salad</td>\n",
" <td>$7.40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>520</th>\n",
" <td>Crispy Tacos</td>\n",
" <td>$7.40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>510</th>\n",
" <td>Burrito</td>\n",
" <td>$7.40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>298</th>\n",
" <td>6 Pack Soft Drink</td>\n",
" <td>$6.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Chips and Guacamole</td>\n",
" <td>$4.45</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Nantucket Nectar</td>\n",
" <td>$3.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Izze</td>\n",
" <td>$3.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>674</th>\n",
" <td>Chips and Mild Fresh Tomato Salsa</td>\n",
" <td>$3.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>233</th>\n",
" <td>Chips and Roasted Chili Corn Salsa</td>\n",
" <td>$2.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>111</th>\n",
" <td>Chips and Tomatillo Red Chili Salsa</td>\n",
" <td>$2.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>Chips and Tomatillo Green Chili Salsa</td>\n",
" <td>$2.95</td>\n",
" </tr>\n",
" <tr>\n",
" <th>300</th>\n",
" <td>Chips and Tomatillo-Red Chili Salsa</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>191</th>\n",
" <td>Chips and Roasted Chili-Corn Salsa</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Chips and Tomatillo-Green Chili Salsa</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Chips and Fresh Tomato Salsa</td>\n",
" <td>$2.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Canned Soda</td>\n",
" <td>$2.18</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>Chips</td>\n",
" <td>$2.15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Side of Chips</td>\n",
" <td>$1.69</td>\n",
" </tr>\n",
" <tr>\n",
" <th>263</th>\n",
" <td>Canned Soft Drink</td>\n",
" <td>$1.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>Bottled Water</td>\n",
" <td>$1.09</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" item_name item_price\n",
"281 Steak Salad Bowl $23.78 \n",
"445 Bowl $22.20 \n",
"4 Chicken Bowl $16.98 \n",
"1229 Barbacoa Salad Bowl $11.89 \n",
"1132 Carnitas Salad Bowl $11.89 \n",
"7 Steak Burrito $11.75 \n",
"39 Barbacoa Bowl $11.75 \n",
"168 Barbacoa Crispy Tacos $11.75 \n",
"186 Veggie Salad Bowl $11.25 \n",
"738 Veggie Soft Tacos $11.25 \n",
"57 Veggie Burrito $11.25 \n",
"62 Veggie Bowl $11.25 \n",
"250 Chicken Salad $10.98 \n",
"92 Steak Crispy Tacos $9.25 \n",
"237 Carnitas Soft Tacos $9.25 \n",
"554 Carnitas Crispy Tacos $9.25 \n",
"8 Steak Soft Tacos $9.25 \n",
"56 Barbacoa Soft Tacos $9.25 \n",
"54 Steak Bowl $8.99 \n",
"664 Steak Salad $8.99 \n",
"3750 Carnitas Salad $8.99 \n",
"33 Carnitas Bowl $8.99 \n",
"27 Carnitas Burrito $8.99 \n",
"21 Barbacoa Burrito $8.99 \n",
"44 Chicken Salad Bowl $8.75 \n",
"11 Chicken Crispy Tacos $8.75 \n",
"12 Chicken Soft Tacos $8.75 \n",
"16 Chicken Burrito $8.49 \n",
"1694 Veggie Salad $8.49 \n",
"1653 Veggie Crispy Tacos $8.49 \n",
"1414 Salad $7.40 \n",
"520 Crispy Tacos $7.40 \n",
"510 Burrito $7.40 \n",
"298 6 Pack Soft Drink $6.49 \n",
"10 Chips and Guacamole $4.45 \n",
"2 Nantucket Nectar $3.39 \n",
"1 Izze $3.39 \n",
"674 Chips and Mild Fresh Tomato Salsa $3.00 \n",
"233 Chips and Roasted Chili Corn Salsa $2.95 \n",
"111 Chips and Tomatillo Red Chili Salsa $2.95 \n",
"38 Chips and Tomatillo Green Chili Salsa $2.95 \n",
"300 Chips and Tomatillo-Red Chili Salsa $2.39 \n",
"191 Chips and Roasted Chili-Corn Salsa $2.39 \n",
"3 Chips and Tomatillo-Green Chili Salsa $2.39 \n",
"0 Chips and Fresh Tomato Salsa $2.39 \n",
"18 Canned Soda $2.18 \n",
"40 Chips $2.15 \n",
"6 Side of Chips $1.69 \n",
"263 Canned Soft Drink $1.25 \n",
"34 Bottled Water $1.09 "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.drop_duplicates([\"item_name\"]).sort_values(\"price\", ascending=False)[[\"item_name\", \"item_price\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6. Sort by the name of the item"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"298 6 Pack Soft Drink\n",
"39 Barbacoa Bowl\n",
"21 Barbacoa Burrito\n",
"168 Barbacoa Crispy Tacos\n",
"1229 Barbacoa Salad Bowl\n",
"56 Barbacoa Soft Tacos\n",
"34 Bottled Water\n",
"445 Bowl\n",
"510 Burrito\n",
"18 Canned Soda\n",
"263 Canned Soft Drink\n",
"33 Carnitas Bowl\n",
"27 Carnitas Burrito\n",
"554 Carnitas Crispy Tacos\n",
"3750 Carnitas Salad\n",
"1132 Carnitas Salad Bowl\n",
"237 Carnitas Soft Tacos\n",
"4 Chicken Bowl\n",
"16 Chicken Burrito\n",
"11 Chicken Crispy Tacos\n",
"250 Chicken Salad\n",
"44 Chicken Salad Bowl\n",
"12 Chicken Soft Tacos\n",
"40 Chips\n",
"0 Chips and Fresh Tomato Salsa\n",
"10 Chips and Guacamole\n",
"674 Chips and Mild Fresh Tomato Salsa\n",
"233 Chips and Roasted Chili Corn Salsa\n",
"191 Chips and Roasted Chili-Corn Salsa\n",
"38 Chips and Tomatillo Green Chili Salsa\n",
"111 Chips and Tomatillo Red Chili Salsa\n",
"3 Chips and Tomatillo-Green Chili Salsa\n",
"300 Chips and Tomatillo-Red Chili Salsa\n",
"520 Crispy Tacos\n",
"1 Izze\n",
"2 Nantucket Nectar\n",
"1414 Salad\n",
"6 Side of Chips\n",
"54 Steak Bowl\n",
"7 Steak Burrito\n",
"92 Steak Crispy Tacos\n",
"664 Steak Salad\n",
"281 Steak Salad Bowl\n",
"8 Steak Soft Tacos\n",
"62 Veggie Bowl\n",
"57 Veggie Burrito\n",
"1653 Veggie Crispy Tacos\n",
"1694 Veggie Salad\n",
"186 Veggie Salad Bowl\n",
"738 Veggie Soft Tacos\n",
"Name: item_name, dtype: object"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chipo.drop_duplicates([\"item_name\"]).sort_values(\"item_name\").item_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7. What was the quantity of the most expensive item ordered?"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"31"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = chipo.drop_duplicates([\"item_name\"]).sort_values(\"price\", ascending=False)[\"item_name\"].values[0]\n",
"chipo[chipo[\"item_name\"] == item].quantity.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 8. How many times were a Veggie Salad Bowl ordered?"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"18"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = \"Veggie Salad Bowl\"\n",
"chipo[chipo[\"item_name\"] == item].quantity.sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 9. How many times people orderd more than one Canned Soda?"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"20"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"item = \"Canned Soda\"\n",
"temp = chipo[chipo[\"item_name\"] == item]\n",
"temp[temp[\"quantity\"] > 1].order_id.count()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment