lundeen-bryan · March 25, 2024 19:57
diff --git a/Boolean_Indexing_Masking_Tutorial.ipynb b/Boolean_Indexing_Masking_Tutorial.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# UNDERSTANDING HOW BOOLEAN INDEXING WORKS\n",
    "\n",
    "Boolean Indexing or masking is a technique used to select a subset of the data based on some criteria. This is a difficult concept to grasp, but it is a powerful tool for data analysis. The following exercises were created by ChatGPT to help better understand how boolean indexing works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "URL = \"https://gist.githubusercontent.com/sh7ata/e075ff35b51ebb0d2d577fbe1d19ebc9/raw/b966d02c7c26bcca60703acb1390e938a65a35cb/drinks.csv\"\n",
    "df = pd.read_csv(URL)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 1: Filtering Countries with Low Wine Servings\n",
    "#### Task:\n",
    "Filter the dataset to identify countries with wine servings below a certain threshold.\n",
    "\n",
    "#### Instructions:\n",
    "1. Set a threshold for the minimum number of wine servings.\n",
    "2. Use boolean indexing to create a mask that identifies countries with wine servings below the threshold.\n",
    "3. Apply the mask to filter the dataset and display the countries with low wine servings.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the threshold for minimum wine servings\n",
    "threshold = 50"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Personal Comments**:\n",
    "\n",
    "I think the starting point here is to create a threashold value that we can test against in our filter. So for example, if we want to find wine servings below 50, we would set the threshold to 50.\n",
    "\n",
    "ChatGPT gives the following definition of a \"threshold\":\n",
    "\n",
    "A threshold is a value that is used to determine whether a value is \"above\" or \"below\" a certain value. For example, a threshold of 50 means that if a value is 50 or greater, it is considered to be \"above\" the threshold. If a value is less than 50, it is considered to be \"below\" the threshold...The threshold is a specific value or condition used to define the criteria for filtering the data. For example, in Exercise 1, the threshold could represent the minimum number of wine servings below which countries are considered to have low wine consumption."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a boolean mask for countries with wine servings below the threshold\n",
    "low_wine_mask = df['wine_servings'] < threshold\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Personal Comments**:\n",
    "\n",
    "Here we create a variable called low_wine_mask that is a boolean mask that identifies countries with wine servings below the threshold."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the mask to filter the dataset\n",
    "countries_with_low_wine = df[low_wine_mask]\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# What is the data type of countries_with_low_wine?\n",
    "type(countries_with_low_wine)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>beer_servings</th>\n",
       "      <th>spirit_servings</th>\n",
       "      <th>wine_servings</th>\n",
       "      <th>total_litres_of_pure_alcohol</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>25.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.0</td>\n",
       "      <td>0.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Angola</td>\n",
       "      <td>217.0</td>\n",
       "      <td>57.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>5.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Antigua &amp; Barbuda</td>\n",
       "      <td>12.0</td>\n",
       "      <td>128.0</td>\n",
       "      <td>45.0</td>\n",
       "      <td>4.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Armenia</td>\n",
       "      <td>21.0</td>\n",
       "      <td>179.0</td>\n",
       "      <td>11.0</td>\n",
       "      <td>3.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Azerbaijan</td>\n",
       "      <td>21.0</td>\n",
       "      <td>46.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>1.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              country  beer_servings  spirit_servings  wine_servings  \\\n",
       "2             Algeria           25.0              NaN           14.0   \n",
       "4              Angola          217.0             57.0           45.0   \n",
       "5   Antigua & Barbuda           12.0            128.0           45.0   \n",
       "7             Armenia           21.0            179.0           11.0   \n",
       "10         Azerbaijan           21.0             46.0            5.0   \n",
       "\n",
       "    total_litres_of_pure_alcohol  \n",
       "2                            0.7  \n",
       "4                            5.9  \n",
       "5                            4.9  \n",
       "7                            3.8  \n",
       "10                           1.3  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "countries_with_low_wine.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Personal Comments**:\n",
    "\n",
    "After creating the threashold mask, we apply the mask to the dataset in a variable called countries_with_low_wine. I was curious to verify the data type of countries_with_low_wine which turns out to be a dataframe.\n",
    "\n",
    "I noticed that there are a few `NaN` values in the resulting dataframe. Since the focus of this exercise is to filter the dataset, we can safely ignore these `NaN` values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Countries with wine servings below 50 :\n",
      "                country  beer_servings  spirit_servings  wine_servings  \\\n",
      "2              Algeria           25.0              NaN           14.0   \n",
      "4               Angola          217.0             57.0           45.0   \n",
      "5    Antigua & Barbuda           12.0            128.0           45.0   \n",
      "7              Armenia           21.0            179.0           11.0   \n",
      "10          Azerbaijan           21.0             46.0            5.0   \n",
      "..                 ...            ...              ...            ...   \n",
      "187            Vanuatu           21.0             18.0           11.0   \n",
      "188          Venezuela          333.0              1.0            3.0   \n",
      "189            Vietnam          111.0              2.0            1.0   \n",
      "191             Zambia           32.0             19.0            4.0   \n",
      "192           Zimbabwe           64.0             18.0            4.0   \n",
      "\n",
      "     total_litres_of_pure_alcohol  \n",
      "2                             0.7  \n",
      "4                             5.9  \n",
      "5                             4.9  \n",
      "7                             3.8  \n",
      "10                            1.3  \n",
      "..                            ...  \n",
      "187                           0.9  \n",
      "188                           7.7  \n",
      "189                           2.0  \n",
      "191                           2.5  \n",
      "192                           4.7  \n",
      "\n",
      "[114 rows x 5 columns]\n"
     ]
    }
   ],
   "source": [
    "# Display the filtered dataset\n",
    "print(\n",
    "    \"Countries with wine servings below\", \n",
    "    threshold, \n",
    "    \":\\n\", \n",
    "    countries_with_low_wine\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2: Finding Countries with High Alcohol Consumption\n",
    "#### Task:\n",
    "Identify countries with high total alcohol consumption based on the sum of beer, spirit, and wine servings.\n",
    "\n",
    "#### Instructions:\n",
    "1. Calculate the total alcohol consumption for each country by summing the servings of beer, spirit, and wine.\n",
    "2. Set a threshold for the minimum total alcohol consumption.\n",
    "3. Create a boolean mask to identify countries with total alcohol consumption above the threshold.\n",
    "4. Apply the mask to filter the dataset and display the countries with high alcohol consumption.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate total alcohol consumption for each country\n",
    "df['total_alcohol'] = df['beer_servings'] + df['spirit_servings'] + df['wine_servings']\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the threshold for minimum total alcohol consumption\n",
    "threshold = 400\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Personal Comments**:\n",
    "\n",
    "Again we'll set a threshold value that we can test against in our filter. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a boolean mask for countries with total alcohol consumption above the threshold\n",
    "high_alcohol_mask = df['total_alcohol'] > threshold\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply the mask to filter the dataset\n",
    "countries_with_high_alcohol = df[high_alcohol_mask]\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Countries with total alcohol consumption above 400 :\n",
      "                 country  beer_servings  spirit_servings  wine_servings  \\\n",
      "3               Andorra          245.0            138.0          312.0   \n",
      "6             Argentina          193.0             25.0          221.0   \n",
      "8             Australia          261.0             72.0          212.0   \n",
      "9               Austria          279.0             75.0          191.0   \n",
      "15              Belarus          142.0            373.0           42.0   \n",
      "16              Belgium          295.0             84.0          212.0   \n",
      "23               Brazil          245.0            145.0           16.0   \n",
      "25             Bulgaria          231.0            252.0           94.0   \n",
      "44               Cyprus          192.0            154.0          113.0   \n",
      "45       Czech Republic          361.0             17.0          134.0   \n",
      "48              Denmark          224.0             81.0          278.0   \n",
      "57              Estonia          224.0            194.0           59.0   \n",
      "60              Finland          263.0            133.0           97.0   \n",
      "62                Gabon          347.0             98.0           59.0   \n",
      "65              Germany          346.0            117.0          175.0   \n",
      "67               Greece          133.0            112.0          218.0   \n",
      "68              Grenada          199.0            438.0           28.0   \n",
      "75              Hungary          234.0            215.0          185.0   \n",
      "81              Ireland          313.0            118.0          165.0   \n",
      "93               Latvia          281.0            216.0           62.0   \n",
      "98            Lithuania          343.0            244.0           56.0   \n",
      "99           Luxembourg          236.0            133.0          271.0   \n",
      "132            Paraguay          213.0            117.0           74.0   \n",
      "135              Poland          343.0            215.0           56.0   \n",
      "136            Portugal          194.0             67.0          339.0   \n",
      "140             Romania          297.0            122.0          167.0   \n",
      "141  Russian Federation          247.0            326.0           73.0   \n",
      "144           St. Lucia          171.0            315.0           71.0   \n",
      "151              Serbia          283.0            131.0          127.0   \n",
      "155            Slovakia          196.0            293.0          116.0   \n",
      "160               Spain          284.0            157.0          112.0   \n",
      "182      United Kingdom          219.0            126.0          195.0   \n",
      "184                 USA          249.0            158.0           84.0   \n",
      "\n",
      "     total_litres_of_pure_alcohol  total_alcohol  \n",
      "3                            12.4          695.0  \n",
      "6                             8.3          439.0  \n",
      "8                             1.4          545.0  \n",
      "9                             9.7          545.0  \n",
      "15                           14.4          557.0  \n",
      "16                            1.5          591.0  \n",
      "23                            7.2          406.0  \n",
      "25                            1.3          577.0  \n",
      "44                            8.2          459.0  \n",
      "45                           11.8          512.0  \n",
      "48                            1.4          583.0  \n",
      "57                            9.5          477.0  \n",
      "60                            1.0          493.0  \n",
      "62                            8.9          504.0  \n",
      "65                           11.3          638.0  \n",
      "67                            8.3          463.0  \n",
      "68                           11.9          665.0  \n",
      "75                           11.3          634.0  \n",
      "81                           11.4          596.0  \n",
      "93                            1.5          559.0  \n",
      "98                           12.9          643.0  \n",
      "99                           11.4          640.0  \n",
      "132                           7.3          404.0  \n",
      "135                           1.9          614.0  \n",
      "136                          11.0          600.0  \n",
      "140                           1.4          586.0  \n",
      "141                          11.5          646.0  \n",
      "144                           1.1          557.0  \n",
      "151                           9.6          541.0  \n",
      "155                          11.4          605.0  \n",
      "160                           1.0          553.0  \n",
      "182                           1.4          540.0  \n",
      "184                           8.7          491.0  \n"
     ]
    }
   ],
   "source": [
    "# Display the filtered dataset\n",
    "print(\"Countries with total alcohol consumption above\", threshold, \":\\n\", countries_with_high_alcohol)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Personal Comments**:\n",
    "\n",
    "Note that we are only using alcohol consumption as a measurement for all other columns. It looks like columns of data in a dataset are called 'measures' or 'features' in the context of this exercise. For comparison, Power BI uses the term 'measures' to refer to columns of data that are used to define a 'feature' in a Python dataset."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# UNDERSTANDING HOW BOOLEAN INDEXING WORKS\n",
	"\n",
	"Boolean Indexing or masking is a technique used to select a subset of the data based on some criteria. This is a difficult concept to grasp, but it is a powerful tool for data analysis. The following exercises were created by ChatGPT to help better understand how boolean indexing works."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [],
	"source": [
	"import pandas as pd"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [],
	"source": [
	"URL = \"https://gist.githubusercontent.com/sh7ata/e075ff35b51ebb0d2d577fbe1d19ebc9/raw/b966d02c7c26bcca60703acb1390e938a65a35cb/drinks.csv\"\n",
	"df = pd.read_csv(URL)\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Exercise 1: Filtering Countries with Low Wine Servings\n",
	"#### Task:\n",
	"Filter the dataset to identify countries with wine servings below a certain threshold.\n",
	"\n",
	"#### Instructions:\n",
	"1. Set a threshold for the minimum number of wine servings.\n",
	"2. Use boolean indexing to create a mask that identifies countries with wine servings below the threshold.\n",
	"3. Apply the mask to filter the dataset and display the countries with low wine servings.\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Set the threshold for minimum wine servings\n",
	"threshold = 50"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Personal Comments:\n",
	"\n",
	"I think the starting point here is to create a threashold value that we can test against in our filter. So for example, if we want to find wine servings below 50, we would set the threshold to 50.\n",
	"\n",
	"ChatGPT gives the following definition of a \"threshold\":\n",
	"\n",
	"A threshold is a value that is used to determine whether a value is \"above\" or \"below\" a certain value. For example, a threshold of 50 means that if a value is 50 or greater, it is considered to be \"above\" the threshold. If a value is less than 50, it is considered to be \"below\" the threshold...The threshold is a specific value or condition used to define the criteria for filtering the data. For example, in Exercise 1, the threshold could represent the minimum number of wine servings below which countries are considered to have low wine consumption."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Create a boolean mask for countries with wine servings below the threshold\n",
	"low_wine_mask = df['wine_servings'] < threshold\n",
	"\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Personal Comments:\n",
	"\n",
	"Here we create a variable called low_wine_mask that is a boolean mask that identifies countries with wine servings below the threshold."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Apply the mask to filter the dataset\n",
	"countries_with_low_wine = df[low_wine_mask]\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"pandas.core.frame.DataFrame"
	]
	},
	"execution_count": 6,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"# What is the data type of countries_with_low_wine?\n",
	"type(countries_with_low_wine)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {},
	"outputs": [
	{
	"data": {
	"text/html": [
	"<div>\n",
	"<style scoped>\n",
	" .dataframe tbody tr th:only-of-type {\n",
	" vertical-align: middle;\n",
	" }\n",
	"\n",
	" .dataframe tbody tr th {\n",
	" vertical-align: top;\n",
	" }\n",
	"\n",
	" .dataframe thead th {\n",
	" text-align: right;\n",
	" }\n",
	"</style>\n",
	"<table border=\"1\" class=\"dataframe\">\n",
	" <thead>\n",
	" <tr style=\"text-align: right;\">\n",
	" <th></th>\n",
	" <th>country</th>\n",
	" <th>beer_servings</th>\n",
	" <th>spirit_servings</th>\n",
	" <th>wine_servings</th>\n",
	" <th>total_litres_of_pure_alcohol</th>\n",
	" </tr>\n",
	" </thead>\n",
	" <tbody>\n",
	" <tr>\n",
	" <th>2</th>\n",
	" <td>Algeria</td>\n",
	" <td>25.0</td>\n",
	" <td>NaN</td>\n",
	" <td>14.0</td>\n",
	" <td>0.7</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>4</th>\n",
	" <td>Angola</td>\n",
	" <td>217.0</td>\n",
	" <td>57.0</td>\n",
	" <td>45.0</td>\n",
	" <td>5.9</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>5</th>\n",
	" <td>Antigua & Barbuda</td>\n",
	" <td>12.0</td>\n",
	" <td>128.0</td>\n",
	" <td>45.0</td>\n",
	" <td>4.9</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>7</th>\n",
	" <td>Armenia</td>\n",
	" <td>21.0</td>\n",
	" <td>179.0</td>\n",
	" <td>11.0</td>\n",
	" <td>3.8</td>\n",
	" </tr>\n",
	" <tr>\n",
	" <th>10</th>\n",
	" <td>Azerbaijan</td>\n",
	" <td>21.0</td>\n",
	" <td>46.0</td>\n",
	" <td>5.0</td>\n",
	" <td>1.3</td>\n",
	" </tr>\n",
	" </tbody>\n",
	"</table>\n",
	"</div>"
	],
	"text/plain": [
	" country beer_servings spirit_servings wine_servings \\\n",
	"2 Algeria 25.0 NaN 14.0 \n",
	"4 Angola 217.0 57.0 45.0 \n",
	"5 Antigua & Barbuda 12.0 128.0 45.0 \n",
	"7 Armenia 21.0 179.0 11.0 \n",
	"10 Azerbaijan 21.0 46.0 5.0 \n",
	"\n",
	" total_litres_of_pure_alcohol \n",
	"2 0.7 \n",
	"4 5.9 \n",
	"5 4.9 \n",
	"7 3.8 \n",
	"10 1.3 "
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"countries_with_low_wine.head()"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Personal Comments:\n",
	"\n",
	"After creating the threashold mask, we apply the mask to the dataset in a variable called countries_with_low_wine. I was curious to verify the data type of countries_with_low_wine which turns out to be a dataframe.\n",
	"\n",
	"I noticed that there are a few `NaN` values in the resulting dataframe. Since the focus of this exercise is to filter the dataset, we can safely ignore these `NaN` values."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Countries with wine servings below 50 :\n",
	" country beer_servings spirit_servings wine_servings \\\n",
	"2 Algeria 25.0 NaN 14.0 \n",
	"4 Angola 217.0 57.0 45.0 \n",
	"5 Antigua & Barbuda 12.0 128.0 45.0 \n",
	"7 Armenia 21.0 179.0 11.0 \n",
	"10 Azerbaijan 21.0 46.0 5.0 \n",
	".. ... ... ... ... \n",
	"187 Vanuatu 21.0 18.0 11.0 \n",
	"188 Venezuela 333.0 1.0 3.0 \n",
	"189 Vietnam 111.0 2.0 1.0 \n",
	"191 Zambia 32.0 19.0 4.0 \n",
	"192 Zimbabwe 64.0 18.0 4.0 \n",
	"\n",
	" total_litres_of_pure_alcohol \n",
	"2 0.7 \n",
	"4 5.9 \n",
	"5 4.9 \n",
	"7 3.8 \n",
	"10 1.3 \n",
	".. ... \n",
	"187 0.9 \n",
	"188 7.7 \n",
	"189 2.0 \n",
	"191 2.5 \n",
	"192 4.7 \n",
	"\n",
	"[114 rows x 5 columns]\n"
	]
	}
	],
	"source": [
	"# Display the filtered dataset\n",
	"print(\n",
	" \"Countries with wine servings below\", \n",
	" threshold, \n",
	" \":\\n\", \n",
	" countries_with_low_wine\n",
	")\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"### Exercise 2: Finding Countries with High Alcohol Consumption\n",
	"#### Task:\n",
	"Identify countries with high total alcohol consumption based on the sum of beer, spirit, and wine servings.\n",
	"\n",
	"#### Instructions:\n",
	"1. Calculate the total alcohol consumption for each country by summing the servings of beer, spirit, and wine.\n",
	"2. Set a threshold for the minimum total alcohol consumption.\n",
	"3. Create a boolean mask to identify countries with total alcohol consumption above the threshold.\n",
	"4. Apply the mask to filter the dataset and display the countries with high alcohol consumption.\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Calculate total alcohol consumption for each country\n",
	"df['total_alcohol'] = df['beer_servings'] + df['spirit_servings'] + df['wine_servings']\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Set the threshold for minimum total alcohol consumption\n",
	"threshold = 400\n",
	"\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Personal Comments:\n",
	"\n",
	"Again we'll set a threshold value that we can test against in our filter. "
	]
	},
	{
	"cell_type": "code",
	"execution_count": 11,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Create a boolean mask for countries with total alcohol consumption above the threshold\n",
	"high_alcohol_mask = df['total_alcohol'] > threshold\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 12,
	"metadata": {},
	"outputs": [],
	"source": [
	"# Apply the mask to filter the dataset\n",
	"countries_with_high_alcohol = df[high_alcohol_mask]\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 13,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"Countries with total alcohol consumption above 400 :\n",
	" country beer_servings spirit_servings wine_servings \\\n",
	"3 Andorra 245.0 138.0 312.0 \n",
	"6 Argentina 193.0 25.0 221.0 \n",
	"8 Australia 261.0 72.0 212.0 \n",
	"9 Austria 279.0 75.0 191.0 \n",
	"15 Belarus 142.0 373.0 42.0 \n",
	"16 Belgium 295.0 84.0 212.0 \n",
	"23 Brazil 245.0 145.0 16.0 \n",
	"25 Bulgaria 231.0 252.0 94.0 \n",
	"44 Cyprus 192.0 154.0 113.0 \n",
	"45 Czech Republic 361.0 17.0 134.0 \n",
	"48 Denmark 224.0 81.0 278.0 \n",
	"57 Estonia 224.0 194.0 59.0 \n",
	"60 Finland 263.0 133.0 97.0 \n",
	"62 Gabon 347.0 98.0 59.0 \n",
	"65 Germany 346.0 117.0 175.0 \n",
	"67 Greece 133.0 112.0 218.0 \n",
	"68 Grenada 199.0 438.0 28.0 \n",
	"75 Hungary 234.0 215.0 185.0 \n",
	"81 Ireland 313.0 118.0 165.0 \n",
	"93 Latvia 281.0 216.0 62.0 \n",
	"98 Lithuania 343.0 244.0 56.0 \n",
	"99 Luxembourg 236.0 133.0 271.0 \n",
	"132 Paraguay 213.0 117.0 74.0 \n",
	"135 Poland 343.0 215.0 56.0 \n",
	"136 Portugal 194.0 67.0 339.0 \n",
	"140 Romania 297.0 122.0 167.0 \n",
	"141 Russian Federation 247.0 326.0 73.0 \n",
	"144 St. Lucia 171.0 315.0 71.0 \n",
	"151 Serbia 283.0 131.0 127.0 \n",
	"155 Slovakia 196.0 293.0 116.0 \n",
	"160 Spain 284.0 157.0 112.0 \n",
	"182 United Kingdom 219.0 126.0 195.0 \n",
	"184 USA 249.0 158.0 84.0 \n",
	"\n",
	" total_litres_of_pure_alcohol total_alcohol \n",
	"3 12.4 695.0 \n",
	"6 8.3 439.0 \n",
	"8 1.4 545.0 \n",
	"9 9.7 545.0 \n",
	"15 14.4 557.0 \n",
	"16 1.5 591.0 \n",
	"23 7.2 406.0 \n",
	"25 1.3 577.0 \n",
	"44 8.2 459.0 \n",
	"45 11.8 512.0 \n",
	"48 1.4 583.0 \n",
	"57 9.5 477.0 \n",
	"60 1.0 493.0 \n",
	"62 8.9 504.0 \n",
	"65 11.3 638.0 \n",
	"67 8.3 463.0 \n",
	"68 11.9 665.0 \n",
	"75 11.3 634.0 \n",
	"81 11.4 596.0 \n",
	"93 1.5 559.0 \n",
	"98 12.9 643.0 \n",
	"99 11.4 640.0 \n",
	"132 7.3 404.0 \n",
	"135 1.9 614.0 \n",
	"136 11.0 600.0 \n",
	"140 1.4 586.0 \n",
	"141 11.5 646.0 \n",
	"144 1.1 557.0 \n",
	"151 9.6 541.0 \n",
	"155 11.4 605.0 \n",
	"160 1.0 553.0 \n",
	"182 1.4 540.0 \n",
	"184 8.7 491.0 \n"
	]
	}
	],
	"source": [
	"# Display the filtered dataset\n",
	"print(\"Countries with total alcohol consumption above\", threshold, \":\\n\", countries_with_high_alcohol)\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Personal Comments:\n",
	"\n",
	"Note that we are only using alcohol consumption as a measurement for all other columns. It looks like columns of data in a dataset are called 'measures' or 'features' in the context of this exercise. For comparison, Power BI uses the term 'measures' to refer to columns of data that are used to define a 'feature' in a Python dataset."
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.11.8"
	},
	"orig_nbformat": 4
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}
No results found