akatore · December 25, 2020 17:22
diff --git a/Assignment 1.2 (ADSIS).ipynb b/Assignment 1.2 (ADSIS).ipynb
 {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "# Warmup Assignment 2\nBelow you see some ApacheSpark code written in Python. You don't have to change code now, the only thing we want you to do is to make sure that you have a proper Apache Spark Notebook environment available for this course\n\n"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "This notebook is designed to run in a IBM Watson Studio default runtime (NOT the Watson Studio Apache Spark Runtime as the default runtime with 1 vCPU is free of charge). Therefore, we install Apache Spark in local mode for test purposes only. Please don't use it in production.\n\nIn case you are facing issues, please read the following two documents first:\n\nhttps://github.com/IBM/skillsnetwork/wiki/Environment-Setup\n\nhttps://github.com/IBM/skillsnetwork/wiki/FAQ\n\nThen, please feel free to ask:\n\nhttps://coursera.org/learn/machine-learning-big-data-apache-spark/discussions/all\n\nPlease make sure to follow the guidelines before asking a question:\n\nhttps://github.com/IBM/skillsnetwork/wiki/FAQ#im-feeling-lost-and-confused-please-help-me\n\n\nIf running outside Watson Studio, this should work as well. In case you are running in an Apache Spark context outside Watson Studio, please remove the Apache Spark setup in the first notebook cells."
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": "Waiting for a Spark session to start...\nSpark Initialization Done! ApplicationId = app-20201225172102-0000\nKERNEL_ID = 7c74e1b7-bb3c-4288-8dd7-aea43c42d7a8\n"
                },
                {
                    "data": {
                        "text/markdown": "# <span style=\"color:red\"><<<<<!!!!! It seems that you are running in a IBM Watson Studio Apache Spark Notebook. Please run it in an IBM Watson Studio Default Runtime (without Apache Spark) !!!!!>>>>></span>",
                        "text/plain": "<IPython.core.display.Markdown object>"
                    },
                    "metadata": {},
                    "output_type": "display_data"
                }
            ],
            "source": "from IPython.display import Markdown, display\ndef printmd(string):\n    display(Markdown('# <span style=\"color:red\">'+string+'</span>'))\n\n\nif ('sc' in locals() or 'sc' in globals()):\n    printmd('<<<<<!!!!! It seems that you are running in a IBM Watson Studio Apache Spark Notebook. Please run it in an IBM Watson Studio Default Runtime (without Apache Spark) !!!!!>>>>>')\n"
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": "Collecting pyspark==2.4.5\n  Downloading pyspark-2.4.5.tar.gz (217.8 MB)\n\u001b[K     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 217.8 MB 61.5 MB/s eta 0:00:01\n\u001b[?25hCollecting py4j==0.10.7\n  Downloading py4j-0.10.7-py2.py3-none-any.whl (197 kB)\n\u001b[K     |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 197 kB 49.6 MB/s eta 0:00:01\n\u001b[?25hBuilding wheels for collected packages: pyspark\n  Building wheel for pyspark (setup.py) ... \u001b[?25ldone\n\u001b[?25h  Created wheel for pyspark: filename=pyspark-2.4.5-py2.py3-none-any.whl size=218257927 sha256=7149c23ee868e1c9844b983eb24f3b87f73c4995a958ae11090b5745e2c472ba\n  Stored in directory: /home/spark/shared/.cache/pip/wheels/01/c0/03/1c241c9c482b647d4d99412a98a5c7f87472728ad41ae55e1e\nSuccessfully built pyspark\n\u001b[31mERROR: sparktspy-nojars 2.0.5.0 has requirement pyspark==3.0.1, but you'll have pyspark 2.4.5 which is incompatible.\u001b[0m\nInstalling collected packages: py4j, pyspark\nSuccessfully installed py4j-0.10.7 pyspark-2.4.5\n"
                }
            ],
            "source": "!pip install pyspark==2.4.5"
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [],
            "source": "try:\n    from pyspark import SparkContext, SparkConf\n    from pyspark.sql import SparkSession\nexcept ImportError as e:\n    printmd('<<<<<!!!!! Please restart your kernel after installing Apache Spark !!!!!>>>>>')"
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [],
            "source": "sc = SparkContext.getOrCreate(SparkConf().setMaster(\"local[*]\"))\n\nspark = SparkSession \\\n    .builder \\\n    .getOrCreate()"
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {},
            "outputs": [],
            "source": "def assignment1(sc):\n    rdd = sc.parallelize(list(range(100)))\n    return rdd.count()"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "print(assignment1(sc))"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "!rm -f rklib.py\n!wget https://raw.githubusercontent.com/IBM/coursera/master/rklib.py"
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": "Please provide your email address and obtain a submission token on the grader\u2019s submission page in coursera, then execute the cell"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": "from rklib import submit\nimport json\n\nkey = \"R1eDmiHNEei9kxIYdin0mA\"\npart = \"fnFg7\"\nemail =\"abhijeetkatore007@gmail.com\"###_YOUR_CODE_GOES_HERE_###\ntoken = \"xErqjaSG0DTfLN3m\"  ###_YOUR_CODE_GOES_HERE_### #you can obtain it from the grader page on Coursera (have a look here if you need more information https://youtu.be/GcDo0Rwe06U?t=276)\n\n\nsubmit(email, token, key, part, [part], json.dumps(assignment1(sc)))"
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": ""
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3.7 with Spark",
            "language": "python3",
            "name": "python37"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.7.9"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 1
 }
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": "# Warmup Assignment 2\nBelow you see some ApacheSpark code written in Python. You don't have to change code now, the only thing we want you to do is to make sure that you have a proper Apache Spark Notebook environment available for this course\n\n"
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": "This notebook is designed to run in a IBM Watson Studio default runtime (NOT the Watson Studio Apache Spark Runtime as the default runtime with 1 vCPU is free of charge). Therefore, we install Apache Spark in local mode for test purposes only. Please don't use it in production.\n\nIn case you are facing issues, please read the following two documents first:\n\nhttps://github.com/IBM/skillsnetwork/wiki/Environment-Setup\n\nhttps://github.com/IBM/skillsnetwork/wiki/FAQ\n\nThen, please feel free to ask:\n\nhttps://coursera.org/learn/machine-learning-big-data-apache-spark/discussions/all\n\nPlease make sure to follow the guidelines before asking a question:\n\nhttps://github.com/IBM/skillsnetwork/wiki/FAQ#im-feeling-lost-and-confused-please-help-me\n\n\nIf running outside Watson Studio, this should work as well. In case you are running in an Apache Spark context outside Watson Studio, please remove the Apache Spark setup in the first notebook cells."
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "Waiting for a Spark session to start...\nSpark Initialization Done! ApplicationId = app-20201225172102-0000\nKERNEL_ID = 7c74e1b7-bb3c-4288-8dd7-aea43c42d7a8\n"
	},
	{
	"data": {
	"text/markdown": "# <span style=\"color:red\"><<<<<!!!!! It seems that you are running in a IBM Watson Studio Apache Spark Notebook. Please run it in an IBM Watson Studio Default Runtime (without Apache Spark) !!!!!>>>>></span>",
	"text/plain": "<IPython.core.display.Markdown object>"
	},
	"metadata": {},
	"output_type": "display_data"
	}
	],
	"source": "from IPython.display import Markdown, display\ndef printmd(string):\n display(Markdown('# <span style=\"color:red\">'+string+'</span>'))\n\n\nif ('sc' in locals() or 'sc' in globals()):\n printmd('<<<<<!!!!! It seems that you are running in a IBM Watson Studio Apache Spark Notebook. Please run it in an IBM Watson Studio Default Runtime (without Apache Spark) !!!!!>>>>>')\n"
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": "Collecting pyspark==2.4.5\n Downloading pyspark-2.4.5.tar.gz (217.8 MB)\n\u001b[K \|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\| 217.8 MB 61.5 MB/s eta 0:00:01\n\u001b[?25hCollecting py4j==0.10.7\n Downloading py4j-0.10.7-py2.py3-none-any.whl (197 kB)\n\u001b[K \|\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\| 197 kB 49.6 MB/s eta 0:00:01\n\u001b[?25hBuilding wheels for collected packages: pyspark\n Building wheel for pyspark (setup.py) ... \u001b[?25ldone\n\u001b[?25h Created wheel for pyspark: filename=pyspark-2.4.5-py2.py3-none-any.whl size=218257927 sha256=7149c23ee868e1c9844b983eb24f3b87f73c4995a958ae11090b5745e2c472ba\n Stored in directory: /home/spark/shared/.cache/pip/wheels/01/c0/03/1c241c9c482b647d4d99412a98a5c7f87472728ad41ae55e1e\nSuccessfully built pyspark\n\u001b[31mERROR: sparktspy-nojars 2.0.5.0 has requirement pyspark==3.0.1, but you'll have pyspark 2.4.5 which is incompatible.\u001b[0m\nInstalling collected packages: py4j, pyspark\nSuccessfully installed py4j-0.10.7 pyspark-2.4.5\n"
	}
	],
	"source": "!pip install pyspark==2.4.5"
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [],
	"source": "try:\n from pyspark import SparkContext, SparkConf\n from pyspark.sql import SparkSession\nexcept ImportError as e:\n printmd('<<<<<!!!!! Please restart your kernel after installing Apache Spark !!!!!>>>>>')"
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [],
	"source": "sc = SparkContext.getOrCreate(SparkConf().setMaster(\"local[*]\"))\n\nspark = SparkSession \\\n .builder \\\n .getOrCreate()"
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [],
	"source": "def assignment1(sc):\n rdd = sc.parallelize(list(range(100)))\n return rdd.count()"
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": "print(assignment1(sc))"
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": "!rm -f rklib.py\n!wget https://raw.githubusercontent.com/IBM/coursera/master/rklib.py"
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": "Please provide your email address and obtain a submission token on the grader\u2019s submission page in coursera, then execute the cell"
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": "from rklib import submit\nimport json\n\nkey = \"R1eDmiHNEei9kxIYdin0mA\"\npart = \"fnFg7\"\nemail =\"abhijeetkatore007@gmail.com\"###_YOUR_CODE_GOES_HERE_###\ntoken = \"xErqjaSG0DTfLN3m\" ###_YOUR_CODE_GOES_HERE_### #you can obtain it from the grader page on Coursera (have a look here if you need more information https://youtu.be/GcDo0Rwe06U?t=276)\n\n\nsubmit(email, token, key, part, [part], json.dumps(assignment1(sc)))"
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": ""
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3.7 with Spark",
	"language": "python3",
	"name": "python37"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.7.9"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 1
	}
No results found