Skip to content

Instantly share code, notes, and snippets.

@akatore
Created December 25, 2020 17:22
Show Gist options
  • Select an option

  • Save akatore/6e11c46929b3357f913b82ed278dc585 to your computer and use it in GitHub Desktop.

Select an option

Save akatore/6e11c46929b3357f913b82ed278dc585 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "# Warmup Assignment 2\nBelow you see some ApacheSpark code written in Python. You don't have to change code now, the only thing we want you to do is to make sure that you have a proper Apache Spark Notebook environment available for this course\n\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "This notebook is designed to run in a IBM Watson Studio default runtime (NOT the Watson Studio Apache Spark Runtime as the default runtime with 1 vCPU is free of charge). Therefore, we install Apache Spark in local mode for test purposes only. Please don't use it in production.\n\nIn case you are facing issues, please read the following two documents first:\n\nhttps://github.com/IBM/skillsnetwork/wiki/Environment-Setup\n\nhttps://github.com/IBM/skillsnetwork/wiki/FAQ\n\nThen, please feel free to ask:\n\nhttps://coursera.org/learn/machine-learning-big-data-apache-spark/discussions/all\n\nPlease make sure to follow the guidelines before asking a question:\n\nhttps://github.com/IBM/skillsnetwork/wiki/FAQ#im-feeling-lost-and-confused-please-help-me\n\n\nIf running outside Watson Studio, this should work as well. In case you are running in an Apache Spark context outside Watson Studio, please remove the Apache Spark setup in the first notebook cells."
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Waiting for a Spark session to start...\nSpark Initialization Done! ApplicationId = app-20201225172102-0000\nKERNEL_ID = 7c74e1b7-bb3c-4288-8dd7-aea43c42d7a8\n"
},
{
"data": {
"text/markdown": "# <span style=\"color:red\"><<<<<!!!!! It seems that you are running in a IBM Watson Studio Apache Spark Notebook. Please run it in an IBM Watson Studio Default Runtime (without Apache Spark) !!!!!>>>>></span>",
"text/plain": "<IPython.core.display.Markdown object>"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": "from IPython.display import Markdown, display\ndef printmd(string):\n display(Markdown('# <span style=\"color:red\">'+string+'</span>'))\n\n\nif ('sc' in locals() or 'sc' in globals()):\n printmd('<<<<<!!!!! It seems that you are running in a IBM Watson Studio Apache Spark Notebook. Please run it in an IBM Watson Studio Default Runtime (without Apache Spark) !!!!!>>>>>')\n"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Collecting pyspark==2.4.5\n Downloading pyspark-2.4.5.tar.gz (217.8 MB)\n\u001b[K |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 217.8 MB 61.5 MB/s eta 0:00:01\n\u001b[?25hCollecting py4j==0.10.7\n Downloading py4j-0.10.7-py2.py3-none-any.whl (197 kB)\n\u001b[K |\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588| 197 kB 49.6 MB/s eta 0:00:01\n\u001b[?25hBuilding wheels for collected packages: pyspark\n Building wheel for pyspark (setup.py) ... \u001b[?25ldone\n\u001b[?25h Created wheel for pyspark: filename=pyspark-2.4.5-py2.py3-none-any.whl size=218257927 sha256=7149c23ee868e1c9844b983eb24f3b87f73c4995a958ae11090b5745e2c472ba\n Stored in directory: /home/spark/shared/.cache/pip/wheels/01/c0/03/1c241c9c482b647d4d99412a98a5c7f87472728ad41ae55e1e\nSuccessfully built pyspark\n\u001b[31mERROR: sparktspy-nojars 2.0.5.0 has requirement pyspark==3.0.1, but you'll have pyspark 2.4.5 which is incompatible.\u001b[0m\nInstalling collected packages: py4j, pyspark\nSuccessfully installed py4j-0.10.7 pyspark-2.4.5\n"
}
],
"source": "!pip install pyspark==2.4.5"
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": "try:\n from pyspark import SparkContext, SparkConf\n from pyspark.sql import SparkSession\nexcept ImportError as e:\n printmd('<<<<<!!!!! Please restart your kernel after installing Apache Spark !!!!!>>>>>')"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": "sc = SparkContext.getOrCreate(SparkConf().setMaster(\"local[*]\"))\n\nspark = SparkSession \\\n .builder \\\n .getOrCreate()"
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": "def assignment1(sc):\n rdd = sc.parallelize(list(range(100)))\n return rdd.count()"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "print(assignment1(sc))"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "!rm -f rklib.py\n!wget https://raw.githubusercontent.com/IBM/coursera/master/rklib.py"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "Please provide your email address and obtain a submission token on the grader\u2019s submission page in coursera, then execute the cell"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "from rklib import submit\nimport json\n\nkey = \"R1eDmiHNEei9kxIYdin0mA\"\npart = \"fnFg7\"\nemail =\"abhijeetkatore007@gmail.com\"###_YOUR_CODE_GOES_HERE_###\ntoken = \"xErqjaSG0DTfLN3m\" ###_YOUR_CODE_GOES_HERE_### #you can obtain it from the grader page on Coursera (have a look here if you need more information https://youtu.be/GcDo0Rwe06U?t=276)\n\n\nsubmit(email, token, key, part, [part], json.dumps(assignment1(sc)))"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": ""
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7 with Spark",
"language": "python3",
"name": "python37"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment