{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the second assignment for the Coursera course \"Advanced Machine Learning and Signal Processing\"\n",
"\n",
"\n",
"Just execute all cells one after the other and you are done - just note that in the last one you have to update your email address (the one you've used for coursera) and obtain a submission token, you get this from the programming assignment directly on coursera.\n",
"\n",
"Please fill in the sections labelled with \"###YOUR_CODE_GOES_HERE###\""
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Waiting for a Spark session to start...\n",
"Spark Initialization Done! ApplicationId = app-20200308135741-0000\n",
"KERNEL_ID = c44da1e5-8cb5-4a81-97d0-ccf2a05e6c41\n",
"--2020-03-08 13:57:44-- https://github.com/IBM/coursera/raw/master/coursera_ml/a2.parquet\n",
"Resolving github.com (github.com)... 140.82.113.4\n",
"Connecting to github.com (github.com)|140.82.113.4|:443... connected.\n",
"HTTP request sent, awaiting response... 302 Found\n",
"Location: https://raw.githubusercontent.com/IBM/coursera/master/coursera_ml/a2.parquet [following]\n",
"--2020-03-08 13:57:44-- https://raw.githubusercontent.com/IBM/coursera/master/coursera_ml/a2.parquet\n",
"Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.8.133\n",
"Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.8.133|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 59032 (58K) [application/octet-stream]\n",
"Saving to: 'a2.parquet'\n",
"\n",
"100%[======================================>] 59,032 --.-K/s in 0.003s \n",
"\n",
"2020-03-08 13:57:44 (19.6 MB/s) - 'a2.parquet' saved [59032/59032]\n",
"\n"
]
}
],
"source": [
"!wget https://github.com/IBM/coursera/raw/master/coursera_ml/a2.parquet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now it’s time to have a look at the recorded sensor data. You should see data similar to the one exemplified below….\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"+-----+-----------+-------------------+-------------------+-------------------+\n",
"|CLASS| SENSORID| X| Y| Z|\n",
"+-----+-----------+-------------------+-------------------+-------------------+\n",
"| 0| 26| 380.66434005495194| -139.3470983812975|-247.93697521077704|\n",
"| 0| 29| 104.74324299209692| -32.27421440203938|-25.105013725863852|\n",
"| 0| 8589934658| 118.11469236129976| 45.916682927433534| -87.97203782706572|\n",
"| 0|34359738398| 246.55394030642543|-0.6122810693132044|-398.18662513951506|\n",
"| 0|17179869241|-190.32584900181487| 234.7849657520335|-206.34483804019288|\n",
"| 0|25769803830| 178.62396382387422| -47.07529438881511| 84.38310769821979|\n",
"| 0|25769803831| 85.03128805189493|-4.3024316644854546|-1.1841857567516714|\n",
"| 0|34359738411| 26.786262674736566| -46.33193951911338| 20.880756008396055|\n",
"| 0| 8589934592|-16.203752396859194| 51.080957032176954| -96.80526656416971|\n",
"| 0|25769803852| 47.2048142440404| -78.2950899652916| 181.99604091494786|\n",
"| 0|34359738369| 15.608872398939273| -79.90322809181754| 69.62150711098005|\n",
"| 0| 19|-4.8281721129789315| -67.38050508399905| 221.24876396496404|\n",
"| 0| 54| -98.40725712852762|-19.989364074314732| -302.695196085276|\n",
"| 0|17179869313| 22.835845394816594| 17.1633660118843| 32.877914832011385|\n",
"| 0|34359738454| 84.20178070080324| -32.81572075916947| -48.63517643958031|\n",
"| 0| 0| 56.54732521345129| -7.980106018032676| 95.05162719436447|\n",
"| 0|17179869201| -57.6008655247749| 5.135393798773895| 236.99158698947267|\n",
"| 0|17179869308| -65.59264738389012| -48.92660057215126| -61.58970715383383|\n",
"| 0|25769803790| 34.82337351291005| 9.483542084393937| 197.6066372962772|\n",
"| 0|25769803825| 39.80573823439121|-0.7955236412785212| -79.66652640650325|\n",
"+-----+-----------+-------------------+-------------------+-------------------+\n",
"only showing top 20 rows\n",
"\n"
]
}
],
"source": [
"df=spark.read.load('a2.parquet')\n",
"\n",
"df.createOrReplaceTempView(\"df\")\n",
"spark.sql(\"SELECT * from df\").show()\n"
]
},
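{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check (not required by the assignment), the temporary view registered above can be used to see how many rows each class contains before training:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: inspect the class distribution using the registered temporary view \"df\".\n",
"spark.sql(\"SELECT CLASS, count(*) AS cnt FROM df GROUP BY CLASS\").show()"
]
},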
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please create a VectorAssembler which consumes columns X, Y and Z and produces a column “features”\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.ml.feature import VectorAssembler\n",
"vectorAssembler = VectorAssembler( inputCols=[\"X\",\"Y\",\"Z\"], outputCol=\"CLASS\")"
]
},
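{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, the assembler can be applied on its own to verify that a new “features” vector column is produced from X, Y and Z before it is used inside a pipeline:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional check: the \"features\" column should contain a vector built from X, Y and Z.\n",
"vectorAssembler.transform(df).select(\"CLASS\", \"features\").show(5, truncate=False)"
]
},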
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please instantiate a classifier from the SparkML package and assign it to the classifier variable. Make sure to either\n",
"1.\tRename the “CLASS” column to “label” or\n",
"2.\tSpecify the label-column correctly to be “CLASS”\n"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.ml.classification import LogisticRegression\n",
"\n",
"classifier = LogisticRegression(labelCol='CLASS', maxIter=10, regParam=0.3, elasticNetParam=0.8 )\n"
]
},
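{
"cell_type": "markdown",
"metadata": {},
"source": [
"For completeness, a minimal sketch of the first option (renaming “CLASS” to “label” and keeping the classifier defaults) is shown below. It is illustrative only and is not used by the pipeline in the following cells:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative alternative (option 1): rename the label column instead of setting labelCol.\n",
"# The variables are deliberately named differently so they don't interfere with the cells below.\n",
"df_renamed = df.withColumnRenamed(\"CLASS\", \"label\")\n",
"classifier_alt = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)"
]
},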
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let’s train and evaluate…\n"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.ml import Pipeline\n",
"pipeline = Pipeline(stages=[vectorAssembler, classifier])"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"ename": "IllegalArgumentException",
"evalue": "'Output column CLASS already exists.'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mPy4JJavaError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/opt/ibm/spark/python/pyspark/sql/utils.py\u001b[0m in \u001b[0;36mdeco\u001b[0;34m(*a, **kw)\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 63\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 64\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mpy4j\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mprotocol\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mPy4JJavaError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/opt/ibm/conda/miniconda3.6/lib/python3.6/site-packages/py4j/protocol.py\u001b[0m in \u001b[0;36mget_return_value\u001b[0;34m(answer, gateway_client, target_id, name)\u001b[0m\n\u001b[1;32m 327\u001b[0m \u001b[0;34m\"An error occurred while calling {0}{1}{2}.\\n\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 328\u001b[0;31m format(target_id, \".\", name), value)\n\u001b[0m\u001b[1;32m 329\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mPy4JJavaError\u001b[0m: An error occurred while calling o105.transform.\n: java.lang.IllegalArgumentException: Output column CLASS already exists.\n\tat org.apache.spark.ml.feature.VectorAssembler.transformSchema(VectorAssembler.scala:172)\n\tat org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:74)\n\tat org.apache.spark.ml.feature.VectorAssembler.transform(VectorAssembler.scala:86)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:819)\n",
"\nDuring handling of the above exception, another exception occurred:\n",
"\u001b[0;31mIllegalArgumentException\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-21-5a6ceed6e361>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpipeline\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/opt/ibm/spark/python/pyspark/ml/base.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, dataset, params)\u001b[0m\n\u001b[1;32m 130\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_fit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 132\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_fit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 133\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 134\u001b[0m raise ValueError(\"Params must be either a param map or a list/tuple of param maps, \"\n",
"\u001b[0;32m/opt/ibm/spark/python/pyspark/ml/pipeline.py\u001b[0m in \u001b[0;36m_fit\u001b[0;34m(self, dataset)\u001b[0m\n\u001b[1;32m 105\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstage\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mTransformer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 106\u001b[0m \u001b[0mtransformers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstage\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 107\u001b[0;31m \u001b[0mdataset\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstage\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtransform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 108\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# must be an Estimator\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 109\u001b[0m \u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mstage\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/opt/ibm/spark/python/pyspark/ml/base.py\u001b[0m in \u001b[0;36mtransform\u001b[0;34m(self, dataset, params)\u001b[0m\n\u001b[1;32m 171\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_transform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 172\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 173\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_transform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 174\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 175\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Params must be a param map but got %s.\"\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparams\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/opt/ibm/spark/python/pyspark/ml/wrapper.py\u001b[0m in \u001b[0;36m_transform\u001b[0;34m(self, dataset)\u001b[0m\n\u001b[1;32m 310\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_transform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdataset\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 311\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_transfer_params_to_java\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 312\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mDataFrame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_java_obj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtransform\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdataset\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_jdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdataset\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msql_ctx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 313\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 314\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/opt/ibm/conda/miniconda3.6/lib/python3.6/site-packages/py4j/java_gateway.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m 1284\u001b[0m \u001b[0manswer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgateway_client\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend_command\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcommand\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1285\u001b[0m return_value = get_return_value(\n\u001b[0;32m-> 1286\u001b[0;31m answer, self.gateway_client, self.target_id, self.name)\n\u001b[0m\u001b[1;32m 1287\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1288\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtemp_arg\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtemp_args\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/opt/ibm/spark/python/pyspark/sql/utils.py\u001b[0m in \u001b[0;36mdeco\u001b[0;34m(*a, **kw)\u001b[0m\n\u001b[1;32m 77\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mQueryExecutionException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m': '\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstackTrace\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstartswith\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'java.lang.IllegalArgumentException: '\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 79\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mIllegalArgumentException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msplit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m': '\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstackTrace\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 80\u001b[0m \u001b[0;32mraise\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 81\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mdeco\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mIllegalArgumentException\u001b[0m: 'Output column CLASS already exists.'"
]
}
],
"source": [
"model = pipeline.fit(df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prediction = model.transform(df)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prediction.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pyspark.ml.evaluation import MulticlassClassificationEvaluator\n",
"binEval = MulticlassClassificationEvaluator().setMetricName(\"accuracy\") .setPredictionCol(\"prediction\").setLabelCol(\"CLASS\")\n",
" \n",
"binEval.evaluate(prediction) "
]
},
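{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the accuracy above is computed on the same data the model was trained on, which tends to be optimistic. An optional refinement (not required for the grader) is to evaluate on a held-out split, for example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: estimate accuracy on data the model has not seen during training.\n",
"train_df, test_df = df.randomSplit([0.8, 0.2], seed=1)\n",
"holdout_model = pipeline.fit(train_df)\n",
"binEval.evaluate(holdout_model.transform(test_df))"
]
},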
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are happy with the result (I’m happy with > 0.55) please submit your solution to the grader by executing the following cells, please don’t forget to obtain an assignment submission token (secret) from the Coursera’s graders web page and paste it to the “secret” variable below, including your email address you’ve used for Coursera. (0.55 means that you are performing better than random guesses)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!rm -Rf a2_m2.json"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prediction = prediction.repartition(1)\n",
"prediction.write.json('a2_m2.json')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!rm -f rklib.py\n",
"!wget https://raw.githubusercontent.com/IBM/coursera/master/rklib.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import zipfile\n",
"\n",
"def zipdir(path, ziph):\n",
" for root, dirs, files in os.walk(path):\n",
" for file in files:\n",
" ziph.write(os.path.join(root, file))\n",
"\n",
"zipf = zipfile.ZipFile('a2_m2.json.zip', 'w', zipfile.ZIP_DEFLATED)\n",
"zipdir('a2_m2.json', zipf)\n",
"zipf.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!base64 a2_m2.json.zip > a2_m2.json.zip.base64"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from rklib import submit\n",
"key = \"J3sDL2J8EeiaXhILFWw2-g\"\n",
"part = \"G4P6f\"\n",
"email = None###YOUR_CODE_GOES_HERE###\"\n",
"token = None###YOUR_CODE_GOES_HERE###\" # (have a look here if you need more information on how to obtain the token https://youtu.be/GcDo0Rwe06U?t=276)\n",
"\n",
"with open('a2_m2.json.zip.base64', 'r') as myfile:\n",
" data=myfile.read()\n",
"submit(email, token, key, part, [part], data)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.6 with Spark",
"language": "python3",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 1
}