Created
January 6, 2017 15:59
-
-
Save tegansnyder/3abdc68679a259f868e026a7a619dcfd to your computer and use it in GitHub Desktop.
Revisions
-
tegansnyder created this gist
Jan 6, 2017 .There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode charactersOriginal file line number Diff line number Diff line change @@ -0,0 +1,137 @@ I'm recieving a strange error on a new install of Spark. I have setup a small 3 node spark cluster on top of an existing hadoop instance. The error I get is the same for any command I try to run on pyspark shell I get the following error: ```bash Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/spark/python/pyspark/rdd.py", line 1041, in count return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "/opt/spark/python/pyspark/rdd.py", line 1032, in sum return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add) File "/opt/spark/python/pyspark/rdd.py", line 906, in fold vals = self.mapPartitions(func).collect() File "/opt/spark/python/pyspark/rdd.py", line 809, in collect port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) File "/opt/spark/python/pyspark/rdd.py", line 2439, in _jrdd self._jrdd_deserializer, profiler) File "/opt/spark/python/pyspark/rdd.py", line 2374, in _wrap_function sc.pythonVer, broadcast_vars, sc._javaAccumulator) File "/usr/local/lib/python3.5/site-packages/py4j/java_gateway.py", line 1414, in __call__ answer, self._gateway_client, None, self._fqn) File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco return f(*a, **kw) File "/usr/local/lib/python3.5/site-packages/py4j/protocol.py", line 324, in get_return_value format(target_id, ".", name, value)) py4j.protocol.Py4JError: An error occurred while calling None.org.apache.spark.api.python.PythonFunction. Trace: py4j.Py4JException: Constructor org.apache.spark.api.python.PythonFunction([class [B, class java.util.HashMap, class java.util.ArrayList, class java.lang.String, class java.lang.String, class java.util.ArrayList, class org.apache.spark.api.python.PythonAccumulatorV2]) does not exist at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:179) at py4j.reflection.ReflectionEngine.getConstructor(ReflectionEngine.java:196) at py4j.Gateway.invoke(Gateway.java:235) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) ``` It appears the pyspark is unable to find the class `org.apache.spark.api.python.PythonFunction`. If I'm reading the code correctly pyspark uses py4j to connect to an existing JVM, in this case I'm guessing there is a Scala file it is trying to gain access to, but it fails. Any ideas? In an effort to understand what calls are being made by py4j to java I manually added some debugging calls to: `py4j/java_gateway.py` ```python command = proto.CONSTRUCTOR_COMMAND_NAME +\ self._command_header +\ args_command +\ proto.END_COMMAND_PART print(" ") print("proto.CONSTRUCTOR_COMMAND_NAME") print("%s", proto.CONSTRUCTOR_COMMAND_NAME) print("self._command_header") print("%s", self._command_header) print("args_command") print("%s", args_command) print("proto.END_COMMAND_PART") print("%s", proto.END_COMMAND_PART) print(" ") print(" ") ``` When I run pyspark shell after adding the debug prints above this is the ouput I get on a simple command: ```bash Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 2.0.2 /_/ Using Python version 3.5.2 (default, Dec 5 2016 08:51:55) SparkSession available as 'spark'. >>> sc.parallelize(range(1000)).count() proto.CONSTRUCTOR_COMMAND_NAME %s i self._command_header %s java.util.HashMap args_command %s proto.END_COMMAND_PART %s e proto.CONSTRUCTOR_COMMAND_NAME %s i self._command_header %s java.util.ArrayList args_command %s proto.END_COMMAND_PART %s e proto.CONSTRUCTOR_COMMAND_NAME %s i self._command_header %s java.util.ArrayList args_command %s proto.END_COMMAND_PART %s e proto.CONSTRUCTOR_COMMAND_NAME %s i self._command_header %s org.apache.spark.api.python.PythonFunction args_command %s jgAIoY3B5c3BhcmsuY2xvdWRwaWNrbGUKX2ZpbGxfZnVuY3Rpb24KcQAoY3B5c3BhcmsuY2xvdWRwaWNrbGUKX21ha2Vfc2tlbF9mdW5jCnEBY3B5c3BhcmsuY2xvdWRwaWNrbGUKX2J1aWx0aW5fdHlwZQpxAlgIAAAAQ29kZVR5cGVxA4VxBFJxBShLAksASwJLBUsTY19jb2RlY3MKZW5jb2RlCnEGWBoAAADCiAAAfAAAwogBAHwAAHwBAMKDAgDCgwIAU3EHWAYAAABsYXRpbjFxCIZxCVJxCk6FcQspWAUAAABzcGxpdHEMWAgAAABpdGVyYXRvcnENhnEOWCAAAAAvb3B0L3NwYXJrL3B5dGhvbi9weXNwYXJrL3JkZC5weXEPWA0AAABwaXBlbGluZV9mdW5jcRBNZgloBlgCAAAAAAFxEWgIhnESUnETWAQAAABmdW5jcRRYCQAAAHByZXZfZnVuY3EVhnEWKXRxF1JxGF1xGShoAChoAWgFKEsCSwBLAksCSxNoBlgMAAAAwogAAHwBAMKDAQBTcRpoCIZxG1JxHE6FcR0pWAEAAABzcR5oDYZxH2gPaBRNWQFoBlgCAAAAAAFxIGgIhnEhUnEiWAEAAABmcSOFcSQpdHElUnEmXXEnaAAoaAFoBShLAUsASwNLBEszaAZYMgAAAMKIAQB9AQB4HQB8AABEXRUAfQIAwogAAHwBAHwCAMKDAgB9AQBxDQBXfAEAVgFkAABTcShoCIZxKVJxKk6FcSspaA1YAwAAAGFjY3EsWAMAAABvYmpxLYdxLmgPaBRNggNoBlgIAAAAAAEGAQ0BEwFxL2gIhnEwUnExWAIAAABvcHEyWAkAAAB6ZXJvVmFsdWVxM4ZxNCl0cTVScTZdcTcoY19vcGVyYXRvcgphZGQKcThLAGV9cTmHcTpScTt9cTxOfXE9WAsAAABweXNwYXJrLnJkZHE+dFJhaDmHcT9ScUB9cUFOfXFCaD50UmgAKGgBaBhdcUMoaAAoaAFoJl1xRGgAKGgBaAUoSwFLAEsBSwJLU2gGWA4AAAB0AAB8AADCgwEAZwEAU3FFaAiGcUZScUdOhXFIWAMAAABzdW1xSYVxSlgBAAAAeHFLhXFMaA9YCAAAADxsYW1iZGE+cU1NCARjX19idWlsdGluX18KYnl0ZXMKcU4pUnFPKSl0cVBScVFdcVJoOYdxU1JxVH1xVU59cVZoPnRSYWg5h3FXUnFYfXFZTn1xWmg+dFJoAChoAWgYXXFbKGgAKGgBaCZdcVxoAChoAWgFKEsBSwBLAUsDS1NoBlgdAAAAdAAAZAEAZAIAwoQAAHwAAETCgwEAwoMBAGcBAFNxXWgIhnFeUnFfTmgFKEsBSwBLAksCS3NoBlgVAAAAfAAAXQsAfQEAZAAAVgFxAwBkAQBTcWBoCIZxYVJxYksBToZxYylYAgAAAC4wcWRYAQAAAF9xZYZxZmgPWAkAAAA8Z2VuZXhwcj5xZ00RBGgGWAIAAAAGAHFoaAiGcWlScWopKXRxa1JxbFguAAAAUkRELmNvdW50Ljxsb2NhbHM+LjxsYW1iZGE+Ljxsb2NhbHM+LjxnZW5leHByPnFth3FuaEmFcW9YAQAAAGlxcIVxcWgPaE1NEQRoTykpdHFyUnFzXXF0aDmHcXVScXZ9cXdOfXF4aD50UmFoOYdxeVJxen1xe059cXxoPnRSaAAoaAFoBShLAksASwJLBUsTaAZYJgAAAHQAAMKIAAB8AADCgwEAwogAAHwAAGQBABfCgwEAwogBAMKDAwBTcX1oCIZxflJxf05LAYZxgFgGAAAAeHJhbmdlcYGFcYJoDGgNhnGDWCQAAAAvb3B0L3NwYXJrL3B5dGhvbi9weXNwYXJrL2NvbnRleHQucHlxhGgjTcIBaAZYAgAAAAABcYVoCIZxhlJxh1gIAAAAZ2V0U3RhcnRxiFgEAAAAc3RlcHGJhnGKKXRxi1JxjF1xjShoAChoAWgFKEsBSwBLAUsESxNoBlgfAAAAwogCAHQAAHwAAMKIAQAUwogAABvCgwEAwogDABQXU3GOaAiGcY9ScZBOhXGRWAMAAABpbnRxkoVxk2gMhXGUaIRoiE2/AWgGWAIAAAAAAXGVaAiGcZZScZcoWAkAAABudW1TbGljZXNxmFgEAAAAc2l6ZXGZWAYAAABzdGFydDBxmmiJdHGbKXRxnFJxnV1xnihLAk3oA0sASwFlfXGfh3GgUnGhfXGiTn1xo1gPAAAAcHlzcGFyay5jb250ZXh0caR0UksBZWifh3GlUnGmfXGnaIFjX19idWlsdGluX18KeHJhbmdlCnGoc059calopHRSZWg5h3GqUnGrfXGsTn1xrWg+dFJlaDmHca5Sca99cbBOfXGxaD50UmVoOYdxslJxs31xtE59cbVoPnRSTmNweXNwYXJrLnNlcmlhbGl6ZXJzCkJhdGNoZWRTZXJpYWxpemVyCnG2KYFxt31xuChYCQAAAGJhdGNoU2l6ZXG5SwFYCgAAAHNlcmlhbGl6ZXJxumNweXNwYXJrLnNlcmlhbGl6ZXJzClBpY2tsZVNlcmlhbGl6ZXIKcbspgXG8fXG9WBMAAABfb25seV93cml0ZV9zdHJpbmdzcb6Jc2J1YmNweXNwYXJrLnNlcmlhbGl6ZXJzCkF1dG9CYXRjaGVkU2VyaWFsaXplcgpxvymBccB9ccEoaLlLAGi6aLxYCAAAAGJlc3RTaXplccJKAAABAHVidHHDLg== ro24 ro25 s/usr/local/bin/python3.5 s3.5 ro26 ro14 proto.END_COMMAND_PART %s e # ERROR # Then the error from above prints here... ``` Any ideas?