@ssimeonov
Last active April 16, 2019 08:25

Revisions

  1. ssimeonov revised this gist Jul 22, 2015. 1 changed file with 3 additions and 0 deletions.
    3 changes: 3 additions & 0 deletions a_shell_test.scala
    @@ -1,3 +1,6 @@
    // This code is designed to be pasted in spark-shell in a *nix environment
    // On Windows, replace sys.env("HOME") with a directory of your choice

    import java.io.File
    import java.io.PrintWriter
    import org.apache.spark.sql.hive.HiveContext
  2. ssimeonov created this gist Jul 22, 2015.
    29 changes: 29 additions & 0 deletions a_shell_test.scala
    @@ -0,0 +1,29 @@
    import java.io.File
    import java.io.PrintWriter
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.SaveMode

    val ctx = sqlContext.asInstanceOf[HiveContext]
    import ctx.implicits._

    val json = """{"category" : "A", "num" : 5}"""
    val path = sys.env("HOME") + "/spark_sql_first.jsonlines"
    new PrintWriter(path) { write(json); close }
    ctx.read.json("file://" + path).registerTempTable("test_first")

    // OK, proof that the data was loaded correctly
    ctx.sql("select * from test_first").show

    // org.apache.spark.sql.AnalysisException: expression 'num' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.
    ctx.sql("select num from test_first group by category").show

    // ERROR RetryingHMSHandler: MetaException(message:NoSuchObjectException(message:Function default.first does not exist))
    // INFO FunctionRegistry: Unable to lookup UDF in metastore: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:NoSuchObjectException(message:Function default.first does not exist))
    // java.lang.RuntimeException: Couldn't find function first
    ctx.sql("select first(num) from test_first group by category").show

    // OK
    ctx.sql("select first_value(num) from test_first group by category").show

    new File(path).delete()
    1,907 changes: 1,907 additions & 0 deletions shell_output.txt
    1,907 additions, 0 deletions not shown because the diff is too large. Please use a local Git client to view these changes.
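
The queries in a_shell_test.scala all ask for one `num` per `category`. As a rough illustration of that "pick one value per group" behaviour, here is a minimal sketch using plain Scala collections, runnable without Spark. It is not Spark's or Hive's implementation: the `Record` case class, the `FirstPerGroup` object, and every sample row other than `("A", 5)` are hypothetical names and data introduced only for this example.

```scala
// Hypothetical stand-in for a row of test_first; this is not a Spark type.
case class Record(category: String, num: Int)

object FirstPerGroup {
  // Roughly mimics "select first_value(num) from test_first group by category":
  // group the rows by category, then keep the num of the first row seen in each group.
  def firstNumByCategory(rows: Seq[Record]): Map[String, Int] =
    rows.groupBy(_.category).map { case (category, group) =>
      category -> group.head.num
    }

  def main(args: Array[String]): Unit = {
    // Illustrative data; only the ("A", 5) row comes from the gist's JSON line.
    val rows = Seq(Record("A", 5), Record("A", 7), Record("B", 1))
    println(firstNumByCategory(rows))
  }
}
```

The sketch takes the first row in input order, which is a stronger guarantee than the error message in the gist suggests for `first()` ("if you don't care which value you get"), so treat it as a reading aid for the queries rather than a description of the engine.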