@skliew
Last active April 14, 2016 15:15
Create Avro-backed Hive table (and write Avro files directly into its storage location)
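create_avro_file.pl

#!/usr/bin/env perl
# Write a single-record Avro container file.
# Usage: perl create_avro_file.pl <value> <output-file>
use strict;
use warnings;
use Avro::Schema;
use Avro::DataFileWriter;

my $value = shift || "1234";   # value stored in the string1 field
my $fn    = shift || "0";      # output file name

# Writer schema; kept identical to the table's avro.schema.literal.
my $schema = Avro::Schema->parse(<<EOP);
{
"namespace": "com.example",
"name": "schema",
"type": "record",
"fields": [ { "name":"string1","type":"string"}]
}
EOP

# Avro container files are binary, so open the output in raw mode.
open my $fh, '>:raw', $fn or die "Cannot open $fn: $!";
my $write_file = Avro::DataFileWriter->new(
    fh            => $fh,
    writer_schema => $schema,
);
$write_file->print({ string1 => $value });
$write_file->flush;
$write_file->close;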
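Hive reads these files through AvroContainerInputFormat, which expects Avro object container files. Per the Avro specification, such a file starts with the four bytes `O`, `b`, `j` followed by a byte with value 1. The sketch below is a minimal sanity check to run before `hdfs dfs -put`. It makes some assumptions:

- `is_avro_container` is a hypothetical helper name, not part of the original gist.
- It checks only the header. It does not check that the records match the table schema.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): succeed only if the
# file begins with the Avro object container magic "Obj" followed by 0x01.
is_avro_container() {
  # Read the first four bytes and render them as lowercase hex without spaces.
  magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "4f626a01" ]
}
```

For example, `is_avro_container file1 && hdfs dfs -put file1 /user/hive/test/` uploads the file only if the header check passes. A missing or non-Avro file makes the function return a non-zero status.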
create_table.sql

CREATE TABLE TEST
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hive/test'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "com.example",
"name": "schema",
"type": "record",
"fields": [ { "name":"string1","type":"string"}]
}');
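The same schema JSON appears twice: once in the Perl writer and once in `avro.schema.literal`. One way to keep the DDL in sync is to keep the schema in a single file and generate create_table.sql from it. The sketch below makes these assumptions:

- `schema.avsc` is a hypothetical schema file.
- `make_avro_ddl` is a hypothetical helper name.
- Schemas containing a single quote are rejected, because that quote would end the HiveQL string literal.

Hive's AvroSerDe also accepts an `avro.schema.url` table property pointing at a schema file, which avoids inlining the JSON altogether.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): print the CREATE TABLE
# statement for an Avro-backed table, inlining the schema read from a file.
# Arguments: table name, HDFS location, path to the schema file.
make_avro_ddl() {
  table=$1
  location=$2
  schema_file=$3
  # A single quote in the schema would terminate the HiveQL string literal.
  if grep -q "'" "$schema_file"; then
    echo "make_avro_ddl: schema contains a single quote" >&2
    return 1
  fi
  schema=$(cat "$schema_file") || return 1
  cat <<EOF
CREATE TABLE $table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '$location'
TBLPROPERTIES ('avro.schema.literal'='$schema');
EOF
}
```

Usage: `make_avro_ddl test /user/hive/test schema.avsc > create_table.sql`. The Perl writer would need a matching change to read the same file, which this sketch does not cover.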
Steps (shell):

# Create the table, write two Avro files, upload them, then query.
hive -f create_table.sql
perl create_avro_file.pl content1 file1
perl create_avro_file.pl content2 file2
hdfs dfs -put file1 /user/hive/test/
hdfs dfs -put file2 /user/hive/test/
hive -f select_all.sql
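The write-and-upload steps above generalize to any number of values. The sketch below loops over them, assuming the same create_avro_file.pl and table location. `load_values` is a hypothetical name, and the generated file names follow the `file1`, `file2` pattern used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original gist): for each value given,
# write one Avro file with create_avro_file.pl and upload it into the table's
# storage location. Stops at the first failing command.
# Arguments: HDFS directory (the table's LOCATION), then one or more values.
load_values() {
  dest=$1
  shift
  n=0
  for value in "$@"; do
    n=$((n + 1))
    file="file$n"
    perl create_avro_file.pl "$value" "$file" || return 1
    hdfs dfs -put "$file" "$dest/" || return 1
  done
}
```

Usage: `load_values /user/hive/test content1 content2 && hive -f select_all.sql`. For a non-partitioned table like this one, Hive lists the files under LOCATION at query time. Uploaded files therefore show up in the query without any further DDL.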
select_all.sql

select * from test;