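Hive and HBase are integrated by having the two systems talk to each other through their public APIs. That communication relies mainly on the utility classes packaged in hive-hbase-handler.jar.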
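I. Copy the relevant HBase jars into /home/centosm/hive/lib. If Hive already has a different version of the same jar, delete Hive's copy first and then copy the HBase one in.
The steps are as follows:
1. Back up Hive's lib directory: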
zip -r lib.zip lib
2. Copy the HBase jars into hive/lib:
$ more hbase-c
hbase-client-.jar hbase-common-.jar hbase-common--tests.jar
$ cp hbase-c* ../../hive/lib
$ cp hbase-server-.jar ../../hive/lib
$ cp hbase-protocol-.jar ../../hive/lib
Note: if several versions of the same jar now exist, remove the mismatched versions from hive/lib.
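Spotting duplicate versions by eye is easy to get wrong in a large lib directory. The sketch below lists jar names that occur in more than one version. The function name find_duplicate_jars and the VERSION placeholder are made up for illustration (they are not part of Hive or HBase), and the sketch assumes jars follow the usual <artifact>-<numeric version>[-classifier].jar naming.

```shell
# List jar names that appear in more than one version inside a directory.
# Hypothetical helper; assumes <artifact>-<numeric version>[-classifier].jar naming.
find_duplicate_jars() {
    lib_dir=$1
    for jar in "$lib_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory contains no jars.
        [ -e "$jar" ] || continue
        basename "$jar"
    done |
        # Replace the first numeric version token so that different versions
        # of the same artifact collapse to the same key.
        sed -E 's/-[0-9]+(\.[0-9]+)*/-VERSION/' |
        sort |
        # Print only the keys that occur more than once.
        uniq -d
}
```

Running find_duplicate_jars /home/centosm/hive/lib after the copy prints one line per conflicting artifact; empty output means no artifact is present in two versions.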
3. Edit hive-site.xml:
<property>
<name>hive.aux.jars.path</name>
<value>file:///home/centosm/hive/lib/hive-hbase-handler-2.1.0.jar,file:///home/centosm/hive/lib/hbase-client-1.2.4.jar,file:///home/centosm/hive/lib/hbase-common-1.2.4.jar,file:///home/centosm/hive/lib/hbase-server-1.2.4.jar,file:///home/centosm/hive/lib/hbase-protocol-1.2.4.jar,file:///home/centosm/hive/lib/zookeeper-3.4.6.jar</value>
</property>
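A single mistyped file name in this long value is hard to see. As a rough aid, the sketch below assembles the comma-separated value from a lib directory and a list of jar names, and refuses to continue if a jar is missing. The function name build_aux_jars_path is hypothetical, and the sketch assumes the lib directory is given as an absolute path.

```shell
# Build a comma-separated list of file:// URIs for hive.aux.jars.path.
# Hypothetical helper; lib_dir is assumed to be an absolute path.
build_aux_jars_path() {
    lib_dir=$1
    shift
    value=""
    for jar in "$@"; do
        # Fail early instead of writing a path Hive cannot load.
        if [ ! -f "$lib_dir/$jar" ]; then
            echo "missing jar: $lib_dir/$jar" >&2
            return 1
        fi
        # Prepend a comma to every entry except the first.
        value="${value:+$value,}file://$lib_dir/$jar"
    done
    printf '%s\n' "$value"
}
```

For example, build_aux_jars_path /home/centosm/hive/lib hive-hbase-handler-2.1.0.jar hbase-client-1.2.4.jar prints a value that can be pasted between the <value> tags.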
4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).
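With more than a couple of nodes, the copy in step 4 is easier to script. The sketch below is one way to do it with scp. The function name distribute_hbase_site, the DRY_RUN switch and the host names in the usage line are all hypothetical, and passwordless SSH to each node is assumed. By default it only prints the commands it would run.

```shell
# Push hbase-site.xml to the Hadoop conf directory on each node.
# Hypothetical helper; prints the scp commands unless DRY_RUN=0 is set.
distribute_hbase_site() {
    src=$1
    dest_dir=$2
    shift 2
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show what would be copied and where.
            echo "scp $src $host:$dest_dir/"
        else
            scp "$src" "$host:$dest_dir/"
        fi
    done
}
```

A call such as distribute_hbase_site hbase/conf/hbase-site.xml hadoop/conf master slave1 slave2 lists the copies first; set DRY_RUN=0 once the list looks right.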
Configuration is now complete. Test it as follows:
hbase(main)::> create 'user1',{NAME => 'info',VERSIONS => }
row(s) in seconds
=> Hbase::Table - user1
hbase(main)::> desc 'user1'
Table user1 is ENABLED
user1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MI
N_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
row(s) in seconds
hbase(main)::> put 'user1','1','info:name','zhangsan'
row(s) in seconds
hbase(main)::> put 'user1','1','info:age','25'
row(s) in seconds
hbase(main)::> put 'user1','2','info:name','lisi'
row(s) in seconds
hbase(main)::> put 'user1','2','info:age','22'
row(s) in seconds
hbase(main)::> put 'user1','3','info:name','wangswu'
row(s) in seconds
hbase(main)::> put 'user1','3','info:age','21'
row(s) in seconds
hbase(main)::> scan 'user1'
ROW COLUMN+CELL
column=info:age, timestamp=, value=
column=info:name, timestamp=, value=zhangsan
column=info:age, timestamp=, value=
column=info:name, timestamp=, value=lisi
column=info:age, timestamp=, value=
column=info:name, timestamp=, value=wangswu
row(s) in seconds
hbase(main)::>
hive>
> CREATE EXTERNAL TABLE user1 (
> rowkey string,
> info map<STRING,STRING>
> ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "user1");
OK
Time taken: seconds
hive> show tables;
OK
test
user1
Time taken: seconds, Fetched: row(s)
hive> select * from user1;
OK
{"age":"25","name":"zhangsan"}
{"age":"22","name":"lisi"}
{"age":"21","name":"wangswu"}
Time taken: seconds, Fetched: row(s)
hive>
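Because the whole info column family is mapped to a single Hive map column, individual HBase qualifiers can be read with the map['key'] syntax. The sketch below wraps such a query in a small shell function so it can be passed to the Hive CLI. The function name print_user1_query is hypothetical; the table and column names are the ones created above.

```shell
# Print a HiveQL query that pulls single qualifiers out of the info map column.
# Hypothetical helper; table and column names come from the user1 example.
print_user1_query() {
    # The quoted delimiter keeps the shell from expanding anything in the query.
    cat <<'EOF'
SELECT rowkey,
       info['name'] AS name,
       info['age']  AS age
FROM user1;
EOF
}
```

It can be run with hive -e "$(print_user1_query)". A qualifier that is absent from a row comes back as NULL for that row.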
Querying an existing HBase table through Hive
1. Inspect the table's structure in HBase:
hbase(main)::> desc 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
{NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCK
CACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKC
ACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
row(s) in seconds
hbase(main)::> scan 'student'
ROW COLUMN+CELL
hehe column=course:English, timestamp=, value=
rowKey0 column=grade:sid, timestamp=, value=
rowKey1 column=grade:sid, timestamp=, value=
rowKey2 column=grade:sid, timestamp=, value=
rowKey3 column=grade:sid, timestamp=, value=
rowKey4 column=grade:sid, timestamp=, value=
rowKey5 column=grade:sid, timestamp=, value=
rowKey6 column=grade:sid, timestamp=, value=
rowKey7 column=grade:sid, timestamp=, value=
rowKey8 column=grade:sid, timestamp=, value=
rowKey9 column=grade:sid, timestamp=, value=
ycb column=course:English, timestamp=, value=
row(s) in seconds
Based on the structure of the HBase student table shown above, the Hive CREATE TABLE statement and its query results are as follows:
hive> CREATE EXTERNAL TABLE student(key string, English string,sid string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,course:English,grade:sid")
> TBLPROPERTIES("hbase.table.name" = "student");
OK
Time taken: seconds
==================================================
hive> desc student;
OK
key string
english string
sid string
Time taken: seconds, Fetched: row(s)
===================================================
hive> select * from student;
OK
hehe NULL
rowKey0 NULL
rowKey1 NULL
rowKey2 NULL
rowKey3 NULL
rowKey4 NULL
rowKey5 NULL
rowKey6 NULL
rowKey7 NULL
rowKey8 NULL
rowKey9 NULL
ycb NULL
Time taken: seconds, Fetched: row(s)
hive>
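The student example follows a simple pattern: one Hive column for the row key, then one Hive column per family:qualifier pair, listed in the same order in hbase.columns.mapping. The sketch below generates such a statement from a table name and a list of pairs. The function name gen_hbase_external_ddl is hypothetical, and as a simplification it types every column as string and uses the qualifier as the Hive column name, so qualifiers must be valid, unique Hive identifiers.

```shell
# Generate a Hive external-table DDL for an existing HBase table.
# Hypothetical helper; every column is typed string for simplicity.
gen_hbase_external_ddl() {
    table=$1
    shift
    cols="key string"
    mapping=":key"
    for spec in "$@"; do
        # Use the part after "family:" as the Hive column name.
        qualifier=${spec#*:}
        cols="$cols, $qualifier string"
        # Keep mapping entries in the same order as the Hive columns.
        mapping="$mapping,$spec"
    done
    cat <<EOF
CREATE EXTERNAL TABLE $table ($cols)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = "$mapping")
TBLPROPERTIES ("hbase.table.name" = "$table");
EOF
}
```

Calling gen_hbase_external_ddl student course:English grade:sid produces a statement equivalent to the one used for the student table above.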