淘先锋技术网

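Hive and HBase are integrated by having the two systems communicate through each other's public APIs. This communication relies mainly on the classes in the hive-hbase-handler JAR (hive-hbase-handler-2.1.0.jar in this setup).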

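I. Copy the HBase JARs into /home/centosm/hive/lib. If hive/lib already contains a different version of one of these JARs, delete Hive's copy first and then copy the HBase version over.
The steps are as follows: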

1. Back up Hive's lib directory:
zip -r lib.zip lib

2. Copy the HBase JARs into hive/lib:

$ ls hbase-c*
hbase-client-<version>.jar   hbase-common-<version>.jar   hbase-common-<version>-tests.jar
$ cp hbase-c* ../../hive/lib
$ cp hbase-server-*.jar ../../hive/lib
$ cp hbase-protocol-*.jar ../../hive/lib

Note: if hive/lib ends up with more than one version of the same JAR, remove the version that does not match HBase.
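
The version check in the note above can be partly automated. The following is a minimal sketch and not part of the original procedure: `list_duplicate_artifacts` is a hypothetical helper that strips the version from each JAR file name in a directory and prints the artifact names that occur more than once. It assumes file names of the form `<artifact>-<version>.jar` where the version starts with a digit, so unusual naming schemes may be misreported.

```shell
# Hypothetical helper: print artifact names that appear with more than one
# version in the given directory (for example hive/lib).
# Assumes file names look like <artifact>-<version>.jar, where <version>
# starts with a digit; a trailing classifier such as -tests is kept.
list_duplicate_artifacts() {
    lib_dir=$1
    for jar in "$lib_dir"/*.jar; do
        # Skip the unexpanded pattern when the directory has no JARs.
        [ -e "$jar" ] || continue
        basename "$jar" .jar
    done |
        sed 's/-[0-9][0-9A-Za-z.]*//' |
        sort |
        uniq -d
}
```

Running `list_duplicate_artifacts /home/centosm/hive/lib` after the copy should print nothing when every JAR is present in a single version.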

3. Edit hive-site.xml:

<property>
     <name>hive.aux.jars.path</name>
     <value>file:///home/centosm/hive/lib/hive-hbase-handler-2.1.0.jar,file:///home/centosm/hive/lib/hbase-client-1.2.4.jar,file:///home/centosm/hive/lib/hbase-common-1.2.4.jar,file:///home/centosm/hive/lib/hbase-server-1.2.4.jar,file:///home/centosm/hive/lib/hbase-protocol-1.2.4.jar,file:///home/centosm/hive/lib/zookeeper-3.4.6.jar</value>
</property>  
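
Because the value of hive.aux.jars.path is one long comma-separated line, a typo in a single entry is easy to miss. As a rough sketch, the value can be generated instead of typed by hand. `build_aux_jars_path` is a hypothetical helper, not a Hive tool; it assumes the library directory is given as an absolute path and does not check that the files exist.

```shell
# Hypothetical helper: join JAR file names into a comma-separated list of
# file:// URIs, suitable for the <value> of hive.aux.jars.path.
# $1 is the absolute path of the lib directory; the remaining arguments
# are JAR file names inside it.
build_aux_jars_path() {
    lib_dir=$1
    shift
    result=""
    for jar in "$@"; do
        # Add a comma only when result already holds an entry.
        result="${result:+$result,}file://$lib_dir/$jar"
    done
    printf '%s\n' "$result"
}
```

For example, `build_aux_jars_path /home/centosm/hive/lib hive-hbase-handler-2.1.0.jar hbase-client-1.2.4.jar` prints the first two entries of the value shown above.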

4. Copy hbase-site.xml from hbase/conf to hadoop/conf on every Hadoop node, including the master.
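
One possible way to carry out step 4 is to generate one scp command per node and review the list before running it. This is only a sketch: `print_copy_commands` is a hypothetical helper, the host names in the usage line are placeholders, and it assumes SSH access to each node with the destination directory resolved relative to the remote login directory.

```shell
# Hypothetical helper: print one scp command per host so that the list can
# be reviewed before anything is copied.
# $1 = local file, $2 = destination directory on each host, rest = hosts.
print_copy_commands() {
    src=$1
    dest_dir=$2
    shift 2
    for host in "$@"; do
        printf 'scp %s %s:%s/\n' "$src" "$host" "$dest_dir"
    done
}
```

For example, `print_copy_commands hbase/conf/hbase-site.xml hadoop/conf master slave1` prints two scp commands; piping that output to `sh` would execute them.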

Configuration is now complete. The following session tests it:

hbase(main)> create 'user1',{NAME => 'info',VERSIONS => 1}

=> Hbase::Table - user1
hbase(main)> desc 'user1'
Table user1 is ENABLED
user1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

hbase(main)> put 'user1','1','info:name','zhangsan'
hbase(main)> put 'user1','1','info:age','25'
hbase(main)> put 'user1','2','info:name','lisi'
hbase(main)> put 'user1','2','info:age','22'
hbase(main)> put 'user1','3','info:name','wangswu'
hbase(main)> put 'user1','3','info:age','21'

hbase(main)> scan 'user1'
ROW        COLUMN+CELL
 1         column=info:age, timestamp=..., value=25
 1         column=info:name, timestamp=..., value=zhangsan
 2         column=info:age, timestamp=..., value=22
 2         column=info:name, timestamp=..., value=lisi
 3         column=info:age, timestamp=..., value=21
 3         column=info:name, timestamp=..., value=wangswu

hive> CREATE EXTERNAL TABLE user1 (
    > rowkey string,
    > info map<STRING,STRING>
    > ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
    > TBLPROPERTIES ("hbase.table.name" = "user1");
OK
hive> show tables;
OK
test
user1
hive> select * from user1;
OK
1       {"age":"25","name":"zhangsan"}
2       {"age":"22","name":"lisi"}
3       {"age":"21","name":"wangswu"}
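
Since the whole info column family is mapped to a single Hive map column, individual qualifiers can be read with map subscripts. The sketch below writes such a query to a script file; `write_user1_query` and the file path in the usage comment are hypothetical names chosen for illustration, and the query assumes the user1 table and data shown above.

```shell
# Hypothetical helper: write a small HiveQL script to the file named in $1.
# The script reads single qualifiers out of the info map column and casts
# the age (stored as a string in HBase) to INT before comparing it.
write_user1_query() {
    cat > "$1" <<'EOF'
SELECT rowkey,
       info['name'] AS name,
       CAST(info['age'] AS INT) AS age
FROM user1
WHERE CAST(info['age'] AS INT) > 21;
EOF
}

# Usage (needs a working Hive installation, so it is not run here):
#   write_user1_query /tmp/user1_query.hql
#   hive -f /tmp/user1_query.hql
```

With the three rows inserted earlier, the filter should keep rows 1 and 2, whose ages are 25 and 22.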

Querying an existing HBase table from Hive
1. Inspect the table's structure in HBase:

hbase(main)> desc 'student'
Table student is ENABLED
student
COLUMN FAMILIES DESCRIPTION
{NAME => 'course', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'grade', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

hbase(main)> scan 'student'
ROW        COLUMN+CELL
 hehe      column=course:English, timestamp=..., value=...
 rowKey0   column=grade:sid, timestamp=..., value=...
 rowKey1   column=grade:sid, timestamp=..., value=...
 rowKey2   column=grade:sid, timestamp=..., value=...
 rowKey3   column=grade:sid, timestamp=..., value=...
 rowKey4   column=grade:sid, timestamp=..., value=...
 rowKey5   column=grade:sid, timestamp=..., value=...
 rowKey6   column=grade:sid, timestamp=..., value=...
 rowKey7   column=grade:sid, timestamp=..., value=...
 rowKey8   column=grade:sid, timestamp=..., value=...
 rowKey9   column=grade:sid, timestamp=..., value=...
 ycb       column=course:English, timestamp=..., value=...

Based on the structure of the HBase student table shown above, the Hive table can be created and queried as follows:


hive> CREATE EXTERNAL TABLE student(key string, English string, sid string)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,course:English,grade:sid")
    > TBLPROPERTIES("hbase.table.name" = "student");
OK
==================================================
hive> desc student;
OK
key                     string
english                 string
sid                     string
===================================================
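hive> select * from student;
OK
hehe    ...     NULL
rowKey0 NULL    ...
rowKey1 NULL    ...
rowKey2 NULL    ...
rowKey3 NULL    ...
rowKey4 NULL    ...
rowKey5 NULL    ...
rowKey6 NULL    ...
rowKey7 NULL    ...
rowKey8 NULL    ...
rowKey9 NULL    ...
ycb     ...     NULL

A NULL marks a column for which the row has no cell in HBase: hehe and ycb only have course:English, while rowKey0 through rowKey9 only have grade:sid.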
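
The CREATE EXTERNAL TABLE statement for student follows a positional rule: the first Hive column maps to :key, and each further Hive column maps to the family:qualifier entry at the same position in hbase.columns.mapping. The sketch below generates such a statement from a list of pairs. `print_hbase_table_ddl` is a hypothetical helper written for illustration; it assumes every column is declared as string and that the Hive table has the same name as the HBase table.

```shell
# Hypothetical helper: print a CREATE EXTERNAL TABLE statement for an
# existing HBase table.
# $1 is the table name; each further argument is a pair of the form
# hive_column=family:qualifier. All columns are declared as string.
print_hbase_table_ddl() {
    table=$1
    shift
    columns="key string"
    mapping=":key"
    for pair in "$@"; do
        # Text before the first "=" is the Hive column name,
        # text after it is the HBase family:qualifier.
        columns="$columns, ${pair%%=*} string"
        mapping="$mapping,${pair#*=}"
    done
    printf 'CREATE EXTERNAL TABLE %s(%s)\n' "$table" "$columns"
    printf '%s\n' "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'"
    printf 'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "%s")\n' "$mapping"
    printf 'TBLPROPERTIES ("hbase.table.name" = "%s");\n' "$table"
}
```

Calling `print_hbase_table_ddl student English=course:English sid=grade:sid` prints a statement equivalent to the one used for the student table. Keeping the column list and the mapping in one loop makes it harder for the two to drift out of order.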