
A: Install Hadoop and HBase

Reference: http://blog.csdn.net/wind520/article/details/39856353

B: Install Hive

 1: Download: wget http://mirrors.hust.edu.cn/apache/hive/stable/apache-hive-0.13.1-bin.tar.gz

 2: Extract: [jifeng@feng02 ~]$ tar zxf apache-hive-0.13.1-bin.tar.gz

 3: Rename the directory: [jifeng@feng02 ~]$ mv apache-hive-0.13.1-bin hive

 4: Configure

Modify the files under the conf directory:

[jifeng@feng02 ~]$ cd hive
[jifeng@feng02 hive]$ ls
bin  conf  examples  hcatalog  lib  LICENSE  NOTICE  README.txt  RELEASE_NOTES.txt  scripts
[jifeng@feng02 hive]$ cd conf
[jifeng@feng02 conf]$ ls
hive-default.xml.template  hive-exec-log4j.properties.template
hive-env.sh.template       hive-log4j.properties.template
[jifeng@feng02 conf]$ cp hive-env.sh.template  hive-env.sh
[jifeng@feng02 conf]$ cp hive-default.xml.template  hive-site.xml  
[jifeng@feng02 conf]$ ls
hive-default.xml.template  hive-env.sh.template                 hive-log4j.properties.template
hive-env.sh                hive-exec-log4j.properties.template  hive-site.xml
[jifeng@feng02 conf]$ 
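
The article only copies hive-env.sh from its template and does not edit it. If you prefer to keep the Hadoop settings out of hive-config.sh, a minimal hive-env.sh sketch (optional, not part of the original steps; paths follow this article's layout) would be:

# hive-env.sh (optional sketch)
# Hadoop installation Hive should run against
HADOOP_HOME=$HOME/hadoop/hadoop-2.4.1
# Directory that holds hive-site.xml
export HIVE_CONF_DIR=$HOME/hive/conf
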
Modify hive-config.sh under the bin directory:

[jifeng@feng02 bin]$ vi hive-config.sh   

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# processes --config option from command line
#

this="$0"
while [ -h "$this" ]; do
  ls=`ls -ld "$this"`
  link=`expr "$ls" : '.*-> \(.*\)$'`
  if expr "$link" : '.*/.*' > /dev/null; then
    this="$link"
  else
    this=`dirname "$this"`/"$link"
  fi
done

# convert relative path to absolute path
bin=`dirname "$this"`
script=`basename "$this"`
bin=`cd "$bin"; pwd`
this="$bin/$script"

# the root of the Hive installation
if [[ -z $HIVE_HOME ]] ; then
  export HIVE_HOME=`dirname "$bin"`
fi

#check to see if the conf dir is given as an optional argument
while [ $# -gt 0 ]; do    # Until you run out of parameters . . .
  case "$1" in
    --config)
        shift
        confdir=$1
        shift
        HIVE_CONF_DIR=$confdir
        ;;
    --auxpath)
        shift
        HIVE_AUX_JARS_PATH=$1
        shift
        ;;
    *)
        break;
        ;;
  esac
done


# Allow alternate conf dir location.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"

export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

# Default to use 256MB
export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}
export JAVA_HOME=$HOME/jdk1.7.0_45
export HIVE_HOME=$HOME/hive
export HADOOP_HOME=$HOME/hadoop/hadoop-2.4.1
"hive-config.sh" 73L, 2011C 已写入   

Finally, append these three lines:
export JAVA_HOME=$HOME/jdk1.7.0_45  
export HIVE_HOME=$HOME/hive 
export HADOOP_HOME=$HOME/hadoop/hadoop-2.4.1  
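
A quick sanity check (not in the original article) is to source the modified script and confirm the variables resolve to the expected paths:

# run from the hive directory; expect /home/jifeng/... values
source bin/hive-config.sh
echo $JAVA_HOME $HIVE_HOME $HADOOP_HOME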

Configure MySQL as the metastore by modifying $HIVE_HOME/conf/hive-site.xml (a sketch for creating the MySQL account follows the properties below):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://jifengsql:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
  <description>class implementing the jdo persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>dss</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>jifeng</value>
  <description>password to use against metastore database</description>
</property>
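
The connection settings above assume a MySQL server on host jifengsql with a user dss (password jifeng) that is allowed to create the hive database (createDatabaseIfNotExist=true). If that account does not exist yet, a sketch of the required grants, run on the MySQL host as root (names taken from the properties above; adjust to your environment):

mysql -u root -p -e "CREATE USER 'dss'@'%' IDENTIFIED BY 'jifeng';
GRANT ALL PRIVILEGES ON hive.* TO 'dss'@'%';
FLUSH PRIVILEGES;"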

Download the MySQL JDBC driver: wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.32.tar.gz

Copy mysql-connector-java-5.1.32-bin.jar to $HIVE_HOME/lib, for example as sketched below.
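
A sketch of the extract-and-copy step, assuming the tarball unpacks to a mysql-connector-java-5.1.32 directory:

# extract the Connector/J tarball and drop the driver jar into Hive's lib directory
tar zxf mysql-connector-java-5.1.32.tar.gz
cp mysql-connector-java-5.1.32/mysql-connector-java-5.1.32-bin.jar $HIVE_HOME/lib/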

5: Start Hive

[jifeng@feng02 hive]$ bin/hive
Logging initialized using configuration in jar:file:/home/jifeng/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show tables;  
OK
Time taken: 0.723 seconds
hive> 

C: Integration configuration

First, make sure the HBase jars under <HIVE_HOME>/lib match the version of HBase actually deployed; use the jars from the <HBASE_HOME>/lib/ directory:

[jifeng@feng02 lib]$ find -name "htr*jar" 
./htrace-core-2.04.jar
[jifeng@feng02 lib]$ find -name "hbase*jar"   
./hbase-server-0.98.6.1-hadoop2.jar
./hbase-client-0.98.6.1-hadoop2.jar
./hbase-it-0.98.6.1-hadoop2-tests.jar
./hbase-common-0.98.6.1-hadoop2.jar
./hbase-it-0.98.6.1-hadoop2.jar
./hbase-common-0.98.6.1-hadoop2-tests.jar
./hbase-protocol-0.98.6.1-hadoop2.jar
Copy these files to the /home/jifeng/hive/lib directory, for example as sketched below.
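
A sketch of the copy, assuming HBASE_HOME points at the HBase 0.98.6.1 installation (the jar list matches the find output above):

cd $HBASE_HOME/lib
cp htrace-core-2.04.jar \
   hbase-server-0.98.6.1-hadoop2.jar \
   hbase-client-0.98.6.1-hadoop2.jar \
   hbase-common-0.98.6.1-hadoop2.jar \
   hbase-common-0.98.6.1-hadoop2-tests.jar \
   hbase-protocol-0.98.6.1-hadoop2.jar \
   hbase-it-0.98.6.1-hadoop2.jar \
   hbase-it-0.98.6.1-hadoop2-tests.jar \
   /home/jifeng/hive/lib/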

D: Test and verify

Before testing, start Hadoop and then HBase, in that order.

Reference: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveHBaseIntegration

Start Hive with the HBase handler jars on the aux path:

Command: bin/hive --auxpath ./lib/hive-hbase-handler-0.13.1.jar,./lib/hbase-server-0.98.6.1-hadoop2.jar,./lib/zookeeper-3.4.5.jar,./lib/guava-11.0.2.jar --hiveconf hbase.master=feng01:60000
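
Since the --auxpath list is long, a small wrapper script can save retyping (a sketch; jar versions and the hbase.master address follow this article's environment):

#!/bin/bash
# start-hive-hbase.sh -- launch Hive with the HBase integration jars (sketch)
cd "$HIVE_HOME"
AUX_JARS=./lib/hive-hbase-handler-0.13.1.jar,./lib/hbase-server-0.98.6.1-hadoop2.jar,./lib/zookeeper-3.4.5.jar,./lib/guava-11.0.2.jar
exec bin/hive --auxpath "$AUX_JARS" --hiveconf hbase.master=feng01:60000 "$@"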

Create a table in Hive that is backed by HBase:
CREATE TABLE hbase_table_1(key int, value string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz1");

Execution in the Hive shell:

[jifeng@feng02 hive]$ bin/hive --auxpath ./lib/hive-hbase-handler-0.13.1.jar,./lib/hbase-server-0.98.6.1-hadoop2.jar,./lib/zookeeper-3.4.5.jar,./lib/guava-11.0.2.jar --hiveconf hbase.master=feng01:60000
14/10/08 15:59:20 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/home/jifeng/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> CREATE TABLE hbase_table_1(key int, value string) 
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    > TBLPROPERTIES ("hbase.table.name" = "xyz1");
OK
Time taken: 2.606 seconds
hive> desc hbase_table_1;
OK
key                     int                     from deserializer   
value                   string                  from deserializer   
Time taken: 0.269 seconds, Fetched: 2 row(s)
Query from the HBase shell:

hbase(main):004:0> list
TABLE                                                                                                             
xyz                                                                                                               
xyz1                                                                                                              
2 row(s) in 0.0260 seconds

=> ["xyz", "xyz1"]
hbase(main):005:0> desc "xyz1"
DESCRIPTION                                                               ENABLED                                 
 'xyz1', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'R true                                    
 OW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', M                                         
 IN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLO                                         
 CKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                   
1 row(s) in 1.1600 seconds

hbase(main):006:0> scan 'xyz1'
ROW                           COLUMN+CELL                                                                         
0 row(s) in 0.0510 seconds
Insert data from the HBase shell:

hbase(main):007:0> put 'xyz1','99','cf1:val','test.micmiu.com'
0 row(s) in 0.0770 seconds

hbase(main):008:0> scan 'xyz1'
ROW                           COLUMN+CELL                                                                         
 99                           column=cf1:val, timestamp=1412756927628, value=test.micmiu.com                      
1 row(s) in 0.0160 seconds
Query from Hive:

hive> select * from hbase_table_1;                                      
OK
99      test.micmiu.com
Time taken: 0.13 seconds, Fetched: 1 row(s)
hive>
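
As a further check (not part of the original session), the mapping works in both directions: put another row from the HBase shell and read it back non-interactively through Hive with the same aux jars. A sketch, run from $HIVE_HOME, using a hypothetical row key 100:

# add a second row from HBase ...
echo "put 'xyz1','100','cf1:val','test2.micmiu.com'" | $HBASE_HOME/bin/hbase shell
# ... and read the table back through Hive non-interactively
cd "$HIVE_HOME"
bin/hive --auxpath ./lib/hive-hbase-handler-0.13.1.jar,./lib/hbase-server-0.98.6.1-hadoop2.jar,./lib/zookeeper-3.4.5.jar,./lib/guava-11.0.2.jar \
  --hiveconf hbase.master=feng01:60000 \
  -e "select * from hbase_table_1;"
# both rows (99 and 100) should now be returned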