"HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java." Having inherited BigTable's scalable design, HBase has been adopted by many IT giants to store structured data, from terabytes to petabytes, across a large number of nodes.
This tutorial shows you how to set up a pseudo-distributed HBase instance running on your Mac. Pseudo-distributed means that HBase runs on a single machine but relies on HDFS as its file system. So you should read my previous post entitled "Set Up Pseudo-distributed Hadoop Environment on Mac OS X Lion" and set up a pseudo-distributed Hadoop environment before you proceed. Although it has only been verified on Mac OS X Lion, this tutorial should work on most UNIX-like operating systems with slight modifications.
The rest of this tutorial is organized as follows.
- Get HBase
- Configure HBase
- Start HDFS and HBase
- Work with HBase shell
- Stop HBase
You can get the stable version of HBase from the Beijing Jiaotong University mirror. The current stable version is hbase-0.92.1.
$ cd /tmp/
$ wget http://mirror.bjtu.edu.cn/apache/hbase/stable/hbase-0.92.1.tar.gz
$ wget http://mirror.bjtu.edu.cn/apache/hbase/stable/hbase-0.92.1.mds
$ md5 *.tar.gz; cat *.mds
# Compare the printed digest with the MD5 line in the .mds file
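Comparing two long digests by eye is error-prone. The function below is a minimal sketch that does the comparison for you. It assumes the .mds file contains a line of the form "hbase-0.92.1.tar.gz: MD5 = XX XX ..." (the layout produced by gpg --print-mds), and the name verify_md5 is hypothetical.

```shell
# Hypothetical helper: verify_md5 <tarball> <mds_file>
# Assumes <mds_file> has a line like "pkg.tar.gz:    MD5 = 0A 1B 2C ..." where the
# hex byte pairs may be separated by spaces and written in upper case.
verify_md5() {
  tarball="$1"
  mds="$2"
  # Expected digest: everything after "MD5 = ", whitespace removed, lower-cased.
  expected=$(sed -n 's/.*MD5 = //p' "$mds" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  # Actual digest: `md5 -q` on Mac OS X, `md5sum` on most Linux systems.
  if command -v md5 >/dev/null 2>&1; then
    actual=$(md5 -q "$tarball")
  else
    actual=$(md5sum "$tarball" | cut -d ' ' -f 1)
  fi
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "MD5 OK: $actual"
    return 0
  fi
  echo "MD5 MISMATCH: expected '$expected', got '$actual'" >&2
  return 1
}
```

With the files downloaded above, you would run:
$ verify_md5 /tmp/hbase-0.92.1.tar.gz /tmp/hbase-0.92.1.mds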
For convenience, I recommend running HBase under your normal user account, so extract the HBase package into your home directory and create a symbolic link.
$ cd ~
$ tar xzvpf /tmp/*.tar.gz
$ ln -s hbase-0.92.1 hbase
Add the following lines to your shell startup file:
# HBase bin
alias hshell='hbase shell'
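The commands in the rest of this tutorial (hbase shell, stop-hbase.sh, and the hshell alias) are run without a path, so ~/hbase/bin has to be on your PATH. A sketch of lines that would do this in the same startup file, assuming the ~/hbase symlink created above; HBASE_HOME here is only a convenience variable:

```shell
# Assumes the ~/hbase symlink created earlier in this tutorial.
export HBASE_HOME="$HOME/hbase"

# Append $HBASE_HOME/bin to PATH only once, even if this file is sourced repeatedly.
case ":$PATH:" in
  *":$HBASE_HOME/bin:"*) ;;
  *) export PATH="$PATH:$HBASE_HOME/bin" ;;
esac
```

The case pattern wraps PATH in colons so that an existing entry is matched exactly, whether it sits at the start, middle, or end of PATH.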
hbase-env.sh: Configure Java environment for HBase
As with hadoop-env.sh in Hadoop, please specify JAVA_HOME for HBase. We use /usr/libexec/java_home to determine the Mac's JAVA_HOME on the fly.
# File: hbase/conf/hbase-env.sh
export JAVA_HOME=$(/usr/libexec/java_home)
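Since /usr/libexec/java_home only exists on Mac OS X, other UNIX-like systems need a different way to find the JDK. Here is a hedged sketch of a more portable JAVA_HOME block for hbase-env.sh. It keeps any JAVA_HOME you have already exported and, on non-Mac systems, assumes the JDK can be located by resolving the java binary on PATH through its symlinks (GNU readlink -f). The fallback order is my own choice, not HBase's logic.

```shell
# Portable variant for hbase/conf/hbase-env.sh (a sketch, not HBase's own logic).
# Order of preference:
#   1. a JAVA_HOME already exported by the caller
#   2. Mac OS X's /usr/libexec/java_home helper
#   3. two directories above the resolved `java` binary (.../bin/java -> ...)
if [ -z "${JAVA_HOME:-}" ]; then
  if [ -x /usr/libexec/java_home ]; then
    JAVA_HOME="$(/usr/libexec/java_home)"
  elif command -v java >/dev/null 2>&1; then
    # readlink -f follows chains like /usr/bin/java -> /etc/alternatives/java -> ...
    JAVA_HOME="$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")"
  fi
fi
export JAVA_HOME
```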
As stated in HADOOP-7489 (https://issues.apache.org/jira/browse/HADOOP-7489), modify HBASE_OPTS to avoid being interrupted by SCDynamicStore errors.
# File: hbase/conf/hbase-env.sh
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk" # For Mac only
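The HBASE_OPTS line above is marked "For Mac only". If you share one hbase-env.sh between a Mac and other UNIX-like hosts, a sketch like the following applies the Kerberos workaround only when uname -s reports Darwin; keeping just the GC flag elsewhere is an assumption about what you want on those hosts.

```shell
# Apply the HADOOP-7489 Kerberos workaround only on Mac OS X (Darwin).
if [ "$(uname -s)" = "Darwin" ]; then
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
else
  # Other systems: keep only the garbage-collector flag from the Mac line.
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
fi
```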
hbase-site.xml: Configure storage for HBase
HBase can use either the local file system or HDFS as its backend storage system. The HBase Quick Start uses the local file system in standalone mode. However, I found that this local-file-system approach doesn't work when HBase coexists with Hadoop on my Mac. I submitted an issue, HBASE-5852, and haven't received an answer yet. On the other hand, the HDFS approach is just as easy to configure and, more importantly, it works well.
To use HDFS as HBase's backend storage system, you need to specify the HDFS NameNode address in ~/hbase/conf/hbase-site.xml. Please refer to core-site.xml in Hadoop for the NameNode information.
<description>Check hadoop/conf/core-site.xml in hadoop for NameNode's port number.</description>
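A complete hbase-site.xml for this setup needs an hbase.rootdir property pointing at a directory on HDFS. The function below is a minimal sketch that writes such a file; note that it overwrites any existing hbase-site.xml. The name write_hbase_site is hypothetical, and hdfs://localhost:9000 in the usage line is only an assumed NameNode address: use the value of fs.default.name from your own hadoop/conf/core-site.xml.

```shell
# Hypothetical helper: write_hbase_site <conf_dir> <namenode_uri>
# Overwrites <conf_dir>/hbase-site.xml with a minimal configuration that keeps
# HBase's data under <namenode_uri>/hbase on HDFS.
write_hbase_site() {
  conf_dir="$1"
  namenode_uri="$2"   # should equal fs.default.name in hadoop/conf/core-site.xml
  cat > "$conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>${namenode_uri}/hbase</value>
    <description>Check hadoop/conf/core-site.xml in hadoop for NameNode's port number.</description>
  </property>
</configuration>
EOF
}
```

Usage, assuming the NameNode listens on localhost port 9000:
$ write_hbase_site ~/hbase/conf hdfs://localhost:9000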
Start HDFS and HBase
Pseudo-distributed HBase relies on the HDFS service provided by Hadoop, so start Hadoop first and then start HBase.
Then check all the running Java processes and look for HMaster, the master process started by HBase.
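The daemons can take a while to come up after their start scripts return. Assuming Hadoop's and HBase's start scripts are on your PATH, a small polling helper can check jps until a given process appears. The name wait_for_jvm is hypothetical; it matches the class name in the second column of jps output exactly.

```shell
# Hypothetical helper: wait_for_jvm <main_class_name> <timeout_seconds>
# Returns 0 as soon as `jps` lists a JVM with that name, 1 after the timeout.
wait_for_jvm() {
  name="$1"
  timeout="$2"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # jps prints "<pid> <MainClass>"; compare column 2 exactly.
    if jps 2>/dev/null | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

For example, with Hadoop 1.x (start-all.sh, or start-dfs.sh if you only need HDFS):
$ start-all.sh
$ wait_for_jvm NameNode 60 && start-hbase.sh && wait_for_jvm HMaster 60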
Work with HBase shell
Now you've got a working HBase instance. Get into HBase Shell and rock your HBase.
$ hbase shell
help: Get help
Note that the quotes are necessary.
hbase> help "COMMAND"
status: Show HBase status
list: List all tables
create: Create a table
Create a table "tbtest" with an initial column family named "property" in the HBase shell.
hbase> create 'tbtest', 'property'
put: Insert a cell into a table
If no timestamp is given in the command, HBase uses the current time as the cell's timestamp.
hbase> put 'tbtest', 'row1', 'property:color', 'red'
hbase> put 'tbtest', 'row2', 'property:lang', 'en_us'
hbase> put 'tbtest', 'row3', 'property:name', 'avatar'
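Typing put commands one at a time gets tedious for more than a few cells. As a sketch, assuming your data is in tab-separated row/column/value lines whose fields contain no single quotes, the hypothetical helper below generates put commands in the same form as above; the result can then be piped into hbase shell, which should accept commands on standard input.

```shell
# Hypothetical helper: tsv_to_puts <table>
# Reads "row<TAB>family:qualifier<TAB>value" lines on stdin and prints one
# HBase shell put command per line. Assumes no field contains a single quote.
tsv_to_puts() {
  table="$1"
  awk -F '\t' -v t="$table" -v q="'" '
    NF == 3 { print "put " q t q ", " q $1 q ", " q $2 q ", " q $3 q }
  '
}
```

Usage:
$ printf 'row1\tproperty:color\tred\n' | tsv_to_puts tbtest | hbase shell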
scan: Scan a table
hbase> scan 'tbtest'
row1 column=property:color, timestamp=1335248257807, value=red
row2 column=property:lang, timestamp=1335248277348, value=en_us
row3 column=property:name, timestamp=1335248282001, value=avatar
3 row(s) in 0.0140 seconds
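If you want to post-process scan results with other UNIX tools, a sketch like the following turns scan output into tab-separated row, column, timestamp, value lines and skips everything else (headers, the row count). It assumes the output layout shown above, row keys without spaces, and values that do not themselves contain ", value="; scan_to_tsv is a hypothetical name.

```shell
# Hypothetical helper: scan_to_tsv
# Reads HBase shell scan output on stdin, prints row<TAB>column<TAB>timestamp<TAB>value.
scan_to_tsv() {
  awk '
    # Data lines look like: <row> column=<fam:qual>, timestamp=<ms>, value=<v>
    $2 ~ /^column=/ {
      row = $1
      col = $2; sub(/^column=/, "", col); sub(/,$/, "", col)
      ts = $3;  sub(/^timestamp=/, "", ts); sub(/,$/, "", ts)
      val = $0; sub(/^.*, value=/, "", val)
      print row "\t" col "\t" ts "\t" val
    }
  '
}
```

Usage:
$ echo "scan 'tbtest'" | hbase shell | scan_to_tsv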
get: Get a single row
hbase> get 'tbtest', 'row1'
property:color timestamp=1335248257807, value=red
1 row(s) in 0.0340 seconds
exit: Exit HBase Shell
Waiting too long when starting or stopping HBase
Try running stop-hbase.sh in a standalone shell, not inside tmux or screen.
SLF4J complains about multiple SLF4J bindings
Ignore it. It doesn't matter.
Cannot write into HBase
That's because HDFS takes too long to recover and is still in safe mode. Fortunately, you can safely leave safe mode and then check the file system:
$ hadoop dfsadmin -safemode leave
$ hadoop fsck /
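Instead of leaving safe mode immediately, you may prefer to give HDFS a bounded amount of time to finish recovering first. Hadoop 1.x's hadoop dfsadmin -safemode wait blocks until safe mode ends, but without a timeout. The sketch below polls hadoop dfsadmin -safemode get instead, assuming it prints "Safe mode is OFF" once HDFS is writable, and gives up after a given number of seconds; wait_for_hdfs is a hypothetical name.

```shell
# Hypothetical helper: wait_for_hdfs <timeout_seconds>
# Returns 0 once HDFS reports that safe mode is off, 1 if the timeout expires.
wait_for_hdfs() {
  timeout="$1"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if hadoop dfsadmin -safemode get 2>/dev/null | grep -q 'Safe mode is OFF'; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Usage, falling back to the manual command only if HDFS is still in safe mode after two minutes:
$ wait_for_hdfs 120 || hadoop dfsadmin -safemode leave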