Set up Pseudo-distributed HBase Environment on Mac OS X Lion

"HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java." Inheriting BigTable's scalability, HBase has been adopted by many large IT companies to store structured data ranging from terabytes to petabytes across a large number of nodes.

This tutorial shows how to set up a pseudo-distributed HBase instance on your Mac. Pseudo-distributed here means that the HBase instance relies on HDFS as its file system. You should therefore read my previous post, "Set Up Pseudo-distributed Hadoop Environment on Mac OS X Lion", and set up a pseudo-distributed Hadoop environment before you proceed. Although verified only on Mac OS X Lion, this tutorial should work on most UNIX-like systems with slight modifications.

The rest of this tutorial is organized as follows.

  1. Get HBase
  2. Configure HBase
  3. Start HDFS and HBase
  4. Work with HBase shell
  5. Stop HBase
  6. Troubleshooting

Get HBase

You can get the stable version of HBase from the Beijing Jiaotong University mirror. The current stable release is hbase-0.92.1.

$ cd /tmp/
$ wget http://mirror.bjtu.edu.cn/apache/hbase/stable/hbase-0.92.1.tar.gz
$ wget http://mirror.bjtu.edu.cn/apache/hbase/stable/hbase-0.92.1.mds
$ md5 *.tar.gz; cat *.mds
# Check MD5 in mds
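
Comparing the two digests by eye is error-prone, so the check can be scripted. The following is a minimal sketch, not part of HBase: file_md5 and verify_md5 are hypothetical helper names, and the expected digest is passed in by hand because the layout of the .mds file is not assumed here. Digest files often print the hex in uppercase with spaces, so the expected value is normalized before the comparison.

```shell
# Print the lowercase MD5 hex digest of a file, using whichever tool exists.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'   # most Linux systems
  else
    md5 -q "$1"                      # Mac OS X
  fi
}

# Succeed only if the file's digest equals the expected one.
# Spaces and letter case in the expected value are ignored.
verify_md5() {
  expected=$(printf '%s' "$2" | tr -d ' \n' | tr 'A-F' 'a-f')
  actual=$(file_md5 "$1")
  [ "$actual" = "$expected" ]
}

# Usage (paste the MD5 value shown in the .mds file as the second argument):
#   verify_md5 /tmp/hbase-0.92.1.tar.gz "<MD5 from the .mds file>" && echo OK
```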

For convenience, I recommend running HBase under your normal account. Extract the HBase package into your home directory and create a symbolic link to it.

$ cd
$ tar xzvpf /tmp/*.tar.gz
$ ln -s hbase-0.92.1 hbase

Add the following lines to ~/.bash_profile

# HBase bin
export PATH="$HOME/hbase/bin:$PATH"
alias hshell='hbase shell'

Update $PATH

$ source ~/.bash_profile
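
To confirm that the profile change took effect, a quick check can look for the HBase bin directory in $PATH. This is only a sketch; path_contains is a hypothetical helper name, not something HBase provides.

```shell
# Succeed if the given directory is one of the entries in $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Report whether the symlinked HBase bin directory is on the PATH.
if path_contains "$HOME/hbase/bin"; then
  echo "hbase bin directory is on PATH"
else
  echo "hbase bin directory is missing from PATH"
fi
```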

Configure HBase

hbase-env.sh: Configure Java environment for HBase

Similar to hadoop-env.sh in Hadoop, hbase-env.sh needs JAVA_HOME to be specified. We use /usr/libexec/java_home to determine the Mac's JAVA_HOME on the fly.

# File: hbase/conf/hbase-env.sh
export JAVA_HOME=$(/usr/libexec/java_home)

As stated in [HADOOP-7489](https://issues.apache.org/jira/browse/HADOOP-7489), modify HBASE_OPTS to avoid being interrupted by SCDynamicStore warnings.

# File: hbase/conf/hbase-env.sh
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk" # For Mac only

hbase-site.xml: Configure storage for HBase

HBase can use either the local file system or HDFS as its backend storage. The HBase Quick Start uses the local file system in standalone mode. However, I found that the local-file-system approach doesn't work when HBase coexists with Hadoop on my Mac. I filed issue HBASE-5852 and have not received an answer yet. The HDFS approach, on the other hand, is just as easy to configure and, more importantly, it works well.

To use HDFS as HBase's backend storage, specify the HDFS NameNode address in ~/hbase/conf/hbase-site.xml. Refer to core-site.xml in Hadoop for the NameNode address.

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
    <description>Check hadoop/conf/core-site.xml for the NameNode's port number.</description>
  </property>
</configuration>
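
The host and port in hbase.rootdir must match what Hadoop uses, so it can help to read the value straight out of core-site.xml instead of copying it by hand. The sketch below assumes the Hadoop setup stores the NameNode address under the fs.default.name property, as Hadoop 1.x does; namenode_uri is a hypothetical helper name.

```shell
# Print the <value> that follows <name>fs.default.name</name>
# in a Hadoop configuration file given as the first argument.
namenode_uri() {
  awk '
    /<name>fs\.default\.name<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and the rest
      print
      exit
    }
  ' "$1"
}

# Usage: print the value to put into hbase.rootdir.
#   echo "$(namenode_uri hadoop/conf/core-site.xml)/hbase"
```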

Start HDFS and HBase

Pseudo-distributed HBase relies on the HDFS service provided by Hadoop, so start Hadoop first and then start HBase.

$ start-all.sh
$ start-hbase.sh

Check the running Java processes. Note HMaster, the master process started by HBase.

$ jps
15185 TaskTracker
15306 HMaster
14693 DataNode
14903 SecondaryNameNode
17816 Jps
14976 JobTracker
14484 NameNode
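
Scanning the jps listing by eye works, but the same check can be scripted. This is a minimal sketch: missing_daemons is a hypothetical helper that reads jps output on standard input and prints every expected process name it does not find. It compares the whole name, so SecondaryNameNode is not mistaken for NameNode.

```shell
# Read `jps` output on stdin and print each expected daemon that is absent.
# The expected daemon names are given as arguments.
missing_daemons() {
  jps_out=$(cat)
  for name in "$@"; do
    # The second column of jps output is the process name.
    printf '%s\n' "$jps_out" \
      | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' \
      || printf '%s\n' "$name"
  done
}

# Usage: no output means every listed daemon is running.
#   jps | missing_daemons NameNode DataNode HMaster
```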

Work with HBase shell

Now you've got a working HBase instance. Get into HBase Shell and rock your HBase.

$ hbase shell
hbase(main)>

help: Get help

Note that the quotes around the command name are required.

hbase> help
hbase> help "COMMAND"

status: Show HBase status

hbase> status

list: List all tables

hbase> list

create: Create a table

Create a table "tbtest" with an initial column family named "property" in HBase Shell.

hbase> create 'tbtest', 'property'

put: Insert a cell into a table

If no timestamp is given in the command, HBase uses the current time as the cell's timestamp.

hbase> put 'tbtest', 'row1', 'property:color', 'red' 
hbase> put 'tbtest', 'row2', 'property:lang', 'en_us'
hbase> put 'tbtest', 'row3', 'property:name', 'avatar'
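
Typing put commands one by one gets tedious for more than a few cells. One possible approach, sketched here under assumptions, is to generate the commands from a tab-separated file and pipe them into the shell. tsv_to_puts and cells.tsv are hypothetical names, the sketch assumes no field contains a single quote, and it relies on hbase shell reading commands from standard input.

```shell
# Turn "row<TAB>column<TAB>value" lines on stdin into HBase shell put commands
# for the table named in the first argument. Incomplete lines are skipped.
tsv_to_puts() {
  table=$1
  while IFS="$(printf '\t')" read -r row column value; do
    [ -n "$value" ] || continue
    printf "put '%s', '%s', '%s', '%s'\n" "$table" "$row" "$column" "$value"
  done
}

# Usage: feed the generated commands to the HBase shell.
#   tsv_to_puts tbtest < cells.tsv | hbase shell
```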

scan: Scan a table

hbase> scan 'tbtest'
ROW     COLUMN+CELL
row1        column=property:color, timestamp=1335248257807, value=red
row2        column=property:lang, timestamp=1335248277348, value=en_us
row3        column=property:name, timestamp=1335248282001, value=avatar
3 row(s) in 0.0140 seconds

get: Get a single row

hbase> get 'tbtest', 'row1'
COLUMN          CELL
property:color      timestamp=1335248257807, value=red
1 row(s) in 0.0340 seconds

exit: Exit HBase Shell

hbase> exit

Stop HBase

$ stop-hbase.sh

Troubleshooting

Starting or stopping HBase takes too long

Try running start-hbase.sh or stop-hbase.sh in a standalone shell, not inside tmux or screen.

SLF4J complains about multiple SLF4J bindings

Ignore it. It doesn't matter.

Cannot write to HBase

This happens when HDFS takes a long time to recover and is still in safe mode. You can force HDFS to leave safe mode and then check the file system:

$ hadoop dfsadmin -safemode leave
$ hadoop fsck /
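
Forcing HDFS out of safe mode skips its own checks, so it may be gentler to see whether safe mode is actually on and, if so, wait for HDFS to leave it by itself. The sketch below assumes that hadoop dfsadmin -safemode get prints a line containing "Safe mode is ON" while safe mode is active; safemode_is_on is a hypothetical helper name.

```shell
# Read the output of `hadoop dfsadmin -safemode get` on stdin;
# succeed if it reports that safe mode is on.
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Usage: block until HDFS leaves safe mode, but only if it is in it.
#   if hadoop dfsadmin -safemode get | safemode_is_on; then
#     hadoop dfsadmin -safemode wait
#   fi
```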
