Set up Twitter Storm Development Environment on Mac OS X Lion

Twitter Storm is a distributed message-processing engine that enables large-scale, continuous data processing in parallel. Storm's typical use cases include stream processing, continuous computation and distributed RPC.

A local, or pseudo-distributed, Storm environment simulates a cluster's interface on a single machine. Instead of worrying about the maintenance of a real distributed system, you can develop and test your Storm program locally, then build a package suitable for deployment to a real cluster. This tutorial shows how to set up a local Storm environment on your machine; by the end of this post you should be able to write your own Storm program. For detailed information about Twitter Storm, refer to its wiki on GitHub.

Before you proceed, I hope you've read through the short introduction to Storm on the Twitter Engineering Blog and picked up Storm's terminology, such as stream, topology, et al. The rest of this post is organized as follows.

  1. Get Twitter Storm.
  2. Set up development environment.
  3. Create a Storm project.
  4. Run the Storm program locally.

Get Twitter Storm

You can download a Twitter Storm release from https://github.com/nathanmarz/storm/downloads. At the time of writing, the latest release is 0.7.1; it contains the binaries and Java libraries needed to develop and deploy Twitter Storm programs. Download the package and extract it into your home directory. Mac OS X does not ship with wget, so the commands below use curl instead.

$ cd
$ curl -L -O https://github.com/downloads/nathanmarz/storm/storm-0.7.1.zip
$ unzip storm-0.7.1.zip
$ ln -s storm-0.7.1 storm

Add the following lines to ~/.bash_profile so that the storm binary can be found:

# Twitter Storm
export PATH="$HOME/storm/bin:$PATH"

Then reload the profile so that the new PATH takes effect in the current shell:

$ source ~/.bash_profile
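To confirm that the shell can now find the launcher, `which storm` is the usual check. The Java sketch below mimics that lookup so you can see what "can be found" means: it walks the PATH entries in order and returns the first executable file with the given name. The class name StormOnPath is hypothetical, and, unlike the shell, it simply skips empty PATH entries instead of treating them as the current directory.

```java
import java.io.File;
import java.util.Optional;

/** Sketch: mimic the shell's PATH lookup to check that the `storm` launcher is visible. */
public class StormOnPath {
    // Returns the first executable file named `name` in a PATH-style string, if any.
    static Optional<File> findOnPath(String pathVar, String name) {
        if (pathVar == null) return Optional.empty();
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) continue; // simplification: ignore empty entries
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return Optional.of(candidate); // first match wins, as in the shell
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<File> storm = findOnPath(System.getenv("PATH"), "storm");
        System.out.println(storm.map(f -> "storm found at " + f.getAbsolutePath())
                                .orElse("storm not found; did you source ~/.bash_profile?"));
    }
}
```

Because the new entry is prepended ("$HOME/storm/bin:$PATH"), it is searched before the system directories, so it wins if another storm executable exists elsewhere.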

Set up local development environment

In the "Get Twitter Storm" step, you obtained everything needed for a local Storm environment. The storm binary is used to communicate with the remote cluster manager, called Nimbus. Since this tutorial covers only local development, we ignore the storm binary here and focus on how to use Storm's libraries to write a program.

Twitter Storm supports a wide variety of programming languages. To keep things simple, this tutorial uses Java. There are at least two ways to use Storm's capabilities when developing and testing locally:

  • Add Storm's libraries (jar packages) to your Java program's classpath.
  • Or use Apache Maven, a build and project management tool, to build your Storm project with Storm declared as a dependency.
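For the first approach, "adding Storm's libraries to the classpath" boils down to listing every jar from the release in one separator-joined string. The sketch below builds that string from ~/storm/*.jar and ~/storm/lib/*.jar, assuming the release was unpacked and symlinked to ~/storm as shown earlier; the class and method names are hypothetical. The result is the kind of value you could pass to `javac -cp` or `java -cp`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Sketch: assemble a classpath from the jars shipped in a local Storm release. */
public class StormClasspath {
    // Collects every *.jar directly inside each directory (not recursive).
    static List<Path> jarsIn(Path... dirs) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) continue; // skip missing directories
            List<Path> found = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path p : stream) found.add(p);
            }
            found.sort(null); // natural order, so the output is stable
            jars.addAll(found);
        }
        return jars;
    }

    // Joins the jar paths with the platform separator (':' on Mac OS X).
    static String toClasspath(List<Path> jars) {
        List<String> parts = new ArrayList<>();
        for (Path p : jars) parts.add(p.toString());
        return String.join(File.pathSeparator, parts);
    }

    public static void main(String[] args) throws IOException {
        Path storm = Paths.get(System.getProperty("user.home"), "storm");
        // The top-level jars first, then the dependency jars in lib/.
        System.out.println(toClasspath(jarsIn(storm, storm.resolve("lib"))));
    }
}
```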

In the next section, I will show you how both approaches work.

Create a new Storm project

storm-starter is Twitter's demo project for learning how to program with the Twitter Storm framework. It contains three topologies:

  • ExclamationTopology: Basic topology written in all Java.
  • WordCountTopology: Basic topology that makes use of multilang by implementing one bolt in Python.
  • ReachTopology: Example of complex DRPC on top of Storm.
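To make the ExclamationTopology data flow concrete before running the real thing, here is a plain-Java simulation that uses no Storm classes at all. It assumes the wiring in storm-starter's ExclamationTopology: a word spout feeding two chained bolts, each of which appends "!!!" to the word it receives. ExclamationFlow, EXCLAIM and run are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Sketch (no Storm): words flow through two chained "exclamation" steps. */
public class ExclamationFlow {
    // Stand-in for the bolt's per-tuple logic: append "!!!" to the incoming word.
    static final UnaryOperator<String> EXCLAIM = word -> word + "!!!";

    // Pushes each word through the steps in order, like spout -> exclaim1 -> exclaim2.
    static List<String> run(List<String> words, List<UnaryOperator<String>> steps) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            String t = w;
            for (UnaryOperator<String> step : steps) t = step.apply(t);
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("nathan", "mike", "jackson");
        System.out.println(run(words, Arrays.asList(EXCLAIM, EXCLAIM)));
        // prints [nathan!!!!!!, mike!!!!!!, jackson!!!!!!]
    }
}
```

The simulation is sequential and ordered. In Storm, each bolt runs as several parallel tasks, so tuples are processed concurrently and their output order is not guaranteed.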

To keep things simple, this tutorial focuses on the pure-Java topology ExclamationTopology. We will run it with each of the two approaches separately:

  • Run ExclamationTopology in Eclipse, using the Storm libraries from the Storm release downloaded earlier.
  • Build and run ExclamationTopology with Apache Maven.

Before you proceed, you need a copy of the storm-starter source code. I assume you clone the project into ~/workspace/storm-starter.

$ mkdir -p ~/workspace
$ cd ~/workspace
$ git clone https://github.com/nathanmarz/storm-starter.git

Run ExclamationTopology in Eclipse

  1. In Eclipse, create a Java project located at ~/workspace/storm-starter. Remember to select src/jvm as the source folder.
  2. Open the project properties for storm-starter, then go to Java Build Path -> Libraries and click Add Library to start the wizard. Follow the wizard to create two user libraries from your local Storm jars: storm, containing ~/storm/*.jar, and storm-dev, containing ~/storm/lib/*.jar.
  3. Add the two newly created libraries, storm and storm-dev, to storm-starter's build path.
  4. Refresh the source directory in the navigation view, which triggers an automatic rebuild.
  5. Right-click ExclamationTopology.java and select Run As -> Java Application. It should work.
  6. The demo runs for about 10 seconds and then terminates.

Not that difficult, isn't it?

Build and run ExclamationTopology with Apache Maven

Maven is an alternative to Eclipse for building and running Storm programs. It uses a POM (Project Object Model) file to determine build dependencies and downloads the dependent libraries, such as the Storm library in our project, on the fly. Using Maven to manage the development of your Storm program is highly recommended. Conveniently, Mac OS X Lion ships with Maven installed.

The Storm wiki shows how to include Storm as a build dependency in your project. In production, Maven can be used to generate a jar file that is ready to deploy to a Storm cluster. storm-starter ships with a POM file, m2-pom.xml, which you can use as a reference when writing your own POM.

Building and running Storm programs with Maven is very simple, and it does not require a local copy of the Storm libraries. However, Maven downloads all dependent libraries on your first build, so you may want to grab a coffee while you wait.

Run the following commands from ~/workspace/storm-starter.

  1. Build and run ExclamationTopology in local mode:

     $ mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.ExclamationTopology

  2. Generate a jar file ready for deployment to a cluster. The jar file will be placed in the target/ directory:

     $ mvn -f m2-pom.xml package
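Before deploying, you may want to confirm that the packaged jar actually contains your topology class. From the shell, `jar tf` lists a jar's entries; the sketch below does the same check with java.util.jar.JarFile. The jar's file name depends on the artifactId and version in m2-pom.xml, so it is passed in as an argument rather than guessed; JarCheck and containsClass are hypothetical names.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Sketch: check whether a packaged jar contains a given class. */
public class JarCheck {
    // Maps "storm.starter.ExclamationTopology" to "storm/starter/ExclamationTopology.class"
    // and looks that entry up in the jar.
    static boolean containsClass(String jarPath, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarCheck target/<your-jar>.jar");
            return;
        }
        System.out.println(containsClass(args[0], "storm.starter.ExclamationTopology"));
    }
}
```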
